Difference between revisions of "stoney conductor: VM Backup"

= Overview =
This page describes how VMs and VM-Templates are backed up and restored inside the [http://www.stoney-cloud.org stoney cloud].
  
= Requirements =
* sstBackupRootDirectory: file:///var/backup/virtualization
** This directory may be a separate partition. It needs to have the same size as the partition holding the live images (it is a "copy" of the live partition).
* sstBackupRetainDirectory: file:///var/virtualization/retain
** This directory must be on the same partition as your live images.
* A working stoney cloud, installed according to [[stoney cloud: Single-Node Installation]] or [[stoney cloud: Multi-Node Installation]].
* The backup configuration must be set: [[stoney_conductor:_OpenLDAP_directory_data_organisation#Backup | stoney conductor: OpenLDAP directory data organisation]].

= Backup =
== Basic idea ==
The main idea for backing up a VM or a VM-Template is to divide the task into three subtasks:
* createSnapshot: Create a disk-only snapshot. A new overlay file is created and all write operations are performed on this file. The underlying disk image is now read-only.
* exportSnapshot: Copy the read-only disk image to the backup location.
* commitSnapshot: Commit the write operations from the overlay back to the underlying (original) disk image. The underlying image is then read-write again and the overlay image can be deleted.
A more detailed and technical description of these three sub-processes can be found [[#Sub-Processes | here]].

Furthermore, there is a control instance which can call these three sub-processes independently for a given machine. This way, the stoney cloud is able to handle different cases:
=== Backup a single machine ===
The procedure for backing up a single machine is simple: call the three sub-processes (createSnapshot, exportSnapshot and commitSnapshot) one after the other. The control instance therefore only needs some very basic logic:
<source lang="c">
// Pseudocode in C-like syntax: object, args and error are placeholders.
object machine = args[0];

if ( createSnapshot( machine ) )
{
    if ( exportSnapshot( machine ) )
    {
        if ( commitSnapshot( machine ) )
        {
            printf("Successfully backed up machine %s\n", machine);
        } else
        {
            printf("Error while committing snapshot for machine %s: %s\n", machine, error);
        }
    } else
    {
        printf("Error while exporting snapshot for machine %s: %s\n", machine, error);
    }
} else
{
    printf("Error while snapshotting machine %s: %s\n", machine, error);
}
</source>

=== Backup multiple machines at the same time ===
When backing up multiple machines at the same time, we need to make sure that the snapshots of the machines are taken as close together in time as possible. Therefore the control instance should first call the createSnapshot process for all machines. After every machine has been snapshotted, the control instance can call the exportSnapshot and commitSnapshot processes for every machine. The most important point is that the control instance remembers whether the snapshot for a given machine was successful. If the snapshot failed, it must not call the exportSnapshot and commitSnapshot processes for that machine. So the control instance needs a little more logic:

<source lang="c">
// Pseudocode in C-like syntax: object, args and error are placeholders.
object machines[] = args[0];
int count = sizeof(machines) / sizeof(object);
object successful_snapshots[count];   // entries stay null until a snapshot succeeds

// Snapshot all machines
for ( int i = 0; i < count; i++ )
{
    // If the snapshot was successful, remember the machine at the same
    // position in the successful_snapshots array
    if ( createSnapshot( machines[i] ) )
    {
        successful_snapshots[i] = machines[i];
    } else
    {
        printf("Error while snapshotting machine %s: %s\n", machines[i], error);
    }
}

// Export and commit all successfully snapshotted machines
for ( int i = 0; i < count; i++ )
{
    // If the element at this position is not null, the snapshot
    // for this machine was successful
    if ( successful_snapshots[i] )
    {
        if ( exportSnapshot( successful_snapshots[i] ) )
        {
            if ( commitSnapshot( successful_snapshots[i] ) )
            {
                printf("Successfully backed up machine %s\n", successful_snapshots[i]);
            } else
            {
                printf("Error while committing snapshot for machine %s: %s\n", successful_snapshots[i], error);
            }
        } else
        {
            printf("Error while exporting snapshot for machine %s: %s\n", successful_snapshots[i], error);
        }
    }
}
</source>

=== Sub-Processes ===
See also [[Libvirt_external_snapshot_with_GlusterFS]].
==== createSnapshot ====
For the commands see [[Libvirt_external_snapshot_with_GlusterFS#Part_2:_Create_the_snapshot_using_virsh]]

For the workflow see [[stoney_conductor:_prov-backup-kvm#createSnapshot]]

==== exportSnapshot ====
# Simply copy the underlying image to the backup location
#* <source lang="bash">cp -p /<path>/<to>/<image>.qcow2 /<path>/<to>/<backup>/<location>/.</source>

For the workflow see [[stoney_conductor:_prov-backup-kvm#exportSnapshot]]

==== commitSnapshot ====
For the commands see [[Libvirt_external_snapshot_with_GlusterFS#Cleanup.2FCommit_.28Online.29]]

For the workflow see [[stoney_conductor:_prov-backup-kvm#commitSnapshot]]

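As a rough illustration, the three sub-processes above could be wrapped in shell functions like the following. This is only a sketch and not the code used by prov-backup-kvm: the snapshot name <code>backup-snap</code>, the function names and all arguments are assumptions, and the authoritative commands are the ones on the linked pages. Setting <code>RUN=echo</code> prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: snapshot name, function names and arguments are hypothetical.
# Set RUN=echo to print the commands instead of executing them (dry run).
RUN=${RUN:-}

# createSnapshot: disk-only external snapshot; writes now go to the overlay file.
create_snapshot() {   # args: domain, disk target, overlay file
    $RUN virsh snapshot-create-as "$1" backup-snap --disk-only --atomic \
        --no-metadata --diskspec "$2,file=$3"
}

# exportSnapshot: copy the (now read-only) underlying image to the backup location.
export_snapshot() {   # args: underlying image, backup directory
    $RUN cp -p "$1" "$2/"
}

# commitSnapshot: commit the overlay back into the underlying image and pivot to it.
commit_snapshot() {   # args: domain, disk target
    $RUN virsh blockcommit "$1" "$2" --active --pivot --verbose
}
```

A dry run such as <code>RUN=echo; create_snapshot vm-001 vda /tmp/vm-001.overlay.qcow2</code> shows the command that would be executed. The overlay file is not removed by the commit and has to be deleted separately afterwards.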
== Communication through backend ==
Since the stoney cloud is (as the name already says) a cloud solution, it makes sense to have a backend (in our case OpenLDAP) involved in the whole process. This way it is possible to run the backup jobs decentralized on every vm-node. The control instance can then modify the backend, and these changes are seen by the different backup daemons on the vm-nodes. The communication could look as shown in the following picture (Figure 1):

[[File:Daemon-communication.png|800px|thumbnail|none|Figure 1: Communication between the control instance and the prov-backup-kvm daemon through the LDAP backend]]

You can modify/update this workflow by editing [[File:Daemon-communication.xmi]] (you may need [http://uml.sourceforge.net/ Umbrello UML Modeller] diagram programme for KDE to display the content properly).

=== Control-Instance Daemon Interaction for creating a Backup with LDIF Examples ===
The step numbers correspond to the graphical overview above.

==== Step 00: Backup Configuration for a virtual machine ====
<pre>
# The following backup configuration says that the backup should be done daily, at 03:00 hours (localtime).
# * * * * * command to be executed
# - - - - -
# | | | | |
# | | | | +----- day of week (0 - 6) (Sunday=0)
# | | | +------- month (1 - 12)
# | | +--------- day of month (1 - 31)
# | +----------- hour (0 - 23)
# +------------- min (0 - 59)
# localtime in the crontab entry
dn: ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: organizationalUnit
objectclass: sstVirtualizationBackupObjectClass
objectclass: sstCronObjectClass
ou: backup
description: This sub tree contains the backup plan for the virtual machine kvm-005.
sstCronMinute: 0
sstCronHour: 3
sstCronDay: *
sstCronMonth: *
sstCronDayOfWeek: *
sstCronActive: TRUE
sstBackupRootDirectory: file:///var/backup/virtualization
sstBackupRetainDirectory: file:///var/virtualization/retain
sstBackupRamDiskLocation: file:///mnt/ramdisk-test
sstVirtualizationDiskImageFormat: qcow2
sstVirtualizationDiskImageOwner: root
sstVirtualizationDiskImageGroup: vm-storage
sstVirtualizationDiskImagePermission: 0660
sstBackupNumberOfIterations: 1
sstVirtualizationVirtualMachineForceStart: FALSE
sstVirtualizationBandwidthMerge: 0
</pre>

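To make the relation between the sstCron* attributes and a crontab entry more tangible, the following sketch prints the five crontab time fields found in an LDIF file. The function name and the idea of reading the entry from a file are assumptions made for illustration; the daemons read these values directly from the LDAP directory.

```shell
#!/bin/sh
# Sketch: print the five crontab time fields stored in a backup configuration.
# The function name is hypothetical; the attribute names are those of Step 00.
cron_fields_from_ldif() {   # arg: path to an LDIF file
    awk -F': ' '
        $1 == "sstCronMinute"    { minute = $2 }
        $1 == "sstCronHour"      { hour   = $2 }
        $1 == "sstCronDay"       { day    = $2 }
        $1 == "sstCronMonth"     { month  = $2 }
        $1 == "sstCronDayOfWeek" { dow    = $2 }
        END { print minute, hour, day, month, dow }
    ' "$1"
}
```

For the configuration of Step 00 this yields the fields of a daily job at 03:00 hours.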
==== Step 01: Initialize Backup Sub Tree (Control instance daemon) ====
The sub tree '''ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch''' reflects the time at which the backup is planned, in the form [YYYY][MM][DD]T[hh][mm][ss]Z ([http://en.wikipedia.org/wiki/ISO_8601 ISO 8601]). It should be written at the time when the backup is planned and should be executed. The section '''20121002T010000Z''' means the following:
* Year: 2012
* Month: 10
* Day of Month: 02
* Hour of Day: 01
* Minutes: 00
* Seconds: 00
Please be aware that the time is to be written in UTC (see also the comment in the LDIF example below).
<pre>
# This entry is the place holder for the backup, which is to be executed at 03:00 hours (localtime with daylight-saving). This
# leads to the 20121002T010000Z timestamp (which is written in UTC).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: sstProvisioning
objectclass: organizationalUnit
ou: 20121002T010000Z
sstProvisioningExecutionDate: 0
sstProvisioningMode: initialize
sstProvisioningReturnValue: 0
sstProvisioningState: 20121002T014513Z
</pre>

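The name of the backup sub tree can be derived from a point in time as sketched below. The function name is hypothetical, and the sketch assumes a <code>date</code> implementation that accepts <code>-d @SECONDS</code> (for example GNU coreutils).

```shell
#!/bin/sh
# Sketch: format seconds since the epoch as the UTC name of a backup sub tree.
# Assumes a date(1) that accepts "-d @SECONDS" (e.g. GNU coreutils).
backup_ou_name() {   # arg: seconds since the epoch
    date -u -d "@$1" +%Y%m%dT%H%M%SZ
}
```

The epoch value 1349139600 corresponds to 01:00:00 UTC on 2 October 2012, so <code>backup_ou_name 1349139600</code> prints <code>20121002T010000Z</code>. Which local time this corresponds to depends on the time zone of the host; with a two hour offset (daylight-saving time) it is 03:00 hours, as in the example above.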
==== Step 02: Finalize the Initialization (Control instance daemon) ====
<pre>
# The attribute sstProvisioningState is updated with current time by the fc-brokerd, when sstProvisioningMode is modified.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010001Z
-
replace: sstProvisioningMode
sstProvisioningMode: initialized
</pre>

==== Step 03: Start the Snapshot Process (Control instance daemon) ====
By setting '''sstProvisioningMode''' to '''snapshot''', the Control instance daemon kicks off the actual backup process.
<pre>
# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# snapshot (this way the Provisioning-Backup-KVM daemon knows that it must start the snapshotting process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: snapshot
</pre>

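The modify operations in these steps all have the same shape, so they can be generated. The following sketch prints such an LDIF record; the function name is an assumption and is not part of the stoney conductor code.

```shell
#!/bin/sh
# Sketch: emit the LDIF that switches a backup leaf to a new mode.
# The function name is hypothetical.
mode_change_ldif() {   # args: dn, new sstProvisioningMode, new sstProvisioningState
    printf 'dn: %s\n' "$1"
    printf 'changetype: modify\n'
    printf 'replace: sstProvisioningState\n'
    printf 'sstProvisioningState: %s\n' "$3"
    printf '%s\n' '-'
    printf 'replace: sstProvisioningMode\n'
    printf 'sstProvisioningMode: %s\n' "$2"
}
```

The output can be piped to <code>ldapmodify</code>, for example <code>mode_change_ldif "$dn" snapshot 0 | ldapmodify -x -D "$binddn" -W -H "$uri"</code>, where <code>$dn</code>, <code>$binddn</code> and <code>$uri</code> are placeholders for your own values.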
==== Step 04: Starting the Snapshot Process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon receives the snapshot command, it sets the '''sstProvisioningMode''' to '''snapshotting''' to tell the Control instance daemon and other interested parties that it is snapshotting the virtual machine or virtual machine template.
<pre>
# The attribute sstProvisioningMode is set to snapshotting by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: snapshotting
</pre>

==== Step 05: Finalizing the Snapshot Process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon has executed the snapshot command, it sets the '''sstProvisioningMode''' to '''snapshotted''', the '''sstProvisioningState''' to the current timestamp (UTC) and '''sstProvisioningReturnValue''' to zero to tell the Control instance daemon and other interested parties that the snapshot of the virtual machine or virtual machine template is finished.
<pre>
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010011Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: snapshotted
</pre>

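The "combination" mentioned in the comment above can be expressed as a small check. This is a sketch under the assumption that the three attribute values have already been read from the backup leaf; the function name is hypothetical, and the meaning of non-zero return values is not covered here.

```shell
#!/bin/sh
# Sketch: has the daemon finished a step successfully?
# True if the mode has the expected value, the return value is zero
# and the state holds a timestamp (i.e. it is no longer zero).
step_finished() {   # args: mode, expected mode, return value, state
    [ "$1" = "$2" ] && [ "$3" = "0" ] && [ "$4" != "0" ]
}
```

For example, <code>step_finished snapshotted snapshotted 0 20121002T010011Z</code> succeeds, while the same call with the mode still at <code>snapshotting</code> fails.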
==== Step 06: Start the export Process (Control instance daemon) ====
By setting '''sstProvisioningMode''' to '''export''', the Control instance daemon tells the Provisioning-Backup-KVM daemon to export the disk image to the backup location.
<pre>
# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# export (this way the Provisioning-Backup-KVM daemon knows that it must start the export process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: export
</pre>

==== Step 07: Starting the export Process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon receives the export command, it sets the '''sstProvisioningMode''' to '''exporting''' to tell the Control instance daemon and other interested parties that it is exporting the virtual machine or virtual machine template disk images.
<pre>
# The attribute sstProvisioningMode is set to exporting by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: exporting
</pre>

==== Step 08: Finalizing the export Process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon has executed the export command, it sets the '''sstProvisioningMode''' to '''exported''', the '''sstProvisioningState''' to the current timestamp (UTC) and '''sstProvisioningReturnValue''' to zero to tell the Control instance daemon and other interested parties that the export of the virtual machine or virtual machine template disk images is finished.
<pre>
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010500Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: exported
</pre>

==== Step 09: Start the commit Process (Control instance daemon) ====
By setting '''sstProvisioningMode''' to '''commit''', the Control instance daemon tells the Provisioning-Backup-KVM daemon to commit the changes from the overlay file to the underlying disk image.
<pre>
# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# commit (this way the Provisioning-Backup-KVM daemon knows that it must start the commit process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: commit
</pre>

==== Step 10: Starting the commit Process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon receives the commit command, it sets the '''sstProvisioningMode''' to '''committing''' to tell the Control instance daemon and other interested parties that it is committing changes from the overlay disk images back to the underlying ones.
<pre>
# The attribute sstProvisioningMode is set to committing by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: committing
</pre>

==== Step 11: Finalizing the commit Process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon has executed the commit command, it sets the '''sstProvisioningMode''' to '''committed''', the '''sstProvisioningState''' to the current timestamp (UTC) and '''sstProvisioningReturnValue''' to zero to tell the Control instance daemon and other interested parties that the committing of the changes from the overlay disk images back to the underlying ones is done.
<pre>
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: committed
</pre>

==== Step 12: Finalizing the Backup Process (Control instance daemon) ====
As soon as the Control instance daemon notices that the attribute '''sstProvisioningMode''' is set to '''committed''', it sets the '''sstProvisioningMode''' to '''finished''' and the '''sstProvisioningState''' to the current timestamp (UTC). All interested parties now know that the backup process is finished and that a new backup process can therefore be started.
<pre>
# The attribute sstProvisioningState is updated with current time by the fc-brokerd, when sstProvisioningMode is
# set to finished.
# All interested parties now know that the backup process is finished and that a new backup process can be started.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012001Z
-
replace: sstProvisioningMode
sstProvisioningMode: finished
</pre>

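Steps 01 to 12 form a simple chain of '''sstProvisioningMode''' values. The following sketch summarizes that chain and who writes each value; the function names are assumptions, and error handling (a non-zero '''sstProvisioningReturnValue''') is not covered.

```shell
#!/bin/sh
# Sketch: successor of each sstProvisioningMode value in the backup workflow.
# Function names are hypothetical; the mode values are those of steps 01 to 12.
next_mode() {   # arg: current mode; fails for "finished" and unknown modes
    case "$1" in
        initialize)   echo initialized ;;
        initialized)  echo snapshot ;;
        snapshot)     echo snapshotting ;;
        snapshotting) echo snapshotted ;;
        snapshotted)  echo export ;;
        export)       echo exporting ;;
        exporting)    echo exported ;;
        exported)     echo commit ;;
        commit)       echo committing ;;
        committing)   echo committed ;;
        committed)    echo finished ;;
        *)            return 1 ;;
    esac
}

# Who writes a given mode: the control instance issues the commands,
# the Provisioning-Backup-KVM daemon reports its progress.
mode_writer() {   # arg: mode
    case "$1" in
        snapshotting|snapshotted|exporting|exported|committing|committed)
            echo daemon ;;
        initialize|initialized|snapshot|export|commit|finished)
            echo control ;;
        *)
            return 1 ;;
    esac
}
```

For example, <code>next_mode snapshotted</code> prints <code>export</code>, and <code>mode_writer exporting</code> prints <code>daemon</code>.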
== Current Implementation (Backup) ==
Since we do not have a working control instance yet, we need a workaround for backing up the machines:

* We already have a BackupKVMWrapper.pl script (File-Backend) which executes the three [[#Sub-Processes | sub-processes ]] in the correct order for a given list of machines (see [[#Backup multiple machines at the same_time]]).
* We already have the implementation for the whole backup with the LDAP-Backend (see [[ stoney conductor: prov backup kvm ]]).
* We can now combine these two existing scripts and create a wrapper (let's call it LDAPKVMWrapper) which adds some logic to BackupKVMWrapper.pl. In fact, the LDAPKVMWrapper generates the list of machines which need a backup.

The behaviour on our servers is as follows (cf. Figure 2):
# The (decentralized) LDAPKVMWrapper (which is executed every day via cronjob) generates a list of all machines running on the current host.
#* Currently the cronjob on the hosts looks like this: <code>00 01 * * * /usr/bin/LDAPKVMWrapper.pl | logger -t Backup-KVM</code>
#* For each of these machines:
#** Check if the machine is excluded from the backup; if yes, remove the machine from the list
#** Check if the last backup was successful; if not, remove the machine from the list
# Update the backup subtree for each machine in the list
#* Remove the old backup leaf (the "yesterday-leaf"), and add a new one (the "today-leaf")
#* After this step, the machines are ready to be backed up
# Call the KVMBackupWrapper.pl script with the machines list as a parameter
# Wait for the KVMBackupWrapper.pl script to finish
# Go through all machines again and update the backup subtree one last time
#* Check if the backup was successful; if yes, set sstProvisioningMode = finished (see also TBD)
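The filtering in the first step of the list above can be sketched as follows. The real LDAPKVMWrapper is a Perl script that reads this information from the LDAP directory; the input format used here (one machine per line with two TRUE/FALSE flags) and the function name are assumptions made purely for illustration.

```shell
#!/bin/sh
# Sketch: keep only the machines that should be backed up in this run.
# Input on stdin, one machine per line: NAME EXCLUDED LAST_BACKUP_OK
# (both flags TRUE or FALSE). This line format is hypothetical.
machines_to_back_up() {
    while read -r name excluded last_ok; do
        [ "$excluded" = "TRUE" ] && continue   # machine is excluded from the backup
        [ "$last_ok" = "TRUE" ] || continue    # last backup was not successful
        echo "$name"
    done
}
```

The resulting list is what would be handed to the backup wrapper script as a parameter.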
 

[[File:wrapper-interaction.png|650px|thumbnail|none|Figure 2: How the two wrappers interact with the LDAP backend]]

You can modify/update this workflow by editing [[File:wrapper-interaction.xmi]] (you may need [http://uml.sourceforge.net/ Umbrello UML Modeller] diagram programme for KDE to display the content properly).

* If for some reason something does not work at all, the whole backup process can be deactivated by simply disabling the LDAPKVMWrapper cronjob:
** <code>crontab -e</code>
** Comment out the LDAPKVMWrapper cronjob line: <code>#00 01 * * * /usr/bin/LDAPKVMWrapper.pl | logger -t Backup-KVM</code>
=== How to exclude a machine from the backup ===
If you want to exclude a machine from the backup run, log in to one of the [[VM-Node | vm-nodes]] and add the following entry to your LDAP directory:
<source lang="bash">
machineuuid="<UUID OF THE MACHINE-NAME>" # e.g.: b9d13dbc-9ab7-4948-9daa-a5709de83dc2
cat << EOF | ldapadd -D cn=Manager,o=stepping-stone,c=ch -H ldaps://ldapm.stepping-stone.ch/ -W -x
dn: ou=backup,sstVirtualMachine=${machineuuid},ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: organizationalUnit
objectclass: sstVirtualizationBackupObjectClass
ou: backup
sstbackupexcludefrombackup: TRUE
EOF
</source>

If the backup subtree in the LDAP directory already exists, you need to add the sstbackupexcludefrombackup attribute to the existing entry instead:
<source lang="bash">
machineuuid="<UUID OF THE MACHINE-NAME>" # e.g.: b9d13dbc-9ab7-4948-9daa-a5709de83dc2
# The record below is a modify operation, so ldapmodify is used here
cat << EOF | ldapmodify -D cn=Manager,o=stepping-stone,c=ch -H ldaps://ldapm.stepping-stone.ch/ -W -x
dn: ou=backup,sstVirtualMachine=${machineuuid},ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
add: objectClass
objectClass: sstVirtualizationBackupObjectClass
-
add: sstbackupexcludefrombackup
sstbackupexcludefrombackup: TRUE
EOF
</source>

==== Re-include the machine to the backup ====
If you want to re-include a machine, simply delete the machine's whole backup subtree. It will be recreated during the next backup run.

== Next steps ==

= Restore =
'''Attention:''' The restore process is neither defined nor implemented yet. The following documentation describes the old restore process.
== Basic idea ==
The restore process, similar to the backup process, can be divided into three sub-processes:
* Unretain the small files: Copy the small files (backend entry, XML description) from the backup directory to the retain directory
* Unretain the large files: Copy the large files (state file, disk image(s)) from the backup directory to the retain directory
* Restore the machine: Replace the live disk image(s) by the one(s) from the backup and restore the machine from the state file

Additionally, the restore process can be divided into two phases:
* User-Interaction phase: After the "unretain small files" step, the user needs to decide two things:
** If there are conflicts between the backend entry file and the XML description, the user needs to decide how to resolve them
** The user can still abort the restore process up to this point. After that, the restore cannot be aborted or undone!
* Non-User-Interaction phase: The daemons communicate with each other through the backend and the restore process continues without further user input (cf. [[#Communication_through_backend_2 | Communication through backend]])

=== Sub Processes ===
==== Unretain small files ====
This workflow assumes that the backup directory is on the same physical server as the retain directory (protocol is file://).
# Copy the backend-entry file from the backup directory to the retain directory:
#* <source lang="bash">cp -p /path/to/backup/vm-001.backend /path/to/retain/vm-001.backend</source>
# Copy the XML description from the backup directory to the retain directory:
#* <source lang="bash">cp -p /path/to/backup/vm-001.xml /path/to/retain/vm-001.xml</source>
# Compare the backend-entry file (the one in the retain directory) with the live-backend entry
#* Resolve all conflicts between these two backend entries
#** Modify the backend entry at the retain location accordingly
# Apply the same changes to the XML description at the retain location (backend entry and XML description need to be consistent).

==== Unretain large files ====
# Copy the state file from the backup directory to the retain directory:
#* <source lang="bash">cp -p /path/to/backup/vm-001.state /path/to/retain/vm-001.state</source>
# Copy the disk image(s) from the backup directory to the retain directory:
#* <source lang="bash">cp -p /path/to/backup/vm-001.qcow2 /path/to/retain/vm-001.qcow2</source>
#** '''Important:''' If a VM has more than just one disk image, repeat this step for every disk image

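Repeating the copy for several files (for example for a VM with more than one disk image) can be sketched with a small helper. The function name is hypothetical and the file names have to be passed explicitly, because this page does not define how additional disk images are named.

```shell
#!/bin/sh
# Sketch: copy a list of files from the backup directory to the retain directory.
# The function name is hypothetical; file names are passed explicitly.
unretain_files() {   # args: backup dir, retain dir, file name...
    src=$1
    dst=$2
    shift 2
    for f in "$@"; do
        cp -p "$src/$f" "$dst/$f" || return 1   # stop at the first failed copy
    done
}
```

For example, <code>unretain_files /path/to/backup /path/to/retain vm-001.state vm-001.qcow2</code> copies the state file and one disk image; further disk images are simply appended to the argument list.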
==== Restore the VM ====
# Shut down the VM if it is running:
#* <source lang="bash">virsh shutdown vm-001</source>
# Undefine the VM if it is still defined:
#* <source lang="bash">virsh undefine vm-001</source>
# Overwrite the original disk image:
#* <source lang="bash">mv /path/to/retain/vm-001.qcow2 /path/to/images/vm-001.qcow2</source>
#** '''Important:''' If a VM has more than just one disk image, repeat this step for every disk image
# Restore the VM's backend entry:
#* Write the backend entry from the retain location (<code>/path/to/retain/vm-001.backend</code>) to the backend
# Overwrite the VM's XML description with the one from the retain location:
#* <source lang="bash">cp -p /path/to/retain/vm-001.xml /path/to/xmls/vm-001.xml</source>
# Restore the VM from the state file with the corrected XML:
#* <source lang="bash">virsh restore /path/to/retain/vm-001.state --xml /path/to/xmls/vm-001.xml</source>

== Communication through backend ==
The actual KVM-Restore process is controlled completely by the Control instance daemon via the OpenLDAP directory. See [[#OpenLDAP Directory Integration|OpenLDAP Directory Integration]] for the involved attributes and possible values.

[[File:Daemon-interaction-restore.png|thumb|650px|none|Figure 3: Communication between all involved parties during the restore process]]

You can modify/update these interactions by editing [[File:Restore-Interaction.xmi]] (you may need [http://uml.sourceforge.net/ Umbrello UML Modeller] diagram programme for KDE to display the content properly).

=== Control instance Daemon Interaction for restoring a Backup with LDIF Examples ===
==== Step 01: Start the unretainSmallFiles process (Control instance daemon) ====
The first step of the restore process is to copy the small files (in this case the XML file and the LDIF) from the configured backup location to the configured retain location.

<pre>
# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to
# unretainSmallFiles (this way the Provisioning-Backup-KVM daemon knows that it must start the unretainSmallFiles process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainSmallFiles
</pre>

==== Step 02: Starting the unretainSmallFiles process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon receives the command to unretain the small files, it sets the '''sstProvisioningMode''' to '''unretainingSmallFiles''' to tell the Control instance daemon and other interested parties that it is unretaining the small files for the virtual machine or virtual machine template.
<pre>
# The attribute sstProvisioningMode is set to unretainingSmallFiles by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: unretainingSmallFiles
</pre>

==== Step 03: Finalizing the unretainSmallFiles process (Provisioning-Backup-KVM daemon) ====
As soon as the Provisioning-Backup-KVM daemon has executed the commands to unretain the small files, it sets the '''sstProvisioningMode''' to '''unretainedSmallFiles''', the '''sstProvisioningState''' to the current timestamp (UTC) and '''sstProvisioningReturnValue''' to zero to tell the Control instance daemon and other interested parties that the unretaining of all the small files from the configured backup location is finished.
<pre>
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainedSmallFiles
</pre>

==== Step 05: Start the unretainLargeFiles process (Control instance daemon) ====
The next step in the restore process is to copy the large files (state file and disk images) from the configured backup directory to the configured retain directory.

<pre>
# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to
# unretainLargeFiles (this way the Provisioning-Backup-KVM daemon knows that it must start the unretainLargeFiles process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainLargeFiles
</pre>

==== Step 06: Starting the unretainLargeFiles process (Provisioning-Backup-KVM daemon) ====
 +
As soon as the Provisioning-Backup-KVM daemon receives the command to unretain the large files, it sets the '''sstProvisioningMode''' to '''unretainingLargeFiles''' to tell the Control instance daemon and other interested parties, that it is unretaining the large files for the virtual machine or virtual machine template.
 +
 
 +
In the meantime the vm-manager merges the LDIF we have unretained in [[#Step_02:_Starting_the_unretainSmallFiles_process_.28Provisioning-Backup-KVM_daemon.29 | step 02]] with the one in the live directory to sort out possible differences in the configuration of the virtual machine. 
 +
<pre>
 +
# The attribute sstProvisioningMode is set to unretainingLargeFiles by the Provisioning-Backup-KVM daemon.
 +
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
 +
changetype: modify
 +
replace: sstProvisioningMode
 +
sstProvisioningMode: unretainingLargeFiles
 +
</pre>
 +
 
 +
==== Step 07: Finalizing the unretainLargeFiles process (Provisioning-Backup-KVM daemon) ====
 +
As soon as the Provisioning-Backup-KVM daemon has executed the commands to unretain the large files, it sets the '''sstProvisioningMode''' to '''unretainedLargeFiles''', the '''sstProvisioningState''' to the current timestamp (UTC) and '''sstProvisioningReturnValue''' to zero to tell the Control instance daemon and other interested parties, that the unretaining of all the large files from the configured backup location is finished.
 +
<pre>
 +
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
 +
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
 +
# With this combination, the Control instance daemon knows, that it can proceed.
 +
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
 +
changetype: modify
 +
replace: sstProvisioningState
 +
sstProvisioningState: 20121002T012000Z
 +
-
 +
replace: sstProvisioningReturnValue
 +
sstProvisioningReturnValue: 0
 +
-
 +
replace: sstProvisioningMode
 +
sstProvisioningMode: unretainedLargeFiles
 +
</pre>
 +
 
 +
==== Step 09: Start the restore process (Control instance daemon) ====
 +
Since we now have all necessary files in the configured retain location, the restore process can be started. It simply copies the disk images back to their original location and restores the VM from the state file (which is also at the configured retain location).
 +
 
 +
<pre>
 +
# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to
 +
# restore (this way the Provisioning-Backup-KVM daemon knows, that it must start the restore process).
 +
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
 +
changetype: modify
 +
replace: sstProvisioningState
 +
sstProvisioningState: 0
 +
-
 +
replace: sstProvisioningMode
 +
sstProvisioningMode: restore
 +
</pre>
 +
 
 +
==== Step 10: Starting the restore process (Provisioning-Backup-KVM daemon) ====
 +
As soon as the Provisioning-Backup-KVM daemon receives the restore command, it sets the '''sstProvisioningMode''' to '''restoring''' to tell the Control instance daemon and other interested parties, that it is restoring the virtual machine or virtual machine template.
 +
<pre>
 +
# The attribute sstProvisioningMode is set to restoring by the Provisioning-Backup-KVM daemon.
 +
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
 +
changetype: modify
 +
replace: sstProvisioningMode
 +
sstProvisioningMode: restoring
 +
</pre>
 +
 
 +
==== Step 11: Finalizing the restore process (Provisioning-Backup-KVM daemon) ====
 +
As soon as the Provisioning-Backup-KVM daemon has executed the restore command, it sets the '''sstProvisioningMode''' to '''restored''', the '''sstProvisioningState''' to the current timestamp (UTC) and '''sstProvisioningReturnValue''' to zero to tell the Control instance daemon and other interested parties, that the restore process is finished.
 +
<pre>
 +
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
 +
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
 +
# With this combination, the Control instance daemon knows, that it can proceed.
 +
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
 +
changetype: modify
 +
replace: sstProvisioningState
 +
sstProvisioningState: 20121002T012000Z
 +
-
 +
replace: sstProvisioningReturnValue
 +
sstProvisioningReturnValue: 0
 +
-
 +
replace: sstProvisioningMode
 +
sstProvisioningMode: restored
 +
</pre>
 +
 
 +
==== Step 12: Finalizing the restore process (Control instance daemon) ====
 +
As soon as the Control instance daemon notices that the attribute '''sstProvisioningMode''' is set to '''restored''', it sets the '''sstProvisioningMode''' to '''finished''' and the '''sstProvisioningState''' to the current timestamp (UTC). All interested parties now know that the restore process is finished.
 +
<pre>
 +
# The attribute sstProvisioningState is updated with current time by the Control instance daemon, when sstProvisioningMode is
 +
# set to finished.
 +
# All interested parties now know, that the restore process is finished.
 +
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
 +
changetype: modify
 +
replace: sstProvisioningState
 +
sstProvisioningState: 20121002T012001Z
 +
-
 +
replace: sstProvisioningMode
 +
sstProvisioningMode: finished
 +
</pre>
 +
 
 +
== Current Implementation (Restore) ==
 +
'''Attention''': The restore process is neither defined nor implemented yet. The following documentation is about the old restore process.
 +
 
 +
 
 +
* Since the prov-backup-kvm daemon is not running on the vm-nodes (cf. [[stoney_conductor:_Backup#Current_Implementation_.28Backup.29]]), the restore process does not work when clicking the icon in the web interface.
 +
 
 +
=== How to manually restore a machine from backup ===
 +
'''Important''': Before you continue with this guide, make sure that you have no other possibility to restore the machine. It might be easier and safer to get lost files from the online backup if the machine has one set up.
 +
 
 +
If you really have to restore the machine from the backup:
 +
# Stop the machine via the [https://cloud.stepping-stone.ch/vm-manager/ web interface]
 +
# Login (as root) on the [[VM-Node]] the machine was running on
 +
 
 +
As a first step, set some useful bash variables so that you can copy and paste the commands in this guide:
 +
 
 +
'''Double check all variables you are setting here. If one is not correct, you will restore a running machine or overwrite a live-disk image!'''
 +
<source lang='bash'>
 +
machinename="<MACHINE-NAME>" # For example: machinename="b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6"
 +
vmpool="<VM-POOL>" # For example vmpool="0f83f084-8080-413e-b558-b678e504836e"
 +
vmtype="<VM-TYPE>" # For example vmtype="vm-persistent"
 +
</source>
 +
Change to the backup directory for the given machine and check the iterations:
 +
<source lang='bash'>
 +
cd /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}
 +
ls -al
 +
</source>
 +
Change into the most recent iteration
 +
<source lang='bash'>
 +
cd 2014...
 +
ls -al
 +
</source>
 +
In there you should have:
 +
* The state file <MACHINE-NAME>.state.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.state.20140109T134445Z)
 +
* The XML description <MACHINE-NAME>.xml.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.xml.20140109T134445Z)
 +
* The ldif file <MACHINE-NAME>.ldif.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.ldif.20140109T134445Z)
 +
* And at least one disk image <DISK-IMAGE>.qcow2.<BACKUP-DATE> (for example 8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2.20140109T134445Z)
 +
Now you should save the backup date and the disk image(s) in a variable
 +
<source lang='bash'>
 +
backupdate="<BACKUP-DATE>" # For example: backupdate="20140109T134445Z"
 +
diskimage1="<DISK-IMAGE-1>.qcow2" # For example: diskimage1="8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2"
 +
diskimage2="<DISK-IMAGE-2>.qcow2" # For example: diskimage2="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.qcow2"
 +
...
 +
</source>
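
With the backup date and the disk image names known, the list of files expected in the iteration directory can be generated and compared with what is actually there. The following is only a sketch: both function names are hypothetical helpers and not part of the stoney cloud tooling.

```bash
#!/bin/bash
# Hypothetical helper: print the file names expected in one backup iteration.
# Arguments: machine name, backup date, one or more disk image names (with .qcow2)
expected_backup_files() {
    local machinename="$1" backupdate="$2"
    shift 2

    # State file, XML description and LDIF file of the machine
    local suffix
    for suffix in state xml ldif; do
        printf '%s.%s.%s\n' "${machinename}" "${suffix}" "${backupdate}"
    done

    # One entry per disk image
    local diskimage
    for diskimage in "$@"; do
        printf '%s.%s\n' "${diskimage}" "${backupdate}"
    done
}

# Hypothetical helper: report every expected file that is missing from the
# current directory. Takes the same arguments as expected_backup_files.
missing_backup_files() {
    local file
    expected_backup_files "$@" | while read -r file; do
        [ -e "${file}" ] || echo "missing: ${file}"
    done
}
```

Run inside the iteration directory, <code>missing_backup_files "${machinename}" "${backupdate}" "${diskimage1}"</code> prints nothing if all files are present.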
 +
 
 +
Review the variables once more and '''double check them again'''
 +
<source lang='bash'>
 +
echo "Machine Name = ${machinename}"
 +
echo "VM Pool = ${vmpool}"
 +
echo "VM Type = ${vmtype}"
 +
echo "Backup date = ${backupdate}"
 +
echo "Disk Image 1 = ${diskimage1}"
 +
echo "Disk Image 2 = ${diskimage2}"
 +
...
 +
</source>
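
In addition to checking the values by eye, a small guard can catch variables that are empty or still contain a placeholder that was never replaced. This is only a sketch: <code>check_var</code> is a hypothetical helper and not part of the stoney cloud tooling.

```bash
#!/bin/bash
# Hypothetical guard: fail if a value is empty or still looks like a
# <PLACEHOLDER> copied from this guide.
# Arguments: a label for the error message, the value to check
check_var() {
    local label="$1" value="$2"
    case "${value}" in
        "")
            echo "ERROR: ${label} is empty" >&2
            return 1
            ;;
        *"<"*|*">"*)
            echo "ERROR: ${label} still contains a placeholder: ${value}" >&2
            return 1
            ;;
    esac
    return 0
}

# Example usage with the variables set above:
#   check_var "machinename" "${machinename}" &&
#   check_var "vmpool"      "${vmpool}"      &&
#   check_var "vmtype"      "${vmtype}"      &&
#   check_var "backupdate"  "${backupdate}"  &&
#   check_var "diskimage1"  "${diskimage1}"  || echo "Do not continue!"
```

The guard only detects obvious mistakes; it cannot tell whether a well-formed value belongs to the wrong machine.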
 +
 
 +
Copy the LDIF file to the retain location:
 +
<source lang='bash'>
 +
currentdate=`date --utc +'%Y%m%dT%H%M%SZ'`
 +
mkdir -p /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}
 +
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${machinename}.ldif.${backupdate} /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/
 +
</source>
 +
 
 +
<!--Check if there is a difference between the current XML file and the one from the backup
 +
<source lang='bash'>
 +
diff -Naur /etc/libvirt/qemu/${machinename}.xml /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.xml.${backupdate}
 +
</source>
 +
and '''edit the file at the retain location''' according to your needs.-->
 +
 
 +
''' Now you are entering the critical part. You won't be able to undo the following steps'''
 +
 
 +
Check if there is a difference between the current LDAP entry and the one from the backup
 +
<source lang='bash'>
 +
domain="<DOMAIN>" # For example domain="stoney-cloud.org"
 +
ldapbase="<LDAPBASE>" # For example ldapbase="dc=stoney-cloud,dc=org"
 +
ldapsearch -H ldaps://ldapm.${domain} -b "sstVirtualMachine=${machinename},ou=virtual machines,ou=virtualization,ou=services,${ldapbase}" -s sub -x -LLL -o ldif-wrap=no -D "cn=Manager,${ldapbase}" -W  "(objectclass=*)" > /tmp/${machinename}.ldif
 +
diff -Naur /tmp/${machinename}.ldif /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}
 +
</source>
 +
and '''edit the file at the retain location''' according to your needs.
 +
 
 +
If there are no differences (or the differences are not important) you can skip the following step. Otherwise use the [https://cloud.stepping-stone.ch/phpldapadmin PhpLdapAdmin] to delete the machine from the LDAP directory (do not forget to delete the dhcp entry <code>dn: cn=<MACHINE-NAME>,ou=virtual machines,cn=192.168.140.0,cn=config-01,ou=dhcp,ou=networks,ou=virtualization,ou=services,dc=stoney-cloud,dc=org</code>). Then add the LDIF (the one you just edited) to the LDAP (first do some general replacement)
 +
<source lang='bash'>
 +
sed -i \
    -e 's/snapshotting/finished/' \
    -e '/member.*/d' \
    /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}
  
== Snapshot ==
+
/usr/bin/ldapadd -H "ldaps://ldapm.${domain}" -x -D "cn=Manager,${ldapbase}" -W -f /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}
# Create a snapshot with state:
+
</source>
#* If the VM <code>vm-001</code> is running:
+
#** Save the state of VM <code>vm-001</code> to the file <code>vm-001.state</code> (This file can either be created on a RAM-Disk or directly in the retain location. This example however saves the file to a RAM-Disk): <syntaxhighlight lang="bash">virsh save vm-001 /path/to/ram-disk/vm-001.state</syntaxhighlight>
+
#** After this command, the VM's CPU and memory state is represented by the file <code>/path/to/ram-disk/vm-001.state</code> and the VM <code>vm-001</code> is shut down.
+
#* If the VM <code>vm-001</code> is shut down:  
+
#** Create a fake state file for the VM: <syntaxhighlight lang="bash">echo "Machine is not running, no state file" > /path/to/ram-disk/vm-001.state</syntaxhighlight>
+
# Move the disk image <code>/path/to/images/vm-001.qcow2</code> to the retain location: <syntaxhighlight lang="bash">mv /path/to/images/vm-001.qcow2 /path/to/retain/vm-001.qcow2</syntaxhighlight>
+
#* '''Please note:''' The retain directory (<code>/path/to/retain/</code>) '''has to be''' on the same partition as the images directory (<code>/path/to/images/</code>). This will make the <code>mv</code> operation very fast (only renaming the inode). So the downtime (remember the VM <code>vm-001</code> is shut down) is as short as possible.
+
# Create the new (empty) disk image with the old as backing store file: <syntaxhighlight lang="bash">qemu-img create -f qcow2 -b /path/to/retain/vm-001.qcow2 /path/to/images/vm-001.qcow2</syntaxhighlight>
+
# Set correct ownership and permission to the newly created image:
+
#* <syntaxhighlight lang="bash">chown root:vm-storage /path/to/images/vm-001.qcow2</syntaxhighlight>
+
#* <syntaxhighlight lang="bash">chmod 660 /path/to/images/vm-001.qcow2</syntaxhighlight>
+
# Save the VM's XML description
+
#* Save the current XML description of VM <code>vm-001</code> to a file at the retain location: <syntaxhighlight lang="bash">virsh dumpxml vm-001 > /path/to/retain/vm-001.xml</syntaxhighlight>
+
# Save the backend entry
+
#* There is no generic command to save the backend entry (since the command depends on the backend). Important here is, that the backend entry of the VM <code>vm-001</code> is saved to the retain location: <code>/path/to/retain/vm-001.backend</code>
+
# Restore the VM <code>vm-001</code> from its saved state (this will also start the VM): <syntaxhighlight lang="bash">virsh restore /path/to/ram-disk/vm-001.state</syntaxhighlight>
+
#* '''Please note:''' After this operation the VM <code>vm-001</code> is running again (continues where we stopped it), and we have a consistent backup for the VM <code>vm-001</code>:
+
#** The file <code>/path/to/ram-disk/vm-001.state</code> contains the CPU and memory state of VM <code>vm-001</code> at time T<sub>1</sub>
+
#** The file <code>/path/to/retain/vm-001.qcow2</code> contains the disk state of VM <code>vm-001</code> at time T<sub>1</sub>
+
#** The file <code>/path/to/retain/vm-001.xml</code> contains the XML description of VM <code>vm-001</code> at time T<sub>1</sub>
+
#** The file <code>/path/to/retain/vm-001.backend</code> contains the backend entry of VM <code>vm-001</code> at time T<sub>1</sub>
+
# Move the state file from the RAM-Disk to the retain location (if you used the RAM-Disk to save the VM's state)
+
#* <syntaxhighlight lang="bash">mv /path/to/ram-disk/vm-001.state /path/to/retain/vm-001.state</syntaxhighlight>
+
  
 +
Undefine the machine
 +
<source lang='bash'>
 +
virsh undefine ${machinename}
 +
</source>
  
 +
Copy all the disk images from the backup location back to their original location
 +
<source lang='bash'>
 +
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${diskimage1}.${backupdate} /var/virtualization/${vmtype}/${vmpool}/${diskimage1}
 +
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${diskimage2}.${backupdate} /var/virtualization/${vmtype}/${vmpool}/${diskimage2}
 +
...
 +
</source>
  
See also: [[stoney_conductor:_prov_backup_kvm#Snapshot | Snapshot workflow ]]
+
And restore the domain from the state file from the backup location with the XML from the retain location (the one you might have edited)
 +
<source lang='bash'>
 +
virsh restore /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${machinename}.state.${backupdate}
 +
</source>
  
== Merge ==
+
Now the machine should be up and running again. Continuing where it was stopped when taking the backup.
# Merge the disk images my-vm.qcow2 and my-vm-snap.qcow2 to a single image: <code>virsh qemu-monitor-command my-vm --hmp "block_stream drive-virtio-disk0"</code>.
+
  
== Retain ==
+
If everything is OK, you can cleanup the created files and directories
# Move the files to the backup location:
+
<source lang='bash'>
#* Move the old disk image to the backup location and add the date as suffix to not overwrite older backups: <code>mv my-vm-backup.qcow2 /path/to/backup/my-vm-backup.qcow2.date</code>
+
rm -rf /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}
#* Move the state file to the backup location and add the date as suffix to not overwrite older backups: <code>mv my-vm.state /path/to/backup/my-vm.state.date</code>
+
rm /tmp/${machinename}.ldif
 +
</source>
  
= State of the art =
+
== Next steps ==
  
= Next steps =
 
  
 
[[Category: stoney conductor]]
 
[[Category: stoney conductor]]

Latest revision as of 15:43, 27 June 2014

Overview

This page describes how the VMs and VM-Templates are backed-up and restored inside the stoney cloud.

Requirements

Backup

Basic idea

The main idea to backup a VM or a VM-Template is, to divide the task into three subtasks:

  • createSnapshot: Create a disk only snapshot. A new overlay file is created, all write operations are performed to this file. The underlying disk-image is now read only.
  • exportSnapshot: Copy the read only disk-image to the backup location.
  • commitSnapshot: Commit the performed write operations from the overlay back to the underlying (original) disk image. Now the underlying image is read-write again and the overlay image can be deleted.

A more detailed and technical description for these three sub-processes can be found here.

Furthermore there is a control instance, which can independently call these three sub-processes for a given machine. This way, the stoney cloud is able to handle different cases:

Backup a single machine

The procedure for backing up a single machine is very simple. Just call the three sub-processes (create-, export- and commitSnapshot) one after the other. So the control instance would do some very basic stuff:

object machine = args[0];
 
if( createSnapshot( machine ) )
{
 
    if ( exportSnapshot( machine ) )
    {
 
        if ( commitSnapshot( machine ) )
        {
            printf("Successfully backed up machine %s\n", machine);
 
        } else
        {
            printf("Error while committing snapshot for machine %s: %s\n", machine, error);
        }
 
    } else
    {
        printf("Error while exporting snapshot for machine %s: %s\n", machine, error);
    }
 
} else
{
    printf("Error while snapshotting machine %s: %s\n", machine, error);
}

Backup multiple machines at the same time

When backing up multiple machines at the same time, we need to make sure that the snapshots for the machines are as close together as possible. Therefore the control instance should call first the createSnapshot process for all machines. After every machine has been snapshotted, the control instance can call the exportSnapshot and commitSnapshot process for every machine. The most important part here is, that the control instance somehow remembers, if the snapshot for a given machine was successful or not. Because if the snapshot failed, it must not call the exportSnapshot and commitSnapshot process. So the control instance needs a little bit more logic:

object machines[] = args[0];
object successful_snapshots[];
 
# Snapshot all machines
for( int i = 0; i <  sizeof(machines) / sizeof(object) ; i++ )
{
    # If the snapshot was successful, put the machine into the 
    # successful_snapshots array
    if ( createSnapshot( machines[i] ) )
    {
        successful_snapshots[i] = machines[i];
    } else
    {
        printf("Error while snapshotting machine %s: %s\n", machines[i],error);
    }
}
 
# export and commit all successful_snapshot machines
for ( int i = 0; i <  sizeof(successful_snapshots) / sizeof(object) ; i++ )
{
    # Check if the element at this position is not null, then the snapshot 
    # for this machine was successful
    if ( successful_snapshots[i] )
    {
        if ( exportSnapshot( successful_snapshots[i] ) )
        {
            if ( commitSnapshot( successful_snapshots[i] ) )
            {
              printf("Successfully backed-up machine %s\n", successful_snapshots[i]);
            } else
            {
                printf("Error while committing snapshot for machine %s: %s\n", successful_snapshots[i],error);
            }
 
        } else
        {
            printf("Error while exporting snapshot for machine %s: %s\n", successful_snapshots[i],error);
        }
    }
}
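
The two-phase logic above can also be sketched in shell. This is an illustration under stated assumptions, not the actual control instance: <code>create_snapshot</code>, <code>export_snapshot</code> and <code>commit_snapshot</code> are hypothetical stubs standing in for the real sub-processes, and machine names are assumed to contain no whitespace (for example UUIDs).

```bash
#!/bin/bash
# Hypothetical stubs: replace them with the real sub-processes.
# Each must return 0 on success and non-zero on failure.
create_snapshot() { echo "createSnapshot $1"; }
export_snapshot() { echo "exportSnapshot $1"; }
commit_snapshot() { echo "commitSnapshot $1"; }

# Back up all machines given as arguments in two phases.
# Assumes machine names contain no whitespace.
backup_machines() {
    local machine snapshotted=""

    # Phase 1: snapshot every machine first, so that the snapshots are
    # as close together in time as possible.
    for machine in "$@"; do
        if create_snapshot "${machine}"; then
            snapshotted="${snapshotted} ${machine}"
        else
            echo "Error while snapshotting machine ${machine}" >&2
        fi
    done

    # Phase 2: export and commit only the machines whose snapshot succeeded.
    for machine in ${snapshotted}; do
        if ! export_snapshot "${machine}"; then
            echo "Error while exporting snapshot for machine ${machine}" >&2
        elif ! commit_snapshot "${machine}"; then
            echo "Error while committing snapshot for machine ${machine}" >&2
        else
            echo "Successfully backed up machine ${machine}"
        fi
    done
}
```

Remembering only the successful machines in a separate list keeps the second loop free of any check for failed snapshots.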

Sub-Processes

See also Libvirt_external_snapshot_with_GlusterFS

createSnapshot

For the commands see Libvirt_external_snapshot_with_GlusterFS#Part_2:_Create_the_snapshot_using_virsh

For the workflow see stoney_conductor:_prov-backup-kvm#createSnapshot

exportSnapshot

  1. Simply copy the underlying image to the backup location
    • cp -p /<path>/<to>/<image>.qcow2 /<path>/<to>/<backup>/<location>/.

For the workflow see stoney_conductor:_prov-backup-kvm#exportSnapshot

commitSnapshot

For the commands see Libvirt_external_snapshot_with_GlusterFS#Cleanup.2FCommit_.28Online.29

For the workflow see stoney_conductor:_prov-backup-kvm#commitSnapshot

Communication through backend

Since the stoney cloud is (as the name already says) a cloud solution, it makes sense to have a backend (in our case openLDAP) involved in the whole process. This makes it possible to run the backup jobs decentralized on every vm-node. The control instance can then modify the backend, and these changes are seen by the different backup daemons on the vm-nodes. The communication could look as shown in the following picture (Figure 1):

Figure 1: Communication between the control instance and the prov-backup-kvm daemon through the LDAP backend

You can modify/update this workflow by editing File:Daemon-communication.xmi (you may need Umbrello UML Modeller diagram programme for KDE to display the content properly).

Control-Instance Daemon Interaction for creating a Backup with LDIF Examples

The step numbers correspond with the graphical overview from above.

Step 00: Backup Configuration for a virtual machine

# The following backup configuration says, that the backup should be done daily, at 03:00 hours (localtime).
# * * * * * command to be executed
# - - - - -
# | | | | |
# | | | | +----- day of week (0 - 6) (Sunday=0)
# | | | +------- month (1 - 12)
# | | +--------- day of month (1 - 31)
# | +----------- hour (0 - 23)
# +------------- min (0 - 59)
# localtime in the crontab entry
dn: ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: organizationalUnit
objectclass: sstVirtualizationBackupObjectClass
objectclass: sstCronObjectClass
ou: backup
description: This sub tree contains the backup plan for the virtual machine kvm-005.
sstCronMinute: 0
sstCronHour: 3
sstCronDay: *
sstCronMonth: *
sstCronDayOfWeek: *
sstCronActive: TRUE
sstBackupRootDirectory: file:///var/backup/virtualization
sstBackupRetainDirectory: file:///var/virtualization/retain
sstBackupRamDiskLocation: file:///mnt/ramdisk-test
sstVirtualizationDiskImageFormat: qcow2
sstVirtualizationDiskImageOwner: root
sstVirtualizationDiskImageGroup: vm-storage
sstVirtualizationDiskImagePermission: 0660
sstBackupNumberOfIterations: 1
sstVirtualizationVirtualMachineForceStart: FALSE
sstVirtualizationBandwidthMerge: 0
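
To illustrate how the sstCron* attributes map to the five crontab time fields, the following sketch extracts them from an LDIF on standard input. The function name <code>cron_fields_from_ldif</code> is a hypothetical helper, and it assumes unwrapped <code>attribute: value</code> lines as in the example above.

```bash
#!/bin/bash
# Hypothetical helper: read an LDIF entry on standard input and print the
# five crontab time fields (minute, hour, day of month, month, day of week).
cron_fields_from_ldif() {
    awk -F': ' '
        $1 == "sstCronMinute"    { minute = $2 }
        $1 == "sstCronHour"      { hour   = $2 }
        $1 == "sstCronDay"       { day    = $2 }
        $1 == "sstCronMonth"     { month  = $2 }
        $1 == "sstCronDayOfWeek" { dow    = $2 }
        END { print minute, hour, day, month, dow }
    '
}

# The schedule from the backup configuration above: daily at 03:00 (localtime)
printf 'sstCronMinute: 0\nsstCronHour: 3\nsstCronDay: *\nsstCronMonth: *\nsstCronDayOfWeek: *\n' |
    cron_fields_from_ldif
```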

Step 01: Initialize Backup Sub Tree (Control instance daemon)

The sub tree ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch reflects the time when the backup is planned, in the form [YYYY][MM][DD]T[hh][mm][ss]Z (ISO 8601). It should be written at the time when the backup is planned and should be executed. The section 20121002T010000Z means the following:

  • Year: 2012
  • Month: 10
  • Day of Month: 02
  • Hour of Day: 01
  • Minutes: 00
  • Seconds: 00

Please be aware that the time is to be written in UTC (see also the comment in the LDIF example below).

# This entry is the place holder for the backup, which is to be executed at 03:00 hours (localtime with daylight-saving). This
# leads to the 20121002T010000Z timestamp (which is written in UTC).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: sstProvisioning
objectclass: organizationalUnit
ou: 20121002T010000Z
sstProvisioningExecutionDate: 0
sstProvisioningMode: initialize
sstProvisioningReturnValue: 0
sstProvisioningState: 20121002T014513Z
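
The timestamp format described above can be produced and taken apart in shell as sketched below. Both function names are hypothetical helpers chosen for this illustration.

```bash
#!/bin/bash
# Print the current time in the format used for the backup sub tree (UTC).
backup_timestamp() {
    date -u +'%Y%m%dT%H%M%SZ'
}

# Hypothetical helper: turn a timestamp such as 20121002T010000Z into a
# readable form. Prints nothing if the argument does not match the format.
split_timestamp() {
    printf '%s\n' "$1" | sed -n -E \
        's/^([0-9]{4})([0-9]{2})([0-9]{2})T([0-9]{2})([0-9]{2})([0-9]{2})Z$/\1-\2-\3 \4:\5:\6 UTC/p'
}

split_timestamp "20121002T010000Z"
```

For the example timestamp, <code>split_timestamp</code> prints <code>2012-10-02 01:00:00 UTC</code>.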

Step 02: Finalize the Initialization (Control instance daemon)

# The attribute sstProvisioningState is updated with current time by the fc-brokerd, when sstProvisioningMode is modified.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010001Z
-
replace: sstProvisioningMode
sstProvisioningMode: initialized

Step 03: Start the Snapshot Process (Control instance daemon)

With the setting of the sstProvisioningMode to snapshot, the actual backup process is kicked off by the Control instance daemon.

# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# snapshot (this way the Provisioning-Backup-KVM daemon knows, that it must start the snapshotting process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: snapshot
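
Every command the Control instance daemon hands over (snapshot, export, commit) uses the same LDIF pattern: the state is reset to zero and the mode is set to the command. A minimal sketch that generates such an LDIF is shown below; <code>command_ldif</code> is a hypothetical helper, not part of stoney conductor.

```bash
#!/bin/bash
# Hypothetical helper: print the LDIF with which the Control instance daemon
# hands a command to the Provisioning-Backup-KVM daemon.
# Arguments: DN of the backup sub tree, new sstProvisioningMode
command_ldif() {
    local dn="$1" mode="$2"
    cat <<EOF
dn: ${dn}
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: ${mode}
EOF
}

command_ldif "ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch" "snapshot"
```

The output could then be piped into ldapmodify, which reads change records from standard input when no file is given; the server URI and bind DN depend on the installation.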

Step 04: Starting the Snapshot Process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon receives the snapshot command, it sets the sstProvisioningMode to snapshotting to tell the Control instance daemon and other interested parties, that it is snapshotting the virtual machine or virtual machine template.

# The attribute sstProvisioningMode is set to snapshotting by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: snapshotting

Step 05: Finalizing the Snapshot Process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon has executed the snapshot command, it sets the sstProvisioningMode to snapshotted, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties, that the snapshot of the virtual machine or virtual machine template is finished.

# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows, that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010011Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: snapshotted

Step 06: Start the export Process (Control instance daemon)

With the setting of the sstProvisioningMode to export, the Control instance daemon tells the Provisioning-Backup-KVM daemon to export the disk image to the backup location.

# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# export (this way the Provisioning-Backup-KVM daemon knows, that it must start the export process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: export

Step 07: Starting the export Process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon receives the export command, it sets the sstProvisioningMode to exporting to tell the Control instance daemon and other interested parties, that it is exporting the virtual machine or virtual machine template disk images.

# The attribute sstProvisioningMode is set to exporting by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: exporting

Step 08: Finalizing the export Process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon has executed the export command, it sets the sstProvisioningMode to exported, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties, that the export of the virtual machine or virtual machine template disk-images is finished.

# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows, that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010500Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: exported

Step 09: Start the commit Process (Control instance daemon)

With the setting of the sstProvisioningMode to commit, the Control instance daemon tells the Provisioning-Backup-KVM daemon to commit the changes from the overlay file to the underlying disk-image.

# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# commit (this way the Provisioning-Backup-KVM daemon knows, that it must start the commit process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: commit

Step 10: Starting the commit Process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon receives the commit command, it sets the sstProvisioningMode to committing to tell the Control instance daemon and other interested parties, that it is committing changes from the overlay disk-images back to the underlying ones.

# The attribute sstProvisioningMode is set to committing by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: committing

Step 11: Finalizing the commit Process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon has executed the commit command, it sets the sstProvisioningMode to committed, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties, that the committing of the changes from the overlay disk-images back to the underlying ones is done.

# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-VKM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows, that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: committed

Step 12: Finalizing the Backup Process (Control instance daemon)

As soon as the Control instance daemon notices that the attribute sstProvisioningMode is set to committed, it sets the sstProvisioningMode to finished and the sstProvisioningState to the current timestamp (UTC). All interested parties now know that the backup process is finished, and therefore a new backup process can be started.

# The attribute sstProvisioningState is updated with current time by the fc-brokerd, when sstProvisioningMode is
# set to finished.
# All interested parties now know that the backup process is finished, and therefore a new backup process can be started.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012001Z
-
replace: sstProvisioningMode
sstProvisioningMode: finished
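
The commit handshake in steps 09 to 12 can be read as a small state machine over sstProvisioningMode. The following is a minimal sketch and not part of the stoney conductor code: the function name next_commit_mode is hypothetical, and only the mode values and the acting daemons are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper: given the current sstProvisioningMode of the commit
# phase, print the mode that follows it and who is expected to set it.
next_commit_mode() {
    case "$1" in
        exported)   echo "commit (Control instance daemon)" ;;
        commit)     echo "committing (Provisioning-Backup-KVM daemon)" ;;
        committing) echo "committed (Provisioning-Backup-KVM daemon)" ;;
        committed)  echo "finished (Control instance daemon)" ;;
        finished)   echo "none (backup process is complete)" ;;
        *)          echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Walk through the whole commit phase, starting right after the export.
mode="exported"
while [ "$mode" != "none" ]; do
    next=$(next_commit_mode "$mode") || exit 1
    echo "$mode -> $next"
    mode=${next%% *}    # keep only the first word, i.e. the next mode
done
```

Each daemon only ever reacts to the modes written by the other side, which is why a mode that is not in the table is treated as an error rather than silently skipped.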

Current Implementation (Backup)

Since we do not have a working control instance, we need to have a workaround for backing up the machines:

  • We already have a BackupKVMWrapper.pl script (File-Backend) which executes the three sub-processes in the correct order for a given list of machines (see Backup multiple machines at the same time).
  • We already have the implementation for the whole backup with the LDAP-Backend (see stoney conductor: prov backup kvm).
  • We can now combine these two existing scripts and create a wrapper (let's call it LDAPKVMWrapper) that adds some logic to BackupKVMWrapper.pl: the LDAPKVMWrapper generates the list of machines which need a backup.

The behaviour on our servers is as follows (cf. Figure 2):

  1. The (decentralized) LDAPKVMWrapper (which is executed every day via cronjob) generates a list of all machines running on the current host.
    • Currently the cronjob on the hosts looks like: 00 01 * * * /usr/bin/LDAPKVMWrapper.pl | logger -t Backup-KVM
    • For each of these machines:
      • Check if the machine is excluded from the backup; if yes, remove the machine from the list
      • Check if the last backup was successful; if not, remove the machine from the list
  2. Update the backup subtree for each machine in the list
    • Remove the old backup leaf (the "yesterday-leaf"), and add a new one (the "today-leaf")
    • After this step, the machines are ready to be backed up
  3. Call the KVMBackupWrapper.pl script with the machines list as a parameter
  4. Wait for the KVMBackupWrapper.pl script to finish
  5. Go again through all machines and update the backup subtree a last time
    • Check if the backup was successful, if yes, set sstProvisioningMode = finished (see also TBD)
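
The filtering in step 1 can be sketched as follows. This is an illustration only and not the LDAPKVMWrapper code: the function select_machines and its line-based input format are hypothetical, and it assumes that a last backup counts as successful when its sstProvisioningMode is finished, and that a machine without any previous backup is eligible.

```shell
#!/bin/sh
# Hypothetical input format, one machine per line:
#   <machine-uuid> <excluded: TRUE|FALSE> <mode of the last backup, or "none">
# Assumption: a last backup counts as successful when its
# sstProvisioningMode is "finished"; "none" means no backup exists yet.
select_machines() {
    while read -r uuid excluded last_mode; do
        [ -n "$uuid" ] || continue              # skip empty lines
        [ "$excluded" = "TRUE" ] && continue    # sstbackupexcludefrombackup is set
        case "$last_mode" in
            finished|none) echo "$uuid" ;;      # eligible for today's run
            *) echo "skipping $uuid: last backup ended in mode '$last_mode'" >&2 ;;
        esac
    done
}

# Sample run: only vm-a and vm-d are printed on standard output.
printf '%s\n' \
    "vm-a FALSE finished" \
    "vm-b TRUE finished" \
    "vm-c FALSE committing" \
    "vm-d FALSE none" | select_machines
```

Skipped machines are reported on standard error, so the list on standard output can be handed to the backup wrapper unchanged.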


Figure 2: How the two wrappers interact with the LDAP backend

You can modify/update this workflow by editing File:wrapper-interaction.xmi (you may need Umbrello UML Modeller diagram programme for KDE to display the content properly).

  • If for some reason something does not work at all, the whole backup process can be deactivated by simply disabling the LDAPKVMWrapper cronjob
    • crontab -e
    • Comment the LDAPKVMWrapper cronjob line: #00 01 * * * /usr/bin/LDAPKVMWrapper.pl | logger -t Backup-KVM
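
If you prefer not to edit the crontab by hand, the same change can be sketched as a filter. The function name disable_backup_cron is hypothetical; the sketch only comments out active lines that call LDAPKVMWrapper.pl and prints the result, so nothing is installed unless you pipe it into crontab yourself.

```shell
#!/bin/sh
# Comment out every active crontab line that calls LDAPKVMWrapper.pl.
# Lines that are already commented are left untouched.
disable_backup_cron() {
    sed -e '/^[[:space:]]*#/!s|^\(.*LDAPKVMWrapper\.pl.*\)$|#\1|'
}

# Preview on a sample crontab; nothing is installed here.
printf '%s\n' \
    "00 01 * * * /usr/bin/LDAPKVMWrapper.pl | logger -t Backup-KVM" \
    "30 02 * * * /usr/bin/some-other-job" | disable_backup_cron

# To apply it to the real crontab of the current user (review the preview first):
#   crontab -l | disable_backup_cron | crontab -
```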

How to exclude a machine from the backup

To exclude a machine from the backup run, log in to one of the vm-nodes and add the following entry to your LDAP directory:

machineuuid="<UUID OF THE MACHINE-NAME>" # e.g.: b9d13dbc-9ab7-4948-9daa-a5709de83dc2
cat << EOF | ldapadd -D cn=Manager,o=stepping-stone,c=ch -H ldaps://ldapm.stepping-stone.ch/ -W -x
dn: ou=backup,sstVirtualMachine=${machineuuid},ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: organizationalUnit
objectclass: sstVirtualizationBackupObjectClass
ou: backup
sstbackupexcludefrombackup: TRUE
EOF

If the backup subtree in the LDAP directory already exists, you need to add the sstbackupexcludefrombackup attribute:

machineuuid="<UUID OF THE MACHINE-NAME>" # e.g.: b9d13dbc-9ab7-4948-9daa-a5709de83dc2
cat << EOF | ldapadd -D cn=Manager,o=stepping-stone,c=ch -H ldaps://ldapm.stepping-stone.ch/ -W -x
dn: ou=backup,sstVirtualMachine=${machineuuid},ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
add: objectClass
objectClass: sstVirtualizationBackupObjectClass
-
add: sstbackupexcludefrombackup
sstbackupexcludefrombackup: TRUE
EOF

Re-include the machine in the backup

If you want to re-include a machine, simply delete the machine's entire backup subtree. It will be recreated during the next backup run.
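
Deleting the subtree could look like the sketch below. It is an assumption-laden example: the helper backup_dn is hypothetical, the bind DN and server URL are simply copied from the exclusion examples above, and the command is only printed so that it can be reviewed before it is run. ldapdelete -r removes an entry together with all of its children.

```shell
#!/bin/sh
# Build the DN of a machine's backup subtree. The LDAP base is the one used
# in the examples on this page; adjust it to your installation.
backup_dn() {
    echo "ou=backup,sstVirtualMachine=$1,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch"
}

machineuuid="b9d13dbc-9ab7-4948-9daa-a5709de83dc2"   # example UUID from above

# Print the command instead of running it, so it can be reviewed first.
echo ldapdelete -r -x -W -D "cn=Manager,o=stepping-stone,c=ch" \
    -H "ldaps://ldapm.stepping-stone.ch/" "\"$(backup_dn "$machineuuid")\""
```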

Next steps

Restore

Attention: The restore process is neither defined nor implemented yet. The following documentation describes the old restore process.

Basic idea

The restore process, similar to the backup process, can be divided into three sub-processes:

  • Unretain the small files: Copy the small files (backend entry, XML description) from the backup directory to the retain directory
  • Unretain the big files: Copy the big files (state file, disk image(s)) from the backup directory to the retain directory
  • Restore the machine: Replace the live disk image(s) by the one(s) from the backup and restore the machine from the state file

Additionally the restore process can also be divided into two phases:

  • User-Interaction phase: After the "unretain small files" step the user needs to decide two things:
    • If there are conflicts between the backend entry file and the XML description, the user needs to decide how to resolve them
    • The user can also abort the restore process up to this point. After that the restore cannot be aborted or undone!
  • Non-User-Interaction phase: The daemons communicate with each other through the backend and the restore process continues without further user input (cf. Communication through backend)

Sub Processes

Unretain small files

This workflow assumes that the backup directory is on the same physical server as the retain directory (protocol is file://)

  1. Copy the backend-entry file from the backup directory to the retain directory:
    • cp -p /path/to/backup/vm-001.backend /path/to/retain/vm-001.backend
  2. Copy the XML description from the backup directory to the retain directory:
    • cp -p /path/to/backup/vm-001.xml /path/to/retain/vm-001.xml
  3. Compare the backend-entry file (the one in the retain directory) with the live-backend entry
    • Resolve all conflicts between these two backend entries
      • Modify the backend entry at the retain location accordingly
  4. Apply the same changes for the XML description at the retain location (backend entry and XML description need to be consistent).

Unretain large files

  1. Copy the state file from the backup directory to the retain directory:
    • cp -p /path/to/backup/vm-001.state /path/to/retain/vm-001.state
  2. Copy the disk image(s) from the backup directory to the retain directory:
    • cp -p /path/to/backup/vm-001.qcow2 /path/to/retain/vm-001.qcow2
      • Important: If a VM has more than just one disk image, repeat this step for every disk image

Restore the VM

  1. Shut down the VM if it is running:
    • virsh shutdown vm-001
  2. Undefine the VM if it is still defined:
    • virsh undefine vm-001
  3. Overwrite the original disk image:
    • mv /path/to/retain/vm-001.qcow2 /path/to/images/vm-001.qcow2
      • Important: If a VM has more than just one disk image, repeat this step for every disk image
  4. Restore the VM's backend entry:
    • Write the backend entry from the retain location (/path/to/retain/vm-001.backend) to the backend
  5. Overwrite the VM's XML description with the one from the retain location:
    • cp -p /path/to/retain/vm-001.xml /path/to/xmls/vm-001.xml
  6. Restore the VM from the state file with the corrected XML
    • virsh restore /path/to/retain/vm-001.state --xml /path/to/xmls/vm-001.xml
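
The steps above can be strung together as in the following dry-run sketch. It is not the Provisioning-Backup-KVM implementation: the functions run and restore_vm are hypothetical, the paths are the placeholder paths from the list, and step 4 is left out because it depends on the backend. Note that virsh shutdown only requests a shutdown and returns before the guest is actually off, so a real script would have to wait for the domain to stop before continuing.

```shell
#!/bin/sh
# Dry-run sketch of "Restore the VM". All paths are the placeholder paths
# from the steps above; run() only prints unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

restore_vm() {
    vm=$1; shift                      # remaining arguments: disk image names
    run virsh shutdown "$vm"
    run virsh undefine "$vm"
    for image in "$@"; do             # repeat step 3 for every disk image
        run mv "/path/to/retain/$image" "/path/to/images/$image"
    done
    # Step 4 (writing the backend entry) depends on the backend and is omitted.
    run cp -p "/path/to/retain/$vm.xml" "/path/to/xmls/$vm.xml"
    run virsh restore "/path/to/retain/$vm.state" --xml "/path/to/xmls/$vm.xml"
}

# Example with two disk images; the second image name is made up.
restore_vm vm-001 vm-001.qcow2 vm-001-data.qcow2
```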

Communication through backend

The actual KVM-Restore process is controlled completely by the Control instance daemon via the OpenLDAP directory. See OpenLDAP Directory Integration for the involved attributes and possible values.

Figure 3: Communication between all involved parties during the restore process

You can modify/update these interactions by editing File:Restore-Interaction.xmi (you may need Umbrello UML Modeller diagram programme for KDE to display the content properly).

Control instance Daemon Interaction for restoring a Backup with LDIF Examples

Step 01: Start the unretainSmallFiles process (Control instance daemon)

The first step of the restore process is to copy the small files (in this case the XML file and the LDIF) from the configured backup location to the configured retain location.

# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to 
# unretainSmallFiles (this way the Provisioning-Backup-KVM daemon knows, that it must start the unretainSmallFiles process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainSmallFiles

Step 02: Starting the unretainSmallFiles process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon receives the command to unretain the small files, it sets the sstProvisioningMode to unretainingSmallFiles to tell the Control instance daemon and other interested parties, that it is unretaining the small files for the virtual machine or virtual machine template.

# The attribute sstProvisioningMode is set to unretainingSmallFiles by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: unretainingSmallFiles

Step 03: Finalizing the unretainSmallFiles process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon has executed the commands to unretain the small files, it sets the sstProvisioningMode to unretainedSmallFiles, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties, that the unretaining of all the small files from the configured backup location is finished.

# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows, that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainedSmallFiles

Step 05: Start the unretainLargeFiles process (Control instance daemon)

Next step in the restore process is to copy the large files (state file and disk images) from the configured backup directory to the configured retain directory.

# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to 
# unretainLargeFiles (this way the Provisioning-Backup-KVM daemon knows, that it must start the unretainLargeFiles process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainLargeFiles

Step 06: Starting the unretainLargeFiles process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon receives the command to unretain the large files, it sets the sstProvisioningMode to unretainingLargeFiles to tell the Control instance daemon and other interested parties, that it is unretaining the large files for the virtual machine or virtual machine template.

In the meantime the vm-manager merges the LDIF we have unretained in step 02 with the one in the live directory to sort out possible differences in the configuration of the virtual machine.

# The attribute sstProvisioningMode is set to unretainingLargeFiles by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: unretainingLargeFiles

Step 07: Finalizing the unretainLargeFiles process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon has executed the commands to unretain the large files, it sets the sstProvisioningMode to unretainedLargeFiles, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties, that the unretaining of all the large files from the configured backup location is finished.

# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows, that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainedLargeFiles

Step 09: Start the restore process (Control instance daemon)

Since we now have all necessary files in the configured retain location, the restore process can be started. Here we simply copy the disk images back to their original location and restore the VM from the state file (which is also at the configured retain location).

# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to 
# restore (this way the Provisioning-Backup-KVM daemon knows, that it must start the restore process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: restore

Step 10: Starting the restore process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon receives the restore command, it sets the sstProvisioningMode to restoring to tell the Control instance daemon and other interested parties, that it is restoring the virtual machine or virtual machine template.

# The attribute sstProvisioningMode is set to restoring by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: restoring

Step 11: Finalizing the restore process (Provisioning-Backup-KVM daemon)

As soon as the Provisioning-Backup-KVM daemon has executed the restore command, it sets the sstProvisioningMode to restored, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties, that the restore process is finished.

# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows, that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: restored

Step 12: Finalizing the restore process (Control instance daemon)

As soon as the Control instance daemon notices that the attribute sstProvisioningMode is set to restored, it sets the sstProvisioningMode to finished and the sstProvisioningState to the current timestamp (UTC). All interested parties now know that the restore process is finished.

# The attribute sstProvisioningState is updated with current time by the Control instance daemon, when sstProvisioningMode is
# set to finished.
# All interested parties now know, that the restore process is finished.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012001Z
-
replace: sstProvisioningMode
sstProvisioningMode: finished
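
The three "finalizing" steps (03, 07 and 11) all write the same three attributes and differ only in the mode value. A minimal sketch of that pattern, assuming a hypothetical helper report_done_ldif that is not part of the daemon; only the attribute names, the mode values and the timestamp format are taken from the examples above:

```shell
#!/bin/sh
# Print the LDIF a daemon would write to report a completed restore phase.
# Arguments: <backup leaf DN> <completed mode> [timestamp]
# Hypothetical helper; without a timestamp the current UTC time is used.
report_done_ldif() {
    dn=$1; mode=$2
    stamp=${3:-$(date -u +'%Y%m%dT%H%M%SZ')}
    case "$mode" in
        unretainedSmallFiles|unretainedLargeFiles|restored) ;;
        *) echo "not a completed restore mode: $mode" >&2; return 1 ;;
    esac
    cat <<EOF
dn: $dn
changetype: modify
replace: sstProvisioningState
sstProvisioningState: $stamp
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: $mode
EOF
}

# Reproduces the LDIF of step 11 for the example machine kvm-005.
report_done_ldif \
    "ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch" \
    restored 20121002T012000Z
```

The output could be piped into ldapmodify; the sketch deliberately stops at printing the LDIF so that it can be inspected first.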

Current Implementation (Restore)

Attention: The restore process is neither defined nor implemented yet. The following documentation describes the old restore process.


How to manually restore a machine from backup

Important: Before you continue with this guide, make sure that you have no other possibility to restore the machine. It might be easier and safer to get lost files from the online backup if the machine has one set up.

If you really have to restore the machine from the backup:

  1. Stop the machine via the web interface
  2. Log in (as root) on the VM-Node the machine was running on

As a first step, set some useful bash variables so that you can copy and paste the commands in this guide:

Double check all variables you are setting here. If one is not correct, you might restore a running machine or overwrite a live disk image!

machinename="<MACHINE-NAME>" # For example: machinename="b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6"
vmpool="<VM-POOL>" # For example vmpool="0f83f084-8080-413e-b558-b678e504836e"
vmtype="<VM-TYPE>" # For example vmtype="vm-persistent"

Change to the backup directory for the given machine and check the iterations:

cd /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}
ls -al

Change into the most recent iteration

cd 2014...
ls -al

In there you should have:

  • The state file <MACHINE-NAME>.state.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.state.20140109T134445Z)
  • The XML description <MACHINE-NAME>.xml.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.xml.20140109T134445Z)
  • The ldif file <MACHINE-NAME>.ldif.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.ldif.20140109T134445Z)
  • And at least one disk image <DISK-IMAGE>.qcow2.<BACKUP-DATE> (for example 8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2.20140109T134445Z)

Now you should save the backup date and the disk image(s) in a variable

backupdate="<BACKUP-DATE>" # For example: backupdate="20140109T134445Z"
diskimage1="<DISK-IMAGE-1>.qcow2" # For example: diskimage1="8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2"
diskimage2="<DISK-IMAGE-2>.qcow2" # For example: diskimage2="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.qcow2"
...

Have another look at the variables and double check them:

echo "Machine Name = ${machinename}"
echo "VM Pool = ${vmpool}"
echo "VM Type = ${vmtype}"
echo "Backup date = ${backupdate}"
echo "Disk Image 1 = ${diskimage1}"
echo "Disk Image 2 = ${diskimage2}"
...
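
Since a wrong variable can destroy a live disk image, an automated check in addition to the visual one may help. The following is a sketch under stated assumptions: the functions check_var and check_restore_vars are hypothetical, and the check only catches empty values, unreplaced <PLACEHOLDER> values and a missing iteration directory below /var/backup/virtualization.

```shell
#!/bin/sh
# Hypothetical pre-flight check: refuse to continue when a variable is empty
# or still contains an unreplaced <PLACEHOLDER>.
check_var() {
    name=$1; value=$2
    case "$value" in
        ""|*"<"*|*">"*) echo "ERROR: $name is not set correctly: '$value'" >&2; return 1 ;;
    esac
}

# Checks the variables of this guide; the last test makes sure that the
# iteration directory exists on the backup partition.
check_restore_vars() {
    check_var machinename "$machinename" &&
    check_var vmpool "$vmpool" &&
    check_var vmtype "$vmtype" &&
    check_var backupdate "$backupdate" &&
    check_var diskimage1 "$diskimage1" &&
    [ -d "/var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}" ]
}

check_restore_vars || echo "Do not continue: fix the variables first."
```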

Copy the LDIF file to the retain location:

currentdate=$(date --utc +'%Y%m%dT%H%M%SZ')
mkdir -p /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${machinename}.ldif.${backupdate} /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/


Now you are entering the critical part. You won't be able to undo the following steps.

Check if there is a difference between the current LDAP entry and the one from the backup:

domain="<DOMAIN>" # For example domain="stoney-cloud.org"
ldapbase="<LDAPBASE>" # For example ldapbase="dc=stoney-cloud,dc=org"
ldapsearch -H ldaps://ldapm.${domain} -b "sstVirtualMachine=${machinename},ou=virtual machines,ou=virtualization,ou=services,${ldapbase}" -s sub -x -LLL -o ldif-wrap=no -D "cn=Manager,${ldapbase}" -W  "(objectclass=*)" > /tmp/${machinename}.ldif
diff -Naur /tmp/${machinename}.ldif /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}

Then edit the file at the retain location according to your needs.

If there are no differences (or the differences are not important), you can skip the following step. Otherwise use phpLDAPadmin to delete the machine from the LDAP directory (do not forget to delete the DHCP entry dn: cn=<MACHINE-NAME>,ou=virtual machines,cn=192.168.140.0,cn=config-01,ou=dhcp,ou=networks,ou=virtualization,ou=services,dc=stoney-cloud,dc=org). Then add the LDIF (the one you just edited) to the LDAP directory, after applying some general replacements first:

sed -i\
 -e 's/snapshotting/finished/'\
 -e '/member.*/d'\
 /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}
 
/usr/bin/ldapadd -H "ldaps://ldapm.${domain}" -x -D "cn=Manager,${ldapbase}" -W -f /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}

Undefine the machine

virsh undefine ${machinename}

Copy all the disk images from the backup location back to their original location

cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${diskimage1}.${backupdate} /var/virtualization/${vmtype}/${vmpool}/${diskimage1}
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${diskimage2}.${backupdate} /var/virtualization/${vmtype}/${vmpool}/${diskimage2}
...
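
For machines with several disk images, the copy step can be written as a loop instead of one line per image. A minimal sketch, assuming the naming scheme <DISK-IMAGE>.qcow2.<BACKUP-DATE> shown earlier; the function copy_back_images is hypothetical. It verifies that every backup file exists before it overwrites anything.

```shell
#!/bin/sh
# Copy every backed-up disk image back to its live location.
# Arguments: <backup dir> <live dir> <backup date> <disk image>...
copy_back_images() {
    src=$1; dst=$2; stamp=$3; shift 3
    # First pass: make sure all backup files exist before touching live images.
    for image in "$@"; do
        if [ ! -f "$src/$image.$stamp" ]; then
            echo "ERROR: missing $src/$image.$stamp" >&2
            return 1
        fi
    done
    # Second pass: copy, preserving mode, ownership and timestamps.
    for image in "$@"; do
        cp -p "$src/$image.$stamp" "$dst/$image" || return 1
    done
}

# Usage with the variables set earlier in this guide:
#   copy_back_images \
#       "/var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}" \
#       "/var/virtualization/${vmtype}/${vmpool}" \
#       "${backupdate}" "${diskimage1}" "${diskimage2}"
```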

And restore the domain from the state file from the backup location with the XML from the retain location (the one you might have edited)

virsh restore /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${machinename}.state.${backupdate}

Now the machine should be up and running again, continuing where it was stopped when the backup was taken.

If everything is OK, you can clean up the created files and directories:

rm -rf /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}
rm /tmp/${machinename}.ldif

Next steps