stoney conductor: VM Backup
Revision as of 14:50, 10 January 2014
Contents
- 1 Overview
- 2 Requirements
- 3 Backup
- 3.1 Basic idea
- 3.2 Communication through backend
- 3.2.1 Control-Instance Daemon Interaction for creating a Backup with LDIF Examples
- 3.2.1.1 Step 00: Backup Configuration for a virtual machine
- 3.2.1.2 Step 01: Initialize Backup Sub Tree (Control instance daemon)
- 3.2.1.3 Step 02: Finalize the Initialization (Control instance daemon)
- 3.2.1.4 Step 03: Start the Snapshot Process (Control instance daemon)
- 3.2.1.5 Step 04: Starting the Snapshot Process (Provisioning-Backup-KVM daemon)
- 3.2.1.6 Step 05: Finalizing the Snapshot Process (Provisioning-Backup-KVM daemon)
- 3.2.1.7 Step 06: Start the Merge Process (Control instance daemon)
- 3.2.1.8 Step 07: Starting the Merge Process (Provisioning-Backup-KVM daemon)
- 3.2.1.9 Step 08: Finalizing the Merging Process (Provisioning-Backup-KVM daemon)
- 3.2.1.10 Step 09: Start the Retain Process (Control instance daemon)
- 3.2.1.11 Step 10: Starting the Retain Process (Provisioning-Backup-KVM daemon)
- 3.2.1.12 Step 11: Finalizing the Retaining Process (Provisioning-Backup-KVM daemon)
- 3.2.1.13 Step 12: Finalizing the Backup Process (Control instance daemon)
- 3.3 Current Implementation (Backup)
- 3.4 Next steps
- 4 Restore
- 4.1 Basic idea
- 4.2 Communication through backend
- 4.2.1 Control instance Daemon Interaction for restoring a Backup with LDIF Examples
- 4.2.1.1 Step 01: Start the unretainSmallFiles process (Control instance daemon)
- 4.2.1.2 Step 02: Starting the unretainSmallFiles process (Provisioning-Backup-KVM daemon)
- 4.2.1.3 Step 03: Finalizing the unretainSmallFiles process (Provisioning-Backup-KVM daemon)
- 4.2.1.4 Step 05: Start the unretainLargeFiles process (Control instance daemon)
- 4.2.1.5 Step 06: Starting the unretainLargeFiles process (Provisioning-Backup-KVM daemon)
- 4.2.1.6 Step 07: Finalizing the unretainLargeFiles process (Provisioning-Backup-KVM daemon)
- 4.2.1.7 Step 09: Start the restore process (Control instance daemon)
- 4.2.1.8 Step 10: Starting the restore process (Provisioning-Backup-KVM daemon)
- 4.2.1.9 Step 11: Finalizing the restore process (Provisioning-Backup-KVM daemon)
- 4.2.1.10 Step 12: Finalizing the restore process (Control instance daemon)
- 4.3 Current Implementation (Restore)
- 4.4 Next steps
Overview
This page describes how VMs and VM-Templates are backed up and restored inside the stoney cloud.
Requirements
TBD: PKL. Please, please document the required space for the following attributes (how does one arrive at the minimum size?)
- sstBackupRootDirectory: file:///var/backup/virtualization
- sstBackupRetainDirectory: file:///var/virtualization/retain
- sstBackupRamDiskLocation: file:///var/cache/kvmbackup
- A working stoney cloud, installed according to stoney cloud: Single-Node Installation or stoney cloud: Multi-Node Installation.
- The backup configuration must be set: stoney conductor: OpenLDAP directory data organisation.
Backup
Basic idea
The main idea behind backing up a VM or a VM-Template is to divide the task into three subtasks:
- Snapshot: Save the machines state (CPU, Memory and Disk)
- Merge: Merge the Disk-Image-Snapshot with the Live-Image
- Retain: Export the snapshot files
A more detailed technical description of these three sub-processes can be found in the Sub-Processes section.
Furthermore, there is a control instance, which can call each of these three sub-processes independently for a given machine. This way, the stoney cloud is able to handle different cases:
Backup a single machine
The procedure for backing up a single machine is very simple: call the three sub-processes (snapshot, merge and retain) one after the other. So the control instance only needs very basic logic:
object machine = args[0];
if ( snapshot( machine ) ) {
    if ( merge( machine ) ) {
        if ( retain( machine ) ) {
            printf("Successfully backed up machine %s\n", machine);
        } else {
            printf("Error while retaining machine %s: %s\n", machine, error);
        }
    } else {
        printf("Error while merging machine %s: %s\n", machine, error);
    }
} else {
    printf("Error while snapshotting machine %s: %s\n", machine, error);
}
Backup multiple machines at the same time
When backing up multiple machines at the same time, we need to make sure that the downtimes of the machines are as close together as possible. Therefore the control instance should first call the snapshot process for all machines. After every machine has been snapshotted, the control instance can call the merge and retain processes for every machine. The most important part here is that the control instance remembers whether the snapshot of a given machine was successful, because if the snapshot failed, it must not call the merge and retain processes for that machine. So the control instance needs a little more logic:
object machines[] = args[0];
object successful_snapshots[];

// Snapshot all machines
for ( int i = 0; i < sizeof(machines) / sizeof(object); i++ ) {
    // If the snapshot was successful, remember the machine in the
    // successful_snapshots array (at the same index), otherwise leave it null
    if ( snapshot( machines[i] ) ) {
        successful_snapshots[i] = machines[i];
    } else {
        successful_snapshots[i] = null;
        printf("Error while snapshotting machine %s: %s\n", machines[i], error);
    }
}

// Merge and retain all successfully snapshotted machines
for ( int i = 0; i < sizeof(successful_snapshots) / sizeof(object); i++ ) {
    // If the element at this position is not null, the snapshot
    // for this machine was successful
    if ( successful_snapshots[i] ) {
        if ( merge( successful_snapshots[i] ) ) {
            if ( retain( successful_snapshots[i] ) ) {
                printf("Successfully backed up machine %s\n", successful_snapshots[i]);
            } else {
                printf("Error while retaining machine %s: %s\n", successful_snapshots[i], error);
            }
        } else {
            printf("Error while merging machine %s: %s\n", successful_snapshots[i], error);
        }
    }
}
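The same two-phase logic can be sketched in bash, the language of the commands on this page. This is a minimal sketch, not the stoney cloud implementation: snapshot, merge and retain are hypothetical placeholder functions that only log what they would do and are meant to be replaced by the sub-processes described in the next section.

```bash
#!/usr/bin/env bash
# Hypothetical placeholders for the three sub-processes; replace them with
# real implementations. Each must return 0 on success and non-zero on failure.
snapshot() { echo "snapshot $1"; }
merge()    { echo "merge $1"; }
retain()   { echo "retain $1"; }

# Back up all machines given as arguments: first snapshot every machine (so
# the downtimes are as close together as possible), then merge and retain
# only the machines whose snapshot succeeded.
backup_machines() {
    local machine
    local -a successful_snapshots=()

    for machine in "$@"; do
        if snapshot "$machine"; then
            successful_snapshots+=("$machine")
        else
            echo "Error while snapshotting machine $machine" >&2
        fi
    done

    for machine in "${successful_snapshots[@]}"; do
        if ! merge "$machine"; then
            echo "Error while merging machine $machine" >&2
        elif ! retain "$machine"; then
            echo "Error while retaining machine $machine" >&2
        else
            echo "Successfully backed up machine $machine"
        fi
    done
}
```

For example, `backup_machines vm-001 vm-002` snapshots both machines before it merges and retains either of them. The bash array plays the role of successful_snapshots, so a machine whose snapshot failed is never passed to merge or retain.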
Sub-Processes
Snapshot
- Create a snapshot with state:
  - If the VM vm-001 is running:
    - Save the state of VM vm-001 to the file vm-001.state (this file can either be created on a RAM-Disk or directly in the retain location; this example saves the file to a RAM-Disk):
      virsh save vm-001 /path/to/ram-disk/vm-001.state
    - After this command, the VM's CPU and memory state is represented by the file /path/to/ram-disk/vm-001.state and the VM vm-001 is shut down.
  - If the VM vm-001 is shut down:
    - Create a fake state file for the VM:
      echo "Machine is not running, no state file" > /path/to/ram-disk/vm-001.state
- Move the disk image /path/to/images/vm-001.qcow2 to the retain location:
  mv /path/to/images/vm-001.qcow2 /path/to/retain/vm-001.qcow2
  - Please note: The retain directory (/path/to/retain/) has to be on the same partition as the images directory (/path/to/images/). This makes the mv operation very fast (only the directory entry is renamed, no data is copied), so the downtime (remember, the VM vm-001 is shut down) is as short as possible.
  - Please note: If the VM vm-001 has more than one disk image, repeat this step for every disk image.
- Create the new (empty) disk image with the old one as backing file:
  qemu-img create -f qcow2 -b /path/to/retain/vm-001.qcow2 /path/to/images/vm-001.qcow2
  - Please note: If the VM vm-001 has more than just one disk-image, repeat this step for every disk-image.
- Set the correct ownership and permissions on the newly created image:
  chmod 660 /path/to/images/vm-001.qcow2
  chown root:vm-storage /path/to/images/vm-001.qcow2
  - Please note: If the VM vm-001 has more than one disk image, repeat these steps for every disk image.
- Save the VM's XML description:
  - Save the current XML description of VM vm-001 to a file at the retain location:
    virsh dumpxml vm-001 > /path/to/retain/vm-001.xml
- Save the backend entry:
  - There is no generic command to save the backend entry (the command depends on the backend). What matters is that the backend entry of the VM vm-001 is saved to the retain location: /path/to/retain/vm-001.backend
- Restore the VM vm-001 from its saved state (this also starts the VM):
  virsh restore /path/to/ram-disk/vm-001.state
  - Please note: After this operation the VM vm-001 is running again (it continues where it was stopped), and we have a consistent backup of the VM vm-001:
    - The file /path/to/ram-disk/vm-001.state contains the CPU and memory state of VM vm-001 at time T1.
    - The file /path/to/retain/vm-001.qcow2 contains the disk state of VM vm-001 at time T1.
      - Important: The live disk image /path/to/images/vm-001.qcow2 still references this file as its backing file, so you must not delete or move it!
    - The file /path/to/retain/vm-001.xml contains the XML description of VM vm-001 at time T1.
    - The file /path/to/retain/vm-001.backend contains the backend entry of VM vm-001 at time T1.
- Move the state file from the RAM-Disk to the retain location (if you used the RAM-Disk to save the VM's state):
  mv /path/to/ram-disk/vm-001.state /path/to/retain/vm-001.state
See also: Snapshot workflow
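The snapshot steps can be collected in a small bash function that prints the command sequence for one VM instead of executing it, so the plan can be reviewed first. This is a sketch under assumptions: snapshot_plan is a hypothetical name, the directories are the placeholder paths used on this page, saving the backend entry is left out (it is backend-specific), and the state restore is only emitted for a VM that was running, since a shut-down VM only has the fake state file.

```bash
#!/usr/bin/env bash
# Placeholder paths from this page; adjust them to your configuration.
IMAGES_DIR=/path/to/images
RETAIN_DIR=/path/to/retain
RAMDISK_DIR=/path/to/ram-disk

# Print (not execute) the snapshot commands for one VM.
# Usage: snapshot_plan VM STATE DISK [DISK...]
#   STATE: "running", anything else is treated as shut down
#   DISK:  disk image file names, e.g. vm-001.qcow2
snapshot_plan() {
    local vm=$1 state=$2 disk
    shift 2
    if [ "$state" = running ]; then
        echo "virsh save $vm $RAMDISK_DIR/$vm.state"
    else
        echo "echo 'Machine is not running, no state file' > $RAMDISK_DIR/$vm.state"
    fi
    for disk in "$@"; do
        echo "mv $IMAGES_DIR/$disk $RETAIN_DIR/$disk"
        # -F names the backing file format explicitly; newer QEMU releases expect it.
        echo "qemu-img create -f qcow2 -b $RETAIN_DIR/$disk -F qcow2 $IMAGES_DIR/$disk"
        echo "chmod 660 $IMAGES_DIR/$disk"
        echo "chown root:vm-storage $IMAGES_DIR/$disk"
    done
    echo "virsh dumpxml $vm > $RETAIN_DIR/$vm.xml"
    # Saving the backend entry to $RETAIN_DIR/$vm.backend is backend-specific
    # and therefore not part of this sketch.
    if [ "$state" = running ]; then
        # Assumption: only a VM that was running has a real state file to restore.
        echo "virsh restore $RAMDISK_DIR/$vm.state"
    fi
    echo "mv $RAMDISK_DIR/$vm.state $RETAIN_DIR/$vm.state"
}
```

For example, `snapshot_plan vm-001 "$(virsh domstate vm-001)" vm-001.qcow2` prints the plan for a single-disk VM (virsh domstate prints running for a running domain). After reviewing the output, it could be piped to bash.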
Merge
- Check if the VM vm-001 is running:
  - If not, start the VM in paused state:
    virsh start --paused vm-001
- Merge the live disk image (/path/to/images/vm-001.qcow2) with its backing file (/path/to/retain/vm-001.qcow2):
  virsh qemu-monitor-command vm-001 --hmp "block_stream drive-virtio-disk0"
  - Please note: If a VM has more than one disk image, repeat this step for every image by increasing the number at the end of the device name. So the command to merge the second disk image would be:
    virsh qemu-monitor-command vm-001 --hmp "block_stream drive-virtio-disk1"
- If the machine is running in paused state (i.e. it was started in the first step because it was not running), stop it again:
  virsh shutdown vm-001
Please note: After these steps, the live disk image /path/to/images/vm-001.qcow2 no longer references the image at the retain location (/path/to/retain/vm-001.qcow2). This is important for the retain process.
See also: Merge workflow
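The merge steps can be sketched the same way, as a bash function that prints the commands for one VM. Assumptions: merge_plan is a hypothetical name, and the device names drive-virtio-disk0, drive-virtio-disk1, ... follow the naming used on this page; they depend on the libvirt version and should be verified (for example with the HMP command info block).

```bash
#!/usr/bin/env bash
# Print (not execute) the merge commands for one VM.
# Usage: merge_plan VM STATE NUMBER_OF_DISKS
#   STATE: "running", anything else means the VM is first started paused
merge_plan() {
    local vm=$1 state=$2 disks=$3 i
    if [ "$state" != running ]; then
        echo "virsh start --paused $vm"
    fi
    # One block_stream per disk image: drive-virtio-disk0, drive-virtio-disk1, ...
    for (( i = 0; i < disks; i++ )); do
        echo "virsh qemu-monitor-command $vm --hmp \"block_stream drive-virtio-disk$i\""
    done
    # block_stream only starts a background job; it must have completed
    # (check e.g. with the HMP command "info block-jobs") before the VM
    # is stopped again.
    if [ "$state" != running ]; then
        echo "virsh shutdown $vm"
    fi
}
```

For a running VM with one disk image, the plan consists of the single block_stream command; the start and shutdown commands only appear when the VM was not running.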
Retain
- Move all the files from the retain directory (/path/to/retain/) to the backup directory (/path/to/backup/):
  - Move the VM's state file to the backup directory:
    mv /path/to/retain/vm-001.state /path/to/backup/vm-001.state
  - Move the VM's disk image to the backup directory:
    mv /path/to/retain/vm-001.qcow2 /path/to/backup/vm-001.qcow2
    - Please note: If the VM vm-001 has more than one disk image, repeat this step for each disk image.
  - Move the VM's XML description file to the backup directory:
    mv /path/to/retain/vm-001.xml /path/to/backup/vm-001.xml
  - Move the VM's backend entry file to the backup directory:
    mv /path/to/retain/vm-001.backend /path/to/backup/vm-001.backend
See also Retain workflow
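The retain step only moves files, so it can be written as a bash function that actually does the work. A minimal sketch; the function name retain_vm and its argument order are hypothetical, and the backup directory is assumed to exist. If the retain and backup directories are on different file systems, mv copies and then deletes each file, which matches the "copy and then delete" semantics of the retain process.

```bash
#!/usr/bin/env bash
# Move all files of one VM from the retain directory to the backup directory,
# in the order described above: state file, disk image(s), XML, backend entry.
# Usage: retain_vm VM RETAIN_DIR BACKUP_DIR DISK [DISK...]
retain_vm() {
    local vm=$1 retain=$2 backup=$3 file
    shift 3
    for file in "$vm.state" "$@" "$vm.xml" "$vm.backend"; do
        # Abort on the first failed move and report it via the return value.
        mv "$retain/$file" "$backup/$file" || return 1
    done
}
```

For example, `retain_vm vm-001 /path/to/retain /path/to/backup vm-001.qcow2` moves the four files of a single-disk VM.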
Communication through backend
Since the stoney cloud is (as the name already says) a cloud solution, it makes sense to involve a backend (in our case OpenLDAP) in the whole process. This way, the backup jobs can run decentralized on every VM node. The control instance modifies the backend, and these changes are seen by the different backup daemons on the VM nodes. So the communication could look as shown in the following picture (Figure 1):
Control-Instance Daemon Interaction for creating a Backup with LDIF Examples
The step numbers correspond to the graphical overview above.
Step 00: Backup Configuration for a virtual machine
# The following backup configuration says that the backup should be done daily, at 03:00 hours (localtime).
# * * * * *  command to be executed
# - - - - -
# | | | | |
# | | | | +----- day of week (0 - 6) (Sunday=0)
# | | | +------- month (1 - 12)
# | | +--------- day of month (1 - 31)
# | +----------- hour (0 - 23)
# +------------- min (0 - 59)
# localtime in the crontab entry
dn: ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: organizationalUnit
objectclass: sstVirtualizationBackupObjectClass
objectclass: sstCronObjectClass
ou: backup
description: This sub tree contains the backup plan for the virtual machine kvm-005.
sstCronMinute: 0
sstCronHour: 3
sstCronDay: *
sstCronMonth: *
sstCronDayOfWeek: *
sstCronActive: TRUE
sstBackupRootDirectory: file:///var/backup/virtualization
sstBackupRetainDirectory: file:///var/virtualization/retain
sstBackupRamDiskLocation: file:///mnt/ramdisk-test
sstVirtualizationDiskImageFormat: qcow2
sstVirtualizationDiskImageOwner: root
sstVirtualizationDiskImageGroup: vm-storage
sstVirtualizationDiskImagePermission: 0660
sstBackupNumberOfIterations: 1
sstVirtualizationVirtualMachineForceStart: FALSE
sstVirtualizationBandwidthMerge: 0
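The sstCron* attributes map one-to-one onto the five time fields of a crontab entry. The following bash sketch turns such an entry into the crontab schedule string. Assumptions: cron_schedule_from_ldif is a hypothetical name, the attribute names appear exactly as in the example above (LDAP attribute names are case-insensitive, so other tools may print them differently), and the LDIF is not line-wrapped (for example, produced with ldapsearch -LLL -o ldif-wrap=no).

```bash
#!/usr/bin/env bash
# Read a backup configuration entry as LDIF on stdin and print the crontab
# time fields "min hour day-of-month month day-of-week" (local time).
# Prints nothing unless sstCronActive is TRUE.
cron_schedule_from_ldif() {
    awk -F': ' '
        $1 == "sstCronMinute"    { min = $2 }
        $1 == "sstCronHour"      { hour = $2 }
        $1 == "sstCronDay"       { day = $2 }
        $1 == "sstCronMonth"     { month = $2 }
        $1 == "sstCronDayOfWeek" { dow = $2 }
        $1 == "sstCronActive"    { active = $2 }
        END { if (active == "TRUE") print min, hour, day, month, dow }
    '
}
```

For the kvm-005 example above, this prints 0 3 * * *, i.e. every day at 03:00 local time.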
Step 01: Initialize Backup Sub Tree (Control instance daemon)
The sub tree ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch reflects the time at which the backup is planned (in the form [YYYY][MM][DD]T[hh][mm][ss]Z, see ISO 8601), and it should be written at the time when the backup is planned and is to be executed. The section 20121002T010000Z means the following:
- Year: 2012
- Month: 10
- Day of Month: 02
- Hour of Day: 01
- Minutes: 00
- Seconds: 00
Please be aware that the time is to be written in UTC (see also the comment in the LDIF example below).
# This entry is the placeholder for the backup, which is to be executed at 03:00 hours (localtime with daylight-saving). This
# leads to the 20121002T010000Z timestamp (which is written in UTC).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: sstProvisioning
objectclass: organizationalUnit
ou: 20121002T010000Z
sstProvisioningExecutionDate: 0
sstProvisioningMode: initialize
sstProvisioningReturnValue: 0
sstProvisioningState: 20121002T014513Z
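The Step 01 entry can be generated from the machine name, the leaf timestamp and the LDAP base. A bash sketch under assumptions: utc_timestamp and backup_leaf_ldif are hypothetical names, and the caller is responsible for passing the planned backup time (in UTC) as the leaf name.

```bash
#!/usr/bin/env bash
# Current time in UTC, in the [YYYY][MM][DD]T[hh][mm][ss]Z form used for the
# backup leaf names and for sstProvisioningState.
utc_timestamp() {
    date -u +%Y%m%dT%H%M%SZ
}

# Print the Step 01 LDIF for one VM.
# Usage: backup_leaf_ldif VM LEAF STATE BASE
#   LEAF:  planned backup time in UTC, e.g. 20121002T010000Z
#   STATE: value for sstProvisioningState, e.g. "$(utc_timestamp)"
#   BASE:  LDAP base, e.g. o=stepping-stone,c=ch
backup_leaf_ldif() {
    local vm=$1 leaf=$2 state=$3 base=$4
    cat <<EOF
dn: ou=$leaf,ou=backup,sstVirtualMachine=$vm,ou=virtual machines,ou=virtualization,ou=services,$base
objectclass: top
objectclass: sstProvisioning
objectclass: organizationalUnit
ou: $leaf
sstProvisioningExecutionDate: 0
sstProvisioningMode: initialize
sstProvisioningReturnValue: 0
sstProvisioningState: $state
EOF
}
```

The output can be written to a file (e.g. `backup_leaf_ldif kvm-005 20121002T010000Z "$(utc_timestamp)" "o=stepping-stone,c=ch" > /tmp/kvm-005-leaf.ldif`) and then added with ldapadd -x -D ... -W -f /tmp/kvm-005-leaf.ldif.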
Step 02: Finalize the Initialization (Control instance daemon)
# The attribute sstProvisioningState is updated with current time by the fc-brokerd, when sstProvisioningMode is modified.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010001Z
-
replace: sstProvisioningMode
sstProvisioningMode: initialized
Step 03: Start the Snapshot Process (Control instance daemon)
By setting sstProvisioningMode to snapshot, the Control instance daemon kicks off the actual backup process.
# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# snapshot (this way the Provisioning-Backup-KVM daemon knows that it must start the snapshotting process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: snapshot
Step 04: Starting the Snapshot Process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon receives the snapshot command, it sets the sstProvisioningMode to snapshotting to tell the Control instance daemon and other interested parties that it is snapshotting the virtual machine or virtual machine template.
# The attribute sstProvisioningMode is set to snapshotting by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: snapshotting
Step 05: Finalizing the Snapshot Process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon has executed the snapshot command, it sets the sstProvisioningMode to snapshotted, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties that the snapshot of the virtual machine or virtual machine template is finished.
# The attribute sstProvisioningState is set to the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010011Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: snapshotted
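All modify requests in these steps share one shape: optionally replace sstProvisioningState, optionally replace sstProvisioningReturnValue, and always replace sstProvisioningMode. A bash sketch that prints such a request; provisioning_modify_ldif is a hypothetical name and the argument order is an assumption of this sketch.

```bash
#!/usr/bin/env bash
# Print an LDIF modify request for a backup leaf in the shape used by the
# step examples on this page.
# Usage: provisioning_modify_ldif DN MODE [STATE [RETURN_VALUE]]
#   Empty STATE / RETURN_VALUE arguments leave those attributes untouched.
provisioning_modify_ldif() {
    local dn=$1 mode=$2 state=$3 retval=$4
    echo "dn: $dn"
    echo "changetype: modify"
    if [ -n "$state" ]; then
        printf 'replace: sstProvisioningState\nsstProvisioningState: %s\n-\n' "$state"
    fi
    if [ -n "$retval" ]; then
        printf 'replace: sstProvisioningReturnValue\nsstProvisioningReturnValue: %s\n-\n' "$retval"
    fi
    printf 'replace: sstProvisioningMode\nsstProvisioningMode: %s\n' "$mode"
}
```

For example, `provisioning_modify_ldif "$dn" snapshotted 20121002T010011Z 0` reproduces the Step 05 request, `provisioning_modify_ldif "$dn" snapshot 0` the Step 03 request and `provisioning_modify_ldif "$dn" snapshotting` the Step 04 request. The output can be saved to a file and applied with ldapmodify -x -D ... -W -f FILE.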
Step 06: Start the Merge Process (Control instance daemon)
By setting sstProvisioningMode to merge, the Control instance daemon tells the Provisioning-Backup-KVM daemon to merge the backing file disk image back into the current disk image.
# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# merge (this way the Provisioning-Backup-KVM daemon knows that it must start the merging process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: merge
Step 07: Starting the Merge Process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon receives the merge command, it sets the sstProvisioningMode to merging to tell the Control instance daemon and other interested parties that it is merging the virtual machine or virtual machine template.
# The attribute sstProvisioningMode is set to merging by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: merging
Step 08: Finalizing the Merging Process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon has executed the merge command, it sets the sstProvisioningMode to merged, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties that the merging of the virtual machine or virtual machine template is finished.
# The attribute sstProvisioningState is set to the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T010500Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: merged
Step 09: Start the Retain Process (Control instance daemon)
By setting sstProvisioningMode to retain, the Control instance daemon tells the Provisioning-Backup-KVM daemon to retain (copy and then delete) all the necessary files to the configured backup location.
# The attribute sstProvisioningState is set to zero by the fc-brokerd, when sstProvisioningMode is modified to
# retain (this way the Provisioning-Backup-KVM daemon knows that it must start the retaining process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: retain
Step 10: Starting the Retain Process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon receives the retain command, it sets the sstProvisioningMode to retaining to tell the Control instance daemon and other interested parties that it is retaining the necessary files to the configured backup location.
# The attribute sstProvisioningMode is set to retaining by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: retaining
Step 11: Finalizing the Retaining Process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon has executed the retain command, it sets the sstProvisioningMode to retained, the sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties that the retaining of all the necessary files to the configured backup location is finished.
# The attribute sstProvisioningState is set to the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the fc-brokerd knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: retained
Step 12: Finalizing the Backup Process (Control instance daemon)
As soon as the Control instance daemon notices that the attribute sstProvisioningMode is set to retained, it sets the sstProvisioningMode to finished and the sstProvisioningState to the current timestamp (UTC). All interested parties now know that the backup process is finished; therefore a new backup process can be started.
# The attribute sstProvisioningState is updated with the current time by the fc-brokerd, when sstProvisioningMode is
# set to finished.
# All interested parties now know that the backup process is finished, therefore a new backup process could be started.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012001Z
-
replace: sstProvisioningMode
sstProvisioningMode: finished
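Seen from the Control instance daemon, Steps 01 to 12 form a simple chain: each time it finds a leaf in a particular sstProvisioningMode with a zero return value, it writes the next mode. A bash sketch of that decision; next_backup_mode is a hypothetical name, and stopping on a non-zero return value is an assumption, since error handling is not described on this page.

```bash
#!/usr/bin/env bash
# Given the current sstProvisioningMode (and sstProvisioningReturnValue) of a
# backup leaf, print the mode the Control instance daemon writes next.
# Prints nothing while a step is still in progress (e.g. "merging").
# Usage: next_backup_mode MODE [RETURN_VALUE]
next_backup_mode() {
    local mode=$1 retval=${2:-0}
    # Assumption: a non-zero return value stops the chain.
    [ "$retval" = 0 ] || return 1
    case $mode in
        initialize)  echo initialized ;;   # Step 01 -> Step 02
        initialized) echo snapshot ;;      # Step 02 -> Step 03
        snapshotted) echo merge ;;         # Step 05 -> Step 06
        merged)      echo retain ;;        # Step 08 -> Step 09
        retained)    echo finished ;;      # Step 11 -> Step 12
        *)           ;;                    # in progress or finished: nothing to do
    esac
}
```

Keeping the decision in one table-like case statement makes it easy to compare against the step list above.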
Current Implementation (Backup)
Since we do not have a working control instance, we need to have a workaround for backing up the machines:
- We already have a BackupKVMWrapper.pl script (File-Backend) which executes the three sub-processes in the correct order for a given list of machines (see Backup multiple machines at the same time).
- We already have the implementation for the whole backup with the LDAP-Backend (see stoney conductor: prov backup kvm).
- We can now combine these two existing scripts and create a wrapper (let's call it KVMBackup) which adds some logic on top of BackupKVMWrapper.pl: the KVMBackup wrapper generates the list of machines that need a backup.
The behaviour on our servers is as follows (cf. Figure 2):
- The (decentralized) KVMBackup wrapper (which is executed every day via a cronjob) generates a list of all machines running on the current host.
- For each of these machines:
  - Check if the machine is excluded from the backup; if yes, remove the machine from the list.
  - Check if the last backup was successful; if not, remove the machine from the list.
- Update the backup subtree for each machine in the list:
  - Remove the old backup leaf (the "yesterday-leaf") and add a new one (the "today-leaf").
  - After this step, the machines are ready to be backed up.
- Call the BackupKVMWrapper.pl script with the machine list as a parameter.
- Wait for the BackupKVMWrapper.pl script to finish.
- Go through all machines again and update the backup subtree one last time:
  - Check if the backup was successful; if yes, set sstProvisioningMode = finished (see also TBD).
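The filter step of the KVMBackup wrapper boils down to one decision per machine. A bash sketch; should_backup is a hypothetical name, the two inputs are assumed to have been read from the backend beforehand, and treating a machine without any previous backup leaf as eligible is an assumption not stated on this page.

```bash
#!/usr/bin/env bash
# Decide whether a machine stays in the backup list.
# Usage: should_backup EXCLUDE LAST_MODE
#   EXCLUDE:   value of sstBackupExcludeFromBackup ("TRUE" excludes the machine)
#   LAST_MODE: sstProvisioningMode of the last backup leaf, empty if there is none
# Returns 0 if the machine should be backed up.
should_backup() {
    local exclude=$1 last_mode=$2
    [ "$exclude" = TRUE ] && return 1
    # Assumption: a machine without a previous backup leaf is backed up.
    [ -z "$last_mode" ] && return 0
    # Only a finished last backup counts as successful.
    [ "$last_mode" = finished ]
}
```

Inside the wrapper's loop, this would be used as `if should_backup "$exclude" "$last_mode"; then machines+=("$vm"); fi`.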
How to exclude a machine from the backup
If you want to exclude a machine from the backup run, you simply need to add the following entry to your LDAP directory:
dn: ou=backup,sstVirtualMachine=<MACHINE-NAME>,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
objectclass: top
objectclass: organizationalUnit
objectclass: sstVirtualizationBackupObjectClass
ou: backup
sstbackupexcludefrombackup: TRUE
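If the backup subtree in the LDAP directory already exists, you need to add the sstbackupexcludefrombackup attribute (together with the sstVirtualizationBackupObjectClass object class; leave out the objectclass part if the entry already has this object class) with a modify request:
dn: ou=backup,sstVirtualMachine=<MACHINE-NAME>,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
add: objectclass
objectclass: sstVirtualizationBackupObjectClass
-
add: sstbackupexcludefrombackup
sstbackupexcludefrombackup: TRUE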
Re-include the machine to the backup
If you want to re-include a machine, simply delete the machine's whole backup subtree. It will be recreated during the next backup run.
Next steps
Restore
Basic idea
The restore process, similar to the backup process, can be divided into three sub-processes:
- Unretain the small files: Copy the small files (backend entry, XML description) from the backup directory to the retain directory
- Unretain the big files: Copy the big files (state file, disk image(s)) from the backup directory to the retain directory
- Restore the machine: Replace the live disk image(s) by the one(s) from the backup and restore the machine from the state file
Additionally, the restore process can be divided into two phases:
- User-Interaction phase: After the "unretain small files" sub-process, the user needs to decide two things:
  - On conflicts between the backend entry file and the XML description, the user needs to decide how to resolve the conflict(s).
  - The user can also abort the restore process up to this point. After that, the restore cannot be aborted or undone!
- Non-User-Interaction phase: The daemons communicate with each other through the backend, and the restore process continues without further user input (cf. Communication through backend).
Sub Processes
Unretain small files
This workflow assumes that the backup directory is on the same physical server as the retain directory (the protocol is file://).
- Copy the backend entry file from the backup directory to the retain directory:
  cp -p /path/to/backup/vm-001.backend /path/to/retain/vm-001.backend
- Copy the XML description from the backup directory to the retain directory:
  cp -p /path/to/backup/vm-001.xml /path/to/retain/vm-001.xml
- Compare the backend entry file (the one in the retain directory) with the live backend entry:
  - Resolve all conflicts between these two backend entries.
  - Modify the backend entry at the retain location accordingly.
- Apply the same changes to the XML description at the retain location (backend entry and XML description need to be consistent).
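The copy-and-compare part of this sub-process can be sketched in bash. Assumptions: unretain_small_files is a hypothetical name, and the live backend entry has already been exported to a file in the same format as the backed-up .backend file (how to export it depends on the backend; for the LDAP backend, for example, with ldapsearch). Resolving the conflicts stays a manual step.

```bash
#!/usr/bin/env bash
# Copy the small files of one VM from the backup to the retain directory and
# show how the live backend entry differs from the backed-up one.
# Usage: unretain_small_files VM BACKUP_DIR RETAIN_DIR LIVE_BACKEND_FILE
# Returns diff's exit status: 0 = no differences, 1 = differences, >1 = error.
unretain_small_files() {
    local vm=$1 backup=$2 retain=$3 live=$4
    cp -p "$backup/$vm.backend" "$retain/$vm.backend" || return 2
    cp -p "$backup/$vm.xml" "$retain/$vm.xml" || return 2
    # Lines starting with "-" are only in the live entry, lines starting
    # with "+" only in the backup; these are the conflicts to resolve.
    diff -u "$live" "$retain/$vm.backend"
}
```

A return value of 1 signals that the user has to edit the backend entry (and the XML description) at the retain location before the restore continues.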
Unretain large files
- Copy the state file from the backup directory to the retain directory:
  cp -p /path/to/backup/vm-001.state /path/to/retain/vm-001.state
- Copy the disk image(s) from the backup directory to the retain directory:
  cp -p /path/to/backup/vm-001.qcow2 /path/to/retain/vm-001.qcow2
  - Important: If a VM has more than just one disk image, repeat this step for every disk image.
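The large files can be copied with a loop over the state file and all disk images. A bash sketch; unretain_large_files is a hypothetical name. As an addition that is not part of the procedure above, it first compares the space the files occupy (du -k, allocated size in KiB) with the free space on the retain file system (df -P -k) and refuses to start if it is too small.

```bash
#!/usr/bin/env bash
# Copy the state file and disk image(s) of one VM from the backup directory
# to the retain directory, preserving mode, ownership and timestamps.
# Usage: unretain_large_files VM BACKUP_DIR RETAIN_DIR DISK [DISK...]
unretain_large_files() {
    local vm=$1 backup=$2 retain=$3 file needed available
    shift 3
    local -a files=("$vm.state" "$@")

    # Space needed (sum of du -k) versus space available on the retain
    # file system (4th column of the second line of df -P -k).
    needed=$(cd "$backup" && du -k "${files[@]}" | awk '{ s += $1 } END { print s + 0 }') || return 1
    available=$(df -P -k "$retain" | awk 'NR == 2 { print $4 }')
    if [ "$needed" -gt "$available" ]; then
        echo "Not enough space in $retain: need $needed KiB, have $available KiB" >&2
        return 1
    fi

    for file in "${files[@]}"; do
        cp -p "$backup/$file" "$retain/$file" || return 1
    done
}
```

For example, `unretain_large_files vm-001 /path/to/backup /path/to/retain vm-001.qcow2` copies vm-001.state and vm-001.qcow2.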
Restore the VM
- Shut down the VM if it is running:
  virsh shutdown vm-001
- Undefine the VM if it is still defined:
  virsh undefine vm-001
- Overwrite the original disk image:
  mv /path/to/retain/vm-001.qcow2 /path/to/images/vm-001.qcow2
  - Important: If a VM has more than just one disk image, repeat this step for every disk image.
- Restore the VM's backend entry:
  - Write the backend entry from the retain location (/path/to/retain/vm-001.backend) to the backend.
- Overwrite the VM's XML description with the one from the retain location:
  cp -p /path/to/retain/vm-001.xml /path/to/xmls/vm-001.xml
- Restore the VM from the state file with the corrected XML:
  virsh restore /path/to/retain/vm-001.state --xml /path/to/xmls/vm-001.xml
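As with the backup steps, the restore steps can be printed as a reviewable command plan. A bash sketch under assumptions: restore_plan is a hypothetical name, the paths are the placeholder paths from this page, an empty STATE stands for "VM not defined", and writing the backend entry back is left out because it is backend-specific.

```bash
#!/usr/bin/env bash
# Print (not execute) the restore commands for one VM.
# Usage: restore_plan VM STATE DISK [DISK...]
#   STATE: current state, e.g. from "virsh domstate VM"; empty if undefined
restore_plan() {
    local vm=$1 state=$2 disk
    shift 2
    if [ "$state" = running ]; then
        # virsh shutdown only requests a graceful shutdown; wait until
        # "virsh domstate" reports "shut off" before continuing.
        echo "virsh shutdown $vm"
    fi
    if [ -n "$state" ]; then
        echo "virsh undefine $vm"
    fi
    for disk in "$@"; do
        echo "mv /path/to/retain/$disk /path/to/images/$disk"
    done
    # Writing /path/to/retain/$vm.backend back to the backend is
    # backend-specific and therefore not part of this sketch.
    echo "cp -p /path/to/retain/$vm.xml /path/to/xmls/$vm.xml"
    echo "virsh restore /path/to/retain/$vm.state --xml /path/to/xmls/$vm.xml"
}
```

Because the disk images are overwritten, the printed plan is worth reviewing before it is executed; after this point the restore cannot be undone.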
Communication through backend
The actual KVM-Restore process is controlled completely by the Control instance daemon via the OpenLDAP directory. See OpenLDAP Directory Integration for the involved attributes and their possible values.
You can modify/update these interactions by editing File:Restore-Interaction.xmi (you may need Umbrello UML Modeller diagram programme for KDE to display the content properly).
Control instance Daemon Interaction for restoring a Backup with LDIF Examples
Step 01: Start the unretainSmallFiles process (Control instance daemon)
The first step of the restore process is to copy the small files (in this case the XML file and the LDIF) from the configured backup location to the configured retain location.
# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to
# unretainSmallFiles (this way the Provisioning-Backup-KVM daemon knows that it must start the unretainSmallFiles process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainSmallFiles
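The change record that starts a step (Steps 01, 05 and 09 differ only in the mode) can be generated by a small shell function, for example for testing or manual intervention. This is a sketch, not the Control instance daemon's implementation; the function name ldif_start_step is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the LDIF change record that starts a restore
# step, i.e. reset sstProvisioningState to 0 and set the new
# sstProvisioningMode (unretainSmallFiles, unretainLargeFiles or restore).
ldif_start_step() {
    dn=$1
    mode=$2
    printf '%s\n' \
        "dn: $dn" \
        "changetype: modify" \
        "replace: sstProvisioningState" \
        "sstProvisioningState: 0" \
        "-" \
        "replace: sstProvisioningMode" \
        "sstProvisioningMode: $mode"
}
```

The output could be written to a file and applied with ldapmodify, in the same way the manual guide uses ldapadd, for example ldif_start_step "ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch" unretainSmallFiles > /tmp/step.ldif followed by ldapmodify -H "ldaps://ldapm.${domain}" -x -D "cn=Manager,${ldapbase}" -W -f /tmp/step.ldif.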
Step 02: Starting the unretainSmallFiles process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon receives the command to unretain the small files, it sets sstProvisioningMode to unretainingSmallFiles to tell the Control instance daemon and other interested parties that it is unretaining the small files for the virtual machine or virtual machine template.
# The attribute sstProvisioningMode is set to unretainingSmallFiles by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: unretainingSmallFiles
Step 03: Finalizing the unretainSmallFiles process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon has executed the commands to unretain the small files, it sets sstProvisioningMode to unretainedSmallFiles, sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties that all the small files have been unretained from the configured backup location.
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainedSmallFiles
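The finalizing change record (Steps 03, 07 and 11) differs from the starting one in that it carries a UTC timestamp and a return value. A minimal sketch that produces it in the same timestamp format as the examples (for example 20121002T012000Z); the function name ldif_finish_step is hypothetical and is not the Provisioning-Backup-KVM daemon's code.

```shell
#!/bin/sh
# Hypothetical helper: print the LDIF change record with which a step is
# finalized: sstProvisioningState = current UTC timestamp,
# sstProvisioningReturnValue = given value (default 0),
# sstProvisioningMode = given mode.
ldif_finish_step() {
    dn=$1
    mode=$2
    retval=${3:-0}
    # Same format as in the examples above, e.g. 20121002T012000Z.
    now=$(date -u +'%Y%m%dT%H%M%SZ')
    printf '%s\n' \
        "dn: $dn" \
        "changetype: modify" \
        "replace: sstProvisioningState" \
        "sstProvisioningState: $now" \
        "-" \
        "replace: sstProvisioningReturnValue" \
        "sstProvisioningReturnValue: $retval" \
        "-" \
        "replace: sstProvisioningMode" \
        "sstProvisioningMode: $mode"
}
```

All three attributes are replaced in a single change record, so an observer never sees the new mode without the matching timestamp and return value.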
Step 05: Start the unretainLargeFiles process (Control instance daemon)
The next step in the restore process is to copy the large files (state file and disk images) from the configured backup directory to the configured retain directory.
# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to
# unretainLargeFiles (this way the Provisioning-Backup-KVM daemon knows that it must start the unretainLargeFiles process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainLargeFiles
Step 06: Starting the unretainLargeFiles process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon receives the command to unretain the large files, it sets sstProvisioningMode to unretainingLargeFiles to tell the Control instance daemon and other interested parties that it is unretaining the large files for the virtual machine or virtual machine template.
In the meantime, the vm-manager merges the LDIF unretained in step 02 with the one in the live directory to sort out possible differences in the configuration of the virtual machine.
# The attribute sstProvisioningMode is set to unretainingLargeFiles by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: unretainingLargeFiles
Step 07: Finalizing the unretainLargeFiles process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon has executed the commands to unretain the large files, it sets sstProvisioningMode to unretainedLargeFiles, sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties that all the large files have been unretained from the configured backup location.
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: unretainedLargeFiles
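The rule stated in these comments (the Control instance daemon proceeds once the mode has its finished form, sstProvisioningState holds a timestamp instead of 0, and sstProvisioningReturnValue is 0) can be written as a predicate. This is a sketch assuming the three attribute values have already been read from the directory; the function name step_finished is hypothetical.

```shell
#!/bin/sh
# Hypothetical predicate: succeeds (exit 0) when a step reported by the
# Provisioning-Backup-KVM daemon finished successfully, i.e. the mode is the
# expected finished value, the state is a timestamp (not 0) and the return
# value is 0.
step_finished() {
    mode=$1
    state=$2
    retval=$3
    expected=$4
    [ "$mode" = "$expected" ] || return 1
    [ "$retval" = "0" ] || return 1
    # A state of 0 means the step was (re)started, not finished.
    printf '%s\n' "$state" | grep -Eqx '[0-9]{8}T[0-9]{6}Z'
}
```

For example, step_finished unretainedLargeFiles 20121002T012000Z 0 unretainedLargeFiles succeeds, while the same call with the state 0 or the mode unretainingLargeFiles fails.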
Step 09: Start the restore process (Control instance daemon)
Since all necessary files are now in the configured retain location, the restore process can be started: the disk images are copied back to their original location, and the VM is restored from the state file (which is also in the configured retain location).
# The attribute sstProvisioningState is set to zero by the Control instance daemon, when sstProvisioningMode is modified to
# restore (this way the Provisioning-Backup-KVM daemon knows that it must start the restore process).
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 0
-
replace: sstProvisioningMode
sstProvisioningMode: restore
Step 10: Starting the restore process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon receives the restore command, it sets sstProvisioningMode to restoring to tell the Control instance daemon and other interested parties that it is restoring the virtual machine or virtual machine template.
# The attribute sstProvisioningMode is set to restoring by the Provisioning-Backup-KVM daemon.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningMode
sstProvisioningMode: restoring
Step 11: Finalizing the restore process (Provisioning-Backup-KVM daemon)
As soon as the Provisioning-Backup-KVM daemon has executed the restore command, it sets sstProvisioningMode to restored, sstProvisioningState to the current timestamp (UTC) and sstProvisioningReturnValue to zero to tell the Control instance daemon and other interested parties that the restore process is finished.
# The attribute sstProvisioningState is set with the current timestamp by the Provisioning-Backup-KVM daemon, when
# the attributes sstProvisioningReturnValue and sstProvisioningMode are set.
# With this combination, the Control instance daemon knows that it can proceed.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012000Z
-
replace: sstProvisioningReturnValue
sstProvisioningReturnValue: 0
-
replace: sstProvisioningMode
sstProvisioningMode: restored
Step 12: Finalizing the restore process (Control instance daemon)
As soon as the Control instance daemon notices that the attribute sstProvisioningMode is set to restored, it sets sstProvisioningMode to finished and sstProvisioningState to the current timestamp (UTC). All interested parties now know that the restore process is finished.
# The attribute sstProvisioningState is updated with the current time by the Control instance daemon, when sstProvisioningMode is
# set to finished.
# All interested parties now know that the restore process is finished.
dn: ou=20121002T010000Z,ou=backup,sstVirtualMachine=kvm-005,ou=virtual machines,ou=virtualization,ou=services,o=stepping-stone,c=ch
changetype: modify
replace: sstProvisioningState
sstProvisioningState: 20121002T012001Z
-
replace: sstProvisioningMode
sstProvisioningMode: finished
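Taken together, Steps 01 to 12 define a fixed sequence of sstProvisioningMode values. A sketch of that sequence as a lookup function, derived only from the steps above; the function name next_restore_mode is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: given the current sstProvisioningMode of a restore,
# print the mode that is expected next (sequence taken from Steps 01-12).
# Returns 1 and prints nothing for "finished" or unknown modes.
next_restore_mode() {
    case $1 in
        unretainSmallFiles)    echo unretainingSmallFiles ;;  # set by Provisioning-Backup-KVM
        unretainingSmallFiles) echo unretainedSmallFiles ;;   # set by Provisioning-Backup-KVM
        unretainedSmallFiles)  echo unretainLargeFiles ;;     # set by Control instance daemon
        unretainLargeFiles)    echo unretainingLargeFiles ;;
        unretainingLargeFiles) echo unretainedLargeFiles ;;
        unretainedLargeFiles)  echo restore ;;                # set by Control instance daemon
        restore)               echo restoring ;;
        restoring)             echo restored ;;
        restored)              echo finished ;;               # set by Control instance daemon
        *)                     return 1 ;;
    esac
}
```

Starting from mode=unretainSmallFiles and repeatedly calling mode=$(next_restore_mode "$mode") until the result is empty walks through all ten modes in order.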
Current Implementation (Restore)
- Since the prov-backup-kvm daemon is not running on the vm-nodes (cf. stoney_conductor:_Backup#State_of_the_art), the restore process does not work when clicking the icon in the web interface.
- Resolving the conflicts in the backend entry and the XML description file is not yet implemented.
- In fact, none of the steps that are not executed by prov-backup-kvm are properly implemented yet (cf. stoney_conductor:_prov_backup_kvm#Restore).
- The implementation is done, but the last step of the restore process is different:
  - The virsh restore command is not executed with the --xml option; the XML stored in the state file is used when restoring the machine. Therefore, the conflicts are not properly resolved.
How to manually restore a machine from backup
Important: Before you continue with this guide, make sure that you have no other possibility to restore the machine. It might be easier and safer to get lost files from the online backup if the machine has one set up.
If you really have to restore the machine from the backup:
- Stop the machine via the web interface
- Log in (as root) on the VM-Node the machine was running on
As a first step, set some useful bash variables so that you can copy and paste the commands of the following guide:
Double-check all variables you are setting here. If one is not correct, you will restore a running machine or overwrite a live disk image!
machinename="<MACHINE-NAME>" # For example: machinename="b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6"
vmpool="<VM-POOL>" # For example: vmpool="0f83f084-8080-413e-b558-b678e504836e"
vmtype="<VM-TYPE>" # For example: vmtype="vm-persistent"
Change to the backup directory for the given machine and check the iterations:
cd /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}
ls -al
Change into the most recent iteration:
cd 2014...
ls -al
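Because the iteration directories are named after their UTC backup time (for example 20140109T134445Z), sorting the names lexically also sorts them chronologically. A minimal sketch that finds the most recent iteration, assuming this naming scheme; the function name latest_iteration is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the most recent backup iteration
# below a machine's backup directory. Iterations are named like
# 20140109T134445Z, so lexical order equals chronological order.
latest_iteration() {
    ls -1 "$1" | grep -Ex '[0-9]{8}T[0-9]{6}Z' | sort | tail -n 1
}
```

From inside the machine's backup directory, cd "$(latest_iteration .)" changes into the newest iteration. If the newest backup is not the one you want, pick the directory by hand instead.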
In there you should have:
- The state file <MACHINE-NAME>.state.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.state.20140109T134445Z)
- The XML description <MACHINE-NAME>.xml.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.xml.20140109T134445Z)
- The LDIF file <MACHINE-NAME>.ldif.<BACKUP-DATE> (for example b6dc3d27-5981-4b18-8f3f-31ed3d21a3c6.ldif.20140109T134445Z)
- And at least one disk image <DISK-IMAGE>.qcow2.<BACKUP-DATE> (for example 8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2.20140109T134445Z)
Now save the backup date and the disk image(s) in variables:
backupdate="<BACKUP-DATE>" # For example: backupdate="20140109T134445Z"
diskimage1="<DISK-IMAGE-1>" # For example: diskimage1="8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2"
diskimage2="<DISK-IMAGE-2>" # For example: diskimage2="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee.qcow2"
...
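Instead of typing the disk image names by hand, they can be listed from the iteration directory. A sketch assuming every disk image follows the <DISK-IMAGE>.qcow2.<BACKUP-DATE> naming shown in the list above; the function name list_disk_images is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the disk image names (without the backup-date
# suffix) found in an iteration directory, e.g.
# 8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2.20140109T134445Z
#   -> 8798561b-d5de-471b-a6fc-ec2b4831ed12.qcow2
list_disk_images() {
    dir=$1
    date=$2
    for f in "$dir"/*.qcow2."$date"; do
        [ -e "$f" ] || continue   # no match: the glob stays unexpanded
        f=${f##*/}                # strip the directory part
        echo "${f%.$date}"        # strip the .<BACKUP-DATE> suffix
    done
}
```

For example, list_disk_images . "${backupdate}" run inside the iteration directory prints one name per line, which you can then assign to diskimage1, diskimage2 and so on.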
Look at the variables once more and double-check them:
echo "Machine Name = ${machinename}"
echo "VM Pool = ${vmpool}"
echo "VM Type = ${vmtype}"
echo "Backup date = ${backupdate}"
echo "Disk Image 1 = ${diskimage1}"
echo "Disk Image 2 = ${diskimage2}"
...
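Since a wrong variable can overwrite a live disk image, an automated plausibility check can complement reading the values. A sketch assuming that machine names and VM pools are lowercase UUIDs and backup dates use the 20140109T134445Z format, as in the examples above; the function name check_restore_vars is hypothetical and does not replace the manual check.

```shell
#!/bin/sh
# Hypothetical plausibility check for the variables set above
# (machinename, vmpool, vmtype, backupdate).
# Prints one line per problem and returns 1 if any check fails.
check_restore_vars() {
    uuid='[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
    rc=0
    printf '%s\n' "$machinename" | grep -Eqx "$uuid" ||
        { echo "machinename is not a UUID: '$machinename'"; rc=1; }
    printf '%s\n' "$vmpool" | grep -Eqx "$uuid" ||
        { echo "vmpool is not a UUID: '$vmpool'"; rc=1; }
    [ -n "$vmtype" ] || { echo "vmtype is empty"; rc=1; }
    printf '%s\n' "$backupdate" | grep -Eqx '[0-9]{8}T[0-9]{6}Z' ||
        { echo "backupdate has an unexpected format: '$backupdate'"; rc=1; }
    return "$rc"
}
```

Running check_restore_vars || echo "fix the variables first" catches placeholders such as <BACKUP-DATE> that were not replaced.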
Create a retain directory for this restore and copy the LDIF file from the backup location into it:
currentdate=$(date --utc +'%Y%m%dT%H%M%SZ')
mkdir -p /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${machinename}.ldif.${backupdate} /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/
Now you are entering the critical part. You won't be able to undo the following steps.
Check whether there is a difference between the current LDAP entry and the one from the backup:
domain="<DOMAIN>" # For example: domain="stoney-cloud.org"
ldapbase="<LDAPBASE>" # For example: ldapbase="dc=stoney-cloud,dc=org"
ldapsearch -H ldaps://ldapm.${domain} -b "sstVirtualMachine=${machinename},ou=virtual machines,ou=virtualization,ou=services,${ldapbase}" -s sub -x -LLL -o ldif-wrap=no -D "cn=Manager,${ldapbase}" -W "(objectclass=*)" > /tmp/${machinename}.ldif
diff -Naur /tmp/${machinename}.ldif /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}
Then edit the LDIF file at the retain location according to your needs.
If there are no differences (or the differences are not important), you can skip the following step. Otherwise, use PhpLdapAdmin to delete the machine from the LDAP directory (do not forget to delete the DHCP entry dn: cn=<MACHINE-NAME>,ou=virtual machines,cn=192.168.140.0,cn=config-01,ou=dhcp,ou=networks,ou=virtualization,ou=services,dc=stoney-cloud,dc=org). Then add the LDIF (the one you just edited) to the LDAP directory (first do some general replacements):
/usr/bin/ldapadd -H "ldaps://ldapm.${domain}" -x -D "cn=Manager,${ldapbase}" -W -f /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}/${machinename}.ldif.${backupdate}
Undefine the machine:
virsh undefine ${machinename}
Copy all the disk images from the backup location back to their original location:
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${diskimage1}.${backupdate} /var/virtualization/${vmtype}/${vmpool}/${diskimage1}
cp -p /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${diskimage2}.${backupdate} /var/virtualization/${vmtype}/${vmpool}/${diskimage2}
...
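Then restore the domain from the state file at the backup location. As written, the command below uses the XML description stored in the state file; to use an edited XML description from the retain location instead, pass that file with the --xml option of virsh restore: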
virsh restore /var/backup/virtualization/${vmtype}/${vmpool}/${machinename}/${backupdate}/${machinename}.state.${backupdate}
Now the machine should be up and running again, continuing where it was stopped when the backup was taken.
If everything is OK, you can clean up the created files and directories:
rm -rf /var/virtualization/retain/${vmtype}/${vmpool}/${machinename}/${currentdate}
rm /tmp/${machinename}.ldif