taking hot backups with Oracle VM

Probably my biggest complaint about Oracle VM is its lack of a hot backup solution for virtual machines. There is simply no documented or supported way to back up running virtual machines, which is crazy. Of course you can install good old backup agents inside your VMs, but that is one of the things I want to avoid when deploying virtualization. Or you could suspend or halt a machine, copy the virtual disk off to somewhere else and be happy. But I feel we should not have to stop our services for a backup. This is 2012 after all!
Sure, the latest release introduced a new feature to export an OCFS2 filesystem via NFS so it can be backed up from another machine. But that is only half of the solution, since a simple filesystem copy of a virtual disk image is very likely to be corrupt if a running machine writes to it while we are trying to read it.

One workaround suggested across forums and message boards is to clone your running machine and then copy the virtual disk of the stopped clone. This sounds like a dirty workaround, but if it flies, I’ll be happy. Unfortunately, I could not find a way to clone VMs from command-line scripts, so this was not really practical for automated, day-to-day backups.
That is, until I came across instructions today that describe a new (and provided as-is, unsupported…) command-line interface for OVM Manager. It requires version 3.1.1 build 365, but updating from build 305 was quite easy.

And sure enough, I can now use ssh to log in to the CLI:

[root@ovm ~]# ssh -l admin -p 10000 localhost
admin@localhost's password: 
OVM> showversion
3.1.1.365

Diving right into the task at hand, I cloned a test VM:

OVM> clone Vm name=BTCminer_01 destType=Vm destName=BTCM01_backup serverPool=ptx_pool
Command: clone Vm name=BTCminer_01 destType=Vm destName=BTCM01_backup serverPool=ptx_pool
Status: Success
Time: 2012-07-19 12:14:08.539

OVM> show VM name=BTCM01_backup
Command: show VM name=BTCM01_backup
Status: Success
Time: 2012-07-19 12:21:38.055
Data: 
  Name = BTCM01_backup
  Id = 0004fb0000060000a0d318dfea1a8ecb
  Status = Stopped
  Memory (MB) = 1024
  Max. Memory (MB) = 2048
  Max. Processors = 8
  Processors = 8
  Priority = 10
  Processor Cap = 80
  High Availability = false
  Operating System = Oracle Linux 6
  Mouse Type = Default
  Domain Type = Xen PVM
  Keymap = en-us
  description = bitcoin miner test, burning away CPU
  Server = 08:00:20:ff:ff:ff:ff:ff:ff:ff:00:1b:24:78:cc:62  [ovm01]
  Repository = 0004fb0000030000d4d126daf6f36560  [ovm_repo1tb]
  Vnic 1 = 0004fb0000070000fe61d3745c1e09c4  [00:21:f6:42:42:01]
  VmDiskMapping 1 = 0004fb0000130000e5ff03a3b8fe3a6b

OVM> show VmDiskMapping id=0004fb0000130000e5ff03a3b8fe3a6b
Command: show VmDiskMapping id=0004fb0000130000e5ff03a3b8fe3a6b
Status: Success
Time: 2012-07-19 14:53:32.845
Data: 
  Name = 0004fb0000130000e5ff03a3b8fe3a6b
  Id = 0004fb0000130000e5ff03a3b8fe3a6b
  Slot = 0
  Emulated Block Device = false
  Virtual Disk Id = 0004fb000012000054b0f999972b7d64.img  [BTCminer_01 (2)]
  Vm Id = 0004fb0000060000a0d318dfea1a8ecb  [BTCM01_backup]
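
For a script, the interesting values have to be pulled out of that output. The following is only a sketch: the helper names (vm_id, disk_mapping_ids, virtual_disk_file) are made up for illustration, and it assumes the CLI keeps printing "Key = value  [label]" lines exactly as in the transcript above.

```shell
# Hypothetical helpers: each reads OVM CLI output on stdin and prints IDs.
# They assume the "  Key = value  [label]" layout shown in the transcript.

# Print the VM's own Id from "show VM" output (not "Vm Id" or other keys).
vm_id() {
    awk -F' = ' '$1 ~ /^ *Id$/ { print $2; exit }'
}

# Print the id of every VmDiskMapping from "show VM" output, one per line.
disk_mapping_ids() {
    awk -F' = ' '$1 ~ /^ *VmDiskMapping [0-9]+$/ { print $2 }'
}

# Print the image file name from "show VmDiskMapping" output,
# dropping the trailing "[label]" part.
virtual_disk_file() {
    awk -F' = ' '$1 ~ /^ *Virtual Disk Id$/ { split($2, parts, " "); print parts[1]; exit }'
}
```

Each helper is meant to sit at the end of a pipe, for example behind a saved copy of the "show VM" output. A VM with several disks yields several mapping ids, so virtual_disk_file would be called once per mapping.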

I now have a stopped (consistent) clone of my running machine, and I know the machine ID and the virtual disk image file. Sweet! I had already mounted the repository on my OVM server, so now I can copy the vm.cfg and the virtual disk to another filesystem (a plain local disk in my test case):

[root@ovm ~]# cp -pr /mnt/repository/VirtualMachines/0004fb0000060000a0d318dfea1a8ecb/ /var/www/html/ovmbackup/
[root@ovm ~]# cp -pr /mnt/repository/VirtualDisks/0004fb00001200007df94d4e5a72be09.img /var/www/html/ovmbackup/
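
Wrapped into a function, the same two copies could look roughly like this. It is a sketch, not a finished tool: backup_clone and its argument order are invented here, and it assumes the repository layout seen above (VirtualMachines/<vm id>/ and VirtualDisks/<image file>).

```shell
# Hypothetical helper: copy a stopped clone's config directory and one
# virtual disk image out of a mounted repository.
# Arguments: repository mount point, VM id, disk image file name, target dir.
backup_clone() {
    local repo=$1 vm_id=$2 disk_file=$3 target=$4

    local cfg_dir="$repo/VirtualMachines/$vm_id"
    local disk="$repo/VirtualDisks/$disk_file"

    # Check both sources first so we never end up with half a backup.
    [ -d "$cfg_dir" ] || { echo "missing $cfg_dir" >&2; return 1; }
    [ -f "$disk" ]    || { echo "missing $disk" >&2; return 1; }

    mkdir -p "$target" || return 1
    cp -pr "$cfg_dir" "$target/" || return 1   # vm.cfg and friends
    cp -p "$disk" "$target/" || return 1       # the disk image itself
}
```

A call would then take the form backup_clone /mnt/repository <vm id> <image file> /var/www/html/ovmbackup, with the two IDs taken from the CLI output.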

Of course, I was eager to see how restoring works…

OVM> importVirtualDisk repository name=ovm_repo1tb server=ovm01 url='http://ovmmgr/ovmbackup/backup.img'
Command: importVirtualDisk repository name=ovm_repo1tb server=ovm01 url='http://ovmmgr/ovmbackup/backup.img'
Status: Success
Time: 2012-07-19 15:51:12.712

OVM> create VM name=recoverytest repository=ovm_repo1tb domainType=XEN_PVM memory=1024 on Server name=ovm01
Command: create VM name=recoverytest repository=ovm_repo1tb domainType=XEN_PVM on Server name=ovm01
Status: Success
Time: 2012-07-19 16:04:06.091

OVM> create vmDiskMapping name=recoverMap1 slot=1 storageDevice=backup.img on vm name=recoverytest
Command: create vmDiskMapping name=recoverMap1 slot=1 storageDevice=backup.img on vm name=recoverytest
Status: Success
Time: 2012-07-19 16:05:53.694
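
To repeat this restore for other machines, the three commands can be generated from a handful of parameters. The sketch below only prints them and executes nothing; restore_commands and the "<vm>_map1" mapping name are hypothetical, and slot=1 and domainType=XEN_PVM are simply copied from my test above.

```shell
# Hypothetical helper: print the three CLI commands used above for one
# restore, with the names filled in. It only prints; nothing is executed.
# Arguments: repository, server, image URL, new VM name, memory in MB.
restore_commands() {
    local repo=$1 server=$2 url=$3 vm=$4 memory=$5
    local disk=${url##*/}   # file name part of the URL, e.g. backup.img

    echo "importVirtualDisk repository name=$repo server=$server url='$url'"
    echo "create VM name=$vm repository=$repo domainType=XEN_PVM memory=$memory on Server name=$server"
    echo "create vmDiskMapping name=${vm}_map1 slot=1 storageDevice=$disk on vm name=$vm"
}
```

Printing instead of executing makes it easy to review the commands before pasting them into the OVM> prompt.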

I cheated a little bit and assigned the virtual network in the GUI, then stopped the original VM and started the recovered machine. Eureka! It came up just like I expected it to. All the little pieces are now in place to build an automated backup process for our virtual machines.

Next Steps:
Put the backup steps in a simple script so that they can run automatically. I would love to be able to use public key authentication with that ssh server. If that does not work, I’ll have to modify the provided “expect” scripts to do what I want.
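
A first sketch of the cloning step in such a script might look like the following. It rests on two unverified assumptions: that the login works without a password prompt, and that this CLI build runs a command handed over as the ssh remote command. OVM_CLI, clone_for_backup and the "<vm>_backup_<date>" naming scheme are all invented for this example.

```shell
# Sketch only. OVM_CLI is a hypothetical variable holding the command that
# reaches the manager CLI. The default mirrors the ssh login shown earlier
# and ASSUMES the CLI executes a command passed as the ssh remote command.
: "${OVM_CLI:=ssh -l admin -p 10000 localhost}"

# Clone a running VM into a stopped copy named <vm>_backup_<YYYYMMDD>.
# Prints the clone name on success, the CLI output on failure.
clone_for_backup() {
    local vm=$1 pool=$2
    local clone="${vm}_backup_$(date +%Y%m%d)"
    local out

    # $OVM_CLI is left unquoted on purpose so it splits into command + args.
    out=$($OVM_CLI "clone Vm name=$vm destType=Vm destName=$clone serverPool=$pool") || return 1

    # The CLI prints "Status: Success" on success, as in the transcripts.
    case $out in
        *"Status: Success"*) echo "$clone" ;;
        *) echo "$out" >&2; return 1 ;;
    esac
}
```

If the CLI turns out to need an interactive session, OVM_CLI could point at a wrapper around the expect scripts instead, and the function itself would not have to change.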

I also don’t want to just trust the backup to work, especially since the clone is taken while the VM is running. In theory, the filesystem inside the VM should survive this crash-consistent state, but I want to make really sure it does. Plus, there are a ton of other things that can go wrong. So in addition to automating the backup process, I’d like to automate the recovery as well. The idea is to import the backup back into OVM, change the virtual network to a sandbox and boot the VM. We can then run a series of basic tests against it to check whether all needed services inside the VM come back up the way they should. When this works, we have verified that our backup really works, and we also know how long it takes to restore it.
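
The "series of basic tests" could start out as nothing more than a list of commands that must all succeed against the restored VM. The sketch below shows that idea; run_checks is a hypothetical name, and which checks make sense depends entirely on the services inside the VM.

```shell
# Hypothetical helper: run each argument as a shell command, report
# per-check results, and return non-zero if any check failed.
run_checks() {
    local failed=0 check
    for check in "$@"; do
        if sh -c "$check" >/dev/null 2>&1; then
            echo "ok:   $check"
        else
            echo "FAIL: $check"
            failed=1
        fi
    done
    return "$failed"
}
```

In the sandbox network this might be called with something like run_checks "ping -c 1 <vm address>" "curl -fsS http://<vm address>/", where the address and the checked services are placeholders. Wrapping the whole restore-and-check run in the shell's time keyword would also give the restore duration.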

8 thoughts on “taking hot backups with Oracle VM”

  1. Great write-up. Our company has 3 OVM Servers and let me tell you, it’s a pain to back them up. I’m very surprised at the Oracle VM team touting this as such a robust product when in fact they do not offer a documented backup solution. Currently we run expect scripts that SSH into our OVM Manager, shut down all of our JD Enterprise One machines, and then simply cp the images and config files someplace else.
    I think the major difference in your approach here is that you create a clone of the machine while it’s running, correct?

  2. Yes, since snapshots are not offered by OVM, we use the clone feature to create a clone of a running VM and then copy the contents off of that disk.
    Another approach would be to use the reflink command on the shell to create an OCFS2 snapshot of a disk.

  3. We have a VM 3.1.1 installation. One of our VM servers, which has a local repository and the VMs defined on it, went down. I cannot find any utility to bring the server online; the message is “Server is locked”. The operator found the server unresponsive and turned it off using the power button. Your guide looks appropriate for this case, however I still need to bring back services.
    Thanks for posting.

  7. Awesome post. I still struggle to believe OVM3 doesn’t provide a better way of handling backups, for example snapshots at the file system layer. I suppose they ideally want everybody to go and buy a ZFS Storage Appliance and use SAN-based snapshots, which can then be streamed to media. For the rest of us, your approach is certainly much appreciated.
