During my most recent installation of an Oracle VM 3.2 server pool, I found myself once again weighing all the different options for managing storage with OVM. I wrote them down together with their pros and cons (to help me decide which one to implement), and I am now sharing these notes.
First of all, you will need to understand the different kinds of storage that OVM uses for its data. We will start with the simpler ones and work our way up.
VM templates, assemblies and ISOs need to be stored somewhere. Since these files are not strictly needed for VMs to run, their availability may not be of the utmost importance. However, if you are planning to thin-clone from templates, the templates need to be on the same storage as the virtual disks. Using physical disks for these files (especially ISOs) does not make much sense.
The shared storage for the server pool is needed for cluster heartbeats and eviction decisions. It does not require much performance, but if this storage becomes unavailable the cluster gets into trouble, so availability trumps speed and throughput here. I have set this up on physical disks (both iSCSI and FC) in the past, but sometimes ran into trouble when that storage was briefly unavailable during SCSI bus rescans. What happened was that access to the SCSI device was blocked during a LIP (loop initialization), and after a few seconds nodes started to reboot because they were unable to access and/or ping the shared storage. The root cause most likely lies somewhere in the SAN, SCSI stack, HBA or driver, and may or may not apply to your environment. I now prefer NFS for the cluster storage.
The VM config storage still needs to be on a repository even if you are using physical disks for everything else. I have not tested what happens to running machines when this storage goes down (I imagine they would simply keep running). Again, performance is not an issue; these are just a bunch of small XML files containing the machine configurations.
The actual VM disks (either physical or virtual) are the real big question when setting up an OVM system. The options are NFS vs. a repository vs. physical disks (LUNs on a SAN), and, orthogonally, Ethernet (or iSCSI) vs. Fibre Channel. Not all options are supported for all file types, which makes things even more complicated.
This overview should make the options clear:
| | ISOs, assemblies | pool cluster storage | vm config | vm disks |
|---|---|---|---|---|
| NFS | yes (repository) | yes | yes (repository) | yes (repository) |
| OCFS2 repository | yes | no | yes | yes |
| physical disks | no | yes | no | yes |
So far we have looked at what our options are and which of them are supported. Now let's see which features each storage option provides. Performance considerations will be covered in another part of this blog series.
NFS-based repositories are by far the easiest to set up. All you need is a filer somewhere and you are good to go. Backup is also really easy because you just need to mount the share and copy the data off of it. But there is no support for thin cloning, so every time you create a VM from a template or clone a VM, all the data is copied in full on the storage.
| pros | cons |
|---|---|
| easiest setup | no thin cloning (and no cloning of running VMs) |
| easy backup: can read files directly, "only" need to shut down machines for consistent vdisks | sneak peek at part 2: poor performance compared with the other options |
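To illustrate how simple the backup path is: mount the share and copy the files. The filer name and paths below are made up, and since the actual mount needs root and a real filer, this sketch copies from a local scratch directory standing in for the mounted share.

```shell
# On a real setup you would mount the repository share first, e.g.:
#   mount -o ro,vers=3 filer:/export/ovmrepo /mnt/repo    # hypothetical filer/export
# Here a local scratch directory stands in for the mounted share.
repo="$(mktemp -d)"; backup="$(mktemp -d)"
mkdir -p "$repo/VirtualDisks"
printf 'vm disk contents\n' > "$repo/VirtualDisks/disk1.img"

# Copy the virtual disks off the share, preserving attributes:
cp -a "$repo/VirtualDisks" "$backup/"

# Verify the copy is identical to the source:
cmp -s "$repo/VirtualDisks/disk1.img" "$backup/VirtualDisks/disk1.img" \
    && echo "backup matches source"
```

For consistent vdisk images, the VMs would of course have to be shut down (or at least quiesced) before the copy, as noted above.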
OCFS repositories work by creating a cluster file system on a shared LUN (either iSCSI or Fibre Channel) across all nodes. Obviously you need that LUN first; the rest is taken care of by OVM Manager when you create a new repository. The cluster file system allows all nodes in the server pool to access the same files at the same time, and OCFS2 also provides features like reflink-based thin cloning of virtual disks. This comes at the small price of some added overhead, and there is the risk that a problem with the cluster file system affects all files and VMs depending on it at once. I will leave the discussion of iSCSI vs. FC out of this post since that will most likely depend on your existing infrastructure.
| pros | cons |
|---|---|
| thin cloning (based on reflinks) | backup requires setting up an NFS export of the file system |
| fairly easy setup (LUNs only need to be set up once) | resizing (adding storage) is awkward |
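The reflink mechanism behind OCFS2 thin cloning can be tried with plain coreutils. On an OCFS2 (or other reflink-capable) file system the clone shares blocks with the source and is created almost instantly; `--reflink=auto` falls back to an ordinary full copy on file systems without reflink support. The file names here are made up for the demo.

```shell
cd "$(mktemp -d)"

# Stand-in for a virtual disk image:
truncate -s 10M vdisk.img

# On OCFS2 this creates a copy-on-write clone in near-constant time;
# with --reflink=auto it degrades gracefully to a full copy elsewhere.
cp --reflink=auto vdisk.img clone.img

# The clone is byte-identical to the source until either side is written:
cmp -s vdisk.img clone.img && echo "clone matches source"
```

This is essentially what OVM does under the hood when you thin-clone a VM or template inside an OCFS2 repository: space is only consumed for blocks that later diverge.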
The final option is to use physical disks. The term may be a bit misleading since these disks are just as physical or virtual as the others. In a server pool, they also cannot be disks local to one of the OVM servers, since those could only be accessed from that one server, which would prevent VMs from failing over or being migrated to another server in the pool. So a physical disk in a server pool is a LUN (either iSCSI or Fibre Channel) on a SAN that is mapped directly to a virtual machine without a layer of OCFS in between. Creating these physical disks requires new LUNs to be created and presented through the SAN, which can involve a number of manual steps. Fortunately, OVM supports Storage Connect plugins which automate these tasks from the OVM Manager GUI. Plugins exist for arrays from a number of vendors including EMC, NetApp, Hitachi, Fujitsu and, of course, Oracle's own ZFS Storage Appliance. I have used physical disks with a generic array before, but that was a big pain, and I highly recommend using the plugin if one exists for your storage. Not only does it make device management easier, it also lets you use features of your storage array to support thin cloning.
| pros | cons |
|---|---|
| ZFS-based thin cloning (on the ZFS Storage Appliance; other arrays may offer similar features) | backup either with array tools or by cloning to NFS or a repository (currently not scriptable through the CLI) |
| no OCFS or vdisk overhead (more on this in part 2) | (re)discovery of LUNs is not always smooth and may depend on the setup |
| nice, easy integrated management from OVM Manager | |
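For reference, the kind of LUN rescan that caused me grief earlier looks roughly like this. Host numbers and tool availability vary per system, so the commands are guarded to be harmless no-ops where the hardware or tools are absent; on a real OVM server they need root.

```shell
# Force a SCSI bus rescan on each FC HBA (this is what triggers the LIP
# that briefly blocked access to the cluster storage in my case):
for scan in /sys/class/scsi_host/host*/scan; do
    [ -w "$scan" ] && echo "- - -" > "$scan" || true
done

# Rescan all logged-in iSCSI sessions for new LUNs:
command -v iscsiadm >/dev/null && iscsiadm -m session --rescan || true

# Reload multipath maps so newly discovered LUNs appear as multipath devices:
command -v multipath >/dev/null && multipath -r || true
```

With a Storage Connect plugin, OVM Manager takes care of presentation and rediscovery for you, which is exactly why I recommend it.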
In conclusion, I decided against NFS for my vdisk storage for two reasons: performance was much better with all the other approaches, and the ability to create snapshots of running VMs is the foundation of our backup and recovery strategy. Without consistent snapshots (or clones) of running machines, the backup options are limited to regular file-based backups from within the VM guests, or require you to halt the VMs in order to back up the whole machine.
Deciding between repositories and physical disks is a bit more challenging. On one hand, OCFS adds some overhead and one more piece of the setup that can break. On the other hand, adding and rediscovering LUNs can also cause trouble, especially without a Storage Connect plugin. Part 2 of this blog will benchmark and compare these options so the decision can be based on performance as well as manageability.