Today, I attended an informal Oracle breakfast event which included a presentation by Joerg Moellenkamp about best practices for running Oracle databases on ZFS filesystems. There is a whitepaper that describes most of the issues that you should consider. In this post, I’d simply like to share my notes on the presentation: the things that I found important or that were new to me.
Joerg made a case for letting ZFS mirror the data itself: when ZFS holds a redundant copy, it gets another chance to repair a block that fails its checksum. I had never thought about it that way and preferred to let the SAN take care of mirroring, but I will consider this in the future.
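The argument can be illustrated with a minimal Python sketch. It is a toy model, not how ZFS is implemented: the function names are made up for this example, and SHA-256 is used only because it is handy in the standard library. The point is that a repair is only possible when the layer that verifies the checksum can also see a second copy.

```python
import hashlib

def checksum(data: bytes) -> str:
    # Illustration only; the checksum algorithm is an assumption of this sketch.
    return hashlib.sha256(data).hexdigest()

def read_with_self_heal(copies, expected):
    """Return good data from a list of mirror copies, repairing bad ones in place."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        # With a single visible copy (mirroring hidden in the SAN) we end up here.
        raise IOError("all copies fail the checksum; nothing to repair from")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # rewrite the damaged side from the good one
    return good

block = b"oracle datafile block"
expected = checksum(block)
mirror = [b"oracle datafile blocc", block]  # first side silently corrupted
data = read_with_self_heal(mirror, expected)
```

After the call, `data` holds the intact block and the corrupted side of `mirror` has been rewritten. Passing a list with only the corrupted copy raises the error instead, which is the situation when the filesystem sees just one LUN.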
I was aware that the fuller a zpool gets, the more effort it takes the system to find free blocks, and that this leads to slower performance. What I did not know is that ZFS actually switches to a different algorithm for finding free blocks at a certain fill level and that this level is configurable with metaslab_df_free_pct. Older releases switch at about 80% full and then try to find the “best fit” for new data, which slows things down even more. Read more about it here.
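A minimal Python sketch of the switch between the two strategies follows. It is a simplification under stated assumptions: `pick_free_segment` and its `switch_pct` parameter are hypothetical names that only stand in for the real tunable, and the real allocator is considerably more involved than one threshold over a flat list of free segments.

```python
def pick_free_segment(free_segments, size, pool_free_pct, switch_pct=20):
    """Toy allocator: return the start offset chosen for a request of `size`.

    free_segments: list of (start, length) tuples, sorted by start.
    pool_free_pct: percentage of space that is still free.
    switch_pct:    hypothetical threshold (20% free = "about 80% full").
    """
    candidates = [seg for seg in free_segments if seg[1] >= size]
    if not candidates:
        return None
    if pool_free_pct >= switch_pct:
        # Plenty of space: first fit, take the first segment that is big enough.
        return candidates[0][0]
    # Nearly full: best fit, take the smallest segment that still fits.
    return min(candidates, key=lambda seg: seg[1])[0]

segments = [(0, 64), (100, 8), (200, 16)]
first_fit = pick_free_segment(segments, 8, pool_free_pct=50)
best_fit = pick_free_segment(segments, 8, pool_free_pct=10)
```

With half the space free, the request lands in the first segment at offset 0. With only 10% free, the sketch inspects every candidate to find the tightest fit at offset 100, which hints at why the second mode costs more per allocation.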
One issue that I found out about just a few days ago is that you cannot set the primarycache and secondarycache parameters independently. The way that L2ARC caching (using read-optimized SSDs as cache devices in a hybrid storage pool) works is by only writing to this second-level cache when data is cleaned out from the primary cache. So if you disable the caching of data or metadata for your primarycache (memory), then this data will also never make its way to your SSDs. This post is really helpful for understanding the internals behind it (and then it becomes very obvious).
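The dependency can be shown with a small toy model in Python. This is a sketch under assumptions, not the real ARC: `ToyHybridCache` is a made-up class, the primary cache is a plain LRU, blocks are copied to the second level at the moment of eviction, and secondarycache is assumed to be left at `all`.

```python
from collections import OrderedDict

class ToyHybridCache:
    def __init__(self, arc_size, primarycache="all"):
        self.arc = OrderedDict()   # stands in for the in-memory primary cache
        self.l2arc = {}            # stands in for the SSD cache device
        self.arc_size = arc_size
        self.primarycache = primarycache  # "all", "metadata" or "none"

    def _arc_eligible(self, kind):
        if self.primarycache == "all":
            return True
        if self.primarycache == "metadata":
            return kind == "metadata"
        return False

    def read(self, block, kind):
        if not self._arc_eligible(kind):
            return  # never enters the primary cache, so it can never reach the SSD
        self.arc[block] = kind
        self.arc.move_to_end(block)
        while len(self.arc) > self.arc_size:
            # The second level is fed only by what leaves the first level.
            evicted, evicted_kind = self.arc.popitem(last=False)
            self.l2arc[evicted] = evicted_kind

cache = ToyHybridCache(arc_size=2, primarycache="metadata")
for i in range(5):
    cache.read(f"data-{i}", "data")
    cache.read(f"meta-{i}", "metadata")
l2_kinds = set(cache.l2arc.values())
```

With `primarycache="metadata"`, `l2_kinds` contains only `"metadata"`: the data blocks were read just as often, but nothing ever pushed them towards the second level.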
The theory of “IOPS inflation” was also briefly discussed: due to ZFS’s copy-on-write behaviour, blocks that are updated get written to a different location on disk. This may degrade the performance of sequential reads that would benefit from the blocks being in the ‘proper’ order, such as backups or full scans. While this has not been an issue for our databases (and Joerg also mentioned that he knows of only one case), I’d like to take some time and construct a demo for some further studies sometime.
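The effect can be approximated with a short simulation in Python. It is a deliberately crude model with assumptions of its own: the file starts out perfectly contiguous, every update relocates one block to the next free location at the end of the disk, and the number of contiguous runs serves as a rough proxy for the I/O requests a sequential read would need.

```python
import random

def count_extents(locations):
    """Number of contiguous on-disk runs needed to read the file in logical order."""
    extents = 1
    for prev, cur in zip(locations, locations[1:]):
        if cur != prev + 1:
            extents += 1
    return extents

def simulate_cow_updates(num_blocks, num_updates, seed=0):
    rng = random.Random(seed)
    # Logical block i initially lives at physical location i.
    locations = list(range(num_blocks))
    next_free = num_blocks  # toy model: new writes go to the end of the disk
    for _ in range(num_updates):
        block = rng.randrange(num_blocks)
        locations[block] = next_free  # copy-on-write: never overwrite in place
        next_free += 1
    return locations

before = count_extents(simulate_cow_updates(1000, 0))
after = count_extents(simulate_cow_updates(1000, 200))
print(before, after)
```

Before any updates the file is a single run. After 200 random block updates the same logical file is split into many runs, so a “sequential” scan has to issue far more separate reads than it did originally.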
Update 2013/02/27: Bart Sjerps wrote an excellent blog article that shows the fragmentation on the physical disks that occurs when updating random blocks with Oracle on ZFS. He uses SLOB and introduces a searchable ASCII string to look at the raw files. There is no conclusion (yet) about how big the impact on performance for full scans or backups is, but it does become very clear that fragmentation occurs easily and that it leads to more IOPS on the disks to read a number of “sequential” blocks.