Trouble with clusterware fencing and Solaris x64 boot archives

When Oracle Clusterware detects an error (such as lost connectivity on the network interfaces or to the shared disks), it performs a hard and immediate reboot of the server. This is the documented and expected behavior, and there is nothing wrong with that. The annoying part is that Solaris will refuse to come back up if any file that makes up the boot archive has changed since the archive was last synced. The archive is normally synced during an orderly shutdown, for example when you issue ‘reboot’, ‘init 6’ or ‘init 0’. It is not synced when clusterware resets the server or when the server goes down because of a crash or a power outage.

The fix is to boot into failsafe mode, which detects the out-of-date boot archive and asks you to fix it. It doesn't take long, but it still requires manual intervention, and that is not something I want to deal with in the middle of the night or during a major RAC database problem. So I was looking for a way to prevent this from happening at all.
You can update the boot archive manually by running ‘bootadm update-archive’ as root, and it is probably a good idea to run this automatically from cron every day or so.
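A minimal sketch of such a cron job follows. It assumes that a nightly run at 03:00 is frequent enough and that bootadm lives in /sbin, as the Solaris 10 man page lists it. On other releases, check the path before relying on it. The schedule is only an example. Add a line like this to root's crontab, for instance by running ‘crontab -e’ as root:

```
# minute hour day-of-month month day-of-week command
# Example schedule (assumption): resync the boot archive every night at 03:00
0 3 * * * /sbin/bootadm update-archive
```

bootadm only rewrites the archive when it is out of date, so a daily run should be cheap. Cron mails any output of the command to root, which also gives you a record of when the archive actually had to be updated.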
