AIX HACMP Cluster tips

Quorum Issues

Quorum presents some tricky issues within an HACMP cluster. The cluster administrator has two choices:
  • Enable quorum checking in shared volume groups. With quorum enabled, the entire volume group goes offline if 50% or more of its VGDAs (volume group descriptor areas) are lost. A typical shared volume group has each physical volume's data mirrored on a second physical volume, so it contains an even number of physical volumes. Losing half of them therefore costs the volume group its quorum, and the surviving physical volumes go offline as well.

    Obviously, this isn't acceptable behaviour in a high-availability environment.

  • Disable quorum checking in shared volume groups. This avoids the problem described above since the volume group remains online as long as at least one physical volume is still available. Unfortunately, a volume group with quorum disabled can't be varied online if any of the physical volumes are unavailable.

    Obviously, this is also unacceptable behaviour in a high-availability environment.
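
The quorum rule in the first option is plain arithmetic, and a small sketch makes the mirrored case concrete. The function name has_quorum and its arguments are made up for illustration. It only encodes the rule stated above (the volume group goes offline once 50% or more of the VGDAs are lost) and does not query LVM in any way.

```shell
# has_quorum TOTAL AVAILABLE
# Hypothetical helper: succeeds (exit status 0) only when more than half
# of the VGDAs are still available, which is the rule described above.
has_quorum() {
    total=$1
    available=$2
    [ "$(expr "$available" \* 2)" -gt "$total" ]
}

# Four VGDAs, two lost: exactly half remain, so quorum is lost.
if has_quorum 4 2 ; then echo "quorum held" ; else echo "quorum lost" ; fi

# Three VGDAs, one lost: more than half remain, so quorum is held.
if has_quorum 3 2 ; then echo "quorum held" ; else echo "quorum lost" ; fi
```

The first case is the two-way mirror that loses one side. The second shows why an odd number of copies behaves differently.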

Fortunately, there is a solution to this apparent quandary. The varyonvg command will vary on a quorum-disabled volume group with missing physical volumes if the MISSINGPV_VARYON environment variable is set to TRUE when the varyonvg command is invoked.

So . . . configure your shared volume groups with quorum disabled and put the line

MISSINGPV_VARYON=TRUE
in each node's /etc/environment file (changes to this file don't take effect until after the next reboot).
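
As a rough sketch of the step above, the following adds the line to an environment file only if it is not already there, so it can be run more than once without creating duplicates. The function name add_missingpv_varyon is invented for this example, and sharedvg in the final comment is a placeholder volume group name. Trying it on a copy of /etc/environment first is probably wise.

```shell
# add_missingpv_varyon FILE
# Hypothetical helper: append MISSINGPV_VARYON=TRUE to FILE unless an
# identical line is already present.
add_missingpv_varyon() {
    envfile=$1
    if grep '^MISSINGPV_VARYON=TRUE$' "$envfile" > /dev/null 2>&1 ; then
        return 0
    fi
    echo "MISSINGPV_VARYON=TRUE" >> "$envfile"
}

# On each node (as root):
#   add_missingpv_varyon /etc/environment
#
# Since the variable only has to be set when varyonvg is invoked, it can
# also be supplied for a single command ("sharedvg" is a placeholder):
#   MISSINGPV_VARYON=TRUE varyonvg sharedvg
```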

Note that the MISSINGPV_VARYON environment variable is a relatively new feature of AIX. Verify that it works as described above on your version of AIX before relying on it.

Also note that setting MISSINGPV_VARYON in /etc/environment doesn't work in HACMP/ES clusters because HACMP/ES ignores the contents of /etc/environment. In an HACMP/ES cluster, find the commented-out line in /usr/es/sbin/cluster/utilities/clvaryonvg and uncomment it (it's near line 400).
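
To locate the line without paging through the script, something like the following may help. It assumes that the commented-out line mentions MISSINGPV_VARYON by name, which the text above implies but does not state. show_missingpv_lines is a made-up helper name.

```shell
# show_missingpv_lines FILE
# Hypothetical helper: print each line of FILE that mentions
# MISSINGPV_VARYON, prefixed with its line number.
show_missingpv_lines() {
    grep -n "MISSINGPV_VARYON" "$1"
}

# In an HACMP/ES cluster:
#   show_missingpv_lines /usr/es/sbin/cluster/utilities/clvaryonvg
```

Keeping a backup copy of the script before editing it, and checking the edit again after applying HACMP maintenance, seems prudent.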

One last piece of advice and warning:

If half of your disks go offline, so that you lose exactly half of your disk mirrors, your volume group will remain online if you've disabled quorum. If you then experience a failover and the takeover node is able to vary on the half of your disks which were previously offline, everything will (probably) appear to work. The problem is that you are now using OLD DATA: those disks missed every write made while they were offline!

This scenario is disconcertingly easy to create (cut the paths to half of the disks for the first node and the paths to the other half of the disks from the second node and then trigger a failover). Trust me! You do NOT want this to happen!

There are no perfect solutions to this problem, but here's a bit of advice which will probably help a lot:
  • Always, ALWAYS, ALWAYS respond to broken disk mirrors promptly since that avoids the old data being very old if the above scenario occurs. This is probably an excellent place to use AIX's Error Notification mechanism (smit screens to do this are provided with HACMP 'classic' and HACMP/ES).
  • One way to avoid the above scenario is to triple-mirror all logical volumes and turn quorum back on for the volume group. If done correctly, this will allow the disk subsystem to survive the loss of one disk copy and should prevent the disks from staying online or being brought online in any combination which can result in a failure along the lines of the above scenario.

    Obviously, triple-mirroring logical volumes is a rather expensive proposition. Then again, the consequences of running with old data are likely to be even more expensive.
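
Error Notification reacts to errors as they are logged. As a complementary and much cruder check (a different technique from the one recommended above, not a replacement for it), a cron job can poll for stale physical partitions. The sketch below assumes that the lsvg summary for a volume group contains a "STALE PPs:" field followed by a number; verify the layout on your version of AIX. stale_pps is a made-up helper name and sharedvg is a placeholder volume group name.

```shell
# stale_pps VGNAME
# Hypothetical helper: print the number that follows "STALE PPs:" in the
# output of lsvg. Assumes the field looks like "STALE PPs:   <number>".
stale_pps() {
    lsvg "$1" | awk '{
        for (i = 1; i + 2 <= NF; i++)
            if ($i == "STALE" && $(i + 1) == "PPs:") print $(i + 2)
    }'
}

# Example cron-style check ("sharedvg" is a placeholder name):
#   n=$(stale_pps sharedvg)
#   if [ "${n:-0}" -gt 0 ] ; then
#       echo "sharedvg has $n stale PPs" | mail -s "broken mirror" root
#   fi
```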

Finding mounted file systems associated with a volume group

The following shell script will list the currently mounted file systems associated with a specified volume group (the -l option specifies the default behaviour of listing the logical volumes and the -f option specifies that file system mount points are to be listed instead).

This script could be used in a crontab job which periodically verifies that nobody has mounted a shared file system without adding it to the appropriate resource group.

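#!/bin/ksh
#
# list the mounted file systems associated with a specified volume group
#
# This script was developed and tested on AIX 4.3.2.
#
# Author: Daniel Boulet (danny@matildasystems.com)

# showlv=1 lists logical volume names (-l, the default);
# showlv=0 lists file system mount points (-f)
showlv=1

if [ "$1" = "-l" ] ; then
    showlv=1
    shift
elif [ "$1" = "-f" ] ; then
    showlv=0
    shift
fi

if [ $# -ne 1 ] ; then
    echo "Usage: $0 [ -l | -f ] vgname" >&2
    exit 1
fi

# getlvodm -d is used here only to check that $1 names a volume group
if getlvodm -d "$1" > /dev/null 2> /dev/null ; then
    # read the logical volume name from the start of each line
    getlvodm -L "$1" | while read lvname junk ; do
        # the trailing space stops lv01 from also matching lv010
        if df | grep -q "^/dev/$lvname " ; then
            if [ $showlv -eq 1 ] ; then
                echo "$lvname"
            else
                # the mount point is the seventh column of AIX df output
                df | grep "^/dev/$lvname " | awk '{print $7}'
            fi
        fi
    done
else
    echo "$0: $1 isn't the name of a volume group" >&2
    exit 1
fi

exit 0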
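
One way the crontab idea could look, as a sketch only: compare the mount points reported by the script with a list of the file systems that belong to the resource group, and print anything that is not on the list. The script name vgmounts, the list file path and the function name report_unexpected_mounts are all placeholders.

```shell
# report_unexpected_mounts EXPECTED_FILE
# Hypothetical helper: read mount points on standard input and print
# those that do not appear as a whole line in EXPECTED_FILE.
report_unexpected_mounts() {
    expected=$1
    while read -r mountpoint ; do
        grep -x -F "$mountpoint" "$expected" > /dev/null 2>&1 ||
            echo "$mountpoint"
    done
}

# Example, assuming the script above is installed as "vgmounts" and the
# expected mount points are listed one per line in the named file:
#   vgmounts -f sharedvg | report_unexpected_mounts /usr/local/etc/sharedvg.mounts
```

Run from cron, any output would be mailed to the crontab's owner, which is probably all the alerting this check needs.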

Centralized syslog messages

This is a really simple tip which lets you see cluster-related syslog messages in proper time-ordering.
  1. Configure syslogd on each cluster node to send all syslog messages to a non-cluster system. Assuming that the non-cluster system is called critter, add the following line to /etc/syslog.conf on each cluster node:
    *.debug @critter
    and refresh syslogd on each cluster node using the command:
    refresh -s syslogd
  2. Configure syslogd on the non-cluster system to send log messages to a file. For example, adding the following line to /etc/syslog.conf and refreshing syslogd on the non-cluster system sends all syslog messages to the file /var/log/debug:
    *.debug /var/log/debug
Once you've done this, syslog messages from the cluster (including everything you see in /var/adm/cluster.log) will appear in the specified file on the non-cluster node in proper time-ordering.
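
A quick way to check the setup end to end, sketched under two assumptions: that syslogd on the log host only writes to files which already exist (believed to be the AIX behaviour, and creating the file first is harmless either way), and that the logger command is available on the cluster nodes. ensure_log_file is a made-up helper name.

```shell
# ensure_log_file PATH
# Hypothetical helper: create an empty log file if it does not exist yet,
# leaving an existing file untouched.
ensure_log_file() {
    [ -f "$1" ] || : > "$1"
}

# On the non-cluster system (critter in the example above):
#   ensure_log_file /var/log/debug
#   refresh -s syslogd
#
# Then, on each cluster node, send a test message and look for it in
# /var/log/debug on critter:
#   logger -p user.debug "syslog forwarding test from $(hostname)"
```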

Note that syslogd forwards messages to other systems over UDP, which doesn't guarantee delivery, so some messages might be lost in transit (the frequency of lost messages is typically quite low and depends on the quality of your network infrastructure, your network load and the relative system loads of the machines in question).

Last update: 2009-06-19 18:01
Author: Luke Francis
Revision: 1.0
