<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Mozilla/4.78 [en] (X11; U; Linux 2.4.7-10 i686) [Netscape]">
<title>Cluster Installation and Administration</title>
</head>
<body bgcolor="#FFFFFF">
<h1><font size=+4>Installation and Administration *Draft*</font></h1>
<h2><font size=+4>Red Hat Cluster Manager</font></h2>
<p>Copyright © 2000 Mission Critical Linux, Inc.
<br>Copyright © 2002 Red Hat, Inc.
<p>This document describes how to set up and manage the Red Hat Cluster
Manager, which provides application availability and data integrity.
<p><i>Editorial comments:</i>
<p><i>Searching for "Editorial comment" will highlight
many areas which need some work.</i>
<ul>
<li><i>Need to update TOC &amp; heading numbers (as sections have been added
&amp; deleted).</i></li>
<li><i>New power management scheme not done yet.</i></li>
<li><i>New NFS services section added, but awaiting editorial review.</i></li>
<li><i>Needs Piranha integration work.</i></li>
<li><i>Needs updates to reflect service manager changes as well as service</i></li>
<li><i>Many sections ended up with lots of unnecessary extra blank lines (gratuitously
thrown in by Netscape Composer?).</i></li>
<li><i>Could benefit from a spell check.</i></li>
</ul>
<h2>Table of Contents</h2>
<table BORDER=0 CELLSPACING=0 CELLPADDING=3 WIDTH="75%">
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#changes">New and Changed Features</a></font></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#introduction">1 Introduction</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#overview">1.1 Cluster Overview</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#features">1.2 Cluster Features</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#steps">1.3 How To Use This Manual</a></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#hardware">2 Hardware Installation and Operating System Configuration</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#gather">2.1 Choosing a Hardware Configuration</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-table">2.1.1 Cluster Hardware Table</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#install-min">2.1.2 Example of a Minimum Cluster Configuration</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#install-max">2.1.3 Example of a No-Single-Point-Of-Failure Configuration</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#basic-install">2.2 Steps for Setting Up the Cluster Systems</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-system">2.2.1 Installing the Basic System Hardware</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-terminal">2.2.2 Setting Up a Console Switch</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-network">2.2.3 Setting Up a Network Switch or Hub</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#install-linux">2.3 Steps for Installing and Configuring the Linux Distribution</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#linux-dist">2.3.1 Linux Distribution and Kernel Requirements</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#valinux">2.3.1.1 VA Linux Distribution Installation Requirements</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#redhat">2.3.1.2 Red Hat Distribution Installation Requirements</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hosts">2.3.2 Editing the /etc/hosts File</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#alt-kernel">2.3.3 Decreasing the Kernel Boot Timeout Limit</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#dmesg">2.3.4 Displaying Console Startup Messages</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#devices-kernel">2.3.5 Displaying Devices Configured in the Kernel</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#install-cluster">2.4 Steps for Setting Up and Connecting the Cluster Hardware</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-heart">2.4.1 Configuring Heartbeat Channels</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-power">2.4.2 Configuring Power Switches</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-ups">2.4.3 Configuring UPS Systems</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-storage">2.4.4 Configuring Shared Disk Storage</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#multiinit">2.4.4.1 Setting Up a Multi-Initiator SCSI Bus</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#singleinit">2.4.4.2 Setting Up a Single-Initiator SCSI Bus</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#single-fibre">2.4.4.3 Setting Up a Single-Initiator Fibre Channel Interconnect</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#state-partitions">2.4.4.4 Configuring the Quorum Partitions</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#partition">2.4.4.5 Partitioning Disks</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#rawdevices">2.4.4.6 Creating Raw Devices</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="3">&nbsp;</td><td WIDTH="96%"><a href="#filesystems">2.4.4.7 Creating File Systems</a></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#software">3 Cluster Software Installation and Initialization</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#software-steps">3.1 Steps for Installing and Initializing the Cluster Software</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#software-rawdevices">3.1.1 Editing the rawdevices File</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#software-config">3.1.2 Example of the cluconfig Utility</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#software-check">3.2 Checking the Cluster Configuration</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#cludiskutil">3.2.1 Testing the Quorum Partitions</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#pswitch">3.2.2 Testing the Power Switches</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#release">3.2.3 Displaying the Cluster Software Version</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#software-logging">3.3 Configuring syslog Event Logging</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#software-ui">3.4 Using the cluadmin Utility</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#software-gui">3.5 Configuring and Using the Graphical User Interface</a></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#service">4 Service Configuration and Administration</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-configure">4.1 Configuring a Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-gather">4.1.1 Gathering Service Information</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-scripts">4.1.2 Creating Service Scripts</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-storage">4.1.3 Configuring Service Disk Storage</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-app">4.1.4 Verifying Application Software and Service Scripts</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-dbase">4.1.5 Setting Up an Oracle Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-mysql">4.1.6 Setting Up a MySQL Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-db2">4.1.7 Setting Up a DB2 Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#service-apache">4.1.8 Setting Up an Apache Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-status">4.2 Displaying a Service Configuration</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-disable">4.3 Disabling a Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-enable">4.4 Enabling a Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-modify">4.5 Modifying a Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-relocate">4.6 Relocating a Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-delete">4.7 Deleting a Service</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#service-error">4.8 Handling Services in an Error State</a></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#admin">5 Cluster Administration</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-status">5.1 Displaying Cluster and Service Status</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-start">5.2 Starting and Stopping the Cluster Software</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-config">5.3 Modifying the Cluster Configuration</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-backup">5.4 Backing Up and Restoring the Cluster Database</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-logging">5.5 Modifying Cluster Event Logging</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-reinstall">5.6 Updating the Cluster Software</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-reload">5.7 Reloading the Cluster Database</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-name">5.8 Changing the Cluster Name</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-init">5.9 Reinitializing the Cluster</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-remove">5.10 Removing a Cluster Member</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#diagnose">5.11 Diagnosing and Correcting Problems in a Cluster</a></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#supplement">A Supplementary Hardware Information</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cyclades">A.1 Setting Up a Cyclades Terminal Server</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-router">A.1.1 Setting Up the Router</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#hardware-parameters">A.1.2 Setting Up the Network and Terminal Port Parameters</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#console-linux">A.1.3 Configuring Linux to Send Console Messages to the Console Port</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#rps-10">A.2 Setting Up an RPS-10 Power Switch</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#scsi-reqs">A.3 SCSI Bus Configuration Requirements</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#scsi-term">A.3.1 SCSI Bus Termination</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#scsi-length">A.3.2 SCSI Bus Length</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#scsi-ids">A.3.3 SCSI Identification Numbers</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#hba">A.4 Host Bus Adapter Features and Configuration Requirements</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#adaptec">A.5 Adaptec Host Bus Adapter Requirement</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#vscom">A.6 VScom Multiport Serial Card Requirement</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#tulip">A.7 Tulip Network Driver Requirement</a></td></tr>
<tr><td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6"><font size=+1><a href="#supp-software">B Supplementary Software Information</a></font></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-com">B.1 Cluster Communication Mechanisms</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#cluster-daemons">B.2 Cluster Daemons</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#admin-scenarios">B.3 Failover and Recovery Scenarios</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-failure">B.3.1 System Hang</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-panic">B.3.2 System Panic</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-storage">B.3.3 Inaccessible Quorum Partitions</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-network">B.3.4 Total Network Connection Failure</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-power">B.3.5 Remote Power Switch Connection Failure</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-quorum">B.3.6 Quorum Daemon Failure</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-heartbeat">B.3.7 Heartbeat Daemon Failure</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-powerd">B.3.8 Power Daemon Failure</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td WIDTH="2%">&nbsp;</td><td COLSPAN="4"><a href="#admin-serviceman">B.3.9 Service Manager Daemon Failure</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#software-manual">B.4 Cluster Database</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#app-tuning">B.5 Tuning Oracle Services</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#raw-program">B.6 Raw I/O Programming Example</a></td></tr>
<tr><td WIDTH="2%">&nbsp;</td><td COLSPAN="5"><a href="#lvs">B.7 Using a Cluster in an LVS Environment</a></td></tr>
</table>
<hr noshade width="80%" align="center">
<p>Copyright © 2000 Mission Critical Linux, Inc.
<br>Copyright © 2002 Red Hat, Inc.
<p>Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1 or any
later version published by the Free Software Foundation. A copy of the
license is included on the <a href="http://www.gnu.org/copyleft/fdl.html#SEC1" target="_blank">GNU
Free Documentation License Web site</a>.
<p>Linux is a trademark of Linus Torvalds.
<p>All product names mentioned herein are the trademarks of their respective
owners.
<p><a NAME="changes"></a>
<h2>New and Changed Features</h2>
The Red Hat Cluster Manager software was originally based on the open source
Kimberlite (<a href="http://oss.missioncriticallinux.com/kimberlite">oss.missioncriticallinux.com/kimberlite</a>)
cluster project, which was developed by Mission Critical Linux, Inc.
<p>Subsequent to its inception based on Kimberlite, developers at Red Hat
have made a large number of enhancements and modifications. The following
is a non-comprehensive list highlighting some of these enhancements:
<ul>
<li>Packaging and integration into the Red Hat installation paradigm in order
to simplify the end user's experience.</li>
<li>Addition of support for high availability NFS services.</li>
<li>Addition of support for high availability Samba services.</li>
<li>Addition of service monitoring, which will automatically restart a failed
service.</li>
<li>Rewrite of the service manager to facilitate additional cluster-wide operations.</li>
<li>A set of miscellaneous bug fixes.</li>
</ul>
<p>The Red Hat Cluster Manager software incorporates STONITH compliant power
switch modules from the Linux-HA project (<a href="http://www.linux-ha.org/stonith">www.linux-ha.org/stonith</a>).
<br><a NAME="introduction"></a>
<h1>1 Introduction</h1>
The Red Hat Cluster Manager technology provides data integrity and the
ability to maintain application availability in the event of a failure.
Using redundant hardware, shared disk storage, power management, and robust
cluster communication and application failover mechanisms, a cluster can
meet the needs of the enterprise market.
<p>Especially suitable for database applications, network file servers,
and World Wide Web (Web) servers with dynamic content, a cluster can also
be used in conjunction with other Linux availability efforts, such as Linux
Virtual Server (LVS), to deploy a highly available e-commerce site that
has complete data integrity and application availability, in addition to
load balancing capabilities. See <a href="#lvs">Using a Cluster in an LVS
Environment</a> for more information.
<p><i>Editorial note: need to better integrate with the Piranha load balancing
documentation (rather than referring to LVS).</i>
<p>The following sections describe:
<ul>
<li><a href="#overview">Cluster overview</a></li>
<li><a href="#features">Cluster features</a></li>
<li><a href="#steps">How to use this manual</a></li>
</ul>
<h2><a NAME="overview"></a></h2>
<h2>1.1 Cluster Overview</h2>
To set up a cluster, you connect the <b>cluster systems</b> (often referred
to as <b>member systems</b>) to the cluster hardware, and configure the
systems into the cluster environment. The foundation of a cluster is an
advanced host membership algorithm. This algorithm ensures that the cluster
maintains complete data integrity at all times by using the following methods
of inter-node communication:
<ul>
<li>Quorum disk partitions on shared disk storage to hold system status</li>
<li>Ethernet and serial connections between the cluster systems for heartbeat
channels</li>
</ul>
To make an application and data highly available in a cluster, you configure
a <b>cluster service</b>, which is a discrete group of service properties
and resources, such as an application and shared disk storage. A service
can be assigned an IP address to provide transparent client access to the
service. For example, you can set up a cluster service that provides clients
with access to highly-available database application data.
<p>Both cluster systems can run any service and access the service data
on shared disk storage. However, each service can run on only one cluster
system at a time, in order to maintain data integrity. You can set up an
<b>active-active configuration</b> in which both cluster systems run different
services, or a <b>hot-standby configuration</b> in which a primary cluster system
runs all the services, and a backup cluster system takes over only if the
primary system fails.
<p>The following figure shows a cluster in an active-active configuration.
<p><img SRC="cluster.gif">
<p>If a hardware or software failure occurs, the cluster will automatically
restart the failed system's services on the functional cluster system.
This <b>service failover</b> capability ensures that no data is lost, and
there is little disruption to users. When the failed system recovers, the
cluster can re-balance the services across the two systems.
<p>In addition, a cluster administrator can cleanly stop the services running
on a cluster system, and then restart them on the other system. This <b>service
relocation</b> capability enables you to maintain application and data
availability when a cluster system requires maintenance.
<h2><a NAME="features"></a></h2>
<h2>1.2 Cluster Features</h2>
A cluster includes the following features:
<ul>
<li><b>No-single-point-of-failure hardware configuration</b></li>
<p>You can set up a cluster that includes a dual-controller RAID array,
multiple network and serial communication channels, and redundant uninterruptible
power supply (UPS) systems to ensure that no single failure results in
application down time or loss of data.
<p>Alternately, you can set up a low-cost cluster that provides less availability
than a no-single-point-of-failure cluster. For example, you can set up
a cluster with JBOD ("just a bunch of disks") storage and only a single
heartbeat channel.
<p>Note that you cannot use host-based, adapter-based, or software RAID
in a cluster, because these products usually do not properly coordinate
multisystem access to shared storage.
<li><b>Service configuration framework</b></li>
<p>A cluster enables you to easily configure individual services to make
data and applications highly available. To create a service, you specify
the resources used in the service and properties for the service, including
the service name, application start and stop script, disk partitions, mount
points, and the cluster system on which you prefer to run the service.
After you add a service, the cluster enters the information into the cluster
database on shared storage, where it can be accessed by both cluster systems.
<p>The cluster provides an easy-to-use framework for database applications.
For example, a <b>database service</b> serves highly-available data to
a database application. The application running on a cluster system provides
network access to database client systems, such as Web servers. If the
service fails over to another cluster system, the application can still
access the shared database data. A network-accessible database service
is usually assigned an IP address, which is failed over along with the
service to maintain transparent access for clients.
<p>The cluster service framework can be easily extended to other applications,
such as mail and print applications.
<li><b>Data integrity assurance</b></li>
<p>To ensure data integrity, only one cluster system can run a service
and access service data at one time. Using power switches in the cluster
configuration enables each cluster system to power-cycle the other cluster
system before restarting its services during the failover process. This
prevents the two systems from simultaneously accessing the same data and
corrupting it. Although not required, it is recommended that you use power
switches to guarantee data integrity under all failure conditions.
<li><b>Cluster administration user interface</b></li>
<p>A user interface simplifies cluster administration and enables you to
easily create, start, and stop services, and monitor the cluster.
<li><b>Multiple cluster communication methods</b></li>
<p>Each cluster system monitors the health of the remote power switch,
if any, and issues heartbeat pings over network and serial channels to
monitor the health of the other cluster system. In addition, each cluster
system periodically writes a timestamp and cluster state information to two
<b>quorum partitions</b> located on shared disk storage. System state
information includes whether the system is an active cluster member. Service
state information includes whether the service is running and which cluster
system is running the service. Each cluster system checks to ensure that the
other system's status is up to date.
<p>To ensure correct cluster operation, if a system is unable to write
to both quorum partitions at startup time, it will not be allowed to join
the cluster. In addition, if a cluster system is not updating its timestamp,
and if heartbeats to the system fail, the cluster system will be removed
from the cluster.
<p>The following figure shows how systems communicate in a cluster configuration.
<i>Note that the terminal server used to access system consoles via serial ports is
not a required cluster component.</i>
<h4>Cluster Communication Mechanisms</h4>
<img SRC="comm.gif">
<li><b>Service failover capability</b></li>
<p>If a hardware or software failure occurs, the cluster will take the
appropriate action to maintain application availability and data integrity.
For example, if a cluster system completely fails, the other cluster system
will restart its services. Services already running on this system are
not disrupted.
<p>When the failed system reboots and is able to write to the quorum partitions,
it can rejoin the cluster and run services. Depending on how you configured
the services, the cluster can re-balance the services across the two cluster
systems.
<li><b>Manual service relocation capability</b></li>
<p>In addition to automatic service failover, a cluster enables administrators
to cleanly stop services on one cluster system and restart them on the
other system. This enables administrators to perform planned maintenance
on a cluster system, while providing application and data availability.
<li><b>Event logging facility</b></li>
<p>To ensure that problems are detected and resolved before they affect
service availability, the cluster daemons log messages by using the conventional
Linux syslog subsystem. You can customize the severity level of the messages
that are logged.</ul>
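<p>As an illustration of the underlying mechanism only (the cluster-specific
settings are covered in <a href="#software-logging">Configuring syslog Event
Logging</a>), a syslog configuration entry such as the following, assuming the
daemons log through the standard <tt>daemon</tt> facility, would collect
warning-level and more severe messages in a separate file:
<pre>
# /etc/syslog.conf fragment (illustrative; the actual facility and level
# depend on how cluster event logging is configured)
daemon.warning                                  /var/log/cluster-messages

# Make syslogd re-read its configuration after editing the file:
#   kill -HUP `cat /var/run/syslogd.pid`
</pre>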
<a NAME="steps"></a>
<h2>1.3 How To Use This Manual</h2>
<p><i>Editorial comment: perhaps a section with a title like this should appear
earlier in the manual?</i>
<p>This manual contains information about setting up the cluster hardware,
and installing the Linux distribution and the cluster software. These tasks
are described in <a href="#hardware">Hardware Installation and Operating
System Configuration</a> and <a href="#software">Cluster Software Installation
and Initialization</a>.
<p>For information about setting up and managing cluster services, see
<a href="#service">Service Configuration and Administration</a>. For information
about managing a cluster, see <a href="#admin">Cluster Administration</a>.
<p><a href="#supplement">Supplementary Hardware Information</a> contains
detailed configuration information for specific hardware devices, in addition
to information about shared storage configurations. You should always check
it for information that is applicable to your hardware.
<p><a href="#supp-software">Supplementary Software Information</a> contains
background information on the cluster software and other related information.
<hr noshade width="80%">
<h3 CLASS="ChapterTitleTOC"><a NAME="hardware"></a></h3>
<h1 CLASS="ChapterTitleTOC">2 Hardware Installation and Operating System Configuration</h1>
To set up the hardware configuration and install the Linux distribution,
follow these steps:
<ul>
<li><a href="#gather">Choose a cluster hardware configuration that meets the
needs of your applications and users.</a></li>
<li><a href="#basic-install">Set up and connect the cluster systems and the
optional console switch and network switch or hub.</a></li>
<li><a href="#install-linux">Install and configure the Linux distribution on
the cluster systems.</a></li>
<li><a href="#install-cluster">Set up the remaining cluster hardware components
and connect them to the cluster systems.</a></li>
</ul>
<div CLASS="ChapterTitleTOC">After setting up the hardware configuration
and installing the Linux distribution, you can install the cluster software.</div>
<h2 CLASS="ChapterTitleTOC"><a NAME="gather"></a></h2>
<h2 CLASS="ChapterTitleTOC">2.1 Choosing a Hardware Configuration</h2>
The Red Hat Cluster Manager allows you to use commodity hardware to set
up a cluster configuration that will meet the performance, availability,
and data integrity needs of your applications and users. Cluster hardware
ranges from low-cost minimum configurations that include only the components
required for cluster operation, to high-end configurations that include
redundant heartbeat channels, hardware RAID, and power switches.
<p>Regardless of your configuration, you should always use high-quality
hardware in a cluster, because hardware malfunction is the primary cause
of system down time.
<p>Although all cluster configurations provide availability, only some
configurations protect against every single point of failure. Similarly, all
cluster configurations provide data integrity, but only some configurations
protect data under every failure condition. Therefore, you must fully understand
the needs of your computing environment and also the availability and data
integrity features of different hardware configurations, in order to choose
the cluster hardware that will meet your requirements.
<p>When choosing a cluster hardware configuration, consider the following:
<ul>
<li>Performance requirements of your applications and users</li>
<p>Choose a hardware configuration that will provide adequate memory, CPU,
and I/O resources. You should also be sure that the configuration can handle
any future increases in workload.
<li>Cost restrictions</li>
<p>The hardware configuration you choose must meet your budget requirements.
For example, systems with multiple I/O ports usually cost more than low-end
systems with fewer expansion capabilities.
<li>Availability requirements</li>
<br>If you have a computing environment that requires the highest availability,
such as a production environment, you can set up a cluster hardware configuration
that protects against all single points of failure, including disk, storage
interconnect, heartbeat channel, and power failures. Environments that
can tolerate an interruption in availability, such as development environments,
may not require as much protection. See <a href="#hardware-heart">Configuring
Heartbeat Channels</a>, <a href="#hardware-ups">Configuring UPS Systems</a>,
and <a href="#hardware-storage">Configuring Shared Disk Storage</a> for
more information about using redundant hardware for high availability.
<li>Data integrity under all failure conditions</li>
<p>Using power switches in a cluster configuration guarantees that service
data is protected under every failure condition. These devices enable a
cluster system to power-cycle the other cluster system before restarting
its services during failover. Power switches protect against data corruption
if an unresponsive ("hung") system becomes responsive ("unhung") after
its services have failed over, and then issues I/O to a disk that is also
receiving I/O from the other cluster system.
<p>In addition, if a quorum daemon fails on a cluster system, the system
is no longer able to monitor the quorum partitions. If you are not using
power switches in the cluster, this error condition may result in services
being run on more than one cluster system, which can cause data corruption.
See <a href="#hardware-power">Configuring Power Switches</a> for more information
about the benefits of using power switches in a cluster. It is recommended
that production environments use power switches in the cluster configuration.</ul>
A <b>minimum hardware configuration</b> includes only the hardware components
that are required for cluster operation, as follows:
<ul>
<li><b>Two servers</b> to run cluster services</li>
<li><b>Ethernet connection</b> for a heartbeat channel and client network access</li>
<li><b>Shared disk storage</b> for the cluster quorum partitions and service
data</li>
</ul>
See <a href="#install-min">Example of a Minimum Cluster Configuration</a>
for an example of this type of hardware configuration.
<p>The minimum hardware configuration is the most cost-effective cluster
configuration; however, it includes multiple points of failure. For example,
if a shared disk fails, any cluster service that uses the disk will be
unavailable. In addition, the minimum configuration does not include power
switches, which protect against data corruption under all failure conditions.
Therefore, only development environments should use a minimum cluster configuration.
<p>To improve availability and protect against component failure, and to
guarantee data integrity under all failure conditions, you can expand the
minimum configuration. The following table shows how you can improve availability
and guarantee data integrity:
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="94%">
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%"><b>To protect against:</b></td>
<td WIDTH="70%"><b>You can use:</b></td>
</tr>
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%">Disk failure</td>
<td WIDTH="70%">Hardware RAID to replicate data across multiple disks.</td>
</tr>
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%">Storage interconnect failure</td>
<td WIDTH="70%">A RAID array with multiple SCSI buses or Fibre Channel interconnects.</td>
</tr>
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%">RAID controller failure</td>
<td WIDTH="70%">Dual RAID controllers to provide redundant access to disk
data.</td>
</tr>
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%">Heartbeat channel failure</td>
<td WIDTH="70%">A point-to-point Ethernet or serial connection between the
cluster systems.</td>
</tr>
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%">Power source failure</td>
<td WIDTH="70%">Redundant uninterruptible power supply (UPS) systems.</td>
</tr>
<tr ALIGN=LEFT VALIGN=TOP>
<td WIDTH="30%">Data corruption under all failure conditions</td>
<td WIDTH="70%">Power switches.</td>
</tr>
</table>
<p>A <b>no-single-point-of-failure hardware configuration</b> that guarantees
data integrity under all failure conditions can include the following components:
<ul>
<li><b>Two servers</b> to run cluster services</li>
<li><b>Ethernet connection</b> between each system for a heartbeat channel
and client network access</li>
<li><b>Dual-controller RAID array</b> to replicate quorum partitions and service
data</li>
<li><b>Two power switches</b> to enable each cluster system to power-cycle
the other system during the failover process</li>
<li><b>Point-to-point Ethernet connection</b> between the cluster systems for
a redundant Ethernet heartbeat channel</li>
<li><b>Point-to-point serial connection</b> between the cluster systems for
a serial heartbeat channel</li>
<li><b>Two UPS systems</b> for a highly-available source of power</li>
</ul>
See <a href="#install-max">Example of a No-Single-Point-Of-Failure Configuration</a>
for an example of this type of hardware configuration.
<p>Cluster hardware configurations can also include other optional hardware
components that are common in a computing environment. For example, you
can include a <b>network switch</b> or <b>network hub</b>, which enables
you to connect the cluster systems to a network, and a <b>console switch</b>,
which facilitates the management of multiple systems and eliminates the
need for separate monitors, mice, and keyboards for each cluster system.
<p>One type of console switch is a <b>terminal server</b>, which enables
you to connect to serial consoles and manage many systems from one remote
location. As a low-cost alternative, you can use a <b>KVM</b> (keyboard,
video, and mouse) switch, which enables multiple systems to share one keyboard,
monitor, and mouse. A KVM is suitable for configurations in which you access
a graphical user interface (GUI) to perform system management tasks.
<p>When choosing a cluster system, be sure that it provides the PCI slots,
network slots, and serial ports that the hardware configuration requires.
For example, a no-single-point-of-failure configuration requires multiple
serial and Ethernet ports. Ideally, choose cluster systems that have at
least two serial ports. See <a href="#hardware-system">Installing the Basic
System Hardware</a> for more information.
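<p>As a quick, general Linux check (not specific to the cluster software), you can
list the serial ports the kernel detected on a prospective cluster system; the
device names below assume the standard <tt>/dev/ttyS*</tt> naming:
<pre>
# Show serial ports reported at boot time
dmesg | grep ttyS

# Query the UART type and settings of the first two ports
setserial -g /dev/ttyS0 /dev/ttyS1
</pre>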
<p><a NAME="hardware-table"></a>
<h3>2.1.1 Cluster Hardware Table</h3>
Use the following tables to identify the hardware components required for
your cluster configuration. In some cases, the tables list specific products
that have been tested in a cluster, although a cluster is expected to work
with other products.
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr>
<td ALIGN=CENTER VALIGN=CENTER COLSPAN="4"><b><font size=+1>Cluster System Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Cluster system</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">Red Hat Cluster Manager supports IA-32 hardware
platforms. Each cluster system must provide enough PCI slots, network slots,
and serial ports for the cluster hardware configuration. Because disk devices
must have the same name on each cluster system, it is recommended that
the systems have symmetric I/O subsystems. In addition, it is recommended
that each system have a minimum of a 450 MHz CPU and 256 MB of memory.
See <a href="#hardware-system">Installing the Basic System Hardware</a>
for more information.</td>
<td VALIGN=TOP WIDTH="12%">Yes</td>
</tr>
</table>
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr>
<td ALIGN=CENTER VALIGN=CENTER COLSPAN="4"><b><font size=+1>Power Switch Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Serial power switches</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">Power switches enable each cluster
system to power-cycle the other cluster system. See <a href="#hardware-power">Configuring
Power Switches</a> for information about using power switches in a cluster.
Note: clusters are configured with either serial or network attached power
switches (not both).
<p>The following serial attached power switch has been fully tested:
<ul>
<li>RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available
from <a href="http://www.wti.com/rps-10.htm" target="_blank">www.wti.com/rps-10.htm</a>.
Refer to <a href="#rps-10">RPS-10 Configuration Information</a>.</li>
</ul>
Latent support is provided for the following serial attached power switch.
This switch has not yet been fully tested:
<ul>
<li>APC Serial On/Off Switch (part AP9211), <a href="http://www.apc.com">www.apc.com</a></li>
</ul>
</td>
<td VALIGN=TOP WIDTH="12%">Strongly recommended for data integrity
under all failure conditions</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Null modem cable</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">Null modem cables connect a serial
port on a cluster system to a serial power switch. This serial connection
enables each cluster system to power-cycle the other system. Some power
switches may require different cables.</td>
<td VALIGN=TOP WIDTH="12%">Only if using power switches</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Mounting bracket</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">Some power switches support rack
mount configurations and require a separate mounting bracket (e.g. the RPS-10).</td>
<td VALIGN=TOP WIDTH="12%">Only for rack mounting power switches</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network power switch</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">Network attached power switches enable each cluster
member to power-cycle all others. Refer to <a href="#power-setup">Configuring
Power Switches</a> for information about using network attached power switches,
as well as caveats associated with each.
<p>The following network attached power switch has been fully tested:
<ul>
<li>WTI NPS-115 or NPS-230, available from <a href="http://www.wti.com">www.wti.com</a>.
Note: the NPS power switch can properly accommodate systems with dual
redundant power supplies. Refer to <a href="#power-wti-nps">WTI
NPS Configuration Information</a>.</li>
</ul>
Latent support is provided for the following network attached power switches.
These switches have not yet been fully tested:
<ul>
<li>APC Master Switch (AP9211 or AP9212), <a href="http://www.apc.com/products/masterswitch/index.cfm">www.apc.com</a></li>
<li>Baytech RPC-3 and RPC-5, <a href="http://www.baytech.net">www.baytech.net</a></li>
</ul>
</td>
<td VALIGN=TOP WIDTH="12%">Strongly recommended for data integrity under all failure conditions</td>
</tr>
</table>
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr>
<td ALIGN=CENTER VALIGN=CENTER COLSPAN="4"><b><font size=+1>Shared Disk Storage Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">External disk storage enclosure</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">For production environments, it
is recommended that you use single-initiator SCSI buses or single-initiator
Fibre Channel interconnects to connect the cluster systems to a single
or dual-controller RAID array. To use single-initiator buses or interconnects,
a RAID controller must have multiple host ports and provide simultaneous
access to all the logical units on the host ports. If a logical unit can
fail over from one controller to the other, the process must be transparent
to the operating system.
<p>The following are recommended SCSI RAID arrays that provide simultaneous
access to all the logical units on the host ports (this is not a comprehensive
list; rather, it is limited to those RAID boxes which have been tested):
<ul>
<li>Winchester Systems FlashDisk RAID Disk Array, which is available from <a href="http://www.winsys.com" target="_blank">www.winsys.com</a>.</li>
<li>Dot Hill SANnet Storage Systems, which are available from <a href="http://www.dothill.com">www.dothill.com</a>.</li>
<li>CMD CRD-7040 &amp; CRA-7040, CRD-7220, CRD-7240 &amp; CRA-7240, and CRD-7400
&amp; CRA-7400 controller based RAID arrays, available from
<a href="http://www.synetexinc.com">www.synetexinc.com</a>.</li>
</ul>
Note: in order to ensure symmetry of device IDs &amp; LUNs, many RAID arrays
with dual redundant controllers are required to be configured in an active/passive
mode.
<p>For development environments, you can use a multi-initiator SCSI bus
or multi-initiator Fibre Channel interconnect to connect the cluster systems
to a JBOD storage enclosure, a single-port RAID array, or a RAID controller
that does not provide access to all the shared logical units from the ports
on the storage enclosure.
<p>You cannot use host-based, adapter-based, or software RAID products
in a cluster, because these products usually do not properly coordinate
multi-system access to shared storage.
<p>See <a href="#hardware-storage">Configuring Shared Disk Storage</a>
for more information.</td>
<td VALIGN=TOP WIDTH="12%">Yes</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Host bus adapter</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">To connect to shared disk storage,
you must install either a parallel SCSI or a Fibre Channel host bus adapter
in a PCI slot in each cluster system.
<p>For parallel SCSI, use a low voltage differential (LVD) host bus adapter.
Adapters have either HD68 or VHDCI connectors. If you want hot plugging
support, you must be able to disable the host bus adapter's onboard termination.
Recommended parallel SCSI host bus adapters include the following:
<ul>
<li>Adaptec 2940U2W, 29160, 29160LP, 39160, and 3950U2</li>
<li>Adaptec AIC-7896 on the Intel L440GX+ motherboard</li>
<li>Qlogic QLA1080 and QLA12160</li>
<li>Tekram Ultra2 DC-390U2W</li>
<li>LSI Logic SYM22915</li>
</ul>
A recommended Fibre Channel host bus adapter is the Qlogic QLA2200.
<p>For multi-initiator configurations, the Tekram Ultra2 DC-390U2W and
LSI Logic SYM22915 are recommended. Some other adapters have issues
precluding external termination for hot plugging.
<p>See <a href="#hba">Host Bus Adapter Features and Configuration Requirements</a>
and <a href="#adaptec">Adaptec Host Bus Adapter Requirement</a> for device
features and configuration information.</td>
<td VALIGN=TOP WIDTH="12%">Yes</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">SCSI cable</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">SCSI cables with 68 pins connect
each host bus adapter to a storage enclosure port. Cables have either HD68
or VHDCI connectors.</td>
<td VALIGN=TOP WIDTH="12%">Only for parallel SCSI configurations</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">External SCSI LVD active terminator</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">For hot plugging support, connect
an external LVD active terminator to a host bus adapter that has disabled
internal termination. This enables you to disconnect the terminator from
the adapter without affecting bus operation. Terminators have either HD68
or VHDCI connectors.
<p>Recommended external pass-through terminators with HD68 connectors can
be obtained from Technical Cable Concepts, Inc., 350 Lear Avenue, Costa
Mesa, California, 92626 (714-835-1081), or <a href="http://www.techcable.com" target="_blank">www.techcable.com</a>.
The part description and number is TERM SSM/F LVD/SE Ext Beige, 396868-LVD/SE.</td>
<td VALIGN=TOP WIDTH="12%">Only for parallel SCSI configurations
that require external termination for hot plugging</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">SCSI terminator</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">For a RAID storage enclosure that
uses "out" ports (such as the FlashDisk RAID Disk Array) and is connected to
single-initiator SCSI buses, connect terminators to the "out" ports in
order to terminate the buses.</td>
<td VALIGN=TOP WIDTH="12%">Only for parallel SCSI configurations
and only if necessary for termination</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Fibre Channel hub or switch</td>
<td VALIGN=TOP WIDTH="11%">One or two</td>
<td VALIGN=TOP WIDTH="61%">A Fibre Channel hub or switch is
required, unless you have a storage enclosure with two ports, and the host
bus adapters in the cluster systems can be connected directly to different
ports.</td>
<td VALIGN=TOP WIDTH="12%">Only for some Fibre Channel configurations</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Fibre Channel cable</td>
<td VALIGN=TOP WIDTH="11%">Two to six</td>
<td VALIGN=TOP WIDTH="61%">A Fibre Channel cable connects a
host bus adapter to a storage enclosure port, a Fibre Channel hub, or a
Fibre Channel switch. If a hub or switch is used, additional cables are
needed to connect the hub or switch to the storage adapter ports.</td>
<td VALIGN=TOP WIDTH="12%">Only for Fibre Channel configurations</td>
</tr>
</table>
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr ALIGN=CENTER VALIGN=CENTER>
<td COLSPAN="4"><b><font size=+1>Network Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network interface</td>
<td VALIGN=TOP WIDTH="11%">One for each network connection</td>
<td VALIGN=TOP WIDTH="61%">Each network connection requires a network interface
installed in a cluster system.</td>
<td VALIGN=TOP WIDTH="12%">Yes</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network switch or hub</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">A network switch or hub enables you to connect
multiple systems to a network.</td>
<td VALIGN=TOP WIDTH="12%">No</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network cable</td>
<td VALIGN=TOP WIDTH="11%">One for each network interface</td>
<td VALIGN=TOP WIDTH="61%">A conventional network cable, such
as a cable with an RJ45 connector, connects each network interface to a
network switch or a network hub.</td>
<td VALIGN=TOP WIDTH="12%">Yes</td>
</tr>
</table>
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr ALIGN=CENTER VALIGN=CENTER>
<td COLSPAN="4"><b><font size=+1>Point-To-Point Ethernet Heartbeat
Channel Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network interface</td>
<td VALIGN=TOP WIDTH="11%">Two for each channel</td>
<td VALIGN=TOP WIDTH="61%">Each Ethernet heartbeat channel
requires a network interface installed in both cluster systems.</td>
<td VALIGN=TOP WIDTH="12%">No</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network crossover cable</td>
<td VALIGN=TOP WIDTH="11%">One for each channel</td>
<td VALIGN=TOP WIDTH="61%">A network crossover cable connects
a network interface on one cluster system to a network interface on the
other cluster system, creating an Ethernet heartbeat channel.</td>
<td VALIGN=TOP WIDTH="12%">Only for a redundant Ethernet heartbeat
channel</td>
</tr>
</table>
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr>
<td ALIGN=CENTER VALIGN=CENTER COLSPAN="4"><b><font size=+1>Point-To-Point
Serial Heartbeat Channel Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Serial card</td>
<td VALIGN=TOP WIDTH="11%">Two for each serial channel</td>
<td VALIGN=TOP WIDTH="61%">Each serial heartbeat channel requires a serial
port on both cluster systems. To expand your serial port capacity, you
can use multi-port serial PCI cards. Recommended multi-port cards include
the following:
<ul>
<li>Vision Systems VScom 200H PCI card, which provides you with two serial
ports and is available from <a href="http://www.vscom.de" target="_blank">www.vscom.de</a></li>
<li>Cyclades-4YoPCI+ card, which provides you with four serial ports and is
available from <a href="http://www.cyclades.com" target="_blank">www.cyclades.com</a></li>
</ul>
Note: since configuration of serial heartbeat channels is optional, it
is not required that you invest in additional hardware specifically for
this purpose. Should future support be provided for more than two cluster
members, serial heartbeat channel support may be deprecated.</td>
<td VALIGN=TOP WIDTH="12%">No</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Null modem cable</td>
<td VALIGN=TOP WIDTH="11%">One for each channel</td>
<td VALIGN=TOP WIDTH="61%">A null modem cable connects a serial port on
one cluster system to a corresponding serial port on the other cluster
system, creating a serial heartbeat channel.</td>
<td VALIGN=TOP WIDTH="12%">Only for a serial heartbeat channel</td>
</tr>
</table>
<table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%">
<tr>
<td ALIGN=CENTER VALIGN=CENTER COLSPAN="4"><b><font size=+1>Console
Switch Hardware</font></b></td>
</tr>
<tr>
<td WIDTH="16%"><b>Hardware</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="11%"><b>Quantity</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="61%"><b>Description</b></td>
<td ALIGN=LEFT VALIGN=TOP WIDTH="12%"><b>Required</b></td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Terminal server</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">A terminal server enables you to manage many
systems from one remote location. Recommended terminal servers include
the following:
<ul>
<li>Cyclades terminal server, which is available from <a href="http://www.cyclades.com" target="_blank">www.cyclades.com</a></li>
<li>NetReach Model CMS-16, which is available from Western Telematic, Inc.
at <a href="http://www.wti.com/cms.htm" target="_blank">www.wti.com/cms.htm</a></li>
</ul>
</td>
<td VALIGN=TOP WIDTH="12%">No</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">RJ45 to DB9 crossover cable</td>
<td VALIGN=TOP WIDTH="11%">Two</td>
<td VALIGN=TOP WIDTH="61%">RJ45 to DB9 crossover cables connect
a serial port on each cluster system to a Cyclades terminal server. Other
types of terminal servers may require different cables.</td>
<td VALIGN=TOP WIDTH="12%">Only for a terminal server</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">Network cable</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">A network cable connects a terminal server to
a network switch or hub.</td>
<td VALIGN=TOP WIDTH="12%">Only for a terminal server</td>
</tr>
<tr>
<td VALIGN=TOP WIDTH="16%">KVM</td>
<td VALIGN=TOP WIDTH="11%">One</td>
<td VALIGN=TOP WIDTH="61%">A KVM enables multiple systems to share one
keyboard, monitor, and mouse. A recommended KVM is the Cybex Switchview,
which is available from <a href="http://www.cybex.com" target="_blank">www.cybex.com</a>.
Cables for connecting systems to the switch depend on the type of KVM.</td>
<td VALIGN=TOP WIDTH="12%">No</td>
</tr>
</table>
1964 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%
" >
1966 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="55"><b><font size=+1>UPS
1967 System Hardware</font></b></td>
1971 <td WIDTH="16%
" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1973 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%
" HEIGHT="25"><b>Quantity</b></td>
1975 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%
" HEIGHT="25"><b>Description</b></td>
1977 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%
" HEIGHT="25"><b>Required</b></td>
1981 <td VALIGN=TOP WIDTH="16%
" HEIGHT="150">UPS system</td>
1983 <td VALIGN=TOP WIDTH="11%
" HEIGHT="150">One or two</td>
1985 <td VALIGN=TOP WIDTH="61%
" HEIGHT="150">Uninterruptible power supply (UPS)
1986 systems protect against downtime if a power outage occurs. UPS systems
1987 are highly recommended for cluster operation. Ideally, connect the power
1988 cables for the shared storage enclosure and both power switches to redundant
1989 UPS systems. In addition, a UPS system must be able to provide voltage
1990 for an adequate period of time, and should be connected to its own power circuit.
1992 <p>A recommended UPS system is the APC Smart-UPS 1400 Rackmount, which
1993 is available from <a href="http://www.apcc.com/products/smart-ups_rm/index.cfm
">www.apc.com</a>. </td>
1995 <td VALIGN=TOP WIDTH="12%
" HEIGHT="150">Strongly recommended for availability</td>
2000 <p><a NAME="install-min
"></a>
2002 2.1.2 Example of a Minimum Cluster Configuration</h3>
2003 The hardware components described in the following table can be used to
2004 set up a minimum cluster configuration that uses a multi-initiator SCSI
2005 bus and supports hot plugging. This configuration does not guarantee data
2006 integrity under all failure conditions, because it does not include power
2007 switches. Note that this is a sample configuration; you may be able to
2008 set up a minimum configuration using other hardware.
2010 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2011 <tr BGCOLOR="#FFFFFF
">
2012 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="2" HEIGHT="45"><b><font size=+1>Minimum
2013 Cluster Hardware Configuration Example</font></b></td>
2016 <tr ALIGN=LEFT VALIGN=TOP>
2017 <td WIDTH="22%
" HEIGHT="124"><b>Two servers</b></td>
2019 <td WIDTH="78%
" HEIGHT="124">Each cluster system includes the following
2023 Network interface for client access and an Ethernet heartbeat channel </li>
2026 One Adaptec 2940U2W SCSI adapter (termination disabled) for the shared
2027 storage connection </li>
2033 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2034 <tr ALIGN=LEFT VALIGN=TOP>
2035 <td WIDTH="22%
" HEIGHT="46"><b>Two network cables with RJ45 connectors</b></td>
2037 <td WIDTH="78%
" HEIGHT="46">Network cables connect a network interface
2038 on each cluster system to the network for client access and Ethernet heartbeats. </td>
2042 <table width="95%
" border="1" cellspacing="0" cellpadding="3">
2043 <tr align="left
" valign="top
">
2045 <td width="22%
" height="28"><b>Two RPS-10 power switches</b></td>
2047 <td width="78%
" height="28">
2048 <p>Power switches enable each cluster system to power-cycle the other system before restarting its services.
2050 The power cable for each cluster system is connected to a power switch.
2056 <table width="95%
" border="1" cellspacing="0" cellpadding="3">
2057 <tr align="left
" valign="top
">
2059 <td width="22%
" height="46"><b>Two null modem cables</b></td>
2061 <td width="78%
" height="46">Null modem cables connect
2062 a serial port on each cluster system to the power switch that provides power
2063 to the other cluster system. This connection enables each cluster system
2064 to power-cycle the other system.</td>
2069 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2071 <td ALIGN=LEFT VALIGN=TOP WIDTH="22%
" HEIGHT="26"><b>JBOD storage enclosure</b></td>
2073 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%
" HEIGHT="26">The storage enclosure's
2074 internal termination is disabled. </td>
2078 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2079 <tr ALIGN=LEFT VALIGN=TOP>
2080 <td WIDTH="22%
" HEIGHT="50"><b>Two pass-through LVD active terminators</b></td>
2082 <td WIDTH="78%
" HEIGHT="50">External pass-through LVD active terminators
2083 connected to each host bus adapter provide external SCSI bus termination
2084 for hot plugging support.</td>
2088 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2089 <tr ALIGN=LEFT VALIGN=TOP>
2090 <td WIDTH="22%
" HEIGHT="31"><b>Two HD68 SCSI cables</b></td>
2092 <td WIDTH="78%
" HEIGHT="31">HD68 cables connect each terminator to a port
2093 on the storage enclosure, creating a multi-initiator SCSI bus. </td>
2097 <p>The following figure shows a minimum cluster hardware configuration
2098 that includes the hardware described in the previous table and a multi-initiator
2099 SCSI bus, and also supports hot plugging. A "T
" enclosed by a circle indicates
2100 internal (onboard) or external SCSI bus termination. A slash through the
2101 "T
" indicates that termination has been disabled.
2102 <h4 class="ChapterTitleTOC
">
2103 Minimum Cluster Hardware Configuration With Hot Plugging</h4>
2104 <img SRC="lowcost.gif
" >
2107 <p><a NAME="install-max
"></a>
2109 2.1.3 Example of a No-Single-Point-Of-Failure Configuration</h3>
2110 The components described in the following table can be used to set up a
2111 no-single-point-of-failure cluster configuration that includes two single-initiator
2112 SCSI buses and power switches to guarantee data integrity under all failure
2113 conditions. Note that this is a sample configuration; you may be able to
2114 set up a no-single-point-of-failure configuration using other hardware.
2115 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2116 <tr BGCOLOR="#FFFFFF
">
2117 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="2" HEIGHT="45">
2119 <b>No-Single-Point-Of-Failure Configuration Example</b></h3>
2123 <tr ALIGN=LEFT VALIGN=TOP>
2124 <td WIDTH="22%
" HEIGHT="234"><b>Two servers</b></td>
2126 <td WIDTH="78%
" HEIGHT="234">Each cluster system includes the following
2130 Two network interfaces for:</li>
2134 Point-to-point Ethernet heartbeat channel</li>
2137 Client network access and Ethernet heartbeat connection</li>
2141 Three serial ports for:</li>
2145 Point-to-point serial heartbeat channel</li>
2148 Remote power switch connection</li>
2151 Connection to the terminal server </li>
2155 One Tekram Ultra2 DC-390U2W adapter (termination enabled) for the shared
2156 disk storage connection</li>
2162 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2163 <tr ALIGN=LEFT VALIGN=TOP>
2164 <td WIDTH="22%
"><b><font size=+0>One network switch </font></b></td>
2166 <td WIDTH="78%
"><font size=+0>A network switch enables you to connect multiple
2167 systems to a network. </font></td>
2171 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2172 <tr ALIGN=LEFT VALIGN=TOP>
2173 <td WIDTH="22%
"><b>One Cyclades terminal server </b></td>
2175 <td WIDTH="78%
">A terminal server enables you to manage remote systems
2176 from a central location. (A terminal server is not required for cluster
2181 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2182 <tr ALIGN=LEFT VALIGN=TOP>
2183 <td WIDTH="22%
" HEIGHT="24"><b>Three network cables</b></td>
2185 <td WIDTH="78%
" HEIGHT="24">Network cables connect the terminal server
2186 and a network interface on each cluster system to the network switch. </td>
2190 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2191 <tr ALIGN=LEFT VALIGN=TOP>
2192 <td WIDTH="22%
" HEIGHT="47"><b>Two RJ45 to DB9 crossover cables </b></td>
2194 <td WIDTH="78%
" HEIGHT="47">RJ45 to DB9 crossover cables connect a serial
2195 port on each cluster system to the Cyclades terminal server.</td>
2199 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2200 <tr ALIGN=LEFT VALIGN=TOP>
2201 <td WIDTH="22%
" HEIGHT="53"><b>One network crossover cable </b></td>
2203 <td WIDTH="78%
" HEIGHT="53">A network crossover cable connects a network
2204 interface on one cluster system to a network interface on the other system,
2205 creating a point-to-point Ethernet heartbeat channel. </td>
2209 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2210 <tr ALIGN=LEFT VALIGN=TOP>
2211 <td WIDTH="22%
" HEIGHT="31"><b>Two RPS-10 power switches</b></td>
2213 <td WIDTH="78%
" HEIGHT="31">Power switches enable each cluster system to
2214 power-cycle the other system before restarting its services. The power
2215 cable for each cluster system is connected to its own power switch.</td>
2219 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2220 <tr ALIGN=LEFT VALIGN=TOP>
2221 <td WIDTH="22%
" HEIGHT="49"><b>Three null modem cables</b></td>
2223 <td WIDTH="78%
" HEIGHT="49">Null modem cables connect a serial port on
2224 each cluster system to the power switch that provides power to the other
2225 cluster system. This connection enables each cluster system to power-cycle the other system.
2227 <p>A null modem cable connects a serial port on one cluster system to a
2228 corresponding serial port on the other system, creating a point-to-point
2229 serial heartbeat channel. </td>
2233 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2235 <td ALIGN=LEFT VALIGN=TOP WIDTH="22%
" HEIGHT="27"><b>FlashDisk RAID Disk
2236 Array with dual controllers </b></td>
2238 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%
" HEIGHT="27">Dual RAID controllers
2239 protect against disk and controller failure. The RAID controllers provide
2240 simultaneous access to all the logical units on the host ports.</td>
2244 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2245 <tr ALIGN=LEFT VALIGN=TOP>
2246 <td WIDTH="22%
"><b>Two HD68 SCSI cables</b></td>
2248 <td WIDTH="78%
">HD68 cables connect each host bus adapter to a RAID enclosure
2249 "in
" port, creating two single-initiator SCSI buses.</td>
2253 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2254 <tr ALIGN=LEFT VALIGN=TOP>
2255 <td WIDTH="22%
"><b>Two terminators</b></td>
2257 <td WIDTH="78%
">Terminators connected to each "out
" port on the RAID enclosure
2258 terminate both single-initiator SCSI buses.</td>
2262 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
2263 <tr ALIGN=LEFT VALIGN=TOP>
2264 <td WIDTH="22%
" HEIGHT="47"><b>Redundant UPS Systems</b></td>
2266 <td WIDTH="78%
" HEIGHT="47">UPS systems provide a highly-available source
2267 of power. The power cables for the power switches and the RAID enclosure
2268 are connected to two UPS systems.</td>
2272 <p>The following figure shows an example of a no-single-point-of-failure
2273 hardware configuration that includes the hardware described in the previous
2274 table, two single-initiator SCSI buses, and power switches to guarantee
2275 data integrity under all error conditions.
2276 <h4 class="ChapterTitleTOC
">
2277 No-Single-Point-Of-Failure Configuration Example</h4>
2279 <h3 class="ChapterTitleTOC
">
2280 <img SRC="hardware.gif
" ></h3>
2287 <p><a NAME="basic-install
"></a>
2288 <h2 CLASS="ChapterTitleTOC
">
2289 2.2 Steps for Setting Up the Cluster Systems</h2>
2290 After you identify the cluster hardware components, as described in <a href="#gather
">Choosing
2291 a Hardware Configuration</a>, you must set up the basic cluster system
2292 hardware and connect the systems to the optional console switch and network
2293 switch or hub. Follow these steps:
2296 In both cluster systems, install the required network adapters, serial
2297 cards, and host bus adapters. See <a href="#hardware-system
">Installing
2298 the Basic System Hardware</a> for more information about performing this
2303 Set up the optional console switch and connect it to each cluster system.
2304 See <a href="#hardware-terminal
">Setting Up a Console Switch</a> for more
2305 information about performing this task.</li>
2311 <p>If you are not using a console switch, connect each system to a console terminal.
2315 Set up the optional network switch or hub and use conventional network
2316 cables to connect it to the cluster systems and the terminal server (if
2317 applicable). See <a href="#hardware-network
">Setting Up a Network Switch
2318 or Hub</a> for more information about performing this task.</li>
2324 <p>If you are not using a network switch or hub, use conventional network
2325 cables to connect each system and the terminal server (if applicable) to the network.
2327 After performing the previous tasks, you can install the Linux distribution,
2328 as described in <a href="#install-linux
">Steps for Installing and Configuring
2329 the Linux Distribution</a>.
2333 <a NAME="hardware-system
"></a></h3>
2336 2.2.1 Installing the Basic System Hardware</h3>
2337 Cluster systems must provide the CPU processing power and memory required
2338 by your applications. It is recommended that each system have a minimum
2339 of 450 MHz CPU speed and 256 MB of memory.
2340 <p>In addition, cluster systems must be able to accommodate the SCSI or
2341 FC adapters, network interfaces, and serial ports that your hardware configuration
2342 requires. Systems have a limited number of preinstalled serial and network
2343 ports and PCI expansion slots. The following table will help you determine
2344 how much capacity your cluster systems require:
2346 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%
" >
2347 <tr ALIGN=LEFT VALIGN=TOP>
2348 <td WIDTH="40%
"><b>Cluster Hardware Component</b></td>
2350 <td WIDTH="23%
"><b>Serial Ports</b></td>
2352 <td WIDTH="20%
"><b>Network Slots</b></td>
2354 <td WIDTH="17%
"><b>PCI slots</b></td>
2357 <tr ALIGN=LEFT VALIGN=TOP>
2358 <td WIDTH="40%
">Remote power switch connection (optional, but strongly
2361 <td WIDTH="23%
">One</td>
2363 <td WIDTH="20%
"> </td>
2365 <td WIDTH="17%
"> </td>
2368 <tr ALIGN=LEFT VALIGN=TOP>
2369 <td WIDTH="40%
" HEIGHT="29">SCSI bus to shared disk storage </td>
2371 <td WIDTH="23%
" HEIGHT="29"> </td>
2373 <td WIDTH="20%
" HEIGHT="29"> </td>
2375 <td WIDTH="17%
" HEIGHT="29">One for each bus</td>
2378 <tr ALIGN=LEFT VALIGN=TOP>
2379 <td WIDTH="40%
">Network connection for client access and Ethernet heartbeat</td>
2381 <td WIDTH="23%
"> </td>
2383 <td WIDTH="20%
">One for each network connection</td>
2385 <td WIDTH="17%
"> </td>
2388 <tr ALIGN=LEFT VALIGN=TOP>
2389 <td WIDTH="40%
">Point-to-point Ethernet heartbeat channel (optional)</td>
2391 <td WIDTH="23%
"> </td>
2393 <td WIDTH="20%
">One for each channel</td>
2395 <td WIDTH="17%
"> </td>
2398 <tr ALIGN=LEFT VALIGN=TOP>
2399 <td WIDTH="40%
">Point-to-point serial heartbeat channel (optional)</td>
2401 <td WIDTH="23%
">One for each channel</td>
2403 <td WIDTH="20%
"> </td>
2405 <td WIDTH="17%
"> </td>
2408 <tr ALIGN=LEFT VALIGN=TOP>
2409 <td WIDTH="40%
">Terminal server connection (optional)</td>
2411 <td WIDTH="23%
">One</td>
2413 <td WIDTH="20%
"> </td>
2415 <td WIDTH="17%
"> </td>
2419 <p>Most systems come with at least one serial port. Ideally, choose systems
2420 that have at least two serial ports. If your system has a graphics display
2421 capability, you can use the serial console port for a serial heartbeat
2422 channel or a power switch connection. To expand your serial port capacity,
2423 you can use multi-port serial PCI cards.
2424 <p>In addition, you must be sure that local system disks will not be on
2425 the same SCSI bus as the shared disks. For example, you can use two-channel
2426 SCSI adapters, such as the Adaptec 3950-series cards, and put the internal
2427 devices on one channel and the shared disks on the other channel. You can
2428 also use multiple SCSI cards.
2429 <p>See the system documentation supplied by the vendor for detailed installation
2430 information. See <a href="#supplement
">Supplementary Hardware Information</a>
2431 for hardware-specific information about using host bus adapters in a cluster.
2432 <p>The following figure shows the bulkhead of a sample cluster system and
2433 the external cable connections for a typical cluster configuration.
2434 <h4 class="ChapterTitleTOC
">
2435 Typical Cluster System External Cabling</h4>
2436 <img SRC="backview.gif
" >
2439 <p><a NAME="hardware-terminal
"></a>
2441 2.2.2 Setting Up a Console Switch</h3>
2442 Although a console switch is not required for cluster operation, you can
2443 use one to facilitate cluster system management and eliminate the need
2444 for separate monitors, mice, and keyboards for each cluster system. There
2445 are several types of console switches.
2446 <p>For example, a terminal server enables you to connect to serial consoles
2447 and manage many systems from a remote location. For a low-cost alternative,
2448 you can use a KVM (keyboard, video, and mouse) switch, which enables multiple
2449 systems to share one keyboard, monitor, and mouse. A KVM switch is suitable
2450 for configurations in which you access a graphical user interface (GUI)
2451 to perform system management tasks.
2452 <p>Set up the console switch according to the documentation provided by
2453 the vendor, unless this manual provides cluster-specific installation guidelines
2454 that supersede the vendor instructions.
2455 <p>After you set up the console switch, connect it to each cluster system.
2456 The cables you use depend on the type of console switch. For example, if
2457 you have a Cyclades terminal server, use RJ45 to DB9 crossover cables to
2458 connect a serial port on each cluster system to the terminal server.
2461 <p><a NAME="hardware-network
"></a>
2463 2.2.3 Setting Up a Network Switch or Hub</h3>
2464 Although a network switch or hub is not required for cluster operation,
2465 you may want to use one to facilitate cluster and client system network operations.
2467 <p>Set up a network switch or hub according to the documentation provided by the vendor.
2469 <p>After you set up the network switch or hub, connect it to each cluster
2470 system by using conventional network cables. If you are using a terminal
2471 server, use a network cable to connect it to the network switch or hub.
2473 <h2 CLASS="ChapterTitleTOC
">
2474 <a NAME="install-linux
"></a></h2>
2476 <h2 CLASS="ChapterTitleTOC
">
2477 2.3 Steps for Installing and Configuring the Red Hat Linux Distribution</h2>
2478 After you set up the basic system hardware, install the Red Hat Linux distribution
2479 on both cluster systems and ensure that they recognize the connected devices.
2483 Install the Red Hat Linux distribution on both cluster systems. If you
2484 tailor the kernel, be sure to follow the kernel requirements and guidelines
2485 described in <a href="#linux-dist
">Kernel Requirements</a>.</li>
2489 Reboot the cluster systems.</li>
2493 If you are using a terminal server, configure Linux to send console messages
2494 to the console port.</li>
2498 Edit the <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
2499 file on each cluster system and include the IP addresses used in the cluster.
2500 See <a href="#hosts
">Editing the /etc/hosts File</a> for more information
2501 about performing this task.</li>
2505 Decrease the alternate kernel boot timeout limit to reduce cluster system
2506 boot time. See <a href="#alt-kernel
">Decreasing the Kernel Boot Timeout
2507 Limit</a> for more information about performing this task.</li>
2511 Ensure that no login (or getty) programs are associated with the serial
2512 ports that are being used for the serial heartbeat channel or the remote
2513 power switch connection, if applicable. To perform this task, edit the
2514 <b><font face="Courier New, Courier, mono
">/etc/inittab</font></b>
2515 file and use a number sign (#) to comment out the entries that correspond
2516 to the serial ports used for the serial channel and the remote power switch.
2517 Then, invoke the <b><font face="Courier New, Courier, mono
">init q</font></b>
2522 Verify that both systems detect all the installed hardware:</li>
2527 Use the <b><font face="Courier New, Courier, mono
">dmesg</font></b> command
2528 to display the console startup messages. See <a href="#dmesg
">Displaying
2529 Console Startup Messages</a> for more information about performing this task.</li>
2534 Use the <b><font face="Courier New, Courier, mono
">cat /proc/devices</font></b>
2535 command to display the devices configured in the kernel. See <a href="#devices-kernel
">Displaying
2536 Devices Configured in the Kernel</a> for more information about performing this task.</li>
2542 Verify that the cluster systems can communicate over all the network interfaces
2543 by using the <b><font face="Courier New, Courier, mono
">ping</font></b>
2544 command to send test packets from one system to the other system.</li>
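<p>For example, to disable a getty on the serial port reserved for the serial heartbeat
channel or the remote power switch connection, comment out the corresponding entry in
<b><font face="Courier New, Courier, mono">/etc/inittab</font></b> and then run
<b><font face="Courier New, Courier, mono">init q</font></b>. The entry below is only
an illustration; the port name and getty program depend on your configuration:
<pre><font size=-1># S1:2345:respawn:/sbin/agetty ttyS1 9600 vt100</font></pre>
<p>After both systems are up, a simple connectivity check can be made over each network
interface with the <b><font face="Courier New, Courier, mono">ping</font></b> command.
The host name below is only a placeholder for the name or address assigned to the
interface being tested:
<pre><font size=-1># ping -c 2 ecluster3</font></pre>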
2548 <p><a NAME="linux-dist
"></a>
2550 2.3.1 Kernel Requirements</h3>
2551 If you choose to manually configure your kernel, you must adhere to the following
2553 <b>kernel requirements</b>:
2556 You must enable IP Aliasing support in the kernel by setting the <b><font face="Courier New, Courier, mono
">CONFIG_IP_ALIAS
2558 option to <b><font face="Courier New, Courier, mono
">y</font></b>. When
2559 specifying kernel options, under <b><font face="Courier New, Courier, mono
">Networking
2560 Options</font></b>, select <b><font face="Courier New, Courier, mono
">IP
2561 aliasing support</font></b>.</li>
2565 You must enable support for the <b><font face="Courier New, Courier, mono
">/proc</font></b>
2566 file system by setting the <b><font face="Courier New, Courier, mono
">CONFIG_PROC_FS</font></b>
2567 kernel option to <b><font face="Courier New, Courier, mono
">y</font></b>.
2568 When specifying kernel options, under <b><font face="Courier New, Courier, mono
">Filesystems</font></b>,
2569 select <b><font face="Courier New, Courier, mono
">/proc filesystem support</font></b>.</li>
2573 You must ensure that the SCSI driver is started before the cluster software.
2574 For example, you can edit the startup scripts so that the driver is started
2575 before the <b><font face="Courier New, Courier, mono
">cluster</font></b>
2576 script. You can also statically build the SCSI driver into the kernel,
2577 instead of including it as a loadable module, by modifying the <b><font face="Courier New, Courier, mono
">/etc/modules.conf</font></b>
2580 In addition, when installing the Linux distribution, it is <b>strongly
2581 recommended</b> that you:
2584 Gather the IP addresses for the cluster systems and for the point-to-point
2585 Ethernet heartbeat interfaces, before installing a Linux distribution.
2586 Note that the IP addresses for the point-to-point Ethernet interfaces can
2587 be private IP addresses, such as 10<b><i>.x.x.x</i></b> addresses.</li>
2591 Enable the following Linux kernel options to provide detailed information
2592 about the system configuration and events and help you diagnose problems:</li>
2597 Enable SCSI logging support by setting the <b><font face="Courier New, Courier, mono
">CONFIG_SCSI_LOGGING</font></b>
2598 kernel option to <b><font face="Courier New, Courier, mono
">y</font></b>.
2599 When specifying kernel options, under <b><font face="Courier New, Courier, mono
">SCSI
2600 Support</font></b>, select <b><font face="Courier New, Courier, mono
">SCSI
2606 Enable support for <b><font face="Courier New, Courier, mono
">sysctl</font></b>
2607 by setting the <b><font face="Courier New, Courier, mono
">CONFIG_SYSCTL</font></b>
2608 kernel option to <b><font face="Courier New, Courier, mono
">y</font></b>.
2609 When specifying kernel options, under
2610 <b><font face="Courier New, Courier, mono
">General
2611 Setup</font></b>, select
2612 <b><font face="Courier New, Courier, mono
">Sysctl
2613 support</font></b>.</li>
2617 Do not put local file systems, such as <b><font face="Courier New, Courier, mono
">/</font></b>,
2618 <b><font face="Courier New, Courier, mono
">/etc</font></b>,
2619 <b><font face="Courier New, Courier, mono
">/tmp</font></b>,
2620 and <b><font face="Courier New, Courier, mono
">/var</font></b> on shared
2621 disks or on the same SCSI bus as shared disks. This helps prevent the other
2622 cluster member from accidentally mounting these file systems, and also
2623 reserves the limited number of SCSI identification numbers on a bus for cluster disks.</li>
2628 Put <b><font face="Courier New, Courier, mono
">/tmp</font></b> and <b><font face="Courier New, Courier, mono
">/var</font></b>
2629 on different file systems. This may improve system performance.</li>
2633 When a cluster system boots, be sure that the system detects the disk devices
2634 in the same order in which they were detected during the Linux installation.
2635 If the devices are not detected in the same order, the system may not boot.</li>
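<p>As a quick check of the kernel options described in this section, you can inspect
the kernel configuration file. The following sketch assumes the kernel source tree is
installed under <b><font face="Courier New, Courier, mono">/usr/src/linux</font></b>;
the output shown is illustrative:
<pre><font size=-1># grep -E 'CONFIG_IP_ALIAS|CONFIG_PROC_FS|CONFIG_SYSCTL|CONFIG_SCSI_LOGGING' /usr/src/linux/.config
CONFIG_SYSCTL=y
CONFIG_IP_ALIAS=y
CONFIG_SCSI_LOGGING=y
CONFIG_PROC_FS=y</font></pre>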
2637 <a NAME="hosts
"></a>
2639 2.3.2 Editing the /etc/hosts File</h3>
2640 The <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b> file
2641 contains the IP address-to-hostname translation table. The <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
2642 file on each cluster system must contain entries for the following:
2645 IP addresses and associated host names for both cluster systems</li>
2649 IP addresses and associated host names for the point-to-point Ethernet
2650 heartbeat connections (these can be private IP addresses)</li>
2652 As an alternative to the <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
2653 file, you could use a naming service such as DNS or NIS to define the host
2654 names used by a cluster. However, to limit the number of dependencies and
2655 optimize availability, it is strongly recommended that you use the <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
2656 file to define IP addresses for cluster network interfaces.
2657 <p>The following is an example of an <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
2658 file on a cluster system:
2659 <pre><font size=-1>127.0.0.1 localhost.localdomain localhost
2660 193.186.1.81 cluster2.linux.com cluster2
2661 10.0.0.1 ecluster2.linux.com ecluster2
2662 193.186.1.82 cluster3.linux.com cluster3
2663 10.0.0.2 ecluster3.linux.com ecluster3</font></pre>
2664 The previous example shows the IP addresses and host names for two cluster
2665 systems (<b><font face="Courier New, Courier, mono
">cluster2</font></b>
2666 and <b><font face="Courier New, Courier, mono
">cluster3</font></b>), and
2667 the private IP addresses and host names for the Ethernet interface used
2668 for the point-to-point heartbeat connection on each cluster system (<b><font face="Courier New, Courier, mono
">ecluster2</font></b>
2669 and <b><font face="Courier New, Courier, mono
">ecluster3</font></b>).
2670 <p>Verify correct formatting of the local host entry in the <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
2671 file, to ensure that it does not include non-local systems in the
2672 entry for the local host. An example of an incorrect local host entry that
2673 includes a non-local system (<b><font face="Courier New, Courier, mono
">server1</font></b>)
2675 <pre><font size=-1>127.0.0.1 localhost.localdomain localhost server1</font></pre>
2676 A heartbeat channel may not operate properly if the format is not correct.
2677 For example, the channel will erroneously appear to be "offline.
" Check
2678 your <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b> file
2679 and correct the file format by removing non-local systems from the local
2680 host entry, if necessary.
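<p>A corrected local host entry contains only names for the local system, for example:
<pre><font size=-1>127.0.0.1 localhost.localdomain localhost</font></pre>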
2681 <p>Note that each network adapter must be configured with the appropriate
2682 IP address and netmask.
2683 <p>The following is an example of a portion of the output from the <b><font face="Courier New, Courier, mono
">ifconfig</font></b>
2684 command on a cluster system:
2685 <pre><font size=-1># ifconfig
2687 eth0 Link encap:Ethernet HWaddr 00:00:BC:11:76:93
2688 inet addr:192.186.1.81 Bcast:192.186.1.245 Mask:255.255.255.0
2689 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
2690 RX packets:65508254 errors:225 dropped:0 overruns:2 frame:0
2691 TX packets:40364135 errors:0 dropped:0 overruns:0 carrier:0
2692 collisions:0 txqueuelen:100
2693 Interrupt:19 Base address:0xfce0
2695 eth1 Link encap:Ethernet HWaddr 00:00:BC:11:76:92
2696 inet addr:10.0.0.1 Bcast:10.0.0.245 Mask:255.255.255.0
2697 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
2698 RX packets:0 errors:0 dropped:0 overruns:0 frame:0
2699 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
2700 collisions:0 txqueuelen:100
2701 Interrupt:18 Base address:0xfcc0</font></pre>
2702 The previous example shows two network interfaces on a cluster system,
2703 <b><font face="Courier New, Courier, mono
">eth0
2705 interface for the cluster system) and <b><font face="Courier New, Courier, mono
">eth1</font></b>
2706 (network interface for the point-to-point heartbeat connection).
2707 <p><a NAME="alt-kernel
"></a>
2709 2.3.3 Decreasing the Kernel Boot Timeout Limit</h3>
2710 You can reduce the boot time for a cluster system by decreasing the kernel
2711 boot timeout limit. During the Linux boot sequence, you are given the opportunity
2712 to specify an alternate kernel to boot. The default timeout limit for specifying
2713 a kernel depends on the Linux distribution. For Red Hat distributions,
2714 the limit is five seconds.
2715 <p>To modify the kernel boot timeout limit for a cluster system, edit the
2716 <b><font face="Courier New, Courier, mono
">/etc/lilo.conf</font></b>
2717 file and specify the desired value (in tenths of a second) for the <b><font face="Courier New, Courier, mono
">timeout</font></b>
2718 parameter. The following example sets the timeout limit to three seconds:
2719 <pre>timeout = 30</pre>
2720 To apply the changes you made to the <b><font face="Courier New, Courier, mono
">/etc/lilo.conf</font></b>
2721 file, invoke the <b><font face="Courier New, Courier, mono
">/sbin/lilo</font></b>
2723 <p>Similarly, if you are using the <b>grub</b> boot loader, the timeout
2724 parameter in <b>/boot/grub/grub.conf </b>should be modified to specify
2725 the appropriate number of seconds. For example:
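<pre>timeout=3</pre>
<p>This sketch shows only the relevant <b>grub.conf</b> line; it sets the timeout to
three seconds (unlike LILO, grub specifies the value in whole seconds).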
2727 <p><a NAME="dmesg
"></a>
2729 2.3.4 Displaying Console Startup Messages</h3>
2730 Use the <b><font face="Courier New, Courier, mono
">dmesg</font></b> command
2731 to display the console startup messages. See the <b><font face="Courier New, Courier, mono
">dmesg.8</font></b>
2732 manpage for more information.
2733 <p>The following example of <b><font face="Courier New, Courier, mono
">dmesg</font></b>
2734 command output shows that a serial expansion card was recognized during
2736 <pre><font size=-1>May 22 14:02:10 storage3 kernel: Cyclades driver 2.3.2.5 2000/01/19 14:35:33
2737 May 22 14:02:10 storage3 kernel: built May 8 2000 12:40:12
2738 May 22 14:02:10 storage3 kernel: Cyclom-Y/PCI #1: 0xd0002000-0xd0005fff, IRQ9,
2739 4 channels starting from port 0.</font></pre>
2740 The following example of <b><font face="Courier New, Courier, mono
">dmesg</font></b>
2741 command output shows that two external SCSI buses and nine disks were detected
2743 <pre><font size=-1>May 22 14:02:10 storage3 kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
2744 May 22 14:02:10 storage3 kernel: <adaptec aic-7890/1 ultra2 scsi host adapter>
2745 May 22 14:02:10 storage3 kernel: scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4
2746 May 22 14:02:10 storage3 kernel: <adaptec aha-294x ultra2 scsi host adapter>
2747 May 22 14:02:10 storage3 kernel: scsi : 2 hosts.
2748 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST39236LW Rev: 0004
2749 May 22 14:02:11 storage3 kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
2750 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2751 May 22 14:02:11 storage3 kernel: Detected scsi disk sdb at scsi1, channel 0, id 0, lun 0
2752 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2753 May 22 14:02:11 storage3 kernel: Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0
2754 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2755 May 22 14:02:11 storage3 kernel: Detected scsi disk sdd at scsi1, channel 0, id 2, lun 0
2756 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2757 May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0
2758 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2759 May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0
2760 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2761 May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0
2762 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2763 May 22 14:02:11 storage3 kernel: Detected scsi disk sdh at scsi1, channel 0, id 10, lun 0
2764 May 22 14:02:11 storage3 kernel: Vendor: SEAGATE Model: ST318203LC Rev: 0001
2765 May 22 14:02:11 storage3 kernel: Detected scsi disk sdi at scsi1, channel 0, id 11, lun 0
2766 May 22 14:02:11 storage3 kernel: Vendor: Dell Model: 8 BAY U2W CU Rev: 0205
2767 May 22 14:02:11 storage3 kernel: Type: Processor ANSI SCSI revision: 03
2768 May 22 14:02:11 storage3 kernel: scsi1 : channel 0 target 15 lun 1 request sense failed, performing reset.
2769 May 22 14:02:11 storage3 kernel: SCSI bus is being reset for host 1 channel 0.
2770 May 22 14:02:11 storage3 kernel: scsi : detected 9 SCSI disks total.</font></pre>
2771 The following example of <b><font face="Courier New, Courier, mono
">dmesg</font></b>
2772 command output shows that a quad Ethernet card was detected on the system:
2773 <pre><font size=-1>May 22 14:02:11 storage3 kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html
2774 May 22 14:02:11 storage3 kernel: tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov
2775 May 22 14:02:11 storage3 kernel: eth0: Digital DS21140 Tulip rev 34 at 0x9800, 00:00:BC:11:76:93, IRQ 5.
2776 May 22 14:02:12 storage3 kernel: eth1: Digital DS21140 Tulip rev 34 at 0x9400, 00:00:BC:11:76:92, IRQ 9.
2777 May 22 14:02:12 storage3 kernel: eth2: Digital DS21140 Tulip rev 34 at 0x9000, 00:00:BC:11:76:91, IRQ 11.
2778 May 22 14:02:12 storage3 kernel: eth3: Digital DS21140 Tulip rev 34 at 0x8800, 00:00:BC:11:76:90, IRQ 10.</font></pre>
2781 <a NAME="devices-kernel
"></a></h2>
2784 2.3.5 Displaying Devices Configured in the Kernel</h3>
2785 To be sure that the installed devices, including serial and network interfaces,
2786 are configured in the kernel, use the <b><font face="Courier New, Courier, mono
">cat
2787 /proc/devices</font></b> command on each cluster system. You can also use
2788 this command to determine if you have raw device support installed on the
2789 system. For example:
2790 <pre><font size=-1># <b>cat /proc/devices
2791 </b>Character devices:
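  1 mem
  4 ttyS
 19 ttyC
162 raw

Block devices:
  8 sd</font></pre>
<p>(The listing above is abbreviated and illustrative; the exact entries and major
numbers depend on the kernel configuration and hardware of your systems.)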
2813 The previous example shows:
2816 Onboard serial ports (<b><font face="Courier New, Courier, mono
">ttyS</font></b>)</li>
2819 Serial expansion card (<b><font face="Courier New, Courier, mono
">ttyC</font></b>)</li>
2822 Raw devices (<b><font face="Courier New, Courier,mono
">raw</font></b>)</li>
2825 SCSI devices (<b><font face="Courier New, Courier, mono
">sd</font></b>)</li>
2828 <h2 CLASS="ChapterTitleTOC
">
2829 <a NAME="install-cluster
"></a></h2>
2831 <h2 CLASS="ChapterTitleTOC
">
2832 2.4 Steps for Setting Up and Connecting the Cluster Hardware</h2>
2833 After installing the Red Hat Linux distribution, you can set up the cluster
2834 hardware components and then verify the installation to ensure that the
2835 cluster systems recognize all the connected devices. Note that the exact
2836 steps for setting up the hardware depend on the type of configuration.
2837 See <a href="#gather
">Choosing a Hardware Configuration</a> for more information
2838 about cluster configurations.
2839 <p>To set up the cluster hardware, follow these steps:
2842 Shut down the cluster systems and disconnect them from their power source.</li>
2846 Set up the point-to-point Ethernet and serial heartbeat channels, if applicable.
2847 See <a href="#hardware-heart
">Configuring Heartbeat Channels</a> for more
2848 information about performing this task.</li>
2852 If you are using power switches, set up the devices and connect each cluster
2853 system to a power switch. See <a href="#hardware-power
">Configuring
2854 Power Switches</a> for more information about performing this task.</li>
2860 <p>In addition, it is recommended that you connect each power switch (or
2861 each cluster system's power cord if you are not using power switches) to
2862 a different UPS system. See <a href="#hardware-ups
">Configuring UPS Systems</a>
2863 for information about using optional UPS systems.
2866 Set up the shared disk storage according to the vendor instructions and
2867 connect the cluster systems to the external storage enclosure. Be sure
2868 to adhere to the configuration requirements for multi-initiator or single-initiator
2869 SCSI buses. See <a href="#hardware-storage
">Configuring Shared Disk Storage</a>
2870 for more information about performing this task.</li>
2876 <p>In addition, it is recommended that you connect the storage enclosure
2877 to redundant UPS systems. See <a href="#hardware-ups
">Configuring UPS Systems</a>
2878 for more information about using optional UPS systems.
2881 Turn on power to the hardware, and boot each cluster system. During the
2882 boot, enter the BIOS utility to modify the system setup, as follows:</li>
2887 Assign a unique SCSI identification number to each host bus adapter on
2888 a SCSI bus. See <a href="#scsi-ids
">SCSI Identification Numbers</a> for
2889 more information about performing this task.</li>
2893 Enable or disable the onboard termination for each host bus adapter, as
2894 required by your storage configuration. See <a href="#hardware-storage
">Configuring
2895 Shared Disk Storage</a> and <a href="#scsi-term
">SCSI Bus Termination</a>
2896 for more information about performing this task.</li>
2900 If using a multi-initiator SCSI bus configuration, disable bus resets for
2901 the host bus adapters connected to cluster shared storage.</li>
2905 Enable the cluster system to automatically boot when it is powered on.</li>
2908 <p><br>If you are using Adaptec host bus adapters for shared storage, see
2909 <a href="#adaptec
">Adaptec
2910 Host Bus Adapter Requirement</a> for configuration information.
2912 Exit from the BIOS utility, and continue to boot each system. Examine the
2913 startup messages to verify that the Linux kernel has been configured and
2914 can recognize the full set of shared disks. You can also use the <b><font face="Courier New, Courier, mono
">dmesg</font></b>
2915 command to display console startup messages. See <a href="#dmesg
">Displaying
2916 Console Startup Messages</a> for more information about using this command.</li>
2920 Verify that the cluster systems can communicate over each point-to-point
2921 Ethernet heartbeat connection by using the <b><font face="Courier New, Courier, mono
">ping</font></b>
2922 command to send packets over each network interface.</li>
2926 Set up the quorum disk partitions on the shared disk storage. See <a href="#state-partitions
">Configuring
2927 the Quorum Partitions</a> for more information about performing this task.</li>
2930 <a NAME="hardware-heart
"></a>
2931 <h3 class="ChapterTitleTOC
">
2932 2.4.1 Configuring Heartbeat Channels</h3>
2933 The cluster uses heartbeat channels as a policy input during failover of
2934 the cluster systems. For example, if a cluster system stops updating its
2935 timestamp on the quorum partitions, the other cluster system will check
2936 the status of the heartbeat channels to determine if additional time should
2937 be allotted prior to initiating a failover.
2938 <p>A cluster must include at least one heartbeat channel. You can use an
2939 Ethernet connection for both client access and a heartbeat channel. However,
2940 it is recommended that you set up additional heartbeat channels for high
2941 availability. You can set up redundant Ethernet heartbeat channels, in
2942 addition to one or more serial heartbeat channels.
2943 <p>For example, if you have an Ethernet and a serial heartbeat channel,
2944 and the cable for the Ethernet channel is disconnected, the cluster systems
2945 can still check status through the serial heartbeat channel.
2946 <p>To set up a redundant Ethernet heartbeat channel, use a network crossover
2947 cable to connect a network interface on one cluster system to a network
2948 interface on the other cluster system.
2949 <p>To set up a serial heartbeat channel, use a null modem cable to connect
2950 a serial port on one cluster system to a serial port on the other cluster
2951 system. Be sure to connect corresponding serial ports on the cluster systems;
2952 do not connect to the serial port that will be used for a remote power
2953 switch connection. In the future, should support be added for more
2954 than two cluster members, usage of serial-based heartbeat channels may be deprecated.
2957 <p><a NAME="hardware-power
"></a>
2958 <h3 class="ChapterTitleTOC
">
2959 2.4.2 Configuring Power Switches</h3>
2960 Power switches enable a cluster system to power-cycle the other cluster
2961 system before restarting its services as part of the failover process.
2962 The ability to remotely disable a system ensures data integrity is maintained
2963 under any failure condition. It is recommended that production environments
2964 use power switches in the cluster configuration. Only development (test)
2965 environments should use a configuration without power switches.
2966 <p>In a cluster configuration that uses power switches, each cluster system's
2967 power cable is connected to a power switch through either a serial or network
2968 connection (depending on switch type). When failover occurs, a cluster
2969 system can use this connection to power-cycle the other cluster system
2970 before restarting its services.
2971 <p>Power switches protect against data corruption if an unresponsive ("hung
")
2972 system becomes responsive ("unhung
") after its services have failed over,
2973 and issues I/O to a disk that is also receiving I/O from the other cluster
2974 system. In addition, if a quorum daemon fails on a cluster system, the
2975 system is no longer able to monitor the quorum partitions. If you are not
2976 using power switches in the cluster, this error condition may result in
2977 services being run on more than one cluster system, which can cause data
2978 corruption and possibly system crashes.
2979 <p>It is strongly recommended that you use power switches in a cluster.
2980 However, if you are fully aware of the risk, you can choose to set up a
2981 cluster without power switches.
2982 <p>A cluster system may "hang
" for a few seconds if it is swapping or has
2983 a high system workload. For this reason, adequate time is allowed prior
2984 to concluding that another system has failed (typically 12 seconds).
2985 <p>A cluster system may "hang
" indefinitely because of a hardware failure
2986 or a kernel error. In this case, the other cluster system will notice that the
2987 "hung
" system is not updating its timestamp on the quorum partitions, and
2988 is not responding to pings over the heartbeat channels.
2989 <p>If a cluster system determines that a "hung
" system is down, and power
2990 switches are used in the cluster, the cluster system will power-cycle the
2991 "hung
" system before restarting its services. This will cause the "hung
"
2992 system to reboot in a clean state, and prevent it from issuing I/O and
2993 corrupting service data.
2994 <p>If power switches are not used in the cluster, and a cluster system determines
2995 that a "hung
" system is down, it will set the status of the failed system
2996 to <b><font face="Courier New, Courier, mono
">DOWN</font></b> on the quorum
2997 partitions, and then restart the "hung
" system's services. If the "hung
"
2998 system becomes "unhung,
" it will notice that its status is <b><font face="Courier New, Courier, mono
">DOWN</font></b>,
2999 and initiate a system reboot. This will minimize the time that both cluster
3000 systems may be able to issue I/O to the same disk, but it does not provide
3001 the data integrity guarantee of power switches. If the "hung
" system never
3002 becomes responsive, you will have to manually reboot the system.
3003 <p>If you are using power switches, set up the hardware according to the
3004 vendor instructions. However, you may have to perform some cluster-specific
3005 tasks to use a power switch in the cluster. See <a href="#rps-
10">Setting
3006 Up Power Switches</a> for detailed information. Note that the cluster-specific
3007 information provided in this document supersedes the vendor information.
3008 Also be sure to read the detailed information provided in <a href="#power-setup
">Setting
3009 Up Power Switches</a> to take note of any caveats or functional attributes
3010 of specific power switch types.
3011 <p>When cabling up power switches, take special care to ensure that each
3012 cable is plugged into the appropriate outlet. This is crucial because
3013 there is no independent means for the software to verify correct cabling.
3014 Failure to cable correctly can lead to the wrong system being power-cycled,
3015 or to one system inappropriately concluding that it has successfully
3016 power-cycled another cluster member.
3017 <p>After you set up the power switches, perform these tasks to connect
3018 them to the cluster systems:
3021 Connect the power cable for each cluster system to a power switch.</li>
3025 On each cluster system, connect a serial port to the serial port on the
3026 power switch that provides power to the other cluster system. The cable
3027 you use for the serial connection depends on the type of power switch.
3028 For example, if you have an RPS-10 power switch, use null modem cables.
3029 Alternatively, if you have a network-attached power switch, a network cable is used instead.</li>
3034 Connect the power cable for each power switch to a power source. It is
3035 recommended that you connect each power switch to a different UPS system.
3036 See <a href="#hardware-ups
">Configuring UPS Systems</a> for more information.</li>
3038 After you install the cluster software, but before you start the cluster,
3039 test the power switches to ensure that each cluster system can power-cycle
3040 the other system. See <a href="#pswitch
">Testing the Power Switches</a>
3042 <p><a NAME="hardware-ups
"></a>
3043 <h3 class="ChapterTitleTOC
">
3044 2.4.3 Configuring UPS Systems</h3>
3045 Uninterruptible power supply (UPS) systems provide a highly-available source
3046 of power. Ideally, a redundant solution should be used, incorporating multiple
3047 UPSs (one per server). For maximal fault-tolerance, you could incorporate
3048 two UPSs per server, as well as APC's Automatic Transfer Switches, to
3049 manage power and shutdown for each server. The choice between these
3050 solutions depends on the level of availability desired.
3051 <p>It is not recommended that an existing large UPS infrastructure be the
3052 sole source of power for the cluster. A UPS solution dedicated to
3053 the cluster itself allows for more flexibility in terms of manageability and availability.
3055 <p>A complete UPS system must be able to provide adequate voltage and current
3056 for an adequate period of time. Because no single UPS fits
3057 every power requirement, visit APC's UPS configurator at <a href="http://www.apcc.com/template/size/apc
">www.apcc.com/template/size/apc</a>
3058 to size the correct UPS for your server. The APC Smart-UPS product line
3059 ships with software management for Red Hat Linux, the RPM name is
3061 <p>If your disk storage subsystem has two power supplies with separate
3062 power cords, set up two UPS systems, and connect one power switch (or one
3063 cluster system's power cord if you are not using power switches) and one
3064 of the storage subsystem's power cords to each UPS system.
3065 <p>A redundant UPS system configuration is shown in the following figure.
3066 <h4 class="ChapterTitleTOC
">
3067 Redundant UPS System Configuration</h4>
3068 <img SRC="two_ups.gif
" >
3069 <p>You can also connect both power switches (or both cluster systems' power
3070 cords) and the disk storage subsystem to the same UPS system. This is the
3071 most cost-effective configuration, and provides some protection against
3072 power failure. However, if a power outage occurs, the single UPS system
3073 becomes a possible single point of failure. In addition, one UPS system
3074 may not be able to provide enough power to all the attached devices for
3075 an adequate amount of time.
3076 <p>A single UPS system configuration is shown in the following figure.
3077 <h4 class="ChapterTitleTOC
">
3078 Single UPS System Configuration</h4>
3079 <img SRC="one_ups.gif
" >
3080 <p>Many UPS system products include Linux applications that monitor the
3081 operational status of the UPS system through a serial port connection.
3082 If the battery power is low, the monitoring software will initiate a clean
3083 system shutdown. If this occurs, the cluster software will be properly
3084 stopped, because it is controlled by a System V run level script (for example,
3085 <b><font face="Courier New, Courier, mono
">/etc/rc.d/init.d/cluster</font></b>).
3086 <p>See the UPS documentation supplied by the vendor for detailed installation
3089 <p><a NAME="hardware-storage
"></a>
3090 <h3 CLASS="ChapterTitleTOC
">
3091 2.4.4 Configuring Shared Disk Storage</h3>
3092 In a cluster, shared disk storage is used to hold service data and two
3093 quorum partitions. Because this storage must be available to both cluster
3094 systems, it cannot be located on disks that depend on the availability
3095 of any one system. See the vendor documentation for detailed product and
3096 installation information.
3097 <p>There are a number of factors to consider when setting up shared disk
3098 storage in a cluster:
3101 Hardware RAID versus JBOD</li>
3107 <p><b>JBOD</b> ("just a bunch of disks
") storage provides a low-cost storage
3108 solution, but it does not provide highly available data. If a disk in a
3109 JBOD enclosure fails, any cluster service that uses the disk will be unavailable.
3110 Therefore, only development environments should use JBOD.
3111 <p>Controller-based <b>hardware RAID</b> is more expensive than JBOD storage,
3112 but it enables you to protect against disk failure. In addition, a dual-controller
3113 RAID array protects against controller failure. It is strongly recommended
3114 that you use RAID 1 (mirroring) to make service data and the quorum partitions
3115 highly available. Optionally, you can use parity RAID for high availability.
3116 Do not use RAID 0 (striping) for the quorum partitions. It is recommended
3117 that production environments use RAID for high availability.
3118 <p>Note that you cannot use host-based, adapter-based, or software RAID
3119 in a cluster, because these products usually do not properly coordinate
3120 multisystem access to shared storage.
3123 Multi-initiator SCSI buses or Fibre Channel interconnects versus single-initiator
3124 buses or interconnects</li>
3130 <p>A <b>multi-initiator</b> SCSI bus or Fibre Channel interconnect has
3131 more than one cluster system connected to it. RAID controllers with a single
3132 host port and parallel SCSI disks must use a multi-initiator bus or interconnect
3133 to connect the two host bus adapters to the storage enclosure. This configuration
3134 provides no host isolation. Therefore, only development environments should
3135 use multi-initiator buses.
3136 <p>A <b>single-initiator</b> SCSI bus or Fibre Channel interconnect has
3137 only one cluster system connected to it, and provides host isolation and
3138 better performance than a multi-initiator bus. Single-initiator buses or
3139 interconnects ensure that each cluster system is protected from disruptions
3140 due to the workload, initialization, or repair of the other cluster system.
3141 <p>If you have a RAID array that has multiple host ports and provides simultaneous
3142 access to all the shared logical units from the host ports on the storage
3143 enclosure, you can set up two single-initiator buses or interconnects to
3144 connect each cluster system to the RAID array. If a logical unit can fail
3145 over from one controller to the other, the process must be transparent
3146 to the operating system. It is recommended that production environments
3147 use single-initiator buses or interconnects.
3156 <p>In some cases, you can set up a shared storage configuration that supports <b>hot
3158 plugging</b>, which enables you to disconnect a device from a multi-initiator
3159 SCSI bus or a multi-initiator Fibre Channel interconnect without affecting
3160 bus operation. This enables you to easily perform maintenance on a device,
3161 while the services that use the bus or interconnect remain available.
3162 <p>For example, by using an external terminator to terminate a SCSI bus
3163 instead of the onboard termination for a host bus adapter, you can disconnect
3164 the SCSI cable and terminator from the adapter and the bus will still be operational.
3166 <p>However, if you are using a Fibre Channel hub or switch, hot plugging
3167 is not necessary because the hub or switch allows the interconnect to remain
3168 operational if a device is disconnected. In addition, if you have a single-initiator
3169 SCSI bus or Fibre Channel interconnect, hot plugging is not necessary because
3170 the private bus does not need to remain operational when you disconnect a device.
3172 Note that you must carefully follow the configuration guidelines for multi
3173 and single-initiator buses and for hot plugging, in order for the cluster
3174 to operate correctly.
3175 <p>You must adhere to the following <b>shared storage requirements</b>:
3178 The Linux device name for each shared storage device must be the same on
3179 each cluster system. For example, a device named <b><font face="Courier New, Courier, mono
">/dev/sdc</font></b>
3180 on one cluster system must be named <b><font face="Courier New, Courier, mono
">/dev/sdc</font></b>
3181 on the other cluster system. You can usually ensure that devices are named
3182 the same by using identical hardware for both cluster systems.</li>
3186 A disk partition can be used by only one cluster service.</li>
3190 Do not include any file systems used in a cluster service in the cluster
3191 system's local <b><font face="Courier New, Courier, mono
">/etc/fstab</font></b>
3192 files, because the cluster software must control the mounting and unmounting
3193 of service file systems.</li>
3197 For optimal performance, use a 4 KB block size when creating shared file
3198 systems. Note that some of the <b><font face="Courier New, Courier, mono
">mkfs</font></b>
3199 file system build utilities default to a 1 KB block size, which can cause
3200 long <b><font face="Courier New, Courier, mono
">fsck</font></b> times.</li>
3202 You must adhere to the following <b>parallel SCSI requirements</b>, if applicable:
3206 SCSI buses must be terminated at each end, and must adhere to length and
3207 hot plugging restrictions.</li>
3211 Devices (disks, host bus adapters, and RAID controllers) on a SCSI bus
3212 must have a unique SCSI identification number.</li>
3216 SCSI bus resets must be disabled.</li>
3220 <a href="#scsi-reqs
">SCSI Bus Configuration Requirements</a> for more
3222 <p>In addition, it is <b>strongly recommended</b> that you connect the
3223 storage enclosure to redundant UPS systems for a highly-available source
3224 of power. See <a href="#hardware-ups
">Configuring UPS Systems</a> for more
3226 <p>See <a href="#multiinit
">Setting Up a Multi-Initiator SCSI Bus</a>,
3227 <a href="#singleinit
">Setting
3228 Up a Single-Initiator SCSI Bus</a>, and <a href="#single-fibre
">Setting
3229 Up a Single-Initiator Fibre Channel Interconnect</a> for more information
3230 about configuring shared storage.
3231 <p>After you set up the shared disk storage hardware, you can partition
3232 the disks and then either create file systems or raw devices on the partitions.
3233 You must create two raw devices for the primary and the backup quorum partitions.
3234 See <a href="#state-partitions
">Configuring the Quorum Partitions</a>,
3235 <a href="#partition
">Partitioning
3236 Disks</a>, <a href="#rawdevices
">Creating Raw Devices</a>, and <a href="#filesystems
">Creating
3237 File Systems</a> for more information.
3238 <p><a NAME="multiinit
"></a>
3240 2.4.4.1 Setting Up a Multi-Initiator SCSI Bus</h4>
3241 A multi-initiator SCSI bus has more than one cluster system connected to
3242 it. If you have JBOD storage, you must use a multi-initiator SCSI bus to
3243 connect the cluster systems to the shared disks in a cluster storage enclosure.
3244 You also must use a multi-initiator bus if you have a RAID controller that
3245 does not provide access to all the shared logical units from host ports
3246 on the storage enclosure, or has only one host port.
3247 <p>A multi-initiator bus does not provide host isolation. Therefore, only
3248 development environments should use a multi-initiator bus.
3249 <p>A multi-initiator bus must adhere to the requirements described in <a href="#scsi-reqs
">SCSI
3250 Bus Configuration Requirements</a>. In addition, see <a href="#hba
">Host
3252 Bus Adapter Features and Configuration Requirements</a> for information
3253 about terminating host bus adapters and configuring a multi-initiator bus
3254 with and without hot plugging support.
3255 <p>In general, to set up a multi-initiator SCSI bus with a cluster system
3256 at each end of the bus, you must do the following:
3259 Enable the onboard termination for each host bus adapter.</li>
3262 Disable the termination for the storage enclosure, if applicable.</li>
3265 Use the appropriate 68-pin SCSI cable to connect each host bus adapter
3266 to the storage enclosure.</li>
3268 To set host bus adapter termination, you usually must enter the system
3269 configuration utility during system boot. To set RAID controller or storage
3270 enclosure termination, see the vendor documentation.
3271 <p>The following figure shows a multi-initiator SCSI bus with no hot plugging support.
3273 <p><b>Multi-Initiator SCSI Bus Configuration</b>
3274 <p><img SRC="multidrop_1.gif
" height=130 width=360>
3275 <p>If the onboard termination for a host bus adapter can be disabled, you
3276 can configure it for hot plugging. This allows you to disconnect the adapter
3277 from the multi-initiator bus, without affecting bus termination, so you
3278 can perform maintenance while the bus remains operational.
3279 <p>To configure a host bus adapter for hot plugging, you must do the following:
3282 Disable the onboard termination for the host bus adapter.</li>
3285 Connect an external pass-through LVD active terminator to the host bus
3286 adapter connector.</li>
3288 You can then use the appropriate 68-pin SCSI cable to connect the LVD terminator
3289 to the (unterminated) storage enclosure.
3290 <p>The following figure shows a multi-initiator SCSI bus with both host
3291 bus adapters configured for hot plugging.
3292 <p><b>Multi-Initiator SCSI Bus Configuration With Hot Plugging</b>
3293 <p><img SRC="multidrop_2.gif
" height=137 width=360>
3294 <p>The following figure shows the termination in a JBOD storage enclosure
3295 connected to a multi-initiator SCSI bus.
3296 <p><b>JBOD Storage Connected to a Multi-Initiator Bus</b>
3297 <p><img SRC="jbod_raid_multi.gif
" height=208 width=360>
3298 <p>The following figure shows the termination in a single-controller RAID
3299 array connected to a multi-initiator SCSI bus.
3300 <p><b>Single-Controller RAID Array Connected to a Multi-Initiator Bus</b>
3301 <p><img SRC="single_raid_multi.gif
" height=295 width=360>
3302 <p>The following figure shows the termination in a dual-controller RAID
3303 array connected to a multi-initiator SCSI bus.
3304 <p><b>Dual-Controller RAID Array Connected to a Multi-Initiator Bus</b>
3305 <p><img SRC="dual_raid_multi.gif
" >
3307 <p><a NAME="singleinit
"></a>
3309 2.4.4.2 Setting Up a Single-Initiator SCSI Bus</h4>
3310 A single-initiator SCSI bus has only one cluster system connected to it,
3311 and provides host isolation and better performance than a multi-initiator
3312 bus. Single-initiator buses ensure that each cluster system is protected
3313 from disruptions due to the workload, initialization, or repair of the
3314 other cluster system.
3315 <p>If you have a single or dual-controller RAID array that has multiple
3316 host ports and provides simultaneous access to all the shared logical units
3317 from the host ports on the storage enclosure, you can set up two single-initiator
3318 SCSI buses to connect each cluster system to the RAID array. If a logical
3319 unit can fail over from one controller to the other, the process must be
3320 transparent to the operating system.
3321 <p>It is recommended that production environments use single-initiator
3322 SCSI buses or single-initiator Fibre Channel interconnects.
3323 <p>Note that some RAID controllers restrict a set of disks to a specific
3324 controller or port. In this case, you cannot set up single-initiator buses.
3325 In addition, hot plugging is not necessary in a single-initiator SCSI bus,
3326 because the private bus does not need to remain operational when you disconnect
3327 a host bus adapter from the bus.
3328 <p>A single-initiator bus must adhere to the requirements described in
3329 <a href="#scsi-reqs
">SCSI
3330 Bus Configuration Requirements</a>. In addition, see <a href="#hba
">Host
3331 Bus Adapter Features and Configuration Requirements</a> for detailed information
3332 about terminating host bus adapters and configuring a single-initiator bus.
3334 <p>To set up a single-initiator SCSI bus configuration, you must do the following:
3338 Enable the onboard termination for each host bus adapter.</li>
3341 Enable the termination for each RAID controller.</li>
3344 Use the appropriate 68-pin SCSI cable to connect each host bus adapter
3345 to the storage enclosure.</li>
3347 To set host bus adapter termination, you usually must enter a BIOS utility
3348 during system boot. To set RAID controller termination, see the vendor documentation.
3350 <p>The following figure shows a configuration that uses two single-initiator SCSI buses.
3352 <p><b>Single-Initiator SCSI Bus Configuration</b>
3353 <p><img SRC="multidrop_3.gif
" height=138 width=360>
3354 <p>The following figure shows the termination in a single-controller RAID
3355 array connected to two single-initiator SCSI buses.
3356 <p><b>Single-Controller RAID Array Connected to Single-Initiator SCSI Buses</b>
3357 <p><img SRC="single_raid_store.gif
" height=295 width=360>
3358 <p>The following figure shows the termination in a dual-controller RAID
3359 array connected to two single-initiator SCSI buses.
3360 <p><b>Dual-Controller RAID Array Connected to Single-Initiator SCSI Buses</b>
3361 <p><img SRC="dual_raid_store.gif
" >
3363 <p><a NAME="single-fibre
"></a>
3365 2.4.4.3 Setting Up a Single-Initiator Fibre Channel Interconnect</h4>
3366 A single-initiator Fibre Channel interconnect has only one cluster system
3367 connected to it, and provides host isolation and better performance than
3368 a multi-initiator bus. Single-initiator interconnects ensure that each
3369 cluster system is protected from disruptions due to the workload, initialization,
3370 or repair of the other cluster system.
3371 <p>It is recommended that production environments use single-initiator
3372 SCSI buses or single-initiator Fibre Channel interconnects.
3373 <p>If you have a RAID array that has multiple host ports, and the RAID
3374 array provides simultaneous access to all the shared logical units from
3375 the host ports on the storage enclosure, you can set up two single-initiator
3376 Fibre Channel interconnects to connect each cluster system to the RAID
3377 array. If a logical unit can fail over from one controller to the other,
3378 the process must be transparent to the operating system.
3379 <p>The following figure shows a single-controller RAID array with two host
3380 ports, and the host bus adapters connected directly to the RAID controller,
3381 without using Fibre Channel hubs or switches.
3382 <p><b>Single-Controller RAID Array Connected to Single-Initiator Fibre
3383 Channel Interconnects</b>
3384 <p><img SRC="single_fibre.gif
" >
3385 <p>If you have a dual-controller RAID array with two host ports on each
3386 controller, you must use a Fibre Channel hub or switch to connect each
3387 host bus adapter to one port on both controllers, as shown in the following figure.
3389 <p><b>Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel Interconnects</b>
3391 <p><img SRC="fibre_hub.gif
" >
3393 <p><a NAME="state-partitions
"></a>
3394 <h4 class="ChapterTitleTOC
">
3395 2.4.4.4 Configuring Quorum Partitions</h4>
3396 You must create two raw devices on shared disk storage for the primary
3397 quorum partition and the backup quorum partition. Each quorum partition
3398 must have a minimum size of 10 MB. The amount of data in a quorum partition
3399 is constant; it does not increase or decrease over time.
3400 <p>The quorum partitions are used to hold cluster state information. Periodically,
3401 each cluster system writes its status (either UP or DOWN), a timestamp,
3402 and the state of its services. In addition, the quorum partitions contain
3403 a version of the cluster database. This ensures that each cluster system
3404 has a common view of the cluster configuration.
3405 <p>To monitor cluster health, the cluster systems periodically read state
3406 information from the primary quorum partition and determine if it is up
3407 to date. If the primary partition is corrupted, the cluster systems read
3408 the information from the backup quorum partition and simultaneously repair
3409 the primary partition. Data consistency is maintained through checksums
3410 and any inconsistencies between the partitions are automatically corrected.
3411 <p>If a system is unable to write to both quorum partitions at startup
3412 time, it will not be allowed to join the cluster. In addition, if an active
3413 cluster system can no longer write to both quorum partitions, the system
3414 will remove itself from the cluster by rebooting (and may be remotely power
3415 cycled by the healthy cluster member).
3416 <p>You must adhere to the following <b>quorum partition requirements</b>:
3419 Both quorum partitions must have a minimum size of 10 MB.</li>
3423 Quorum partitions must be raw devices. They cannot contain file systems.</li>
3427 The quorum partitions must be located on the same shared SCSI bus or the
3428 same RAID controller. This prevents a situation in which each cluster system
3429 has access to only one of the partitions.</li>
3433 Quorum partitions can be used only for cluster state and configuration information.</li>
3436 <p>The following are <b>recommended guidelines</b> for configuring the quorum partitions:
3440 It is strongly recommended that you set up a RAID subsystem for shared
3441 storage, and use RAID 1 (mirroring) to make the logical unit that contains
3442 the quorum partitions highly available. Optionally, you can use parity
3443 RAID for high availability. Do not use RAID 0 (striping) for the quorum partitions.</li>
3450 <p>Otherwise, put both quorum partitions on the same disk.
3453 Do not put the quorum partitions on a disk that contains heavily-accessed
3454 service data. If possible, locate the quorum partitions on disks that contain
3455 service data that is lightly accessed.</li>
3457 See <a href="#partition
">Partitioning Disks</a> and <a href="#rawdevices
">Creating
3458 Raw Devices</a><a href="#software-rawdevices
"> </a>for more information
3459 about setting up the quorum partitions.
3460 <p>See <a href="#software-rawdevices
">Editing the rawdevices File</a> for
3461 information about editing the <b><font face="Courier New, Courier, mono
">rawdevices</font></b>
3462 file to bind the raw character devices to the block devices each time the
3463 cluster systems boot.
3466 <p><a NAME="partition
"></a>
3468 2.4.4.5 Partitioning Disks</h4>
3469 After you set up the shared disk storage hardware, you must partition the
3470 disks so they can be used in the cluster. You can then create file systems
3471 or raw devices on the partitions. For example, you must create two raw
3472 devices for the quorum partitions, using the guidelines described in <a href="#state-partitions
">Configuring
3473 Quorum Partitions.</a>
3474 <p>Invoke the interactive <b><font face="Courier New, Courier, mono
">fdisk</font></b>
3475 command to modify a disk partition table and divide the disk into partitions.
3476 Use the <b><font face="Courier New, Courier, mono
">p</font></b> command
3477 to display the current partition table. Use the <b><font face="Courier New, Courier, mono
">n</font></b>
3478 command to create a new partition.
3479 <p>The following example shows how to use the <b><font face="Courier New, Courier, mono
">fdisk</font></b>
3480 command to partition a disk:
3483 Invoke the interactive <b><font face="Courier New, Courier, mono
">fdisk</font></b>
3484 command, specifying an available shared disk device. At the prompt, specify
3485 the <b><font face="Courier New, Courier, mono
">p</font></b> command to
3486 display the current partition table. For example:</li>
3488 <pre><font size=-1># <b>fdisk /dev/sde
3489 </b>Command (m for help): <b>p</b>
3491 Disk /dev/sde: 255 heads, 63 sectors, 2213 cylinders
3492 Units = cylinders of 16065 * 512 bytes
3494 Device Boot Start End Blocks Id System
3495 /dev/sde1 1 262 2104483+ 83 Linux
3496 /dev/sde2 263 288 208845 83 Linux</font></pre>
3499 Determine the number of the next available partition, and specify the <b><font face="Courier New, Courier, mono
">n</font></b>
3500 command to add the partition. If there are already three partitions on
3501 the disk, specify <b><font face="Courier New, Courier, mono
">e</font></b>
3502 for extended partition or <b><font face="Courier New, Courier, mono
">p</font></b>
3503 to create a primary partition. For example:</li>
3505 <pre><font size=-1>Command (m for help): <b>n</b>
3506 Command action
3507 e extended
3508 p primary partition (1-4)</font></pre>
3511 Specify the partition number that you want. For example:</li>
3513 <pre><font size=-1>Partition number (1-4): <b>3</b></font></pre>
3516 Press the <b><font face="Courier New, Courier, mono
">Enter</font></b> key
3517 or specify the next available cylinder. For example:</li>
3519 <pre><font size=-1>First cylinder (289-2213, default 289): <b>289</b></font></pre>
3522 Specify the partition size that is required. For example:</li>
3524 <pre><font size=-1>Last cylinder or +size or +sizeM or +sizeK (289-2213, default 2213): <b>+2000M</b></font></pre>
3525 Note that large partitions will increase the cluster service failover time
3526 if a file system on the partition must be checked with <b><font face="Courier New, Courier, mono
">fsck</font></b>.
3527 Quorum partitions must be at least 10 MB.
3529 Specify the <b><font face="Courier New, Courier, mono
">w</font></b> command
3530 to write the new partition table to disk. For example:</li>
3532 <pre><font size=-1>Command (m for help): <b>w</b>
3533 The partition table has been altered!
3535 Calling ioctl() to re-read partition table.
3537 WARNING: If you have created or modified any DOS 6.x
3538 partitions, please see the fdisk manual page for additional
3541 Syncing disks.</font></pre>
3544 If you added a partition while both cluster systems are powered on and
3545 connected to the shared storage, you must reboot the other cluster system
3546 in order for it to recognize the new partition.</li>
3548 After you partition a disk, you can format it for use in the cluster. You
3549 must create raw devices for the quorum partitions. You can also format
3550 the remainder of the shared disks as needed by the cluster services. For
3551 example, you can create file systems or raw devices on the partitions.
3552 <p>See <a href="#rawdevices
">Creating Raw Devices</a> and <a href="#filesystems
">Creating
3553 File Systems</a> for more information.
3556 <p><a NAME="rawdevices
"></a>
3558 2.4.4.6 Creating Raw Devices</h4>
3559 After you partition the shared storage disks, as described in <a href="#partition
">Partitioning
3560 Disks</a>, you can create raw devices on the partitions. File systems are created
3561 on block devices (for example, <b><font face="Courier New, Courier, mono
">/dev/sda1</font></b>),
3562 which cache recently-used data in memory in order to improve performance.
3563 Raw devices do not utilize system memory for caching. See <a href="#filesystems
">Creating
3564 File Systems</a> for more information.
3565 <p>Linux supports raw character devices that are not hard-coded against
3566 specific block devices. Instead, Linux uses a character major number (currently
3567 162) to implement a series of unbound raw devices in the <b><font face="Courier New, Courier, mono
">/dev/raw</font></b>
3568 directory. Any block device can have a character raw device front-end,
3569 even if the block device is loaded later at runtime.
3570 <p>To create a raw device, edit the <b>/etc/sysconfig/rawdevices</b> file
3571 to bind a raw character device to the appropriate block device. Once bound
3572 to a block device, a raw device can be opened, read, and written.
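<p>For example, assuming <b><font face="Courier New, Courier, mono">/dev/sdb1</font></b>
is a shared partition (the same illustrative device name used in the rawdevices example
later in this manual), the following command binds it to the first raw device by hand.
Because bindings do not survive a reboot, the permanent configuration belongs in the
<b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b> file:
<pre># <b>raw /dev/raw/raw1 /dev/sdb1</b></pre>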
3573 <p>You must create raw devices for the quorum partitions. In addition,
3574 some database applications require raw devices, because these applications
3575 perform their own buffer caching for performance purposes. Quorum partitions
3576 cannot contain file systems because, if state data were cached in system
3577 memory, the cluster systems would not have a consistent view of the state data.
3582 <p>Raw character devices must be bound to block devices each time a system
3583 boots. To ensure that this occurs, edit the <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b>
3584 file and specify the quorum partition bindings. If you are using a raw
3585 device in a cluster service, you can also use this file to bind the devices
3586 at boot time. See <a href="#software-rawdevices
">Editing the rawdevices
3587 File</a> for more information.
3588 <p>Query all the raw devices by using the <b><font face="Courier New, Courier, mono
">raw -aq</font></b> command:
<p># <b>raw -aq</b>
3591 <br>/dev/raw/raw1 bound to major 8, minor 17
3592 <br>/dev/raw/raw2 bound to major 8, minor 18
3593 <p>Note that, for raw devices, there is no cache coherency between the
3594 raw device and the block device. In addition, requests must be 512-byte
3595 aligned both in memory and on disk. For example, the standard <b><font face="Courier New, Courier, mono
">dd</font></b>
3596 command cannot be used with raw devices because the memory buffer that
3597 the command passes to the write system call is not aligned on a 512-byte boundary.
3599 <p><a NAME="filesystems
"></a>
3601 2.4.4.7 Creating File Systems</h4>
3602 Use the <b><font face="Courier New, Courier, mono
">mkfs</font></b> command
3603 to create an <b><font face="Courier New, Courier, mono
">ext2</font></b>
3604 file system on a partition. Specify the drive letter and the partition
3605 number. For example:
3606 <pre># <b>mkfs /dev/sde3</b></pre>
3607 For optimal performance, use a 4 KB block size when creating shared file
3608 systems. Note that some of the <b><font face="Courier New, Courier, mono
">mkfs</font></b>
3609 file system build utilities default to a 1 KB block size, which can cause
3610 long <b><font face="Courier New, Courier, mono
">fsck</font></b> times.
3611 <p>Similarly, to create an <b>ext3</b> file system, use the following command:
3613 <p># <b>mkfs -t ext2 -j /dev/sde3</b>
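<p>If you also want to follow the 4 KB block size recommendation given above, one way is
to combine the journal and block size options when building the file system, using the
same example partition:
<pre># <b>mkfs -t ext2 -j -b 4096 /dev/sde3</b></pre>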
3616 <hr noshade width="80%
">
3618 <a NAME="software
"></a></h1>
3621 3 Cluster Software Installation and Configuration</h1>
3622 After you install and configure the cluster hardware, you must install
3623 the cluster software and initialize the cluster systems. The following sections describe these tasks:
3627 <a href="#software-steps
">Steps for installing and initializing the cluster software</a></li>
3631 <a href="#software-check
">Checking the cluster configuration</a></li>
3634 <a href="#software-logging
">Configuring syslog event logging</a></li>
3637 <a href="#software-ui
">Using the cluadmin utility</a></li>
3640 <a href="#software-gui
">Configuring and using the graphical user interface</a></li>
3645 <a NAME="software-steps
"></a></h2>
3648 3.1 Steps for Installing and Initializing the Cluster Software</h2>
3649 <i>Editorial comment: this section may be unnecessary as the cluster rpm
3650 is automatically installed.</i>
3651 <p>Before installing Red Hat Cluster Manager, be sure that you have installed
3652 all the required software and kernel patches, as described in <a href="#linux-dist
">Linux
3653 Distribution and Kernel Requirements</a>.
3654 <p>If you are updating the cluster software and want to preserve the existing
3655 cluster configuration database, you must back up the cluster database and
3656 stop the cluster software before you reinstall. See <a href="#cluster-reinstall
">Updating
3657 the Cluster Software</a> for more information.
3660 To install Red Hat Cluster Manager, invoke the <b><font face="Courier New, Courier, mono
">rpm
3661 --install clumanager-1.0.4-1.rpm</font></b> command. (The specific
3662 release numbers will change.)</li>
3668 <p>To initialize and start the cluster software, perform the following tasks:
3672 On both cluster systems, add a group named <b><font face="Courier New, Courier, mono
">cluster</font></b>
3673 to the <b><font face="Courier New, Courier, mono
">/etc/group</font></b> file.</li>
3677 Edit the <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b> file
3679 on both cluster systems and specify the raw device special files and character
3680 devices for the primary and backup quorum partitions. You also must set
3681 the mode for the raw devices so that all users have read permission, as in the example following this step. See
3682 <a href="#state-partitions
">Configuring
3683 the Quorum Partitions</a> and <a href="#software-rawdevices
">Editing the
3684 rawdevices File</a> for more information.</li>
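<p>The manual does not mandate particular commands for the two preceding steps. One way
to create the <b><font face="Courier New, Courier, mono">cluster</font></b> group and to
give all users read permission on the raw quorum devices (using the
<b><font face="Courier New, Courier, mono">/dev/raw/raw1</font></b> and
<b><font face="Courier New, Courier, mono">/dev/raw/raw2</font></b> names from the examples
in this manual) is:
<pre># <b>groupadd cluster</b>
# <b>chmod a+r /dev/raw/raw1 /dev/raw/raw2</b></pre>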
3688 Reboot the systems. The first time that you reboot, the cluster will log
3689 messages stating that the quorum daemon is unable to determine which device
3690 special file to use as a quorum partition. This message does not indicate
3691 a problem and can be ignored. It occurs because you have not yet run the
3692 <b><font face="Courier New, Courier, mono
">cluconfig</font></b> utility.</li>
3697 Run the <b><font face="Courier New, Courier, mono
">/sbin/cluconfig</font></b>
3698 utility on one cluster system. If you are updating the cluster software,
3699 the utility will ask whether you want to use the existing cluster database.
3700 If you do not choose to use the database, the utility will remove the cluster database.
3707 <p>If you are not using an existing cluster database, the utility will
3708 prompt you for the following cluster-specific information, which will be
3709 entered into the <b><font face="Courier New, Courier, mono
">member</font></b>
3710 fields in the cluster database, a copy of which is located in the <b><font face="Courier New, Courier, mono
">/etc/cluster.conf</font></b> file:
3715 Raw device special files for the primary and backup quorum partitions,
3716 as specified in the <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b>
3717 file (for example,<b><font face="Courier New, Courier, mono
"> /dev/raw/raw1</font></b>
3718 and <b><font face="Courier New, Courier, mono
">/dev/raw/raw2</font></b>)</li>
3722 Cluster system host names that are returned by the <b><font face="Courier New, Courier, mono
">hostname</font></b> command</li>
3726 Number of heartbeat connections (channels), both Ethernet and serial</li>
3730 Device special file for each heartbeat serial line connection (for example,
3731 <b><font face="Courier New, Courier, mono
">/dev/ttyS1</font></b>)</li>
3735 IP host name associated with each heartbeat Ethernet interface</li>
3739 Device special files for the serial ports to which the power switches are
3740 connected, if any (for example, <b><font face="Courier New, Courier, mono
">/dev/ttyS0</font></b>)</li>
3744 Power switch type (for example, <b><font face="Courier New, Courier, mono
">RPS10</font></b>
3745 or <b><font face="Courier New, Courier, mono
">None</font></b> if you are
3746 not using power switches)</li>
3748 See <a href="#software-config
">Example of the cluconfig Utility</a> for
3749 an example of running the utility.
3751 After you complete the cluster initialization on one cluster system, perform
3752 the following tasks on the other cluster system:</li>
3757 Run the <b><font face="Courier New, Courier, mono
">/sbin/cluconfig --init=<i>raw_file</i></font></b>
3758 command, where <b><i><font face="Courier New, Courier, mono
">raw_file</font></i></b>
3759 specifies the primary quorum partition. The script will use the information
3760 that you specified for the first cluster system as defaults. For example:</li>
3762 <pre># <b>cluconfig --init=/dev/raw/raw1</b></pre>
3766 Check the cluster configuration:</li>
3771 Invoke the <b><font face="Courier New, Courier, mono
">cludiskutil</font></b>
3772 utility with the <b><font face="Courier New, Courier, mono
">-t</font></b>
3773 option on both cluster systems to ensure that the quorum partitions map
3774 to the same physical device. See <a href="#cludiskutil
">Testing the Quorum
3775 Partitions</a> for more information.</li>
3779 If you are using power switches, invoke the <b>clustonith </b>command on
3780 both cluster systems to test the remote connections to the power switches.
3781 See <a href="#pswitch
">Testing the Power Switches</a> for more information.</li>
3785 Configure event logging so that cluster messages are logged to a separate
3786 file. See <a href="#software-logging
">Configuring syslog Event Logging</a>
3787 for information.</li>
3791 Start the cluster by invoking the <b><font face="Courier New, Courier, mono
">cluster
3792 start </font></b>command located in the System V <b><font face="Courier New, Courier, mono
">init</font></b>
3793 directory on both cluster systems. For example:</li>
3795 <pre># <b>service cluster start</b></pre>
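<p>After the cluster daemons start on both systems, you can verify that the members have
joined the cluster, for example with the <b><font face="Courier New, Courier, mono">clustat</font></b>
monitoring command mentioned later in this manual, or with the <b>cluadmin</b> <b>cluster
status</b> command:
<pre># <b>clustat</b></pre>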
3797 After you have initialized the cluster, you can add cluster services. See
3798 <a href="#software-ui
">Using
3799 the cluadmin Utility</a>, <a href="#software-gui
">Configuring and Using
3800 the Graphical User Interface</a>, and <a href="#service-configure
">Configuring
3801 a Service</a> for more information.
3803 <p><a NAME="software-rawdevices
"></a>
3805 3.1.1 Editing the rawdevices File</h3>
3806 The <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b>
3807 file is used to map the raw devices for the quorum partitions each time
3808 a cluster system boots. As part of the cluster software installation procedure,
3809 you must edit the <b><font face="Courier New, Courier, mono
">rawdevices</font></b> file
3811 on each cluster system and specify the raw character devices and block
3812 devices for the primary and backup quorum partitions. This enables the
3813 cluster graphical interface to work correctly.
3814 <p>If you are using raw devices in a cluster service, you can also use
3815 the <b><font face="Courier New, Courier, mono
">rawdevices</font></b> file
3816 to bind the devices at boot time. Edit the file and specify the raw character
3817 devices and block devices that you want to bind each time the system boots.
3818 <p>The following is an example rawdevices file which designates two quorum partitions:
3820 <pre># raw device bindings</pre>
3822 <pre># format: <rawdev> <major> <minor></pre>
3824 <pre># <rawdev> <blockdev></pre>
3826 <pre># example: /dev/raw/raw1 /dev/sda1</pre>
3828 <pre># /dev/raw/raw2 8 5</pre>
3830 <pre>/dev/raw/raw1 /dev/sdb1</pre>
3832 <pre>/dev/raw/raw2 /dev/sdb2</pre>
3834 <p><br>See <a href="#state-partitions
">Configuring Quorum Partitions</a>
3835 for more information about setting up the quorum partitions. See <a href="#rawdevices
">Creating
3836 Raw Devices</a> for more information on using the <b><font face="Courier New, Courier, mono
">raw</font></b>
3837 command to bind raw character devices to block devices.
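<p>On typical Red Hat installations of this era, the bindings listed in this file are
applied at boot time by a <b><font face="Courier New, Courier, mono">rawdevices</font></b>
init script. If that script is present on your systems, you can usually apply new
bindings without rebooting, for example:
<pre># <b>service rawdevices restart</b></pre>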
3838 <p><a NAME="software-config
"></a>
3840 3.1.2 Example of the cluconfig Utility</h3>
3841 This section includes an example of the <b><font face="Courier New, Courier, mono
">cluconfig</font></b>
3842 cluster configuration utility, which prompts you for information about
3843 the cluster members, and then enters the information into the cluster database,
3844 a copy of which is located in the <b><font face="Courier New, Courier, mono
">cluster.conf</font></b>
3845 file. In the example, the information entered at the <b><font face="Courier New, Courier, mono
">cluconfig</font></b>
3846 prompts applies to the following configuration:
3849 On the <b><font face="Courier New, Courier, mono
">storage0</font></b> cluster system:
3856 <p>Ethernet heartbeat channels: <b><font face="Courier New, Courier, mono
">storage0</font></b>
3857 and <b><font face="Courier New, Courier, mono
">cstorage0</font></b>
3858 <br>Serial heartbeat channel: <b><font face="Courier New, Courier, mono
">/dev/ttyS1</font></b>
3859 <br>Power switch serial port: <b><font face="Courier New, Courier, mono
">/dev/ttyC0</font></b>
3860 <br>Power switch: <b><font face="Courier New, Courier, mono
">RPS10</font></b>
3861 <br>Quorum partitions: <b><font face="Courier New, Courier, mono
">/dev/raw/raw1</font></b>
3862 and <b><font face="Courier New, Courier, mono
">/dev/raw/raw2</font></b>
3865 On the <b><font face="Courier New, Courier, mono
">storage1</font></b> cluster system:
3872 <p>Ethernet heartbeat channels:<b><font face="Courier New, Courier, mono
">
3873 storage1</font></b> and <b><font face="Courier New, Courier, mono
">cstorage1</font></b>
3874 <br>Serial heartbeat channel: <b><font face="Courier New, Courier, mono
">/dev/ttyS1</font></b>
3875 <br>Power switch serial port: <b><font face="Courier New, Courier, mono
">/dev/ttyS0</font></b>
3876 <br>Power switch: <b><font face="Courier New, Courier, mono
">RPS10</font></b>
3877 <br>Quorum partitions: <b><font face="Courier New, Courier, mono
">/dev/raw/raw1</font></b>
3878 and <b><font face="Courier New, Courier, mono
">/dev/raw/raw2</font></b></ul>
3879 <i>Editorial comment: need to put an updated screen capture of cluconfig here.</i>
3881 <pre><font size=-1># <b>/sbin/cluconfig
3882 </b>------------------------------------
3883 Cluster Member Configuration Utility
3884 ------------------------------------
3885 Version: 1.1.2 Built: Thu Oct 26 12:09:30 EDT 2000
3887 This utility sets up the member systems of a 2-node cluster.
3888 It prompts you for the following information:
3891 o Number of heartbeat channels
3892 o Information about the type of channels and their names
3893 o Raw quorum partitions, both primary and shadow
3894 o Power switch type and device name
3896 In addition, it performs checks to make sure that the information
3897 entered is consistent with the hardware, the Ethernet ports, the raw
3898 partitions and the character device files.
3900 After all the information is entered, it initializes the partitions
3901 and saves the configuration information to the quorum partitions.
3903 - Checking that cluster daemons are stopped: done
3905 Your cluster configuration should include power switches for optimal
3908 - Does the cluster configuration include power switches? (yes/no) [yes]: <b>y
3910 </b>----------------------------------------
3911 Setting information for cluster member 0
3912 ----------------------------------------
3913 Enter name of cluster member [storage0]: <b>storage0
3914 </b>Looking for host storage0 (may take a few seconds)...
3916 Cluster member name set to: storage0
3918 Enter number of heartbeat channels (minimum = 1) [1]: <b>3
3919 </b>You selected 3 channels
3920 Information about channel 0:
3921 Channel type: net or serial [net]: <b>net
3922 </b>Channel type set to: net
3923 Enter hostname of cluster member storage0 on heartbeat channel 0 [storage0]: <b>storage0
3924 </b>Looking for host storage0 (may take a few seconds)...
3926 Hostname corresponds to an interface on member 0
3927 Channel name set to: storage0
3929 Information about channel 1:
3930 Channel type: net or serial [net]: <b>net
3931 </b>Channel type set to: net
3932 Enter hostname this interface responds to [storage0]: <b>cstorage0
3933 </b>Looking for host cstorage0 (may take a few seconds)...
3934 Host cstorage0 found
3935 Hostname corresponds to an interface on member 0
3936 Channel name set to: cstorage0
3938 Information about channel 2:
3939 Channel type: net or serial [net]: <b>serial
3940 </b>Channel type set to: serial
3941 Enter device name [/dev/ttyS1]: <b>/dev/ttyS1
3942 </b>Device /dev/ttyS1 found and no getty running on it
3943 Device name set to: /dev/ttyS1
3945 Setting information about Quorum Partitions
3946 Enter Primary Quorum Partition [/dev/raw/raw1]: <b>/dev/raw/raw1
3947 </b>Raw device /dev/raw/raw1 found
3948 Primary Quorum Partition set to /dev/raw/raw1
3949 Enter Shadow Quorum Partition [/dev/raw/raw2]: <b>/dev/raw/raw2
3950 </b>Raw device /dev/raw/raw2 found
3951 Shadow Quorum Partition set to /dev/raw/raw2
3953 Information about power switch connected to member 0
3954 Enter serial port for power switch [/dev/ttyC0]: <b>/dev/ttyC0
3955 </b>Device /dev/ttyC0 found and no getty running on it
3956 Serial port for power switch set to /dev/ttyC0
3957 Specify one of the following switches (RPS10/APC) [RPS10]: <b>RPS10
3958 </b>Power switch type set to RPS10
3960 ----------------------------------------
3961 Setting information for cluster member 1
3962 ----------------------------------------
3963 Enter name of cluster member: <b>storage1
3964 </b>Looking for host storage1 (may take a few seconds)...
3966 Cluster member name set to: storage1
3968 You previously selected 3 channels
3969 Information about channel 0:
3970 Channel type selected as net
3971 Enter hostname of cluster member storage1 on heartbeat channel 0: <b>storage1
3972 </b>Looking for host storage1 (may take a few seconds)...
3974 Channel name set to: storage1
3976 Information about channel 1:
3977 Channel type selected as net
3978 Enter hostname this interface responds to [storage1]: <b>cstorage1
3980 </b>Information about channel 2:
3981 Channel type selected as serial
3982 Enter device name [/dev/ttyS1]: <b>/dev/ttyS1
3983 </b>Device name set to: /dev/ttyS1
3985 Setting information about Quorum Partitions
3986 Enter Primary Quorum Partition [/dev/raw/raw1]: <b>/dev/raw/raw1
3987 </b>Primary Quorum Partition set to /dev/raw/raw1
3988 Enter Shadow Quorum Partition [/dev/raw/raw2]: <b>/dev/raw/raw2
3989 </b>Shadow Quorum Partition set to /dev/raw/raw2
3991 Information about power switch connected to member 1
3992 Enter serial port for power switch [/dev/ttyS0]: <b>/dev/ttyS0
3993 </b>Serial port for power switch set to /dev/ttyS0
3994 Specify one of the following switches (RPS10/APC) [RPS10]: <b>RPS10
3995 </b>Power switch type set to RPS10
3997 ------------------------------------
3998 The following choices will be saved:
3999 ------------------------------------
4000 ---------------------
4001 Member 0 information:
4002 ---------------------
4004 Primary quorum partition set to /dev/raw/raw1
4005 Shadow quorum partition set to /dev/raw/raw2
4006 Heartbeat channels: 3
4007 Channel type: net. Name: storage0
4008 Channel type: net. Name: cstorage0
4009 Channel type: serial. Name: /dev/ttyS1
4010 Power Switch type: RPS10. Port: /dev/ttyC0
4012 ---------------------
4013 Member 1 information:
4014 ---------------------
4016 Primary quorum partition set to /dev/raw/raw1
4017 Shadow quorum partition set to /dev/raw/raw2
4018 Heartbeat channels: 3
4019 Channel type: net. Name: storage1
4020 Channel type: net. Name: cstorage1
4021 Channel type: serial. Name: /dev/ttyS1
4022 Power Switch type: RPS10. Port: /dev/ttyS0
4023 ------------------------------------
4025 Save changes? yes/no [yes]: <b>yes
4026 </b>Writing to output configuration file...done.
4027 Changes have been saved to /etc/cluster.conf
4028 ----------------------------
4029 Setting up Quorum Partitions
4030 ----------------------------
4031 Quorum partitions have not been set up yet.
4032 Run cludiskutil -I to set up the quorum partitions now? yes/no [yes]: <b>yes</b></font></pre>
4034 <pre><font size=-1>Saving configuration information to quorum partition:
4035 ------------------------------------------------------------------
4036 Setup on this member is complete. If errors have been reported,
4039 If you have not already set up the other cluster member, invoke the following
4040 command on the other cluster member:
4042 # /sbin/cluconfig --init=/dev/raw/raw1
4044 After running cluconfig on the other member system, you can start the
4045 cluster daemons on each cluster system by invoking the cluster start
4046 script located in the System V init directory. For example:
4048 # /etc/rc.d/init.d/cluster start</font>
4052 <p><br><a NAME="software-check
"></a>
4054 3.2 Checking the Cluster Configuration</h2>
4055 To ensure that you have correctly configured the cluster software, check
4056 the configuration by using tools located in the <b><font face="Courier New, Courier, mono
">/sbin</font></b> directory:
4060 Test the quorum partitions and ensure that they are accessible</li>
4066 <p>Invoke the <b><font face="Courier New, Courier, mono
">cludiskutil</font></b>
4067 utility with the <b><font face="Courier New, Courier, mono
">-t</font></b>
4068 option to test the accessibility of the quorum partitions. See <a href="#cludiskutil
">Testing
4069 the Quorum Partitions</a> for more information.
4072 Test the operation of the power switches</li>
4078 <p>If you are using power switches, run the <b>clustonith</b> command on
4079 each cluster system to ensure that it can remotely power-cycle the other
4080 cluster system. Do not run this command while the cluster software is running.
4081 See <a href="#pswitch
">Testing Power Switches</a> for more information.
4084 Ensure that both cluster systems are running the same software version</li>
4090 <p>Invoke the <b><font face="Courier New, Courier, mono
">rpm -qa clumanager</font></b>
4091 command on each cluster system to display the revision of the installed cluster RPM.</li>
4093 The following sections describe these tools.
4096 <p><a NAME="cludiskutil
"></a>
4098 3.2.1 Testing the Quorum Partitions</h3>
4099 The quorum partitions must refer to the same physical device on both cluster
4100 systems. Invoke the <b><font face="Courier New, Courier, mono
">cludiskutil</font></b>
4101 utility with the <b><font face="Courier New, Courier, mono
">-t</font></b> option
4102 to test the quorum partitions and verify that they are accessible.
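<p>For example, run the following on each cluster system; the utility reports whether the
quorum partitions are accessible (the exact output is not reproduced here):
<pre># <b>/sbin/cludiskutil -t</b></pre>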
4103 <p>If the command succeeds, run the <b><font face="Courier New, Courier, mono
">cludiskutil
4104 -p</font></b> command on both cluster systems to display a summary of the
4105 header data structure for the quorum partitions. If the output is different
4106 on the systems, the quorum partitions do not point to the same devices
4107 on both systems. Check to make sure that the raw devices exist and are
4108 correctly specified in the <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b>
4109 file. See <a href="#state-partitions
">Configuring the Quorum Partitions</a>
4110 for more information.
4111 <p>The following example shows that the quorum partitions refer to the
4112 same physical device on two cluster systems:
4113 <pre><font size=-1>[root@devel0 /root]# <b>cludiskutil -p
4114 </b>----- Shared State Header ------
4117 Updated on Thu Sep 14 05:43:18 2000
4119 --------------------------------
4120 [root@devel0 /root]#
4122 [root@devel1 /root]# <b>/sbin/cludiskutil -p
4123 </b>----- Shared State Header ------
4126 Updated on Thu Sep 14 05:43:18 2000
4128 --------------------------------
4129 [root@devel1 /root]#</font></pre>
4130 The <b><font face="Courier New, Courier, mono
">Magic#</font></b> and <b><font face="Courier New, Courier, mono
">Version</font></b> fields
4131 will be the same for all cluster configurations. The last two lines of
4132 output indicate the date that the quorum partitions were initialized with
4133 <b><font face="Courier New, Courier, mono
">cludiskutil -I,</font></b> and
4134 the numeric identifier for the cluster system that invoked the initialization.
4136 <p>If the output of the <b><font face="Courier New, Courier, mono
">cludiskutil</font></b>
4137 utility with the <b><font face="Courier New, Courier, mono
">-p</font></b>
4138 option is not the same on both cluster systems, you can do the following:
4141 Examine the <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b> file
4143 on each cluster system and ensure that you have accurately specified the
4144 raw character devices and block devices for the primary and backup quorum
4145 partitions. If not, edit the file and correct any mistakes. Then re-run
4146 the <b><font face="Courier New, Courier, mono
">cluconfig</font></b> utility.
4147 See <a href="#software-rawdevices
">Editing the rawdevices File</a> for
4148 more information.</li>
4152 Ensure that you have created the raw devices for the quorum partitions
4153 on each cluster system. See <a href="#state-partitions
">Configuring the
4154 Quorum Partitions</a> for more information.</li>
4158 On each cluster system, examine the system startup messages at the point
4159 where the system probes the SCSI subsystem to determine the bus configuration.
4160 Verify that both cluster systems identify the same shared storage devices
4161 and assign them the same name.</li>
4165 Verify that a cluster system is not attempting to mount a file system on
4166 the quorum partition. For example, make sure that the actual device (for
4167 example, <b><font face="Courier New, Courier, mono
">/dev/sdb1</font></b>)
4168 is not included in an <b><font face="Courier New, Courier, mono
">/etc/fstab</font></b> file.</li>
4171 After you perform these tasks, re-run the <b><font face="Courier New, Courier, mono
">cludiskutil</font></b>
4172 utility with the <b><font face="Courier New, Courier, mono
">-p</font></b> option.
4175 <p><a NAME="pswitch
"></a>
4177 3.2.2 Testing the Power Switches</h3>
4178 If you are using power switches, after you install the cluster software,
4179 but before starting the cluster, use the <b>clustonith</b> command to test
4180 the power switches. Invoke the command on each cluster system to ensure
4181 that it can remotely power-cycle the other cluster system.
4182 <p>The <b>clustonith</b> command can accurately test a power switch only
4183 if the cluster software is not running. This is because, for serial-attached
4184 switches, only one program at a time can access the
4185 serial port that connects a power switch to a cluster system. When you
4186 invoke the <b>clustonith</b> command, it checks the status of the cluster
4187 software. If the cluster software is running, the command exits with a
4188 message to stop the cluster software.
4189 <p>The format of the <b>clustonith</b> command is as follows:
4190 <pre>clustonith [-sSlLvr] [-t devicetype] [-F options-file] [-p stonith-parameters]
4192 -s Silent mode, suppresses error and log messages
4193 -S Display switch status
4194 -l List the hosts a switch can access
4195 -L List the set of supported switch types
4196 -r hostname Power cycle the specified host
4197 -v Increases verbose debugging level</pre>
4199 <pre><i>Editorial note: we need a new manpage for clustonith(8). Once that is in place, there's no need to include all that info here.</i></pre>
4200 When testing power switches, the first step is to ensure that each cluster
4201 member can successfully communicate with its attached power switch. The
4202 following example of the <b>clustonith</b> command output shows that the
4203 cluster member is able to communicate with its power switch:
4204 <p># <b>clustonith -S</b>
4205 <br>WTI Network Power Switch device OK.
4206 <br>An example output of the <b>clustonith</b> command when it is unable
4207 to communicate with its power switch appears below:
4208 <br># <b>clustonith -S</b>
4209 <br>Unable to determine power switch type.
4210 <br>Unable to determine default power switch type.
4212 <p>The above error could be indicative of the following types of problems:
4216 For serial attached power switches:</li>
4220 Verify that the device special file for the remote power switch connection
4221 serial port (for example, <b><font face="Courier New, Courier, mono
">/dev/ttyS0</font></b>)
4222 is specified correctly in the cluster database, as established via the <b>cluconfig</b> utility.</li>
4225 If necessary, use a terminal emulation package like <b><font face="Courier New, Courier, mono
">minicom</font></b>
4226 to test if the cluster system can access the serial port.</li>
4230 Ensure that a non-cluster program (for example, a getty program) is not
4231 using the serial port for the remote power switch connection. You can use
4232 the <b><font face="Courier New, Courier, mono
">lsof</font></b> command
4233 to perform this task, as shown in the example after this list.</li>
4237 Check that the cable connection to the remote power switch is correct.
4238 Verify that you are using the correct type of cable (for example, an RPS-10
4239 power switch requires a null modem cable), and all connections are secure.</li>
4243 Verify that any physical dip switches or rotary switches on the power switch
4244 are set properly. If you are using an RPS-10 power switch, see <a href="#rps-10
">Setting
4245 Up an RPS-10 Power Switch</a> for more information.</li>
4249 For network based power switches:</li>
4253 Verify that the network connection to network based switches is operational.
4254 Most switches have a <i>link</i> light which indicates connectivity.</li>
4257 You should be able to <b>ping</b> the network switch; if not, it may not
4258 be properly configured for its network parameters.</li>
4261 Verify that the correct password and login name (depending on switch type)
4262 have been specified in the cluster configuration database (as established
4263 by running <b>cluconfig</b>). A useful diagnostic approach is to
4264 verify that you can <b>telnet</b> to the network switch using the same
4265 parameters as specified in the cluster configuration.</li>
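<p>For the serial port check suggested in the list above, a command such as the following
(assuming the power switch is attached to <b><font face="Courier New, Courier, mono">/dev/ttyS0</font></b>,
as in the earlier examples) lists any process that currently has the port open:
<pre># <b>lsof /dev/ttyS0</b></pre>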
4268 After you have successfully verified communication with the switch, you
4269 can then attempt to power cycle the other cluster member. Before
4270 doing this, verify that the other cluster member
4271 is not actively performing any important functions (such as serving cluster
4272 services to active clients). The following example shows a successful
4273 power cycle operation:
4274 <p>[root@clu4 /]# <b>clustonith -r clu3</b>
4275 <br>Successfully power cycled host clu3.
4276 <p><a NAME="release
"></a>
4278 3.2.3 Displaying the Cluster Software Version</h3>
4279 Invoke the <b><font face="Courier New, Courier, mono
">rpm -qa clumanager</font></b> command
4281 to display the revision of the installed cluster RPM. Ensure that both
4282 cluster systems are running the same version.
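<p>For example, using the package name from the installation step earlier in this chapter
(the version string will vary with your installation):
<pre># <b>rpm -qa clumanager</b>
clumanager-1.0.4-1</pre>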
4283 <br><a NAME="software-logging
"></a>
4285 3.3 Configuring syslog Event Logging</h2>
4286 You should edit the <b><font face="Courier New, Courier, mono
">/etc/syslog.conf</font></b>
4287 file to enable the cluster to log events to a file that is different from
4288 the <b><font face="Courier New, Courier, mono
">/var/log/messages</font></b> default
4289 log file. Logging cluster messages to a separate file will help you diagnose problems in the cluster.
4291 <p>The cluster systems use the <b><font face="Courier New, Courier, mono
">syslogd</font></b>
4292 daemon to log cluster-related events to a file, as specified in the <b><font face="Courier New, Courier, mono
">/etc/syslog.conf</font></b>
4293 file. You can use the log file to diagnose problems in the cluster. It
4294 is recommended that you set up event logging so that the <b><font face="Courier New, Courier, mono
">syslogd</font></b>
4295 daemon logs cluster messages only from the system on which it is running.
4296 Therefore, you need to examine the log files on both cluster systems to
4297 get a comprehensive view of the cluster.
4298 <p>The <b><font face="Courier New, Courier, mono
">syslogd</font></b> daemon
4299 logs messages from the following cluster daemons:
4302 <b><font face="Courier New, Courier, mono
">cluquorumd</font></b> - Quorum daemon</li>
4306 <b><font face="Courier New, Courier, mono
">clusvcmgrd</font></b> - Service manager daemon</li>
4310 <b><font face="Courier New, Courier, mono
">clupowerd</font></b> - Power daemon</li>
4314 <b><font face="Courier New, Courier, mono
">cluhbd</font></b> - Heartbeat daemon</li>
4317 The importance of an event determines the severity level of the log entry.
4318 Important events should be investigated before they affect cluster availability.
4319 The cluster can log messages with the following severity levels, listed
4320 in the order of decreasing severity:
4323 <b><font face="Courier New, Courier, mono
">emerg</font></b> - The cluster
4324 system is unusable.</li>
4327 <b><font face="Courier New, Courier, mono
">alert</font></b> - Action must
4328 be taken immediately to address the problem.</li>
4331 <b><font face="Courier New, Courier, mono
">crit</font></b> - A critical
4332 condition has occurred.</li>
4335 <b><font face="Courier New, Courier, mono
">err</font></b> - An error has occurred.</li>
4339 <b><font face="Courier New, Courier, mono
">warning</font></b> - A significant
4340 event that may require attention has occurred.</li>
4343 <b><font face="Courier New, Courier, mono
">notice</font></b> - An event
4344 that does not affect system operation has occurred.</li>
4347 <b><font face="Courier New, Courier, mono
">info</font></b> - A normal
4348 cluster operation has occurred.</li>
4350 The default logging severity levels for the cluster daemons are <b><font face="Courier New, Courier, mono
">warning</font></b>.
4352 <p>Examples of log file entries are as follows:
4353 <pre><font size=-1>May 31 20:42:06 clu2 clusvcmgrd[992]: <info> Service Manager starting
4354 May 31 20:42:06 clu2 clusvcmgrd[992]: <info> mount.ksh info: /dev/sda3 is not mounted
4355 May 31 20:49:38 clu2 clulog[1294]: <notice> stop_service.ksh notice: Stopping service dbase_home
4356 May 31 20:49:39 clu2 clusvcmgrd[1287]: <notice> Service Manager received a NODE_UP event for stor5
4357 Jun 01 12:56:51 clu2 cluquorumd[1640]: <err> updateMyTimestamp: unable to update status block.
4358 Jun 01 12:34:24 clu2 cluquorumd[1268]: <warning> Initiating cluster stop
4359 Jun 01 12:34:24 clu2 cluquorumd[1268]: <warning> Completed cluster stop
4360 Jul 27 15:28:40 clu2 cluquorumd[390]: <err> shoot_partner: successfully shot partner. </font>
4361 <b>[1] [2] [3] [4] [5]</b></pre>
4362 Each entry in the log file contains the following information:
4363 <blockquote>[1] Timestamp
4364 <br>[2] Cluster system on which the event was logged
4365 <br>[3] Subsystem that generated the event
4366 <br>[4] Severity level of the event
4367 <br>[5] Description of the event</blockquote>
4368 After you configure the cluster software, you should edit the <b><font face="Courier New, Courier, mono
">/etc/syslog.conf</font></b>
4369 file to enable the cluster to log events to a file that is different from
4370 the default log file, <b><font face="Courier New, Courier, mono
">/var/log/messages</font></b>.
4371 Using a cluster-specific log file facilitates cluster monitoring and problem
4372 solving. To log cluster events to both the <b><font face="Courier New, Courier, mono
">/var/log/cluster</font></b>
4373 and <b><font face="Courier New, Courier, mono
">/var/log/messages</font></b>
4374 files, add lines similar to the following to the <b><font face="Courier New, Courier, mono
">/etc/syslog.conf</font></b> file:
<pre>
4377 # Cluster messages coming in on local4 go to /var/log/cluster
4379 local4.* /var/log/cluster</pre>
4380 To prevent the duplication of messages and log cluster events only to the
4381 <b><font face="Courier New, Courier, mono
">/var/log/cluster</font></b>
4382 file, also add lines similar to the following to the <b><font face="Courier New, Courier, mono
">/etc/syslog.conf</font></b> file:
4384 <pre># Log anything (except mail) of level info or higher.
4385 # Don't log private authentication messages!
4386 *.info;mail.none;news.none;authpriv.none;local4.none /var/log/messages</pre>
4387 To apply the previous changes, you can invoke the <b><font face="Courier New, Courier, mono
">killall
4388 -HUP syslogd</font></b> command, or restart <b><font face="Courier New, Courier, mono
">syslog</font></b>
4389 with a command similar to <b><font face="Courier New, Courier, mono
">/etc/rc.d/init.d/syslog restart</font></b>.
4391 <p>In addition, you can modify the severity level of the events that are
4392 logged by the individual cluster daemons. See <a href="#cluster-logging
">Modifying
4393 Cluster Event Logging</a> for more information.
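<p>To confirm that cluster messages are being written to the separate file, you can watch
the file while the cluster daemons are running (the path matches the
<b><font face="Courier New, Courier, mono">/var/log/cluster</font></b> example above):
<pre># <b>tail -f /var/log/cluster</b></pre>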
4395 <p><a NAME="software-ui
"></a>
4397 3.4 Using the cluadmin Utility</h2>
4398 The <b><font face="Courier New, Courier, mono
">cluadmin</font></b> utility
4399 provides a command-line user interface that enables you to monitor and
4400 manage the cluster systems and services. For example, you can use the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4401 utility to perform the following tasks:
4404 Add, modify, and delete services</li>
4407 Disable and enable services</li>
4410 Display cluster and service status</li>
4413 Modify cluster daemon event logging</li>
4416 Back up and restore the cluster database</li>
4419 <p><br>The cluster uses an advisory lock to prevent the cluster database
4420 from being simultaneously modified by multiple users on either cluster
4421 system. You can only modify the database if you hold the advisory lock.
4422 <p>When you invoke the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4423 utility, the cluster software checks if the lock is already assigned to
4424 a user. If the lock is not already assigned, the cluster software assigns
4425 you the lock. When you exit from the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4426 utility, you relinquish the lock.
4427 <p>If another user holds the lock, a warning will be displayed indicating
4428 that there is already a lock on the database. The cluster software gives
4429 you the option of taking the lock. If you take the lock, the previous holder
4430 of the lock can no longer modify the cluster database.
4431 <p>You should take the lock only if necessary, because uncoordinated simultaneous
4432 configuration sessions may cause unpredictable cluster behavior. In addition,
4433 it is recommended that you make only one change to the cluster database
4434 (for example, adding, modifying, or deleting services) at a time.
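<p>For example, a minimal interactive session that displays the cluster status (the
<b><font face="Courier New, Courier, mono">cluster status</font></b> command is described
later in this section) might look like this:
<pre># <b>cluadmin</b>
cluadmin> <b>cluster status</b></pre>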
4435 <p>You can specify the following <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4436 command line options:
4439 <b><font face="Courier New, Courier, mono
">-d</font></b> or <b><font face="Courier New, Courier, mono
">--debug</font></b></dt>
4442 Displays extensive diagnostic information.</dd>
4446 <b><font face="Courier New, Courier, mono
">-h</font></b>, <b><font face="Courier New, Courier, mono
">-?</font></b>,
4447 or <b><font face="Courier New, Courier, mono
">--help</font></b></dt>
4450 Displays help about the utility, and then exits.</dd>
4454 <b><font face="Courier New, Courier, mono
">-n</font></b> or <b><font face="Courier New, Courier, mono
">--nointeractive</font></b></dt>
4457 Bypasses the cluadmin utility's top-level command loop processing. This
4458 option is used for cluadmin debugging purposes.</dd>
4462 <b><font face="Courier New, Courier, mono
">-t</font></b> or <b><font face="Courier New, Courier, mono
">--tcl</font></b></dt>
4465 Adds a Tcl command to the cluadmin utility's top-level command interpreter.
4466 To pass a Tcl command directly to the utility's internal Tcl interpreter,
4467 at the <b><font face="Courier New, Courier, mono
">cluadmin></font></b>
4468 prompt, preface the Tcl command with <b><font face="Courier New, Courier, mono
">tcl</font></b>.
4469 This option is used for cluadmin debugging purposes.</dd>
4473 <b><font face="Courier New, Courier, mono
">-V</font></b> or <b><font face="Courier New, Courier, mono
">--version</font></b></dt>
4476 Displays information about the current version of cluadmin.</dd>
4478 When you invoke the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4479 utility without the <b><font face="Courier New, Courier, mono
">-n</font></b>
4480 option, the <b><font face="Courier New, Courier, mono
">cluadmin></font></b>
4481 prompt appears. You can then specify commands and subcommands. The following
4482 table describes the commands and subcommands for the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4485 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%
" >
4486 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#99CCCC">
4489 cluadmin Command</h3>
4494 cluadmin Subcommand</h3>
4504 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"><b><font face="Courier New, Courier, mono
">help</font></b></td>
4506 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
">None</td>
4508 <td WIDTH="67%
">Displays help for the specified <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4509 command or subcommand. For example:
4510 <pre>cluadmin> <b>help service add </b></pre>
4515 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"><b><font face="Courier New, Courier, mono
">cluster</font></b></td>
4517 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">status</font></b></td>
4519 <td WIDTH="67%
">Displays a snapshot of the current cluster status. See
4520 <a href="#cluster-status
">Displaying
4521 Cluster and Service Status</a> for information. For example:
4522 <pre>cluadmin> <b>cluster status</b></pre>
4527 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"></td>
4529 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">monitor</font></b></td>
4531 <td WIDTH="67%
">Continuously displays snapshots of the cluster status at
4532 five-second intervals. Press the <b><font face="Courier New, Courier, mono
">Return</font></b>
4533 or <b><font face="Courier New, Courier, mono
">Enter</font></b> key to stop
4534 the display. You can specify the <b><font face="Courier New, Courier, mono
">-interval</font></b>
4535 option with a numeric argument to display snapshots at the specified time
4536 interval (in seconds). In addition, you can specify the <b><font face="Courier New, Courier, mono
">-clear</font></b>
4537 option with a yes argument to clear the screen after each snapshot display
4538 or with a no argument to not clear the screen. See <a href="#cluster-status
">Displaying
4539 Cluster and Service Status</a> for information. For example:
4540 <pre>cluadmin> <b>cluster monitor -clear yes -interval 10</b></pre>
4545 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4547 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">loglevel</font></b></td>
4549 <td WIDTH="67%
">Sets the logging for the specified cluster daemon to the
4550 specified severity level. See <a href="#cluster-logging
">Modifying Cluster
4551 Event Logging </a>for information. For example:
4552 <pre>cluadmin> <b>cluster loglevel cluquorumd 7 </b></pre>
4557 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4559 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">reload</font></b></td>
4561 <td WIDTH="67%
">Forces the cluster daemons to re-read the cluster configuration
4562 database. See <a href="#cluster-reload
">Reloading the Cluster Database</a>
4563 for information. For example:
4564 <pre>cluadmin> <b>cluster reload </b></pre>
4569 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4571 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">name</font></b></td>
4573 <td WIDTH="67%
">Sets the name of the cluster to the specified name. The
4574 cluster name is included in the output of the <b><font face="Courier New, Courier, mono
">clustat</font></b>
4575 cluster monitoring command. See <a href="#cluster-name
">Changing the Cluster
4576 Name</a> for information. For example:
4577 <pre>cluadmin> <b>cluster name dbasecluster</b></pre>
4582 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4584 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">backup</font></b></td>
4586 <td WIDTH="67%
">Saves a copy of the cluster configuration database in the
4587 <b><font face="Courier New, Courier, mono
">/etc/cluster.conf.bak</font></b>
4588 file. See <a href="#cluster-backup
">Backing Up and Restoring the Cluster
4589 Database</a> for information. For example:
4590 <pre>cluadmin> <b>cluster backup </b></pre>
4595 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4597 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">restore</font></b></td>
4599 <td WIDTH="67%
">Restores the cluster configuration database from the backup
4600 copy in the <b><font face="Courier New, Courier, mono
">/etc/cluster.conf.bak</font></b>
4601 file. See <a href="#cluster-backup
">Backing Up and Restoring the Cluster
4602 Database</a> for information. For example:
4603 <pre>cluadmin> <b>cluster restore</b></pre>
4608 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4610 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">saveas</font></b></td>
4612 <td WIDTH="67%
">Saves the cluster configuration database to the specified
4613 file. See <a href="#cluster-backup
">Backing Up and Restoring the Cluster
4614 Database</a> for information. For example:
4615 <pre>cluadmin> <b>cluster saveas cluster_backup.conf</b> </pre>
4620 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4622 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">restorefrom</font></b></td>
4624 <td WIDTH="67%
">Restores the cluster configuration database from the specified
4625 file. See <a href="#cluster-backup
">Backing Up and Restoring the Cluster
4626 Database</a> for information. For example:
4627 <pre>cluadmin> <b>cluster restorefrom cluster_backup.conf </b></pre>
4632 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
" HEIGHT="83"><b><font face="Courier New, Courier, mono
">service</font></b></td>
4634 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
" HEIGHT="83"><b><font face="Courier New, Courier, mono
">add</font></b></td>
4636 <td WIDTH="67%
" HEIGHT="83">Adds a cluster service to the cluster database.
4637 The command prompts you for information about service resources and properties.
4639 See <a href="#service-configure
">Configuring a Service</a> for information. For example:
4641 <pre>cluadmin> <b>service add </b></pre>
4646 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4648 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">modify</font></b></td>
4650 <td WIDTH="67%
">Modifies the resources or properties of the specified service.
4651 You can modify any of the information that you specified when the service
4652 was created. See <a href="#service-modify
">Modifying a Service</a> for
4653 information. For example:
4654 <pre>cluadmin> <b>service modify dbservice </b></pre>
4659 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4661 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">show
4662 state</font></b></td>
4664 <td WIDTH="67%
">Displays the current status of all services or the specified
4665 service. See <a href="#cluster-status
">Displaying Cluster and Service Status</a>for
4666 information. For example:
4667 <pre>cluadmin> <b>service show state dbservice</b></pre>
4675 <center><b>relocate</b></center>
4678 <td>Editorial comment: Need to add relocate description and corresponding example.
4683 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4685 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">show
4686 config</font></b></td>
4688 <td WIDTH="67%
">Displays the current configuration for the specified service.
4689 See <a href="#service-status
">Displaying a Service Configuration</a> for
4690 information. For example:
4691 <pre>cluadmin> <b>service show config dbservice</b></pre>
4696 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4698 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">disable</font></b></td>
4700 <td WIDTH="67%
">Stops the specified service. You must enable a service
4701 to make it available again. See <a href="#service-disable
">Disabling a
4702 Service</a> for information. For example:
4703 <pre>cluadmin> <b>service disable dbservice</b></pre>
4708 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4710 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">enable</font></b></td>
4712 <td WIDTH="67%
">Starts the specified disabled service. See <a href="#service-enable
">Enabling
4713 a Service</a> for information. For example:
4714 <pre>cluadmin> <b>service enable dbservice </b></pre>
4719 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"> </td>
4721 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
"><b><font face="Courier New, Courier, mono
">delete</font></b></td>
4723 <td WIDTH="67%
">Deletes the specified service from the cluster configuration
4724 database. See <a href="#service-delete
">Deleting a Service</a> for information.
4726 <pre>cluadmin> <b>service delete dbservice </b></pre>
4731 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"><b><font face="Courier New, Courier, mono
">apropos</font></b></td>
4733 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
">None</td>
4735 <td WIDTH="67%
">Displays the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4736 commands that match the specified character string argument or, if no argument
4737 is specified, displays all <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4738 commands. For example:
4739 <pre>cluadmin> <b>apropos service</b></pre>
4744 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
"><b><font face="Courier New, Courier, mono
">clear</font></b></td>
4746 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
">None</td>
4748 <td WIDTH="67%
">Clears the screen display. For example:
4749 <pre>cluadmin> <b>clear </b></pre>
4754 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%
" HEIGHT="27"><b><font face="Courier New, Courier, mono
">exit</font></b></td>
4756 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%
" HEIGHT="27">None</td>
4758 <td WIDTH="67%
" HEIGHT="27">Exits from <b><font face="Courier New, Courier, mono
">cluadmin</font></b>.
4760 <pre>cluadmin> <b>exit</b></pre>
4766 <center><b>quit</b></center>
4770 <center>None</center>
4773 <td>Exits from <b><font face="Courier New, Courier, mono
">cluadmin</font></b>.
4775 <br>cluadmin> <b>quit</b></td>
4779 <p>While using the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4780 utility, you can press the <b><font face="Courier New, Courier, mono
">Tab</font></b>
4781 key to help identify <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4782 commands. For example, pressing the <b><font face="Courier New, Courier, mono
">Tab</font></b>
4783 key at the <b><font face="Courier New, Courier, mono
">cluadmin></font></b>
4784 prompt displays a list of all the commands. Entering a letter at the prompt
4785 and then pressing the <b><font face="Courier New, Courier, mono
">Tab</font></b>
4786 key displays the commands that begin with the specified letter. Specifying
4787 a command and then pressing the <b><font face="Courier New, Courier, mono
">Tab</font></b>
4788 key displays a list of all the subcommands that can be specified with that command.
4790 <p>In addition, you can display the history of <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4791 commands by pressing the up arrow and down arrow keys at the prompt. The
4792 command history is stored in the <b><font face="Courier New, Courier, mono
">.cluadmin_history</font></b>
4793 file in your home directory.
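<p>For example, to review the most recent commands from a previous
<b><font face="Courier New, Courier, mono">cluadmin</font></b> session, you
could display the end of the history file (a simple illustration; the file
is plain text):
<pre># <b>tail -5 ~/.cluadmin_history</b></pre>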
4796 <hr noshade width="80%
">
4797 <p><a NAME="service
"></a>
4799 4 Service Configuration and Administration</h1>
4800 The following sections describe how to set up and administer cluster services:
4803 <a href="#service-configure
">Configuring a Service</a></li>
4806 <a href="#service-status
">Displaying a Service Configuration</a></li>
4809 <a href="#service-disable
">Disabling a Service</a></li>
4812 <a href="#service-enable
">Enabling a Service</a></li>
4815 <a href="#service-modify
">Modifying a Service</a></li>
4818 <a href="#service-relocate
">Relocating a Service</a></li>
4821 <a href="#service-delete
">Deleting a Service</a></li>
4824 <a href="#service-error
">Handling Services in an Error State</a></li>
4829 <a NAME="service-configure
"></a></h2>
4832 4.1 Configuring a Service</h2>
4833 To configure a service, you must prepare the cluster systems for the service.
4834 For example, you must set up any disk storage or applications used in the
4835 services. You can then add information about the service properties and
4836 resources to the cluster database by using the <b>cluadmin</b> utility.
4837 This information is used as parameters to scripts that start and stop the
4839 <p>To configure a service, follow these steps:
4842 If applicable, create a script that will start and stop the application
4843 used in the service. See <a href="#service-scripts
">Creating Service Scripts</a>
4844 for information.</li>
4848 Gather information about service resources and properties. See <a href="#service-gather
">Gathering
4849 Service Information</a> for information.</li>
4853 Set up the file systems or raw devices that the service will use. See <a href="#service-storage
">Configuring
4854 Service Disk Storage</a> for information.</li>
4858 Ensure that the application software can run on each cluster system and
4859 that the service script, if any, can start and stop the service application.
4860 See <a href="#service-app
">Verifying Application Software and Service Scripts</a>
4861 for information.</li>
4865 Back up the <b><font face="Courier New, Courier, mono
">/etc/cluster.conf</font></b>
4866 file. See <a href="#cluster-backup
">Backing Up and Restoring the Cluster
4867 Database</a> for information.</li>
4871 Invoke the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
4872 utility and specify the <b><font face="Courier New, Courier, mono
">service
4873 add </font></b>command. You will be prompted for information about the
4874 service resources and properties obtained in step 2. If the service passes
4875 the configuration checks, it will be started on the cluster system on which
4876 you are running <b><font face="Courier New, Courier, mono
">cluadmin</font></b>,
4877 unless you choose to keep the service disabled. For example:</li>
4879 <pre>cluadmin> <b>service add</b></pre>
4881 For more information about adding a cluster service, see the following:
4885 <a href="#service-dbase
">Setting Up an Oracle Service</a></li>
4888 <a href="#service-mysql
">Setting Up a MySQL Service</a></li>
4891 <a href="#service-db2
">Setting Up a DB2 Service</a></li>
4894 <a href="#service-apache
">Setting Up an Apache Service</a></li>
4897 <a href="#service-nfs
">Setting Up an NFS Service</a></li>
4899 See <font size=+0><a href="#software-manual
">Cluster Database Fields</a></font>
4900 for a description of the service fields in the database.
4901 <p><a NAME="service-gather
"></a>
4903 4.1.1 Gathering Service Information</h3>
4904 Before you create a service, you must gather information about the service
4905 resources and properties. When you add a service to the cluster database,
4906 the <b><font face="Courier New, Courier, mono
">cluadmin</font></b> utility
4907 prompts you for this information.
4908 <p>In some cases, you can specify multiple resources for a service. For
4909 example, you can specify multiple IP addresses and disk devices.
4910 <p>The service properties and resources that you can specify are described
4911 in the following table.
4913 <table BORDER CELLSPACING=0 CELLPADDING=5 WIDTH="100%
" >
4914 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#
99CCCC
">
4915 <td WIDTH="22%
" HEIGHT="21">
4917 Service Property or Resource</h3>
4920 <td WIDTH="78%
" HEIGHT="21">
4926 <tr ALIGN=LEFT VALIGN=TOP>
4927 <td WIDTH="22%
" HEIGHT="34"><b>Service name</b></td>
4929 <td WIDTH="78%
" HEIGHT="34">Each service must have a unique name. A service
4930 name can be from 1 to 63 characters long and can contain any combination
4931 of letters (either uppercase or lowercase), integers, underscores, periods,
4932 and dashes. However, a service name must begin with a letter or an underscore. </td>
4935 <tr ALIGN=LEFT VALIGN=TOP>
4936 <td WIDTH="22%
" HEIGHT="32"><b>Preferred member</b></td>
4938 <td WIDTH="78%
" HEIGHT="32">Specify the cluster system, if any, on which
4939 you want the service to run unless failover has occurred or unless you
4940 manually relocate the service. </td>
4943 <tr ALIGN=LEFT VALIGN=TOP>
4944 <td WIDTH="22%
" HEIGHT="101"><b>Preferred member relocation policy </b>
4947 <td WIDTH="78%
" HEIGHT="101">If you enable this policy, the service will
4948 automatically relocate to its preferred member when that system joins the
4949 cluster. If you disable this policy, the service will remain running on
4950 the non-preferred member. For example, if you enable this policy and the
4951 failed preferred member for the service reboots and joins the cluster,
4952 the service will automatically restart on the preferred member. </td>
4955 <tr ALIGN=LEFT VALIGN=TOP>
4956 <td WIDTH="22%
"><b>Script location</b></td>
4958 <td WIDTH="78%
">If applicable, specify the full path name for the script
4959 that will be used to start and stop the service. See <a href="#service-scripts
">Creating
4960 Service Scripts</a> for more information.</td>
4963 <tr ALIGN=LEFT VALIGN=TOP>
4964 <td WIDTH="22%
"><b>IP address</b></td>
4966 <td WIDTH="78%
">You can assign one or more Internet protocol (IP) addresses
4967 to a service. This IP address (sometimes called a "floating
" IP address)
4968 is different from the IP address associated with the host name Ethernet
4969 interface for a cluster system, because it is automatically relocated along
4970 with the service resources, when failover occurs. If clients use this IP
4971 address to access the service, they do not know which cluster system is
4972 running the service, and failover is transparent to the clients.
4973 <p>Note that cluster members must have network interface cards configured
4974 in the IP subnet of each IP address used in a service.
4975 <p>You can also specify netmask and broadcast addresses for each IP address.
4976 If you do not specify this information, the cluster uses the netmask and
4977 broadcast addresses from the network interconnect in the subnet. </td>
4980 <tr ALIGN=LEFT VALIGN=TOP>
4981 <td WIDTH="22%
" HEIGHT="48"><b>Disk partition, owner, group, and access
4984 <td WIDTH="78%
" HEIGHT="48">Specify each shared disk partition used in
4985 a service. In addition, you can specify the owner, group, and access mode
4986 (for example, 755) for each mount point or raw device. </td>
4989 <tr ALIGN=LEFT VALIGN=TOP>
4990 <td WIDTH="22%
" HEIGHT="146"><b>Mount points, file system type, mount
4991 and NFS export options</b></td>
4993 <td WIDTH="78%
" HEIGHT="146">If you are using a file system, you must specify
4994 the type of file system, a mount point, and any mount options. Mount options
4995 that you can specify are the standard file system mount options that are
4996 described in the <b><font face="Courier New, Courier, mono
">mount.8</font></b>
4997 manpage. If you are using a raw device, you do not have to specify mount information.
4999 <p>The ext2 and ext3 file systems are the recommended file systems for a
5000 cluster. Although you can use a different file system in a cluster,
5001 other file system types, such as reiserfs, have not been fully tested.
5002 <p>You must specify whether you want to enable forced unmount for a file
5003 system. Forced unmount enables the cluster service management infrastructure
5004 to unmount a file system even if it is being accessed by an application
5005 or user (that is, even if the file system is "busy
"). This is accomplished
5006 by terminating any applications that are accessing the file system.
5007 <p>In addition, you are asked whether you wish to NFS export the filesystem
5008 and if so, what access permissions should be applied. Refer to <a href="#service-nfs
">Creating
5009 NFS Services</a> for details. </td>
5012 <tr ALIGN=LEFT VALIGN=TOP>
5013 <td WIDTH="22%
"><b>Disable service policy</b></td>
5015 <td WIDTH="78%
">If you do not want to automatically start a service after
5016 it is added to the cluster, you can choose to keep the new service disabled,
5017 until an administrator explicitly enables the service.</td>
5021 <a NAME="service-scripts
"></a>
5023 4.1.2 Creating Service Scripts</h3>
5024 For services that include an application, you must create a script that
5025 contains specific instructions to start and stop the application (for example,
5026 a database application). The script will be called with a <b><font face="Courier New, Courier, mono
">start</font></b>
5027 or <b><font face="Courier New, Courier, mono
">stop</font></b> argument
5028 and will run at service start time and stop time. The script should be
5029 similar to the scripts found in the System V <b><font face="Courier New, Courier, mono
">init</font></b>
5031 <p><i>Editorial comment: Add description of status argument.</i>
5032 <p>The <b><font face="Courier New, Courier, mono
">/usr/share/cluster/doc/services/examples</font></b>
5033 directory contains a template that you can use to create service scripts,
5034 in addition to examples of scripts. See <a href="#service-dbase
">Setting
5035 Up an Oracle Service</a>, <a href="#service-mysql
">Setting Up a MySQL Service</a>,
5036 <a href="#service-apache
">Setting
5037 Up an Apache Service</a>, and <a href="#service-db2
">Setting Up a DB2 Service</a>
5039 <p><a NAME="service-storage
"></a>
5041 4.1.3 Configuring Service Disk Storage</h3>
5042 Before you create a service, set up the shared file systems and raw devices
5043 that the service will use. See <a href="#hardware-storage
">Configuring
5044 Shared Disk Storage</a> for more information.
5045 <p>If you are using raw devices in a cluster service, you can use the <b><font face="Courier New, Courier, mono
">/etc/sysconfig/rawdevices</font></b>
5046 file to bind the devices at boot time. Edit the file and specify the raw
5047 character devices and block devices that you want to bind each time the
5048 system boots. See <a href="#software-rawdevices
">Editing the rawdevices
5049 File</a> for more information.
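<p>For example, each entry in the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
file pairs a raw character device with the block device it should be bound to
at boot time (the device names below are only placeholders for your own shared
partitions):
<pre># raw device       block device
/dev/raw/raw1      /dev/sda2
/dev/raw/raw2      /dev/sda3</pre>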
5050 <p>Note that software RAID, SCSI adapter-based RAID, and host-based RAID
5051 are not supported for shared disk storage.
5052 <p>You should adhere to these <b>service disk storage recommendations</b>:
5055 For optimal performance, use a 4 KB block size when creating file systems.
5056 Note that some of the <b><font face="Courier New, Courier, mono
">mkfs</font></b>
5057 file system build utilities default to a 1 KB block size, which can cause
5058 long <b><font face="Courier New, Courier, mono
">fsck</font></b> times.</li>
5062 For large file systems, use the <b><font face="Courier New, Courier, mono
">mount</font></b>
5063 command with the <b><font face="Courier New, Courier, mono
">nocheck</font></b>
5064 option to bypass code that checks all the block groups on the partition.
5065 Specifying the <b><font face="Courier New, Courier, mono
">nocheck</font></b>
5066 option can significantly decrease the time required to mount a large file system; see the following example.</li>
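<p>For example, the following commands (with placeholder device and mount point
names) create an ext2 file system with a 4 KB block size and then mount a large
file system with the <b><font face="Courier New, Courier, mono">nocheck</font></b>
option:
<pre># <b>mke2fs -b 4096 /dev/sda1</b>
# <b>mount -t ext2 -o nocheck /dev/sda1 /mnt/service1</b></pre>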
5069 <a NAME="service-app
"></a>
5071 4.1.4 Verifying Application Software and Service Scripts</h3>
5072 Before you set up a service, install any application that will be used
5073 in a service on each system. After you install the application, verify
5074 that the application runs and can access shared disk storage. To prevent
5075 data corruption, do not run the application simultaneously on both systems.
5076 <p>If you are using a script to start and stop the service application,
5077 you must install and test the script on both cluster systems, and verify
5078 that it can be used to start and stop the application. See <a href="#service-scripts
">Creating
5079 Service Scripts</a> for information.
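<p>For example, if your service script is <b><font face="Courier New, Courier, mono">/home/oracle/oracle</font></b>
(as in the Oracle example below), you could verify it manually on each cluster
system before adding the service:
<pre># <b>/home/oracle/oracle start</b>
# <b>/home/oracle/oracle stop</b></pre>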
5080 <p><a NAME="service-dbase
"></a>
5082 4.1.5 Setting Up an Oracle Service</h3>
5083 A database service can serve highly-available data to a database application.
5084 The application can then provide network access to database client systems,
5085 such as Web servers. If the service fails over, the application accesses
5086 the shared database data through the new cluster system. A network-accessible
5087 database service is usually assigned an IP address, which is failed over
5088 along with the service to maintain transparent access for clients.
5089 <p>This section provides an example of setting up a cluster service for
5090 an Oracle database. Although the variables used in the service scripts
5091 depend on the specific Oracle configuration, the example may help you set
5092 up a service for your environment. See <a href="#app-tuning
">Tuning Oracle
5093 Services</a> for information about improving service performance.
5094 <p>In the example that follows:
5097 The service includes one IP address for the Oracle clients to use.</li>
5101 The service has two mounted file systems, one for the Oracle software (<b><font face="Courier New, Courier, mono
">/u01</font></b>)
5102 and the other for the Oracle database (<b><font face="Courier New, Courier, mono
">/u02</font></b>),
5103 which were set up before the service was added.</li>
5107 An Oracle administration account with the name <b><font face="Courier New, Courier, mono
">oracle</font></b>
5108 was created on both cluster systems before the service was added; see the example after this list.</li>
5112 Network access in this example is through a Perl DBI proxy.</li>
5116 The administration directory is on a shared disk that is used in conjunction
5117 with the Oracle service (for example, <b><font face="Courier New, Courier, mono
">/u01/app/oracle/admin/db1</font></b>).</li>
5119 The Oracle service example uses five scripts that must be placed in <b><font face="Courier New, Courier, mono
">/home/oracle</font></b>
5120 and owned by the Oracle administration account. The <b><font face="Courier New, Courier, mono
">oracle</font></b>
5121 script is used to start and stop the Oracle service. Specify this script
5122 when you add the service. This script calls the other Oracle example scripts.
5123 The <b><font face="Courier New, Courier, mono
">startdb</font></b> and <b><font face="Courier New, Courier, mono
">stopdb</font></b>
5124 scripts start and stop the database. The <b><font face="Courier New, Courier, mono
">startdbi</font></b>
5125 and <b><font face="Courier New, Courier, mono
">stopdbi</font></b> scripts
5126 start and stop a Web application that has been written by using Perl scripts
5127 and modules and is used to interact with the Oracle database. Note that
5128 there are many ways for an application to interact with an Oracle database.
5129 <p>The following is an example of the <b><font face="Courier New, Courier, mono
">oracle</font></b>
5130 script, which is used to start and stop the Oracle service. Note that the
5131 script is run as user <b><font face="Courier New, Courier, mono
">oracle</font></b>,
5132 instead of <b><font face="Courier New, Courier, mono
">root</font></b>.
5133 <pre><font size=-1>#!/bin/sh
5135 # Cluster service script to start/stop oracle

case "$1" in
'start')
5142 su - oracle -c ./startdbi
5143 su - oracle -c ./startdb
5144 ;;
'stop')
5146 su - oracle -c ./stopdb
5147 su - oracle -c ./stopdbi
5148 ;;
esac</font></pre>
5150 The following is an example of the <b><font face="Courier New, Courier, mono
">startdb</font></b>
5151 script, which is used to start the Oracle Database Server instance:
5152 <pre><font size=-1>#!/bin/sh
5156 # Script to start the Oracle Database Server instance.
5158 ###########################################################################
5162 # Specifies the Oracle product release.
5164 ###########################################################################
5166 ORACLE_RELEASE=8.1.6
5168 ###########################################################################
5172 # Specifies the Oracle system identifier or "sid", which is the name of the
5173 # Oracle Server instance.
5175 ###########################################################################
5177 export ORACLE_SID=TESTDB
5179 ###########################################################################
5183 # Specifies the directory at the top of the Oracle software product and
5184 # administrative file structure.
5186 ###########################################################################
5188 export ORACLE_BASE=/u01/app/oracle
5190 ###########################################################################
5194 # Specifies the directory containing the software for a given release.
5195 # The Oracle recommended value is $ORACLE_BASE/product/<release>
5197 ###########################################################################
5199 export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
5201 ###########################################################################
5205 # Required when using Oracle products that use shared libraries.
5207 ###########################################################################
5209 export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
5211 ###########################################################################
5215 # Verify that the user's search path includes $ORACLE_HOME/bin
5217 ###########################################################################
5219 export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
5221 ###########################################################################
5223 # This does the actual work.
5225 # The oracle server manager is used to start the Oracle Server instance
5226 # based on the initSID.ora initialization parameters file specified.
5228 ###########################################################################
5230 /u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl << EOF
5231 spool /home/oracle/startdb.log
connect internal;
5233 startup pfile = /u01/app/oracle/admin/db1/pfile/initTESTDB.ora open;
EOF</font></pre>
5240 The following is an example of the <b><font face="Courier New, Courier, mono
">stopdb</font></b>
5241 script, which is used to stop the Oracle Database Server instance:
5242 <pre><font size=-1>#!/bin/sh
5245 # Script to STOP the Oracle Database Server instance.
5247 ###########################################################################
5251 # Specifies the Oracle product release.
5253 ###########################################################################
5255 ORACLE_RELEASE=8.1.6
5257 ###########################################################################
5261 # Specifies the Oracle system identifier or "sid", which is the name of the
5262 # Oracle Server instance.
5264 ###########################################################################
5266 export ORACLE_SID=TESTDB
5268 ###########################################################################
5272 # Specifies the directory at the top of the Oracle software product and
5273 # administrative file structure.
5275 ###########################################################################
5277 export ORACLE_BASE=/u01/app/oracle
5279 ###########################################################################
5283 # Specifies the directory containing the software for a given release.
5284 # The Oracle recommended value is $ORACLE_BASE/product/<release>
5286 ###########################################################################
5288 export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
5290 ###########################################################################
5294 # Required when using Oracle products that use shared libraries.
5296 ###########################################################################
5298 export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
5300 ###########################################################################
5304 # Verify that the user's search path includes $ORACLE_HOME/bin
5306 ###########################################################################
5308 export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
5310 ###########################################################################
5312 # This does the actual work.
5314 # The oracle server manager is used to STOP the Oracle Server instance
5315 # in a tidy fashion.
5317 ###########################################################################
5319 /u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl << EOF
5320 spool /home/oracle/stopdb.log
connect internal;
shutdown immediate
EOF</font></pre>
5329 The following is an example of the <b><font face="Courier New, Courier, mono
">startdbi</font></b>
5330 script, which is used to start a networking DBI proxy daemon:
5331 <pre><font size=-1>#!/bin/sh
5334 ###########################################################################
5336 # This script allows our Web Server application (perl scripts) to
5337 # work in a distributed environment. The technology we use is
5338 # based upon the DBD::Oracle/DBI CPAN perl modules.
5340 # This script STARTS the networking DBI Proxy daemon.
5342 ###########################################################################
5344 export ORACLE_RELEASE=8.1.6
5345 export ORACLE_SID=TESTDB
5346 export ORACLE_BASE=/u01/app/oracle
5347 export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
5348 export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
5349 export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
5352 # This line does the real work.
5355 /usr/bin/dbiproxy --logfile /home/oracle/dbiproxy.log --localport 1100 &</font></pre>
5360 The following is an example of the <b><font face="Courier New, Courier, mono
">stopdbi</font></b>
5361 script, which is used to stop a networking DBI proxy daemon:
5362 <pre><font size=-1>#!/bin/sh
5365 #######################################################################
5367 # Our Web Server application (perl scripts) works in a distributed
5368 # environment. The technology we use is based upon the DBD::Oracle/DBI
5369 # CPAN perl modules.
5371 # This script STOPS the required networking DBI Proxy daemon.
5373 ########################################################################
5376 PIDS=$(ps ax | grep /usr/bin/dbiproxy | awk '{print $1}')

for pid in $PIDS
do
5380 kill -9 $pid
done</font></pre>
5386 The following example shows how to use <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
5387 to add an Oracle service.
5388 <pre><font size=-1>cluadmin> <b>service add oracle
5390 </b> The user interface will prompt you for information about the service.
5391 Not all information is required for all services.
5393 Enter a question mark (?) at a prompt to obtain help.
5395 Enter a colon (:) and a single-character command at a prompt to do
5396 one of the following:
5398 c - Cancel and return to the top-level cluadmin command
5399 r - Restart to the initial prompt while keeping previous responses
5400 p - Proceed with the next prompt
5401
5402 Preferred member [None]: <b><font face="Courier New, Courier, mono
">ministor0
5403 </font></b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">yes
5404 </font></b>User script (e.g., /usr/foo/script or None) [None]: <b><font face="Courier New, Courier, mono
">/home/oracle/oracle
5406 </font></b>Do you want to add an IP address to the service (yes/no/?): <b><font face="Courier New, Courier, mono
">yes
5408 </font></b> IP Address Information
5410 IP address: <b><font face="Courier New, Courier, mono
">10.1.16.132
5411 </font></b>Netmask (e.g. 255.255.255.0 or None) [None]: <b><font face="Courier New, Courier, mono
">255.255.255.0
5412 </font></b>Broadcast (e.g. X.Y.Z.255 or None) [None]: <b><font face="Courier New, Courier, mono
">10.1.16.255
5414 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
5415 or are you (f)inished adding IP addresses: <b><font face="Courier New, Courier, mono
">f
5417 </font></b>Do you want to add a disk device to the service (yes/no/?): <b><font face="Courier New, Courier, mono
">yes
5419 </font></b> Disk Device Information
5421 Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono
">/dev/sda1
5422 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono
">ext2
5423 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono
">/u01
5424 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono
">[Return]
5425 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">yes
5427 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
5428 or are you (f)inished adding device information: <b><font face="Courier New, Courier, mono
">a
5430 </font></b>Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono
">/dev/sda2
5431 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono
">ext2
5432 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono
">/u02
5433 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono
">[Return]
5434 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">yes
5437 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
5438 or are you (f)inished adding devices: <b><font face="Courier New, Courier, mono
">f
5440 </font></b>Disable service (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">no
5442 </font></b>name: oracle
5444 preferred node: ministor0
5446 user script: /home/oracle/oracle
5447 IP address 0: 10.1.16.132
5448 netmask 0: 255.255.255.0
5449 broadcast 0: 10.1.16.255
5451 mount point, device 0: /u01
5452 mount fstype, device 0: ext2
5453 force unmount, device 0: yes
5455 mount point, device 1: /u02
5456 mount fstype, device 1: ext2
5457 force unmount, device 1: yes
5459 Add oracle service as shown? (yes/no/?) <b>y
5460 </b>notice: Starting service oracle ...
5461 info: Starting IP address 10.1.16.132
5462 info: Sending Gratuitous arp for 10.1.16.132 (00:90:27:EB:56:B8)
5463 notice: Running user script '/home/oracle/oracle start'
5464 notice, Server starting
5469 <a NAME="service-mysql
"></a>
5471 4.1.6 Setting Up a MySQL Service</h3>
5472 A database service can serve highly-available data to a database application.
5473 The application can then provide network access to database client systems,
5474 such as Web servers. If the service fails over, the application accesses
5475 the shared database data through the new cluster system. A network-accessible
5476 database service is usually assigned an IP address, which is failed over
5477 along with the service to maintain transparent access for clients.
5478 <p>You can set up a MySQL database service in a cluster. Note that MySQL
5479 does not provide full transactional semantics; therefore, it may not be
5480 suitable for update-intensive applications.
5481 <p>An example of a MySQL database service is as follows:
5484 The MySQL server and the database instance both reside on a file system
5485 that is located on a disk partition on shared storage. This allows the
5486 database data and its run-time state information, which is required for
5487 failover, to be accessed by both cluster systems. In the example, the file
5488 system is mounted as <b><font face="Courier New, Courier, mono
">/var/mysql</font></b>,
5489 using the shared disk partition <b><font face="Courier New, Courier, mono
">/dev/sda1</font></b>.</li>
5493 An IP address is associated with the MySQL database to accommodate network
5494 access by clients of the database service. This IP address will automatically
5495 be migrated among the cluster members as the service fails over. In the
5496 example below, the IP address is 10.1.16.12.</li>
5500 The script that is used to start and stop the MySQL database is the standard
5501 System V <b><font face="Courier New, Courier, mono
">init</font></b> script,
5502 which has been modified with configuration parameters to match the file
5503 system on which the database is installed.</li>
5507 By default, a client connection to a MySQL server will time out after eight
5508 hours of inactivity. You can modify this connection limit by setting the
5509 <b><font face="Courier New, Courier, mono
">wait_timeout</font></b>
5510 variable when you start <b><font face="Courier New, Courier, mono
">mysqld</font></b>.</li>
5514 <p>To check if a MySQL server has timed out, invoke the <b><font face="Courier New, Courier, mono
">mysqladmin
5515 version</font></b> command and examine the uptime. Invoke the query again
5516 to automatically reconnect to the server.
5517 <p>Depending on the Linux distribution, one of the following messages may
5518 indicate a MySQL server timeout:
5520 <pre>CR_SERVER_GONE_ERROR
5521 CR_SERVER_LOST</pre>
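<p>For example, to raise the connection timeout to 24 hours and then confirm
that the server is responding, you could start the daemon with a larger
<b><font face="Courier New, Courier, mono">wait_timeout</font></b> value and
check its uptime (the option syntax shown is typical for MySQL versions of
this era and may differ on your installation):
<pre># <b>safe_mysqld --set-variable wait_timeout=86400 &</b>
# <b>mysqladmin version</b></pre>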
5523 A sample script to start and stop the MySQL database is located in <b><font face="Courier New, Courier, mono
">/usr/share/cluster/doc/services/examples/mysql.server</font></b>,
5525 <pre><font size=-1>#!/bin/sh
5526 # Copyright Abandoned 1996 TCX DataKonsult AB & Monty Program KB & Detron HB
5527 # This file is public domain and comes with NO WARRANTY of any kind
5529 # Mysql daemon start/stop script.
5531 # Usually this is put in /etc/init.d (at least on machines SYSV R4
5532 # based systems) and linked to /etc/rc3.d/S99mysql. When this is done
5533 # the mysql server will be started when the machine is started.
5535 # Comments to support chkconfig on RedHat Linux
5536 # chkconfig: 2345 90 90
5537 # description: A very fast and reliable SQL database engine.
5539 PATH=/sbin:/usr/sbin:/bin:/usr/bin
5541 bindir=/var/mysql/bin
5542 datadir=/var/mysql/var
5543 pid_file=/var/mysql/var/mysqld.pid
5544 mysql_daemon_user=root # Run mysqld as this user.
5549 if test -w / # determine if we should look at the root config file
5550 then # or user config file
5551 conf=/etc/my.cnf
5553 conf=$HOME/.my.cnf # Using the users config file
5556 # The following code tries to get the variables safe_mysqld needs from the
5557 # config file. This isn't perfect as this ignores groups, but it should
5558 # work as the options doesn't conflict with anything else.
5560 if test -f "$conf
" # Extract those fields we need from config file.
5562 if grep "^datadir
" $conf > /dev/null
5564 datadir=`grep "^datadir
" $conf | cut -f 2 -d= | tr -d ' '`
5566 if grep "^user
" $conf > /dev/null
5568 mysql_daemon_user=`grep "^user
" $conf | cut -f 2 -d= | tr -d ' ' | head -1`
5570 if grep "^pid-file
" $conf > /dev/null
5572 pid_file=`grep "^pid-file
" $conf | cut -f 2 -d= | tr -d ' '`
5574 if test -d "$datadir
"
5575 then
5576 pid_file=$datadir/`hostname`.pid
5577 fi
5579 if grep "^basedir
" $conf > /dev/null
5581 basedir=`grep "^basedir
" $conf | cut -f 2 -d= | tr -d ' '`
5582 bindir=$basedir/bin
5584 if grep "^bindir
" $conf > /dev/null
5586 bindir=`grep "^bindir
" $conf | cut -f 2 -d=| tr -d ' '`
5591 # Safeguard (relative paths, core dumps..)
5596 # Start daemon
5598 if test -x $bindir/safe_mysqld
5599 then
5600 # Give extra arguments to mysqld with the my.cnf file. This script may
5601 # be overwritten at next upgrade.
5602 $bindir/safe_mysqld --user=$mysql_daemon_user --pid-file=$pid_file --datadir=$datadir &
5603 else
5604 echo "Can't execute $bindir/safe_mysqld
"
5605 fi
5606 ;;
5609 # Stop daemon. We use a signal here to avoid having to know the
5610 # root password.
5611 if test -f "$pid_file
"
5612 then
5613 mysqld_pid=`cat $pid_file`
5614 echo "Killing mysqld with pid $mysqld_pid
"
5615 kill $mysqld_pid
5616 # mysqld should remove the pid_file when it exits.
5617 else
5618 echo "No mysqld pid file found. Looked for $pid_file.
"
5619 fi
5620 ;;
5623 # usage
5624 echo "usage: $
0 start|stop
"
5625 exit 1
5626 ;;
5628 The following example shows how to use <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
5629 to add a MySQL service.
5630 <pre><font size=-1>cluadmin> <b>service add
5632 </b> The user interface will prompt you for information about the service.
5633 Not all information is required for all services.
5635 Enter a question mark (?) at a prompt to obtain help.
5637 Enter a colon (:) and a single-character command at a prompt to do
5638 one of the following:
5640 c - Cancel and return to the top-level cluadmin command
5641 r - Restart to the initial prompt while keeping previous responses
5642 p - Proceed with the next prompt
5643
5644 Currently defined services:
5651 Service name: <b><font face="Courier New, Courier, mono
">mysql_1
5652 </font></b>Preferred member [None]: <b><font face="Courier New, Courier, mono
">devel0
5653 </font></b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">yes
5654 </font></b>User script (e.g., /usr/foo/script or None) [None]: <b><font face="Courier New, Courier, mono
">/etc/rc.d/init.d/mysql.server
5656 </font></b>Do you want to add an IP address to the service (yes/no/?): <b><font face="Courier New, Courier, mono
">yes
5658 </font></b> IP Address Information
5660 IP address: <b><font face="Courier New, Courier, mono
">10.1.16.12
5661 </font></b>Netmask (e.g. 255.255.255.0 or None) [None]: <b><font face="Courier New, Courier, mono
">[Return]
5662 </font></b>Broadcast (e.g. X.Y.Z.255 or None) [None]: <b><font face="Courier New, Courier, mono
">[Return]
5664 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
5665 or are you (f)inished adding IP addresses: <b><font face="Courier New, Courier, mono
">f
5667 </font></b>Do you want to add a disk device to the service (yes/no/?): <b><font face="Courier New, Courier, mono
">yes
5669 </font></b> Disk Device Information
5671 Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono
">/dev/sda1
5672 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono
">ext2
5673 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono
">/var/mysql
5674 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono
">rw
5675 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">yes
5677 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
5678 or are you (f)inished adding device information: <b><font face="Courier New, Courier, mono
">f
5680 </font></b>Disable service (yes/no/?) [no]: <b><font face="Courier New, Courier, mono
">yes
5682 </font></b>name: mysql_1
5684 preferred node: devel0
5686 user script: /etc/rc.d/init.d/mysql.server
5687 IP address 0: 10.1.16.12
5688 netmask 0: None
5689 broadcast 0: None
5691 mount point, device 0: /var/mysql
5692 mount fstype, device 0: ext2
5693 mount options, device 0: rw
5694 force unmount, device 0: yes
5696 Add mysql_1 service as shown? (yes/no/?) <b>y
5697 </b>Added mysql_1.
5698 cluadmin></font></pre>
5709 <p><a NAME="service-db2
"></a>
5711 4.1.7 Setting Up a DB2 Service</h3>
5712 This section provides an example of setting up a cluster service that will
5713 fail over IBM DB2 Enterprise/Workgroup Edition on a cluster. This example
5714 assumes that NIS is not running on the cluster systems.
5715 <p>To install the software and database on the cluster systems, follow these steps:
5719 On both cluster systems, log in as root and add the IP address and host
5720 name that will be used to access the DB2 service to the <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
5721 file. For example:</li>
5723 <pre>10.1.16.182 ibmdb2.class.cluster.com ibmdb2</pre>
5726 Choose an unused partition on a shared disk to use for hosting DB2 administration
5727 and instance data, and create a file system on it. For example:</li>
5729 <pre># <b>mke2fs /dev/sda3</b></pre>
5732 Create a mount point on both cluster systems for the file system created
5733 in Step 2. For example:</li>
5735 <pre># <b>mkdir /db2home</b></pre>
5738 On the first cluster system, <b><font face="Courier New, Courier, mono
">devel0</font></b>,
5739 mount the file system created in Step 2 on the mount point created in Step
5740 3. For example:</li>
5742 <pre>devel0# <b>mount -t ext2 /dev/sda3 /db2home</b></pre>
5745 On the first cluster system, <b><font face="Courier New, Courier, mono
">devel0</font></b>,
5746 mount the DB2 cdrom and copy the setup response file included in the distribution
5747 to <b><font face="Courier New, Courier, mono
">/root</font></b>. For example:</li>
5749 <pre>devel0% <b>mount -t iso9660 /dev/cdrom /mnt/cdrom
5750 </b>devel0% <b>cp /mnt/cdrom/IBM/DB2/db2server.rsp /root</b></pre>
5753 Modify the setup response file, <b><font face="Courier New, Courier, mono
">db2server.rsp</font></b>,
5754 to reflect local configuration settings. Make sure that the UIDs and GIDs
5755 are reserved on both cluster systems. For example:</li>
5757 <pre>-----------Instance Creation Settings------------
5758 -------------------------------------------------
5761 DB2.HOME_DIRECTORY = /db2home/db2inst1
5763 -----------Fenced User Creation Settings----------
5764 --------------------------------------------------
5767 UDF.HOME_DIRECTORY = /db2home/db2fenc1
5769 -----------Instance Profile Registry Settings------
5770 ---------------------------------------------------
5773 ----------Administration Server Creation Settings---
5774 ----------------------------------------------------
5777 ADMIN.HOME_DIRECTORY = /db2home/db2as
5779 ---------Administration Server Profile Registry Settings-
5780 ---------------------------------------------------------
5781 ADMIN.DB2COMM = TCPIP
5783 ---------Global Profile Registry Settings-------------
5784 ------------------------------------------------------
5785 DB2SYSTEM = ibmdb2</pre>
5788 Start the installation. For example:</li>
5790 <pre>devel0# <b>cd /mnt/cdrom/IBM/DB2
5791 </b>devel0# <b>./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &</b></pre>
5794 Check for errors during the installation by examining the installation
5795 log file, <b><font face="Courier New, Courier, mono
">/tmp/db2setup.log</font></b>.
5796 Every step in the installation must be marked as <b><font face="Courier New, Courier, mono
">SUCCESS</font></b>
5797 at the end of the log file.</li>
5801 Stop the DB2 instance and administration server on the first cluster system.
5804 <pre>devel0# <b>su - db2inst1</b>
devel0# <b>db2stop
</b>devel0# <b>exit
5807 </b>devel0# <b>su - db2as</b>
5808 devel0# <b>db2admin stop
5809 </b>devel0# <b>exit</b></pre>
5812 Unmount the DB2 instance and administration data partition on the first
5813 cluster system. For example:</li>
5815 <pre>devel0# <b>umount /db2home</b></pre>
5818 Mount the DB2 instance and administration data partition on the second
5819 cluster system, devel1. For example:</li>
5821 <pre>devel1# <b>mount -t ext2 /dev/sda3 /db2home</b></pre>
5824 Mount the DB2 cdrom on the second cluster system and remotely copy the
5825 <b><font face="Courier New, Courier, mono
">db2server.rsp</font></b>
5826 file to <b><font face="Courier New, Courier, mono
">/root</font></b>. For
5829 <pre>devel1# <b>mount -t iso9660 /dev/cdrom /mnt/cdrom
5830 </b>devel1# <b>rcp devel0:/root/db2server.rsp /root</b></pre>
5833 Start the installation on the second cluster system, <b><font face="Courier New, Courier, mono
">devel1</font></b>.
5836 <pre>devel1# <b>cd /mnt/cdrom/IBM/DB2
5837 </b>devel1# <b>./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &</b></pre>
5840 Check for errors during the installation by examining the installation
5841 log file. Every step in the installation must be marked as <b><font face="Courier New, Courier, mono
">SUCCESS</font></b>
5842 except for the following:</li>
5844 <pre>DB2 Instance Creation FAILURE
5845 Update DBM configuration file for TCP/IP CANCEL
5846 Update parameter DB2COMM CANCEL
5847 Auto start DB2 Instance CANCEL
5848 DB2 Sample Database CANCEL
5850 Administration Server Creation FAILURE
5851 Update parameter DB2COMM CANCEL
5852 Start Administration Serve CANCEL</pre>
5855 Test the database installation by invoking the following commands, first
5856 on one cluster system, and then on the other cluster system:</li>
5858 <pre># <b>mount -t ext2 /dev/sda3 /db2home
5859 </b># <b>su - db2inst1
5861 </b># <b>db2 connect to sample
5862 </b># <b>db2 select tabname from syscat.tables
5863 </b># <b>db2 connect reset
5866 </b># <b>umount /db2home</b></pre>
5869 Create the DB2 cluster start/stop script on the DB2 administration and
5870 instance data partition. For example:</li>
5872 <pre># vi /db2home/ibmdb2
5873 # chmod u+x /db2home/ibmdb2

#!/bin/sh
5877 # IBM DB2 Database Cluster Start/Stop Script

5880 DB2DIR=/usr/IBMdb2/V6.1

case "$1" in
'start')
5884 $DB2DIR/instance/db2istrt
;;
'stop')
5887 $DB2DIR/instance/db2ishut
;;
esac</pre>
5892 Modify the <b><font face="Courier New, Courier, mono
">/usr/IBMdb2/V6.1/instance/db2ishut</font></b>
5893 file on both cluster systems to forcefully disconnect active applications
5894 before stopping the database. For example:</li>
5896 <pre>for DB2INST in ${DB2INSTLIST?}; do
5897 echo "Stopping DB2 Instance
"${DB2INST?}"...
" >> ${LOGFILE?}
5898 find_homedir ${DB2INST?}
5899 INSTHOME="${USERHOME?}
"
5900 su ${DB2INST?} -c " \
5901 source ${INSTHOME?}/sqllib/db2cshrc
1> /dev/null
2> /dev/null; \
5902 ${INSTHOME?}/sqllib/db2profile
1> /dev/null
2> /dev/null; \
5903 >>>>>>> db2 force application all; \
5904 db2stop
" 1>> ${LOGFILE?} 2>> ${LOGFILE?}
5905 if [ $? -ne 0 ]; then
5906 ERRORFOUND=${TRUE?}
5907 fi
5911 Edit the <b><font face="Courier New, Courier, mono
">inittab</font></b>
5912 file and comment out the DB2 line to enable the cluster service to handle
5913 starting and stopping the DB2 service. This is usually the last line in
5914 the file. For example:</li>
5916 <pre># db:234:once:/etc/rc.db2 > /dev/console 2>&1 # Autostart DB2 Services</pre>
5918 Use the <b><font face="Courier New, Courier, mono
">cluadmin</font></b>
5919 utility to create the DB2 service. Add the IP address from Step 1, the
5920 shared partition created in Step 2, and the start/stop script created in
5922 <p>To install the DB2 client on a third system, invoke these commands:
5923 <pre>display# <b>mount -t iso9660 /dev/cdrom /mnt/cdrom
5924 </b>display# <b>cd /mnt/cdrom/IBM/DB2
5925 </b>display# <b>./db2setup -d -r /root/db2client.rsp</b></pre>
5926 To configure a DB2 client, add the service's IP address to the <b><font face="Courier New, Courier, mono
">/etc/hosts</font></b>
5927 file on the client system. For example:
5928 <pre>10.1.16.182 ibmdb2.lowell.mclinux.com ibmdb2</pre>
5929 Then, add the following entry to the <b><font face="Courier New, Courier, mono
">/etc/services</font></b>
5930 file on the client system:
5931 <pre>db2cdb2inst1 50000/tcp</pre>
5932 Invoke the following commands on the client system:
5933 <pre># <b>su - db2inst1
5934 </b># <b>db2 catalog tcpip node ibmdb2 remote ibmdb2 server db2cdb2inst1
5935 </b># <b>db2 catalog database sample as db2 at node ibmdb2
5936 </b># <b>db2 list node directory
5937 </b># <b>db2 list database directory</b></pre>
5938 To test the database from the DB2 client system, invoke the following commands:
5939 <pre># <b>db2 connect to db2 user db2inst1 using ibmdb2
5940 </b># <b>db2 select tabname from syscat.tables
5941 </b># <b>db2 connect reset</b></pre>
5943 <p><br><a NAME="service-nfs
"></a>
5945 4.1.8 Setting Up an NFS Service</h3>
5946 <i>(Editorial Note: the heading numbers need to be re-indexed.)</i>
5947 <p>Highly available NFS (network file system) services are one of the key strengths
5948 of the clustering infrastructure. Advantages of clustered NFS services include the following:
5952 Ensures that NFS clients maintain uninterrupted access to key data in the
5953 event of server failure.</li>
5956 Facilitates planned maintenance by allowing you to transparently relocate
5957 NFS services to one cluster member so that you can fix or upgrade the
5958 other cluster member.</li>
5961 Allows you to set up an active-active configuration to maximize equipment
5962 utilization. More details on active-active configurations appear later in this document.</li>
5967 NFS Server Requirements</h4>
5968 If you intend to create highly available NFS services, then there are a
5969 few requirements which must be met by each cluster server. <i>(Note: these
5970 requirements do not pertain to NFS client systems.)
5971 </i> These requirements include the following:
5975 Kernel support for the NFS server must be enabled. NFS can be either
5976 configured statically or as a module. Both NFS V2 and NFS V3 are supported.</li>
5980 The kernel support for NFS provided with this Red Hat release incorporates
5981 enhancements (initially developed by Mission Critical Linux Inc.) which
5982 allow for transparent relocation of NFS services. These kernel enhancements
5983 prevent NFS clients from receiving <i>Stale file handle</i> errors after
5984 an NFS service has been relocated. If you are using kernel sources
5985 which do not include these NFS enhancements, you will still be able to
5986 configure and run NFS services within the cluster; but you will see warning
5987 messages emitted during service start and stop pointing out the absence
5988 of these kernel enhancements.</li>
5991 The NFS daemons must be running on all cluster servers. This is accomplished
5992 by enabling the <b>nfs</b> init.d run level script. For example:
5994 <b>chkconfig --level 345 nfs on</b>. NFS services will not start unless the following
5995 NFS daemons are running: <b>nfsd</b>, <b>rpc.mountd</b>, and
5996 <b>rpc.statd</b>; see the example after this list.</li>
5999 Filesystem mounts and their associated exports for clustered NFS services
6000 should not be included in <b>/etc/fstab</b> and <b>/etc/exports</b> respectively.
6001 Rather, for clustered NFS services, the parameters describing mounts and
6002 exports are entered via the <b>cluadmin</b> configuration utility.</li>
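<p>One quick way to confirm that the required NFS daemons have registered with the
portmapper is to query it, as in the following sketch (any equivalent check, such as
the <b>nfs</b> init script's status option, will do):
<pre># <b>rpcinfo -p | grep -E 'nfs|mountd|status'</b></pre>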
6006 Gathering NFS Service Configuration Parameters</h4>
6007 In preparation for configuring NFS services, you need to plan how the
6008 filesystems will be exported and failed over. The following information
6009 is required in order to configure NFS services:
6012 <b>Service Name</b> - A name used to uniquely identify this service within the cluster.</li>
6016 <b>Preferred Member</b> - Defines which system will be the NFS server for
6017 this service if more than one cluster member is operational.</li>
6020 <b>Relocation Policy</b> - whether to relocate the service to the preferred
6021 member if the preferred member wasn't running at the time the service was
6022 initially started. This parameter is useful as a means of load balancing
6023 the cluster members as NFS servers by assigning half the load to each.</li>
6026 <b>IP Address</b> - NFS clients access filesystems from an NFS server that
6027 is designated by its IP address (or associated hostname). So that
6028 NFS clients do not need to know which specific cluster member is the
6029 acting NFS server, the client systems should not mount a service using a cluster
6030 member's hostname or IP address. Rather,
6031 clustered NFS services are assigned <i>floating</i> IP addresses which
6032 are distinct from the cluster servers' IP addresses. This floating
6033 IP address is then configured on whichever cluster member is actively
6034 serving the NFS export. Following this approach, the NFS clients
6035 are only aware of the floating IP address and are unaware that a
6036 clustered NFS server has been deployed. When you enter an NFS service's
6037 IP address, you will also be prompted to enter an associated netmask and
6038 broadcast address. If you select the default of None, the assigned
6039 netmask and broadcast will match those currently configured on the network
6040 interface.</li>
6043 <b>Mount Information</b> - for non-clustered filesystems, the mount information
6044 is typically placed in /<b>etc/fstab</b>. In contrast, clustered
6045 filesystems must <b>not</b> be placed in <b>/etc/fstab</b>. This
6046 is necessary to ensure that only one cluster member at a time has the filesystem
6047 mounted. Failure to do so will result in filesystem corruption and
6048 likely system crashes.</li>
6052 <b>Device special file</b> - The mount information designates the disk's
6053 device special file and the directory on which the filesystem will be mounted.
6054 In the process of configuring an NFS service you will be prompted for this information.</li>
6058 <b>Mount point directory</b> - An NFS service can include more than one
6059 filesystem mount. In this manner, the filesystems will be grouped
6060 together as a single failover unit.</li>
6063 <b>Mount options</b> - The mount information also designates the mount
6064 options. Note: by default, the Linux NFS server does not guarantee
6065 that all write operations are synchronously written to disk. In order
6066 to ensure synchronous writes you must specify the <b>sync</b> mount option.
6067 Specifying the <b>sync</b> mount option favors data integrity at the expense
6068 of performance. Refer to <i>mount(8)</i> for detailed descriptions of the
6069 mount related parameters.</li>
6072 <b>Forced unmount </b>- As part of the mount information, you will be prompted
6073 as to whether forced unmount should be enabled or not. When forced
6074 unmount is enabled, any application running on the cluster server that has
6075 the designated filesystem in use when the service is being disabled or
6076 relocated will be killed off to allow the unmount to proceed.</li>
6081 <b>Export Information</b> - for non-clustered NFS services, export information
6082 is typically placed in <b>/etc/exports</b>. In contrast, clustered
6083 NFS services should <b>not </b>place export information in <b>/etc/exports</b>;
6084 rather you will be prompted for this information during service configuration.
6085 Export information includes:</li>
6089 <b>Export directory</b> - the export directory can be the same as the mount
6090 point specified with the mount information. In this case, the entire
6091 filesystem is accessible through NFS. Alternatively, you may wish
6092 to only export a portion (subdirectory) of a mounted filesystem.
6093 By exporting subdirectories of a mountpoint, you can also specify different
6094 access rights to different sets of NFS clients.</li>
6097 <b>Export client names</b> - this parameter defines which systems will
6098 be allowed to access the filesystem as NFS clients. Here you can
6099 individually designate systems (e.g. fred), or you can use wildcards to
6100 allow groups of systems (e.g. *.wizzbang.com). Entering a client
6101 name of * allows any client to mount the filesystem.</li>
6104 <b>Export client options</b> - this parameter defines the access rights
6105 afforded to the corresponding client(s). Examples include <b>ro</b>
6106 (read only), and <b>rw</b> (read write). Unless explicitly
6107 specified otherwise, the default export options are <b>ro,async,wdelay,root_squash</b>.</li>
6109 Refer to <i>exports(5)</i> for detailed descriptions of the export parameters.
6111 When running the <b>cluadmin</b> utility to configure NFS services:
6114 Take care to enter the service parameters correctly.
6115 The validation logic associated with NFS parameters is currently not very robust.</li>
6119 In response to most of the prompts, you can enter the <b>? </b>character
6120 to obtain descriptive help text.</li>
6124 Example NFS Service Configuration</h4>
6125 In order to illustrate the configuration process for an NFS service, an
6126 example configuration is described in this section. This example
6127 consists of setting up a single NFS export which houses the home directories
6128 of four members of the accounting team. NFS client access will be restricted
6129 to these four users' systems.
6130 <p>The following are the service configuration parameters that will be
6131 used, along with some descriptive commentary.
6135 Service Name - <b>nfs_accounting</b>. This name was chosen as a reminder
6136 of the service's intended function to provide exports to the members of
6137 the accounting team.</li>
6140 Preferred Member - <b>clu4</b>. In this example cluster, the member
6141 names are clu3 and clu4.</li>
6144 IP Address - <b>10.0.0.10</b>. There is a corresponding hostname
6145 of clunfsacct associated with this IP address, by which NFS clients mount
6146 the filesystem. Note that this IP address is distinct from that of
6147 both cluster members (clu3 and clu4). The default netmask and broadcast
6148 address will be used.</li>
6151 Mount Information - /<b>dev/sdb10</b>, which refers to the partition on
6152 the shared storage RAID box on which the filesystem will be physically
6153 stored. <b>ext3 </b>- referring to the filesystem type which was specified
6154 when the filesystem was created. <b>/mnt/users/accounting</b> - specifies
6155 the filesystem mount point. <b>rw,nosuid,sync</b> - are the mount options.</li>
6158 Export Information - for this example, the entire mounted filesystem will
6159 be made accessible on a read-write basis by four members of the accounting
6160 team. The names of the systems used by these four team members are <b>burke</b>, <b>stevens</b>, <b>needle</b>,
6164 and <b>dwalsh</b>.</li>
6166 The following is an excerpt of the /etc/hosts file used to represent IP
6167 addresses and associated hostnames used within the cluster:
6168 <pre>10.0.0.3 clu3 # cluster member</pre>
6170 <pre>10.0.0.4 clu4 # second cluster member</pre>
6172 <pre>10.0.0.10 clunfsacct # floating IP address associated with accounting team NFS service</pre>
6174 <pre>10.0.0.11 clunfseng # floating IP address associated with engineering team NFS service</pre>
6175 The following is excerpted from running <b>cluadmin</b> to configure this
6176 example NFS service:
6177 <p>cluadmin> <b>service add</b>
6178 <pre>Service name: <b>nfs_accounting
6179 </b>Preferred member [None]: clu4
6180 Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b>yes
6181 </b>User script (e.g., /usr/foo/script or None) [None]:
6182 Do you want to add an IP address to the service (yes/no/?) [no]: <b>yes
6184 </b> IP Address Information
6186 IP address: <b>10.0.0.10
6187 </b>Netmask (e.g. 255.255.255.0 or None) [None]:
6188 Broadcast (e.g. X.Y.Z.255 or None) [None]:
6189 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses [f]:
6190 Do you want to add a disk device to the service (yes/no/?) [no]: <b>yes
6192 </b>Disk Device Information
6194 Device special file (e.g., /dev/sdb4): <b>/dev/sdb10
6195 </b>Filesystem type (e.g., ext2, ext3 or None): <b>ext3
6196 </b>Mount point (e.g., /usr/mnt/service1) [None]: <b>/mnt/users/accounting
6197 </b>Mount options (e.g., rw,nosuid,sync): <b>rw,nosuid,sync
6198 </b>Forced unmount support (yes/no/?) [yes]:
6199 Would you like to allow NFS access to this filesystem (yes/no/?) [no]: <b>yes
6201 </b>You will now be prompted for the NFS export configuration:
6203 Export directory name: <b>/mnt/users/accounting
6205 </b>Authorized NFS clients
6207 Export client name [*]: <b>burke
6208 </b>Export client options [None]: <b>rw
6209 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>a
6211 </b>Export client name [*]: <b>stevens
6212 </b>Export client options [None]: <b>rw
6213 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>a
6215 </b>Export client name [*]: <b>needle
6216 </b>Export client options [None]: <b>rw
6217 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>a
6219 </b>Export client name [*]: <b>dwalsh
6220 </b>Export client options [None]: <b>rw
6221 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>f
6222 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]:
6223 Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]:
6224 Disable service (yes/no/?) [no]:
6227 preferred node: clu4
6230 IP address 0: 10.0.0.10
6231 netmask 0: None
6232 broadcast 0: None
6233 device 0: /dev/sdb10
6234 mount point, device 0: /mnt/users/accounting
6235 mount fstype, device 0: ext3
6236 mount options, device 0: rw,nosuid,sync
6237 force unmount, device 0: yes
6238 NFS export 0: /mnt/users/accounting
6239 Client 0: burke, rw
6240 Client 1: stevens, rw
6241 Client 2: needle, rw
6242 Client 3: dwalsh, rw
6243 Add nfs_accounting service as shown? (yes/no/?) yes
6248 NFS Client Access</h4>
6249 From the client's perspective, the NFS usage model is completely unchanged.
6250 Following the prior example, if a client system wishes
6251 to mount the highly available NFS service, it simply needs to have an entry
6252 like the following in its <b>/etc/fstab</b> file:
6253 <pre>clunfsacct:/mnt/users/accounting /mnt/users/ nfs bg 0 0</pre>
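<p>Equivalently, the filesystem can be mounted manually from the client; the following
is a sketch (assuming the <b>/mnt/users</b> mount point already exists on the client):
<pre># <b>mount -t nfs clunfsacct:/mnt/users/accounting /mnt/users</b></pre>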
6256 Active-Active NFS Configuration</h4>
6257 In the previous section, an example configuration of a simple NFS service
6258 was discussed. This section describes how to set up a more complex, active-active configuration.
6260 <p>The example in this section involves configuring a pair of highly available
6261 NFS services. Suppose you have two separate teams of
6262 users who will be accessing NFS filesystems served by the cluster.
6263 To serve these users, two separate NFS services will be configured.
6264 Each service will have its own IP address and a
6265 different preferred cluster member. In this manner, under normal operating
6266 circumstances, when both cluster members are running, each will be NFS
6267 exporting one of the filesystems. This enables you to most effectively
6268 utilize the capacity of your two server systems. In the event of
6269 a failure (or planned maintenance) on either of the cluster members,
6270 both NFS services will run on the surviving cluster member.
6271 <p>This example configuration will expand upon the NFS service created
6272 in the prior section by adding a second service. The following
6273 service configuration parameters apply to this second service:
6277 Service Name - <b>nfs_engineering</b>. This name was chosen as a reminder
6278 of the service's intended function to provide NFS exports to the members
6279 of the engineering team.</li>
6282 Preferred Member - <b>clu3</b>. In this example cluster, the member
6283 names are clu3 and clu4. Note that here we specify clu3 because the
6284 other cluster service (nfs_accounting) has clu4 specified as its preferred member.</li>
6288 IP Address - <b>10.0.0.11</b>. There is a corresponding hostname
6289 of clunfseng associated with this IP address, by which NFS clients mount
6290 the filesystem. Note that this IP address is distinct from that of
6291 both cluster members (clu3 and clu4). Also note that this IP address
6292 is different from the one associated with the other NFS service (nfs_accounting).
6293 The default netmask and broadcast address will be used.</li>
6296 Mount Information - /<b>dev/sdb11</b>, which refers to the partition on
6297 the shared storage RAID box on which the filesystem will be physically
6298 stored. <b>ext2 </b>- referring to the filesystem type which was specified
6299 when the filesystem was created. <b>/mnt/users/engineering</b> -
6300 specifies the filesystem mount point. <b>rw,nosuid,sync</b> - are the mount options.</li>
6304 Export Information - for this example, individual subdirectories of the
6305 mounted filesystem will be made accessible on a read-write basis by three
6306 members of the engineering team. The names of the systems used by
6307 these three team members are <b>ferris</b>,
6308 <b>denham</b>, and <b>brown</b>.
6309 Also to make this example more illustrative, you will see that each team
6310 member will only be able to NFS mount their specific subdirectory.</li>
6312 Shown below is excerpted output from running cluadmin to create this second
6313 NFS service on the same cluster used in the prior example, where the service
6314 nfs_accounting was created.
6316 <pre>cluadmin> <b>service add
6318 </b>Service name: nfs_engineering
6319 Preferred member [None]: <b>clu3
6320 </b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b>yes
6321 </b>User script (e.g., /usr/foo/script or None) [None]:
6322 Do you want to add an IP address to the service (yes/no/?) [no]: <b>yes
6324 </b> IP Address Information
6326 IP address: <b>10.0.0.11
6327 </b>Netmask (e.g. 255.255.255.0 or None) [None]:
6328 Broadcast (e.g. X.Y.Z.255 or None) [None]:
6329 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses [f]: <b>f
6330 </b>Do you want to add a disk device to the service (yes/no/?) [no]: <b>yes
6332 </b>Disk Device Information
6334 Device special file (e.g., /dev/sdb4): <b>/dev/sdb11
6335 </b>Filesystem type (e.g., ext2, ext3 or None): <b>ext2
6336 </b>Mount point (e.g., /usr/mnt/service1) [None]: <b>/mnt/users/engineering
6337 </b>Mount options (e.g., rw,nosuid,sync): <b>rw,nosuid,sync
6338 </b>Forced unmount support (yes/no/?) [yes]:
6339 Would you like to allow NFS access to this filesystem (yes/no/?) [no]: <b>yes
6341 </b>You will now be prompted for the NFS export configuration:
6343 Export directory name: <b>/mnt/users/engineering/ferris
6345 </b>Authorized NFS clients
6347 Export client name [*]: <b>ferris
6348 </b>Export client options [None]: <b>rw
6349 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>f
6350 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: <b>a
6352 </b>Export directory name: <b>/mnt/users/engineering/denham
6354 </b>Authorized NFS clients
6356 Export client name [*]: <b>denham
6357 </b>Export client options [None]: <b>rw
6358 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]:
6359 Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: <b>a
6361 </b>Export directory name: <b>/mnt/users/engineering/brown
6363 </b>Authorized NFS clients
6365 Export client name [*]: <b>brown
6366 </b>Export client options [None]: <b>rw
6367 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>f
6368 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: <b>f
6369 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]:
6370 Disable service (yes/no/?) [no]:
6371 name: nfs_engineering
6373 preferred node: clu3
6376 IP address 0: 10.0.0.11
6377 netmask 0: None
6378 broadcast 0: None
6379 device 0: /dev/sdb11
6380 mount point, device 0: /mnt/users/engineering
6381 mount fstype, device 0: ext2
6382 mount options, device 0: rw,nosuid,sync
6383 force unmount, device 0: yes
6384 NFS export 0: /mnt/users/engineering/ferris
6385 Client 0: ferris, rw
6386 NFS export 0: /mnt/users/engineering/denham
6387 Client 0: denham, rw
6388 NFS export 0: /mnt/users/engineering/brown
6389 Client 0: brown, rw
6390 Add nfs_engineering service as shown? (yes/no/?) yes
6391 Added nfs_engineering.
6396 The following points need to be taken into consideration when clustered
6397 NFS services are configured.
6399 Avoid using `exportfs -r`</h5>
6400 Filesystems being NFS exported by cluster members do not get specified
6401 in the conventional <b>/etc/exports </b>file. Rather, the NFS exports
6402 associated with cluster services are specified in the cluster configuration
6403 file (as established by <b>cluadmin</b>).
6404 <p>The command <b><i>exportfs -r </i></b>removes any exports which are
6405 not explicitly specified in the <b>/etc/exports</b> file. Running
6406 this command will cause the clustered NFS services to become unavailable
6407 until the service is restarted. For this reason you should avoid using
6408 the <b><i>exportfs -r </i></b>command on a cluster on which highly available
6409 NFS services are configured. To recover from unintended usage of <b>exportfs
6411 -r</b>, the NFS cluster service must be stopped and then restarted.
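<p>For example, the service from the earlier example could be stopped and restarted
with the <b>cluadmin</b> commands described later in this chapter; a sketch:
<pre>cluadmin> <b>service disable nfs_accounting</b>
cluadmin> <b>service enable nfs_accounting</b></pre>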
6414 NFS File Locking</h5>
6415 NFS file locks are <b>not</b> preserved across a failover or service relocation.
6416 This is because the Linux NFS implementation stores file locking
6417 information in system files. These system files representing NFS
6418 locking state are not replicated across the cluster. The implication
6419 is that locks may be regranted subsequent to the failover operation.
6422 <a NAME="Setting up a Samba Service
"></a></h3>
6425 4.1.9 Setting Up a High Availability Samba Service</h3>
6426 <i>(Editorial Note: this is a preliminary writeup, its rough - needing
6427 editorial cleanup.)</i>
6428 <p>Highly available network file services are one of the key strengths
6429 of the clustering infrastructure. Advantages of high availability
6430 Samba services include:
6433 Provides heterogeneous file serving capabilities to Windows-based
6434 clients using the CIFS/SMB protocol.</li>
6437 Allows the same set of filesystems to be simultaneously network served
6438 to both NFS and Windows-based clients.</li>
6441 Ensures that Windows-based clients maintain access to key data, or are able
6442 to quickly reestablish a connection, in the event of server failure.</li>
6445 Facilitates planned maintenance by allowing you to transparently relocate
6446 Samba services to one cluster member so that you can fix or upgrade the
6447 other cluster member.</li>
6450 Allows you to set up an active-active configuration to maximize equipment
6451 utilization. More details on active-active configurations appear in the
Active-Active NFS Configuration section above.</li>
6454 Note: a complete description of Samba configuration is beyond the scope
6455 of this document. Rather, this documentation merely highlights aspects
6456 which are crucial for clustered operation. Refer to <i><<tbd
6457 link in RH documentation>></i> for more details on Samba configuration.
6458 As a prerequisite to configuring high availability Samba services, you
6459 should know how to configure conventional non-clustered Samba fileserving.
6462 Samba Server Requirements</h4>
6463 If you intend to create highly available Samba services, then there are
6464 a few requirements which must be met by each cluster server. These requirements are as follows:
6468 The Samba RPM packages must be installed. For example: <b>samba</b>,
6469 <b>samba-common</b>.
6470 There have been no modifications to the Samba RPMs themselves in support
6471 of high availability.</li>
6474 The Samba daemons will be started and stopped by the cluster infrastructure
6475 on a per-service basis. Consequently, the Samba configuration information
6476 should not be specified in the conventional <b>/etc/samba/smb.conf</b> file.
6477 The automated system startup of the Samba daemons <b>smbd</b> and <b>nmbd</b> should
6478 not be enabled in the init.d run levels.
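For example, one way to remove the automatic startup (assuming the standard Red Hat
<b>smb</b> init script) is:
<pre># <b>chkconfig --del smb</b></pre>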
6483 Since the cluster infrastructure stops the cluster-related Samba daemons
6484 appropriately, system administrators should not manually run the conventional
6485 Samba stop script (e.g. <b>service smb stop</b>) as this will terminate
6486 all cluster-related Samba daemons.</li>
6489 Filesystem mounts for clustered Samba services should not be included
6490 in <b>/etc/fstab.</b> Rather, for clustered services, the parameters
6491 describing mounts are entered via the <b>cluadmin</b> configuration utility.</li>
6494 Failover of samba printer shares is not currently supported.</li>
6497 <i>Editorial Note - need to describe the incorporation of kernel patches,
6498 once we work that out.</i></li>
6502 Samba Operating Model</h4>
6503 This section provides background information describing the implementation
6504 model in support of Samba high availability services. Knowledge of
6505 this information will provide the context for understanding the configuration
6506 requirements of clustered Samba services.
6507 <p>The conventional non-clustered Samba configuration model consists of
6508 editing the <b>/etc/samba/smb.conf </b>file to designate which filesystems
6509 are to be made network accessible to the specified Windows clients.
6510 It also designates access permissions and other mapping capabilities.
6511 In the single system model, a single instance of each of the <b>smbd</b>
6512 and <b>nmbd </b>daemons is automatically started by the init.d run
6513 level script <b>smb</b>.
6514 <p>In order to implement high availability Samba services, rather than
6515 having a single /<b>etc/samba/smb.conf </b>file, there is an individual
6516 per-service Samba configuration file. These files are called /<b>etc/samba/smb.conf.sharename</b>,
6517 where <b>sharename</b> is replaced by the specific name of the
6518 share associated with a Samba service. For example, if
6519 you wish to call one share <b>eng </b>and another share <b>acct,</b>
6520 the corresponding Samba configuration files would be <b>/etc/samba/smb.conf.eng</b>
6521 and /<b>etc/samba/smb.conf.acct.</b>
6522 <p>The format of the <b>smb.conf.sharename</b> file is identical to the
6523 conventional <b>smb.conf</b> format. No additional fields have been
6524 created for clustered operation. There are several fields within the <b>smb.conf.sharename</b>
6525 file which are required for correct cluster operation; these fields will
6526 be described in an upcoming section. When a new Samba service is
6527 created using the <b>cluadmin</b> utility, a default template <b>smb.conf.sharename</b>
6528 file will be created based on the service-specific parameters. This
6529 file should be used as a starting point, which the system administrator
6530 should then adjust to add the appropriate Windows client systems, the specific
6531 directories to share, and the desired permissions.
6532 <p>The system administrator is required to copy the <b>/etc/samba/smb.conf.sharename</b>
6533 files onto both cluster members. After the initial configuration
6534 time, should any changes be made to any <b>smb.conf.sharename</b> file,
6535 it is necessary to also copy the updated version to the other cluster member.
6537 <p>To facilitate high availability Samba functionality, each individual
6538 Samba service configured within the cluster (via <b>cluadmin</b>) will
6539 have its own individual pair of <b>smbd</b>/<b>nmbd</b> daemons.
6540 Consequently, if more than one Samba service is configured within
6541 the cluster, you may see multiple instances of these daemon pairs running
6542 on an individual cluster server. These Samba daemons <b>smbd</b>/<b>nmbd</b> are
6544 not initiated via the conventional init.d run level scripts; rather they
6545 are initiated by the cluster infrastructure based on which node is the
6546 active service provider.
6547 <p>In order to allow a single system to run multiple instances of the Samba
6548 daemons, each pair of daemons is required to have its own locking directory.
6549 Consequently, there will be a separate per-service Samba daemon locking
6550 directory. This directory is given the name <b>/var/cache/samba/sharename</b>,
6551 where <b>sharename</b> is replaced by the Samba share name specified within
6552 the service configuration information (via <b>cluadmin</b>). Following
6553 the prior example, the corresponding lock directories would be <b>/var/cache/samba/eng</b>
6554 and <b>/var/cache/samba/acct</b>.
6555 <p>When the <b>cluadmin</b> utility is used to configure a Samba service,
6556 the <b>/var/cache/samba/sharename</b> directory will be automatically created
6557 on the system on which the <b>cluadmin</b> utility is running. At
6558 this time a reminder will be displayed that you need to manually create
6559 this lock directory on the other cluster member. For example: <b>mkdir
6560 /var/cache/samba/eng</b>.
6563 Gathering Samba Service Configuration Parameters</h4>
6564 In preparation for configuring Samba services, you need to determine configuration
6565 information such as which filesystems will be presented as shares to Windows-based
6566 clients. The following information is required in order to
6567 configure Samba services:
6570 <b>Service Name</b> - A name used to uniquely identify this service within the cluster.</li>
6574 <b>Preferred Member</b> - Defines which system will be the Samba server
6575 for this service when more than one cluster member is operational.</li>
6578 <b>Relocation Policy</b> - whether to relocate the service to the preferred
6579 member if the preferred member wasn't running at the time the service was
6580 initially started. This parameter is useful as a means of load balancing
6581 the cluster members as Samba servers by assigning half the load to each.</li>
6584 <b>Status Check Interval - </b>specifies how often (in seconds) the cluster
6585 subsystem should verify that the pair of Samba daemons <b>smbd</b>/<b>nmbd</b>
6586 which are associated with this service are running. In the event
6587 that either of these daemons has unexpectedly exited, it will be automatically
6588 restarted to resume service. If you specify a value of 0, then no
6589 monitoring will be performed. For example, designating an interval
6590 of 90 seconds will result in monitoring at that interval.</li>
6593 <b>IP Address</b> - Windows clients access file shares from a server as
6594 designated by its IP address (or associated hostname). So that
6595 Windows clients do not need to know which specific cluster member is
6596 the acting Samba server, the client systems should not use a cluster
6597 member's hostname or IP address to access a service.
6598 Rather, clustered Samba services are assigned <i>floating</i> IP addresses
6599 which are distinct from the cluster servers' IP addresses. This floating
6600 IP address is then configured on whichever cluster member is actively
6601 serving the share. Following this approach, the Windows clients are
6602 only aware of the floating IP address and are unaware that
6603 clustered Samba services have been deployed. When you enter a Samba
6604 service's IP address, you will also be prompted to enter an associated
6605 netmask and broadcast address. If you select the default of None,
6606 the assigned netmask and broadcast will match those currently configured
6607 on the network interface.</li>
6610 <b>Mount Information</b> - for non-clustered filesystems, the mount information
6611 is typically placed in /<b>etc/fstab</b>. In contrast, clustered
6612 filesystems must <b>not</b> be placed in <b>/etc/fstab</b>. This
6613 is necessary to ensure that only one cluster member at a time has the filesystem
6614 mounted. Failure to do so will result in filesystem corruption and
6615 likely system crashes.</li>
6619 <b>Device special file</b> - The mount information designates the disk's
6620 device special file and the directory on which the filesystem will be mounted.
6621 In the process of configuring a Samba service you will be prompted for
6622 this information.</li>
6625 <b>Mount point directory</b> - A Samba service can include more than one
6626 filesystem mount. In this manner, the filesystems will be grouped
6627 together as a single failover unit.</li>
6630 <b>Mount options</b> - The mount information also designates the mount options.</li>
6634 <b>Forced unmount </b>- As part of the mount information, you will be prompted
6635 as to whether forced unmount should be enabled or not. When forced
6636 unmount is enabled, any application running on the cluster server that has
6637 the designated filesystem in use when the service is being disabled or
6638 relocated will be killed off to allow the unmount to proceed.</li>
6643 <b>Export Information</b> - this information is required for NFS services
6644 only. If you are only performing file serving to Windows based clients,
6645 answer <i>no</i> when prompted regarding NFS exports. Alternatively,
6646 you can configure a service to perform heterogeneous file serving by designating
6647 both NFS exports parameters and the Samba share parameter.</li>
6650 <b>Samba Share Name</b> - In the process of configuring a service
6651 you will be asked if you wish to share the filesystem to Windows clients.
6652 If you answer <i>yes</i> to this question, you will then be prompted for
6653 the Samba share name. Based on the name you specify here, there will
6654 be a corresponding <b>/etc/samba/smb.conf.sharename</b> file and lock directory
6655 <b>/var/cache/samba/sharename</b>.
6656 By convention, the actual Windows share name specified within the smb.conf.sharename file
6657 will be set in accordance with this parameter. In practice, you can
6658 designate more than one Samba share within an individual <b>smb.conf.sharename</b>
6659 file. There can be at most one Samba configuration specified per service,
6660 and it must be specified with the first device. For example, if you
6661 have multiple disk devices (and corresponding filesystem mounts) within
6662 a single service, then specify a single <b>sharename</b> for the service.
6663 Then, within the <b>/etc/samba/smb.conf.sharename</b> file, designate multiple
6664 individual Samba shares to share directories from the multiple devices.
6665 To disable Samba sharing of a service, the share name should be set to None.</li>
6668 When running the <b>cluadmin</b> utility to configure Samba services:
6671 Take care to enter the service parameters correctly.
6672 The validation logic associated with Samba parameters is currently not very robust.</li>
6676 In response to most of the prompts, you can enter the <b>? </b>character
6677 to obtain descriptive help text.</li>
6680 After configuring a Samba service via <b>cluadmin</b>, remember to tune
6681 the <b>/etc/samba/smb.conf.sharename</b> file for each service in accordance
6682 with the clients and authorization scheme you desire.</li>
6685 Remember to copy the <b>smb.conf.sharename</b> file over to the other cluster member.</li>
6689 Perform the recommended step to create the Samba daemon's lock directory
6690 on the other cluster member, e.g. <b>mkdir /var/cache/samba/acct</b>.</li>
6693 If you delete a Samba service, be sure to manually remove the <b>/etc/samba/smb.conf.sharename</b> file.
6695 The <b>cluadmin</b> utility does not automatically delete this file, in
6696 order to preserve your site-specific configuration parameters for possible future reuse.</li>
6701 Example Samba Service Configuration</h4>
6702 In order to illustrate the configuration process for a Samba service, an
6703 example configuration is described in this section. This example
6704 consists of setting up a single Samba share which houses the home directories
6705 of four members of the accounting team. The accounting team will then
6706 access this share from their Windows-based systems.
6707 <p>The following are the service configuration parameters that will be
6708 used, along with some descriptive commentary.
6712 Service Name - <b>samba_acct.</b> This name was chosen as a reminder of
6713 the service's intended function to provide exports to the members of the
6714 accounting team.</li>
6717 Preferred Member - <b>clu4</b>. In this example cluster, the member
6718 names are clu3 and clu4.</li>
6721 Monitoring Interval - <b>90</b> seconds.</li>
6724 IP Address - <b>10.0.0.10</b>. There is a corresponding hostname
6725 of cluacct associated with this IP address, by which Windows based clients
6726 access the share. Note that this IP address is distinct from that
6727 of both cluster members (clu3 and clu4). The default netmask and
6728 broadcast address will be used.</li>
6731 Mount Information - /<b>dev/sdb12</b>, which refers to the partition on
6732 the shared storage RAID box on which the filesystem will be physically
6733 stored. <b>ext2 </b>- referring to the filesystem type which was specified
6734 when the filesystem was created. <b>/mnt/users/accounting</b> - specifies
6735 the filesystem mount point. <b>rw,nosuid,sync</b> - are the mount options.</li>
6738 Export Information - for simplicity in this example, the filesystem is
6739 not being NFS exported.</li>
6742 Share Name - <b>acct</b> - this is the share name by which Windows based
6743 clients will access this Samba share, e.g. \\10.0.0.10\acct.</li>
6745 The following is an excerpt of the /etc/hosts file used to represent IP
6746 addresses and associated hostnames used within the cluster:
6747 <pre>10.0.0.3 clu3 # cluster member</pre>
6749 <pre>10.0.0.4 clu4 # second cluster member</pre>
6751 <pre>10.0.0.10 cluacct # floating IP address associated with accounting team Samba service</pre>
6752 The following is excerpted from running <b>cluadmin</b> to configure this
6753 example Samba service:
6754 <pre>Service name: <b>samba_acct
6755 </b>Preferred member [None]: <b>clu4
6756 </b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes
6757 User script (e.g., /usr/foo/script or None) [None]:
6758 Status check interval [0]: <b>90
6759 </b>Do you want to add an IP address to the service (yes/no/?) [no]: yes
6761 IP Address Information
6763 IP address: <b>10.0.0.10
6764 </b>Netmask (e.g. 255.255.255.0 or None) [None]:
6765 Broadcast (e.g. X.Y.Z.255 or None) [None]:
6766 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses [f]:
6767 Do you want to add a disk device to the service (yes/no/?) [no]: yes
6769 Disk Device Information
6771 Device special file (e.g., /dev/sdb4): <b>/dev/sdb12
6772 </b>Filesystem type (e.g., ext2, ext3 or None): <b>ext2
6773 </b>Mount point (e.g., /usr/mnt/service1) [None]: <b>/mnt/users/accounting
6774 </b>Mount options (e.g., rw,nosuid,sync): <b>rw,nosuid,sync
6775 </b>Forced unmount support (yes/no/?) [yes]:
6776 Would you like to allow NFS access to this filesystem (yes/no/?) [no]:
6777 Would you like to share to Windows clients (yes/no/?) [no]: <b>yes
6779 </b>You will now be prompted for the Samba configuration:
6780 Samba share name: <b>acct
6782 </b>The samba config file /etc/samba/smb.conf.acct does not exist.
6784 Would you like a default config file created (yes/no/?) [no]: <b>yes
6786 </b>Successfully created daemon lock directory /var/cache/samba/acct.
6787 Please run `mkdir /var/cache/samba/acct` on the other cluster member.
6789 Successfully created /etc/samba/smb.conf.acct.
6790 Please remember to make necessary customizations and then copy the file
6791 over to the other cluster member.
6793 Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]: <b>f
6794 </b>name: samba_acct
6795 preferred node: clu4
6798 monitor interval: 90
6799 IP address 0: 10.0.0.10
6800 netmask 0: None
6801 broadcast 0: None
6802 device 0: /dev/sdb12
6803 mount point, device 0: /mnt/users/accounting
6804 mount fstype, device 0: ext2
6805 mount options, device 0: rw,nosuid,sync
6806 force unmount, device 0: yes
6807 samba share, device 0: acct
6808 Add samba_acct service as shown? (yes/no/?) <b>yes</b></pre>
6809 After running cluadmin as shown above to configure the service, remember to:
6813 Customize <b>/etc/samba/smb.conf.sharename</b> accordingly.</li>
6816 Copy /<b>etc/samba/smb.conf.sharename</b> over to the other cluster member.</li>
6819 Create the suggested lock directory on the other cluster member, e.g. <b>mkdir
6820 /var/cache/samba/acct</b></li>
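<p>For example, assuming the service was configured on clu4 and that ssh access
between the cluster members is available, the last two steps might look like the
following sketch (adjust host names and paths for your site):
<pre>clu4# <b>scp /etc/samba/smb.conf.acct clu3:/etc/samba/smb.conf.acct</b>
clu4# <b>ssh clu3 mkdir -p /var/cache/samba/acct</b></pre>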
6824 smb.conf.sharename File Fields</h4>
6825 This section describes the fields within the <b>smb.conf.sharename</b>
6826 file which are most relevant to the correct operation of highly available
6827 Samba services. It is beyond the scope of this document to completely
6828 describe all of the fields within a Samba configuration file. There
6829 have been no additional field names added in support of clustering; the
6830 file format follows the normal Samba conventions.
6831 <p>Shown below is an example <b>smb.conf.sharename</b> file which was automatically
6832 generated by <b>cluadmin</b> in response to the service-specific parameters.
6833 This example file matches the above <b>cluadmin</b> service configuration
6834 example. Following the file will be a description of the most relevant fields.
6836 <pre># Template samba service configuration file - please modify to specify
6837 # subdirectories and client access permissions.
6838 # Remember to copy this file over to other cluster member, and create
6839 # the daemon lock directory /var/cache/samba/acct.
6841 # From a cluster perspective, the key fields are:
6842 # lock directory - must be unique per samba service.
6843 # bind interfaces only - must be present and set to yes.
6844 # interfaces - must be set to service floating IP address.
6845 # path - must be the service mountpoint or subdirectory thereof.
6846 # Refer to the cluster documentation for details.
[global]
6849 workgroup = RHCLUSTER
6850 lock directory = /var/cache/samba/acct
6851 log file = /var/log/samba/%m.log
6852 encrypt passwords = yes
6853 bind interfaces only = yes
6854 interfaces = 10.0.0.10

[acct]
6857 comment = High Availability Samba Service
6858 browsable = yes
6859 writable = no
6860 public = yes
6861 path = /mnt/users/accounting</pre>
6862 The following is a description of the most relevant fields, from a cluster
6863 perspective, in the <b>/etc/samba/smb.conf.sharename</b> file. In
6864 this example, the file is named <b>/etc/samba/smb.conf.acct </b>in
6865 accordance with the share name being specified as <b>acct</b> while running
6866 cluadmin. Only the cluster-specific fields are described below.
6867 The remaining fields follow standard Samba convention and should be tailored
to your site's needs.
6869 <p>Global Parameters - These parameters pertain to all shares which are
6870 specified in this smb.conf.sharename file. Remember that you are
6871 free to designate more than one share within this file; provided that the
6872 directories described within it are within the service's filesystem mounts.
6873 <p><b>lock directory</b> - dictates the name of the directory in which
6874 the Samba daemons <b>smbd</b>/<b>nmbd</b> will place their locking files.
6875 This must be set to <b>/var/cache/samba/sharename</b>, where <b>sharename</b>
6876 varies based on the parameter specified in <b>cluadmin</b>. Specification
6877 of a lock directory is required in order to allow a separate per-service
6878 instance of <b>smbd</b>/<b>nmbd</b>.
6879 <br><b>bind interfaces only</b> - This parameter must be set to <b>yes</b>
6880 in order to allow each <b>smbd</b>/<b>nmbd</b> pair to bind to the floating
6881 IP address associated with this clustered Samba service.
6882 <br><b>interfaces</b> - specifies the IP address associated with the Samba
6883 service. If you specified a netmask within the service, this field
6884 would appear like the following example: <b>interfaces = 10.0.0.10/255.255.254.0</b>.
6885 <p>Share-specific parameters - these parameters pertain to a specific Samba share.
6887 <br><b>writable</b> - by default, the share's access permissions are conservatively
6888 set to non-writable. Tune according to your site-specific preferences.
6889 <br><b>path</b> - defaults to the first filesystem mount point specified
6890 within the service configuration. This should be adjusted to match
6891 the specific directory or subdirectory you intend to make available as
6892 a share to Windows clients.
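<p>For example, a tuned share section for the accounting example might look like
the following sketch (the <b>valid users</b> list and the permissions shown here
are illustrative assumptions, not values generated by <b>cluadmin</b>):
<pre>[acct]
comment = Accounting team home directories
path = /mnt/users/accounting
writable = yes
browsable = yes
valid users = burke stevens needle dwalsh</pre>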
6895 Windows Client Access to Samba Shares</h4>
6896 Windows clients are oblivious to the fact that the shares are being served
6897 by a high availability cluster. From the Windows client's perspective,
6898 the only requirement is that they access the Samba share via its floating
6899 IP address (or associated hostname) which was configured using cluadmin,
6900 e.g. 10.0.0.10. The Windows clients should not directly access the
6901 share from either of the cluster member systems' IP addresses (e.g. clu3 or clu4).
6903 <p>Depending upon the authorization scheme you intend to utilize in your
6904 environment, you may have to use the <b>smbpasswd</b> command to establish
6905 Windows account information on the cluster servers. When establishing these
6906 accounts it is required that the same Samba-related account information
6907 be set up on both cluster members. This can either be accomplished
6908 by running <b>smbpasswd</b> similarly on both cluster members, or by copying
6909 over the resulting <b>/etc/smbpasswd</b> file. For example, to enable a
6910 Windows client system named <b>sarge</b> to access a Samba share served
6911 by the cluster members, you would run the following command on both cluster
6912 members, taking care to specify the same username and password each time.
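A command along the following lines would typically be used on each member (this
assumes the standard <b>smbpasswd -a</b> option and that <b>sarge</b> is the Samba
account name being added; the command prompts for the password):
<pre># <b>smbpasswd -a sarge</b></pre>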
6915 <p>On a Windows client, the Samba share can then be accessed in the conventional
6916 manner. For example you could click on the <b>Start</b> button on
6917 the main taskbar, followed by selecting <b>Run</b>. This brings up
6918 a dialog box in which you can specify the clustered Samba share name.
6919 For example: \<b>\10.0.0.10\acct </b>or equivalently <b>\\cluacct\acct</b>.
6920 To access the Samba share from a Windows client you can also use the <b>Map
6921 Network Drive </b>feature. It is important to ensure that
6922 the hostname portion of the share name refers to the floating service IP
6923 address. Following the hostname and IP addresses from the above <b>/etc/hosts</b>
6924 excerpt, the correct name to refer to this highly available cluster share
6925 is \<b>\cluacct\acct</b>. The share should not be accessed by referring
6926 to the name of the cluster server. For example, do not access this
6927 share as either <b>\\clu3\acct </b>or \<b>\clu4\acct</b>. If a share
6928 is incorrectly referred to by the cluster server name (e.g. \<b>\clu3\acct</b>),
6929 then the Windows client will only be able to access the share while it
6930 is being actively served by <b>clu3</b>, thereby subverting the high availability
benefits of the cluster.
6932 <p>Unlike the NFS protocol, the Windows-based CIFS/SMB protocol is much
6933 more stateful. As a consequence, in the Windows environment, it is
6934 the responsibility of the individual application to take appropriate measures
6935 in response to lack of immediate response from the Samba server.
6936 In the case of either a planned service relocation or a true failover scenario,
6937 there is a period of time where the Windows clients will not get immediate
6938 response from the Samba server. Robust Windows applications will
6939 retry requests which time out during this interval.
6940 <p>We have observed that well-behaved applications retry appropriately,
6941 resulting in Windows clients being completely unaware of service relocations
6942 or failover operations. In contrast, poorly behaved Windows applications
6943 will produce error messages in the event of a failover or relocation,
6944 indicating an inability to access the share. For such applications, it may be
6945 necessary to retry the operation or restart the application in order for the Windows
6946 client system to re-attach to the Samba share.
6947 <p>The behavior of a Windows-based client in response to either failover
6948 or relocation of a Samba service also varies depending on which release of Windows
6949 is installed on each client system. For example, Windows 98 based
6950 systems often encounter errors like <i>The network path was not found</i>,
6951 whereas later versions such as Windows 2000 transparently recover under
6952 the same set of circumstances.
6953 <p><i>Editorial comment: Add in description of the impact of the kernel
6954 patches for Stale File Handle errors once we determine whether that will
6955 be incorporated.</i>
6957 <p><a NAME="service-apache
"></a>
6959 Setting Up an Apache Service</h3>
6960 This section provides an example of setting up a cluster service that will
6961 fail over an Apache Web server. Although the actual variables that you
6962 use in the service depend on your specific configuration, the example may
6963 help you set up a service for your environment.
6964 <p><i>Editorial comment: Here the distinction of Piranha as a load balancer
6965 vs a highly available apache server for static content should be discussed.</i>
6966 <p>To set up an Apache service, you must configure both cluster systems
6967 as Apache servers. The cluster software ensures that only one cluster system
6968 runs the Apache software at one time. The Apache configuration will
6969 consist of installing the Apache RPMs on both cluster members and configuring
6970 a shared filesystem to house the Web site's content.
6971 <p>When you install the Apache software on the cluster systems, do not
6972 configure the cluster systems so that Apache automatically starts when
6973 the system boots; for example, run <b>chkconfig --del httpd</b>.
6974 Rather than having the system startup scripts spawn httpd, the cluster
6975 infrastructure will do that on the active cluster server for the Apache
6976 service. This will ensure that the corresponding IP address and filesystem
6977 mounts are active on only one cluster member at a time.
6978 <p>When you add an Apache service, you must assign it a "floating
" IP address.
6979 The cluster infrastructure binds this IP address to the network interface
6980 on the cluster system that is currently running the Apache service. This
6981 IP address ensures that the cluster system running the Apache software
6982 is transparent to the HTTP clients accessing the Apache server.
6983 <p>The file systems that contain the Web content must not be automatically
6984 mounted on shared disk storage when the cluster systems boot. Instead,
6985 the cluster software must mount and unmount the file systems as the Apache
6986 service is started and stopped on the cluster systems. This prevents both
6987 cluster systems from accessing the same data simultaneously, which may
6988 result in data corruption. Therefore, do not include the file systems in
6989 the <b><font face="Courier New, Courier, mono
">/etc/fstab </font></b>file.
6990 <p>Setting up an Apache service involves the following four steps:
6993 Set up the shared file system for the service. This filesystem is
6994 used to house the web site's content.</li>
6997 Install the Apache software on both cluster systems.</li>
7000 Configure the Apache software on both cluster systems.</li>
7003 Add the service to the cluster database.</li>
7005 To set up the shared file systems for the Apache service, become root and
7006 perform the following tasks on one cluster system:
7009 On a shared disk, use the interactive <b><font face="Courier New, Courier, mono
">fdisk</font></b>
7010 command to create a partition that will be used for the Apache document
7011 root directory. Note that you can create multiple document root directories
7012 on different disk partitions. See <a href="#partition
">Partitioning Disks</a>
7013 for more information.</li>
7017 Use the <b><font face="Courier New, Courier, mono
">mkfs</font></b> command
7018 to create an ext2 file system on the partition you created in the previous
7019 step. Specify the drive letter and the partition number. For example:</li>
7021 <pre># <b>mkfs /dev/sde3</b></pre>
7024 Mount the file system that will contain the Web content on the Apache document
7025 root directory. For example:</li>
7027 <pre># <b>mount /dev/sde3 /var/www/html</b></pre>
7028 Do not add this mount information to the <b><font face="Courier New, Courier, mono
">/etc/fstab</font></b>
7029 file, because only the cluster software can mount and unmount file systems
7032 Copy all the required files to the document root directory.</li>
7036 If you have CGI files or other files that must be in different directories
7037 or in separate partitions, repeat these steps as needed.</li>
7039 You must install the Apache software on both cluster systems. Note that
7040 the basic Apache server configuration must be the same on both cluster
7041 systems in order for the service to fail over correctly. The following
7042 example shows a basic Apache Web server installation, with no third-party
7043 modules or performance tuning. To install Apache with modules, or to tune
7044 it for better performance, see the Apache documentation that is located
7045 in the Apache installation directory, or on the Apache Web site, <a href="http://www.apache.org
" target="_blank
">www.apache.org</a>.
7046 <p>On both cluster systems, install the Apache RPMs. For example:
7047 <b>apache-1.3.20-16</b>
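<p>The package can be verified or installed with commands along these lines (a
sketch; the exact package file name will vary with the distribution release):
<pre># <b>rpm -q apache</b>
# <b>rpm -ivh apache-1.3.20-16.i386.rpm</b></pre>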
7048 <p>To configure the cluster systems as Apache servers, customize the <b><font face="Courier New, Courier, mono
">httpd.conf</font></b>
7049 Apache configuration file, and create a script that will start and stop
7050 the Apache service. Then, copy the files to the other cluster system. The
7051 files must be identical on both cluster systems in order for the Apache
7052 service to fail over correctly.
7053 <p>On one system, perform the following tasks:
7056 Edit the <b><font face="Courier New, Courier, mono
">/etc/httpd/conf/httpd.conf</font></b>
7057 Apache configuration file and customize the file according to your configuration.
7062 Specify the directory that will contain the HTML files. You will specify
7063 this mount point when you add the Apache service to the cluster database.
7064 You are only required to change this field if the mountpoint for the web
7065 site's content differs from the default setting of /<b>var/www/html.</b>
7068 <pre>DocumentRoot "/mnt/apacheservice/html"</pre>
7071 If you have modified the script directory to reside in a non-standard location,
7072 specify the directory that will contain the CGI programs. For example:</li>
7074 <pre>ScriptAlias /cgi-bin/ "/mnt/apacheservice/cgi-bin/"</pre>
7077 Specify the path that was used in the previous step, and set the access
7078 permissions to default for that directory. For example:</li>
7080 <pre><Directory "/mnt/apacheservice/cgi-bin">
7081 AllowOverride None
7082 Options None
7083 Order allow,deny
7084 Allow from all
7085 </Directory></pre>
7087 If you want to tune Apache or add third-party module functionality, you
7088 may have to make additional changes. For information on setting up other
7089 options, see the Apache project documentation.
7092 The standard Apache start script, <b>/etc/rc.d/init.d/httpd</b>, will also
7093 be used within the cluster framework to start and stop the Apache server
7094 on the active cluster member. Accordingly, when configuring the service,
7095 specify that script when prompted for the <b>User script</b>. <i>Editorial
7096 comment: unclear if a modified status section is needed in the httpd init.d
script.</i>
7099 Before you add the Apache service to the cluster database, ensure that
7100 the Apache directories are not mounted. Then, on one cluster system, add
7101 the service. You must specify an IP address, which the cluster infrastructure
7102 will bind to the network interface on the cluster system that runs the Apache service.
7104 <p>The following is an example of using the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7105 utility to add an Apache service.
7106 <pre><font size=-1>cluadmin> <b>service add apache
7108 </b> The user interface will prompt you for information about the service.
7109 Not all information is required for all services.
7111 Enter a question mark (?) at a prompt to obtain help.
7113 Enter a colon (:) and a single-character command at a prompt to do
7114 one of the following:
7116 c - Cancel and return to the top-level cluadmin command
7117 r - Restart to the initial prompt while keeping previous responses
7118 p - Proceed with the next prompt
7119
7120 Preferred member [None]: <b><font face="Courier New, Courier, mono">devel0
7121 </font></b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
7122 </font></b>User script (e.g., /usr/foo/script or None) [None]: <b><font face="Courier New, Courier, mono">/etc/rc.d/init.d/httpd
7124 </font></b>Do you want to add an IP address to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
7126 </font></b> IP Address Information
7128 IP address: <b><font face="Courier New, Courier, mono">10.1.16.150
7129 </font></b>Netmask (e.g. 255.255.255.0 or None) [None]: <b><font face="Courier New, Courier, mono">255.255.255.0
7130 </font></b>Broadcast (e.g. X.Y.Z.255 or None) [None]: <b><font face="Courier New, Courier, mono">10.1.16.255
7132 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
7133 or are you (f)inished adding IP addresses: <b><font face="Courier New, Courier, mono">f
7135 </font></b>Do you want to add a disk device to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
7137 </font></b> Disk Device Information
7139 Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono">/dev/sde3
7140 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono">ext3
7141 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono">/var/www/html
7142 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono">rw
7143 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
7145 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,
7146 or are you (f)inished adding device information: <b><font face="Courier New, Courier, mono">f
7148 </font></b>Disable service (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">no
7150 </font></b>name: apache
7152 preferred node: devel0
7154 user script: /etc/rc.d/init.d/httpd
7155 IP address 0: 10.1.16.150
7156 netmask 0: 255.255.255.0
7157 broadcast 0: 10.1.16.255
7158 device 0: /dev/sde3
7159 mount point, device 0: /var/www/html
7160 mount fstype, device 0: ext3
7161 mount options, device 0: rw
7162 force unmount, device 0: yes
7163 owner, device 0: nobody
7164 group, device 0: nobody
7165 Add apache service as shown? (yes/no/?) <b>y
7167 </b>Added apache.
7168 cluadmin></font></pre>
7170 <p><br><a NAME=
"service-status"></a>
7172 4.2 Displaying a Service Configuration
</h2>
7173 You can display detailed information about the configuration of a service.
7174 This information includes the following:
7180 Whether the service was disabled after it was added
</li>
7183 Preferred member system
</li>
7186 Whether the service will relocate to its preferred member when it joins the cluster</li>
7190 Service start script location
</li>
7196 Disk partitions
</li>
7199 File system type
</li>
7202 Mount points and mount options
</li>
7207 To display cluster service status, see
<a href=
"#cluster-status">Displaying
7208 Cluster and Service Status
</a>.
7209 <p>To display service configuration information, invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7210 utility and specify the
<b><font face=
"Courier New, Courier, mono">service
7211 show config
</font></b> command. For example:
7212 <pre><font size=-
1>cluadmin
> service show config
7214 1) nfs_pref_clu4
7215 2) nfs_pref_clu3
7216 3) nfs_nopref
7219 6) nfs_engineering
7223 name: nfs_engineering
7225 preferred node: clu3
7227 IP address
0:
172.16.33.164
7228 device
0: /dev/sdb11
7229 mount point, device
0: /mnt/users/engineering
7230 mount fstype, device
0: ext2
7231 mount options, device
0: rw,nosuid,sync
7232 force unmount, device
0: yes
7233 NFS export
0: /mnt/users/engineering/ferris
7234 Client
0: ferris, rw
7235 NFS export
0: /mnt/users/engineering/denham
7236 Client
0: denham, rw
7237 NFS export
0: /mnt/users/engineering/brown
7238 Client
0: brown, rw
7239 cluadmin
></font></pre>
7240 If you know the name of the service, you can specify the
<b><font face=
"Courier New, Courier, mono">service
7241 show config
<i>service_name
</i></font></b> command.
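<p>For example, to display only the <b><font face="Courier New, Courier, mono">nfs_engineering</font></b>
service shown in the previous listing:
<pre><font size=-1>cluadmin> <b>service show config nfs_engineering</b></font></pre>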
7245 <p><a NAME=
"service-disable"></a>
7247 4.3 Disabling a Service
</h2>
7248 You can disable a running service to stop the service and make it unavailable.
7249 To start a disabled service, you must enable it. See
<a href=
"#service-enable">Enabling
7250 a Service
</a> for information.
7251 <p>There are several situations in which you may need to disable a running service:
7255 You want to modify a service.
</li>
7261 <p>You must disable a running service before you can modify it. See
7262 <a href=
"#service-modify">Modifying
7263 a Service
</a> for more information.
7266 You want to temporarily stop a service.
</li>
7272 <p>For example, you can disable a service to make it unavailable to clients,
7273 without having to delete the service.
</ul>
7274 To disable a running service, invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7275 utility and specify the
<font face=
"Courier New, Courier, mono"><b>service
7276 disable
</b> <b><i>service_name
</i></b></font> command. For example:
7277 <pre>cluadmin
> <b>service disable user_home
7278 </b>Are you sure? (yes/no/?)
<b>y
7279 </b>notice: Stopping service user_home
...
7280 notice: Service user_home is disabled
7281 service user_home disabled
</pre>
7288 <p><a NAME=
"service-enable"></a>
7290 4.4 Enabling a Service
</h2>
7291 You can enable a disabled service to start the service and make it available.
7292 See
<a href=
"#service-error">Handling Services in an Error State
</a> for more information.
7294 <p>To enable a disabled service, invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7295 utility and specify the
<b><font face=
"Courier New, Courier, mono">service
7296 enable
<i>service_name
</i></font></b> command. For example:
7297 <pre>cluadmin
> <b>service enable user_home
7298 </b>Are you sure? (yes/no/?)
<b>y
7299 </b>notice: Starting service user_home ...
7300 notice: Service user_home is running
7301 service user_home enabled
</pre>
7302 <i>Editorial comment: probably need a new cluadmin output here as it probably
7303 prompts for which member to start the service on.
</i>
7318 <p><a NAME=
"service-modify"></a>
7320 4.5 Modifying a Service
</h2>
7321 You can modify any property that you specified when you created the service.
7322 For example, you can change the IP address. You can also add more resources
7323 to a service. For example, you can add more file systems. See
<a href=
"#service-gather">Gathering
7324 Service Information
</a> for information.
7325 <p>You must disable a service before you can modify it. If you attempt
7326 to modify a running service, you will be prompted to disable it. See
<a href=
"#service-disable">Disabling
7327 a Service
</a>for more information.
7328 <p>Because a service is unavailable while you modify it, be sure to gather
7329 all the necessary service information before you disable the service, in
7330 order to minimize service down time. In addition, you may want to back
7331 up the cluster database before modifying a service. See
<a href=
"#cluster-backup">Backing
7332 Up and Restoring the Cluster Database
</a> for more information.
7333 <p>To modify a disabled service, invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7334 utility and specify the
<b><font face=
"Courier New, Courier, mono">service
7335 modify
<i>service_name
</i></font></b> command.
7336 <pre>cluadmin
> <b>service modify web1
</b></pre>
7337 You can then modify the service properties and resources, as needed. The
7338 cluster will check the service modifications, and allow you to correct
7339 any mistakes. If you submit the changes, the cluster verifies the service
7340 modification and then starts the service, unless you chose to keep the
7341 service disabled. If you do not submit the changes, the service will be
7342 started, if possible, using the original configuration.
7345 <a NAME=
"service-relocate"></a></h2>
7348 4.6 Relocating a Service
</h2>
7349 In addition to providing automatic service failover, a cluster enables
7350 you to cleanly stop a service on one cluster system and then start it on
7351 the other cluster system. This service relocation functionality enables
7352 administrators to perform maintenance on a cluster system, while maintaining
7353 application and data availability.
7354 <p>To relocate a service by using the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7355 utility, invoke the
<b>service relocate
</b> command.
7356 <p><i>Editorial comment: include cluadmin output example.
</i>
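<p>A minimal illustrative invocation, using the <b><font face="Courier New, Courier, mono">web1</font></b>
service from the earlier example (the prompts and messages that
<b><font face="Courier New, Courier, mono">cluadmin</font></b> displays may differ):
<pre>cluadmin> <b>service relocate web1</b></pre>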
7359 <p><a NAME=
"service-delete"></a>
7361 4.7 Deleting a Service
</h2>
7362 You can delete a cluster service. You may want to back up the cluster database
7363 before deleting a service. See
<a href=
"#cluster-backup">Backing Up and
7364 Restoring the Cluster Database
</a> for information.
7365 <p>To delete a service by using the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7366 utility, follow these steps:
7369 Invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7370 utility on the cluster system that is running the service, and specify
7371 the
<b><font face=
"Courier New, Courier, mono">service disable
<i>service_name
</i></font></b>
7372 command. See
<a href=
"#service-disable">Disabling a Service
</a> for more
7377 Specify the
<b><font face=
"Courier New, Courier, mono">service delete
<i>service_name
</i></font></b>
7378 command to delete the service.
</li>
7381 <pre>cluadmin
> <b>service disable user_home
7382 </b>Are you sure? (yes/no/?)
<b>y
7383 </b>notice: Stopping service user_home
...
7384 notice: Service user_home is disabled
7385 service user_home disabled
7387 cluadmin
> <b>service delete user_home
7388 </b>Deleting user_home, are you sure? (yes/no/?):
<b>y
7389 </b>user_home deleted.
7392 <p><br><a NAME=
"service-error"></a>
7394 4.8 Handling Services in an Error State
</h2>
7395 <i>Editorial comment: the error state no longer exists.
Need to rework
7396 this section somewhat and incorporate with the disabled service section.
</i>
7397 <p>A service in the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7398 state is still owned by a cluster system, but the status of its resources
7399 cannot be determined (for example, part of the service has stopped, but
7400 some service resources are still configured on the owner system). See
<a href=
"#cluster-status">Displaying
7401 Cluster and Service Status
</a> for detailed information about service states.
7402 <p>The cluster puts a service into the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7403 state if it cannot guarantee the integrity of the service. An
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7404 state can be caused by various problems, such as a service start that did not
7405 succeed and a subsequent service stop that also failed.
7406 <p>You must carefully handle services in the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7407 state. If service resources are still configured on the owner system, starting
7408 the service on the other cluster system may cause significant problems.
7409 For example, if a file system remains mounted on the owner system, and
7410 you start the service on the other cluster system, the file system will
7411 be mounted on both systems, which can cause data corruption. Therefore,
7412 you can only enable or disable a service that is in the
7413 <b><font face=
"Courier New, Courier, mono">error
</font></b>
7414 state on the system that owns the service. If the enable or disable fails,
7415 the service will remain in the
<b><font face=
"Courier New, Courier, mono">error
</font></b> state.
7417 <p>You can also modify a service that is in the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7418 state. You may need to do this in order to correct the problem that caused
7419 the
<b><font face=
"Courier New, Courier, mono">error
</font></b> state.
7420 After you modify the service, it will be enabled on the owner system, if
7421 possible, or it will remain in the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7422 state. The service will not be disabled.
7423 <p>If a service is in the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7424 state, follow these steps to resolve the problem:
7427 Modify cluster event logging to log debugging messages. See
<a href=
"#cluster-logging">Modifying
7428 Cluster Event Logging
</a> for more information.
</li>
7432 Use the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7433 utility to attempt to enable or disable the service on the cluster system
7434 that owns the service. See
<a href=
"#service-disable">Disabling a Service
</a>
7435 and
<a href=
"#service-enable">Enabling a Service
</a> for more information.
</li>
7439 If the service does not start or stop on the owner system, examine the
7440 <b><font face=
"Courier New, Courier, mono">/var/log/cluster
</font></b>
7441 log file, and diagnose and correct the problem. You may need to modify
7442 the service to fix incorrect information in the cluster database (for example,
7443 an incorrect start script), or you may need to perform manual tasks on
7444 the owner system (for example, unmounting file systems).
</li>
7448 Repeat the attempt to enable or disable the service on the owner system.
7449 If repeated attempts fail to correct the problem and enable or disable
7450 the service, reboot the owner system.
</li>
7453 <hr noshade
width=
"80%">
7454 <p><a NAME=
"admin"></a>
7456 5 Cluster Administration
</h1>
7457 After you set up a cluster and configure services, you may need to administer
7458 the cluster, as described in the following sections:
7461 <a href=
"#cluster-status">Displaying Cluster and Service Status
</a></li>
7464 <a href=
"#cluster-start">Starting and Stopping the Cluster Software
</a></li>
7467 <a href=
"#cluster-config">Modifying the Cluster Configuration
</a></li>
7470 <a href=
"#cluster-backup">Backing Up and Restoring the Cluster Database
</a></li>
7473 <a href=
"#cluster-logging">Modifying Cluster Event Logging
</a></li>
7476 <a href=
"#cluster-reinstall">Updating the Cluster Software
</a></li>
7479 <a href=
"#cluster-reload">Reloading the Cluster Database
</a></li>
7482 <a href=
"#cluster-name">Changing the Cluster Name
</a></li>
7485 <a href=
"#cluster-init">Reinitializing the Cluster
</a></li>
7488 <a href=
"#cluster-remove">Removing a Cluster Member
</a></li>
7491 <a href=
"#diagnose">Diagnosing and Correcting Problems in a Cluster
</a></li>
7493 <a NAME=
"cluster-status"></a>
7496 5.1 Displaying Cluster and Service Status
</h2>
7497 Monitoring cluster and service status can help you identify and solve problems
7498 in the cluster environment. You can display status by using the following
7502 The
<b><font face=
"Courier New, Courier, mono">clustat
</font></b> command
</li>
7505 Log file messages
</li>
7507 Note that status is always from the point of view of the cluster system
7508 on which you are running a tool. To obtain comprehensive cluster status,
7509 run a tool on all cluster systems.
7510 <p>Cluster and service status includes the following information:
7513 Cluster member system status
</li>
7516 Power switch status
</li>
7519 Heartbeat channel status
</li>
7522 Service status and which cluster system is running the service or owns
7526 <i>Editorial comment: add bullet and subsequent description of service
7527 monitoring status.
</i></li>
7529 The following table describes how to analyze the status information shown
7530 by the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b> utility,
7531 the
<b><font face=
"Courier New, Courier, mono">clustat
</font></b> command,
7532 and the cluster GUI.
7534 <table BORDER CELLSPACING=
0 CELLPADDING=
3 WIDTH=
"92%" >
7535 <tr ALIGN=CENTER VALIGN=CENTER
BGCOLOR=
"#F8FCF8">
7536 <td WIDTH=
"22%" HEIGHT=
"39"><b>Member Status
</b></td>
7538 <td WIDTH=
"78%" HEIGHT=
"39"><b>Description
</b></td>
7542 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">UP
</font></b></td>
7544 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The member system is
7545 communicating with the other member system and accessing the quorum partitions.
</td>
7549 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">DOWN
</font></b></td>
7551 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The member system is
7552 unable to communicate with the other member system.
</td>
7556 <table BORDER CELLSPACING=
0 CELLPADDING=
3 WIDTH=
"92%" >
7557 <tr ALIGN=CENTER VALIGN=CENTER
BGCOLOR=
"#F8FCF8">
7558 <td WIDTH=
"22%" HEIGHT=
"9"><b>Power Switch Status
</b></td>
7560 <td WIDTH=
"78%" HEIGHT=
"9"><b>Description
</b></td>
7564 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">OK
</font></b></td>
7566 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The power switch is operating
7571 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">Wrn
</font></b></td>
7573 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">Could not obtain power
7578 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">Err
</font></b></td>
7580 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">A failure or error has
7581 occurred.
</td>
7585 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">Good
</font></b></td>
7587 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The power switch is operating
7592 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">Unknown
</font></b></td>
7594 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The other cluster member
7595 is
<b><font face=
"Courier New, Courier, mono">DOWN
</font></b>.
</td>
7599 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">Timeout
</font></b></td>
7601 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The power switch is not
7602 responding to power daemon commands, possibly because of a disconnected serial cable.</td>
7607 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">Error
</font></b></td>
7609 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">A failure or error has
7610 occurred.
</td>
7614 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"29"><b><font face=
"Courier New, Courier, mono">None
</font></b></td>
7616 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"29">The cluster configuration
7617 does not include power switches.
</td>
7622 <center><b>Initializing
</b></center>
7625 <td>The switch is in the process of being initialized and its definitive
7626 status has not yet been determined.
</td>
7630 <table BORDER CELLSPACING=
0 CELLPADDING=
3 WIDTH=
"92%" >
7631 <tr ALIGN=CENTER VALIGN=CENTER
BGCOLOR=
"#F8FCF8">
7632 <td WIDTH=
"22%" HEIGHT=
"27"><b>Heartbeat Channel Status
</b></td>
7634 <td WIDTH=
"78%" HEIGHT=
"27"><b>Description
</b></td>
7638 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">OK
</font></b></td>
7640 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">The heartbeat channel
7641 is operating properly.
</td>
7645 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">Wrn
</font></b></td>
7647 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">Could not obtain channel
7652 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">Err
</font></b></td>
7654 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">A failure or error has
7655 occurred.
</td>
7659 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">ONLINE
</font></b></td>
7661 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">The heartbeat channel
7662 is operating properly.
</td>
7666 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">OFFLINE
</font></b></td>
7668 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">The other cluster member
7669 appears to be
<b><font face=
"Courier New, Courier, mono">UP
</font></b>,
7670 but it is not responding to heartbeat requests on this channel.
</td>
7674 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"26"><b><font face=
"Courier New, Courier, mono">UNKNOWN
</font></b></td>
7676 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"26">Could not obtain the
7677 status of the other cluster member system over this channel, possibly because
7678 the system is
<b><font face=
"Courier New, Courier, mono">DOWN
</font></b>
7679 or the cluster daemons are not running.
</td>
7683 <p><i>Editorial comment: many of these service states no longer exist.
</i>
7685 <table BORDER CELLSPACING=
0 CELLPADDING=
3 WIDTH=
"92%" >
7686 <tr ALIGN=CENTER VALIGN=CENTER
BGCOLOR=
"#F8FCF8">
7687 <td WIDTH=
"22%" HEIGHT=
"43"><b>Service Status
</b></td>
7689 <td WIDTH=
"78%" HEIGHT=
"43"><b>Description
</b></td>
7693 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"48"><b><font face=
"Courier New, Courier, mono">running
</font></b></td>
7695 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"48">The service resources
7696 are configured and available on the cluster system that owns the service.
7697 The
<b><font face=
"Courier New, Courier, mono">running
</font></b> state
7698 is a persistent state. From this state, a service can enter the
<b><font face=
"Courier New, Courier, mono">stopping
</font></b>
7699 state (for example, if the preferred member rejoins the cluster), the
<b><font face=
"Courier New, Courier, mono">disabling
</font></b>
7700 state (if a user initiates a request to disable the service), or the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7701 state (if the status of the service resources cannot be determined).
</td>
7705 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"53"><b><font face=
"Courier New, Courier, mono">disabling
</font></b></td>
7707 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"53">The service is in the
7708 process of being disabled (for example, a user has initiated a request
7709 to disable the service). The
<b><font face=
"Courier New, Courier, mono">disabling
</font></b>
7710 state is a transient state. The service remains in the
<b><font face=
"Courier New, Courier, mono">disabling
</font></b>state
7711 until the service disable succeeds or fails. From this state, the service
7712 can enter the
<b><font face=
"Courier New, Courier, mono">disabled
</font></b>
7713 state (if the disable succeeds), the
<b><font face=
"Courier New, Courier, mono">running
</font></b>
7714 state (if the disable fails and the service is restarted), or the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7715 state (if the status of the service resources cannot be determined).
</td>
7719 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"77"><b><font face=
"Courier New, Courier, mono">disabled
</font></b></td>
7721 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"77">The service has been
7722 disabled, and does not have an assigned owner. The
<b><font face=
"Courier New, Courier, mono">disabled
</font></b>
7723 state is a persistent state. From this state, the service can enter the
7724 <b><font face=
"Courier New, Courier, mono">starting
</font></b>
7725 state (if a user initiates a request to start the service), or the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7726 state (if a request to start the service failed and the status of the service
7727 resources cannot be determined).
</td>
7731 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"51"><b><font face=
"Courier New, Courier, mono">starting
</font></b></td>
7733 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"51">The service is in the
7734 process of being started. The
<b><font face=
"Courier New, Courier, mono">starting
</font></b>
7735 state is a transient state. The service remains in the
<b><font face=
"Courier New, Courier, mono">starting
</font></b>
7736 state until the service start succeeds or fails. From this state, the service
7737 can enter the
<b><font face=
"Courier New, Courier, mono">running
</font></b>
7738 state (if the service start succeeds), the
<b><font face=
"Courier New, Courier, mono">stopped
</font></b>
7739 state (if the service stop fails), or the
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7740 state (if the status of the service resources cannot be determined).
</td>
7744 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"55"><b><font face=
"Courier New, Courier, mono">stopping
</font></b></td>
7746 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"55">The service is in the
7747 process of being stopped. The
<b><font face=
"Courier New, Courier, mono">stopping
</font></b>
7748 state is a transient state. The service remains in the
<b><font face=
"Courier New, Courier, mono">stopping
</font></b>
7749 state until the service stop succeeds or fails. From this state, the service
7750 can enter the
<b><font face=
"Courier New, Courier, mono">stopped
</font></b>
7751 state (if the service stop succeeds), the
<b><font face=
"Courier New, Courier, mono">running
</font></b>
7752 state (if the service stop failed and the service can be started), or the
7753 <b><font face=
"Courier New, Courier, mono">error
</font></b>
7754 state (if the status of the service resources cannot be determined).
</td>
7758 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"52"><b><font face=
"Courier New, Courier, mono">stopped
</font></b></td>
7760 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"52">The service is not running
7761 on any cluster system, does not have an assigned owner, and does not have
7762 any resources configured on a cluster system. The
<b><font face=
"Courier New, Courier, mono">stopped
</font></b>
7763 state is a persistent state. From this state, the service can enter the
7764 <b><font face=
"Courier New, Courier, mono">disabled
</font></b>
7765 state (if a user initiates a request to disable the service), or the
<b><font face=
"Courier New, Courier, mono">starting
</font></b>
7766 state (if the preferred member joins the cluster).
</td>
7770 <td ALIGN=CENTER VALIGN=CENTER
WIDTH=
"22%" HEIGHT=
"81"><b><font face=
"Courier New, Courier, mono">error
</font></b></td>
7772 <td ALIGN=LEFT VALIGN=TOP
WIDTH=
"78%" HEIGHT=
"81">The status of the service
7773 resources cannot be determined. For example, some resources associated
7774 with the service may still be configured on the cluster system that owns
7775 the service. The
<b><font face=
"Courier New, Courier, mono">error
</font></b>
7776 state is a persistent state. To protect data integrity, you must ensure
7777 that the service resources are no longer configured on a cluster system,
7778 before trying to start or stop a service in the
<b><font face=
"Courier New, Courier, mono">error
</font></b> state.
7783 <p>To display a snapshot of the current cluster status, invoke the
<b><font face=
"Courier New, Courier, mono">clustat
</font></b>
7784 utility. For example:
7786 <pre><font size=-1>Thu Jul 20 16:23:54 EDT 2000
7787 Cluster Configuration (cluster_1):

7791 Member                      Id     System Status    Power Switch
7792 --------------------------  -----  ---------------  ------------
7793 stor4                       0      Up               Good
7794 stor5                       1      Up               Good

7796 Channel status:

7798 Name                        Type       Status
7799 --------------------------  ---------  --------
7800 stor4 <--> stor5            network    ONLINE
7801 /dev/ttyS1 <--> /dev/ttyS1  serial     OFFLINE

7806 Service           Status      Owner
7807 ----------------  ----------  ----------------
7808 diskmount         disabled    None
7809 database1         running     stor5
7810 database2         starting    stor4
7811 user_mail         disabling   None
7812 web_home          running     stor4
</font></pre>
7813 <i>Editorial comment: need a more recent screenshot of clustat output above.
</i>
7814 <br>To monitor the cluster and display status at specific time intervals,
7815 invoke
<b><font face=
"Courier New, Courier, mono">clustat
</font></b> with
7816 the
<b><font face=
"Courier New, Courier, mono">-i
<i>time
</i></font></b>
7817 command option, where
<b><i><font face=
"Courier New, Courier, mono">time
</font></i></b>
7818 specifies the number of seconds between status snapshots.
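<p>For example, the following command (an illustrative invocation; the interval value is
arbitrary) redisplays the cluster status every 10 seconds:
<pre># <b>clustat -i 10</b></pre>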
7819 <p><a NAME=
"cluster-start"></a>
7822 5.2 Starting and Stopping the Cluster Software
</h2>
7823 You can start the cluster software on a cluster system by invoking the
7824 <b><font face=
"Courier New, Courier, mono">cluster
7825 start
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
7826 directory. For example:
7827 <pre>#
<b>service cluster start
</b></pre>
7828 You can stop the cluster software on a cluster system by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
7829 stop
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
7830 directory. For example:
7831 <pre>#
<b>service cluster stop
</b></pre>
7832 The previous command will cause the cluster system's services to relocate
7833 to the other cluster system.
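<p>On Red Hat distributions, the <b><font face="Courier New, Courier, mono">service</font></b>
command shown above is equivalent to invoking the cluster init script directly. For example:
<pre># <b>/etc/rc.d/init.d/cluster stop</b></pre>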
7835 <p><a NAME=
"cluster-config"></a>
7837 5.3 Modifying the Cluster Configuration
</h2>
7838 You may need to modify the cluster configuration. For example, you may
7839 need to correct heartbeat channel or quorum partition entries in the cluster
7840 database, a copy of which is located in the
<b><font face=
"Courier New, Courier, mono">/etc/cluster.conf
</font></b>
7842 <p>You must use the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
7843 utility to modify the cluster configuration. Do not modify the
<b><font face=
"Courier New, Courier, mono">cluster.conf
</font></b>
7844 file. To modify the cluster configuration, stop the cluster software on
7845 one cluster system, as described in
<a href=
"#cluster-start">Starting and
7846 Stopping the Cluster Software
</a>.
7847 <p>Then, invoke the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
7848 utility, and specify the correct information at the prompts. After running
7849 the utility, restart the cluster software.
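<p>For example, run the utility as root on the cluster system on which you stopped the
cluster software:
<pre># <b>/sbin/cluconfig</b></pre>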
7850 <p><a NAME=
"cluster-backup"></a>
7852 5.4 Backing Up and Restoring the Cluster Database
</h2>
7853 It is recommended that you regularly back up the cluster database. In addition,
7854 you should back up the database before making any significant changes to
7855 the cluster configuration.
7856 <p>To back up the cluster database to the
<b><font face=
"Courier New, Courier, mono">/etc/cluster.conf.bak
</font></b>
7857 file, invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7858 utility, and specify the
<b><font face=
"Courier New, Courier, mono">cluster
7859 backup
</font></b> command. For example:
7860 <pre>cluadmin
> <b>cluster backup
</b></pre>
7861 You can also save the cluster database to a different file by invoking
7862 the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b> utility
7863 and specifying the
<b><font face=
"Courier New, Courier, mono">cluster saveas
7864 <i>filename
</i></font></b>command.
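<p>For example (the file name shown here is only illustrative):
<pre>cluadmin> <b>cluster saveas /root/cluster_backup.conf</b></pre>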
7865 <p>To restore the cluster database, follow these steps:
7868 Stop the cluster software on one system by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
7869 stop
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
7870 directory. For example:
</li>
7872 <pre>#
<b>/etc/rc.d/init.d/cluster stop
</b></pre>
7873 The previous command may cause the cluster system's services to fail over
7874 to the other cluster system.
7877 On the remaining cluster system, invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7878 utility and restore the cluster database. To restore the database from
7879 the
<b><font face=
"Courier New, Courier, mono">/etc/cluster.conf.bak
</font></b>
7880 file, specify the
<b><font face=
"Courier New, Courier, mono">cluster restore
</font></b>
7881 command. To restore the database from a different file, specify the
<b><font face=
"Courier New, Courier, mono">cluster
7882 restorefrom
<i>file_name
</i></font></b> command.
</li>
7928 <p>The cluster will disable all running services, delete all the services,
7929 and then restore the database.
7932 Restart the cluster software on the stopped system by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
7933 start
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
7934 directory. For example:
</li>
7936 <pre>#
<b>service cluster start
</b></pre>
7939 Restart each cluster service by invoking the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7940 utility on the cluster system on which you want to run the service and
7941 specifying the
<b><font face=
"Courier New, Courier, mono">service enable
7942 <i>service_name
</i></font></b>
7945 <a NAME=
"cluster-logging"></a>
7947 5.5 Modifying Cluster Event Logging
</h2>
7948 You can modify the severity level of the events that are logged by the
7949 <b><font face=
"Courier New, Courier, mono">clupowerd
</font></b>,
7950 <b><font face=
"Courier New, Courier, mono">cluquorumd
</font></b>,
7951 <b><font face=
"Courier New, Courier, mono">cluhbd
</font></b>,
7952 and
<b><font face=
"Courier New, Courier, mono">clusvcmgrd
</font></b> daemons.
7953 You may want the daemons on the cluster systems to log messages at the
7955 <p>To change a cluster daemon's logging level on all the cluster systems,
7956 invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7957 utility, and specify the
<b><font face=
"Courier New, Courier, mono">cluster
7958 loglevel
</font></b> command, the name of the daemon, and the severity level.
7959 You can specify the severity level by using the name or the number that
7960 corresponds to the severity level. The values
0 to
7 refer to the following severity levels:
7962 <blockquote><b><font face=
"Courier New, Courier, mono">0 - emerg
</font></b>
7963 <br><b><font face=
"Courier New, Courier, mono">1 - alert
</font></b>
7964 <br><b><font face=
"Courier New, Courier, mono">2 - crit
</font></b>
7965 <br><b><font face=
"Courier New, Courier, mono">3 - err
</font></b>
7966 <br><b><font face=
"Courier New, Courier, mono">4 - warning
</font></b>
7967 <br><b><font face=
"Courier New, Courier, mono">5 - notice
</font></b>
7968 <br><b><font face=
"Courier New, Courier, mono">6 - info
</font></b>
7969 <br><b><font face=
"Courier New, Courier, mono">7 - debug
</font></b></blockquote>
7970 Note that the cluster logs messages with the designated severity level
7971 and also messages of a higher severity. For example, if the severity level
7972 for quorum daemon messages is
2 (
<b><font face=
"Courier New, Courier, mono">crit
</font></b>),
7973 then the cluster logs messages of
<b><font face=
"Courier New, Courier, mono">crit
</font></b>,
7974 <b><font face=
"Courier New, Courier, mono">alert
</font></b>,
7975 and
<b><font face=
"Courier New, Courier, mono">emerg
</font></b> severity
7976 levels. Be aware that setting the logging level to a low severity level,
7977 such as
7 (
<b><font face=
"Courier New, Courier, mono">debug
</font></b>),
7978 will result in large log files over time.
7979 <p>The following example enables the
<b><font face=
"Courier New, Courier, mono">cluquorumd
</font></b>
7980 daemon to log messages of all severity levels:
7982 <pre>cluadmin> <b>cluster loglevel cluquorumd 7</b></pre>
7986 <a NAME=
"cluster-reinstall"></a>
7988 5.6 Updating the Cluster Software
</h2>
7989 You can update the cluster software, but preserve the existing cluster
7990 database. Updating the cluster software on a system can take from
10 to
7991 20 minutes, depending on whether you must rebuild the kernel.
7992 <p>To update the cluster software while minimizing service downtime, follow
7996 On a cluster system that you want to update, run the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
7997 utility and back up the current cluster database. For example:
</li>
7999 <pre>cluadmin
> <b>cluster backup
</b></pre>
8002 Stop the cluster software on the first cluster system that you want to
8003 update, by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
8004 stop
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8005 directory. For example:
</li>
8007 <pre>#
<b>service cluster stop
</b></pre>
8010 Install the latest cluster software on the first cluster system that you
8011 want to update, by following the instructions described in
<a href=
"#software-steps">Steps
8012 for Installing and Initializing the Cluster Software.
</a> However, when
8013 prompted by the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8014 utility whether to use the existing cluster database, specify
<b><font face=
"Courier New, Courier, mono">yes
</font></b>.
</li>
8018 Stop the cluster software on the second cluster system that you want to
8019 update, by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
8020 stop
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8021 directory. At this point, no services are available.
</li>
8025 Start the cluster software on the first updated cluster system by invoking
8026 the
<b><font face=
"Courier New, Courier, mono">cluster start
</font></b>
8027 command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8028 directory. At this point, services may become available.
</li>
8032 Install the latest cluster software on the second cluster system that you
8033 want to update, by following the instructions described in
<a href=
"#software-steps">Steps
8034 for Installing and Initializing the Cluster Software.
</a> When prompted
8035 by the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8036 utility whether to use the existing cluster database, specify
<b><font face=
"Courier New, Courier, mono">yes
</font></b>.
</li>
8040 Start the cluster software on the second updated cluster system, by invoking
8041 the
<b><font face=
"Courier New, Courier, mono">cluster start
</font></b>
8042 command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8043 directory. For example:
<b>service cluster start
</b></li>
8045 <a NAME=
"cluster-reload"></a>
8047 5.7 Reloading the Cluster Database
</h2>
8048 Invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
8049 utility and use the
<b><font face=
"Courier New, Courier, mono">cluster
8050 reload
</font></b>command to force the cluster to re-read the cluster database.
8052 <pre>cluadmin
> <b>cluster reload
</b></pre>
8059 <p><a NAME=
"cluster-name"></a>
8061 5.8 Changing the Cluster Name
</h2>
8062 Invoke the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
8063 utility and use the
<b><font face=
"Courier New, Courier, mono">cluster
8064 name
</font></b> <b><i><font face=
"Courier New, Courier, mono">cluster_name
</font></i></b>
8065 command to specify a name for the cluster. The cluster name is used in
8066 the display of the
<b><font face=
"Courier New, Courier, mono">clustat
</font></b>command.
8068 <pre>cluadmin
> <b>cluster name Accounting Team Fileserver
8069 Accounting Team Fileserver
</b></pre>
8071 <p><br><a NAME=
"cluster-init"></a>
8073 5.9 Reinitializing the Cluster
</h2>
8074 In rare circumstances, you may want to reinitialize the cluster systems,
8075 services, and database. Be sure to back up the cluster database before
8076 reinitializing the cluster. See
<a href=
"#cluster-backup">Backing Up and
8077 Restoring the Cluster Database
</a> for information.
8078 <p>To completely reinitialize the cluster, follow these steps:
8081 Disable all the running cluster services.
</li>
8085 Stop the cluster daemons on both cluster systems by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
8086 stop
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8087 directory on both cluster systems. For example:
</li>
8089 <pre>#
<b>service cluster stop
</b></pre>
8092 Install the cluster software on both cluster systems. See
<a href=
"#software-steps">Steps
8093 for Installing and Initializing the Cluster Software
</a> for information.
</li>
8097 On one cluster system, run the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8098 utility. When prompted whether to use the existing cluster database, specify
8099 no.
This will delete any state information and cluster database from
8100 the quorum partitions.
</li>
8104 After
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b> completes,
8105 follow the utility's instruction to run the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8106 command on the other cluster system. For example:
</li>
8108 <pre>#
<b>/sbin/cluconfig --init=/dev/raw/raw1
</b></pre>
8111 Start the cluster daemons by invoking the
<b><font face=
"Courier New, Courier, mono">cluster
8112 start
</font></b>command located in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8113 directory on both cluster systems. For example:
</li>
8115 <pre>#
<b>service cluster start
</b></pre>
8117 <a NAME=
"cluster-remove"></a>
8119 5.10 Removing a Cluster Member
</h2>
8120 In some cases, you may want to temporarily remove a member system from
8121 the cluster. For example, if a cluster system experiences a hardware failure,
8122 you may want to reboot the system, but prevent it from rejoining the cluster,
8123 in order to perform maintenance on the system.
8124 <p>If you are running a Red Hat distribution, use the
<b><font face=
"Courier New, Courier, mono">chkconfig
</font></b>
8125 utility so that you can boot a cluster system without allowing it to rejoin
8126 the cluster. For example:
8127 <pre>#
<b>chkconfig --del cluster
</b></pre>
8128 When you want the system to rejoin the cluster, use the following command:
8129 <pre>#
<b>chkconfig --add cluster
</b></pre>
8130 You can then reboot the system or run the cluster start command located
8131 in the System V
<b><font face=
"Courier New, Courier, mono">init
</font></b>
8132 directory. For example:
8133 <pre>#
<b>service cluster start
</b></pre>
8135 <p><br><a NAME=
"diagnose"></a>
8137 5.11 Diagnosing and Correcting Problems in a Cluster
</h2>
8138 To ensure that you can identify any problems in a cluster, you must enable
8139 event logging. In addition, if you encounter problems in a cluster, be
8140 sure to set the severity level to
<b><font face=
"Courier New, Courier, mono">debug
</font></b>
8141 for the cluster daemons. This will log descriptive messages that may help
8142 you solve problems. Once you have resolved any problems, you should reset
8143 the debug level back down to its default value of
<b>info
</b> to avoid
8144 generating excessively large log files.
8145 <p>If you have problems while running the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b>
8146 utility (for example, you cannot enable a service), set the severity level
8147 for the
<b><font face=
"Courier New, Courier, mono">clusvcmgrd
</font></b>
8148 daemon to
<b><font face=
"Courier New, Courier, mono">debug
</font></b>.
8149 This will cause debugging messages to be displayed while you are running
8150 the
<b><font face=
"Courier New, Courier, mono">cluadmin
</font></b> utility.
8151 See
<a href=
"#cluster-logging">Modifying Cluster Event Logging
</a> for more information.
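<p>For example, the following <b><font face="Courier New, Courier, mono">cluadmin</font></b>
command uses the <b><font face="Courier New, Courier, mono">cluster loglevel</font></b> syntax
shown earlier to set the service manager daemon to the debug level:
<pre>cluadmin> <b>cluster loglevel clusvcmgrd 7</b></pre>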
8153 <p>Use the following table to diagnose and correct problems in a cluster.
8155 <table BORDER CELLSPACING=
0 CELLPADDING=
3 WIDTH=
"95%" >
8156 <tr ALIGN=CENTER VALIGN=CENTER
>
8157 <td WIDTH=
"19%" HEIGHT=
"42">
8158 <center><b><font size=+
1>Problem
</font></b></center>
8161 <td WIDTH=
"19%" HEIGHT=
"42">
8162 <center><b><font size=+
1>Symptom
</font></b></center>
8165 <td WIDTH=
"62%" HEIGHT=
"42">
8166 <center><b><font size=+
1>Solution
</font></b></center>
8170 <tr ALIGN=LEFT VALIGN=TOP
>
8171 <td WIDTH=
"19%">SCSI bus not terminated
</td>
8173 <td WIDTH=
"19%">SCSI errors appear in the log file
</td>
8175 <td WIDTH=
"62%">Each SCSI bus must be terminated only at the beginning
8176 and end of the bus. Depending on the bus configuration, you may need to
8177 enable or disable termination in host bus adapters, RAID controllers, and
8178 storage enclosures. If you want to support hot plugging, you must use external
8179 termination to terminate a SCSI bus.
8180 <p>In addition, be sure that no devices are connected to a SCSI bus using
8181 a stub that is longer than
0.1 meter.
8182 <p>See
<a href=
"#hardware-storage">Configuring Shared Disk Storage
</a>
8183 and
<a href=
"#scsi-term">SCSI Bus Termination
</a> for information about
8184 terminating different types of SCSI buses.
</td>
8187 <tr ALIGN=LEFT VALIGN=TOP
>
8188 <td WIDTH=
"19%">SCSI bus length greater than maximum limit
</td>
8190 <td WIDTH=
"19%">SCSI errors appear in the log file
</td>
8192 <td WIDTH=
"62%">Each type of SCSI bus must adhere to restrictions on length,
8193 as described in
<a href=
"#scsi-length">SCSI Bus Length
</a>.
8194 <p>In addition, ensure that no single-ended devices are connected to the
8195 LVD SCSI bus, because this will cause the entire bus to revert to a single-ended
8196 bus, which has more severe length restrictions than a differential bus.
</td>
8199 <tr ALIGN=LEFT VALIGN=TOP
>
8200 <td WIDTH=
"19%">SCSI identification numbers not unique
</td>
8202 <td WIDTH=
"19%">SCSI errors appear in the log file
</td>
8204 <td WIDTH=
"62%">Each device on a SCSI bus must have a unique identification
8205 number. If you have a multi-initiator SCSI bus, you must modify the default
8206 SCSI identification number (
7) for one of the host bus adapters connected
8207 to the bus, and ensure that all disk devices have unique identification
8208 numbers. See
<a href=
"#scsi-ids">SCSI Identification Numbers
</a> for more
8209 information.
</td>
8212 <tr ALIGN=LEFT VALIGN=TOP
>
8213 <td WIDTH=
"19%">SCSI commands timing out before completion
</td>
8215 <td WIDTH=
"19%">SCSI errors appear in the log file
</td>
8217 <td WIDTH=
"62%">The prioritized arbitration scheme on a SCSI bus can result
8218 in low-priority devices being locked out for some period of time. This
8219 may cause commands to time out, if a low-priority storage device, such
8220 as a disk, is unable to win arbitration and complete a command that a host
8221 has queued to it. For some workloads, you may be able to avoid this problem
8222 by assigning low-priority SCSI identification numbers to the host bus adapters.
8223 <p>See
<a href=
"#scsi-ids">SCSI Identification Numbers
</a> for more information.
</td>
8226 <tr ALIGN=LEFT VALIGN=TOP
>
8227 <td WIDTH=
"19%">Mounted quorum partition
</td>
8229 <td WIDTH=
"19%">Messages indicating checksum errors on a quorum partition
8230 appear in the log file
</td>
8232 <td WIDTH=
"62%">Be sure that the quorum partition raw devices are used
8233 only for cluster state information. They cannot be used for cluster services
8234 or for non-cluster purposes, and cannot contain a file system. See
<a href=
"#state-partitions">Configuring
8235 the Quorum Partitions
</a> for more information.
8236 <p>These messages could also indicate that the underlying block device
8237 special file for the quorum partition has been erroneously used for non-cluster
8238 purposes.
</td>
8241 <tr ALIGN=LEFT VALIGN=TOP
>
8242 <td WIDTH=
"19%" HEIGHT=
"111">Service file system is unclean
</td>
8244 <td WIDTH=
"19%" HEIGHT=
"111">A disabled service cannot be enabled
</td>
8246 <td WIDTH=
"62%" HEIGHT=
"111">Manually run a checking program such as
<b><font face=
"Courier New, Courier, mono">fsck
</font></b>.
8247 Then, enable the service.
8248 <p>Note that the cluster infrastructure does by default run
<b>fsck
</b> with
8249 the
<b>-p
</b> option to automatically repair file system inconsistencies.
8250 For particularly severe error types, you may need to manually
8251 initiate file system repair.
</td>
8254 <tr ALIGN=LEFT VALIGN=TOP
>
8255 <td WIDTH=
"19%">Quorum partitions not set up correctly
</td>
8257 <td WIDTH=
"19%">Messages indicating that a quorum partition cannot be accessed
8258 appear in the log file
</td>
8260 <td WIDTH=
"62%">Run the
<b><font face=
"Courier New, Courier, mono">cludiskutil
8261 -t
</font></b>command to check that the quorum partitions are accessible.
8262 If the command succeeds, run the
<b><font face=
"Courier New, Courier, mono">cludiskutil
8263 -p
</font></b> command on both cluster systems. If the output is different
8264 on the systems, the quorum partitions do not point to the same devices
8265 on both systems. Check to make sure that the raw devices exist and are
8266 correctly specified in the
<b><font face=
"Courier New, Courier, mono">/etc/sysconfig/rawdevices
</font></b>
8267 file. See
<a href=
"#state-partitions">Configuring the Quorum Partitions
</a>
8268 for more information.
8269 <p>These messages could also indicate that you did not specify
<b><font face=
"Courier New, Courier, mono">yes
</font></b>
8270 when prompted by the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8271 utility to initialize the quorum partitions. To correct this problem, run
8272 the utility again.
</td>
8275 <tr ALIGN=LEFT VALIGN=TOP
>
8276 <td WIDTH=
"19%" HEIGHT=
"87">Cluster service operation fails
</td>
8278 <td WIDTH=
"19%" HEIGHT=
"87">Messages indicating the operation failed appear
8279 on the console or in the log file
</td>
8281 <td WIDTH=
"62%" HEIGHT=
"87">There are many different reasons for the failure
8282 of a service operation (for example, a service stop or start). To help
8283 you identify the cause of the problem, set the severity level for the cluster
8284 daemons to
<b><font face=
"Courier New, Courier, mono">debug
</font></b>
8285 in order to log descriptive messages. Then, retry the operation and examine
8286 the log file. See
<a href=
"#cluster-logging">Modifying Cluster Event Logging
</a>
8287 for more information.
</td>
8290 <tr ALIGN=LEFT VALIGN=TOP
>
8291 <td WIDTH=
"19%" HEIGHT=
"151">Cluster service stop fails because a file
8292 system cannot be unmounted
</td>
8294 <td WIDTH=
"19%" HEIGHT=
"151">Messages indicating the operation failed appear
8295 on the console or in the log file
</td>
8297 <td WIDTH=
"62%" HEIGHT=
"151">Use the
<b><font face=
"Courier New, Courier, mono">fuser
</font></b>
8298 and
<b><font face=
"Courier New, Courier, mono">ps
</font></b> commands to
8299 identify the processes that are accessing the file system. Use the
<b><font face=
"Courier New, Courier, mono">kill
</font></b>
8300 command to stop the processes. You can also use the
<b><font face=
"Courier New, Courier, mono">lsof
8301 -t
<i>file_system
</i></font></b> command to display the identification
8302 numbers for the processes that are accessing the specified file system.
8303 You can pipe the output to the
<b><font face=
"Courier New, Courier, mono">kill
</font></b> command.
8305 <p>To avoid this problem, be sure that only cluster-related processes can
8306 access shared storage data. In addition, you may want to modify the service
8307 and enable forced unmount for the file system. This enables the cluster
8308 service to unmount a file system even if it is being accessed by an application
8312 <tr ALIGN=LEFT VALIGN=TOP
>
8313 <td WIDTH=
"19%" HEIGHT=
"71">Incorrect entry in the cluster database
</td>
8315 <td WIDTH=
"19%" HEIGHT=
"71">Cluster operation is impaired
</td>
8317 <td WIDTH=
"62%" HEIGHT=
"71">The
<b>cluadmin
</b>utility can be used to
8318 examine and modify service configuration.
Additionally, the
<b>cluconfig</b> utility
8320 is used to modify cluster parameters.
</td>
8323 <tr ALIGN=LEFT VALIGN=TOP
>
8324 <td WIDTH=
"19%" HEIGHT=
"265">Incorrect Ethernet heartbeat entry in the
8325 cluster database or
<b><font face=
"Courier New, Courier, mono">/etc/hosts
</font></b>
8328 <td WIDTH=
"19%" HEIGHT=
"265">Cluster status indicates that a Ethernet heartbeat
8329 channel is
<b><font face=
"Courier New, Courier, mono">OFFLINE
</font></b>
8330 even though the interface is valid
</td>
8332 <td WIDTH=
"62%" HEIGHT=
"265">You can examine and, modify the cluster configuration
8333 by running the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8334 utility, as specified in
<a href=
"#cluster-config">Modifying the Cluster
8335 Configuration
</a>, and correct the problem.
8336 <p>In addition, be sure that you can use the
<b><font face=
"Courier New, Courier, mono">ping
</font></b>
8337 command to send a packet to all the network interfaces used in the cluster.
</td>
8340 <tr ALIGN=LEFT VALIGN=TOP
>
8341 <td WIDTH=
"19%" HEIGHT=
"50">Loose cable connection to power switch
</td>
8343 <td WIDTH=
"19%" HEIGHT=
"50">Power switch status is
<b><font face=
"Courier New, Courier, mono">Timeout
</font></b></td>
8345 <td WIDTH=
"62%" HEIGHT=
"50">Check the serial cable connection.
</td>
8348 <tr ALIGN=LEFT VALIGN=TOP
>
8349 <td WIDTH=
"19%" HEIGHT=
"95">Power switch serial port incorrectly specified
8350 in the cluster database
</td>
8352 <td WIDTH=
"19%" HEIGHT=
"95">Power switch status indicates a problem
</td>
8354 <td WIDTH=
"62%" HEIGHT=
"95">You can examine the current settings and modify
8355 the cluster configuration by running the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8356 utility, as specified in
<a href=
"#cluster-config">Modifying the Cluster
8357 Configuration
</a>, and correct the problem.
</td>
8360 <tr ALIGN=LEFT VALIGN=TOP
>
8361 <td WIDTH=
"19%" HEIGHT=
"50">Heartbeat channel problem
</td>
8363 <td WIDTH=
"19%" HEIGHT=
"50">Heartbeat channel status is
<b><font face=
"Courier New, Courier, mono">OFFLINE
</font></b></td>
8365 <td WIDTH=
"62%" HEIGHT=
"50">You can examine the current settings and modify
8366 the cluster configuration by running the
<b><font face=
"Courier New, Courier, mono">cluconfig
</font></b>
8367 utility, as specified in
<a href=
"#cluster-config">Modifying the Cluster
8368 Configuration
</a>, and correct the problem.
8369 <p>Verify that the correct type of cable is used for each heartbeat channel
8371 <p>Verify that you can
"ping" each cluster system over the network interface
8372 for each Ethernet heartbeat channel.
</td>
8377 <hr noshade
width=
"80%"><a NAME=
"supplement"></a>
8379 A Supplementary Hardware Information
</h1>
8380 The information in the following sections can help you set up a cluster
8381 hardware configuration. In some cases, the information is vendor specific.
8384 <a href=
"#rps-10">Setting Up an RPS-
10 Power Switch
</a></li>
8387 <a href=
"#scsi-reqs">SCSI Bus Configuration Requirements
</a></li>
8390 <a href=
"#hba">Host Bus Adapter Features and Configuration Requirements
</a></li>
8393 <a href=
"#adaptec">Adaptec Host Bus Adapter Requirement
</a></li>
8398 <p><a NAME=
"power-setup"></a>
8400 A
.2 Setting Up
Power Switches
</h2>
8403 <a NAME=
"rps-10"></a></h3>
8406 Setting up RPS-
10 Power Switches
</h3>
8407 If you are using an RPS-
10 Series power switch in your cluster, you must:
8410 Set the rotary address on both power switches to
0. Be sure that the switch
8411 is positioned correctly and is not between settings.
</li>
8415 Toggle the four SetUp switches on both power switches, as follows:
</li>
8418 <table BORDER CELLSPACING=
0 CELLPADDING=
3 WIDTH=
"49%" >
8420 <td WIDTH=
"15%"><b>Switch
</b></td>
8422 <td WIDTH=
"28%"><b>Function
</b></td>
8424 <td WIDTH=
"25%"><b>Up Position
</b></td>
8426 <td WIDTH=
"32%"><b>Down Position
</b></td>
8430 <td ALIGN=CENTER
WIDTH=
"15%" HEIGHT=
"24">1</td>
8432 <td WIDTH=
"28%" HEIGHT=
"24">Data rate
</td>
8434 <td ALIGN=CENTER
WIDTH=
"25%" HEIGHT=
"24"> </td>
8436 <td ALIGN=CENTER
WIDTH=
"32%" HEIGHT=
"24">X
</td>
8440 <td ALIGN=CENTER
WIDTH=
"15%">2</td>
8442 <td WIDTH=
"28%">Toggle delay
</td>
8444 <td ALIGN=CENTER
WIDTH=
"25%"> </td>
8446 <td ALIGN=CENTER
WIDTH=
"32%">X
</td>
8450 <td ALIGN=CENTER
WIDTH=
"15%">3</td>
8452 <td WIDTH=
"28%">Power up default
</td>
8454 <td ALIGN=CENTER
WIDTH=
"25%">X
</td>
8456 <td ALIGN=CENTER
WIDTH=
"32%"> </td>
8460 <td ALIGN=CENTER
WIDTH=
"15%">4</td>
8462 <td WIDTH=
"28%">Unused
</td>
8464 <td ALIGN=CENTER
WIDTH=
"25%"> </td>
8466 <td ALIGN=CENTER
WIDTH=
"32%">X
</td>
8473 Ensure that the serial port device special file (for example,
<b><font face=
"Courier New, Courier, mono">/dev/ttyS1
</font></b>)
8474 that is specified in the
<b><font face=
"Courier New, Courier, mono">/etc/cluster.conf
</font></b>
8475 file corresponds to the serial port to which the power switch's serial
8476 cable is connected.
</li>
8480 Connect the power cable for each cluster system to its own power switch.
</li>
8484 Use null modem cables to connect each cluster system to the serial port
8485 on the power switch that provides power to the other cluster system.
</li>
8487 The following figure shows an example of an RPS-
10 Series power switch
8490 RPS-
10 Power Switch Hardware Configuration
</h4>
8491 <img SRC=
"powerswitch.gif" height=
259 width=
360>
8492 <p>See the RPS-
10 documentation supplied by the vendor for additional installation
8493 information. Note that the information provided in this document supersedes
8494 the vendor information.
8497 <a NAME=
"power-wti-nps"></a></h3>
8500 Setting up WTI NPS Power Switches
</h3>
8501 The WTI NPS-
115 and NPS-
230 power switches are network-attached devices.
8502 Essentially, each is a power strip with network connectivity that enables power
8503 cycling of individual outlets.
Only
1 NPS is needed within the cluster
8504 (unlike the RPS-
10 model where a separate switch per cluster member is needed).
8506 <p>Since there is no independent means whereby the cluster software can
8507 verify that you have plugged each cluster member system into the appropriate
8508 plug on the back of the NPS power switch, please take care to ensure correct
8509 setup.
Failure to do so will cause the cluster software to incorrectly
8510 conclude that a successful power cycle has occurred.
8511 <p>When setting up the NPS switch the following configuration guidelines
8513 <p>When configuring the power switch itself:
8516 You must assign a
"System Password" (under the
"General Parameters"menu).
8517 Note: this password is stored in clear text in the cluster configuration
8518 file, so choose a password which differs from your system's password.
8519 (Note, however, that the file permissions allow /etc/cluster.conf to be
8520 read only by root.)
</li>
8523 Do not assign a password under the
"Plug Parameters".
</li>
8526 Assign system names to the Plug Parameters (e.g.,
<i>clu1
</i> to plug
1,
<i>clu2
</i>
8527 to plug
2 - assuming these are the cluster member names).
</li>
8530 <p><br>When running
<b>cluconfig
</b> to specify power switch parameters:
8534 Specify a switch type of wti_nps.
</li>
8537 Specify the password you assigned to the NPS switch (ref step
1 in prior
8541 When prompted for the plug/port number, specify the same name as assigned
8542 in step
3 in prior section.
</li>
8544 Note: we have observed that the NPS power switch may become unresponsive
8545 when placed on networks with a high incidence of broadcast or multicast
8546 packets. In these cases you may have to isolate the power switch
8547 to a private subnet.
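<p>Before running <b>cluconfig</b>, it can be useful to confirm that both cluster systems can reach the NPS switch over the network. The following is a minimal sketch; the switch address 10.0.0.20 is only an example, and it assumes the switch's telnet management interface is enabled.
<pre>
# Verify basic IP reachability of the power switch from each
# cluster member (replace 10.0.0.20 with your switch's address):
ping -c 3 10.0.0.20

# Optionally open the switch's management interface to confirm
# that it answers; exit without making changes:
telnet 10.0.0.20
</pre>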
8548 <p>The NPS-115 power switch has a very useful feature which can accommodate
8549 power cycling cluster members with dual power supplies. The NPS-115
8550 consists of 2 banks of power outlets, each of which is independently powered
8551 and has 4 plugs. Each bank of the NPS-115 is plugged into
8552 a separate power source (presumably a separate UPS).
For cluster
8553 members with dual power supplies, you plug their power cords into an outlet
8554 in each bank.
Then when you configure the NPS-
115 and assign ports,
8555 simply assign the same name to outlets in each bank that you have plugged
8556 the corresponding cluster member into.
For example, suppose the cluster
8557 members were clu3 and clu4, where clu3 is plugged into outlets
1 and
5,
8558 and clu4 is plugged into outlets
2 and
6:
8559 <pre>
Plug | Name        | Status | Boot Delay | Password    | Default
-----+-------------+--------+------------+-------------+---------
  1  | clu3        |   ON   |   5 sec    | (undefined) |   ON
  2  | clu4        |   ON   |   5 sec    | (undefined) |   ON
  3  | (undefined) |   ON   |   5 sec    | (undefined) |   ON
  4  | (undefined) |   ON   |   5 sec    | (undefined) |   ON
  5  | clu3        |   ON   |   5 sec    | (undefined) |   ON
  6  | clu4        |   ON   |   5 sec    | (undefined) |   ON
  7  | (undefined) |   ON   |   5 sec    | (undefined) |   ON
  8  | (undefined) |   ON   |   5 sec    | (undefined) |   ON
</pre>
8592 <p>When the same name is assigned to multiple outlets, a power
8593 cycle command cycles all outlets with that name.
8594 In this manner, a cluster member with dual power supplies can be successfully
8595 power cycled. Under this dual configuration, the parameters specified
8596 to <b>cluconfig</b> are the same as in the single-supply configuration described above.
8599 Setting up Baytech Power Switches
</h3>
8600 The following information pertains to the RPC-3 and RPC-5 power switches.
8601 <p>The Baytech power switch is a network-attached device. Essentially,
8602 it is a power strip with network connectivity that enables power cycling of
8603 individual outlets. Only one Baytech switch is needed within the cluster
8604 (unlike the RPS-10 model, where a separate switch per cluster member is required).
8606 <p>Since there is no independent means by which the cluster software can
8607 verify that you have plugged each cluster member system into the appropriate
8608 plug on the back of the Baytech power switch, take care to ensure
8609 correct setup. Failure to do so will cause the cluster software to
8610 incorrectly conclude that a successful power cycle has occurred.
8611 <p>When setting up the Baytech switch, the following configuration guidelines apply.
8613 <p>When configuring the Baytech power switch itself:
8616 Using a serial connection, assign the IP address and related network parameters.
</li>
8619 You must assign a user name and password (under the "Manage Users" menu).
8620 Note: this password is stored in clear text in the cluster configuration
8621 file, so choose a password which differs from your system's password.
8622 (The /etc/cluster.conf file itself is readable only by root.)
</li>
8626 To assign system names to the corresponding outlets, go to the "Configuration"
8627 menu, followed by the "Outlets" menu (for example, <i>clu1</i> to outlet 1 and <i>clu2</i>
8628 to outlet 2, assuming these are the cluster member names).
</li>
8631 <p><br>When running
<b>cluconfig
</b> to specify power switch parameters:
8635 Specify a switch type of baytech.
</li>
8638 Specify the password you assigned to the Baytech switch (see step 2 in the
8639 prior section).
</li>
8642 When prompted for the plug/port number, specify the same name as assigned
8643 in step 3 of the prior section.
</li>
8647 <a NAME=
"power-other"></a></h3>
8650 Other Network Power Switches
</h3>
8651 The cluster software includes support for a range of power switch types.
8652 This range of power switch module support originated from developers at
8653 Mission Critical Linux, Inc. and as part of the open source Linux-HA project.
8654 Time and hardware resource constraints did not allow us to fully test the
8655 complete range of switch types.
As such, the associated power switch
8656 STONITH modules are considered latent features.
Examples of these
8657 other power switch modules include:
8660 APC Master Switch,
<a href=
"http://www.apc.com">www.apc.com
</a>
8661 Note: we have observed that the Master Switch may become unresponsive when
8662 placed on networks with a high incidence of broadcast or multicast
8663 packets. In these cases you may have to isolate the power switch
8664 to a private subnet.
</li>
8667 APC Serial On/Off Switch (partAP9211),
<a href=
"http://www.apc.com">www.apc.com
</a>
8668 Note: this switch type does not provide a means for the cluster to query
8669 its status. Therefore, the cluster always assumes it is connected
8670 and operational.
</li>
8673 Baytech RPC-
3 and RPC-
5,
<a href=
"http://www.baytech.net">www.baytech.net
</a>
8674 Note: this power switch performs well even on networks with a high frequency
8675 of broadcast and multicast packets.
</li>
8678 <p><br><a NAME=
"scsi-reqs"></a>
8680 A
.3 SCSI Bus Configuration Requirements
</h2>
8681 SCSI buses must adhere to a number of configuration requirements in order
8682 to operate correctly. Failure to adhere to these requirements will adversely
8683 affect cluster operation and application and data availability.
8684 <p>You must adhere to the following
<b>SCSI bus configuration requirements
</b>:
8687 Buses must be terminated at each end. In addition, how you terminate a
8688 SCSI bus affects whether you can use hot plugging. See
<a href=
"#scsi-term">SCSI
8689 Bus Termination
</a> for more information.
</li>
8693 TERMPWR (terminator power) must be provided by the host bus adapters connected
8694 to a bus. See
<a href=
"#scsi-term">SCSI Bus Termination
</a> for more information.
</li>
8698 Active SCSI terminators must be used in a multi-initiator bus. See
<a href=
"#scsi-term">SCSI
8699 Bus Termination
</a> for more information.
</li>
8703 Buses must not extend beyond the maximum length restriction for the bus
8704 type. Internal cabling must be included in the length of the SCSI bus.
8705 See
<a href=
"#scsi-length">SCSI Bus Length
</a> for more information.
</li>
8709 All devices (host bus adapters and disks) on a bus must have unique SCSI
8710 identification numbers. See
<a href=
"#scsi-ids">SCSI Identification Numbers
</a>
8711 for more information.
</li>
8715 The Linux device name for each shared SCSI device must be the same on each
8716 cluster system. For example, a device named
<b><font face=
"Courier New, Courier, mono">/dev/sdc
</font></b>
8717 on one cluster system must be named
<b><font face=
"Courier New, Courier, mono">/dev/sdc
</font></b>
8718 on the other cluster system. You can usually ensure that devices are named
8719 the same by using identical hardware for both cluster systems. (A verification sketch follows this list.)
</li>
8723 Bus resets must be disabled for the host bus adapters used in a cluster.
</li>
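<p>As a quick check of consistent device naming, you can compare the SCSI devices detected on each cluster system. This is only a sketch; /dev/sdc is an example device name taken from the requirement above.
<pre>
# Run on each cluster system and compare the output: the shared
# disks should appear with the same host adapter, channel, ID,
# and LUN ordering on both systems.
cat /proc/scsi/scsi

# Confirm that the example shared device reports the same size
# and partition table on both systems:
/sbin/fdisk -l /dev/sdc
</pre>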
8725 To set SCSI identification numbers, disable host bus adapter termination,
8726 and disable bus resets, use the system's configuration utility. When the
8727 system boots, a message is displayed describing how to start the utility.
8728 For example, you may be instructed to press Ctrl-A, and follow the prompts
8729 to perform a particular task. To set storage enclosure and RAID controller
8730 termination, see the vendor documentation. See
<a href=
"#scsi-term">SCSI
8731 Bus Termination
</a> and
<a href=
"#scsi-ids">SCSI Identification Numbers
</a>
8732 for more information.
8733 <p>See
<a href=
"http://www.scsita.org" target=
"_blank">www.scsita.org
</a>
8734 and the following sections for detailed information about SCSI bus requirements.
8736 <p><a NAME=
"scsi-term"></a>
8738 A
.3.1 SCSI Bus Termination
</h3>
8739 A SCSI bus is an electrical path between two terminators. A device (host
8740 bus adapter, RAID controller, or disk) attaches to a SCSI bus by a short stub,
8742 which is an unterminated bus segment that usually must be less than 0.1
meter in length.
8744 <p>Buses must have only two terminators located at the ends of the bus.
8745 Additional terminators, terminators that are not at the ends of the bus,
8746 or long stubs will cause the bus to operate incorrectly. Termination for
8747 a SCSI bus can be provided by the devices connected to the bus or by external
8748 terminators, if the internal (onboard) device termination can be disabled.
8749 <p>Terminators are powered by a SCSI power distribution wire (or signal),
8750 TERMPWR, so that the terminator can operate as long as there is one powering
8751 device on the bus. In a cluster, TERMPWR must be provided by the host bus
8752 adapters, instead of the disks in the enclosure. You can usually disable
8753 TERMPWR in a disk by setting a jumper on the drive. See the disk drive
8754 documentation for information.
8755 <p>In addition, there are two types of SCSI terminators. Active terminators
8756 provide a voltage regulator for TERMPWR, while passive terminators provide
8757 a resistor network between TERMPWR and ground. Passive terminators are
8758 also susceptible to fluctuations in TERMPWR. Therefore, it is recommended
8759 that you use active terminators in a cluster.
8760 <p>For maintenance purposes, it is desirable for a storage configuration
8761 to support hot plugging (that is, the ability to disconnect a host bus
8762 adapter from a SCSI bus, while maintaining bus termination and operation).
8763 However, if you have a single-initiator SCSI bus, hot plugging is not necessary
8764 because the private bus does not need to remain operational when you remove
8765 a host. See
<a href=
"#multiinit">Setting Up a Multi-Initiator SCSI Bus
8766 Configuration
</a> for examples of hot plugging configurations.
8767 <p>If you have a multi-initiator SCSI bus, you must adhere to the following
8768 requirements for hot plugging:
8771 SCSI devices, terminators, and cables must adhere to stringent hot plugging
8772 requirements described in the latest SCSI specifications described in SCSI
8773 Parallel Interface-
3 (SPI-
3), Annex D. You can obtain this document from
<a href=
"http://www.t10.org" target=
"_blank">www.t10.org
</a>.
</li>
8777 Internal host bus adapter termination must be disabled. Not all adapters
8778 support this feature.
</li>
8782 If a host bus adapter is at the end of the SCSI bus, an external terminator
8783 must provide the bus termination.
</li>
8787 The stub that is used to connect a host bus adapter to a SCSI bus must
8788 be less than
0.1 meter in length. Host bus adapters that use a long cable
8789 inside the system enclosure to connect to the bulkhead cannot support hot
8790 plugging. In addition, host bus adapters that have an internal connector
8791 and a cable that extends the bus inside the system enclosure cannot support
8792 hot plugging. Note that any internal cable must be included in the length
8793 of the SCSI bus.
</li>
8795 When disconnecting a device from a single-initiator SCSI bus or from a
8796 multi-initiator SCSI bus that supports hot plugging, follow these guidelines:
8799 Unterminated SCSI cables must not be connected to an operational host bus
8800 adapter or storage device.
</li>
8804 Connector pins must not bend or touch an electrical conductor while the
8805 SCSI cable is disconnected.
</li>
8809 To disconnect a host bus adapter from a single-initiator bus, you must
8810 disconnect the SCSI cable first from the RAID controller and then from
8811 the adapter. This ensures that the RAID controller is not exposed to any
8812 erroneous input.
</li>
8816 Protect connector pins from electrostatic discharge while the SCSI cable
8817 is disconnected by wearing a grounded anti-static wrist guard and physically
8818 protecting the cable ends from contact with other objects.
</li>
8822 Do not remove a device that is currently participating in any SCSI bus
transactions.
</li>
8825 To enable or disable an adapter's internal termination, use the system
8826 BIOS utility. When the system boots, a message is displayed describing
8827 how to start the utility. For example, you may be instructed to press Ctrl-A.
8828 Follow the prompts for setting the termination. At this point, you can
8829 also set the SCSI identification number, as needed, and disable SCSI bus
8830 resets. See
<a href=
"#scsi-ids">SCSI Identification Numbers
</a> for more information.
8832 <p>To set storage enclosure and RAID controller termination, see the vendor
documentation.
8836 <p><a NAME=
"scsi-length"></a>
8838 A
.3.2 SCSI Bus Length
</h3>
8839 A SCSI bus must adhere to length restrictions for the bus type. Buses that
8840 do not adhere to these restrictions will not operate properly. The length
8841 of a SCSI bus is calculated from one terminated end to the other, and must
8842 include any cabling that exists inside the system or storage enclosures.
8843 <p>A cluster supports LVD (low voltage differential) buses. The maximum
8844 length of a single-initiator LVD bus is
25 meters. The maximum length of
8845 a multi-initiator LVD bus is
12 meters. According to the SCSI standard,
8846 a single-initiator LVD bus is a bus that is connected to only two devices,
8847 each within
0.1 meter from a terminator. All other buses are defined as
8848 multi-initiator buses.
8849 <p>Do not connect any single-ended devices to a LVD bus, or the bus will
8850 convert to a single-ended bus, which has a much shorter maximum length
8851 than a differential bus.
8854 <p><a NAME=
"scsi-ids"></a>
8856 A
.3.3 SCSI Identification Numbers
</h3>
8857 Each device on a SCSI bus must have a unique SCSI identification number.
8858 Devices include host bus adapters, RAID controllers, and disks.
8859 <p>The number of devices on a SCSI bus depends on the data path for the
8860 bus. A cluster supports wide SCSI buses, which have a
16-bit data path
8861 and support a maximum of
16 devices. Therefore, there are sixteen possible
8862 SCSI identification numbers that you can assign to the devices on a bus.
8863 <p>In addition, SCSI identification numbers are prioritized. Use the following
8864 priority order to assign SCSI identification numbers:
8865 <p>7 - 6 - 5 - 4 - 3 - 2 - 1 - 0 - 15 - 14 - 13 - 12 - 11 - 10 - 9 - 8
8866 <p>The previous order specifies that
7 is the highest priority, and
8 is
8867 the lowest priority. The default SCSI identification number for a host
8868 bus adapter is
7, because adapters are usually assigned the highest priority.
8869 On a multi-initiator bus, be sure to change the SCSI identification number
8870 of one of the host bus adapters to avoid duplicate values.
8871 <p>A disk in a JBOD enclosure is assigned a SCSI identification number
8872 either manually (by setting jumpers on the disk) or automatically (based
8873 on the enclosure slot number). You can assign identification numbers for
8874 logical units in a RAID subsystem by using the RAID management interface.
8875 <p>To modify an adapter's SCSI identification number, use the system BIOS
8876 utility. When the system boots, a message is displayed describing how to
8877 start the utility. For example, you may be instructed to press Ctrl-A,
8878 and follow the prompts for setting the SCSI identification number. At this
8879 point, you can also enable or disable the adapter's internal termination,
8880 as needed, and disable SCSI bus resets. See
<a href=
"#scsi-term">SCSI Bus
8881 Termination
</a> for more information.
8882 <p>The prioritized arbitration scheme on a SCSI bus can result in low-priority
8883 devices being locked out for some period of time. This may cause commands
8884 to time out, if a low-priority storage device, such as a disk, is unable
8885 to win arbitration and complete a command that a host has queued to it.
8886 For some workloads, you may be able to avoid this problem by assigning
8887 low-priority SCSI identification numbers to the host bus adapters.
8889 <p><a NAME=
"hba"></a>
8891 A
.4 Host Bus Adapter Features and Configuration Requirements
</h2>
8892 Not all host bus adapters can be used with all cluster shared storage configurations.
8893 For example, some host bus adapters do not support hot plugging or cannot
8894 be used in a multi-initiator SCSI bus. You must use host bus adapters with
8895 the features and characteristics that your shared storage configuration
8896 requires. See
<a href=
"#hardware-storage">Configuring Shared Disk Storage
</a>
8897 for information about supported storage configurations.
8898 <p>The following table describes some recommended SCSI and Fibre Channel
8899 host bus adapters. It includes information about adapter termination and
8900 how to use the adapters in single and multi-initiator SCSI buses and Fibre
8901 Channel interconnects.
8902 <p>The specific product devices listed in the table have been tested. However,
8903 other devices may also work well in a cluster. If you want to use a host
8904 bus adapter other than a recommended one, the information in the table
8905 can help you determine if the device has the features and characteristics
8906 that will enable it to work in a cluster.
8908 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
8909 <caption><col width=
55*
><col width=
99*
><col width=
103*
><thead>
8910 <br></thead></caption>
8912 <tr ALIGN=CENTER VALIGN=CENTER
>
8913 <th WIDTH=
"17%">Host Bus Adapter
</th>
8915 <th WIDTH=
"20%">Features
</th>
8917 <th WIDTH=
"22%">Single-Initiator Configuration
</th>
8919 <th WIDTH=
"41%">Multi-Initiator Configuration
</th>
8923 <td WIDTH=
"17%" HEIGHT=
"217"><font size=-
1>Adaptec
2940U2W (minimum driver:
8924 AIC7xxx V5.1
.28)
</font></td>
8926 <td WIDTH=
"20%" HEIGHT=
"217"><font size=-
1>Ultra2, wide, LVD
</font>
8927 <p><font size=-
1>HD68 external connector
</font>
8928 <p><font size=-
1>One channel, with two bus segments
</font>
8929 <p><font size=-
1>Set the onboard termination by using the BIOS utility.
</font>
8930 <p><font size=-
1>Onboard termination is disabled when the power is off.
</font></td>
8932 <td WIDTH=
"22%" HEIGHT=
"217"><font size=-
1>Set the onboard termination
8933 to automatic (the default).
</font>
8934 <p><font size=-
1>You can use the internal SCSI connector for private (non-cluster)
8935 storage.
</font></td>
8937 <td WIDTH=
"41%" HEIGHT=
"217"><font size=-
1>This configuration is not supported,
8938 because the adapter and its Linux driver do not reliably recover from SCSI
8939 bus resets that can be generated by the host bus adapter on the other cluster
8940 system.
</font>
8941 <p><font size=-
1>To use the adapter in a multi-initiator bus, the onboard
8942 termination must be disabled. This ensures proper termination when the
8943 power is off.
</font>
8944 <p><font size=-
1>For hot plugging support, disable the onboard termination
8945 for the Ultra2 segment, and connect an external terminator, such as a pass-through
8946 terminator, to the adapter. You cannot connect a cable to the internal
8947 Ultra2 connector.
</font>
8948 <p><font size=-
1>For no hot plugging support, disable the onboard termination
8949 for the Ultra2 segment, or set it to automatic. Connect a terminator to
8950 the end of the internal cable attached to the internal Ultra2 connector.
</font></td>
8954 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
8955 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
8958 <td WIDTH=
"17%" HEIGHT=
"224"><font size=-
1>Qlogic QLA1080 (minimum driver:
8959 QLA1x160 V3.12, obtained from
<a href=
"http://www.qlogic.com/bbs-html/drivers.html" target=
"new_window">www.qlogic.com/
8960 bbs-html /drivers.html
</a>)
</font></td>
8962 <td WIDTH=
"20%" HEIGHT=
"224"><font size=-
1>Ultra2, wide, LVD
</font>
8963 <p><font size=-
1>VHDCI external connector
</font>
8964 <p><font size=-
1>One channel
</font>
8965 <p><font size=-
1>Set the onboard termination by using the BIOS utility.
</font>
8966 <p><font size=-
1>Onboard termination is disabled when the power is off,
8967 unless jumpers are used to enforce termination.
</font></td>
8969 <td WIDTH=
"22%" HEIGHT=
"224"><font size=-
1>Set the onboard termination
8970 to automatic (the default).
</font>
8971 <p><font size=-
1>You can use the internal SCSI connector for private (non-cluster)
8972 storage.
</font></td>
8974 <td WIDTH=
"41%" HEIGHT=
"224"><font size=-
1>This configuration is not supported,
8975 because the adapter and its Linux driver do not reliably recover from SCSI
8976 bus resets that can be generated by the host bus adapter on the other cluster system.</font>
8978 <p><font size=-
1> For hot plugging support, disable the onboard termination,
8979 and use an external terminator, such as a VHDCI pass-through terminator,
8980 a VHDCI y-cable or a VHDCI trilink connector. You cannot connect a cable
8981 to the internal Ultra2 connector.
</font>
8982 <p><font size=-
1>For no hot plugging support, disable the onboard termination,
8983 or set it to automatic. Connect a terminator to the end of the internal
8984 cable connected to the internal Ultra2 connector.
</font>
8985 <p><font size=-
1>For an alternate configuration without hot plugging support,
8986 enable the onboard termination with jumpers, so the termination is enforced
8987 even when the power is off. You cannot connect a cable to the internal
8988 Ultra2 connector.
</font></td>
8992 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
8993 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
8996 <td WIDTH=
"17%" HEIGHT=
"267"><font size=-
1>Tekram DC-
390U2W (minimum driver
8997 SYM53C8xx V1.3G)
</font></td>
8999 <td WIDTH=
"20%" HEIGHT=
"267"><font size=-
1>Ultra2, wide, LVD
</font>
9000 <p><font size=-
1>HD68 external connector
</font>
9001 <p><font size=-
1>One channel, two segments
</font>
9002 <p><font size=-
1>Onboard termination for a bus segment is disabled if internal
9003 and external cables are connected to the segment. Onboard termination is
9004 enabled if there is only one cable connected to the segment.
</font>
9005 <p><font size=-
1>Termination is disabled when the power is off.
</font></td>
9007 <td WIDTH=
"22%" HEIGHT=
"267"><font size=-
1>You can use the internal SCSI
9008 connector for private (non-cluster) storage.
</font></td>
9010 <td WIDTH=
"41%" HEIGHT=
"267"><font size=-
1>Testing has shown that the adapter
9011 and its Linux driver reliably recover from SCSI bus resets that can be
9012 generated by the host bus adapter on the other cluster system.
</font>
9013 <p><font size=-
1>The adapter cannot be configured to use external termination,
9014 so it does not support hot plugging.
</font>
9015 <p><font size=-
1>Disable the onboard termination by connecting an internal
9016 cable to the internal Ultra2 connector, and then attaching a terminator
9017 to the end of the cable. This ensures proper termination when the power is off.</font></td>
9023 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
9024 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
9027 <td WIDTH=
"17%" HEIGHT=
"240"><font size=-
1>Adaptec
29160 (minimum driver:
9028 AIC7xxx V5.1
.28)
</font></td>
9030 <td WIDTH=
"20%" HEIGHT=
"240"><font size=-
1>Ultra160
</font>
9031 <p><font size=-
1>HD68 external connector
</font>
9032 <p><font size=-
1>One channel, with two bus segments
</font>
9033 <p><font size=-
1>Set the onboard termination by using the BIOS utility.
</font>
9034 <p><font size=-
1>Termination is disabled when the power is off, unless
9035 jumpers are used to enforce termination.
</font></td>
9037 <td WIDTH=
"22%" HEIGHT=
"240"><font size=-
1>Set the onboard termination
9038 to automatic (the default).
</font>
9039 <p><font size=-
1>You can use the internal SCSI connector for private (non-cluster)
9040 storage.
</font></td>
9042 <td WIDTH=
"41%" HEIGHT=
"240"><font size=-
1>This configuration is not supported,
9043 because the adapter and its Linux driver do not reliably recover from SCSI
9044 bus resets that can be generated by the host bus adapter on the other cluster system.</font>
9046 <p><font size=-
1> You cannot connect the adapter to an external terminator,
9047 such as a pass-through terminator, because the adapter does not function
9048 correctly with external termination. Therefore, the adapter does not support
9049 hot plugging.
</font>
9050 <p><font size=-
1>Use jumpers to enable the onboard termination for the
9051 Ultra160 segment. You cannot connect a cable to the internal Ultra160 connector.
</font>
9052 <p><font size=-
1>For an alternate configuration, disable the onboard termination
9053 for the Ultra160 segment, or set it to automatic. Then, attach a terminator
9054 to the end of an internal cable that is connected to the internal Ultra160
9055 connector.
</font></td>
9059 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
9060 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
9063 <td WIDTH=
"17%" HEIGHT=
"221"><font size=-
1>Adaptec
29160LP (minimum driver:
9064 AIC7xxx V5.1
.28)
</font></td>
9066 <td WIDTH=
"20%" HEIGHT=
"221"><font size=-
1>Ultra160
</font>
9067 <p><font size=-
1>VHDCI external connector
</font>
9068 <p><font size=-
1>One channel
</font>
9069 <p><font size=-
1>Set the onboard termination by using the BIOS utility.
</font>
9070 <p><font size=-
1>Termination is disabled when the power is off, unless
9071 jumpers are used to enforce termination.
</font></td>
9073 <td WIDTH=
"22%" HEIGHT=
"221"><font size=-
1>Set the onboard termination
9074 to automatic (the default).
</font>
9075 <p><font size=-
1>You can use the internal SCSI connector for private (non-cluster)
9076 storage.
</font></td>
9078 <td WIDTH=
"41%" HEIGHT=
"221"><font size=-
1>This configuration is not supported,
9079 because the adapter and its Linux driver do not reliably recover from SCSI
9080 bus resets that can be generated by the host bus adapter on the other cluster system.</font>
9082 <p><font size=-
1> You cannot connect the adapter to an external terminator,
9083 such as a pass-through terminator, because the adapter does not function
9084 correctly with external termination. Therefore, the adapter does not support
9085 hot plugging.
</font>
9086 <p><font size=-
1>Use jumpers to enable the onboard termination. You cannot
9087 connect a cable to the internal Ultra160 connector.
</font>
9088 <p><font size=-
1>For an alternate configuration, disable the onboard termination,
9089 or set it to automatic. Then, attach a terminator to the end of an internal
9090 cable that is connected to the internal Ultra160 connector.
</font></td>
9094 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
9095 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
9098 <td WIDTH=
"17%" HEIGHT=
"239"><font size=-
1>Adaptec
39160 (minimum driver:
9099 AIC7xxx V5.1
.28)
</font>
9100 <p><font size=-
1>Qlogic QLA12160 (minimum driver: QLA1x160 V3.12, obtained
9101 from
<a href=
"http://www.qlogic.com/bbs-html/drivers.html" target=
"new_window">www.qlogic.com/
9102 bbs-html /drivers.html
</a>)
</font></td>
9104 <td WIDTH=
"20%" HEIGHT=
"239"><font size=-
1>Ultra160
</font>
9105 <p><font size=-
1>Two VHDCI external connectors
</font>
9106 <p><font size=-
1>Two channels
</font>
9107 <p><font size=-
1>Set the onboard termination by using the BIOS utility.
</font>
9108 <p><font size=-
1>Termination is disabled when the power is off, unless
9109 jumpers are used to enforce termination.
</font></td>
9111 <td WIDTH=
"22%" HEIGHT=
"239"><font size=-
1>Set onboard termination to automatic
9112 (the default).
</font>
9113 <p><font size=-
1>You can use the internal SCSI connectors for private (non-cluster)
9114 storage.
</font></td>
9116 <td WIDTH=
"41%" HEIGHT=
"239"><font size=-
1>This configuration is not supported,
9117 because the adapter and its Linux driver do not reliably recover from SCSI
9118 bus resets that can be generated by the host bus adapter on the other cluster system.</font>
9120 <p><font size=-
1> You cannot connect the adapter to an external terminator,
9121 such as a pass-through terminator, because the adapter does not function
9122 correctly with external termination. Therefore, the adapter does not support
9123 hot plugging.
</font>
9124 <p><font size=-
1>Use jumpers to enable the onboard termination for a multi-initiator
9125 SCSI channel. You cannot connect a cable to the internal connector for
9126 the multi-initiator SCSI channel.
</font>
9127 <p><font size=-
1>For an alternate configuration, disable the onboard termination
9128 for the multi-initiator SCSI channel or set it to automatic. Then, attach
9129 a terminator to the end of an internal cable that is connected to the multi-initiator
9130 SCSI channel.
</font></td>
9134 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
9135 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
9138 <td WIDTH=
"17%" HEIGHT=
"131"><font size=-
1>LSI Logic SYM22915 (minimum
9139 driver: SYM53c8xx V1.6b, obtained from
<a href=
"ftp://ftp.lsil.com/HostAdapterDrivers/linux" target=
"new_window">ftp.lsil.com
9140 /HostAdapter Drivers/linux
</a>)
</font></td>
9142 <td WIDTH=
"20%" HEIGHT=
"239"><font size=-
1>Ultra160
</font>
9143 <p><font size=-
1>Two VHDCI external connectors
</font>
9144 <p><font size=-
1>Two channels
</font>
9145 <p><font size=-
1>Set the onboard termination by using the BIOS utility.
</font>
9146 <p><font size=-
1>The onboard termination is automatically enabled or disabled,
9147 depending on the configuration, even when the module power is off. Use
9148 jumpers to disable the automatic termination.
</font></td>
9150 <td WIDTH=
"22%" HEIGHT=
"239"><font size=-
1>Set onboard termination to automatic
9151 (the default).
</font>
9152 <p><font size=-
1>You can use the internal SCSI connectors for private (non-cluster)
9153 storage.
</font></td>
9155 <td WIDTH=
"41%" HEIGHT=
"239"><font size=-
1>Testing has shown that the adapter
9156 and its Linux driver reliably recover from SCSI bus resets that can be
9157 generated by the host bus adapter on the other cluster system.
</font>
9158 <p><font size=-
1>For hot plugging support, use an external terminator,
9159 such as a VHDCI pass-through terminator, a VHDCI y-cable, or a VHDCI trilink
9160 connector. You cannot connect a cable to the internal connector.
</font>
9161 <p><font size=-
1>For no hot plugging support, connect a cable to the internal
9162 connector, and connect a terminator to the end of the internal cable attached
9163 to the internal connector.
</font></td>
9167 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
9168 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
9171 <td WIDTH=
"17%" HEIGHT=
"131"><font size=-
1>Adaptec AIC-
7896 on the Intel
9172 L440GX+ motherboard (as used on the VA Linux
2200 series) (minimum driver:
9173 AIC7xxx V5.1
.28)
</font></td>
9175 <td WIDTH=
"20%" HEIGHT=
"131"><font size=-
1>One Ultra2, wide, LVD port,
9176 and one Ultra, wide port
</font>
9177 <p><font size=-
1>Onboard termination is permanently enabled, so the adapter
9178 must be located at the end of the bus.
</font></td>
9180 <td WIDTH=
"22%" HEIGHT=
"131"><font size=-
1>Termination is permanently enabled,
9181 so no action is needed in order to use the adapter in a single-initiator bus.</font></td>
9184 <td WIDTH=
"41%" HEIGHT=
"131"><font size=-
1>The adapter cannot be used in
9185 a multi-initiator configuration, because it does not function correctly
9186 in this configuration.
</font></td>
9190 <table BORDER CELLSPACING=
0 CELLPADDING=
4 WIDTH=
"100%" style=
"page-break-before: always" >
9191 <caption><col width=
55*
><col width=
99*
><col width=
103*
></caption>
9194 <td WIDTH=
"17%" HEIGHT=
"131"><font size=-
1>QLA2200 (minimum driver: QLA2x00
9195 V2.23, obtained from
<a href=
"http://www.qlogic.com/bbs-html/drivers.html" target=
"new_window">www.qlogic.com
9196 /bbs-html /drivers.html
</a>)
</font></td>
9198 <td WIDTH=
"20%" HEIGHT=
"239"><font size=-
1>Fibre Channel arbitrated loop</font>
9200 <p><font size=-
1>One channel
</font></td>
9202 <td WIDTH=
"22%" HEIGHT=
"239"><font size=-
1>Can be implemented with point-to-point
9203 links or with hubs. Configurations with switches have not been tested.
</font>
9204 <p><font size=-
1>Hubs are required for connection to a dual-controller
9205 RAID array or to multiple RAID arrays.
</font></td>
9207 <td WIDTH=
"41%" HEIGHT=
"239"><font size=-
1>This configuration has not been
9208 tested.
</font></td>
9212 <p><a NAME=
"adaptec"></a>
9214 A
.5 Adaptec Host Bus Adapter Requirement
</h2>
9215 If you are using Adaptec host bus adapters in a multi-initiator shared disk
9216 storage configuration, edit the
<b><font face=
"Courier New, Courier, mono">/etc/lilo.conf
</font></b>
9217 file and either add the following line or edit the
<b><font face=
"Courier New, Courier, mono">append
</font></b>
9218 line to match the following line:
9219 <p><b><font face=
"Courier New, Courier, mono">append=
"aic7xxx=no_reset"</font></b>
9221 <p><a NAME=
"supp-software"></a>
9223 B Supplementary Software Information
</h1>
9224 The information in the following sections can help you manage the cluster
9225 software configuration:
9228 <a href=
"#cluster-com">Cluster Communication Mechanisms
</a></li>
9231 <a href=
"#cluster-daemons">Cluster Daemons
</a></li>
9234 <a href=
"#admin-scenarios">Failover and Recovery Scenarios
</a></li>
9237 <a href=
"#app-tuning">Tuning Oracle Services
</a></li>
9240 <a href=
"#lvs">Using a Cluster in an LVS Environment
</a></li>
9244 <p><a NAME=
"cluster-com"></a>
9246 B
.1 Cluster Communication Mechanisms
</h2>
9247 A cluster uses several intracluster communication mechanisms to ensure
9248 data integrity and correct cluster behavior when a failure occurs. The
9249 cluster uses these mechanisms to:
9252 Control when a system can become a cluster member
</li>
9255 Determine the state of the cluster systems
</li>
9258 Control the behavior of the cluster when a failure occurs
</li>
9260 The cluster communication mechanisms are as follows:
9263 Quorum disk partitions
</li>
9269 <p>Periodically, each cluster system writes a timestamp and system status
9270 (UP or DOWN) to the primary and backup quorum partitions, which are raw
9271 partitions located on shared storage. Each cluster system reads the system
9272 status and timestamp that were written by the other cluster system and
9273 determines if they are up to date. The cluster systems attempt to read
9274 the information from the primary quorum partition. If this partition is
9275 corrupted, the cluster systems read the information from the backup quorum
9276 partition and simultaneously repair the primary partition. Data consistency
9277 is maintained through checksums and any inconsistencies between the partitions
9278 are automatically corrected.
9279 <p>If a cluster system reboots but cannot write to both quorum partitions,
9280 the system will not be allowed to join the cluster. In addition, if an
9281 existing cluster system can no longer write to both partitions, it removes
9282 itself from the cluster by shutting down.
9285 Remote power switch monitoring
</li>
9291 <p>Periodically, each cluster system monitors the health of the remote
9292 power switch connection, if any. The cluster system uses this information
9293 to help determine the status of the other cluster system. The complete
9294 failure of the power switch communication mechanism does not automatically
9295 result in a failover.
9298 Ethernet and serial heartbeats
</li>
9304 <p>The cluster systems are connected together by using point-to-point Ethernet
9305 and serial lines. Periodically, each cluster system issues heartbeats (pings)
9306 across these lines. The cluster uses this information to help determine
9307 the status of the systems and to ensure correct cluster operation. The
9308 complete failure of the heartbeat communication mechanism does not automatically
9309 result in a failover.
</ul>
9310 If a cluster system determines that the quorum timestamp from the other
9311 cluster system is not up-to-date, it will check the heartbeat status. If
9312 heartbeats to the system are still operating, the cluster will take no
9313 action at this time. If a cluster system does not update its timestamp
9314 after some period of time, and does not respond to heartbeat pings, it
is considered to have failed.
9316 <p>Note that the cluster will remain operational as long as one cluster
9317 system can write to the quorum disk partitions, even if all other communication
mechanisms fail.
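<p>As a simple sanity check of the quorum communication path described above, you can verify that each cluster system can read the raw quorum devices. This is only a sketch; the raw device names /dev/raw/raw1 and /dev/raw/raw2 are examples and must match the quorum partitions bound on your systems.
<pre>
# Show which block devices the raw devices are bound to:
raw -qa

# Read the first sector of the primary and backup quorum
# partitions; each command should complete without I/O errors:
dd if=/dev/raw/raw1 of=/dev/null bs=512 count=1
dd if=/dev/raw/raw2 of=/dev/null bs=512 count=1
</pre>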
9320 <p><a NAME=
"cluster-daemons"></a>
9322 B
.2 Cluster Daemons
</h2>
9323 The cluster daemons are as follows:
Quorum daemon
</li>
9332 <p>On each cluster system, the
<b><font face=
"Courier New, Courier, mono">cluquorumd
</font></b>
9333 quorum daemon periodically writes a timestamp and system status to a specific
9334 area on the primary and backup quorum disk partitions. The daemon also
9335 reads the other cluster system's timestamp and system status information
9336 from the primary quorum partition or, if the primary partition is corrupted,
9337 from the backup partition.
9340 Heartbeat daemon
</li>
9346 <p>On each cluster system, the
<b><font face=
"Courier New, Courier, mono">cluhbd
</font></b>
9347 heartbeat daemon issues pings across the point-to-point Ethernet and serial
9348 lines to which both cluster systems are connected.
Power daemon
</li>
9357 <p>On each cluster system, the
<b><font face=
"Courier New, Courier, mono">clupowerd
</font></b>
9358 power daemon monitors the remote power switch connection, if any.
9359 Note that there are two separate <b>clupowerd</b> processes running:
9360 the <i>master</i> process, which responds to message requests (for example,
9361 status and power-cycle requests), and a second process, which periodically polls the
9362 power switch status.
9365 Service manager daemon
</li>
9371 <p>On each cluster system, the
<b><font face=
"Courier New, Courier, mono">clusvcmgrd
</font></b>
9372 service manager daemon responds to changes in cluster membership by stopping
9373 and starting services.
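<p>To confirm that the daemons described above are running on a cluster member, you can list them with ps. This is a minimal sketch; the exact set of processes present depends on your configuration (for example, the power daemon runs only when a power switch is configured).
<pre>
# List the cluster daemons on this member; expect to see
# cluquorumd, cluhbd, clusvcmgrd and, if a power switch is
# configured, two clupowerd processes:
ps ax | grep clu | grep -v grep
</pre>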
9377 <p><a NAME=
"admin-scenarios"></a>
9379 B
.3 Failover and Recovery Scenarios
</h2>
9380 Understanding cluster behavior when significant events occur can help you
9381 manage a cluster. Note that cluster behavior depends on whether you are
9382 using power switches in the configuration. Power switches enable the cluster
9383 to maintain complete data integrity under all failure conditions.
9384 <p>The following sections describe how the system will respond to various
9385 failure and error scenarios:
9388 <a href=
"#admin-failure">System Hang
</a></li>
9391 <a href=
"#admin-panic">System Panic
</a></li>
9394 <a href=
"#admin-storage">Inaccessible Quorum Partitions
</a></li>
9397 <a href=
"#admin-network">Total Network Connection Failure
</a></li>
9400 <a href=
"#admin-power">Remote Power Switch Connection Failure
</a></li>
9403 <a href=
"#admin-quorum">Quorum Daemon Failure
</a></li>
9406 <a href=
"#admin-heartbeat">Heartbeat Daemon Failure
</a></li>
9409 <a href=
"#admin-powerd">Power Daemon Failure
</a></li>
9412 <a href=
"#admin-serviceman">Service Manager Daemon Failure
</a></li>
9417 <p><a NAME=
"admin-failure"></a>
9419 B
.3.1 System Hang
</h3>
9420 In a cluster configuration that uses power switches, if a system
"hangs,"
9421 the cluster behaves as follows:
9424 The functional cluster system detects that the
"hung" cluster system is
9425 not updating its timestamp on the quorum partitions and is not communicating
9426 over the heartbeat channels.
</li>
9430 The functional cluster system power-cycles the
"hung" system.
</li>
9434 The functional cluster system restarts any services that were running on
9435 the
"hung" system.
</li>
9439 If the previously
"hung" system reboots, and can join the cluster (that
9440 is, the system can write to both quorum partitions), services are re-balanced
9441 across the member systems, according to each service's placement policy.
</li>
9443 In a cluster configuration that does not use power switches, if a system
9444 "hangs," the cluster behaves as follows:
9447 The functional cluster system detects that the
"hung" cluster system is
9448 not updating its timestamp on the quorum partitions and is not communicating
9449 over the heartbeat channels.
</li>
9453 The functional cluster system sets the status of the
"hung" system to
<b><font face=
"Courier New, Courier, mono">DOWN
</font></b>
9454 on the quorum partitions, and then restarts the
"hung" system's services.
</li>
9458 If the
"hung" system becomes
"unhung," it notices that its status is
<b><font face=
"Courier New, Courier, mono">DOWN
</font></b>,
9459 and initiates a system reboot.
</li>
9465 <p>If the system remains
"hung," you must manually power-cycle the
"hung"
9466 system in order for it to resume cluster operation.
9469 If the previously
"hung" system reboots, and can join the cluster, services
9470 are re-balanced across the member systems, according to each service's
9471 placement policy.
</li>
9475 <p><a NAME=
"admin-panic"></a>
9477 B
.3.2 System Panic
</h3>
9478 A system panic (crash) is a controlled response to a software-detected
9479 error. A panic attempts to return the system to a consistent state by shutting
9480 down the system. If a cluster system panics, the following occurs:
9483 The functional cluster system detects that the cluster system that is experiencing
9484 the panic is not updating its timestamp on the quorum partitions and is
9485 not communicating over the heartbeat channels.
</li>
9489 The cluster system that is experiencing the panic initiates a system shutdown
9490 and reboot.
</li>
9494 If you are using power switches, the functional cluster system power-cycles
9495 the cluster system that is experiencing the panic.
</li>
9499 The functional cluster system restarts any services that were running on
9500 the system that experienced the panic.
</li>
9504 When the system that experienced the panic reboots, and can join the cluster
9505 (that is, the system can write to both quorum partitions), services are
9506 re-balanced across the member systems, according to each service's placement
9511 <p><a NAME=
"admin-storage"></a>
9513 B
.3.3 Inaccessible Quorum Partitions
</h3>
9514 Inaccessible quorum partitions can be caused by the failure of a SCSI (or
9515 FibreChannel) adapter that is connected to the shared disk storage, or
9516 by a SCSI cable becoming disconnected from the shared disk storage. If one
9517 of these conditions occurs, and the SCSI bus remains terminated, the cluster
behaves as follows:
9521 The cluster system with the inaccessible quorum partitions notices that
9522 it cannot update its timestamp on the quorum partitions and initiates a
reboot.
</li>
9527 If the cluster configuration includes power switches, the functional cluster
9528 system power-cycles the rebooting system.
</li>
9532 The functional cluster system restarts any services that were running on
9533 the system with the inaccessible quorum partitions.
</li>
9537 If the cluster system reboots, and can join the cluster (that is, the system
9538 can write to both quorum partitions), services are re-balanced across the
9539 member systems, according to each service's placement policy.
</li>
9544 <a NAME=
"admin-network"></a></h3>
9547 B
.3.4 Total Network Connection Failure
</h3>
9548 A total network connection failure occurs when all the heartbeat network
9549 connections between the systems fail. This can be caused by one of the following:
9553 All the heartbeat network cables are disconnected from a system.
</li>
9557 All the serial connections and network interfaces used for heartbeat communication fail.
</li>
9560 If a total network connection failure occurs, both systems detect the problem,
9561 but they also detect that the SCSI disk connections are still active. Therefore,
9562 services remain running on the systems and are not interrupted.
9563 <p>If a total network connection failure occurs, diagnose the problem (a
9564 diagnostic sketch follows the list below) and then do one of the following:
9567 If the problem affects only one cluster system, relocate its services to
9568 the other system. You can then correct the problem, and relocate the services
9569 back to the original system.
</li>
9573 Manually stop the services on one cluster system. In this case, services
9574 do not automatically fail over to the other system. Instead, you must manually
9575 restart the services on the other system. After you correct the problem,
9576 you can re-balance the services across the systems.
</li>
9580 Shut down one cluster system. In this case, the following occurs:
</li>
9585 Services are stopped on the cluster system that is shut down.
</li>
9589 The remaining cluster system detects that the system is being shut down.
</li>
9593 Any services that were running on the system that was shut down are restarted
9594 on the remaining cluster system.
</li>
9598 If the system reboots, and can join the cluster (that is, the system can
9599 write to both quorum partitions), services are re-balanced across the member
9600 systems, according to each service's placement policy.
</li>
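<p>The following is a minimal sketch of how such a failure might be diagnosed from one of the members; the interface name eth1 and the address 192.168.1.2 for the other member's heartbeat interface are only examples.
<pre>
# Check that the dedicated heartbeat interface is up and has the
# expected address:
/sbin/ifconfig eth1

# Try to reach the other member over the point-to-point Ethernet
# heartbeat link:
ping -c 3 192.168.1.2
</pre>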
9605 <p><a NAME=
"admin-power"></a>
9607 B
.3.5 Remote Power Switch Connection Failure
</h3>
9608 If a query to a remote power switch connection fails, but both systems
9609 continue to have power, there is no change in cluster behavior unless a
9610 cluster system attempts to use the failed remote power switch connection
9611 to power-cycle the other system. The power daemon will continually log
9612 high-priority messages indicating a power switch failure or a loss of connectivity
9613 to the power switch (for example, if a cable has been disconnected).
9614 <p>If a cluster system attempts to use a failed remote power switch, services
9615 running on the system that experienced the failure are stopped. However,
9616 to ensure data integrity, they are not failed over to the other cluster
9617 system. Instead, they remain stopped until the hardware failure is corrected.
9620 <p><a NAME=
"admin-quorum"></a>
9622 B
.3.6 Quorum Daemon Failure
</h3>
9623 If a quorum daemon fails on a cluster system, the system is no longer able
9624 to monitor the quorum partitions. If you are not using power switches in
9625 the cluster, this error condition may result in services being run on more
9626 than one cluster system, which can cause data corruption.
9627 <p>If a quorum daemon fails, and power switches are used in the cluster,
9628 the following occurs:
9631 The functional cluster system detects that the cluster system whose quorum
9632 daemon has failed is not updating its timestamp on the quorum partitions,
9633 although the system is still communicating over the heartbeat channels.
</li>
9637 After a period of time, the functional cluster system power-cycles the
9638 cluster system whose quorum daemon has failed.
</li>
9642 The functional cluster system restarts any services that were running on
9643 the cluster system whose quorum daemon has failed.
</li>
9647 If the cluster system reboots and can join the cluster (that is, it can
9648 write to the quorum partitions), services are re-balanced across the member
9649 systems, according to each service's placement policy.
</li>
9651 If a quorum daemon fails, and power switches are not used in the cluster,
9652 the following occurs:
9655 The functional cluster system detects that the cluster system whose quorum
9656 daemon has failed is not updating its timestamp on the quorum partitions,
9657 although the system is still communicating over the heartbeat channels.
</li>
9661 The functional cluster system restarts any services that were running on
9662 the cluster system whose quorum daemon has failed. Both cluster systems
9663 may be running services simultaneously, which can cause data corruption.
</li>
9669 <a NAME=
"admin-heartbeat"></a></h3>
9672 B
.3.7 Heartbeat Daemon Failure
</h3>
9673 If the heartbeat daemon fails on a cluster system, service failover time
9674 will increase because the quorum daemon cannot quickly determine the state
9675 of the other cluster system. By itself, a heartbeat daemon failure will
9676 not cause a service failover.
9677 <p><a NAME=
"admin-powerd"></a>
9679 B
.3.8 Power Daemon Failure
</h3>
9680 If the power daemon fails on a cluster system and the other cluster system
9681 experiences a severe failure (for example, a system panic), the cluster
9682 system will not be able to power-cycle the failed system. Instead, the
9683 cluster system will continue to run its services, and the services that
9684 were running on the failed system will not fail over. Cluster behavior
9685 is the same as for a remote power switch connection failure.
9687 <p><a NAME=
"admin-serviceman"></a>
9689 B
.3.9 Service Manager Daemon Failure
</h3>
9690 If the service manager daemon fails, services cannot be started or stopped
9691 until you restart the service manager daemon or reboot the system.
9694 <p><a NAME=
"app-tuning"></a>
9696 B.4 Tuning Oracle Services
</h2>
9697 The Oracle database recovery time after a failover is directly proportional
9698 to the number of outstanding transactions and the size of the database.
9699 The following parameters control database recovery time:
9702 <b><font face=
"Courier New, Courier, mono">LOG_CHECKPOINT_TIMEOUT
</font></b></li>
9705 <b><font face=
"Courier New, Courier, mono">LOG_CHECKPOINT_INTERVAL
</font></b></li>
9708 <b><font face=
"Courier New, Courier, mono">FAST_START_IO_TARGET
</font></b></li>
9711 <b><font face=
"Courier New, Courier, mono">REDO_LOG_FILE_SIZES
</font></b></li>
9713 To minimize recovery time, set the previous parameters to relatively low
9714 values. Note that excessively low values will adversely impact performance.
9715 You may have to try different values in order to find the optimal value.
9716 <p>Oracle provides additional tuning parameters that control the number
9717 of database transaction retries and the retry delay time. Be sure that
9718 these values are large enough to accommodate the failover time in your
9719 environment. This will ensure that failover is transparent to database
9720 client application programs and does not require programs to reconnect.
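<p>The following is an illustrative excerpt from an Oracle initialization parameter file showing how some of the parameters listed above might be set. The values are placeholders only; appropriate values depend on your database size and workload, so test them in your environment.
<pre>
# Example init.ora settings (illustrative values only):
LOG_CHECKPOINT_TIMEOUT  = 300       # checkpoint at least every 300 seconds
LOG_CHECKPOINT_INTERVAL = 10000     # checkpoint every 10000 redo blocks
FAST_START_IO_TARGET    = 10000     # bound the I/O needed for recovery
</pre>
<p>Redo log file sizes are typically fixed when the log files are created rather than in the parameter file, so plan them together with these settings.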
9721 <br><a NAME=
"lvs"></a>
9723 B.5 Using a Cluster in an LVS Environment
</h2>
9724 <i>Editorial comment: Integrate with Piranha documentation.
</i>
9725 <p>You can use a cluster in conjunction with Linux Virtual Server (LVS)
9726 to deploy a highly available e-commerce site that has complete data integrity
9727 and application availability, in addition to load balancing capabilities.
9728 Note that various commercial cluster offerings are LVS derivatives. See
9729 <a href=
"http://www.linuxvirtualserver.org" target=
"_blank">www.linuxvirtualserver.org
</a>
9730 for detailed information about LVS and downloading the software.
9731 <p>The following figure shows how you could use a cluster in an LVS environment.
9732 It has a three-tier architecture, where the top tier consists of LVS load-balancing
9733 systems to distribute Web requests, the second tier consists of a set of
9734 Web servers to serve the requests, and the third tier consists of a cluster
9735 to serve data to the Web servers.
9737 Cluster in an LVS Environment
</h4>
9738 <img SRC=
"lvs_cluster.jpg" >
9739 <p>In an LVS configuration, client systems issue requests on the World
9740 Wide Web. For security reasons, these requests enter a Web site through
9741 a firewall, which can be a Linux system serving in that capacity or a dedicated
9742 firewall device. For redundancy, you can configure firewall devices in
9743 a failover configuration. Behind the firewall are LVS load-balancing systems,
9744 which can be configured in an active-standby mode. The active load-balancing
9745 system forwards the requests to a set of Web servers.
9746 <p>Each Web server can independently process an HTTP request from a client
9747 and send the response back to the client. LVS enables you to expand a Web
9748 site's capacity by adding Web servers to the load-balancing systems' set
9748 of active Web servers. In addition, if a Web server fails, it can be removed
from the set.
9751 <p>This LVS configuration is particularly suitable if the Web servers serve
9752 only static Web content, which consists of small amounts of infrequently
9753 changing data, such as corporate logos, that can be easily duplicated on
9754 the Web servers. However, this configuration is not suitable if the Web
9755 servers serve dynamic content, which consists of information that changes
9756 frequently. Dynamic content could include a product inventory, purchase
9757 orders, or customer database, which must be consistent on all the Web servers
9758 to ensure that customers have access to up-to-date and accurate information.
9759 <p>To serve dynamic Web content in an LVS configuration, you can add a
9760 cluster behind the Web servers, as shown in the previous figure. This combination
9761 of LVS and a cluster enables you to configure a high-integrity, no-single-point-of-failure
9762 e-commerce site. The cluster can run a highly-available instance of a database
9763 or a set of databases that are network-accessible to the web servers.
9764 <p>For example, the figure could represent an e-commerce site used for
9765 online merchandise ordering through a URL. Client requests to the URL pass
9766 through the firewall to the active LVS load-balancing system, which then
9767 forwards the requests to one of the three Web servers. The cluster systems
9768 serve dynamic data to the Web servers, which forward the data to the requesting clients.
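<p>As an illustration of the load-balancing tier only (the cluster tier is configured separately with the cluster software), the following sketch shows how an LVS director might be configured with the ipvsadm utility. The virtual address 10.0.0.1 and the Web server addresses are placeholders.
<pre>
# Define a virtual HTTP service on the director using
# round-robin scheduling:
ipvsadm -A -t 10.0.0.1:80 -s rr

# Add three real Web servers to the virtual service using
# direct routing:
ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.11:80 -g
ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.12:80 -g
ipvsadm -a -t 10.0.0.1:80 -r 10.0.0.13:80 -g
</pre>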
9771 <hr width=
"75%" noshade
>