doc/cluster.html

   1 <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
   2 <html>
   3 <head>
   4    <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
   5    <meta name="GENERATOR" content="Mozilla/4.78 [en] (X11; U; Linux 2.4.7-10 i686) [Netscape]">
   6    <title>Cluster Installation and Administration</title>
   7 <DOCTYPE HTML PUBLIC W3C//DTD 4.0//EN">
   8 </head>
   9 <body bgcolor="#FFFFFF">
  10 &nbsp;
  11 <h1>
  12 <font size=+4>Installation and Administration *Draft*</font></h1>
  13
  14 <h2>
  15 <font size=+4>Red Hat Cluster Manager</font></h2>
  16 &nbsp;
  17 <p>&nbsp;
  18 <br>&nbsp;
  19 <br>&nbsp;
  20 <br>&nbsp;
  21 <br>&nbsp;
  22 <br>&nbsp;
  23 <br>&nbsp;
  24 <br>&nbsp;
  25 <br>&nbsp;
  26 <br>&nbsp;
  27 <br>&nbsp;
  28 <br>&nbsp;
  29 <br>&nbsp;
  30 <br>&nbsp;
  31 <br>&nbsp;
  32 <br>&nbsp;
  33 <br>&nbsp;
  34 <p>Copyright &copy; 2000 Mission Critical Linux, Inc.
  35 <br>Copyright &copy; 2002 Red Hat, Inc.
  36 <p>January, 2002
  37 <br>&nbsp;
  38 <br>&nbsp;
  39 <p>This document describes how to set up and manage the Red Hat Cluster
  40 Manager, which provides application availability and data integrity.
  41 <br>&nbsp;
  42 <br>&nbsp;
  43 <br>&nbsp;
  44 <br>&nbsp;
  45 <br>&nbsp;
  46 <p><i>Editorial comments:</i>
  47 <p><i>&nbsp;&nbsp;&nbsp; Searching for "Editorial comment" will highlight
  48 many areas which need some work.</i>
  49 <ul>
  50 <li>
  51 <i>Need to update TOC &amp; heading numbers (as sections have been added
  52 &amp; deleted).</i></li>
  53
  54 <li>
  55 <i>New power management scheme not done yet.</i></li>
  56
  57 <li>
  58 <i>New NFS services section added, but awaiting editorial review.</i></li>
  59
  60 <li>
  61 <i>Needs Piranha integration work.</i></li>
  62
  63 <li>
  64 <i>Needs updates to reflect service manager changes as well as service
  65 monitoring.</i></li>
  66
  67 <li>
  68 <i>Many sections ended up with lots of unnecessary extra blank lines (gratuitiously
  69 thrown in by Netscape Composer?).</i></li>
  70
  71 <li>
  72 <i>Could benefit from a spell check.</i></li>
  73 </ul>
  74
  75 <br>&nbsp;
  76 <br>&nbsp;
  77 <br>&nbsp;
  78 <br>&nbsp;
  79 <br>&nbsp;
  80 <br>&nbsp;
  81 <br>&nbsp;
  82 <br>&nbsp;
  83 <br>&nbsp;
  84 <br>&nbsp;
  85 <br>&nbsp;
  86 <br>&nbsp;
  87 <br>&nbsp;
  88 <br>&nbsp;
  89 <br>&nbsp;
  90 <br>&nbsp;
  91 <br>&nbsp;
  92 <br>&nbsp;
  93 <br>&nbsp;
  94 <br>&nbsp;
  95 <h2>
  96 Table of Contents</h2>
  97
  98 <table BORDER=0 CELLSPACING=0 CELLPADDING=3 WIDTH="75%" NOSAVE >
  99 <tr NOSAVE>
 100 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="37" NOSAVE><font size=+1><a href="#changes">New
 101 and Changed Features</a></font></td>
 102 </tr>
 103
 104 <tr>
 105 <td COLSPAN="6"><font size=+1><a href="#introduction">1 Introduction</a></font></td>
 106 </tr>
 107
 108 <tr>
 109 <td WIDTH="2%">&nbsp;</td>
 110
 111 <td COLSPAN="5"><a href="#overview">1.1 Cluster Overview</a></td>
 112 </tr>
 113
 114 <tr>
 115 <td WIDTH="2%">&nbsp;</td>
 116
 117 <td COLSPAN="5"><a href="#features">1.2 Cluster Features</a></td>
 118 </tr>
 119
 120 <tr>
 121 <td WIDTH="2%">&nbsp;</td>
 122
 123 <td COLSPAN="5"><a href="#steps">1.3 How To Use This Manual</a></td>
 124 </tr>
 125
 126 <tr>
 127 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="37"><font size=+1><a href="#hardware">2
 128 Hardware Installation and Operating System Configuration&nbsp;</a></font></td>
 129 </tr>
 130
 131 <tr>
 132 <td WIDTH="2%">&nbsp;</td>
 133
 134 <td COLSPAN="5"><a href="#gather">2.1 Choosing a Hardware Configuration</a></td>
 135 </tr>
 136
 137 <tr>
 138 <td WIDTH="2%">&nbsp;</td>
 139
 140 <td WIDTH="2%">&nbsp;</td>
 141
 142 <td COLSPAN="4"><a href="#hardware-table">2.1.1 Cluster Hardware Table</a></td>
 143 </tr>
 144
 145 <tr>
 146 <td WIDTH="2%">&nbsp;</td>
 147
 148 <td WIDTH="2%">&nbsp;</td>
 149
 150 <td COLSPAN="4"><a href="#install-min">2.1.2 Example of a Minimum Cluster
 151 Configuration&nbsp;</a></td>
 152 </tr>
 153
 154 <tr>
 155 <td WIDTH="2%">&nbsp;</td>
 156
 157 <td WIDTH="2%">&nbsp;</td>
 158
 159 <td COLSPAN="4"><a href="#install-max">2.1.3 Example of a No-Single-Point-Of-Failure
 160 Configuration</a></td>
 161 </tr>
 162
 163 <tr>
 164 <td WIDTH="2%" HEIGHT="21">&nbsp;</td>
 165
 166 <td COLSPAN="5" HEIGHT="21"><a href="#basic-install">2.2 Steps for Setting
 167 Up the Cluster Systems</a></td>
 168 </tr>
 169
 170 <tr>
 171 <td WIDTH="2%">&nbsp;</td>
 172
 173 <td WIDTH="2%">&nbsp;</td>
 174
 175 <td COLSPAN="4"><a href="#hardware-system">2.2.1 Installing the Basic System
 176 Hardware</a></td>
 177 </tr>
 178
 179 <tr>
 180 <td WIDTH="2%">&nbsp;</td>
 181
 182 <td WIDTH="2%">&nbsp;</td>
 183
 184 <td COLSPAN="4"><a href="#hardware-terminal">2.2.2 Setting Up a Console
 185 Switch</a></td>
 186 </tr>
 187
 188 <tr>
 189 <td WIDTH="2%">&nbsp;</td>
 190
 191 <td WIDTH="2%">&nbsp;</td>
 192
 193 <td COLSPAN="4"><a href="#hardware-network">2.2.3 Setting Up a Network
 194 Switch or Hub</a></td>
 195 </tr>
 196
 197 <tr>
 198 <td WIDTH="2%">&nbsp;</td>
 199
 200 <td COLSPAN="5"><a href="#install-linux">2.3 Steps for Installing and Configuring
 201 the Linux Distribution</a></td>
 202 </tr>
 203
 204 <tr>
 205 <td WIDTH="2%">&nbsp;</td>
 206
 207 <td WIDTH="2%">&nbsp;</td>
 208
 209 <td COLSPAN="4"><a href="#linux-dist">2.3.1 Linux Distribution and Kernel
 210 Requirements</a></td>
 211 </tr>
 212
 213 <tr>
 214 <td WIDTH="2%">&nbsp;</td>
 215
 216 <td WIDTH="2%">&nbsp;</td>
 217
 218 <td COLSPAN="3">&nbsp;</td>
 219
 220 <td WIDTH="96%"><a href="#valinux">2.3.1.1 VA Linux Distribution Installation
 221 Requirements&nbsp;</a></td>
 222 </tr>
 223
 224 <tr>
 225 <td WIDTH="2%">&nbsp;</td>
 226
 227 <td WIDTH="2%">&nbsp;</td>
 228
 229 <td COLSPAN="3">&nbsp;</td>
 230
 231 <td WIDTH="96%"><a href="#redhat">2.3.1.2 Red Hat Distribution Installation
 232 Requirements</a></td>
 233 </tr>
 234
 235 <tr>
 236 <td WIDTH="2%">&nbsp;</td>
 237
 238 <td WIDTH="2%">&nbsp;</td>
 239
 240 <td COLSPAN="4"><a href="#hosts">2.3.2 Editing the /etc/hosts File</a></td>
 241 </tr>
 242
 243 <tr>
 244 <td WIDTH="2%">&nbsp;</td>
 245
 246 <td WIDTH="2%">&nbsp;</td>
 247
 248 <td COLSPAN="4"><a href="#alt-kernel">2.3.3 Decreasing the Kernel Boot
 249 Timeout Limit</a></td>
 250 </tr>
 251
 252 <tr>
 253 <td WIDTH="2%">&nbsp;</td>
 254
 255 <td WIDTH="2%">&nbsp;</td>
 256
 257 <td COLSPAN="4"><a href="#dmesg">2.3.4 Displaying Console Startup Messages</a></td>
 258 </tr>
 259
 260 <tr>
 261 <td WIDTH="2%">&nbsp;</td>
 262
 263 <td WIDTH="2%">&nbsp;</td>
 264
 265 <td COLSPAN="4"><a href="#devices-kernel">2.3.5 Displaying Devices Configured
 266 in the Kernel</a></td>
 267 </tr>
 268
 269 <tr>
 270 <td WIDTH="2%">&nbsp;</td>
 271
 272 <td COLSPAN="5"><a href="#install-cluster">2.4 Steps for Setting Up and
 273 Connecting the Cluster Hardware</a></td>
 274 </tr>
 275
 276 <tr>
 277 <td WIDTH="2%">&nbsp;</td>
 278
 279 <td WIDTH="2%">&nbsp;</td>
 280
 281 <td COLSPAN="5"><a href="#hardware-heart">2.4.1 Configuring Heartbeat Channels</a></td>
 282 </tr>
 283
 284 <tr>
 285 <td WIDTH="2%">&nbsp;</td>
 286
 287 <td WIDTH="2%">&nbsp;</td>
 288
 289 <td COLSPAN="4"><a href="#hardware-power">2.4.2 Configuring Power Switches</a></td>
 290 </tr>
 291
 292 <tr>
 293 <td WIDTH="2%">&nbsp;</td>
 294
 295 <td WIDTH="2%">&nbsp;</td>
 296
 297 <td COLSPAN="4"><a href="#hardware-ups">2.4.3 Configuring UPS Systems</a></td>
 298 </tr>
 299
 300 <tr>
 301 <td WIDTH="2%">&nbsp;</td>
 302
 303 <td WIDTH="2%">&nbsp;</td>
 304
 305 <td COLSPAN="4"><a href="#hardware-storage">2.4.4 Configuring Shared Disk
 306 Storage</a></td>
 307 </tr>
 308
 309 <tr>
 310 <td WIDTH="2%">&nbsp;</td>
 311
 312 <td WIDTH="2%">&nbsp;</td>
 313
 314 <td COLSPAN="3">&nbsp;</td>
 315
 316 <td WIDTH="96%"><a href="#multiinit">2.4.4.1 Setting Up a Multi-Initiator
 317 SCSI Bus</a></td>
 318 </tr>
 319
 320 <tr>
 321 <td WIDTH="2%">&nbsp;</td>
 322
 323 <td WIDTH="2%">&nbsp;</td>
 324
 325 <td COLSPAN="3">&nbsp;</td>
 326
 327 <td WIDTH="96%"><a href="#singleinit">2.4.4.2 Setting Up a Single-Initiator
 328 SCSI Bus</a></td>
 329 </tr>
 330
 331 <tr>
 332 <td WIDTH="2%">&nbsp;</td>
 333
 334 <td WIDTH="2%">&nbsp;</td>
 335
 336 <td COLSPAN="3">&nbsp;</td>
 337
 338 <td WIDTH="96%"><a href="#single-fibre">2.4.4.3 Setting Up a Single-Initiator
 339 Fibre Channel Interconnect</a></td>
 340 </tr>
 341
 342 <tr>
 343 <td WIDTH="2%">&nbsp;</td>
 344
 345 <td WIDTH="2%">&nbsp;</td>
 346
 347 <td COLSPAN="3">&nbsp;</td>
 348
 349 <td WIDTH="96%"><a href="#state-partitions">2.4.4.4 Configuring the Quorum
 350 Partitions&nbsp;</a></td>
 351 </tr>
 352
 353 <tr>
 354 <td WIDTH="2%">&nbsp;</td>
 355
 356 <td WIDTH="2%">&nbsp;</td>
 357
 358 <td COLSPAN="3">&nbsp;</td>
 359
 360 <td WIDTH="96%"><a href="#partition">2.4.4.5 Partitioning Disks</a></td>
 361 </tr>
 362
 363 <tr>
 364 <td WIDTH="2%">&nbsp;</td>
 365
 366 <td WIDTH="2%">&nbsp;</td>
 367
 368 <td COLSPAN="3">&nbsp;</td>
 369
 370 <td WIDTH="96%"><a href="#rawdevices">2.4.4.6 Creating Raw Devices</a></td>
 371 </tr>
 372
 373 <tr>
 374 <td WIDTH="2%">&nbsp;</td>
 375
 376 <td WIDTH="2%">&nbsp;</td>
 377
 378 <td COLSPAN="3">&nbsp;</td>
 379
 380 <td WIDTH="96%"><a href="#filesystems">2.4.4.7 Creating File Systems</a></td>
 381 </tr>
 382
 383 <tr>
 384 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="35"><font size=+1><a href="#software">3
 385 Cluster Software Installation and Initialization</a></font></td>
 386 </tr>
 387
 388 <tr>
 389 <td WIDTH="2%">&nbsp;</td>
 390
 391 <td COLSPAN="5"><a href="#software-steps">3.1 Steps for Installing and
 392 Initializing the Cluster Software</a></td>
 393 </tr>
 394
 395 <tr>
 396 <td WIDTH="2%">&nbsp;</td>
 397
 398 <td WIDTH="2%">&nbsp;</td>
 399
 400 <td COLSPAN="4"><a href="#software-rawdevices">3.1.1 Editing the rawdevices
 401 File</a></td>
 402 </tr>
 403
 404 <tr>
 405 <td WIDTH="2%">&nbsp;</td>
 406
 407 <td WIDTH="2%">&nbsp;</td>
 408
 409 <td COLSPAN="4"><a href="#software-config">3.1.2 Example of the cluconfig
 410 Utility</a></td>
 411 </tr>
 412
 413 <tr>
 414 <td WIDTH="2%">&nbsp;</td>
 415
 416 <td COLSPAN="5"><a href="#software-check">3.2 Checking the Cluster Configuration</a></td>
 417 </tr>
 418
 419 <tr>
 420 <td WIDTH="2%">&nbsp;</td>
 421
 422 <td WIDTH="2%">&nbsp;</td>
 423
 424 <td COLSPAN="4"><a href="#cludiskutil">3.2.1 Testing the Quorum Partitions&nbsp;</a></td>
 425 </tr>
 426
 427 <tr>
 428 <td WIDTH="2%">&nbsp;</td>
 429
 430 <td WIDTH="2%">&nbsp;</td>
 431
 432 <td COLSPAN="4"><a href="#pswitch">3.2.2 Testing the Power Switches</a></td>
 433 </tr>
 434
 435 <tr>
 436 <td WIDTH="2%">&nbsp;</td>
 437
 438 <td WIDTH="2%">&nbsp;</td>
 439
 440 <td COLSPAN="4"><a href="#release">3.2.3 Displaying the Cluster Software
 441 Version</a></td>
 442 </tr>
 443
 444 <tr>
 445 <td WIDTH="2%">&nbsp;</td>
 446
 447 <td COLSPAN="5"><a href="#software-logging">3.3 Configuring syslog Event
 448 Logging&nbsp;</a></td>
 449 </tr>
 450
 451 <tr>
 452 <td WIDTH="2%">&nbsp;</td>
 453
 454 <td COLSPAN="5"><a href="#software-ui">3.4 Using the cluadmin Utility</a></td>
 455 </tr>
 456
 457 <tr>
 458 <td WIDTH="2%">&nbsp;</td>
 459
 460 <td COLSPAN="5"><a href="#software-gui">3.5 Configuring and Using the Graphical
 461 User Interface</a></td>
 462 </tr>
 463
 464 <tr>
 465 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="36"><font size=+1><a href="#service">4
 466 Service Configuration and Administration</a></font></td>
 467 </tr>
 468
 469 <tr>
 470 <td WIDTH="2%">&nbsp;</td>
 471
 472 <td COLSPAN="5"><font size=+0><a href="#service-configure">4.1 Configuring
 473 a Service</a></font></td>
 474 </tr>
 475
 476 <tr>
 477 <td WIDTH="2%">&nbsp;</td>
 478
 479 <td WIDTH="2%">&nbsp;</td>
 480
 481 <td COLSPAN="4"><font size=+0><a href="#service-gather">4.1.1 Gathering
 482 Service Information</a></font></td>
 483 </tr>
 484
 485 <tr>
 486 <td WIDTH="2%">&nbsp;</td>
 487
 488 <td WIDTH="2%">&nbsp;</td>
 489
 490 <td COLSPAN="4"><font size=+0><a href="#service-scripts">4.1.2 Creating
 491 Service Scripts</a></font></td>
 492 </tr>
 493
 494 <tr>
 495 <td WIDTH="2%">&nbsp;</td>
 496
 497 <td WIDTH="2%">&nbsp;</td>
 498
 499 <td COLSPAN="4"><font size=+0><a href="#service-storage">4.1.3 Configuring
 500 Service Disk Storage</a></font></td>
 501 </tr>
 502
 503 <tr>
 504 <td WIDTH="2%">&nbsp;</td>
 505
 506 <td WIDTH="2%">&nbsp;</td>
 507
 508 <td COLSPAN="4"><font size=+0><a href="#service-app">4.1.4 Verifying Application
 509 Software and Service Scripts</a></font></td>
 510 </tr>
 511
 512 <tr>
 513 <td WIDTH="2%">&nbsp;</td>
 514
 515 <td WIDTH="2%">&nbsp;</td>
 516
 517 <td COLSPAN="4"><font size=+0><a href="#service-dbase">4.1.5 Setting Up
 518 an Oracle Service</a></font></td>
 519 </tr>
 520
 521 <tr>
 522 <td WIDTH="2%">&nbsp;</td>
 523
 524 <td WIDTH="2%">&nbsp;</td>
 525
 526 <td COLSPAN="4"><font size=+0><a href="#service-mysql">4.1.6 Setting Up
 527 a MySQL Service</a></font></td>
 528 </tr>
 529
 530 <tr>
 531 <td WIDTH="2%">&nbsp;</td>
 532
 533 <td WIDTH="2%">&nbsp;</td>
 534
 535 <td COLSPAN="4"><font size=+0><a href="#service-db2">4.1.7 Setting Up a
 536 DB2 Service</a></font></td>
 537 </tr>
 538
 539 <tr>
 540 <td WIDTH="2%">&nbsp;</td>
 541
 542 <td WIDTH="2%">&nbsp;</td>
 543
 544 <td COLSPAN="4"><font size=+0><a href="#service-apache">4.1.8 Setting Up
 545 an Apache Service</a></font></td>
 546 </tr>
 547
 548 <tr>
 549 <td WIDTH="2%">&nbsp;</td>
 550
 551 <td COLSPAN="5"><a href="#service-status">4.2 Displaying a Service Configuration</a></td>
 552 </tr>
 553
 554 <tr>
 555 <td WIDTH="2%">&nbsp;</td>
 556
 557 <td COLSPAN="5"><a href="#service-disable">4.3 Disabling a Service</a></td>
 558 </tr>
 559
 560 <tr>
 561 <td WIDTH="2%">&nbsp;</td>
 562
 563 <td COLSPAN="5"><a href="#service-enable">4.4 Enabling a Service</a></td>
 564 </tr>
 565
 566 <tr>
 567 <td WIDTH="2%">&nbsp;</td>
 568
 569 <td COLSPAN="5"><a href="#service-modify">4.5 Modifying a Service</a></td>
 570 </tr>
 571
 572 <tr>
 573 <td WIDTH="2%">&nbsp;</td>
 574
 575 <td COLSPAN="5"><a href="#service-relocate">4.6 Relocating a Service</a></td>
 576 </tr>
 577
 578 <tr>
 579 <td WIDTH="2%">&nbsp;</td>
 580
 581 <td COLSPAN="5"><a href="#service-delete">4.7 Deleting a Service&nbsp;</a></td>
 582 </tr>
 583
 584 <tr>
 585 <td WIDTH="2%">&nbsp;</td>
 586
 587 <td COLSPAN="5"><a href="#service-error">4.8 Handling Services in an Error
 588 State</a></td>
 589 </tr>
 590
 591 <tr>
 592 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="36"><font size=+1><a href="#admin">5
 593 Cluster Administration</a></font></td>
 594 </tr>
 595
 596 <tr>
 597 <td WIDTH="2%">&nbsp;</td>
 598
 599 <td COLSPAN="5"><a href="#cluster-status">5.1 Displaying Cluster and Service
 600 Status</a></td>
 601 </tr>
 602
 603 <tr>
 604 <td WIDTH="2%">&nbsp;</td>
 605
 606 <td COLSPAN="5"><a href="#cluster-start">5.2 Starting and Stopping the
 607 Cluster Software</a></td>
 608 </tr>
 609
 610 <tr>
 611 <td WIDTH="2%">&nbsp;</td>
 612
 613 <td COLSPAN="5"><a href="#cluster-config">5.3 Modifying the Cluster Configuration</a></td>
 614 </tr>
 615
 616 <tr>
 617 <td WIDTH="2%">&nbsp;</td>
 618
 619 <td COLSPAN="5"><a href="#cluster-backup">5.4 Backing Up and Restoring
 620 the Cluster Database</a></td>
 621 </tr>
 622
 623 <tr>
 624 <td WIDTH="2%">&nbsp;</td>
 625
 626 <td COLSPAN="5"><a href="#cluster-logging">5.5 Modifying Cluster Event
 627 Logging&nbsp;</a></td>
 628 </tr>
 629
 630 <tr>
 631 <td WIDTH="2%">&nbsp;</td>
 632
 633 <td COLSPAN="5"><a href="#cluster-reinstall">5.6 Updating the Cluster Software</a></td>
 634 </tr>
 635
 636 <tr>
 637 <td WIDTH="2%">&nbsp;</td>
 638
 639 <td COLSPAN="5"><a href="#cluster-reload">5.7 Reloading the Cluster Database</a></td>
 640 </tr>
 641
 642 <tr>
 643 <td WIDTH="2%">&nbsp;</td>
 644
 645 <td COLSPAN="5"><a href="#cluster-name">5.8 Changing the Cluster Name</a></td>
 646 </tr>
 647
 648 <tr>
 649 <td WIDTH="2%">&nbsp;</td>
 650
 651 <td COLSPAN="5"><a href="#cluster-init">5.9 Reinitializing the Cluster</a></td>
 652 </tr>
 653
 654 <tr>
 655 <td WIDTH="2%">&nbsp;</td>
 656
 657 <td COLSPAN="5"><a href="#cluster-remove">5.10 Removing a Cluster Member</a></td>
 658 </tr>
 659
 660 <tr>
 661 <td WIDTH="2%">&nbsp;</td>
 662
 663 <td COLSPAN="5"><a href="#diagnose">5.11 Diagnosing and Correcting Problems
 664 in a Cluster</a></td>
 665 </tr>
 666
 667 <tr>
 668 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="37"><font size=+1><a href="#supplement">A
 669 Supplementary Hardware Information</a></font></td>
 670 </tr>
 671
 672 <tr>
 673 <td WIDTH="2%" HEIGHT="20">&nbsp;</td>
 674
 675 <td COLSPAN="5" HEIGHT="20"><a href="#cyclades">A.1 Setting Up a Cyclades
 676 Terminal Server</a></td>
 677 </tr>
 678
 679 <tr>
 680 <td WIDTH="2%">&nbsp;</td>
 681
 682 <td WIDTH="2%">&nbsp;</td>
 683
 684 <td COLSPAN="4"><a href="#hardware-router">A.1.1 Setting Up the Router
 685 IP Address</a></td>
 686 </tr>
 687
 688 <tr>
 689 <td WIDTH="2%">&nbsp;</td>
 690
 691 <td WIDTH="2%">&nbsp;</td>
 692
 693 <td COLSPAN="4"><a href="#hardware-parameters">A.1.2 Setting Up the Network
 694 and Terminal Port Parameters</a></td>
 695 </tr>
 696
 697 <tr>
 698 <td WIDTH="2%">&nbsp;</td>
 699
 700 <td WIDTH="2%">&nbsp;</td>
 701
 702 <td COLSPAN="4"><a href="#console-linux">A.1.3 Configuring Linux to Send
 703 Console Messages to the Console Port</a></td>
 704 </tr>
 705
 706 <tr>
 707 <td WIDTH="2%">&nbsp;</td>
 708
 709 <td WIDTH="2%">&nbsp;</td>
 710
 711 <td COLSPAN="4"></td>
 712 </tr>
 713
 714 <tr>
 715 <td WIDTH="2%">&nbsp;</td>
 716
 717 <td COLSPAN="5"><a href="#rps-10">A.2 Setting Up an RPS-10 Power Switch</a></td>
 718 </tr>
 719
 720 <tr>
 721 <td WIDTH="2%">&nbsp;</td>
 722
 723 <td COLSPAN="5"><a href="#scsi-reqs">A.3 SCSI Bus Configuration Requirements</a></td>
 724 </tr>
 725
 726 <tr>
 727 <td WIDTH="2%">&nbsp;</td>
 728
 729 <td WIDTH="2%">&nbsp;</td>
 730
 731 <td COLSPAN="4"><a href="#scsi-term">A.3.1 SCSI Bus Termination&nbsp;</a></td>
 732 </tr>
 733
 734 <tr>
 735 <td WIDTH="2%">&nbsp;</td>
 736
 737 <td WIDTH="2%">&nbsp;</td>
 738
 739 <td COLSPAN="4"><a href="#scsi-length">A.3.2 SCSI Bus Length</a></td>
 740 </tr>
 741
 742 <tr>
 743 <td WIDTH="2%">&nbsp;</td>
 744
 745 <td WIDTH="2%">&nbsp;</td>
 746
 747 <td COLSPAN="4"><a href="#scsi-ids">A.3.3 SCSI Identification Numbers&nbsp;</a></td>
 748 </tr>
 749
 750 <tr>
 751 <td WIDTH="2%" HEIGHT="22">&nbsp;</td>
 752
 753 <td COLSPAN="5" HEIGHT="22"><a href="#hba">A.4 Host Bus Adapter Features
 754 and Configuration Requirements&nbsp;</a></td>
 755 </tr>
 756
 757 <tr>
 758 <td WIDTH="2%" HEIGHT="22">&nbsp;</td>
 759
 760 <td COLSPAN="5" HEIGHT="22"><a href="#adaptec">A.5 Adaptec Host Bus Adapter
 761 Requirement</a></td>
 762 </tr>
 763
 764 <tr>
 765 <td WIDTH="2%">&nbsp;</td>
 766
 767 <td COLSPAN="5"><a href="#vscom">A.6 VScom Multiport Serial Card Requirement</a></td>
 768 </tr>
 769
 770 <tr>
 771 <td WIDTH="2%">&nbsp;</td>
 772
 773 <td COLSPAN="5"><a href="#tulip">A.7 Tulip Network Driver Requirement</a></td>
 774 </tr>
 775
 776 <tr>
 777 <td ALIGN=LEFT VALIGN=BOTTOM COLSPAN="6" HEIGHT="37"><font size=+1><a href="#supp-software">B
 778 Supplementary Software Information&nbsp;</a></font></td>
 779 </tr>
 780
 781 <tr>
 782 <td WIDTH="2%">&nbsp;</td>
 783
 784 <td COLSPAN="5"><a href="#cluster-com">B.1 Cluster Communication Mechanisms</a></td>
 785 </tr>
 786
 787 <tr>
 788 <td WIDTH="2%">&nbsp;</td>
 789
 790 <td COLSPAN="5"><a href="#cluster-daemons">B.2 Cluster Daemons</a></td>
 791 </tr>
 792
 793 <tr>
 794 <td WIDTH="2%">&nbsp;</td>
 795
 796 <td COLSPAN="5"><a href="#admin-scenarios">B.3 Failover and Recovery Scenarios</a></td>
 797 </tr>
 798
 799 <tr>
 800 <td WIDTH="2%">&nbsp;</td>
 801
 802 <td WIDTH="2%">&nbsp;</td>
 803
 804 <td COLSPAN="4"><a href="#admin-failure">B.3.1 System Hang</a></td>
 805 </tr>
 806
 807 <tr>
 808 <td WIDTH="2%">&nbsp;</td>
 809
 810 <td WIDTH="2%">&nbsp;</td>
 811
 812 <td COLSPAN="4"><a href="#admin-panic">B.3.2 System Panic</a></td>
 813 </tr>
 814
 815 <tr>
 816 <td WIDTH="2%">&nbsp;</td>
 817
 818 <td WIDTH="2%">&nbsp;</td>
 819
 820 <td COLSPAN="4"><a href="#admin-storage">B.3.3 Inaccessible Quorum Partitions</a></td>
 821 </tr>
 822
 823 <tr>
 824 <td WIDTH="2%">&nbsp;</td>
 825
 826 <td WIDTH="2%">&nbsp;</td>
 827
 828 <td COLSPAN="4"><a href="#admin-network">B.3.4 Total Network Connection
 829 Failure</a></td>
 830 </tr>
 831
 832 <tr>
 833 <td WIDTH="2%">&nbsp;</td>
 834
 835 <td WIDTH="2%">&nbsp;</td>
 836
 837 <td COLSPAN="4"><a href="#admin-power">B.3.5 Remote Power Switch Connection
 838 Failure</a></td>
 839 </tr>
 840
 841 <tr>
 842 <td WIDTH="2%">&nbsp;</td>
 843
 844 <td WIDTH="2%">&nbsp;</td>
 845
 846 <td COLSPAN="4"><a href="#admin-quorum">B.3.6 Quorum Daemon Failure</a></td>
 847 </tr>
 848
 849 <tr>
 850 <td WIDTH="2%">&nbsp;</td>
 851
 852 <td WIDTH="2%">&nbsp;</td>
 853
 854 <td COLSPAN="4"><a href="#admin-heartbeat">B.3.7 Heartbeat Daemon Failure</a></td>
 855 </tr>
 856
 857 <tr>
 858 <td WIDTH="2%">&nbsp;</td>
 859
 860 <td WIDTH="2%">&nbsp;</td>
 861
 862 <td COLSPAN="4"><a href="#admin-powerd">B.3.8 Power Daemon Failure</a></td>
 863 </tr>
 864
 865 <tr>
 866 <td WIDTH="2%">&nbsp;</td>
 867
 868 <td WIDTH="2%">&nbsp;</td>
 869
 870 <td COLSPAN="4"><a href="#admin-serviceman">B.3.9 Service Manager Daemon
 871 Failure</a></td>
 872 </tr>
 873
 874 <tr>
 875 <td WIDTH="2%">&nbsp;</td>
 876
 877 <td COLSPAN="5" WIDTH="2%"><a href="#software-manual">B.4 Cluster Database
 878 Fields</a></td>
 879 </tr>
 880
 881 <tr>
 882 <td WIDTH="2%">&nbsp;</td>
 883
 884 <td COLSPAN="5"><a href="#app-tuning">B.5 Tuning Oracle Services</a></td>
 885 </tr>
 886
 887 <tr>
 888 <td WIDTH="2%">&nbsp;</td>
 889
 890 <td COLSPAN="5"><a href="#raw-program">B.6 Raw I/O Programming Example</a></td>
 891 </tr>
 892
 893 <tr>
 894 <td WIDTH="2%" HEIGHT="26">&nbsp;</td>
 895
 896 <td COLSPAN="5"><a href="#lvs">B.7 Using a Cluster in an LVS Environment</a></td>
 897 </tr>
 898 </table>
 899
 900 <p>
 901 <hr noshade width="80%" align="center">
 902 <p>Copyright &copy; 2000 Mission Critical Linux, Inc.
 903 <br>Copyright &copy; 2002 Red Hat, Inc.
 904 <br>&nbsp;
 905 <p>Permission is granted to copy, distribute and/or modify this document
 906 under the terms of the GNU Free Documentation License, Version 1.1 or any
 907 later version published by the Free Software Foundation. A copy of the
 908 license is included on the <a href="http://www.gnu.org/copyleft/fdl.html#SEC1" target="_blank">GNU
 909 Free Documentation License Web site</a>.
 910 <p>Linux is a trademark of Linus Torvalds
 911 <p>All product names mentioned herein are the trademarks of their respective
 912 owners.
 913 <br>&nbsp;
 914 <h1>
 915 Acknowledgments</h1>
 916 The Red Hat Cluster Manager software was originally based on the open source
 917 Kimberlite&nbsp; <a href="http://oss.missioncriticallinux.com/kimberlite">oss.missioncriticallinux.com/kimberlite</a>&nbsp;
 918 cluster project which was developed by Mission Critical Linux, Inc.
 919 <p>Subsequent to its inception based on Kimberlite, developers at Red Hat
 920 have made a large number of enhancements and modifications.&nbsp; The following
 921 is a non-comprehensive list highlighting some of these enhancements.
 922 <ul>
 923 <li>
 924 Packaging and integration into the Red Hat installation paradigm in order
 925 to simplify the end user's experience.</li>
 926
 927 <li>
 928 Addition of support for high availability NFS services.</li>
 929
 930 <li>
 931 Addition of support for high availability Samba services.</li>
 932
 933 <li>
 934 Addition of service monitoring, which will automatically restart a failed
 935 application.</li>
 936
 937 <li>
 938 Rewrite of the service manager to facilitate additional cluster-wide operations.</li>
 939
 940 <li>
 941 A set of miscellaneous bug fixes.</li>
 942 </ul>
 943 The Red Hat Cluster Manager software incorporates STONITH compliant power
 944 switch modules from the Linux-HA project&nbsp; <a href="http://www.linux-ha.org/stonith">www.linux-ha.org/stonith</a>
 945 .
 946 <br><a NAME="introduction"></a>
 947 <h1>
 948 1 Introduction</h1>
 949 The Red Hat Cluster Manager technology provides data integrity and the
 950 ability to maintain application availability in the event of a failure.
 951 Using redundant hardware, shared disk storage, power management, and robust
 952 cluster communication and application failover mechanisms, a cluster can
 953 meet the needs of the enterprise market.
 954 <p>Especially suitable for database applications, network file servers,
 955 and World Wide Web (Web) servers with dynamic content, a cluster can also
 956 be used in conjunction with other Linux availability efforts, such as Linux
 957 Virtual Server (LVS), to deploy a highly available e-commerce site that
 958 has complete data integrity and application availability, in addition to
 959 load balancing capabilities. See <a href="#lvs">Using a Cluster in an LVS
 960 Environment</a> for more information.
 961 <p><i>Editorial note: need to better integrate with the Piranha load balancing
 962 documentation (rather then referring to LVS).</i>
 963 <p>The following sections describe:
 964 <ul>
 965 <li>
 966 <a href="#overview">Cluster overview</a></li>
 967
 968 <li>
 969 <a href="#features">Cluster features</a></li>
 970
 971 <li>
 972 <a href="#steps">How to use this manual</a></li>
 973 </ul>
 974
 975 <br>&nbsp;
 976 <h2>
 977 <a NAME="overview"></a></h2>
 978
 979 <h2>
 980 1.1 Cluster Overview</h2>
 981 To set up a cluster, you connect the <b>cluster systems</b> (often referred
 982 to as <b>member systems</b>) to the cluster hardware, and configure the
 983 systems into the cluster environment. The foundation of a cluster is an
 984 advanced host membership algorithm. This algorithm ensures that the cluster
 985 maintains complete data integrity at all times by using the following methods
 986 of inter-node communication:
 987 <ul>
 988 <li>
 989 Quorum disk partitions on shared disk storage to hold system status</li>
 990
 991 <br>&nbsp;
 992 <li>
 993 Ethernet and serial connections between the cluster systems for heartbeat
 994 channels</li>
 995 </ul>
 996 To make an application and data highly available in a cluster, you configure
 997 a <b>cluster</b> <b>service</b>, which is a discrete group of service properties
 998 and resources, such as an application and shared disk storage. A service
 999 can be assigned an IP address to provide transparent client access to the
1000 service. For example, you can set up a cluster service that provides clients
1001 with access to highly-available database application data.
1002 <p>Both cluster systems can run any service and access the service data
1003 on shared disk storage. However, each service can run on only one cluster
1004 system at a time, in order to maintain data integrity. You can set up an
1005 <b>active-active
1006 configuration</b> in which both cluster systems run different services,
1007 or a <b>hot-standby configuration</b> in which a primary cluster system
1008 runs all the services, and a backup cluster system takes over only if the
1009 primary system fails.
1010 <p>The following figure shows a cluster in an active-active configuration.
1011 <p><img SRC="cluster.gif" >
1012 <p>If a hardware or software failure occurs, the cluster will automatically
1013 restart the failed system's services on the functional cluster system.
1014 This <b>service failover </b>capability ensures that no data is lost, and
1015 there is little disruption to users. When the failed system recovers, the
1016 cluster can re-balance the services across the two systems.
1017 <p>In addition, a cluster administrator can cleanly stop the services running
1018 on a cluster system, and then restart them on the other system. This <b>service
1019 relocation</b> capability enables you to maintain application and data
1020 availability when a cluster system requires maintenance.
1021 <br>&nbsp;
1022 <h2>
1023 <a NAME="features"></a></h2>
1024
1025 <h2>
1026 1.2 Cluster Features</h2>
1027 A cluster includes the following features:
1028 <ul>
1029 <li>
1030 <b>No-single-point-of-failure hardware configuration</b></li>
1031
1032 <br>&nbsp;
1033 <p>&nbsp;
1034 <br>&nbsp;
1035 <br>&nbsp;
1036 <p>You can set up a cluster that includes a dual-controller RAID array,
1037 multiple network and serial communication channels, and redundant uninterruptible
1038 power supply (UPS) systems to ensure that no single failure results in
1039 application down time or loss of data.
1040 <p>Alternately, you can set up a low-cost cluster that provides less availability
1041 than a no-single-point-of-failure cluster. For example, you can set up
1042 a cluster with JBOD ("just a bunch of disks") storage and only a single
1043 heartbeat channel.
1044 <p>Note that you cannot use host-based, adapter-based, or software RAID
1045 in a cluster, because these products usually do not properly coordinate
1046 multisystem access to shared storage.
1047 <br>&nbsp;
1048 <li>
1049 <b>Service configuration framework</b></li>
1050
1051 <br>&nbsp;
1052 <p>&nbsp;
1053 <br>&nbsp;
1054 <br>&nbsp;
1055 <p>A cluster enables you to easily configure individual servicesto make
1056 data and applications highly available. To create a service, you specify
1057 the resources used in the service and properties for the service, including
1058 the service name, application start and stop script, disk partitions, mount
1059 points, and the cluster system on which you prefer to run the service.
1060 After you add a service, the cluster enters the information into the cluster
1061 database on shared storage, where it can be accessed by both cluster systems.
1062 <p>The cluster provides an easy-to-use framework for database applications.
1063 For example, a <b>database service</b> serves highly-available data to
1064 a database application. The application running on a cluster system provides
1065 network access to database client systems, such as Web servers. If the
1066 service fails over to another cluster system, the application can still
1067 access the shared database data. A network-accessible database service
1068 is usually assigned an IP address, which is failed over along with the
1069 service to maintain transparent access for clients.
1070 <p>The cluster service framework can be easily extended to other applications,
1071 such as mail and print applications.
1072 <li>
1073 <b>Data integrity assurance</b></li>
1074
1075 <br>&nbsp;
1076 <p>&nbsp;
1077 <br>&nbsp;
1078 <br>&nbsp;
1079 <p>To ensure data integrity, only one cluster system can run a service
1080 and access service data at one time. Using power switches in the cluster
1081 configuration enable each cluster system to power-cycle the other cluster
1082 system before restarting its services during the failover process. This
1083 prevents the two systems from simultaneously accessing the same data and
1084 corrupting it. Although not required, it is recommended that you use power
1085 switches to guarantee data integrity under all failure conditions.
1086 <br>&nbsp;
1087 <li>
1088 <b>Cluster administration user interface</b></li>
1089
1090 <br>&nbsp;
1091 <p>&nbsp;
1092 <br>&nbsp;
1093 <br>&nbsp;
1094 <p>A user interface simplifies cluster administration and enables you to
1095 easily create, start, and stop services, and monitor the cluster.
1096 <br>&nbsp;
1097 <li>
1098 <b>Multiple cluster communication methods</b></li>
1099
1100 <br>&nbsp;
1101 <p>&nbsp;
1102 <br>&nbsp;
1103 <br>&nbsp;
1104 <p>To monitor the health of the other cluster system, each cluster system
1105 monitors the health of the remote power switch, if any, and issues heartbeat
1106 pings over network and serial channels to monitor the health of the other
1107 cluster system. In addition, each cluster system periodically writes a
1108 timestamp and cluster state information to two <b>quorum partitions</b>
1109 located on shared disk storage. System state information includes whether
1110 the system is an active cluster member. Service state information includes
1111 whether the service is running and which cluster system is running the
1112 service. Each cluster system checks to ensure that the other system's status
1113 is up to date.
1114 <p>To ensure correct cluster operation, if a system is unable to write
1115 to both quorum partitions at startup time, it will not be allowed to join
1116 the cluster. In addition, if a cluster system is not updating its timestamp,
1117 and if heartbeats to the system fail, the cluster system will be removed
1118 from the cluster.
1119 <p>The following figure shows how systems communicate in a cluster configuration.
1120 <i>Note:
1121 the terminal server used to access system consoles via serial ports is
1122 not a required cluster component.</i>
1123 <h4>
1124 Cluster Communication Mechanisms</h4>
1125 <img SRC="comm.gif" >
1126 <br>&nbsp;
1127 <br>&nbsp;
1128 <li>
1129 <b>Service failover capability</b></li>
1130
1131 <br>&nbsp;
1132 <p>&nbsp;
1133 <br>&nbsp;
1134 <br>&nbsp;
1135 <p>If a hardware or software failure occurs, the cluster will take the
1136 appropriate action to maintain application availability and data integrity.
1137 For example, if a cluster system completely fails, the other cluster system
1138 will restart its services. Services already running on this system are
1139 not disrupted.
1140 <p>When the failed system reboots and is able to write to the quorum partitions,
1141 it can rejoin the cluster and run services. Depending on how you configured
1142 the services, the cluster can re-balance the services across the two cluster
1143 systems.
1144 <br>&nbsp;
1145 <li>
1146 <b>Manual service relocation capability</b></li>
1147
1148 <br>&nbsp;
1149 <p>&nbsp;
1150 <br>&nbsp;
1151 <br>&nbsp;
1152 <p>In addition to automatic service failover, a cluster enables administrators
1153 to cleanly stop services on one cluster system and restart them on the
1154 other system. This enables administrators to perform planned maintenance
1155 on a cluster system, while providing application and data availability.
1156 <br>&nbsp;
1157 <li>
1158 <b>Event logging facility</b></li>
1159
1160 <br>&nbsp;
1161 <p>&nbsp;
1162 <br>&nbsp;
1163 <br>&nbsp;
1164 <p>To ensure that problems are detected and resolved before they affect
1165 service availability, the cluster daemons log messages by using the conventional
1166 Linux syslog subsystem. You can customize the severity level of the messages
1167 that are logged.</ul>
1168 <a NAME="steps"></a>
1169 <h2>
1170 1.3 How To Use This Manual</h2>
1171 <i>Editorial comment: perhaps a section with a title like this should appear
1172 earlier in the manual?</i>
1173 <p>This manual contains information about setting up the cluster hardware,
1174 and installing the Linux distribution and the cluster software. These tasks
1175 are described in <a href="#hardware">Hardware Installation and Operating
1176 System Configuration</a> and <a href="#software">Cluster Software Installation
1177 and Initialization</a>.
1178 <p>For information about setting up and managing cluster services, see
1179 <a href="#service">Service
1180 Configuration and Administration</a>. For information about managing a
1181 cluster, see <a href="#admin">Cluster Administration</a>.
1182 <p><a href="#supplement">Supplementary Hardware Information</a> contains
1183 detailed configuration information for specific hardware devices, in addition
1184 to information about shared storage configurations. You should always check
1185 for information that is applicable to your hardware.
1186 <p><a href="#supp-software">Supplementary Software Information</a> contains
1187 background information on the cluster software and other related information.
1188 <p>
1189 <hr noshade width="80%">
1190 <h3 CLASS="ChapterTitleTOC">
1191 <a NAME="hardware"></a></h3>
1192
1193 <h1 CLASS="ChapterTitleTOC">
1194 2 Hardware Installation and Operating System Configuration</h1>
1195 To set up the hardware configuration and install the Linux distribution,
1196 follow these steps:
1197 <ol>
1198 <li>
1199 <a href="#gather">Choose a cluster hardware configuration that meets the
1200 needs of your applications and users.</a></li>
1201
1202 <br>&nbsp;
1203 <li>
1204 <a href="#basic-install">Set up and connect the cluster systems and the
1205 optional console switch and network switch or hub.</a></li>
1206
1207 <br>&nbsp;
1208 <li>
1209 <a href="#install-linux">Install and configure the Linux distribution on
1210 the cluster systems.</a></li>
1211
1212 <br>&nbsp;
1213 <li>
1214 <a href="#install-cluster">Set up the remaining cluster hardware components
1215 and connect them to the cluster systems.</a></li>
1216 </ol>
1217
1218 <div CLASS="ChapterTitleTOC">After setting up the hardware configuration
1219 and installing the Linux distribution, you can install the cluster software.</div>
1220
1221 <br>&nbsp;
1222 <h2 CLASS="ChapterTitleTOC">
1223 <a NAME="gather"></a></h2>
1224
1225 <h2 CLASS="ChapterTitleTOC">
1226 2.1 Choosing a Hardware Configuration</h2>
1227 The Red Hat Cluster Manager allows you to use commodity hardware to set
1228 up a cluster configuration that will meet the performance, availability,
1229 and data integrity needs of your applications and users. Cluster hardware
1230 ranges from low-cost minimum configurations that include only the components
1231 required for cluster operation, to high-end configurations that include
1232 redundant heartbeat channels, hardware RAID, and power switches.
1233 <p>Regardless of your configuration, you should always use high-quality
1234 hardware in a cluster, because hardware malfunction is the primary cause
1235 of system down time.
1236 <p>Although all cluster configurations provide availability, some configurations
1237 protect against every single point of failure. In addition, all cluster
1238 configurations provide data integrity, but some configurations protect
1239 data under every failure condition. Therefore, you must fully understand
1240 the needs of your computing environment and also the availability and data
1241 integrity features of different hardware configurations, in order to choose
1242 the cluster hardware that will meet your requirements.
1243 <p>When choosing a cluster hardware configuration, consider the following:
1244 <ul>
1245 <li>
1246 Performance requirements of your applications and users</li>
1247
1248 <br>&nbsp;
1249 <p>&nbsp;
1250 <br>&nbsp;
1251 <br>&nbsp;
1252 <p>Choose a hardware configuration that will provide adequate memory, CPU,
1253 and I/O resources. You should also be sure that the configuration can handle
1254 any future increases in workload.
1255 <br>&nbsp;
1256 <li>
1257 Cost restrictions</li>
1258
1259 <br>&nbsp;
1260 <p>&nbsp;
1261 <br>&nbsp;
1262 <br>&nbsp;
1263 <p>The hardware configuration you choose must meet your budget requirements.
1264 For example, systems with multiple I/O ports usually cost more than low-end
1265 systems with less expansion capabilities.
1266 <br>&nbsp;
1267 <li>
1268 Availability requirements</li>
1269
1270 <br>If you have a computing environment that requires the highest availability,
1271 such as a production environment, you can set up a cluster hardware configuration
1272 that protects against all single points of failure, including disk, storage
1273 interconnect, heartbeat channel, and power failures. Environments that
1274 can tolerate an interruption in availability, such as development environments,
1275 may not require as much protection. See <a href="#hardware-heart">Configuring
1276 Heartbeat Channels</a>, <a href="#hardware-ups">Configuring UPS Systems</a>,
1277 and <a href="#hardware-storage">Configuring Shared Disk Storage</a> for
1278 more information about using redundant hardware for high availability.
1279 <br>&nbsp;
1280 <li>
1281 Data integrity under all failure conditions requirement</li>
1282
1283 <br>&nbsp;
1284 <p>&nbsp;
1285 <br>&nbsp;
1286 <br>&nbsp;
1287 <p>Using power switches in a cluster configuration guarantees that service
1288 data is protected under every failure condition. These devices enable a
1289 cluster system to power cycle the other cluster system before restarting
1290 its services during failover. Power switches protect against data corruption
1291 if an unresponsive ("hung") system becomes responsive ("unhung") after
1292 its services have failed over, and then issues I/O to a disk that is also
1293 receiving I/O from the other cluster system.
1294 <p>In addition, if a quorum daemon fails on a cluster system, the system
1295 is no longer able to monitor the quorum partitions. If you are not using
1296 power switches in the cluster, this error condition may result in services
1297 being run on more than one cluster system, which can cause data corruption.
1298 See <a href="#hardware-power">Configuring Power Switches</a> for more information
1299 about the benefits of using power switches in a cluster. It is recommended
1300 that production environments use power switches in the cluster configuration.</ul>
1301 A <b>minimum hardware configuration</b> includes only the hardware components
1302 that are required for cluster operation, as follows:
1303 <ul>
1304 <li>
1305 <b>Two servers</b> to run cluster services</li>
1306
1307 <li>
1308 <b>Ethernet connection</b> for a heartbeat channel and client network access</li>
1309
1310 <li>
1311 <b>Shared disk storage</b> for the cluster quorum partitions and service
1312 data</li>
1313 </ul>
1314 See <a href="#install-min">Example of a Minimum Cluster Configuration</a>
1315 for an example of this type of hardware configuration.
1316 <p>The minimum hardware configuration is the most cost-effective cluster
1317 configuration; however, it includes multiple points of failure. For example,
1318 if a shared disk fails, any cluster service that uses the disk will be
1319 unavailable. In addition, the minimum configuration does not include power
1320 switches, which protect against data corruption under all failure conditions.
1321 Therefore, only development environments should use a minimum cluster configuration.
1322 <p>To improve availability and protect against component failure, and to
1323 guarantee data integrity under all failure conditions, you can expand the
1324 minimum configuration. The following table shows how you can improve availability
1325 and guarantee data integrity:
1326 <br>&nbsp;
1327 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="94%" >
1328 <tr ALIGN=LEFT VALIGN=TOP>
1329 <td WIDTH="30%"><b>To protect against:</b></td>
1330
1331 <td WIDTH="70%"><b>You can use:</b></td>
1332 </tr>
1333
1334 <tr ALIGN=LEFT VALIGN=TOP>
1335 <td WIDTH="30%">Disk failure</td>
1336
1337 <td WIDTH="70%">Hardware RAID to replicate data across multiple disks.</td>
1338 </tr>
1339
1340 <tr ALIGN=LEFT VALIGN=TOP>
1341 <td WIDTH="30%">Storage interconnect failure</td>
1342
1343 <td WIDTH="70%">RAID array with multiple SCSI buses or Fibre Channel interconnects.</td>
1344 </tr>
1345
1346 <tr ALIGN=LEFT VALIGN=TOP>
1347 <td WIDTH="30%">RAID controller failure</td>
1348
1349 <td WIDTH="70%">Dual RAID controllers to provide redundant access to disk
1350 data.&nbsp;</td>
1351 </tr>
1352
1353 <tr ALIGN=LEFT VALIGN=TOP>
1354 <td WIDTH="30%">Heartbeat channel failure</td>
1355
1356 <td WIDTH="70%">Point-to-point Ethernet or serial connection between the
1357 cluster systems.</td>
1358 </tr>
1359
1360 <tr ALIGN=LEFT VALIGN=TOP>
1361 <td WIDTH="30%">Power source failure</td>
1362
1363 <td WIDTH="70%">Redundant uninterruptible power supply (UPS) systems.</td>
1364 </tr>
1365
1366 <tr ALIGN=LEFT VALIGN=TOP>
1367 <td WIDTH="30%">Data corruption under all failure conditions</td>
1368
1369 <td WIDTH="70%">Power switches</td>
1370 </tr>
1371 </table>
1372
1373 <p>A <b>no-single-point-of-failure hardware configuration</b> that guarantees
1374 data integrity under all failure conditions can include the following components:
1375 <ul>
1376 <li>
1377 <b>Two servers</b> to run cluster services</li>
1378
1379 <li>
1380 <b>Ethernet connection</b> between each system for a heartbeat channel
1381 and client network access</li>
1382
1383 <li>
1384 <b>Dual-controller RAID array</b> to replicate quorum partitions and service
1385 data</li>
1386
1387 <li>
1388 <b>Two power switches</b> to enable each cluster system to power-cycle
1389 the other system during the failover process</li>
1390
1391 <li>
1392 <b>Point-to-point Ethernet connection</b> between the cluster systems for
1393 a redundant Ethernet heartbeat channel</li>
1394
1395 <li>
1396 <b>Point-to-point serial connection</b> between the cluster systems for
1397 a serial heartbeat channel</li>
1398
1399 <li>
1400 <b>Two UPS systems</b> for a highly-available source of power</li>
1401 </ul>
1402 See <a href="#install-max">Example of a No-Single-Point-Of-Failure Configuration</a>
1403 for an example of this type of hardware configuration.
1404 <p>Cluster hardware configurations can also include other optional hardware
1405 components that are common in a computing environment. For example, you
1406 can include a <b>network switch</b> or <b>network hub</b>, which enables
1407 you to connect the cluster systems to a network, and a <b>console switch</b>,
1408 which facilitates the management of multiple systems and eliminates the
1409 need for separate monitors, mouses, and keyboards for each cluster system.
1410 <p>One type of console switch is a <b>terminal server</b>, which enables
1411 you to connect to serial consoles and manage many systems from one remote
1412 location. As a low-cost alternative, you can use a <b>KVM</b> (keyboard,
1413 video, and mouse) switch, which enables multiple systems to share one keyboard,
1414 monitor, and mouse. A KVM is suitable for configurations in which you access
1415 a graphical user interface (GUI) to perform system management tasks.
1416 <p>When choosing a cluster system, be sure that it provides the PCI slots,
1417 network slots, and serial ports that the hardware configuration requires.
1418 For example, a no-single-point-of-failure configuration requires multiple
1419 serial and Ethernet ports. Ideally, choose cluster systems that have at
1420 least two serial ports. See <a href="#hardware-system">Installing the Basic
1421 System Hardware</a> for more information.
1422 <br>&nbsp;
1423 <br>&nbsp;
1424 <p><a NAME="hardware-table"></a>
1425 <h3>
1426 2.1.1 Cluster Hardware Table</h3>
1427 Use the following table to identify the hardware components required for
1428 your cluster configuration. In some cases, the table lists specific products
1429 that have been tested in a cluster, although a cluster is expected to work
1430 with other products.
1431 <br>&nbsp;
1432 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1433 <tr>
1434 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="47">
1435 <center><b><font size=+1>Cluster System Hardware</font></b></center>
1436 </td>
1437 </tr>
1438
1439 <tr>
1440 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1441
1442 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1443
1444 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1445
1446 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1447 </tr>
1448
1449 <tr ALIGN=LEFT>
1450 <td VALIGN=TOP WIDTH="16%">Cluster system</td>
1451
1452 <td VALIGN=TOP WIDTH="11%">Two</td>
1453
1454 <td VALIGN=TOP WIDTH="61%">Red Hat Cluster Manager supports IA-32 hardware
1455 platforms. Each cluster system must provide enough PCI slots, network slots,
1456 and serial ports for the cluster hardware configuration. Because disk devices
1457 must have the same name on each cluster system, it is recommended that
1458 the systems have symmetric I/O subsystems. In addition, it is recommended
1459 that each system have a minimum of 450 Mhz CPU speed and 256 MB of memory.
1460 See <a href="#hardware-system">Installing the Basic System Hardware</a>
1461 for more information.&nbsp;</td>
1462
1463 <td VALIGN=TOP WIDTH="12%">Yes</td>
1464 </tr>
1465 </table>
1466
1467 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1468 <tr ALIGN=LEFT>
1469 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="45"><b><font size=+1>Power
1470 Switch Hardware</font></b></td>
1471 </tr>
1472
1473 <tr>
1474 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1475
1476 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1477
1478 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1479
1480 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1481 </tr>
1482
1483 <tr ALIGN=LEFT>
1484 <td VALIGN=TOP WIDTH="16%" HEIGHT="48">Serial power switches</td>
1485
1486 <td VALIGN=TOP WIDTH="11%" HEIGHT="48">Two</td>
1487
1488 <td VALIGN=TOP WIDTH="61%" HEIGHT="48">Power switches enable each cluster
1489 system to power-cycle the other cluster system.&nbsp; See <a href="#hardware-power">Configuring
1490 Power Switches</a> for information about using power switches in a cluster.&nbsp;
1491 Note: clusters are configured with either serial or network attached power
1492 switches (not both).
1493 <p>The following serial attached power switch has been fully tested:
1494 <ul>
1495 <li>
1496 RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available
1497 from <a href="http://www.wti.com/rps-10.htm" target="_blank">www.wti.com/rps-10.htm</a>.
1498 Refer to&nbsp; <a href="#rps-10">RPS-10 Configuration Information</a>&nbsp;</li>
1499 </ul>
1500 Latent support is provided for the following serial attached power switch.&nbsp;
1501 This switch has not yet been fully tested:
1502 <ul>
1503 <li>
1504 APC Serial On/Off Switch (partAP9211),&nbsp; <a href="http://www.apc.com">www.apc.com</a>&nbsp;</li>
1505 </ul>
1506 </td>
1507
1508 <td VALIGN=TOP WIDTH="12%" HEIGHT="48">Strongly recommended for data integrity
1509 under all failure conditions</td>
1510 </tr>
1511
1512 <tr ALIGN=LEFT>
1513 <td VALIGN=TOP WIDTH="16%" HEIGHT="51">Null modem cable</td>
1514
1515 <td VALIGN=TOP WIDTH="11%" HEIGHT="51">Two</td>
1516
1517 <td VALIGN=TOP WIDTH="61%" HEIGHT="51">Null modem cables connect a serial
1518 port on a cluster system to aserial power switch. This serial connection
1519 enables each cluster system to power-cycle the other system. Some power
1520 switches may require different cables.&nbsp;</td>
1521
1522 <td VALIGN=TOP WIDTH="12%" HEIGHT="51">Only if using power switches</td>
1523 </tr>
1524
1525 <tr ALIGN=LEFT>
1526 <td VALIGN=TOP WIDTH="16%" HEIGHT="26">Mounting bracket</td>
1527
1528 <td VALIGN=TOP WIDTH="11%" HEIGHT="26">One</td>
1529
1530 <td VALIGN=TOP WIDTH="61%" HEIGHT="26">Some power switches support rack
1531 mount configurations and require a separate mounting bracket (e.g. RPS-10).&nbsp;</td>
1532
1533 <td VALIGN=TOP WIDTH="12%" HEIGHT="26">Only for rack mounting power switches</td>
1534 </tr>
1535
1536 <tr>
1537 <td>Network power switch</td>
1538
1539 <td>One</td>
1540
1541 <td>Network attached power switches enable each cluster member to power
1542 cycle all others.&nbsp; Refer to&nbsp; <a href="#power-setup">Configuring
1543 Power Switches</a> for information about using network attached power switches,
1544 as well as caveats associated with each.
1545 <p>The following network attached power switch has been fully tested:
1546 <ul>
1547 <li>
1548 WTI NPS-115, or NPS-230, available from&nbsp; <a href="http://www.wti.com">www.wti.com</a>
1549 . Note: the NPS power switch can properly accommodate systems with dual
1550 redundant power supplies.&nbsp; Refer to&nbsp; <a href="#power-wti-nps">WTI
1551 NPS Configuration Information.</a></li>
1552 </ul>
1553 Latent support is provided for the following network attached power switches.&nbsp;
1554 These switches have not yet been fully tested:
1555 <ul>
1556 <li>
1557 APC Master Switch (AP9211, or AP9212),&nbsp; <a href="http://www.apc.com/products/masterswitch/index.cfm">www.apc.com</a>&nbsp;</li>
1558 </ul>
1559
1560 <ul>
1561 <li>
1562 Baytech RPC-3 and RPC-5,&nbsp; <a href="http://www.baytech.net">www.baytech.net</a>&nbsp;</li>
1563 </ul>
1564 </td>
1565
1566 <td>Strongly recommended for data integrity under all failure conditions</td>
1567 </tr>
1568 </table>
1569
1570 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1571 <tr ALIGN=LEFT>
1572 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="53"><b><font size=+1>Shared
1573 Disk Storage Hardware</font></b></td>
1574 </tr>
1575
1576 <tr>
1577 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1578
1579 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1580
1581 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1582
1583 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1584 </tr>
1585
1586 <tr ALIGN=LEFT>
1587 <td VALIGN=TOP WIDTH="16%" HEIGHT="223">External disk storage enclosure</td>
1588
1589 <td VALIGN=TOP WIDTH="11%" HEIGHT="223">One</td>
1590
1591 <td VALIGN=TOP WIDTH="61%" HEIGHT="223">For production environments, it
1592 is recommended that you use single-initiator SCSI buses or single-initiator
1593 Fibre Channel interconnects to connect the cluster systems to a single
1594 or dual-controller RAID array. To use single-initiator buses or interconnects,
1595 a RAID controller must have multiple host ports and provide simultaneous
1596 access to all the logical units on the host ports. If a logical unit can
1597 fail over from one controller to the other, the process must be transparent
1598 to the operating system.&nbsp;
1599 <p>The following are recommended SCSI RAID arrays that provides simultaneous
1600 access to all the logical units on the host ports (this is not a comprehensive
1601 list; rather its limited to those RAID boxes which have been tested):
1602 <ul>
1603 <li>
1604 Winchester Systems FlashDisk RAID Disk Array, which is available from <a href="http://www.winsys.com" target="_blank">www.winsys.com</a>.&nbsp;</li>
1605
1606 <li>
1607 Dot Hill's SANnet Storage Systems, which is available from&nbsp; <a href="http://www.dothill.com">www.dothill.com</a>&nbsp;</li>
1608
1609 <li>
1610 CMD's CRD-7040 &amp; CRA-7040, CRD -7220, CRD-7240 &amp; CRA-7240, CRD-7400
1611 &amp; CRA-7400 controller based RAID arrays. Available from
1612 <a href="http://www.synetexinc.com">www.synetexinc.com</a>&nbsp;</li>
1613 </ul>
1614 Note: in order to ensure symmetry of device IDs &amp; LUNs, many RAID arrays
1615 with dual redundant controllers are required to be configured in an active/passive
1616 mode.
1617 <p>For development environments, you can use a multi-initiator SCSI bus
1618 or multi-initiator Fibre Channel interconnect to connect the cluster systems
1619 to a JBOD storage enclosure, a single-port RAID array, or a RAID controller
1620 that does not provide access to all the shared logical units from the ports
1621 on the storage enclosure.
1622 <p>You cannot use host-based, adapter-based, or software RAID products
1623 in a cluster, because these products usually do not properly coordinate
1624 multi-system access to shared storage.&nbsp;
1625 <p>See <a href="#hardware-storage">Configuring Shared Disk Storage</a>
1626 for more information.</td>
1627
1628 <td VALIGN=TOP WIDTH="12%" HEIGHT="223">Yes</td>
1629 </tr>
1630
1631 <tr ALIGN=LEFT>
1632 <td VALIGN=TOP WIDTH="16%" HEIGHT="273">Host bus adapter</td>
1633
1634 <td VALIGN=TOP WIDTH="11%" HEIGHT="273">Two</td>
1635
1636 <td VALIGN=TOP WIDTH="61%" HEIGHT="273">To connect to shared disk storage,
1637 you must install either a parallel SCSI or a Fibre Channel host bus adapter
1638 in a PCI slot in each cluster system.
1639 <p>For parallel SCSI, use a low voltage differential (LVD) host bus adapter.
1640 Adapters have either HD68 or VHDCI connectors. If you want hot plugging
1641 support, you must be able to disable the host bus adapter's onboard termination.
1642 Recommended parallel SCSI host bus adapters include the following:
1643 <ul>
1644 <li>
1645 Adaptec 2940U2W, 29160, 29160LP, 39160, and 3950U2&nbsp;</li>
1646
1647 <li>
1648 Adaptec AIC-7896 on the Intel L440GX+ motherboard&nbsp;</li>
1649
1650 <li>
1651 Qlogic QLA1080 and QLA12160</li>
1652
1653 <li>
1654 Tekram Ultra2 DC-390U2W&nbsp;</li>
1655
1656 <li>
1657 LSI Logic SYM22915&nbsp;</li>
1658 </ul>
1659 A recommended Fibre Channel host bus adapter is the Qlogic QLA2200.&nbsp;
1660 <p>For multi-initiator configurations, the Tekram Ultra2 DC-390U2W and
1661 LSI Logic SYM22915 are recommended.&nbsp; Some other adapters have issues
1662 precluding external termination for hot-plugging.
1663 <p>See <a href="#hba">Host Bus Adapter Features and Configuration Requirements</a>
1664 and <a href="#adaptec">Adaptec Host Bus Adapter Requirement</a> for device
1665 features and configuration information.</td>
1666
1667 <td VALIGN=TOP WIDTH="12%" HEIGHT="273">Yes</td>
1668 </tr>
1669
1670 <tr ALIGN=LEFT>
1671 <td VALIGN=TOP WIDTH="16%" HEIGHT="47">SCSI cable</td>
1672
1673 <td VALIGN=TOP WIDTH="11%" HEIGHT="47">Two&nbsp;</td>
1674
1675 <td VALIGN=TOP WIDTH="61%" HEIGHT="47">SCSI cables with 68 pins connect
1676 each host bus adapter to a storage enclosure port. Cables have either HD68
1677 or VHDCI connectors.&nbsp;</td>
1678
1679 <td VALIGN=TOP WIDTH="12%" HEIGHT="47">Only for parallel SCSI configurations</td>
1680 </tr>
1681
1682 <tr ALIGN=LEFT>
1683 <td VALIGN=TOP WIDTH="16%" HEIGHT="156">External SCSI LVD active terminator</td>
1684
1685 <td VALIGN=TOP WIDTH="11%" HEIGHT="156">Two</td>
1686
1687 <td VALIGN=TOP WIDTH="61%" HEIGHT="156">For hot plugging support, connect
1688 an external LVD active terminator to a host bus adapter that has disabled
1689 internal termination. This enables you to disconnect the terminator from
1690 the adapter without affecting bus operation. Terminators have either HD68
1691 or VHDCI connectors.
1692 <p>Recommended external pass-through terminators with HD68 connectors can
1693 be obtained from Technical Cable Concepts, Inc., 350 Lear Avenue, Costa
1694 Mesa, California, 92626 (714-835-1081), or <a href="http://www.techcable.com" target="_blank">www.techcable.com</a>.
1695 The part description and number is TERM SSM/F LVD/SE Ext Beige, 396868-LVD/SE.</td>
1696
1697 <td VALIGN=TOP WIDTH="12%" HEIGHT="156">Only for parallel SCSI configurations
1698 that require external termination for hot plugging</td>
1699 </tr>
1700
1701 <tr ALIGN=LEFT>
1702 <td VALIGN=TOP WIDTH="16%" HEIGHT="32">SCSI terminator</td>
1703
1704 <td VALIGN=TOP WIDTH="11%" HEIGHT="32">Two</td>
1705
1706 <td VALIGN=TOP WIDTH="61%" HEIGHT="32">For a RAID storage enclosure that
1707 uses "out" ports (such as FlashDisk RAID Disk Array) and is connected to
1708 single-initiator SCSI buses, connect terminators to the "out" ports in
1709 order to terminate the buses.&nbsp;</td>
1710
1711 <td VALIGN=TOP WIDTH="12%" HEIGHT="32">Only for parallel SCSI configurations
1712 and only if necessary for termination</td>
1713 </tr>
1714
1715 <tr ALIGN=LEFT>
1716 <td VALIGN=TOP WIDTH="16%" HEIGHT="32">Fibre Channel hub or switch</td>
1717
1718 <td VALIGN=TOP WIDTH="11%" HEIGHT="32">One or two</td>
1719
1720 <td VALIGN=TOP WIDTH="61%" HEIGHT="32">A Fibre Channel hub or switch is
1721 required, unless you have a storage enclosure with two ports, and the host
1722 bus adapters in the cluster systems can be connected directly to different
1723 ports.</td>
1724
1725 <td VALIGN=TOP WIDTH="12%" HEIGHT="32">Only for some Fibre Channel configurations</td>
1726 </tr>
1727
1728 <tr ALIGN=LEFT>
1729 <td VALIGN=TOP WIDTH="16%" HEIGHT="32">Fibre Channel cable</td>
1730
1731 <td VALIGN=TOP WIDTH="11%" HEIGHT="32">Two to six</td>
1732
1733 <td VALIGN=TOP WIDTH="61%" HEIGHT="32">A Fibre Channel cable connects a
1734 host bus adapter to a storage enclosure port, a Fibre Channel hub, or a
1735 Fibre Channel switch. If a hub or switch is used, additional cables are
1736 needed to connect the hub or switch to the storage adapter ports.&nbsp;</td>
1737
1738 <td VALIGN=TOP WIDTH="12%" HEIGHT="32">Only for Fibre Channel configurations</td>
1739 </tr>
1740 </table>
1741
1742 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1743 <tr ALIGN=CENTER VALIGN=CENTER>
1744 <td COLSPAN="4" HEIGHT="50"><b><font size=+1>Network Hardware</font></b></td>
1745 </tr>
1746
1747 <tr>
1748 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1749
1750 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1751
1752 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1753
1754 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1755 </tr>
1756
1757 <tr ALIGN=LEFT>
1758 <td VALIGN=TOP WIDTH="16%">Network interface</td>
1759
1760 <td VALIGN=TOP WIDTH="11%">One for each network connection</td>
1761
1762 <td VALIGN=TOP WIDTH="61%">Each network connection requires a network interface
1763 installed in a cluster system.&nbsp;</td>
1764
1765 <td VALIGN=TOP WIDTH="12%">Yes</td>
1766 </tr>
1767
1768 <tr ALIGN=LEFT>
1769 <td VALIGN=TOP WIDTH="16%">Network switch or hub&nbsp;</td>
1770
1771 <td VALIGN=TOP WIDTH="11%">One</td>
1772
1773 <td VALIGN=TOP WIDTH="61%">A network switch or hub enables you to connect
1774 multiple systems to a network.</td>
1775
1776 <td VALIGN=TOP WIDTH="12%">No</td>
1777 </tr>
1778
1779 <tr ALIGN=LEFT>
1780 <td VALIGN=TOP WIDTH="16%" HEIGHT="49">Network cable</td>
1781
1782 <td VALIGN=TOP WIDTH="11%" HEIGHT="49">One for each network interface&nbsp;</td>
1783
1784 <td VALIGN=TOP WIDTH="61%" HEIGHT="49">A conventional network cable, such
1785 as a cable with an RJ45 connector, connects each network interface to a
1786 network switch or a network hub.</td>
1787
1788 <td VALIGN=TOP WIDTH="12%" HEIGHT="49">Yes</td>
1789 </tr>
1790 </table>
1791
1792 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1793 <tr ALIGN=CENTER VALIGN=CENTER>
1794 <td COLSPAN="4" HEIGHT="51"><b><font size=+1>Point-To-Point Ethernet Heartbeat
1795 Channel Hardware&nbsp;</font></b></td>
1796 </tr>
1797
1798 <tr>
1799 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1800
1801 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1802
1803 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1804
1805 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1806 </tr>
1807
1808 <tr ALIGN=LEFT>
1809 <td VALIGN=TOP WIDTH="16%" HEIGHT="51">Network interface</td>
1810
1811 <td VALIGN=TOP WIDTH="11%" HEIGHT="51">Two for each channel&nbsp;</td>
1812
1813 <td VALIGN=TOP WIDTH="61%" HEIGHT="51">Each Ethernet heartbeat channel
1814 requires a network interface installed in both cluster systems.</td>
1815
1816 <td VALIGN=TOP WIDTH="12%" HEIGHT="51">No</td>
1817 </tr>
1818
1819 <tr ALIGN=LEFT>
1820 <td VALIGN=TOP WIDTH="16%" HEIGHT="71">Network crossover cable</td>
1821
1822 <td VALIGN=TOP WIDTH="11%" HEIGHT="71">One for each channel</td>
1823
1824 <td VALIGN=TOP WIDTH="61%" HEIGHT="71">A network crossover cable connects
1825 a network interface on one cluster system to a network interface on the
1826 other cluster system, creating an Ethernet heartbeat channel.</td>
1827
1828 <td VALIGN=TOP WIDTH="12%" HEIGHT="71">Only for a redundant Ethernet heartbeat
1829 channel</td>
1830 </tr>
1831 </table>
1832
1833 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1834 <tr ALIGN=LEFT>
1835 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="54"><b><font size=+1>Point-To-Point
1836 Serial Heartbeat Channel Hardware&nbsp;</font></b></td>
1837 </tr>
1838
1839 <tr>
1840 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1841
1842 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1843
1844 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1845
1846 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1847 </tr>
1848
1849 <tr ALIGN=LEFT>
1850 <td VALIGN=TOP WIDTH="16%">Serial card</td>
1851
1852 <td VALIGN=TOP WIDTH="11%">Two for each serial channel&nbsp;</td>
1853
1854 <td VALIGN=TOP WIDTH="61%">Each serial heartbeat channel requires a serial
1855 port on both cluster systems. To expand your serial port capacity, you
1856 can use multi-port serial PCI cards. Recommended multi-port cards include
1857 the following:&nbsp;
1858 <ul>
1859 <li>
1860 Vision Systems VScom 200H PCI card, which provides you with two serial
1861 ports and is available from <a href="http://www.vscom.de" target="_blank">www.vscom.de</a>&nbsp;</li>
1862
1863 <br>&nbsp;
1864 <li>
1865 Cyclades-4YoPCI+ card, which provides you with four serial ports and is
1866 available from <a href="http://www.cyclades.com" target="_blank">www.cyclades.com</a>&nbsp;</li>
1867 </ul>
1868 Note: since configuration of serial heartbeat channels is optional, it
1869 is not required that you invest in additional hardware specifically for
1870 this purpose.&nbsp; Should future support be provided for more than 2 cluster
1871 members, serial heartbeat channel support may be deprecated.</td>
1872
1873 <td VALIGN=TOP WIDTH="12%">No</td>
1874 </tr>
1875
1876 <tr ALIGN=LEFT>
1877 <td VALIGN=TOP WIDTH="16%">Null modem cable</td>
1878
1879 <td VALIGN=TOP WIDTH="11%">One for each channel</td>
1880
1881 <td VALIGN=TOP WIDTH="61%">A null modem cable connects a serial port on
1882 one cluster system to a corresponding serial port on the other cluster
1883 system, creating a serial heartbeat channel.</td>
1884
1885 <td VALIGN=TOP WIDTH="12%">Only for serial heartbeat channel</td>
1886 </tr>
1887 </table>
1888
1889 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1890 <tr ALIGN=LEFT>
1891 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="56"><b><font size=+1>Console
1892 Switch Hardware&nbsp;</font></b></td>
1893 </tr>
1894
1895 <tr>
1896 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1897
1898 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1899
1900 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1901
1902 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1903 </tr>
1904
1905 <tr ALIGN=LEFT>
1906 <td VALIGN=TOP WIDTH="16%">Terminal server</td>
1907
1908 <td VALIGN=TOP WIDTH="11%">One</td>
1909
1910 <td VALIGN=TOP WIDTH="61%">A terminal server enables you to manage many
1911 systems from one remote location. Recommended terminal servers include
1912 the following:
1913 <ul>
1914 <li>
1915 Cyclades terminal server, which is available from <a href="http://www.cyclades.com" target="_blank">www.cyclades.com</a></li>
1916
1917 <br>&nbsp;
1918 <li>
1919 NetReach Model CMS-16, which is available from Western Telematic, Inc.
1920 at <a href="http://www.wti.com/cms.htm" target="_blank">www.wti.com/cms.htm</a></li>
1921 </ul>
1922 </td>
1923
1924 <td VALIGN=TOP WIDTH="12%">No</td>
1925 </tr>
1926
1927 <tr ALIGN=LEFT>
1928 <td VALIGN=TOP WIDTH="16%" HEIGHT="43">RJ45 to DB9 crossover cable</td>
1929
1930 <td VALIGN=TOP WIDTH="11%" HEIGHT="43">Two</td>
1931
1932 <td VALIGN=TOP WIDTH="61%" HEIGHT="43">RJ45 to DB9 crossover cables connect
1933 a serial port on each cluster system to a Cyclades terminal server. Other
1934 types of terminal servers may require different cables.&nbsp;</td>
1935
1936 <td VALIGN=TOP WIDTH="12%" HEIGHT="43">Only for terminal server</td>
1937 </tr>
1938
1939 <tr ALIGN=LEFT>
1940 <td VALIGN=TOP WIDTH="16%">Network cable</td>
1941
1942 <td VALIGN=TOP WIDTH="11%">One</td>
1943
1944 <td VALIGN=TOP WIDTH="61%">A network cable connects a terminal server to
1945 a network switch or hub.</td>
1946
1947 <td VALIGN=TOP WIDTH="12%">Only for terminal server</td>
1948 </tr>
1949
1950 <tr ALIGN=LEFT>
1951 <td VALIGN=TOP WIDTH="16%">KVM</td>
1952
1953 <td VALIGN=TOP WIDTH="11%">One</td>
1954
1955 <td VALIGN=TOP WIDTH="61%">A KVM enables multiple systems to share one
1956 keyboard, monitor, and mouse. A recommended KVM is the Cybex Switchview,
1957 which is available from <a href="http://www.cybex.com" target="_blank">www.cybex.com</a>.
1958 Cables for connecting systems to the switch depend on the type of KVM.</td>
1959
1960 <td VALIGN=TOP WIDTH="12%">No</td>
1961 </tr>
1962 </table>
1963
1964 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
1965 <tr ALIGN=LEFT>
1966 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="4" HEIGHT="55"><b><font size=+1>UPS
1967 System Hardware</font></b></td>
1968 </tr>
1969
1970 <tr>
1971 <td WIDTH="16%" HEIGHT="25"><b><font size=+0>Hardware</font></b></td>
1972
1973 <td ALIGN=LEFT VALIGN=TOP WIDTH="11%" HEIGHT="25"><b>Quantity</b></td>
1974
1975 <td ALIGN=LEFT VALIGN=TOP WIDTH="61%" HEIGHT="25"><b>Description</b></td>
1976
1977 <td ALIGN=LEFT VALIGN=TOP WIDTH="12%" HEIGHT="25"><b>Required</b></td>
1978 </tr>
1979
1980 <tr ALIGN=LEFT>
1981 <td VALIGN=TOP WIDTH="16%" HEIGHT="150">UPS system</td>
1982
1983 <td VALIGN=TOP WIDTH="11%" HEIGHT="150">One or two</td>
1984
1985 <td VALIGN=TOP WIDTH="61%" HEIGHT="150">Uninterruptible power supply (UPS)
1986 systems protect against downtime if a power outage occurs. UPS systems
1987 are highly recommended for cluster operation. Ideally, connect the power
1988 cables for the shared storage enclosure and both power switches to redundant
1989 UPS systems. In addition, a UPS system must be able to provide voltage
1990 for an adequate period of time, and should be connected to its own power
1991 circuit.&nbsp;
1992 <p>A recommended UPS system is the APC Smart-UPS 1400 Rackmount, which
1993 is available from <a href="http://www.apcc.com/products/smart-ups_rm/index.cfm">www.apc.com</a>.&nbsp;</td>
1994
1995 <td VALIGN=TOP WIDTH="12%" HEIGHT="150">Strongly recommended for availability</td>
1996 </tr>
1997 </table>
1998
1999 <br>&nbsp;
2000 <p><a NAME="install-min"></a>
2001 <h3>
2002 2.1.2 Example of a Minimum Cluster Configuration</h3>
2003 The hardware components described in the following table can be used to
2004 set up a minimum cluster configuration that uses a multi-initiator SCSI
2005 bus and supports hot plugging. This configuration does not guarantee data
2006 integrity under all failure conditions, because it does not include power
2007 switches. Note that this is a sample configuration; you may be able to
2008 set up a minimum configuration using other hardware.
2009 <br>&nbsp;
2010 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2011 <tr BGCOLOR="#FFFFFF">
2012 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="2" HEIGHT="45"><b><font size=+1>Minimum
2013 Cluster Hardware Configuration Example</font></b></td>
2014 </tr>
2015
2016 <tr ALIGN=LEFT VALIGN=TOP>
2017 <td WIDTH="22%" HEIGHT="124"><b>Two servers</b></td>
2018
2019 <td WIDTH="78%" HEIGHT="124">Each cluster system includes the following
2020 hardware:
2021 <ul>
2022 <li>
2023 Network interface for client access and an Ethernet heartbeat channel&nbsp;</li>
2024
2025 <li>
2026 One Adaptec 2940U2W SCSI adapter (termination disabled) for the shared
2027 storage connection&nbsp;</li>
2028 </ul>
2029 </td>
2030 </tr>
2031 </table>
2032
2033 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2034 <tr ALIGN=LEFT VALIGN=TOP>
2035 <td WIDTH="22%" HEIGHT="46"><b>Two network cables with RJ45 connectors</b></td>
2036
2037 <td WIDTH="78%" HEIGHT="46">Network cables connect a network interface
2038 on each cluster system to the network for client access and Ethernet heartbeats.&nbsp;</td>
2039 </tr>
2040 </table>
2041 <!--
2042 <table width="95%" border="1" cellspacing="0" cellpadding="3">
2043   <tr align="left" valign="top">
2044
2045     <td width="22%" height="28"><b>Two RPS-10 power switches</b></td>
2046
2047     <td width="78%" height="28">
2048       <p>Power switches enable each cluster system to power-cycle the other system before restarting its
2049 services.
2050         The power cable for each cluster system is connected to a power switch.
2051       </p>
2052       </td>
2053     </tr>
2054   </table>
2055
2056 <table width="95%" border="1" cellspacing="0" cellpadding="3">
2057   <tr align="left" valign="top">
2058
2059     <td width="22%" height="46"><b>Two null modem cables</b></td>
2060
2061     <td width="78%" height="46">Null modem cables connect
2062       a serial port on each cluster system to the power switch that provides power
2063       to the other cluster system. This connection enables each cluster system
2064       to power-cycle the other system.</td>
2065     </tr>
2066   </table>
2067
2068  -->
2069 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2070 <tr>
2071 <td ALIGN=LEFT VALIGN=TOP WIDTH="22%" HEIGHT="26"><b>JBOD storage enclosure</b></td>
2072
2073 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">The storage enclosure's
2074 internal termination is disabled.&nbsp;</td>
2075 </tr>
2076 </table>
2077
2078 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2079 <tr ALIGN=LEFT VALIGN=TOP>
2080 <td WIDTH="22%" HEIGHT="50"><b>Two pass-through LVD active terminators</b></td>
2081
2082 <td WIDTH="78%" HEIGHT="50">External pass-through LVD active terminators
2083 connected to each host bus adapter provide external SCSI bus termination
2084 for hot plugging support.</td>
2085 </tr>
2086 </table>
2087
2088 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2089 <tr ALIGN=LEFT VALIGN=TOP>
2090 <td WIDTH="22%" HEIGHT="31"><b>Two HD68 SCSI cables</b></td>
2091
2092 <td WIDTH="78%" HEIGHT="31">HD68 cables connect each terminator to a port
2093 on the storage enclosure, creating a multi-initiator SCSI bus.&nbsp;</td>
2094 </tr>
2095 </table>
2096
2097 <p>The following figure shows a minimum cluster hardware configuration
2098 that includes the hardware described in the previous table and a multi-initiator
2099 SCSI bus, and also supports hot plugging. A "T" enclosed by a circle indicates
2100 internal (onboard) or external SCSI bus termination. A slash through the
2101 "T" indicates that termination has been disabled.
2102 <h4 class="ChapterTitleTOC">
2103 Minimum Cluster Hardware Configuration With Hot Plugging</h4>
2104 <img SRC="lowcost.gif" >
2105 <br>&nbsp;
2106 <br>&nbsp;
2107 <p><a NAME="install-max"></a>
2108 <h3>
2109 2.1.3 Example of a No-Single-Point-Of-Failure Configuration</h3>
2110 The components described in the following table can be used to set up a
2111 no-single-point-of-failure cluster configuration that includes two single-initiator
2112 SCSI buses and power switches to guarantee data integrity under all failure
2113 conditions. Note that this is a sample configuration; you may be able to
2114 set up a no-single-point-of-failure configuration using other hardware.
2115 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2116 <tr BGCOLOR="#FFFFFF">
2117 <td ALIGN=CENTER VALIGN=CENTER COLSPAN="2" HEIGHT="45">
2118 <h3>
2119 <b>No-Single-Point-Of-Failure Configuration Example</b></h3>
2120 </td>
2121 </tr>
2122
2123 <tr ALIGN=LEFT VALIGN=TOP>
2124 <td WIDTH="22%" HEIGHT="234"><b>Two servers</b></td>
2125
2126 <td WIDTH="78%" HEIGHT="234">Each cluster system includes the following
2127 hardware:
2128 <ul>
2129 <li>
2130 Two network interfaces for:</li>
2131
2132 <ul>
2133 <li>
2134 Point-to-point Ethernet heartbeat channel</li>
2135
2136 <li>
2137 Client network access and Ethernet heartbeat connection</li>
2138 </ul>
2139
2140 <li>
2141 Three serial ports for:</li>
2142
2143 <ul>
2144 <li>
2145 Point-to-point serial heartbeat channel</li>
2146
2147 <li>
2148 Remote power switch connection</li>
2149
2150 <li>
2151 Connection to the terminal server&nbsp;</li>
2152 </ul>
2153
2154 <li>
2155 One Tekram Ultra2 DC-390U2W adapter (termination enabled) for the shared
2156 disk storage connection</li>
2157 </ul>
2158 </td>
2159 </tr>
2160 </table>
2161
2162 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2163 <tr ALIGN=LEFT VALIGN=TOP>
2164 <td WIDTH="22%"><b><font size=+0>One network switch&nbsp;</font></b></td>
2165
2166 <td WIDTH="78%"><font size=+0>A network switch enables you to connect multiple
2167 systems to a network.&nbsp;</font></td>
2168 </tr>
2169 </table>
2170
2171 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2172 <tr ALIGN=LEFT VALIGN=TOP>
2173 <td WIDTH="22%"><b>One Cyclades terminal server&nbsp;</b></td>
2174
2175 <td WIDTH="78%">A terminal server enables you to manage remote systems
2176 from a central location. (A terminal server is not required for cluster
2177 operation.)</td>
2178 </tr>
2179 </table>
2180
2181 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2182 <tr ALIGN=LEFT VALIGN=TOP>
2183 <td WIDTH="22%" HEIGHT="24"><b>Three network cables</b></td>
2184
2185 <td WIDTH="78%" HEIGHT="24">Network cables connect the terminal server
2186 and a network interface on each cluster system to the network switch.&nbsp;</td>
2187 </tr>
2188 </table>
2189
2190 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2191 <tr ALIGN=LEFT VALIGN=TOP>
2192 <td WIDTH="22%" HEIGHT="47"><b>Two RJ45 to DB9 crossover cables&nbsp;</b></td>
2193
2194 <td WIDTH="78%" HEIGHT="47">RJ45 to DB9 crossover cables connect a serial
2195 port on each cluster system to the Cyclades terminal server.</td>
2196 </tr>
2197 </table>
2198
2199 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2200 <tr ALIGN=LEFT VALIGN=TOP>
2201 <td WIDTH="22%" HEIGHT="53"><b>One network crossover cable&nbsp;</b></td>
2202
2203 <td WIDTH="78%" HEIGHT="53">A network crossover cable connects a network
2204 interface on one cluster system to a network interface on the other system,
2205 creating a point-to-point Ethernet heartbeat channel.&nbsp;</td>
2206 </tr>
2207 </table>
2208
2209 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2210 <tr ALIGN=LEFT VALIGN=TOP>
2211 <td WIDTH="22%" HEIGHT="31"><b>Two RPS-10 power switches</b></td>
2212
2213 <td WIDTH="78%" HEIGHT="31">Power switches enable each cluster system to
2214 power-cycle the other system before restarting its services. The power
2215 cable for each cluster system is connected to its own power switch.</td>
2216 </tr>
2217 </table>
2218
2219 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2220 <tr ALIGN=LEFT VALIGN=TOP>
2221 <td WIDTH="22%" HEIGHT="49"><b>Three null modem cables</b></td>
2222
2223 <td WIDTH="78%" HEIGHT="49">Null modem cables connect a serial port on
2224 each cluster system to the power switch that provides power to the other
2225 cluster system. This connection enables each cluster system to power-cycle
2226 the other system.
2227 <p>A null modem cable connects a serial port on one cluster system to a
2228 corresponding serial port on the other system, creating a point-to-point
2229 serial heartbeat channel.&nbsp;</td>
2230 </tr>
2231 </table>
2232
2233 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2234 <tr>
2235 <td ALIGN=LEFT VALIGN=TOP WIDTH="22%" HEIGHT="27"><b>FlashDisk RAID Disk
2236 Array with dual controllers&nbsp;</b></td>
2237
2238 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="27">Dual RAID controllers
2239 protect against disk and controller failure. The RAID controllers provide
2240 simultaneous access to all the logical units on the host ports.</td>
2241 </tr>
2242 </table>
2243
2244 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2245 <tr ALIGN=LEFT VALIGN=TOP>
2246 <td WIDTH="22%"><b>Two HD68 SCSI cables</b></td>
2247
2248 <td WIDTH="78%">HD68 cables connect each host bus adapter to a RAID enclosure
2249 "in" port, creating two single-initiator SCSI buses.</td>
2250 </tr>
2251 </table>
2252
2253 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2254 <tr ALIGN=LEFT VALIGN=TOP>
2255 <td WIDTH="22%"><b>Two terminators</b></td>
2256
2257 <td WIDTH="78%">Terminators connected to each "out" port on the RAID enclosure
2258 terminate both single-initiator SCSI buses.</td>
2259 </tr>
2260 </table>
2261
2262 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
2263 <tr ALIGN=LEFT VALIGN=TOP>
2264 <td WIDTH="22%" HEIGHT="47"><b>Redundant UPS Systems</b></td>
2265
2266 <td WIDTH="78%" HEIGHT="47">UPS systems provide a highly-available source
2267 of power. The power cables for the power switches and the RAID enclosure
2268 are connected to two UPS systems.</td>
2269 </tr>
2270 </table>
2271
2272 <p>The following figure shows an example of a no-single-point-of-failure
2273 hardware configuration that includes the hardware described in the previous
2274 table, two single-initiator SCSI buses, and power switches to guarantee
2275 data integrity under all error conditions.
2276 <h4 class="ChapterTitleTOC">
2277 No-Single-Point-Of-Failure Configuration Example</h4>
2278
2279 <h3 class="ChapterTitleTOC">
2280 <img SRC="hardware.gif" ></h3>
2281 &nbsp;
2282 <p>&nbsp;
2283 <br>&nbsp;
2284 <br>&nbsp;
2285 <br>&nbsp;
2286 <br>&nbsp;
2287 <p><a NAME="basic-install"></a>
2288 <h2 CLASS="ChapterTitleTOC">
2289 2.2 Steps for Setting Up the Cluster Systems</h2>
2290 After you identify the cluster hardware components, as described in <a href="#gather">Choosing
2291 a Hardware Configuration</a>, you must set up the basic cluster system
2292 hardware and connect the systems to the optional console switch and network
2293 switch or hub. Follow these steps:
2294 <ol>
2295 <li>
2296 In both cluster systems, install the required network adapters, serial
2297 cards, and host bus adapters. See <a href="#hardware-system">Installing
2298 the Basic System Hardware</a> for more information about performing this
2299 task.</li>
2300
2301 <br>&nbsp;
2302 <li>
2303 Set up the optional console switch and connect it to each cluster system.
2304 See <a href="#hardware-terminal">Setting Up a Console Switch</a> for more
2305 information about performing this task.</li>
2306
2307 <br>&nbsp;
2308 <p>&nbsp;
2309 <br>&nbsp;
2310 <br>&nbsp;
2311 <p>If you are not using a console switch, connect each system to a console
2312 terminal.
2313 <br>&nbsp;
2314 <li>
2315 Set up the optional network switch or hub and use conventional network
2316 cables to connect it to the cluster systems and the terminal server (if
2317 applicable). See <a href="#hardware-network">Setting Up a Network Switch
2318 or Hub</a> for more information about performing this task.</li>
2319
2320 <br>&nbsp;
2321 <p>&nbsp;
2322 <br>&nbsp;
2323 <br>&nbsp;
2324 <p>If you are not using a network switch or hub, use conventional network
2325 cables to connect each system and the terminal server (if applicable) to
2326 a network.</ol>
2327 After performing the previous tasks, you can install the Linux distribution,
2328 as described in <a href="#install-linux">Steps for Installing and Configuring
2329 the Linux Distribution</a>.
2330 <br>&nbsp;
2331 <br>&nbsp;
2332 <h3>
2333 <a NAME="hardware-system"></a></h3>
2334
2335 <h3>
2336 2.2.1 Installing the Basic System Hardware</h3>
2337 Cluster systems must provide the CPU processing power and memory required
2338 by your applications. It is recommended that each system have a minimum
2339 of 450 Mhz CPU speed and 256 MB of memory.
2340 <p>In addition, cluster systems must be able to accommodate the SCSI or
2341 FC adapters, network interfaces, and serial ports that your hardware configuration
2342 requires. Systems have a limited number of preinstalled serial and network
2343 ports and PCI expansion slots. The following table will help you determine
2344 how much capacity your cluster systems require:
2345 <br>&nbsp;
2346 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="100%" >
2347 <tr ALIGN=LEFT VALIGN=TOP>
2348 <td WIDTH="40%"><b>Cluster Hardware Component</b></td>
2349
2350 <td WIDTH="23%"><b>Serial Ports</b></td>
2351
2352 <td WIDTH="20%"><b>Network Slots</b></td>
2353
2354 <td WIDTH="17%"><b>PCI slots</b></td>
2355 </tr>
2356
2357 <tr ALIGN=LEFT VALIGN=TOP>
2358 <td WIDTH="40%">Remote power switch connection (optional, but strongly
2359 recommended)</td>
2360
2361 <td WIDTH="23%">One</td>
2362
2363 <td WIDTH="20%">&nbsp;</td>
2364
2365 <td WIDTH="17%">&nbsp;</td>
2366 </tr>
2367
2368 <tr ALIGN=LEFT VALIGN=TOP>
2369 <td WIDTH="40%" HEIGHT="29">SCSI bus to shared disk storage&nbsp;</td>
2370
2371 <td WIDTH="23%" HEIGHT="29">&nbsp;</td>
2372
2373 <td WIDTH="20%" HEIGHT="29">&nbsp;</td>
2374
2375 <td WIDTH="17%" HEIGHT="29">One for each bus</td>
2376 </tr>
2377
2378 <tr ALIGN=LEFT VALIGN=TOP>
2379 <td WIDTH="40%">Network connection for client access and Ethernet heartbeat</td>
2380
2381 <td WIDTH="23%">&nbsp;</td>
2382
2383 <td WIDTH="20%">One for each network connection</td>
2384
2385 <td WIDTH="17%">&nbsp;</td>
2386 </tr>
2387
2388 <tr ALIGN=LEFT VALIGN=TOP>
2389 <td WIDTH="40%">Point-to-point Ethernet heartbeat channel (optional)</td>
2390
2391 <td WIDTH="23%">&nbsp;</td>
2392
2393 <td WIDTH="20%">One for each channel</td>
2394
2395 <td WIDTH="17%">&nbsp;</td>
2396 </tr>
2397
2398 <tr ALIGN=LEFT VALIGN=TOP>
2399 <td WIDTH="40%">Point-to-point serial heartbeat channel (optional)</td>
2400
2401 <td WIDTH="23%">One for each channel</td>
2402
2403 <td WIDTH="20%">&nbsp;</td>
2404
2405 <td WIDTH="17%">&nbsp;</td>
2406 </tr>
2407
2408 <tr ALIGN=LEFT VALIGN=TOP>
2409 <td WIDTH="40%">Terminal server connection (optional)</td>
2410
2411 <td WIDTH="23%">One</td>
2412
2413 <td WIDTH="20%">&nbsp;</td>
2414
2415 <td WIDTH="17%">&nbsp;</td>
2416 </tr>
2417 </table>
2418
2419 <p>Most systems come with at least one serial port. Ideally, choose systems
2420 that have at least two serial ports. If your system has a graphics display
2421 capability, you can use the serial console port for a serial heartbeat
2422 channel or a power switch connection. To expand your serial port capacity,
2423 you can use multi-port serial PCI cards.
2424 <p>In addition, you must be sure that local system disks will not be on
2425 the same SCSI bus as the shared disks. For example, you can use two-channel
2426 SCSI adapters, such as the Adaptec 3950-series cards, and put the internal
2427 devices on one channel and the shared disks on the other channel. You can
2428 also use multiple SCSI cards.
2429 <p>See the system documentation supplied by the vendor for detailed installation
2430 information. See <a href="#supplement">Supplementary Hardware Information</a>
2431 for hardware-specific information about using host bus adapters in a cluster.
2432 <p>The following figure shows the bulkhead of a sample cluster system and
2433 the external cable connections for a typical cluster configuration.
2434 <h4 class="ChapterTitleTOC">
2435 Typical Cluster System External Cabling</h4>
2436 <img SRC="backview.gif" >
2437 <br>&nbsp;
2438 <br>&nbsp;
2439 <p><a NAME="hardware-terminal"></a>
2440 <h3>
2441 2.2.2 Setting Up a Console Switch</h3>
2442 Although a console switch is not required for cluster operation, you can
2443 use one to facilitate cluster system management and eliminate the need
2444 for separate monitors, mouses, and keyboards for each cluster system. There
2445 are several types of console switches.
2446 <p>For example, a terminal server enables you to connect to serial consoles
2447 and manage many systems from a remote location. For a low-cost alternative,
2448 you can use a KVM (keyboard, video, and mouse) switch, which enables multiple
2449 systems to share one keyboard, monitor, and mouse. A KVM switch is suitable
2450 for configurations in which you access a graphical user interface (GUI)
2451 to perform system management tasks.
2452 <p>Set up the console switch according to the documentation provided by
2453 the vendor, unless this manual provides cluster-specific installation guidelines
2454 that supersede the vendor instructions.
2455 <p>After you set up the console switch, connect it to each cluster system.
2456 The cables you use depend on the type of console switch. For example, if
2457 you have a Cyclades terminal server, use RJ45 to DB9 crossover cables to
2458 connect a serial port on each cluster system to the terminal server.
2459 <br>&nbsp;
2460 <br>&nbsp;
2461 <p><a NAME="hardware-network"></a>
2462 <h3>
2463 2.2.3 Setting Up a Network Switch or Hub</h3>
2464 Although a network switch or hub is not required for cluster operation,
2465 you may want to use one to facilitate cluster and client system network
2466 operations.
2467 <p>Set up a network switch or hub according to the documentation provided
2468 by the vendor.
2469 <p>After you set up the network switch or hub, connect it to each cluster
2470 system by using conventional network cables. If you are using a terminal
2471 server, use a network cable to connect it to the network switch or hub.
2472 <br>&nbsp;
2473 <h2 CLASS="ChapterTitleTOC">
2474 <a NAME="install-linux"></a></h2>
2475
2476 <h2 CLASS="ChapterTitleTOC">
2477 2.3 Steps for Installing and Configuring the Red Hat Linux Distribution</h2>
2478 After you set up the basic system hardware, install the Red Hat Linux distribution
2479 on both cluster systems and ensure that they recognize the connected devices.
2480 Follow these steps:
2481 <ol>
2482 <li>
2483 Install the Red Hat Linux distribution on both cluster systems. If you
2484 tailor the kernel, be sure to following the kernel requirements and guidelines
2485 described in <a href="#linux-dist">Kernel Requirements</a>.</li>
2486
2487 <br>&nbsp;
2488 <li>
2489 Reboot the cluster systems.</li>
2490
2491 <br>&nbsp;
2492 <li>
2493 If you are using a terminal server, configure Linux to send console messages
2494 to the console port.</li>
2495
2496 <br>&nbsp;
2497 <li>
2498 Edit the <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
2499 file on each cluster system and include the IP addresses used in the cluster.
2500 See <a href="#hosts">Editing the /etc/hosts File</a> for more information
2501 about performing this task.</li>
2502
2503 <br>&nbsp;
2504 <li>
2505 Decrease the alternate kernel boot timeout limit to reduce cluster system
2506 boot time. See <a href="#alt-kernel">Decreasing the Kernel Boot Timeout
2507 Limit</a> for more information about performing this task.</li>
2508
2509 <br>&nbsp;
2510 <li>
2511 Ensure that no login (or getty) programs are associated with the serial
2512 ports that are being used for the serial heartbeat channel or the remote
2513 power switch connection, if applicable. To perform this task, edit the
2514 <b><font face="Courier New, Courier, mono">/etc/inittab</font></b>
2515 file and use a number sign (#) to comment out the entries that correspond
2516 to the serial ports used for the serial channel and the remote power switch.
2517 Then, invoke the <b><font face="Courier New, Courier, mono">init q</font></b>
2518 command.</li>
2519
2520 <br>&nbsp;
2521 <li>
2522 Verify that both systems detect all the installed hardware:</li>
2523
2524 <br>&nbsp;
2525 <ul>
2526 <li>
2527 Use the <b><font face="Courier New, Courier, mono">dmesg</font></b> command
2528 to display the console startup messages. See <a href="#dmesg">Displaying
2529 Console Startup Messages</a> for more information about performing this
2530 task.</li>
2531
2532 <br>&nbsp;
2533 <li>
2534 Use the <b><font face="Courier New, Courier, mono">cat /proc/devices</font></b>
2535 command to display the devices configured in the kernel. See <a href="#devices-kernel">Displaying
2536 Devices Configured in the Kernel</a> for more information about performing
2537 this task.</li>
2538
2539 <br>&nbsp;</ul>
2540
2541 <li>
2542 Verify that the cluster systems can communicate over all the network interfaces
2543 by using the <b><font face="Courier New, Courier, mono">ping</font></b>
2544 command to send test packets from one system to the other system.</li>
2545 </ol>
2546
2547 <br>&nbsp;
2548 <p><a NAME="linux-dist"></a>
2549 <h3>
2550 2.3.1&nbsp; Kernel Requirements</h3>
2551 If you chose to manually configure your kernel, you must adhere to the
2552 following
2553 <b>kernel requirements</b>:
2554 <ul>
2555 <li>
2556 You must enable IP Aliasing support in the kernel by setting the <b><font face="Courier New, Courier, mono">CONFIG_IP_ALIAS
2557 </font></b>kernel
2558 option to <b><font face="Courier New, Courier, mono">y</font></b>. When
2559 specifying kernel options, under <b><font face="Courier New, Courier, mono">Networking
2560 Options</font></b>, select <b><font face="Courier New, Courier, mono">IP
2561 aliasing support</font></b>.</li>
2562
2563 <br>&nbsp;
2564 <li>
2565 You must enable support for the <b><font face="Courier New, Courier, mono">/proc</font></b>
2566 file system by setting the <b><font face="Courier New, Courier, mono">CONFIG_PROC_FS</font></b>
2567 kernel option to <b><font face="Courier New, Courier, mono">y</font></b>.
2568 When specifying kernel options, under <b><font face="Courier New, Courier, mono">Filesystems</font></b>,
2569 select <b><font face="Courier New, Courier, mono">/proc filesystem support</font></b>.</li>
2570
2571 <br>&nbsp;
2572 <li>
2573 You must ensure that the SCSI driver is started before the cluster software.
2574 For example, you can edit the startup scripts so that the driver is started
2575 before the <b><font face="Courier New, Courier, mono">cluster</font></b>
2576 script. You can also statically build the SCSI driver into the kernel,
2577 instead of including it as a loadable module, by modifying the <b><font face="Courier New, Courier, mono">/etc/modules.conf</font></b>
2578 file.</li>
2579 </ul>
2580 In addition, when installing the Linux distribution, it is <b>strongly
2581 recommended</b> that you:
2582 <ul>
2583 <li>
2584 Gather the IP addresses for the cluster systems and for the point-to-point
2585 Ethernet heartbeat interfaces, before installing a Linux distribution.
2586 Note that the IP addresses for the point-to-point Ethernet interfaces can
2587 be private IP addresses, such as 10<b><i>.x.x.x</i></b> addresses.</li>
2588
2589 <br>&nbsp;
2590 <li>
2591 Enable the following Linux kernel options to provide detailed information
2592 about the system configuration and events and help you diagnose problems:</li>
2593
2594 <br>&nbsp;
2595 <ul>
2596 <li>
2597 Enable SCSI logging support by setting the <b><font face="Courier New, Courier, mono">CONFIG_SCSI_LOGGING</font></b>
2598 kernel option to <b><font face="Courier New, Courier, mono">y</font></b>.
2599 When specifying kernel options, under <b><font face="Courier New, Courier, mono">SCSI
2600 Support</font></b>, select <b><font face="Courier New, Courier, mono">SCSI
2601 logging facility
2602 </font></b>.</li>
2603
2604 <br>&nbsp;
2605 <li>
2606 Enable support for <b><font face="Courier New, Courier, mono">sysctl</font></b>
2607 by setting the <b><font face="Courier New, Courier, mono">CONFIG_SYSCTL</font></b>
2608 kernel option to <b><font face="Courier New, Courier, mono">y</font></b>.
2609 When specifying kernel options, under
2610 <b><font face="Courier New, Courier, mono">General
2611 Setup</font></b>, select
2612 <b><font face="Courier New, Courier, mono">Sysctl
2613 support</font></b>.</li>
2614 </ul>
2615
2616 <li>
2617 Do not put local file systems, such as <b><font face="Courier New, Courier, mono">/</font></b>,
2618 <b><font face="Courier New, Courier, mono">/etc</font></b>,
2619 <b><font face="Courier New, Courier, mono">/tmp</font></b>,
2620 and <b><font face="Courier New, Courier, mono">/var</font></b> on shared
2621 disks or on the same SCSI bus as shared disks. This helps prevent the other
2622 cluster member from accidentally mounting these file systems, and also
2623 reserves the limited number of SCSI identification numbers on a bus for
2624 cluster disks.</li>
2625
2626 <br>&nbsp;
2627 <li>
2628 Put <b><font face="Courier New, Courier, mono">/tmp</font></b> and <b><font face="Courier New, Courier, mono">/var</font></b>
2629 on different file systems. This may improve system performance.</li>
2630
2631 <br>&nbsp;
2632 <li>
2633 When a cluster system boots, be sure that the system detects the disk devices
2634 in the same order in which they were detected during the Linux installation.
2635 If the devices are not detected in the same order, the system may not boot.</li>
2636 </ul>
2637 <a NAME="hosts"></a>
2638 <h3>
2639 2.3.2 Editing the /etc/hosts File</h3>
2640 The <b><font face="Courier New, Courier, mono">/etc/hosts</font></b> file
2641 contains the IP address-to-hostname translation table. The <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
2642 file on each cluster system must contain entries for the following:
2643 <ul>
2644 <li>
2645 IP addresses and associated host names for both cluster systems</li>
2646
2647 <br>&nbsp;
2648 <li>
2649 IP addresses and associated host names for the point-to-point Ethernet
2650 heartbeat connections (these can be private IP addresses)</li>
2651 </ul>
2652 As an alternative to the <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
2653 file, you could use a naming service such as DNS or NIS to define the host
2654 names used by a cluster. However, to limit the number of dependencies and
2655 optimize availability, it is strongly recommended that you use the <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
2656 file to define IP addresses for cluster network interfaces.
2657 <p>To following is an example of an <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
2658 file on a cluster system:
2659 <pre><font size=-1>127.0.0.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; localhost.localdomain&nbsp;&nbsp; localhost
2660 193.186.1.81&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cluster2.linux.com&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cluster2
2661 10.0.0.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ecluster2.linux.com&nbsp;&nbsp;&nbsp;&nbsp; ecluster2
2662 193.186.1.82&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cluster3.linux.com&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cluster3
2663 10.0.0.2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ecluster3.linux.com&nbsp;&nbsp;&nbsp;&nbsp; ecluster3</font></pre>
2664 The previous example shows the IP addresses and host names for two cluster
2665 systems (<b><font face="Courier New, Courier, mono">cluster2</font></b>
2666 and <b><font face="Courier New, Courier, mono">cluster3</font></b>), and
2667 the private IP addresses and host names for the Ethernet interface used
2668 for the point-to-point heartbeat connection on each cluster system (<b><font face="Courier New, Courier, mono">ecluster2</font></b>
2669 and <b><font face="Courier New, Courier, mono">ecluster3</font></b>).
2670 <p>Verify correct formatting of the local host entry in the <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
2671 file, to ensure that it does not&nbsp; include non-local systems in the
2672 entry for the local host. An example of an incorrect local host entry that
2673 includes a non-local system (<b><font face="Courier New, Courier, mono">server1</font></b>)
2674 is shown next:
2675 <pre><font size=-1>127.0.0.1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; localhost.localdomain&nbsp;&nbsp; localhost server1</font></pre>
2676 A heartbeat channel may not operate properly if the format is not correct.
2677 For example, the channel will erroneously appear to be "offline." Check
2678 your <b><font face="Courier New, Courier, mono">/etc/hosts</font></b> file
2679 and correct the file format by removing non-local systems from the local
2680 host entry, if necessary.
2681 <p>Note that each network adapter must be configured with the appropriate
2682 IP address and netmask.
2683 <p>The following is an example of a portion of the output from the <b><font face="Courier New, Courier, mono">ifconfig</font></b>
2684 command on a cluster system:
2685 <pre><font size=-1># ifconfig
2686
2687 eth0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Link encap:Ethernet&nbsp; HWaddr 00:00:BC:11:76:93&nbsp;&nbsp;
2688 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; inet addr:192.186.1.81&nbsp; Bcast:192.186.1.245&nbsp; Mask:255.255.255.0
2689 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; UP BROADCAST RUNNING MULTICAST&nbsp; MTU:1500&nbsp; Metric:1
2690 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RX packets:65508254 errors:225 dropped:0 overruns:2 frame:0
2691 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TX packets:40364135 errors:0 dropped:0 overruns:0 carrier:0
2692 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; collisions:0 txqueuelen:100
2693 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Interrupt:19 Base address:0xfce0
2694
2695 eth1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Link encap:Ethernet&nbsp; HWaddr 00:00:BC:11:76:92&nbsp;&nbsp;
2696 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; inet addr:10.0.0.1&nbsp; Bcast:10.0.0.245&nbsp; Mask:255.255.255.0
2697 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; UP BROADCAST RUNNING MULTICAST&nbsp; MTU:1500&nbsp; Metric:1
2698 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; RX packets:0 errors:0 dropped:0 overruns:0 frame:0
2699 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
2700 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; collisions:0 txqueuelen:100
2701 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Interrupt:18 Base address:0xfcc0</font></pre>
2702 The previous example shows two network interfaces on a cluster system,
2703 <b><font face="Courier New, Courier, mono">eth0
2704 </font></b>(network
2705 interface for the cluster system) and <b><font face="Courier New, Courier, mono">eth1</font></b>
2706 (network interface for the point-to-point heartbeat connection).
2707 <p><a NAME="alt-kernel"></a>
2708 <h3>
2709 2.3.3 Decreasing the Kernel Boot Timeout Limit</h3>
2710 You can reduce the boot time for a cluster system by decreasing the kernel
2711 boot timeout limit. During the Linux boot sequence, you are given the opportunity
2712 to specify an alternate kernel to boot. The default timeout limit for specifying
2713 a kernel depends on the Linux distribution. For Red Hat distributions,
2714 the limit is five seconds.
2715 <p>To modify the kernel boot timeout limit for a cluster system, edit the
2716 <b><font face="Courier New, Courier, mono">/etc/lilo.conf</font></b>
2717 file and specify the desired value (in tenths of a second) for the <b><font face="Courier New, Courier, mono">timeout</font></b>
2718 parameter. The following example sets the timeout limit to three seconds:
2719 <pre>timeout = 30</pre>
2720 To apply the changes you made to the <b><font face="Courier New, Courier, mono">/etc/lilo.conf</font></b>
2721 file, invoke the <b><font face="Courier New, Courier, mono">/sbin/lilo</font></b>
2722 command.
2723 <p>Similarly, if you are using the <b>grub</b> boot loader, the timeout
2724 parameter in <b>/boot/grub/grub.conf </b>should be modified to specify
2725 the appropriate number of seconds.&nbsp; For example:
2726 <p>timeout = 3
2727 <p><a NAME="dmesg"></a>
2728 <h3>
2729 2.3.4 Displaying Console Startup Messages</h3>
2730 Use the <b><font face="Courier New, Courier, mono">dmesg</font></b> command
2731 to display the console startup messages. See the <b><font face="Courier New, Courier, mono">dmesg.8</font></b>
2732 manpage for more information.
2733 <p>The following example of <b><font face="Courier New, Courier, mono">dmesg</font></b>
2734 command output shows that a serial expansion card was recognized during
2735 startup:
2736 <pre><font size=-1>May 22 14:02:10 storage3 kernel: Cyclades driver 2.3.2.5 2000/01/19 14:35:33
2737 May 22 14:02:10 storage3 kernel: built May 8 2000 12:40:12
2738 May 22 14:02:10 storage3 kernel: Cyclom-Y/PCI #1: 0xd0002000-0xd0005fff, IRQ9,&nbsp;
2739 &nbsp;&nbsp; 4 channels starting from port 0.</font></pre>
2740 The following example of <b><font face="Courier New, Courier, mono">dmesg</font></b>
2741 command output shows that two external SCSI buses and nine disks were detected
2742 on the system:
2743 <pre><font size=-1>May 22 14:02:10 storage3 kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4&nbsp;
2744 May 22 14:02:10 storage3 kernel:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<adaptec aic-7890/1 ultra2 scsi host adapter>&nbsp;
2745 May 22 14:02:10 storage3 kernel: scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4&nbsp;
2746 May 22 14:02:10 storage3 kernel:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<adaptec aha-294x ultra2 scsi host adapter>&nbsp;
2747 May 22 14:02:10 storage3 kernel: scsi : 2 hosts.&nbsp;
2748 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST39236LW&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0004&nbsp;
2749 May 22 14:02:11 storage3 kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0&nbsp;
2750 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2751 May 22 14:02:11 storage3 kernel: Detected scsi disk sdb at scsi1, channel 0, id 0, lun 0&nbsp;
2752 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2753 May 22 14:02:11 storage3 kernel: Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0&nbsp;
2754 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2755 May 22 14:02:11 storage3 kernel: Detected scsi disk sdd at scsi1, channel 0, id 2, lun 0&nbsp;
2756 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2757 May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0&nbsp;
2758 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2759 May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0&nbsp;
2760 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2761 May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0&nbsp;
2762 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2763 May 22 14:02:11 storage3 kernel: Detected scsi disk sdh at scsi1, channel 0, id 10, lun 0&nbsp;
2764 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: SEAGATE&nbsp;&nbsp; Model: ST318203LC&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0001&nbsp;
2765 May 22 14:02:11 storage3 kernel: Detected scsi disk sdi at scsi1, channel 0, id 11, lun 0&nbsp;
2766 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Vendor: Dell&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Model: 8 BAY U2W CU&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Rev: 0205&nbsp;
2767 May 22 14:02:11 storage3 kernel:&nbsp;&nbsp; Type:&nbsp;&nbsp; Processor&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ANSI SCSI revision: 03&nbsp;
2768 May 22 14:02:11 storage3 kernel: scsi1 : channel 0 target 15 lun 1 request sense failed, performing reset.&nbsp;
2769 May 22 14:02:11 storage3 kernel: SCSI bus is being reset for host 1 channel 0.&nbsp;
2770 May 22 14:02:11 storage3 kernel: scsi : detected 9 SCSI disks total.</font></pre>
2771 The following example of <b><font face="Courier New, Courier, mono">dmesg</font></b>
2772 command output shows that a quad Ethernet card was detected on the system:
2773 <pre><font size=-1>May 22 14:02:11 storage3 kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html&nbsp;
2774 May 22 14:02:11 storage3 kernel: tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov&nbsp;
2775 May 22 14:02:11 storage3 kernel: eth0: Digital DS21140 Tulip rev 34 at 0x9800, 00:00:BC:11:76:93, IRQ 5.&nbsp;
2776 May 22 14:02:12 storage3 kernel: eth1: Digital DS21140 Tulip rev 34 at 0x9400, 00:00:BC:11:76:92, IRQ 9.&nbsp;
2777 May 22 14:02:12 storage3 kernel: eth2: Digital DS21140 Tulip rev 34 at 0x9000, 00:00:BC:11:76:91, IRQ 11.&nbsp;
2778 May 22 14:02:12 storage3 kernel: eth3: Digital DS21140 Tulip rev 34 at 0x8800, 00:00:BC:11:76:90, IRQ 10.</font></pre>
2779 &nbsp;
2780 <h2>
2781 <a NAME="devices-kernel"></a></h2>
2782
2783 <h3>
2784 2.3.5 Displaying Devices Configured in the Kernel</h3>
2785 To be sure that the installed devices, including serial and network interfaces,
2786 are configured in the kernel, use the <b><font face="Courier New, Courier, mono">cat
2787 /proc/devices</font></b> command on each cluster system. You can also use
2788 this command to determine if you have raw device support installed on the
2789 system. For example:
2790 <pre><font size=-1># <b>cat /proc/devices
2791 </b>Character devices:
2792 &nbsp; 1 mem
2793 &nbsp; 2 pty
2794 &nbsp; 3 ttyp
2795 &nbsp; 4 ttyS
2796 &nbsp; 5 cua
2797 &nbsp; 7 vcs
2798 &nbsp;10 misc
2799 &nbsp;19 ttyC
2800 &nbsp;20 cub
2801 128 ptm
2802 136 pts
2803 162 raw
2804
2805 Block devices:
2806 &nbsp; 2 fd
2807 &nbsp; 3 ide0
2808 &nbsp; 8 sd
2809 &nbsp;65 sd
2810 #</font>
2811
2812 </pre>
2813 The previous example shows:
2814 <ul>
2815 <li>
2816 Onboard serial ports (<b><font face="Courier New, Courier, mono">ttyS</font></b>)</li>
2817
2818 <li>
2819 Serial expansion card (<b><font face="Courier New, Courier, mono">ttyC</font></b>)</li>
2820
2821 <li>
2822 Raw devices (<b><font face="Courier New, Courier,mono">raw</font></b>)</li>
2823
2824 <li>
2825 SCSI devices (<b><font face="Courier New, Courier, mono">sd</font></b>)</li>
2826 </ul>
2827
2828 <h2 CLASS="ChapterTitleTOC">
2829 <a NAME="install-cluster"></a></h2>
2830
2831 <h2 CLASS="ChapterTitleTOC">
2832 2.4 Steps for Setting Up and Connecting the Cluster Hardware</h2>
2833 After installing the Red Hat Linux distribution, you can set up the cluster
2834 hardware components and then verify the installation to ensure that the
2835 cluster systems recognize all the connected devices. Note that the exact
2836 steps for setting up the hardware depend on the type of configuration.
2837 See <a href="#gather">Choosing a Hardware Configuration</a> for more information
2838 about cluster configurations.
2839 <p>To set up the cluster hardware, follow these steps:
2840 <ol>
2841 <li>
2842 Shut down the cluster systems and disconnect them from their power source.</li>
2843
2844 <br>&nbsp;
2845 <li>
2846 Set up the point-to-point Ethernet and serial heartbeat channels, if applicable.
2847 See <a href="#hardware-heart">Configuring Heartbeat Channels</a> for more
2848 information about performing this task.</li>
2849
2850 <br>&nbsp;
2851 <li>
2852 If you are using power switches, set up the devices and connect each cluster
2853 system to a power switch.&nbsp; See <a href="#hardware-power">Configuring
2854 Power Switches</a> for more information about performing this task.</li>
2855
2856 <br>&nbsp;
2857 <p>&nbsp;
2858 <br>&nbsp;
2859 <br>&nbsp;
2860 <p>In addition, it is recommended that you connect each power switch (or
2861 each cluster system's power cord if you are not using power switches) to
2862 a different UPS system. See <a href="#hardware-ups">Configuring UPS Systems</a>
2863 for information about using optional UPS systems.
2864 <br>&nbsp;
2865 <li>
2866 Set up the shared disk storage according to the vendor instructions and
2867 connect the cluster systems to the external storage enclosure. Be sure
2868 to adhere to the configuration requirements for multi-initiator or single-initiator
2869 SCSI buses. See <a href="#hardware-storage">Configuring Shared Disk Storage</a>
2870 for more information about performing this task.</li>
2871
2872 <br>&nbsp;
2873 <p>&nbsp;
2874 <br>&nbsp;
2875 <br>&nbsp;
2876 <p>In addition, it is recommended that you connect the storage enclosure
2877 to redundant UPS systems. See <a href="#hardware-ups">Configuring UPS Systems</a>
2878 for more information about using optional UPS systems.
2879 <br>&nbsp;
2880 <li>
2881 Turn on power to the hardware, and boot each cluster system. During the
2882 boot, enter the BIOS utility to modify the system setup, as follows:</li>
2883
2884 <br>&nbsp;
2885 <ul>
2886 <li>
2887 Assign a unique SCSI identification number to each host bus adapter on
2888 a SCSI bus. See <a href="#scsi-ids">SCSI Identification Numbers</a> for
2889 more information about performing this task.</li>
2890
2891 <br>&nbsp;
2892 <li>
2893 Enable or disable the onboard termination for each host bus adapter, as
2894 required by your storage configuration. See <a href="#hardware-storage">Configuring
2895 Shared Disk Storage</a> and <a href="#scsi-term">SCSI Bus Termination</a>
2896 for more information about performing this task.</li>
2897
2898 <br>&nbsp;
2899 <li>
2900 If using a multi-initiator SCSI bus configuration, disable bus resets for
2901 the host bus adapters connected to cluster shared storage.</li>
2902
2903 <br>&nbsp;
2904 <li>
2905 Enable the cluster system to automatically boot when it is powered on.</li>
2906 </ul>
2907
2908 <p><br>If you are using Adaptec host bus adapters for shared storage, see
2909 <a href="#adaptec">Adaptec
2910 Host Bus Adapter Requirement</a> for configuration information.
2911 <li>
2912 Exit from the BIOS utility, and continue to boot each system. Examine the
2913 startup messages to verify that the Linux kernel has been configured and
2914 can recognize the full set of shared disks. You can also use the <b><font face="Courier New, Courier, mono">dmesg</font></b>
2915 command to display console startup messages. See <a href="#dmesg">Displaying
2916 Console Startup Messages</a> for more information about using this command.</li>
2917
2918 <br>&nbsp;
2919 <li>
2920 Verify that the cluster systems can communicate over each point-to-point
2921 Ethernet heartbeat connection by using the <b><font face="Courier New, Courier, mono">ping</font></b>
2922 command to send packets over each network interface.</li>
2923
2924 <br>&nbsp;
2925 <li>
2926 Set up the quorum disk partitions on the shared disk storage. See <a href="#state-partitions">Configuring
2927 the Quorum Partitions</a> for more information about performing this task.</li>
2928
2929 <br>&nbsp;</ol>
2930 <a NAME="hardware-heart"></a>
2931 <h3 class="ChapterTitleTOC">
2932 2.4.1 Configuring Heartbeat Channels</h3>
2933 The cluster uses heartbeat channels as a policy input during failover of
2934 the cluster systems. For example, if a cluster system stops updating its
2935 timestamp on the quorum partitions, the other cluster system will check
2936 the status of the heartbeat channels to determine if additional time should
2937 be alloted prior to initiating a failover.
2938 <p>A cluster must include at least one heartbeat channel. You can use an
2939 Ethernet connection for both client access and a heartbeat channel. However,
2940 it is recommended that you set up additional heartbeat channels for high
2941 availability. You can set up redundant Ethernet heartbeat channels, in
2942 addition to one or more serial heartbeat channels.
2943 <p>For example, if you have an Ethernet and a serial heartbeat channel,
2944 and the cable for the Ethernet channel is disconnected, the cluster systems
2945 can still check status through the serial heartbeat channel.
2946 <p>To set up a redundant Ethernet heartbeat channel, use a network crossover
2947 cable to connect a network interface on one cluster system to a network
2948 interface on the other cluster system.
2949 <p>To set up a serial heartbeat channel, use a null modem cable to connect
2950 a serial port on one cluster system to a serial port on the other cluster
2951 system. Be sure to connect corresponding serial ports on the cluster systems;
2952 do not connect to the serial port that will be used for a remote power
2953 switch connection.&nbsp; In the future, should support be added for more
2954 than 2 cluster members, then usage of serial based heartbeat channels may
2955 be deprecated.
2956 <br>&nbsp;
2957 <p><a NAME="hardware-power"></a>
2958 <h3 class="ChapterTitleTOC">
2959 2.4.2 Configuring Power Switches</h3>
2960 Power switches enable a cluster system to power-cycle the other cluster
2961 system before restarting its services as part of the failover process.
2962 The ability to remotely disable a system ensures data integrity is maintained
2963 under any failure condition. It is recommended that production environments
2964 use power switches in the cluster configuration. Only development (test)
2965 environments should use a configuration without power switches.
2966 <p>In a cluster configuration that uses power switches, each cluster system's
2967 power cable is connected to a power switch through either a serial or network
2968 connection (depending on switch type). When failover occurs, a cluster
2969 system can use this connection to power-cycle the other cluster system
2970 before restarting its services.
2971 <p>Power switches protect against data corruption if an unresponsive ("hung")
2972 system becomes responsive ("unhung") after its services have failed over,
2973 and issues I/O to a disk that is also receiving I/O from the other cluster
2974 system. In addition, if a quorum daemon fails on a cluster system, the
2975 system is no longer able to monitor the quorum partitions. If you are not
2976 using power switches in the cluster, this error condition may result in
2977 services being run on more than one cluster system, which can cause data
2978 corruption and posibly system crashes.
2979 <p>It is strongly recommended that you use power switches in a cluster.
2980 However, if you are fully aware of the risk, you can choose to set up a
2981 cluster without power switches.
2982 <p>A cluster system may "hang" for a few seconds if it is swapping or has
2983 a high system workload. For this reason, adqeuate time is allowed prior
2984 to concluding another system has failed (typically 12 seconds).
2985 <p>A cluster system may "hang" indefinitely because of a hardware failure
2986 or a kernel error. In this case, the other cluster will notice that the
2987 "hung" system is not updating its timestamp on the quorum partitions, and
2988 is not responding to pings over the heartbeat channels.
2989 <p>If a cluster system determines that a "hung" system is down, and power
2990 switches are used in the cluster, the cluster system will power-cycle the
2991 "hung" system before restarting its services. This will cause the "hung"
2992 system to reboot in a clean state, and prevent it from issuing I/O and
2993 corrupting service data.
2994 <p>If power switches are not used in cluster, and a cluster system determines
2995 that a "hung" system is down, it will set the status of the failed system
2996 to <b><font face="Courier New, Courier, mono">DOWN</font></b> on the quorum
2997 partitions, and then restart the "hung" system's services. If the "hung"
2998 system becomes "unhung," it will notice that its status is <b><font face="Courier New, Courier, mono">DOWN</font></b>,
2999 and initiate a system reboot. This will minimize the time that both cluster
3000 systems may be able to issue I/O to the same disk, but it does not provide
3001 the data integrity guarantee of power switches. If the "hung" system never
3002 becomes responsive, you will have to manually reboot the system.
3003 <p>If you are using power switches, set up the hardware according to the
3004 vendor instructions. However, you may have to perform some cluster-specific
3005 tasks to use a power switch in the cluster. See <a href="#rps-10">Setting
3006 Up Power Switches</a> for detailed information. Note that the cluster-specific
3007 information provided in this document supersedes the vendor information.&nbsp;
3008 Also be sure to read the detailed information provided in&nbsp; <a href="#power-setup">Setting
3009 Up Power Switches</a> to take note of any caveats or functional attributes
3010 of specific power switch types.
3011 <p>When cabling up power switches, take special care to ensure that each
3012 cable is plugged into the appropriate outlet.&nbsp; This is crucial because
3013 there is no indemendent means for the software to verify correct cabling.
3014 Failure to cable correctly can lead to an incorrect system being power
3015 cycled, or for one system to inapropriately conclude that it has successfully
3016 power cycled another cluster member.
3017 <p>After you set up the power switches, perform these tasks to connect
3018 them to the cluster systems:
3019 <ol>
3020 <li>
3021 Connect the power cable for each cluster system to a power switch.</li>
3022
3023 <br>&nbsp;
3024 <li>
3025 On each cluster system, connect a serial port to the serial port on the
3026 power switch that provides power to the other cluster system. The cable
3027 you use for the serial connection depends on the type of power switch.
3028 For example, if you have an RPS-10 power switch, use null modem cables.
3029 Alternatively, if you have a network attached power switch, a network cable
3030 is used.</li>
3031
3032 <br>&nbsp;
3033 <li>
3034 Connect the power cable for each power switch to a power source. It is
3035 recommended that you connect each power switch to a different UPS system.
3036 See <a href="#hardware-ups">Configuring UPS Systems</a> for more information.</li>
3037 </ol>
3038 After you install the cluster software, but before you start the cluster,
3039 test the power switches to ensure that each cluster system can power-cycle
3040 the other system. See <a href="#pswitch">Testing the Power Switches</a>
3041 for information.
3042 <p><a NAME="hardware-ups"></a>
3043 <h3 class="ChapterTitleTOC">
3044 2.4.3 Configuring UPS Systems</h3>
3045 Uninterruptible power supply (UPS) systems provide a highly-available source
3046 of power. Ideally, a redundant solution should be used incorporating multiple
3047 UPS's (one per server).&nbsp; For maximal fault-tollerance, you could incorporate
3048 two UPS's per server as well as APC's Automatic Transfer Switches to be
3049 used to manage the power and shutdown management to the server.&nbsp; Both
3050 solutions are solely dependant on the level of availability desired.
3051 <p>It is not recommended that an existing large UPS infrastructure be the
3052 sole source of power for the cluster.&nbsp; A UPS solution dedicated to
3053 the cluster itself allows for more flexibility in terms of manageability
3054 and availability.
3055 <p>A complete UPS system must be able to provide adequate voltage and current
3056 for an adequate period of time.&nbsp; While there is no single UPS to fit
3057 every power requirement.&nbsp;&nbsp; Visit APC's UPS configurator at: <a href="http://www.apcc.com/template/size/apc">www.apcc.com/template/size/apc</a>
3058 to size the correct UPS for your server. The APC Smart-UPS product line
3059 ships with software management for Red Hat Linux, the RPM name is
3060 <b>pbeagent</b>.
3061 <p>If your disk storage subsystem has two power supplies with separate
3062 power cords, set up two UPS systems, and connect one power switch (or one
3063 cluster system's power cord if you are not using power switches) and one
3064 of the storage subsystem's power cords to each UPS system.
3065 <p>A redundant UPS system configuration is shown in the following figure.
3066 <h4 class="ChapterTitleTOC">
3067 Redudant UPS System Configuration</h4>
3068 <img SRC="two_ups.gif" >
3069 <p>You can also connect both power switches (or both cluster systems' power
3070 cords) and the disk storage subsystem to the same UPS system. This is the
3071 most cost-effective configuration, and provides some protection against
3072 power failure. However, if a power outage occurs, the single UPS system
3073 becomes a possible single point of failure. In addition, one UPS system
3074 may not be able to provide enough power to all the attached devices for
3075 an adequate amount of time.
3076 <p>A single UPS system configuration is shown in the following figure.
3077 <h4 class="ChapterTitleTOC">
3078 Single UPS System Configuration</h4>
3079 <img SRC="one_ups.gif" >
3080 <p>Many UPS system products include Linux applications that monitor the
3081 operational status of the UPS system through a serial port connection.
3082 If the battery power is low, the monitoring software will initiate a clean
3083 system shutdown. If this occurs, the cluster software will be properly
3084 stopped, because it is controlled by a System V run level script (for example,
3085 <b><font face="Courier New, Courier, mono">/etc/rc.d/init.d/cluster</font></b>).
3086 <p>See the UPS documentation supplied by the vendor for detailed installation
3087 information.
3088 <br>&nbsp;
3089 <p><a NAME="hardware-storage"></a>
3090 <h3 CLASS="ChapterTitleTOC">
3091 2.4.4 Configuring Shared Disk Storage</h3>
3092 In a cluster, shared disk storage is used to hold service data and two
3093 quorum partitions. Because this storage must be available to both cluster
3094 systems, it cannot be located on disks that depend on the availability
3095 of any one system. See the vendor documentation for detailed product and
3096 installation information.
3097 <p>There are a number of factors to consider when setting up shared disk
3098 storage in a cluster:
3099 <ul>
3100 <li>
3101 Hardware RAID versus JBOD</li>
3102
3103 <br>&nbsp;
3104 <p>&nbsp;
3105 <br>&nbsp;
3106 <br>&nbsp;
3107 <p><b>JBOD</b> ("just a bunch of disks") storage provides a low-cost storage
3108 solution, but it does not provide highly available data. If a disk in a
3109 JBOD enclosure fails, any cluster service that uses the disk will be unavailable.
3110 Therefore, only development environments should use JBOD.
3111 <p>Controller-based <b>hardware RAID</b> is more expensive than JBOD storage,
3112 but it enables you to protect against disk failure. In addition, a dual-controller
3113 RAID array protects against controller failure. It is strongly recommended
3114 that you use RAID 1 (mirroring) to make service data and the quorum partitions
3115 highly available. Optionally, you can use parity RAID for high availability.
3116 Do not use RAID 0 (striping) for the quorum partitions. It is recommended
3117 that production environments use RAID for high availability.
3118 <p>Note that you cannot use host-based, adapter-based, or software RAID
3119 in a cluster, because these products usually do not properly coordinate
3120 multisystem access to shared storage.
3121 <br>&nbsp;
3122 <li>
3123 Multi-initiator SCSI buses or Fibre Channel interconnects versus single-initiator
3124 buses or interconnects</li>
3125
3126 <br>&nbsp;
3127 <p>&nbsp;
3128 <br>&nbsp;
3129 <br>&nbsp;
3130 <p>A <b>multi-initiator</b> SCSI bus or Fibre Channel interconnect has
3131 more than one cluster system connected to it. RAID controllers with a single
3132 host port and parallel SCSI disks must use a multi-initiator bus or interconnect
3133 to connect the two host bus adapters to the storage enclosure. This configuration
3134 provides no host isolation. Therefore, only development environments should
3135 use multi-initiator buses.
3136 <p>A <b>single-initiator</b> SCSI bus or Fibre Channel interconnect has
3137 only one cluster system connected to it, and provides host isolation and
3138 better performance than a multi-initiator bus. Single-initiator buses or
3139 interconnects ensure that each cluster system is protected from disruptions
3140 due to the workload, initialization, or repair of the other cluster system.
3141 <p>If you have a RAID array that has multiple host ports and provides simultaneous
3142 access to all the shared logical units from the host ports on the storage
3143 enclosure, you can set up two single-initiator buses or interconnects to
3144 connect each cluster system to the RAID array. If a logical unit can fail
3145 over from one controller to the other, the process must be transparent
3146 to the operating system. It is recommended that production environments
3147 use single-initiator buses or interconnects.
3148 <br>&nbsp;
3149 <li>
3150 Hot plugging</li>
3151
3152 <br>&nbsp;
3153 <p>&nbsp;
3154 <br>&nbsp;
3155 <br>&nbsp;
3156 <p>In some cases, you can set up a shared storage configuration that supports
3157 <b>hot
3158 plugging</b>, which enables you to disconnect a device from a multi-initiator
3159 SCSI bus or a multi-initiator Fibre Channel interconnect without affecting
3160 bus operation. This enables you to easily perform maintenance on a device,
3161 while the services that use the bus or interconnect remain available.
3162 <p>For example, by using an external terminator to terminate a SCSI bus
3163 instead of the onboard termination for a host bus adapter, you can disconnect
3164 the SCSI cable and terminator from the adapter and the bus will still be
3165 terminated.
3166 <p>However, if you are using a Fibre Channel hub or switch, hot plugging
3167 is not necessary because the hub or switch allows the interconnect to remain
3168 operational if a device is disconnected. In addition, if you have a single-initiator
3169 SCSI bus or Fibre Channel interconnect, hot plugging is not necessary because
3170 the private bus does not need to remain operational when you disconnect
3171 a device.</ul>
3172 Note that you must carefully follow the configuration guidelines for multi
3173 and single-initiator buses and for hot plugging, in order for the cluster
3174 to operate correctly.
3175 <p>You must adhere to the following <b>shared storage requirements</b>:
3176 <ul>
3177 <li>
3178 The Linux device name for each shared storage device must be the same on
3179 each cluster system. For example, a device named <b><font face="Courier New, Courier, mono">/dev/sdc</font></b>
3180 on one cluster system must be named <b><font face="Courier New, Courier, mono">/dev/sdc</font></b>
3181 on the other cluster system. You can usually ensure that devices are named
3182 the same by using identical hardware for both cluster systems.</li>
3183
3184 <br>&nbsp;
3185 <li>
3186 A disk partition can be used by only one cluster service.</li>
3187
3188 <br>&nbsp;
3189 <li>
3190 Do not include any file systems used in a cluster service in the cluster
3191 system's local <b><font face="Courier New, Courier, mono">/etc/fstab</font></b>
3192 files, because the cluster software must control the mounting and unmounting
3193 of service file systems.</li>
3194
3195 <br>&nbsp;
3196 <li>
3197 For optimal performance, use a 4 KB block size when creating shared file
3198 systems. Note that some of the <b><font face="Courier New, Courier, mono">mkfs</font></b>
3199 file system build utilities default to a 1 KB block size, which can cause
3200 long <b><font face="Courier New, Courier, mono">fsck</font></b> times.</li>
3201 </ul>
3202 You must adhere to the following <b>parallel SCSI requirements</b>, if
3203 applicable:
3204 <ul>
3205 <li>
3206 SCSI buses must be terminated at each end, and must adhere to length and
3207 hot plugging restrictions.</li>
3208
3209 <br>&nbsp;
3210 <li>
3211 Devices (disks, host bus adapters, and RAID controllers) on a SCSI bus
3212 must have a unique SCSI identification number.</li>
3213
3214 <br>&nbsp;
3215 <li>
3216 SCSI bus resets must be disabled.</li>
3217
3218 <br>&nbsp;</ul>
3219 See
3220 <a href="#scsi-reqs">SCSI Bus Configuration Requirements</a> for more
3221 information.
3222 <p>In addition, it is <b>strongly recommended</b> that you connect the
3223 storage enclosure to redundant UPS systems for a highly-available source
3224 of power. See <a href="#hardware-ups">Configuring UPS Systems</a> for more
3225 information.
3226 <p>See <a href="#multiinit">Setting Up a Multi-Initiator SCSI Bus</a>,
3227 <a href="#singleinit">Setting
3228 Up a Single-Initiator SCSI Bus</a>, and <a href="#single-fibre">Setting
3229 Up a Single-Initiator Fibre Channel Interconnect</a> for more information
3230 about configuring shared storage.
3231 <p>After you set up the shared disk storage hardware, you can partition
3232 the disks and then either create file systems or raw devices on the partitions.
3233 You must create two raw devices for the primary and the backup quorum partitions.
3234 See <a href="#state-partitions">Configuring the Quorum Partitions</a>,
3235 <a href="#partition">Partitioning
3236 Disks</a>, <a href="#rawdevices">Creating Raw Devices</a>, and <a href="#filesystems">Creating
3237 File Systems</a> for more information.
3238 <p><a NAME="multiinit"></a>
3239 <h4>
3240 2.4.4.1 Setting Up a Multi-Initiator SCSI Bus</h4>
3241 A multi-initiator SCSI bus has more than one cluster system connected to
3242 it. If you have JBOD storage, you must use a multi-initiator SCSI bus to
3243 connect the cluster systems to the shared disks in a cluster storage enclosure.
3244 You also must use a multi-initiator bus if you have a RAID controller that
3245 does not provide access to all the shared logical units from host ports
3246 on the storage enclosure, or has only one host port.
3247 <p>A multi-initiator bus does not provide host isolation. Therefore, only
3248 development environments should use a multi-initiator bus.
3249 <p>A multi-initiator bus must adhere to the requirements described in <a href="#scsi-reqs">SCSI
3250 Bus Configuration Requirements</a>. In addition, see
3251 <a href="#hba">Host
3252 Bus Adapter Features and Configuration Requirements</a> for information
3253 about terminating host bus adapters and configuring a multi-initiator bus
3254 with and without hot plugging support.
3255 <p>In general, to set up a multi-initiator SCSI bus with a cluster system
3256 at each end of the bus, you must do the following:
3257 <ul>
3258 <li>
3259 Enable the onboard termination for each host bus adapter.</li>
3260
3261 <li>
3262 Disable the termination for the storage enclosure, if applicable.</li>
3263
3264 <li>
3265 Use the appropriate 68-pin SCSI cable to connect each host bus adapter
3266 to the storage enclosure.</li>
3267 </ul>
3268 To set host bus adapter termination, you usually must enter the system
3269 configuration utility during system boot. To set RAID controller or storage
3270 enclosure termination, see the vendor documentation.
3271 <p>The following figure shows a multi-initiator SCSI bus with no hot plugging
3272 support.
3273 <p><b>Multi-Initiator SCSI Bus Configuration</b>
3274 <p><img SRC="multidrop_1.gif" height=130 width=360>
3275 <p>If the onboard termination for a host bus adapter can be disabled, you
3276 can configure it for hot plugging. This allows you to disconnect the adapter
3277 from the multi-initiator bus, without affecting bus termination, so you
3278 can perform maintenance while the bus remains operational.
3279 <p>To configure a host bus adapter for hot plugging, you must do the following:
3280 <ul>
3281 <li>
3282 Disable the onboard termination for the host bus adapter.</li>
3283
3284 <li>
3285 Connect an external pass-through LVD active terminator to the host bus
3286 adapter connector.</li>
3287 </ul>
3288 You can then use the appropriate 68-pin SCSI cable to connect the LVD terminator
3289 to the (unterminated) storage enclosure.
3290 <p>The following figure shows a multi-initiator SCSI bus with both host
3291 bus adapters configured for hot plugging.
3292 <p><b>Multi-Initiator SCSI Bus Configuration With Hot Plugging</b>
3293 <p><img SRC="multidrop_2.gif" height=137 width=360>
3294 <p>The following figure shows the termination in a JBOD storage enclosure
3295 connected to a multi-initiator SCSI bus.
3296 <p><b>JBOD Storage Connected to a Multi-Initiator Bus</b>
3297 <p><img SRC="jbod_raid_multi.gif" height=208 width=360>
3298 <p>The following figure shows the termination in a single-controller RAID
3299 array connected to a multi-initiator SCSI bus.
3300 <p><b>Single-Controller RAID Array Connected to a Multi-Initiator Bus</b>
3301 <p><img SRC="single_raid_multi.gif" height=295 width=360>
3302 <p>The following figure shows the termination in a dual-controller RAID
3303 array connected to a multi-initiator SCSI bus.
3304 <p><b>Dual-Controller RAID Array Connected to a Multi-Initiator Bus</b>
3305 <p><img SRC="dual_raid_multi.gif" >
3306 <br>&nbsp;
3307 <p><a NAME="singleinit"></a>
3308 <h4>
3309 2.4.4.2 Setting Up a Single-Initiator SCSI Bus</h4>
3310 A single-initiator SCSI bus has only one cluster system connected to it,
3311 and provides host isolation and better performance than a multi-initiator
3312 bus. Single-initiator buses ensure that each cluster system is protected
3313 from disruptions due to the workload, initialization, or repair of the
3314 other cluster system.
3315 <p>If you have a single or dual-controller RAID array that has multiple
3316 host ports and provides simultaneous access to all the shared logical units
3317 from the host ports on the storage enclosure, you can set up two single-initiator
3318 SCSI buses to connect each cluster system to the RAID array. If a logical
3319 unit can fail over from one controller to the other, the process must be
3320 transparent to the operating system.
3321 <p>It is recommended that production environments use single-initiator
3322 SCSI buses or single-initiator Fibre Channel interconnects.
3323 <p>Note that some RAID controllers restrict a set of disks to a specific
3324 controller or port. In this case, you cannot set up single-initiator buses.
3325 In addition, hot plugging is not necessary in a single-initiator SCSI bus,
3326 because the private bus does not need to remain operational when you disconnect
3327 a host bus adapter from the bus.
3328 <p>A single-initiator bus must adhere to the requirements described in
3329 <a href="#scsi-reqs">SCSI
3330 Bus Configuration Requirements</a>. In addition, see <a href="#hba">Host
3331 Bus Adapter Features and Configuration Requirements</a> for detailed information
3332 about terminating host bus adapters and configuring a single-initiator
3333 bus.
3334 <p>To set up a single-initiator SCSI bus configuration, you must do the
3335 following:
3336 <ul>
3337 <li>
3338 Enable the onboard termination for each host bus adapter.</li>
3339
3340 <li>
3341 Enable the termination for each RAID controller.</li>
3342
3343 <li>
3344 Use the appropriate 68-pin SCSI cable to connect each host bus adapter
3345 to the storage enclosure.</li>
3346 </ul>
3347 To set host bus adapter termination, you usually must enter a BIOS utility
3348 during system boot. To set RAID controller termination, see the vendor
3349 documentation.
3350 <p>The following figure shows a configuration that uses two single-initiator
3351 SCSI buses.
3352 <p><b>Single-Initiator SCSI Bus Configuration</b>
3353 <p><img SRC="multidrop_3.gif" height=138 width=360>
3354 <p>The following figure shows the termination in a single-controller RAID
3355 array connected to two single-initiator SCSI buses.
3356 <p><b>Single-Controller RAID Array Connected to Single-Initiator SCSI Buses</b>
3357 <p><img SRC="single_raid_store.gif" height=295 width=360>
3358 <p>The following figure shows the termination in a dual-controller RAID
3359 array connected to two single-initiator SCSI buses.
3360 <p><b>Dual-Controller RAID Array Connected to Single-Initiator SCSI Buses</b>
3361 <p><img SRC="dual_raid_store.gif" >
3362 <br>&nbsp;
3363 <p><a NAME="single-fibre"></a>
3364 <h4>
3365 2.4.4.3 Setting Up a Single-Initiator Fibre Channel Interconnect</h4>
3366 A single-initiator Fibre Channel interconnect has only one cluster system
3367 connected to it, and provides host isolation and better performance than
3368 a multi-initiator bus. Single-initiator interconnects ensure that each
3369 cluster system is protected from disruptions due to the workload, initialization,
3370 or repair of the other cluster system.
3371 <p>It is recommended that production environments use single-initiator
3372 SCSI buses or single-initiator Fibre Channel interconnects.
3373 <p>If you have a RAID array that has multiple host ports, and the RAID
3374 array provides simultaneous access to all the shared logical units from
3375 the host ports on the storage enclosure, you can set up two single-initiator
3376 Fibre Channel interconnects to connect each cluster system to the RAID
3377 array. If a logical unit can fail over from one controller to the other,
3378 the process must be transparent to the operating system.
3379 <p>The following figure shows a single-controller RAID array with two host
3380 ports, and the host bus adapters connected directly to the RAID controller,
3381 without using Fibre Channel hubs or switches.
3382 <p><b>Single-Controller RAID Array Connected to Single-Initiator Fibre
3383 Channel Interconnects</b>
3384 <p><img SRC="single_fibre.gif" >
3385 <p>If you have a dual-controller RAID array with two host ports on each
3386 controller, you must use a Fibre Channel hub or switch to connect each
3387 host bus adapter to one port on both controllers, as shown in the following
3388 figure.
3389 <p><b>Dual-Controller RAID Array Connected to Single-Initiator Fibre Channel
3390 Interconnects</b>
3391 <p><img SRC="fibre_hub.gif" >
3392 <br>&nbsp;
3393 <p><a NAME="state-partitions"></a>
3394 <h4 class="ChapterTitleTOC">
3395 2.4.4.4 Configuring Quorum Partitions</h4>
3396 You must create two raw devices on shared disk storage for the primary
3397 quorum partition and the backup quorum partition. Each quorum partition
3398 must have a minimum size of 10 MB. The amount of data in a quorum partition
3399 is constant; it does not increase or decrease over time.
3400 <p>The quorum partitions are used to hold cluster state information. Periodically,
3401 each cluster system writes its status (either UP or DOWN), a timestamp,
3402 and the state of its services. In addition, the quorum partitions contain
3403 a version of the cluster database. This ensures that each cluster system
3404 has a common view of the cluster configuration.
3405 <p>To monitor cluster health, the cluster systems periodically read state
3406 information from the primary quorum partition and determine if it is up
3407 to date. If the primary partition is corrupted, the cluster systems read
3408 the information from the backup quorum partition and simultaneously repair
3409 the primary partition. Data consistency is maintained through checksums
3410 and any inconsistencies between the partitions are automatically corrected.
3411 <p>If a system is unable to write to both quorum partitions at startup
3412 time, it will not be allowed to join the cluster. In addition, if an active
3413 cluster system can no longer write to both quorum partitions, the system
3414 will remove itself from the cluster by rebooting (and may be remotely power
3415 cycled by the healthy cluster member).
3416 <p>You must adhere to the following <b>quorum partition requirements</b>:
3417 <ul>
3418 <li>
3419 Both quorum partitions must have a minimum size of 10 MB.</li>
3420
3421 <br>&nbsp;
3422 <li>
3423 Quorum partitions must be raw devices. They cannot contain file systems.</li>
3424
3425 <br>&nbsp;
3426 <li>
3427 The quorum partitions must be located on the same shared SCSI bus or the
3428 same RAID controller. This prevents a situation in which each cluster system
3429 has access to only one of the partitions.</li>
3430
3431 <br>&nbsp;
3432 <li>
3433 Quorum partitions can be used only for cluster state and configuration
3434 information.</li>
3435 </ul>
3436 The following are <b>recommended guidelines</b> for configuring the quorum
3437 partitions:
3438 <ul>&nbsp;
3439 <li>
3440 It is strongly recommended that you set up a RAID subsystem for shared
3441 storage, and use RAID 1 (mirroring) to make the logical unit that contains
3442 the quorum partitions highly available. Optionally, you can use parity
3443 RAID for high availability. Do not use RAID 0 (striping) for the quorum
3444 partitions.</li>
3445
3446 <br>&nbsp;
3447 <p>&nbsp;
3448 <br>&nbsp;
3449 <br>&nbsp;
3450 <p>Otherwise, put both quorum partitions on the same disk.
3451 <br>&nbsp;
3452 <li>
3453 Do not put the quorum partitions on a disk that contains heavily-accessed
3454 service data. If possible, locate the quorum partitions on disks that contain
3455 service data that is lightly accessed.</li>
3456 </ul>
3457 See <a href="#partition">Partitioning Disks</a> and <a href="#rawdevices">Creating
3458 Raw Devices</a><a href="#software-rawdevices"> </a>for more information
3459 about setting up the quorum partitions.
3460 <p>See <a href="#software-rawdevices">Editing the rawdevices File</a> for
3461 information about editing the <b><font face="Courier New, Courier, mono">rawdevices</font></b>
3462 file to bind the raw character devices to the block devices each time the
3463 cluster systems boot.
3464 <br>&nbsp;
3465 <br>&nbsp;
3466 <p><a NAME="partition"></a>
3467 <h4>
3468 2.4.4.5 Partitioning Disks</h4>
3469 After you set up the shared disk storage hardware, you must partition the
3470 disks so they can be used in the cluster. You can then create file systems
3471 or raw devices on the partitions. For example, you must create two raw
3472 devices for the quorum partitions, using the guidelines described in <a href="#state-partitions">Configuring
3473 Quorum Partitions.</a>
3474 <p>Invoke the interactive <b><font face="Courier New, Courier, mono">fdisk</font></b>
3475 command to modify a disk partition table and divide the disk into partitions.
3476 Use the <b><font face="Courier New, Courier, mono">p</font></b> command
3477 to display the current partition table. Use the <b><font face="Courier New, Courier, mono">n</font></b>
3478 command to create a new partition.
3479 <p>The following example shows how to use the <b><font face="Courier New, Courier, mono">fdisk</font></b>
3480 command to partition a disk:
3481 <ol>
3482 <li>
3483 Invoke the interactive <b><font face="Courier New, Courier, mono">fdisk</font></b>
3484 command, specifying an available shared disk device. At the prompt, specify
3485 the <b><font face="Courier New, Courier, mono">p</font></b> command to
3486 display the current partition table. For example:</li>
3487
3488 <pre><font size=-1># <b>fdisk /dev/sde
3489 </b>Command (m for help): <b>p</b>&nbsp;
3490
3491 Disk /dev/sde: 255 heads, 63 sectors, 2213 cylinders&nbsp;
3492 Units = cylinders of 16065 * 512 bytes&nbsp;
3493
3494 Device&nbsp;&nbsp;&nbsp; Boot&nbsp;&nbsp;&nbsp; Start&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; End&nbsp;&nbsp;&nbsp; Blocks&nbsp;&nbsp; Id&nbsp; System
3495 /dev/sde1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 262&nbsp;&nbsp; 2104483+&nbsp; 83&nbsp; Linux&nbsp;
3496 /dev/sde2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 263&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 288&nbsp;&nbsp;&nbsp; 208845&nbsp;&nbsp; 83&nbsp; Linux</font></pre>
3497
3498 <li>
3499 Determine the number of the next available partition, and specify the <b><font face="Courier New, Courier, mono">n</font></b>
3500 command to add the partition. If there are already three partitions on
3501 the disk, specify <b><font face="Courier New, Courier, mono">e</font></b>
3502 for extended partition or <b><font face="Courier New, Courier, mono">p</font></b>
3503 to create a primary partition. For example:</li>
3504
3505 <pre><font size=-1>Command (m for help): <b>n</b>&nbsp;
3506 Command action&nbsp;
3507 &nbsp;&nbsp; e&nbsp;&nbsp; extended&nbsp;
3508 &nbsp;&nbsp; p&nbsp;&nbsp; primary partition (1-4)</font></pre>
3509
3510 <li>
3511 Specify the partition number that you want. For example:</li>
3512
3513 <pre><font size=-1>Partition number (1-4): <b>3</b></font></pre>
3514
3515 <li>
3516 Press the <b><font face="Courier New, Courier, mono">Enter</font></b> key
3517 or specify the next available cylinder. For example:</li>
3518
3519 <pre><font size=-1>First cylinder (289-2213, default 289): <b>289</b></font></pre>
3520
3521 <li>
3522 Specify the partition size that is required. For example:</li>
3523
3524 <pre><font size=-1>Last cylinder or +size or +sizeM or +sizeK (289-2213, default 2213): <b>+2000M</b></font></pre>
3525 Note that large partitions will increase the cluster service failover time
3526 if a file system on the partition must be checked with <b><font face="Courier New, Courier, mono">fsck</font></b>.
3527 Quorum partitions must be at least&nbsp; 10 MB.
3528 <li>
3529 Specify the <b><font face="Courier New, Courier, mono">w</font></b> command
3530 to write the new partition table to disk. For example:</li>
3531
3532 <pre><font size=-1>Command (m for help): <b>w</b>&nbsp;
3533 The partition table has been altered!&nbsp;
3534
3535 Calling ioctl() to re-read partition table.&nbsp;
3536
3537 WARNING: If you have created or modified any DOS 6.x&nbsp;
3538 partitions, please see the fdisk manual page for additional&nbsp;
3539 information.&nbsp;
3540
3541 Syncing disks.</font></pre>
3542
3543 <li>
3544 If you added a partition while both cluster systems are powered on and
3545 connected to the shared storage, you must reboot the other cluster system
3546 in order for it to recognize the new partition.</li>
3547 </ol>
3548 After you partition a disk, you can format it for use in the cluster. You
3549 must create raw devices for the quorum partitions. You can also format
3550 the remainder of the shared disks as needed by the cluster services. For
3551 example, you can create file systems or raw devices on the partitions.
3552 <p>See <a href="#rawdevices">Creating Raw Devices</a> and <a href="#filesystems">Creating
3553 File Systems</a> for more information.
3554 <br>&nbsp;
3555 <br>&nbsp;
3556 <p><a NAME="rawdevices"></a>
3557 <h4>
3558 2.4.4.6 Creating Raw Devices</h4>
3559 After you partition the shared storage disks, as described in <a href="#partition">Partitioning
3560 Disks</a>, you can create raw devices on the partitions. File systems are
3561 block devices (for example, <b><font face="Courier New, Courier, mono">/dev/sda1</font></b>)
3562 that cache recently-used data in memory in order to improve performance.
3563 Raw devices do not utilize system memory for caching. See <a href="#filesystems">Creating
3564 File Systems</a> for more information.
3565 <p>Linux supports raw character devices that are not hard-coded against
3566 specific block devices. Instead, Linux uses a character major number (currently
3567 162) to implement a series of unbound raw devices in the <b><font face="Courier New, Courier, mono">/dev/raw</font></b>
3568 directory. Any block device can have a character raw device front-end,
3569 even if the block device is loaded later at runtime.
3570 <p>To create a raw device, edit the <b>/etc/sysconfig/rawdevices</b> file
3571 to bind a raw character device to the appropriate block device. Once bound
3572 to a block device, a raw device can be opened, read, and written.
3573 <p>You must create raw devices for the quorum partitions. In addition,
3574 some database applications require raw devices, because these applications
3575 perform their own buffer caching for performance purposes. Quorum partitions
3576 cannot contain file systems because if state data was cached in system
3577 memory, the cluster systems would not have a consistent view of the state
3578 data.
3579 <ul>&nbsp;
3580 <pre></pre>
3581 </ul>
3582 Raw character devices must be bound to block devices each time a system
3583 boots. To ensure that this occurs, edit the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
3584 file and specify the quorum partition bindings. If you are using a raw
3585 device in a cluster service, you can also use this file to bind the devices
3586 at boot time. See <a href="#software-rawdevices">Editing the rawdevices
3587 File</a> for more information.
3588 <p>Query all the raw devices by using the<b><font face="Courier New, Courier, mono">
3589 raw -aq</font></b> command:
3590 <p># <b>raw -aq</b>
3591 <br>/dev/raw/raw1&nbsp;&nbsp; bound to major 8, minor 17
3592 <br>/dev/raw/raw2&nbsp;&nbsp; bound to major 8, minor 18
3593 <p>Note that, for raw devices, there is no cache coherency between the
3594 raw device and the block device. In addition, requests must be 512-byte
3595 aligned both in memory and on disk. For example, the standard <b><font face="Courier New, Courier, mono">dd</font></b>
3596 command cannot be used with raw devices because the memory buffer that
3597 the command passes to the write system call is not aligned on a 512-byte
3598 boundary.
3599 <p><a NAME="filesystems"></a>
3600 <h4>
3601 2.4.4.7 Creating File Systems</h4>
3602 Use the <b><font face="Courier New, Courier, mono">mkfs</font></b> command
3603 to create an <b><font face="Courier New, Courier, mono">ext2</font></b>
3604 file system on a partition. Specify the drive letter and the partition
3605 number. For example:
3606 <pre># <b>mkfs /dev/sde3</b></pre>
3607 For optimal performance, use a 4 KB block size when creating shared file
3608 systems. Note that some of the <b><font face="Courier New, Courier, mono">mkfs</font></b>
3609 file system build utilities default to a 1 KB block size, which can cause
3610 long <b><font face="Courier New, Courier, mono">fsck</font></b> times.
3611 <p>Similarly, to create an <b>ext3</b> filesystem, the following command
3612 can be used:
3613 <p># <b>mkfs -t ext2 -j /dev/sde3</b>
3614 <br>&nbsp;
3615 <p>
3616 <hr noshade width="80%">
3617 <h1>
3618 <a NAME="software"></a></h1>
3619
3620 <h1>
3621 3 Cluster Software Installation and Configuration</h1>
3622 After you install and configure the cluster hardware, you must install
3623 the cluster software and initialize the cluster systems. The following
3624 sections describe:
3625 <ul>
3626 <li>
3627 <a href="#software-steps">Steps for installing and initializing the cluster
3628 software</a></li>
3629
3630 <li>
3631 <a href="#software-check">Checking the cluster configuration</a></li>
3632
3633 <li>
3634 <a href="#software-logging">Configuring syslog event logging</a></li>
3635
3636 <li>
3637 <a href="#software-ui">Using the cluadmin utility</a></li>
3638
3639 <li>
3640 <a href="#software-gui">Configuring and using the graphical user interface</a></li>
3641 </ul>
3642
3643 <br>&nbsp;
3644 <h2>
3645 <a NAME="software-steps"></a></h2>
3646
3647 <h2>
3648 3.1 Steps for Installing and Initializing the Cluster Software</h2>
3649 <i>Editorial comment: this section may be unnecessary as the cluster rpm
3650 is automatically installed.</i>
3651 <p>Before installing Red Hat Cluster Manager, be sure that you have installed
3652 all the required software and kernel patches, as described in <a href="#linux-dist">Linux
3653 Distribution and Kernel Requirements</a>.
3654 <p>If you are updating the cluster software and want to preserve the existing
3655 cluster configuration database, you must back up the cluster database and
3656 stop the cluster software before you reinstall. See <a href="#cluster-reinstall">Updating
3657 the Cluster Software</a> for more information.
3658 <br>&nbsp;
3659 <li>
3660 To install Red Hat Cluster Manager, invoke the <b><font face="Courier New, Courier, mono">rpm
3661 --install clumanager-1.0.4-1.rpm</font></b> command.&nbsp; (The specific
3662 release numbers will change.)</li>
3663
3664 <br>&nbsp;
3665 <p>&nbsp;
3666 <br>&nbsp;
3667 <br>&nbsp;
3668 <p>To initialize and start the cluster software, perform the following
3669 tasks:
3670 <ol>
3671 <li>
3672 On both cluster systems, add a group named <b><font face="Courier New, Courier, mono">cluster</font></b>
3673 to the <b><font face="Courier New, Courier, mono">/etc/group</font></b>
3674 file.</li>
3675
3676 <li>
3677 Edit the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices
3678 </font></b>file
3679 on both cluster systems and specify the raw device special files and character
3680 devices for the primary and backup quorum partitions. You also must set
3681 the mode for the raw devices so that all users have read permission. See
3682 <a href="#state-partitions">Configuring
3683 the Quorum Partitions</a> and <a href="#software-rawdevices">Editing the
3684 rawdevices File</a> for more information.</li>
3685
3686 <br>&nbsp;
3687 <li>
3688 Reboot the systems. The first time that you reboot, the cluster will log
3689 messages stating that the quorum daemon is unable to determine which device
3690 special file to use as a quorum partition. This message does not indicate
3691 a problem and can be ignored. It occurs because you have not yet run the
3692 <b><font face="Courier New, Courier, mono">cluconfig</font></b>
3693 utility.</li>
3694
3695 <br>&nbsp;
3696 <li>
3697 Run the <b><font face="Courier New, Courier, mono">/sbin/cluconfig</font></b>
3698 utility on one cluster system. If you are updating the cluster software,
3699 the utility will prompt you whether to use the existing cluster database.
3700 If you do not choose to use the database, the utility will remove the cluster
3701 database.</li>
3702
3703 <br>&nbsp;
3704 <p>&nbsp;
3705 <br>&nbsp;
3706 <br>&nbsp;
3707 <p>If you are not using an existing cluster database, the utility will
3708 prompt you for the following cluster-specific information, which will be
3709 entered into the <b><font face="Courier New, Courier, mono">member</font></b>
3710 fields in the cluster database, a copy of which is located in the <b><font face="Courier New, Courier, mono">/etc/cluster.conf</font></b>
3711 file:
3712 <br>&nbsp;
3713 <ul>
3714 <li>
3715 Raw device special files for the primary and backup quorum partitions,
3716 as specified in the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
3717 file (for example,<b><font face="Courier New, Courier, mono"> /dev/raw/raw1</font></b>
3718 and <b><font face="Courier New, Courier, mono">/dev/raw/raw2</font></b>)</li>
3719
3720 <br>&nbsp;
3721 <li>
3722 Cluster system host names that are returned by the <b><font face="Courier New, Courier, mono">hostname</font></b>command</li>
3723
3724 <br>&nbsp;
3725 <li>
3726 Number of heartbeat connections (channels), both Ethernet and serial</li>
3727
3728 <br>&nbsp;
3729 <li>
3730 Device special file for each heartbeat serial line connection (for example,
3731 <b><font face="Courier New, Courier, mono">/dev/ttyS1</font></b>)</li>
3732
3733 <br>&nbsp;
3734 <li>
3735 IP host name associated with each heartbeat Ethernet interface</li>
3736
3737 <br>&nbsp;
3738 <li>
3739 Device special files for the serial ports to which the power switches are
3740 connected, if any (for example, <b><font face="Courier New, Courier, mono">/dev/ttyS0</font></b>)</li>
3741
3742 <br>&nbsp;
3743 <li>
3744 Power switch type (for example, <b><font face="Courier New, Courier, mono">RPS10</font></b>
3745 or <b><font face="Courier New, Courier, mono">None</font></b> if you are
3746 not using power switches)</li>
3747 </ul>
3748 See <a href="#software-config">Example of the cluconfig Utility</a> for
3749 an example of running the utility.
3750 <li>
3751 After you complete the cluster initialization on one cluster system, perform
3752 the following tasks on the other cluster system:</li>
3753
3754 <br>&nbsp;
3755 <ol type="a">
3756 <li>
3757 Run the <b><font face="Courier New, Courier, mono">/sbin/cluconfig --init=<i>raw_file</i></font></b>
3758 command, where <b><i><font face="Courier New, Courier, mono">raw_file</font></i></b>
3759 specifies the primary quorum partition. The script will use the information
3760 that you specified for the first cluster system as defaults. For example:</li>
3761
3762 <pre># <b>cluconfig --init=/dev/raw/raw1</b></pre>
3763 &nbsp;</ol>
3764
3765 <li>
3766 Check the cluster configuration:</li>
3767
3768 <br>&nbsp;
3769 <ul>
3770 <li>
3771 Invoke the <b><font face="Courier New, Courier, mono">cludiskutil</font></b>
3772 utility with the <b><font face="Courier New, Courier, mono">-t</font></b>
3773 option on both cluster systems to ensure that the quorum partitions map
3774 to the same physical device. See <a href="#cludiskutil">Testing the Quorum
3775 Partitions</a> for more information.</li>
3776
3777 <br>&nbsp;
3778 <li>
3779 If you are using power switches, invoke the <b>clustonith </b>command on
3780 both cluster systems to test the remote connections to the power switches.
3781 See <a href="#pswitch">Testing the Power Switches</a> for more information.</li>
3782 </ul>
3783
3784 <li>
3785 Configure event logging so that cluster messages are logged to a separate
3786 file. See <a href="#software-logging">Configuring syslog Event Logging</a>
3787 for information.</li>
3788
3789 <br>&nbsp;
3790 <li>
3791 Start the cluster by invoking the <b><font face="Courier New, Courier, mono">cluster
3792 start </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
3793 directory on both cluster systems. For example:</li>
3794
3795 <pre># <b>service cluster start</b></pre>
3796 </ol>
3797 After you have initialized the cluster, you can add cluster services. See
3798 <a href="#software-ui">Using
3799 the cluadmin Utility</a>, <a href="#software-gui">Configuring and Using
3800 the Graphical User Interface</a>, and <a href="#service-configure">Configuring
3801 a Service</a> for more information.
3802 <br>&nbsp;
3803 <p><a NAME="software-rawdevices"></a>
3804 <h3>
3805 3.1.1 Editing the rawdevices File</h3>
3806 The <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
3807 file is used to map the raw devices for the quorum partitions each time
3808 a cluster system boots. As part of the cluster software installation procedure,
3809 you must edit the <b><font face="Courier New, Courier, mono">rawdevices
3810 </font></b>file
3811 on each cluster system and specify the raw character devices and block
3812 devices for the primary and backup quorum partitions. This enables the
3813 cluster graphical interface to work correctly.
3814 <p>If you are using raw devices in a cluster service, you can also use
3815 the <b><font face="Courier New, Courier, mono">rawdevices</font></b> file
3816 to bind the devices at boot time. Edit the file and specify the raw character
3817 devices and block devices that you want to bind each time the system boots.
3818 <p>The following is an example rawdevices file which designates two quorum
3819 partitions:
3820 <pre># raw device bindings</pre>
3821
3822 <pre># format:&nbsp; &lt;rawdev> &lt;major> &lt;minor></pre>
3823
3824 <pre>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &lt;rawdev> &lt;blockdev></pre>
3825
3826 <pre># example: /dev/raw/raw1 /dev/sda1</pre>
3827
3828 <pre>#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /dev/raw/raw2 8 5</pre>
3829
3830 <pre>/dev/raw/raw1 /dev/sdb1</pre>
3831
3832 <pre>/dev/raw/raw2 /dev/sdb2</pre>
3833
3834 <p><br>See <a href="#state-partitions">Configuring Quorum Partitions</a>
3835 for more information about setting up the quorum partitions. See <a href="#rawdevices">Creating
3836 Raw Devices</a> for more information on using the <b><font face="Courier New, Courier, mono">raw</font></b>
3837 command to bind raw character devices to block devices.
3838 <p><a NAME="software-config"></a>
3839 <h3>
3840 3.1.2 Example of the cluconfig Utility</h3>
3841 This section includes an example of the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
3842 cluster configuration utility, which prompts you for information about
3843 the cluster members, and then enters the information into the cluster database,
3844 a copy of which is located in the <b><font face="Courier New, Courier, mono">cluster.conf</font></b>
3845 file. In the example, the information entered at the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
3846 prompts applies to the following configuration:
3847 <ul>
3848 <li>
3849 On the <b><font face="Courier New, Courier, mono">storage0</font></b> cluster
3850 system:</li>
3851
3852 <br>&nbsp;
3853 <p>&nbsp;
3854 <br>&nbsp;
3855 <br>&nbsp;
3856 <p>Ethernet heartbeat channels: <b><font face="Courier New, Courier, mono">storage0</font></b>
3857 and <b><font face="Courier New, Courier, mono">cstorage0</font></b>
3858 <br>Serial heartbeat channel: <b><font face="Courier New, Courier, mono">/dev/ttyS1</font></b>
3859 <br>Power switch serial port: <b><font face="Courier New, Courier, mono">/dev/ttyC0</font></b>
3860 <br>Power switch: <b><font face="Courier New, Courier, mono">RPS10</font></b>
3861 <br>Quorum partitions: <b><font face="Courier New, Courier, mono">/dev/raw/raw1</font></b>
3862 and <b><font face="Courier New, Courier, mono">/dev/raw/raw2</font></b>
3863 <br>&nbsp;
3864 <li>
3865 On the <b><font face="Courier New, Courier, mono">storage1</font></b> cluster
3866 system:</li>
3867
3868 <br>&nbsp;
3869 <p>&nbsp;
3870 <br>&nbsp;
3871 <br>&nbsp;
3872 <p>Ethernet heartbeat channels:<b><font face="Courier New, Courier, mono">
3873 storage1</font></b> and <b><font face="Courier New, Courier, mono">cstorage1</font></b>
3874 <br>Serial heartbeat channel: <b><font face="Courier New, Courier, mono">/dev/ttyS1</font></b>
3875 <br>Power switch serial port: <b><font face="Courier New, Courier, mono">/dev/ttyS0</font></b>
3876 <br>Power switch: <b><font face="Courier New, Courier, mono">RPS10</font></b>
3877 <br>Quorum partitions: <b><font face="Courier New, Courier, mono">/dev/raw/raw1</font></b>
3878 and <b><font face="Courier New, Courier, mono">/dev/raw/raw2</font></b></ul>
3879 <i>Editorial comment: need to put an updated screen capture of cluconfig
3880 in here.</i>
3881 <pre><font size=-1># <b>/sbin/cluconfig
3882 </b>------------------------------------
3883 Cluster Member Configuration Utility
3884 ------------------------------------
3885 Version: 1.1.2 Built: Thu Oct 26 12:09:30 EDT 2000
3886 &nbsp;
3887 This utility sets up the member systems of a 2-node cluster.
3888 It prompts you for the following information:
3889
3890 o&nbsp; Hostname
3891 o&nbsp; Number of heartbeat channels
3892 o&nbsp; Information about the type of channels and their names
3893 o&nbsp; Raw quorum partitions, both primary and shadow
3894 o&nbsp; Power switch type and device name
3895
3896 In addition, it performs checks to make sure that the information
3897 entered is consistent with the hardware, the Ethernet ports, the raw
3898 partitions and the character device files.
3899
3900 After all the information is entered, it initializes the partitions
3901 and saves the configuration information to the quorum partitions.
3902
3903 - Checking that cluster daemons are stopped: done
3904
3905 Your cluster configuration should include power switches for optimal
3906 data integrity.
3907
3908 - Does the cluster configuration include power switches? (yes/no) [yes]: <b>y
3909
3910 </b>----------------------------------------
3911 Setting information for cluster member 0
3912 ----------------------------------------
3913 Enter name of cluster member [storage0]: <b>storage0
3914 </b>Looking for host storage0 (may take a few seconds)...
3915 Host storage0 found
3916 Cluster member name set to: storage0
3917
3918 Enter number of heartbeat channels (minimum = 1) [1]: <b>3
3919 </b>You selected 3 channels
3920 Information about channel 0:&nbsp;
3921 Channel type: net or serial [net]: <b>net
3922 </b>Channel type set to: net
3923 Enter hostname of cluster member storage0 on heartbeat channel 0 [storage0]: <b>storage0
3924 </b>Looking for host storage0 (may take a few seconds)...
3925 Host storage0 found
3926 Hostname corresponds to an interface on member 0
3927 Channel name set to: storage0
3928
3929 Information about channel 1:&nbsp;
3930 Channel type: net or serial [net]: <b>net
3931 </b>Channel type set to: net
3932 Enter hostname this interface responds to [storage0]: <b>cstorage0
3933 </b>Looking for host cstorage0 (may take a few seconds)...
3934 Host cstorage0 found
3935 Hostname corresponds to an interface on member 0
3936 Channel name set to: cstorage0
3937
3938 Information about channel 2:&nbsp;
3939 Channel type: net or serial [net]: <b>serial
3940 </b>Channel type set to: serial
3941 Enter device name [/dev/ttyS1]: <b>/dev/ttyS1
3942 </b>Device /dev/ttyS1 found and no getty running on it
3943 Device name set to: /dev/ttyS1
3944
3945 Setting information about Quorum Partitions
3946 Enter Primary Quorum Partition [/dev/raw/raw1]: <b>/dev/raw/raw1
3947 </b>Raw device /dev/raw/raw1 found&nbsp;
3948 Primary Quorum Partition set to /dev/raw/raw1
3949 Enter Shadow Quorum Partition [/dev/raw/raw2]: <b>/dev/raw/raw2
3950 </b>Raw device /dev/raw/raw2 found&nbsp;
3951 Shadow Quorum Partition set to /dev/raw/raw2
3952
3953 Information about power switch connected to member 0
3954 Enter serial port for power switch [/dev/ttyC0]: <b>/dev/ttyC0
3955 </b>Device /dev/ttyC0 found and no getty running on it
3956 Serial port for power switch set to /dev/ttyC0
3957 Specify one of the following switches (RPS10/APC) [RPS10]: <b>RPS10
3958 </b>Power switch type set to RPS10
3959
3960 ----------------------------------------
3961 Setting information for cluster member 1
3962 ----------------------------------------
3963 Enter name of cluster member: <b>storage1
3964 </b>Looking for host storage1 (may take a few seconds)...
3965 Host storage1 found
3966 Cluster member name set to: storage1
3967
3968 You previously selected 3 channels
3969 Information about channel 0:&nbsp;
3970 Channel type selected as net
3971 Enter hostname of cluster member storage1 on heartbeat channel 0: <b>storage1
3972 </b>Looking for host storage1 (may take a few seconds)...
3973 Host storage1 found
3974 Channel name set to: storage1
3975
3976 Information about channel 1:&nbsp;
3977 Channel type selected as net
3978 Enter hostname this interface responds to [storage1]: <b>cstorage1
3979
3980 </b>Information about channel 2:&nbsp;
3981 Channel type selected as serial
3982 Enter device name [/dev/ttyS1]: <b>/dev/ttyS1
3983 </b>Device name set to: /dev/ttyS1
3984
3985 Setting information about Quorum Partitions
3986 Enter Primary Quorum Partition [/dev/raw/raw1]: <b>/dev/raw/raw1
3987 </b>Primary Quorum Partition set to /dev/raw/raw1
3988 Enter Shadow Quorum Partition [/dev/raw/raw2]: <b>/dev/raw/raw2
3989 </b>Shadow Quorum Partition set to /dev/raw/raw2
3990
3991 Information about power switch connected to member 1
3992 Enter serial port for power switch [/dev/ttyS0]: <b>/dev/ttyS0
3993 </b>Serial port for power switch set to /dev/ttyS0
3994 Specify one of the following switches (RPS10/APC) [RPS10]: <b>RPS10
3995 </b>Power switch type set to RPS10
3996
3997 ------------------------------------
3998 The following choices will be saved:
3999 ------------------------------------
4000 ---------------------
4001 Member 0 information:
4002 ---------------------
4003 Name: storage0
4004 Primary quorum partition set to /dev/raw/raw1
4005 Shadow quorum partition set to /dev/raw/raw2
4006 Heartbeat channels: 3
4007 Channel type: net. Name: storage0
4008 Channel type: net. Name: cstorage0
4009 Channel type: serial. Name: /dev/ttyS1
4010 Power Switch type: RPS10. Port: /dev/ttyC0
4011
4012 ---------------------
4013 Member 1 information:
4014 ---------------------
4015 Name: storage1
4016 Primary quorum partition set to /dev/raw/raw1
4017 Shadow quorum partition set to /dev/raw/raw2
4018 Heartbeat channels: 3
4019 Channel type: net. Name: storage1
4020 Channel type: net. Name: cstorage1
4021 Channel type: serial. Name: /dev/ttyS1
4022 Power Switch type: RPS10. Port: /dev/ttyS0
4023 ------------------------------------
4024
4025 Save changes? yes/no [yes]: <b>yes
4026 </b>Writing to output configuration file...done.
4027 Changes have been saved to /etc/cluster.conf
4028 ----------------------------
4029 Setting up Quorum Partitions
4030 ----------------------------
4031 Quorum partitions have not been set up yet.&nbsp;&nbsp;
4032 Run cludiskutil -I to set up the quorum partitions now?&nbsp; yes/no [yes]: <b>yes</b></font></pre>
4033
4034 <pre><font size=-1>Saving configuration information to quorum partition:&nbsp;
4035 ------------------------------------------------------------------
4036 Setup on this member is complete.&nbsp; If errors have been reported,
4037 correct them.
4038
4039 If you have not already set up the other cluster member, invoke the following
4040 command on the other cluster member:
4041 &nbsp;
4042 # /sbin/cluconfig --init=/dev/raw/raw1
4043 &nbsp;
4044 After running cluconfig on the other member system, you can start the
4045 cluster daemons on each cluster system by invoking the cluster start
4046 script located in the System V init directory.&nbsp; For example:
4047 &nbsp;
4048 # /etc/rc.d/init.d/cluster start</font>
4049
4050 </pre>
4051
4052 <p><br><a NAME="software-check"></a>
4053 <h2>
4054 3.2 Checking the Cluster Configuration</h2>
4055 To ensure that you have correctly configured the cluster software, check
4056 the configuration by using tools located in the <b><font face="Courier New, Courier, mono">/sbin</font></b>
4057 directory:
4058 <ul>
4059 <li>
4060 Test the quorum partitions and ensure that they are accessible</li>
4061
4062 <br>&nbsp;
4063 <p>&nbsp;
4064 <br>&nbsp;
4065 <br>&nbsp;
4066 <p>Invoke the <b><font face="Courier New, Courier, mono">cludiskutil</font></b>
4067 utility with the <b><font face="Courier New, Courier, mono">-t</font></b>
4068 option to test the accessibility of the quorum partitions. See <a href="#cludiskutil">Testing
4069 the Quorum Partitions</a> for more information.
4070 <br>&nbsp;
4071 <li>
4072 Test the operation of the power switches</li>
4073
4074 <br>&nbsp;
4075 <p>&nbsp;
4076 <br>&nbsp;
4077 <br>&nbsp;
4078 <p>If you are using power switches, run the <b>clustonith</b> command on
4079 each cluster system to ensure that it can remotely power-cycle the other
4080 cluster system. Do not run this command while the cluster software is running.
4081 See <a href="#pswitch">Testing Power Switches</a> for more information.
4082 <br>&nbsp;
4083 <li>
4084 Ensure that both cluster systems are running the same software version</li>
4085
4086 <br>&nbsp;
4087 <p>&nbsp;
4088 <br>&nbsp;
4089 <br>&nbsp;
4090 <p>Invoke the <b><font face="Courier New, Courier, mono">rpm -qa clumanager</font></b>
4091 command on each cluster system to display the revision of the installed
4092 cluster RPM.</ul>
4093 The following sections describe these tools.
4094 <br>&nbsp;
4095 <br>&nbsp;
4096 <p><a NAME="cludiskutil"></a>
4097 <h3>
4098 3.2.1 Testing the Quorum Partitions</h3>
4099 The quorum partitions must refer to the same physical device on both cluster
4100 systems. Invoke the <b><font face="Courier New, Courier, mono">cludiskutil</font></b>
4101 utility with the<b><font face="Courier New, Courier, mono"> -t </font></b>command
4102 to test the quorum partitions and verify that they are accessible.
4103 <p>If the command succeeds, run the <b><font face="Courier New, Courier, mono">cludiskutil
4104 -p</font></b> command on both cluster systems to display a summary of the
4105 header data structure for the quorum partitions. If the output is different
4106 on the systems, the quorum partitions do not point to the same devices
4107 on both systems. Check to make sure that the raw devices exist and are
4108 correctly specified in the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
4109 file. See <a href="#state-partitions">Configuring the Quorum Partitions</a>
4110 for more information.
4111 <p>The following example shows that the quorum partitions refer to the
4112 same physical device on two cluster systems:
4113 <pre><font size=-1>[root@devel0 /root]# <b>cludiskutil -p
4114 </b>----- Shared State Header ------
4115 Magic# = 0x39119fcd
4116 Version = 1
4117 Updated on Thu Sep 14 05:43:18 2000
4118 Updated by node 0
4119 --------------------------------
4120 [root@devel0 /root]#
4121
4122 [root@devel1 /root]# <b>/sbin/cludiskutil -p
4123 </b>----- Shared State Header ------
4124 Magic# = 0x39119fcd
4125 Version = 1
4126 Updated on Thu Sep 14 05:43:18 2000
4127 Updated by node 0
4128 --------------------------------
4129 [root@devel1 /root]#</font></pre>
4130 The <b><font face="Courier New, Courier, mono">Magic#</font></b> and <b><font face="Courier New, Courier, mono">Version</font></b>fields
4131 will be the same for all cluster configurations. The last two lines of
4132 output indicate the date that the quorum partitions were initialized with
4133 <b><font face="Courier New, Courier, mono">cludiskutil -I,</font></b> and
4134 the numeric identifier for the cluster system that invoked the initialization
4135 command.
4136 <p>If the output of the <b><font face="Courier New, Courier, mono">cludiskutil</font></b>
4137 utility with the <b><font face="Courier New, Courier, mono">-p</font></b>
4138 option is not the same on both cluster systems, you can do the following:
4139 <ul>
4140 <li>
4141 Examine the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices
4142 </font></b>file
4143 on each cluster system and ensure that you have accurately specified the
4144 raw character devices and block devices for the primary and backup quorum
4145 partitions. If not edit the file and correct any mistakes. Then re-run
4146 the <b><font face="Courier New, Courier, mono">cluconfig</font></b> utility.
4147 See <a href="#software-rawdevices">Editing the rawdevices File</a> for
4148 more information.</li>
4149
4150 <br>&nbsp;
4151 <li>
4152 Ensure that you have created the raw devices for the quorum partitions
4153 on each cluster system. See <a href="#state-partitions">Configuring the
4154 Quorum Partitions</a> for more information.</li>
4155
4156 <br>&nbsp;
4157 <li>
4158 On each cluster system, examine the system startup messages at the point
4159 where the system probes the SCSI subsystem to determine the bus configuration.
4160 Verify that both cluster systems identify the same shared storage devices
4161 and assign them the same name.</li>
4162
4163 <br>&nbsp;
4164 <li>
4165 Verify that a cluster system is not attempting to mount a file system on
4166 the quorum partition. For example, make sure that the actual device (for
4167 example, <b><font face="Courier New, Courier, mono">/dev/sdb1</font></b>)
4168 is not included in an <b><font face="Courier New, Courier, mono">/etc/fstab</font></b>
4169 file.</li>
4170 </ul>
4171 After you perform these tasks, re-run the <b><font face="Courier New, Courier, mono">cludiskutil</font></b>
4172 utility with the <b><font face="Courier New, Courier, mono">-p</font></b>
4173 option.
4174 <br>&nbsp;
4175 <p><a NAME="pswitch"></a>
4176 <h3>
4177 3.2.2 Testing the Power Switches</h3>
4178 If you are using power switches, after you install the cluster software,
4179 but before starting the cluster, use the <b>clustonith</b> command to test
4180 the power switches. Invoke the command on each cluster system to ensure
4181 that it can remotely power-cycle the other cluster system.
4182 <p>The <b>clustonith</b> command can accurately test a power switch only
4183 if the cluster software is not running. This&nbsp; is due to the fact that
4184 for serial attached switches, only one program at a time can access the
4185 serial port that connects a power switch to a cluster system. When you
4186 invoke the <b>clustonith</b> command, it checks the status of the cluster
4187 software. If the cluster software is running, the command exits with a
4188 message to stop the cluster software.
4189 <p>The format of the <b>clustonith</b> command is as follows:
4190 <pre>clustonith [-sSlLvr] [-t devicetype] [-F options-file] [-p stonith-parameters]&nbsp;
4191 Options:
4192 -s&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Silent mode, supresses error and log messages
4193 -S&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Display switch status
4194 -l&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; List the hosts a switch can access
4195 -L&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; List the set of supported switch types
4196 -r hostname&nbsp;&nbsp;&nbsp;&nbsp; Power cycle the specified host
4197 -v&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Increases verbose debugging level</pre>
4198
4199 <pre><i>Editorial note: we need a new manpage for clustonith(8). Once that is in place, there's no need to include all that info here.</i></pre>
4200 When testing power switches, the first step is to ensure that each cluster
4201 member can successfully communicate with its attached power switch. The
4202 following example of the <b>clustonith</b> command output shows that the
4203 cluster member is able to communicate with its power switch:
4204 <p># <b>clustonith -S</b>
4205 <br>WTI Network Power Switch device OK.
4206 <br>An example output of the <b>clustonith</b> command when it is unable
4207 to communicate with its power switch appears below:
4208 <br># <b>clustonith -S</b>
4209 <br>Unable to determine power switch type.
4210 <br>Unable to determine default power switch type.
4211 <br>&nbsp;
4212 <p>The above error indicates could be indicitive of the following types
4213 of problems:
4214 <ul>
4215 <li>
4216 For serial attached power switches:</li>
4217
4218 <ul>
4219 <li>
4220 Verify that the device special file for the remote power switch connection
4221 serial port (for example, <b><font face="Courier New, Courier, mono">/dev/ttyS0</font></b>)
4222 is specified correctly in the cluster database, as established via the
4223 <b>cluconfig
4224 </b>command.
4225 If necessary, use a terminal emulation package like <b><font face="Courier New, Courier, mono">minicom</font></b>
4226 to test if the cluster system can access the serial port.</li>
4227
4228 <br>&nbsp;
4229 <li>
4230 Ensure that a non-cluster program (for example, a getty program) is not
4231 using the serial port for the remote power switch connection. You can use
4232 the <b><font face="Courier New, Courier, mono">lsof</font></b> command
4233 to perform this task.</li>
4234
4235 <br>&nbsp;
4236 <li>
4237 Check that the cable connection to the remote power switch is correct.
4238 Verify that you are using the correct type of cable (for example, an RPS-10
4239 power switch requires a null modem cable), and all connections are secure.</li>
4240
4241 <br>&nbsp;
4242 <li>
4243 Verify that any physical dip switches or rotary switches on the power switch
4244 are set properly. If you are using an RPS-10 power switch, see <a href="#rps-10">Setting
4245 Up an RPS-10 Power Switch</a> for more information.</li>
4246 </ul>
4247
4248 <li>
4249 For network based power switches:</li>
4250
4251 <ul>
4252 <li>
4253 Verify that the network connection to network based switches is operational.&nbsp;
4254 Most switches have a <i>link</i> light which indicates connectivity.</li>
4255
4256 <li>
4257 You should be able to <b>ping </b>the network switch; if not it may not
4258 be properly configured for its network parameters.</li>
4259
4260 <li>
4261 Verify that the correct password and login name (depending on switch type)
4262 have been specified in the cluster configuration database (as established
4263 by running <b>cluconfig</b>).&nbsp; A useful diagnostic approach is to
4264 verify that you can <b>telnet</b> to the network switch using the same
4265 parameters as specified in the cluster configuration.</li>
4266 </ul>
4267 </ul>
4268 After you have successfully verified communication with the switch, you
4269 can then attempt to power cycle the other cluster member.&nbsp; Prior to
4270 doing this, it would be a good idea to verify that the other cluster member
4271 isn't actively performing any important functions (such as serving cluster
4272 services to active clients).&nbsp; The following command depicts a successful
4273 power cycle operation:
4274 <p>[root@clu4 /]# <b>clustonith -r clu3</b>
4275 <br>Successfully power cycled host clu3.
4276 <p><a NAME="release"></a>
4277 <h3>
4278 3.2.3 Displaying the Cluster Software Version</h3>
4279 Invoke the <b><font face="Courier New, Courier, mono">rpm -qa clumanager
4280 </font></b>command
4281 to display the revision of the installed cluster RPM. Ensure that both
4282 cluster systems are running the same version.
4283 <br><a NAME="software-logging"></a>
4284 <h2>
4285 3.3 Configuring syslog Event Logging</h2>
4286 You should edit the <b><font face="Courier New, Courier, mono">/etc/syslog.conf</font></b>
4287 file to enable the cluster to log events to a file that is different from
4288 the <b><font face="Courier New, Courier, mono">/var/log/messages</font></b>default
4289 log file. Logging cluster messages to a separate file will help you diagnose
4290 problems.
4291 <p>The cluster systems use the <b><font face="Courier New, Courier, mono">syslogd</font></b>
4292 daemon to log cluster-related events to a file, as specified in the <b><font face="Courier New, Courier, mono">/etc/syslog.conf</font></b>
4293 file. You can use the log file to diagnose problems in the cluster. It
4294 is recommended that you set up event logging so that the <b><font face="Courier New, Courier, mono">syslogd</font></b>
4295 daemon logs cluster messages only from the system on which it is running.
4296 Therefore, you need to examine the log files on both cluster systems to
4297 get a comprehensive view of the cluster.
4298 <p>The <b><font face="Courier New, Courier, mono">syslogd</font></b> daemon
4299 logs messages from the following cluster daemons:
4300 <ul>
4301 <li>
4302 <b><font face="Courier New, Courier, mono">cluquorumd</font></b> - Quorum
4303 daemon</li>
4304
4305 <li>
4306 <b><font face="Courier New, Courier, mono">clusvcmgrd</font></b> - Service
4307 manager daemon</li>
4308
4309 <li>
4310 <b><font face="Courier New, Courier, mono">clupowerd</font></b> - Power
4311 daemon</li>
4312
4313 <li>
4314 <b><font face="Courier New, Courier, mono">cluhbd</font></b> - Heartbeat
4315 daemon</li>
4316 </ul>
4317 The importance of an event determines the severity level of the log entry.
4318 Important events should be investigated before they affect cluster availability.
4319 The cluster can log messages with the following severity levels, listed
4320 in the order of decreasing severity:
4321 <ul>
4322 <li>
4323 <b><font face="Courier New, Courier, mono">emerg</font></b> - The cluster
4324 system is unusable.</li>
4325
4326 <li>
4327 <b><font face="Courier New, Courier, mono">alert</font></b> - Action must
4328 be taken immediately to address the problem.</li>
4329
4330 <li>
4331 <b><font face="Courier New, Courier, mono">crit</font></b> - A critical
4332 condition has occurred.</li>
4333
4334 <li>
4335 <b><font face="Courier New, Courier, mono">err </font></b>- An error has
4336 occurred.</li>
4337
4338 <li>
4339 <b><font face="Courier New, Courier, mono">warning</font></b> - A significant
4340 event that may require attention has occurred.</li>
4341
4342 <li>
4343 <b><font face="Courier New, Courier, mono">notice</font></b> - An event
4344 that does not affect system operation has occurred.</li>
4345
4346 <li>
4347 <b><font face="Courier New, Courier, mono">info</font></b> - An normal
4348 cluster operation has occurred.</li>
4349 </ul>
4350 The default logging severity levels for the cluster daemons are <b><font face="Courier New, Courier, mono">warning</font></b>
4351 and higher.
4352 <p>Examples of log file entries are as follows:
4353 <pre><font size=-1>May 31 20:42:06 clu2 clusvcmgrd[992]: &lt;info>&nbsp; Service Manager starting&nbsp;
4354 May 31 20:42:06 clu2 clusvcmgrd[992]: &lt;info>&nbsp; mount.ksh info: /dev/sda3 is not mounted&nbsp;&nbsp;
4355 May 31 20:49:38 clu2 clulog[1294]: &lt;notice>&nbsp; stop_service.ksh notice: Stopping service dbase_home&nbsp;&nbsp;&nbsp;
4356 May 31 20:49:39 clu2 clusvcmgrd[1287]: &lt;notice>&nbsp; Service Manager received a NODE_UP event for stor5&nbsp;&nbsp;
4357 Jun 01 12:56:51 clu2 cluquorumd[1640]: &lt;err>&nbsp; updateMyTimestamp: unable to update status block.&nbsp;
4358 Jun 01 12:34:24 clu2 cluquorumd[1268]: &lt;warning>&nbsp; Initiating cluster stop&nbsp;&nbsp;&nbsp;
4359 Jun 01 12:34:24 clu2 cluquorumd[1268]: &lt;warning>&nbsp; Completed cluster stop&nbsp;
4360 Jul 27 15:28:40 clu2 cluquorumd[390]: &lt;err>&nbsp; shoot_partner: successfully shot partner.&nbsp;&nbsp;</font>&nbsp;&nbsp;&nbsp;
4361 &nbsp;&nbsp;&nbsp; <b>[1]&nbsp;&nbsp;&nbsp;&nbsp; [2]&nbsp;&nbsp; [3]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [4]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [5]</b></pre>
4362 Each entry in the log file contains the following information:
4363 <blockquote>[1]Timestamp
4364 <br>[2] Cluster system on which the event was logged
4365 <br>[3] Subsystem that generated the event
4366 <br>[4] Severity level of the event
4367 <br>[5] Description of the event</blockquote>
4368 After you configure the cluster software, you should edit the <b><font face="Courier New, Courier, mono">/etc/syslog.conf</font></b>
4369 file to enable the cluster to log events to a file that is different from
4370 the default log file, <b><font face="Courier New, Courier, mono">/var/log/messages</font></b>.
4371 Using a cluster-specific log file facilitates cluster monitoring and problem
4372 solving. To log cluster events to both the <b><font face="Courier New, Courier, mono">/var/log/cluster</font></b>
4373 and <b><font face="Courier New, Courier, mono">/var/log/messages</font></b>
4374 files, add lines similar to the following to the <b><font face="Courier New, Courier, mono">/etc/syslog.conf</font></b>
4375 file:
4376 <pre>#
4377 # Cluster messages coming in on local4 go to /var/log/cluster
4378 #
4379 local4.*&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /var/log/cluster</pre>
4380 To prevent the duplication of messages and log cluster events only to the
4381 <b><font face="Courier New, Courier, mono">/var/log/cluster</font></b>
4382 file, also add lines similar to the following to the <b><font face="Courier New, Courier, mono">/etc/syslog.conf</font></b>
4383 file:
4384 <pre># Log anything (except mail) of level info or higher.
4385 # Don't log private authentication messages!
4386 *.info;mail.none;news.none;authpriv.none;local4.none&nbsp;&nbsp; /var/log/messages</pre>
4387 To apply the previous changes, you can invoke the <b><font face="Courier New, Courier, mono">killall
4388 -HUP syslogd</font></b> command, or restart <b><font face="Courier New, Courier, mono">syslog</font></b>
4389 with a command similar to <b><font face="Courier New, Courier, mono">/etc/rc.d/init.d/syslog
4390 restart</font></b>.
4391 <p>In addition, you can modify the severity level of the events that are
4392 logged by the individual cluster daemons. See <a href="#cluster-logging">Modifying
4393 Cluster Event Logging</a> for more information.
4394 <br>&nbsp;
4395 <p><a NAME="software-ui"></a>
4396 <h2>
4397 3.4 Using the cluadmin Utility</h2>
4398 The <b><font face="Courier New, Courier, mono">cluadmin</font></b> utility
4399 provides a command-line user interface that enables you to monitor and
4400 manage the cluster systems and services. For example, you can use the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4401 utility to perform the following tasks:
4402 <ul>
4403 <li>
4404 Add, modify, and delete services</li>
4405
4406 <li>
4407 Disable and enable services</li>
4408
4409 <li>
4410 Display cluster and service status</li>
4411
4412 <li>
4413 Modify cluster daemon event logging</li>
4414
4415 <li>
4416 Backup and restore the cluster database</li>
4417 </ul>
4418
4419 <p><br>The cluster uses an advisory lock to prevent the cluster database
4420 from being simultaneously modified by multiple users on either cluster
4421 system. You can only modify the database if you hold the advisory lock.
4422 <p>When you invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4423 utility, the cluster software checks if the lock is already assigned to
4424 a user. If the lock is not already assigned, the cluster software assigns
4425 you the lock. When you exit from the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4426 utility, you relinquish the lock.
4427 <p>If another user holds the lock, a warning will be displayed indicating
4428 that there is already a lock on the database. The cluster software gives
4429 you the option of taking the lock. If you take the lock, the previous holder
4430 of the lock can no longer modify the cluster database.
4431 <p>You should take the lock only if necessary, because uncoordinated simultaneous
4432 configuration sessions may cause unpredictable cluster behavior. In addition,
4433 it is recommended that you make only one change to the cluster database
4434 (for example, adding, modifying, or deleting services) at one time.
4435 <p>You can specify the following <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4436 command line options:
4437 <dl>
4438 <dt>
4439 <b><font face="Courier New, Courier, mono">-d</font></b> or <b><font face="Courier New, Courier, mono">--debug</font></b></dt>
4440
4441 <dd>
4442 Displays extensive diagnostic information.</dd>
4443
4444 <br>&nbsp;
4445 <dt>
4446 <b><font face="Courier New, Courier, mono">-h</font></b>, <b><font face="Courier New, Courier, mono">-?</font></b>,
4447 or <b><font face="Courier New, Courier, mono">--help</font></b></dt>
4448
4449 <dd>
4450 Displays help about the utility, and then exits.</dd>
4451
4452 <br>&nbsp;
4453 <dt>
4454 <b><font face="Courier New, Courier, mono">-n</font></b> or <b><font face="Courier New, Courier, mono">--nointeractive</font></b></dt>
4455
4456 <dd>
4457 Bypasses the cluadmin utility's top-level command loop processing. This
4458 option is used for cluadmin debugging purposes.</dd>
4459
4460 <br>&nbsp;
4461 <dt>
4462 <b><font face="Courier New, Courier, mono">-t</font></b> or <b><font face="Courier New, Courier, mono">--tcl</font></b></dt>
4463
4464 <dd>
4465 Adds a Tcl command to the cluadmin utility's top- level command interpreter.
4466 To pass a Tcl command directly to the utility's internal Tcl interpreter,
4467 at the <b><font face="Courier New, Courier, mono">cluadmin></font></b>
4468 prompt, preface the Tcl command with <b><font face="Courier New, Courier, mono">tcl</font></b>.
4469 This option is used for cluadmin debugging purposes.</dd>
4470
4471 <br>&nbsp;
4472 <dt>
4473 <b><font face="Courier New, Courier, mono">-V</font></b> or <b><font face="Courier New, Courier, mono">--version</font></b></dt>
4474
4475 <dd>
4476 Displays information about the current version of cluadmin.</dd>
4477 </dl>
4478 When you invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4479 utility without the <b><font face="Courier New, Courier, mono">-n</font></b>
4480 option, the <b><font face="Courier New, Courier, mono">cluadmin></font></b>
4481 prompt appears. You can then specify commands and subcommands. The following
4482 table describes the commands and subcommands for the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4483 utility:
4484 <br>&nbsp;
4485 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
4486 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#99CCCC">
4487 <td WIDTH="16%">
4488 <h3>
4489 cluadmin Command</h3>
4490 </td>
4491
4492 <td WIDTH="17%">
4493 <h3>
4494 cluadmin Subcommand</h3>
4495 </td>
4496
4497 <td WIDTH="67%">
4498 <h3>
4499 Description</h3>
4500 </td>
4501 </tr>
4502
4503 <tr>
4504 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%"><b><font face="Courier New, Courier, mono">help</font></b></td>
4505
4506 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%">None</td>
4507
4508 <td WIDTH="67%">Displays help for the specified <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4509 command or subcommand. For example:&nbsp;
4510 <pre>cluadmin> <b>help service add&nbsp;</b></pre>
4511 </td>
4512 </tr>
4513
4514 <tr>
4515 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%"><b><font face="Courier New, Courier, mono">cluster</font></b></td>
4516
4517 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">status</font></b></td>
4518
4519 <td WIDTH="67%">Displays a snapshot of the current cluster status. See
4520 <a href="#cluster-status">Displaying
4521 Cluster and Service Status</a> for information. For example:&nbsp;
4522 <pre>cluadmin> <b>cluster status</b></pre>
4523 </td>
4524 </tr>
4525
4526 <tr>
4527 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%"></td>
4528
4529 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">monitor</font></b></td>
4530
4531 <td WIDTH="67%">Continuously displays snapshots of the cluster status at
4532 five-second intervals. Press the <b><font face="Courier New, Courier, mono">Return</font></b>
4533 or <b><font face="Courier New, Courier, mono">Enter</font></b> key to stop
4534 the display. You can specify the <b><font face="Courier New, Courier, mono">-interval</font></b>
4535 option with a numeric argument to display snapshots at the specified time
4536 interval (in seconds). In addition, you can specify the <b><font face="Courier New, Courier, mono">-clear</font></b>
4537 option with a yes argument to clear the screen after each snapshot display
4538 or with a no argument to not clear the screen. See <a href="#cluster-status">Displaying
4539 Cluster and Service Status</a> for information. For example:&nbsp;
4540 <pre>cluadmin> <b>cluster monitor -clear yes -interval 10</b></pre>
4541 </td>
4542 </tr>
4543
4544 <tr>
4545 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4546
4547 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">loglevel</font></b></td>
4548
4549 <td WIDTH="67%">Sets the logging for the specified cluster daemon to the
4550 specified severity level. See <a href="#cluster-logging">Modifying Cluster
4551 Event Logging </a>for information. For example:&nbsp;
4552 <pre>cluadmin> <b>cluster loglevel cluquorumd 7&nbsp;</b></pre>
4553 </td>
4554 </tr>
4555
4556 <tr>
4557 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4558
4559 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">reload</font></b></td>
4560
4561 <td WIDTH="67%">Forces the cluster daemons to re-read the cluster configuration
4562 database. See <a href="#cluster-reload">Reloading the Cluster Database</a>
4563 for information. For example:&nbsp;
4564 <pre>cluadmin> <b>cluster reload&nbsp;</b></pre>
4565 </td>
4566 </tr>
4567
4568 <tr>
4569 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4570
4571 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">name</font></b></td>
4572
4573 <td WIDTH="67%">Sets the name of the cluster to the specified name. The
4574 cluster name is included in the output of the <b><font face="Courier New, Courier, mono">clustat</font></b>
4575 cluster monitoring command. See <a href="#cluster-name">Changing the Cluster
4576 Name</a> for information. For example:&nbsp;
4577 <pre>cluadmin> <b>cluster name dbasecluster</b></pre>
4578 </td>
4579 </tr>
4580
4581 <tr>
4582 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4583
4584 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">backup</font></b></td>
4585
4586 <td WIDTH="67%">Saves a copy of the cluster configuration database in the
4587 <b><font face="Courier New, Courier, mono">/etc/cluster.conf.bak</font></b>
4588 file. See <a href="#cluster-backup">Backing Up and Restoring the Cluster
4589 Database</a> for information. For example:&nbsp;
4590 <pre>cluadmin> <b>cluster backup&nbsp;</b></pre>
4591 </td>
4592 </tr>
4593
4594 <tr>
4595 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4596
4597 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">restore</font></b></td>
4598
4599 <td WIDTH="67%">Restores the cluster configuration database from the backup
4600 copy in the <b><font face="Courier New, Courier, mono">/etc/cluster.conf.bak</font></b>
4601 file. See <a href="#cluster-backup">Backing Up and Restoring the Cluster
4602 Database</a> for information. For example:&nbsp;
4603 <pre>cluadmin> <b>cluster restore</b></pre>
4604 </td>
4605 </tr>
4606
4607 <tr>
4608 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4609
4610 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">saveas</font></b></td>
4611
4612 <td WIDTH="67%">Saves the cluster configuration database to the specified
4613 file. See <a href="#cluster-backup">Backing Up and Restoring the Cluster
4614 Database</a> for information. For example:&nbsp;
4615 <pre>cluadmin> <b>cluster saveas cluster_backup.conf</b>&nbsp;</pre>
4616 </td>
4617 </tr>
4618
4619 <tr>
4620 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4621
4622 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">restorefrom</font></b></td>
4623
4624 <td WIDTH="67%">Restores the cluster configuration database from the specified
4625 file. See <a href="#cluster-backup">Backing Up and Restoring the Cluster
4626 Database</a> for information. For example:&nbsp;
4627 <pre>cluadmin> <b>cluster restorefrom cluster_backup.conf&nbsp;</b></pre>
4628 </td>
4629 </tr>
4630
4631 <tr>
4632 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%" HEIGHT="83"><b><font face="Courier New, Courier, mono">service</font></b></td>
4633
4634 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%" HEIGHT="83"><b><font face="Courier New, Courier, mono">add</font></b></td>
4635
4636 <td WIDTH="67%" HEIGHT="83">Adds a cluster service to the cluster database.
4637 The command prompts you for information about service resources and properties.
4638 See
4639 <a href="#service-configure">Configuring a Service</a> for information.
4640 For example:&nbsp;
4641 <pre>cluadmin> <b>service add&nbsp;</b></pre>
4642 </td>
4643 </tr>
4644
4645 <tr>
4646 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4647
4648 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">modify</font></b></td>
4649
4650 <td WIDTH="67%">Modifies the resources or properties of the specified service.
4651 You can modify any of the information that you specified when the service
4652 was created. See <a href="#service-modify">Modifying a Service</a> for
4653 information. For example:&nbsp;
4654 <pre>cluadmin> <b>service modify dbservice&nbsp;</b></pre>
4655 </td>
4656 </tr>
4657
4658 <tr>
4659 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4660
4661 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">show
4662 state</font></b></td>
4663
4664 <td WIDTH="67%">Displays the current status of all services or the specified
4665 service. See <a href="#cluster-status">Displaying Cluster and Service Status</a>for
4666 information. For example:&nbsp;
4667 <pre>cluadmin> <b>service show state dbservice</b></pre>
4668 </td>
4669 </tr>
4670
4671 <tr>
4672 <td></td>
4673
4674 <td>
4675 <center><b>relocate</b></center>
4676 </td>
4677
4678 <td>Editorial comment: Need to add relocate description and corresponding
4679 section.</td>
4680 </tr>
4681
4682 <tr>
4683 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4684
4685 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">show
4686 config</font></b></td>
4687
4688 <td WIDTH="67%">Displays the current configuration for the specified service.
4689 See <a href="#service-status">Displaying a Service Configuration</a> for
4690 information. For example:&nbsp;
4691 <pre>cluadmin> <b>service show config dbservice</b></pre>
4692 </td>
4693 </tr>
4694
4695 <tr>
4696 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4697
4698 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">disable</font></b></td>
4699
4700 <td WIDTH="67%">Stops the specified service. You must enable a service
4701 to make it available again. See <a href="#service-disable">Disabling a
4702 Service</a> for information. For example:&nbsp;
4703 <pre>cluadmin> <b>service disable dbservice</b></pre>
4704 </td>
4705 </tr>
4706
4707 <tr>
4708 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4709
4710 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">enable</font></b></td>
4711
4712 <td WIDTH="67%">Starts the specified disabled service. See <a href="#service-enable">Enabling
4713 a Service</a> for information. For example:&nbsp;
4714 <pre>cluadmin> <b>service enable dbservice&nbsp;</b></pre>
4715 </td>
4716 </tr>
4717
4718 <tr>
4719 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%">&nbsp;</td>
4720
4721 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%"><b><font face="Courier New, Courier, mono">delete</font></b></td>
4722
4723 <td WIDTH="67%">Deletes the specified service from the cluster configuration
4724 database. See <a href="#service-delete">Deleting a Service</a> for information.
4725 For example:&nbsp;
4726 <pre cluadmin>&nbsp;<b>service delete dbservice&nbsp;</b></pre>
4727 </td>
4728 </tr>
4729
4730 <tr>
4731 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%"><b><font face="Courier New, Courier, mono">apropos</font></b></td>
4732
4733 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%">None</td>
4734
4735 <td WIDTH="67%">Displays the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4736 commands that match the specified character string argument or, if no argument
4737 is specified, displays all <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4738 commands. For example:&nbsp;
4739 <pre>cluadmin> <b>apropos service</b></pre>
4740 </td>
4741 </tr>
4742
4743 <tr>
4744 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%"><b><font face="Courier New, Courier, mono">clear</font></b></td>
4745
4746 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%">None</td>
4747
4748 <td WIDTH="67%">Clears the screen display. For example:&nbsp;
4749 <pre>cluadmin> <b>clear&nbsp;</b></pre>
4750 </td>
4751 </tr>
4752
4753 <tr>
4754 <td ALIGN=CENTER VALIGN=TOP WIDTH="16%" HEIGHT="27"><b><font face="Courier New, Courier, mono">exit</font></b></td>
4755
4756 <td ALIGN=CENTER VALIGN=TOP WIDTH="17%" HEIGHT="27">None</td>
4757
4758 <td WIDTH="67%" HEIGHT="27">Exits from <b><font face="Courier New, Courier, mono">cluadmin</font></b>.
4759 For example:
4760 <pre>cluadmin> <b>exit</b></pre>
4761 </td>
4762 </tr>
4763
4764 <tr>
4765 <td>
4766 <center><b>quit</b></center>
4767 </td>
4768
4769 <td>
4770 <center>None</center>
4771 </td>
4772
4773 <td>Exits from <b><font face="Courier New, Courier, mono">cluadmin</font></b>.
4774 For example:
4775 <br>cluadmin> <b>quit</b></td>
4776 </tr>
4777 </table>
4778
4779 <p>While using <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4780 utility, you can press the <b><font face="Courier New, Courier, mono">Tab</font></b>
4781 key to help identify <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4782 commands. For example, pressing the <b><font face="Courier New, Courier, mono">Tab</font></b>
4783 key at the <b><font face="Courier New, Courier, mono">cluadmin></font></b>
4784 utility displays a list of all the commands. Entering a letter at the prompt
4785 and then pressing the <b><font face="Courier New, Courier, mono">Tab</font></b>
4786 key displays the commands that begin with the specified letter. Specifying
4787 a command and then pressing the <b><font face="Courier New, Courier, mono">Tab</font></b>
4788 key displays a list of all the subcommands that can be specified with that
4789 command.
4790 <p>In addition, you can display the history of <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4791 commands by pressing the up arrow and down arrow keys at the prompt. The
4792 command history is stored in the <b><font face="Courier New, Courier, mono">.cluadmin_history</font></b>
4793 file in your home directory.
4794 <br>&nbsp;
4795 <p>
4796 <hr noshade width="80%">
4797 <p><a NAME="service"></a>
4798 <h1>
4799 4 Service Configuration and Administration</h1>
4800 The following sections describe how to set up and administer cluster services:
4801 <ul>
4802 <li>
4803 <a href="#service-configure">Configuring a Service</a></li>
4804
4805 <li>
4806 <a href="#service-status">Displaying a Service Configuration</a></li>
4807
4808 <li>
4809 <a href="#service-disable">Disabling a Service</a></li>
4810
4811 <li>
4812 <a href="#service-enable">Enabling a Service</a></li>
4813
4814 <li>
4815 <a href="#service-modify">Modifying a Service</a></li>
4816
4817 <li>
4818 <a href="#service-relocate">Relocating a Service</a></li>
4819
4820 <li>
4821 <a href="#service-delete">Deleting a Service</a></li>
4822
4823 <li>
4824 <a href="#service-error">Handling Services in an Error State</a></li>
4825 </ul>
4826
4827 <br>&nbsp;
4828 <h2>
4829 <a NAME="service-configure"></a></h2>
4830
4831 <h2>
4832 4.1 Configuring a Service</h2>
4833 To configure a service, you must prepare the cluster systems for the service.
4834 For example, you must set up any disk storage or applications used in the
4835 services. You can then add information about the service properties and
4836 resources to the cluster database by using the <b>cluadmin</b> utility.
4837 This information is used as parameters to scripts that start and stop the
4838 service.
4839 <p>To configure a service, follow these steps
4840 <ol>
4841 <li>
4842 If applicable, create a script that will start and stop the application
4843 used in the service. See <a href="#service-scripts">Creating Service Scripts</a>
4844 for information.</li>
4845
4846 <br>&nbsp;
4847 <li>
4848 Gather information about service resources and properties. See <a href="#service-gather">Gathering
4849 Service Information</a> for information.</li>
4850
4851 <br>&nbsp;
4852 <li>
4853 Set up the file systems or raw devices that the service will use. See <a href="#service-storage">Configuring
4854 Service Disk Storage</a> for information.</li>
4855
4856 <br>&nbsp;
4857 <li>
4858 Ensure that the application software can run on each cluster system and
4859 that the service script, if any, can start and stop the service application.
4860 See <a href="#service-app">Verifying Application Software and Service Scripts</a>
4861 for information.</li>
4862
4863 <br>&nbsp;
4864 <li>
4865 Back up the <b><font face="Courier New, Courier, mono">/etc/cluster.conf</font></b>
4866 file. See <a href="#cluster-backup">Backing Up and Restoring the Cluster
4867 Database</a> for information.</li>
4868
4869 <br>&nbsp;
4870 <li>
4871 Invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
4872 utility and specify the <b><font face="Courier New, Courier, mono">service
4873 add </font></b>command. You will be prompted for information about the
4874 service resources and properties obtained in step 2. If the service passes
4875 the configuration checks, it will be started on the cluster system on which
4876 you are running <b><font face="Courier New, Courier, mono">cluadmin</font></b>,
4877 unless you choose to keep the service disabled. For example:</li>
4878
4879 <pre>cluadmin> <b>service add</b></pre>
4880 </ol>
4881 For more information about adding a cluster service, see the following:
4882 <br>&nbsp;
4883 <ul>
4884 <li>
4885 <a href="#service-dbase">Setting Up an Oracle Service</a></li>
4886
4887 <li>
4888 <a href="#service-mysql">Setting Up a MySQL Service</a></li>
4889
4890 <li>
4891 <a href="#service-db2">Setting Up a DB2 Service</a></li>
4892
4893 <li>
4894 <a href="#service-apache">Setting Up an Apache Service</a></li>
4895
4896 <li>
4897 <a href="#service-nfs">Setting Up an NFS Service</a></li>
4898 </ul>
4899 See <font size=+0><a href="#software-manual">Cluster Database Fields</a></font>
4900 for a description of the service fields in the database.
4901 <p><a NAME="service-gather"></a>
4902 <h3>
4903 4.1.1 Gathering Service Information</h3>
4904 Before you create a service, you must gather information about the service
4905 resources and properties. When you add a service to the cluster database,
4906 the <b><font face="Courier New, Courier, mono">cluadmin</font></b> utility
4907 prompts you for this information.
4908 <p>In some cases, you can specify multiple resources for a service. For
4909 example, you can specify multiple IP addresses and disk devices.
4910 <p>The service properties and resources that you can specify are described
4911 in the following table.
4912 <dl>&nbsp;
4913 <table BORDER CELLSPACING=0 CELLPADDING=5 WIDTH="100%" >
4914 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#99CCCC">
4915 <td WIDTH="22%" HEIGHT="21">
4916 <h3>
4917 Service Property or Resource</h3>
4918 </td>
4919
4920 <td WIDTH="78%" HEIGHT="21">
4921 <h3>
4922 Description</h3>
4923 </td>
4924 </tr>
4925
4926 <tr ALIGN=LEFT VALIGN=TOP>
4927 <td WIDTH="22%" HEIGHT="34"><b>Service name</b></td>
4928
4929 <td WIDTH="78%" HEIGHT="34">Each service must have a unique name. A service
4930 name can consist of one to 63 characters and must consist of a combination
4931 of letters (either uppercase or lowercase), integers, underscores, periods,
4932 and dashes. However, a service name must begin with a letter or an underscore.&nbsp;</td>
4933 </tr>
4934
4935 <tr ALIGN=LEFT VALIGN=TOP>
4936 <td WIDTH="22%" HEIGHT="32"><b>Preferred member</b></td>
4937
4938 <td WIDTH="78%" HEIGHT="32">Specify the cluster system, if any, on which
4939 you want the service to run unless failover has occurred or unless you
4940 manually relocate the service.&nbsp;</td>
4941 </tr>
4942
4943 <tr ALIGN=LEFT VALIGN=TOP>
4944 <td WIDTH="22%" HEIGHT="101"><b>Preferred member relocation policy&nbsp;</b>
4945 <p>&nbsp;</td>
4946
4947 <td WIDTH="78%" HEIGHT="101">If you enable this policy, the service will
4948 automatically relocate to its preferred member when that system joins the
4949 cluster. If you disable this policy, the service will remain running on
4950 the non-preferred member. For example, if you enable this policy and the
4951 failed preferred member for the service reboots and joins the cluster,
4952 the service will automatically restart on the preferred member.&nbsp;</td>
4953 </tr>
4954
4955 <tr ALIGN=LEFT VALIGN=TOP>
4956 <td WIDTH="22%"><b>Script location</b></td>
4957
4958 <td WIDTH="78%">If applicable, specify the full path name for the script
4959 that will be used to start and stop the service. See <a href="#service-scripts">Creating
4960 Service Scripts</a> for more information.</td>
4961 </tr>
4962
4963 <tr ALIGN=LEFT VALIGN=TOP>
4964 <td WIDTH="22%"><b>IP address</b></td>
4965
4966 <td WIDTH="78%">You can assign one or more Internet protocol (IP) addresses
4967 to a service. This IP address (sometimes called a "floating" IP address)
4968 is different from the IP address associated with the host name Ethernet
4969 interface for a cluster system, because it is automatically relocated along
4970 with the service resources, when failover occurs. If clients use this IP
4971 address to access the service, they do not know which cluster system is
4972 running the service, and failover is transparent to the clients.&nbsp;
4973 <p>Note that cluster members must have network interface cards configured
4974 in the IP subnet of each IP address used in a service.
4975 <p>You can also specify netmask and broadcast addresses for each IP address.
4976 If you do not specify this information, the cluster uses the netmask and
4977 broadcast addresses from the network interconnect in the subnet.&nbsp;</td>
4978 </tr>
4979
4980 <tr ALIGN=LEFT VALIGN=TOP>
4981 <td WIDTH="22%" HEIGHT="48"><b>Disk partition, owner, group, and access
4982 mode</b></td>
4983
4984 <td WIDTH="78%" HEIGHT="48">Specify each shared disk partition used in
4985 a service. In addition, you can specify the owner, group, and access mode
4986 (for example, 755) for each mount point or raw device.&nbsp;</td>
4987 </tr>
4988
4989 <tr ALIGN=LEFT VALIGN=TOP>
4990 <td WIDTH="22%" HEIGHT="146"><b>Mount points, file system type,&nbsp; mount
4991 and NFS export options</b></td>
4992
4993 <td WIDTH="78%" HEIGHT="146">If you are using a file system, you must specify
4994 the type of file system, a mount point, and any mount options. Mount options
4995 that you can specify are the standard file system mount options that are
4996 described in the <b><font face="Courier New, Courier, mono">mount.8</font></b>
4997 manpage. If you are using a raw device, you do not have to specify mount
4998 information.&nbsp;
4999 <p>The ext2and ext3 file systems are the recommended file systems for a
5000 cluster. Although you can use a different file system in a cluster,&nbsp;
5001 other file system types such as reiserfs tested.&nbsp;
5002 <p>You must specify whether you want to enable forced unmount for a file
5003 system. Forced unmount enables the cluster service management infrastructure
5004 to unmount a file system even if it is being accessed by an application
5005 or user (that is, even if the file system is "busy"). This is accomplished
5006 by terminating any applications that are accessing the file system.&nbsp;
5007 <p>In addition, you are asked whether you wish to NFS export the filesystem
5008 and if so, what access permissions should be applied.&nbsp; Refer to <a href="#service-nfs">Creating
5009 NFS Services</a> for details.&nbsp;</td>
5010 </tr>
5011
5012 <tr ALIGN=LEFT VALIGN=TOP>
5013 <td WIDTH="22%"><b>Disable service policy</b></td>
5014
5015 <td WIDTH="78%">If you do not want to automatically start a service after
5016 it is added to the cluster, you can choose to keep the new service disabled,
5017 until an administrator explicitly enables the service.</td>
5018 </tr>
5019 </table>
5020 </dl>
5021 <a NAME="service-scripts"></a>
5022 <h3>
5023 4.1.2 Creating Service Scripts</h3>
5024 For services that include an application, you must create a script that
5025 contains specific instructions to start and stop the application (for example,
5026 a database application). The script will be called with a <b><font face="Courier New, Courier, mono">start</font></b>
5027 or <b><font face="Courier New, Courier, mono">stop</font></b> argument
5028 and will run at service start time and stop time. The script should be
5029 similar to the scripts found in the System V <b><font face="Courier New, Courier, mono">init</font></b>
5030 directory.
5031 <p><i>Editorial comment: Add description of status argument.</i>
5032 <p>The <b><font face="Courier New, Courier, mono">/usr/share/cluster/doc/services/examples</font></b>
5033 directory contains a template that you can use to create service scripts,
5034 in addition to examples of scripts. See <a href="#service-dbase">Setting
5035 Up an Oracle Service</a>, <a href="#service-mysql">Setting Up a MySQL Service</a>,
5036 <a href="#service-apache">Setting
5037 Up an Apache Service</a>, and <a href="#service-db2">Setting Up a DB2 Service</a>
5038 for sample scripts.
5039 <p><a NAME="service-storage"></a>
5040 <h3>
5041 4.1.3 Configuring Service Disk Storage</h3>
5042 Before you create a service, set up the shared file systems and raw devices
5043 that the service will use. See <a href="#hardware-storage">Configuring
5044 Shared Disk Storage</a> for more information.
5045 <p>If you are using raw devices in a cluster service, you can use the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
5046 file to bind the devices at boot time. Edit the file and specify the raw
5047 character devices and block devices that you want to bind each time the
5048 system boots. See <a href="#software-rawdevices">Editing the rawdevices
5049 File</a> for more information.
5050 <p>Note that software RAID, SCSI adapter-based RAID, and host-based RAID
5051 are not supported for shared disk storage.
5052 <p>You should adhere to these <b>service disk storage recommendations</b>:
5053 <ul>
5054 <li>
5055 For optimal performance, use a 4 KB block size when creating file systems.
5056 Note that some of the <b><font face="Courier New, Courier, mono">mkfs</font></b>
5057 file system build utilities default to a 1 KB block size, which can cause
5058 long <b><font face="Courier New, Courier, mono">fsck</font></b> times.</li>
5059
5060 <br>&nbsp;
5061 <li>
5062 For large file systems, use the <b><font face="Courier New, Courier, mono">mount</font></b>
5063 command with the <b><font face="Courier New, Courier, mono">nocheck</font></b>
5064 option to bypass code that checks all the block groups on the partition.
5065 Specifying the <b><font face="Courier New, Courier, mono">nocheck</font></b>
5066 option can significantly decrease the time required to mount a large file
5067 system.</li>
5068 </ul>
5069 <a NAME="service-app"></a>
5070 <h3>
5071 4.1.4 Verifying Application Software and Service Scripts</h3>
5072 Before you set up a service, install any application that will be used
5073 in a service on each system. After you install the application, verify
5074 that the application runs and can access shared disk storage. To prevent
5075 data corruption, do not run the application simultaneously on both systems.
5076 <p>If you are using a script to start and stop the service application,
5077 you must install and test the script on both cluster systems, and verify
5078 that it can be used to start and stop the application. See <a href="#service-scripts">Creating
5079 Service Scripts</a> for information.
5080 <p><a NAME="service-dbase"></a>
5081 <h3>
5082 4.1.5 Setting Up an Oracle Service</h3>
5083 A database service can serve highly-available data to a database application.
5084 The application can then provide network access to database client systems,
5085 such as Web servers. If the service fails over, the application accesses
5086 the shared database data through the new cluster system. A network-accessible
5087 database service is usually assigned an IP address, which is failed over
5088 along with the service to maintain transparent access for clients.
5089 <p>This section provides an example of setting up a cluster service for
5090 an Oracle database. Although the variables used in the service scripts
5091 depend on the specific Oracle configuration, the example may help you set
5092 up a service for your environment. See <a href="#app-tuning">Tuning Oracle
5093 Services</a> for information about improving service performance.
5094 <p>In the example that follows:
5095 <ul>
5096 <li>
5097 The service includes one IP address for the Oracle clients to use.</li>
5098
5099 <br>&nbsp;
5100 <li>
5101 The service has two mounted file systems, one for the Oracle software (<b><font face="Courier New, Courier, mono">/u01</font></b>)
5102 and the other for the Oracle database (<b><font face="Courier New, Courier, mono">/u02</font></b>),
5103 which were set up before the service was added.</li>
5104
5105 <br>&nbsp;
5106 <li>
5107 An Oracle administration account with the name <b><font face="Courier New, Courier, mono">oracle</font></b>
5108 was created on both cluster systems before the service was added.</li>
5109
5110 <br>&nbsp;
5111 <li>
5112 Network access in this example is through Perl DBI proxy.</li>
5113
5114 <br>&nbsp;
5115 <li>
5116 The administration directory is on a shared disk that is used in conjunction
5117 with the Oracle service (for example, <b><font face="Courier New, Courier, mono">/u01/app/oracle/admin/db1</font></b>).</li>
5118 </ul>
5119 The Oracle service example uses five scripts that must be placed in <b><font face="Courier New, Courier, mono">/home/oracle</font></b>
5120 and owned by the Oracle administration account. The <b><font face="Courier New, Courier, mono">oracle</font></b>
5121 script is used to start and stop the Oracle service. Specify this script
5122 when you add the service. This script calls the other Oracle example scripts.
5123 The <b><font face="Courier New, Courier, mono">startdb</font></b> and <b><font face="Courier New, Courier, mono">stopdb</font></b>
5124 scripts start and stop the database. The <b><font face="Courier New, Courier, mono">startdbi</font></b>
5125 and <b><font face="Courier New, Courier, mono">stopdbi</font></b> scripts
5126 start and stop a Web application that has been written by using Perl scripts
5127 and modules and is used to interact with the Oracle database. Note that
5128 there are many ways for an application to interact with an Oracle database.
5129 <p>The following is an example of the <b><font face="Courier New, Courier, mono">oracle</font></b>
5130 script, which is used to start and stop the Oracle service. Note that the
5131 script is run as user <b><font face="Courier New, Courier, mono">oracle</font></b>,
5132 instead of <b><font face="Courier New, Courier, mono">root</font></b>.
5133 <pre><font size=-1>#!/bin/sh
5134 #
5135 # Cluster service script to start/stop oracle
5136 #
5137
5138 cd /home/oracle
5139
5140 case $1 in
5141 'start')
5142 &nbsp;&nbsp;&nbsp; su - oracle -c ./startdbi
5143 &nbsp;&nbsp;&nbsp; su - oracle -c ./startdb
5144 &nbsp;&nbsp;&nbsp; ;;
5145 'stop')
5146 &nbsp;&nbsp;&nbsp; su - oracle -c ./stopdb
5147 &nbsp;&nbsp;&nbsp; su - oracle -c ./stopdbi
5148 &nbsp;&nbsp;&nbsp; ;;
5149 esac</font></pre>
5150 The following is an example of the <b><font face="Courier New, Courier, mono">startdb</font></b>
5151 script, which is used to start the Oracle Database Server instance:
5152 <pre><font size=-1>#!/bin/sh
5153 #
5154
5155 #
5156 # Script to start the Oracle Database Server instance.
5157 #
5158 ###########################################################################
5159 #
5160 # ORACLE_RELEASE
5161 #
5162 # Specifies the Oracle product release.
5163 #
5164 ###########################################################################
5165
5166 ORACLE_RELEASE=8.1.6
5167
5168 ###########################################################################
5169 #
5170 # ORACLE_SID
5171 #
5172 # Specifies the Oracle system identifier or "sid", which is the name of the
5173 # Oracle Server instance.
5174 #
5175 ###########################################################################
5176
5177 export ORACLE_SID=TESTDB
5178
5179 ###########################################################################
5180 #
5181 # ORACLE_BASE
5182 #
5183 # Specifies the directory at the top of the Oracle software product and
5184 # administrative file structure.
5185 #
5186 ###########################################################################
5187
5188 export ORACLE_BASE=/u01/app/oracle
5189
5190 ###########################################################################
5191 #
5192 # ORACLE_HOME
5193 #
5194 # Specifies the directory containing the software for a given release.
5195 # The Oracle recommended value is $ORACLE_BASE/product/&lt;release>
5196 #
5197 ###########################################################################
5198
5199 export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
5200
5201 ###########################################################################
5202 #
5203 # LD_LIBRARY_PATH
5204 #
5205 # Required when using Oracle products that use shared libraries.
5206 #
5207 ###########################################################################
5208
5209 export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
5210
5211 ###########################################################################
5212 #
5213 # PATH
5214 #
5215 # Verify that the users search path includes $ORCLE_HOME/bin&nbsp;
5216 #
5217 ###########################################################################
5218
5219 export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
5220
5221 ###########################################################################
5222 #
5223 # This does the actual work.
5224 #
5225 # The oracle server manager is used to start the Oracle Server instance
5226 # based on the initSID.ora initialization parameters file specified.
5227 #&nbsp;
5228 ###########################################################################
5229
5230 /u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl &lt;&lt; EOF
5231 spool /home/oracle/startdb.log
5232 connect internal;
5233 startup pfile = /u01/app/oracle/admin/db1/pfile/initTESTDB.ora open;
5234 spool off
5235 EOF
5236
5237 exit 0</font>
5238
5239 </pre>
5240 The following is an example of the <b><font face="Courier New, Courier, mono">stopdb</font></b>
5241 script, which is used to stop the Oracle Database Server instance:
5242 <pre><font size=-1>#!/bin/sh
5243 #
5244 #
5245 # Script to STOP the Oracle Database Server instance.
5246 #
5247 ###########################################################################
5248 #
5249 # ORACLE_RELEASE
5250 #
5251 # Specifies the Oracle product release.
5252 #
5253 ###########################################################################
5254
5255 ORACLE_RELEASE=8.1.6
5256
5257 ###########################################################################
5258 #
5259 # ORACLE_SID
5260 #
5261 # Specifies the Oracle system identifier or "sid", which is the name of the
5262 # Oracle Server instance.
5263 #
5264 ###########################################################################
5265
5266 export ORACLE_SID=TESTDB
5267
5268 ###########################################################################
5269 #
5270 # ORACLE_BASE
5271 #
5272 # Specifies the directory at the top of the Oracle software product and
5273 # administrative file structure.
5274 #
5275 ###########################################################################
5276
5277 export ORACLE_BASE=/u01/app/oracle
5278
5279 ###########################################################################
5280 #
5281 # ORACLE_HOME
5282 #
5283 # Specifies the directory containing the software for a given release.
5284 # The Oracle recommended value is $ORACLE_BASE/product/&lt;release>
5285 #
5286 ###########################################################################
5287
5288 export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
5289
5290 ###########################################################################
5291 #
5292 # LD_LIBRARY_PATH
5293 #
5294 # Required when using Oracle products that use shared libraries.
5295 #
5296 ###########################################################################
5297
5298 export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
5299
5300 ###########################################################################
5301 #
5302 # PATH
5303 #
5304 # Verify that the users search path includes $ORCLE_HOME/bin&nbsp;
5305 #
5306 ###########################################################################
5307
5308 export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
5309
5310 ###########################################################################
5311 #
5312 # This does the actual work.
5313 #
5314 # The oracle server manager is used to STOP the Oracle Server instance
5315 # in a tidy fashion.
5316 #&nbsp;
5317 ###########################################################################
5318
5319 /u01/app/oracle/product/${ORACLE_RELEASE}/bin/svrmgrl &lt;&lt; EOF
5320 spool /home/oracle/stopdb.log
5321 connect internal;
5322 shutdown abort;
5323 spool off
5324 EOF
5325
5326 exit 0</font>
5327
5328 </pre>
5329 The following is an example of the <b><font face="Courier New, Courier, mono">startdbi</font></b>
5330 script, which is used to start a networking DBI proxy daemon:
5331 <pre><font size=-1>#!/bin/sh
5332 #
5333 #
5334 ###########################################################################
5335 #
5336 # This script allows are Web Server application (perl scripts) to
5337 # work in a distributed environment. The technology we use is
5338 # base upon the DBD::Oracle/DBI CPAN perl modules.
5339 #
5340 # This script STARTS the networking DBI Proxy daemon.
5341 #
5342 ###########################################################################
5343
5344 export ORACLE_RELEASE=8.1.6
5345 export ORACLE_SID=TESTDB
5346 export ORACLE_BASE=/u01/app/oracle
5347 export ORACLE_HOME=/u01/app/oracle/product/${ORACLE_RELEASE}
5348 export LD_LIBRARY_PATH=/u01/app/oracle/product/${ORACLE_RELEASE}/lib
5349 export PATH=$PATH:/u01/app/oracle/product/${ORACLE_RELEASE}/bin
5350
5351 #
5352 # This line does the real work.
5353 #
5354
5355 /usr/bin/dbiproxy --logfile /home/oracle/dbiproxy.log --localport 1100 &amp;
5356
5357 exit 0</font>
5358
5359 </pre>
5360 The following is an example of the <b><font face="Courier New, Courier, mono">stopdbi</font></b>
5361 script, which is used to stop a networking DBI proxy daemon:
5362 <pre><font size=-1>#!/bin/sh
5363 #
5364 #
5365 #######################################################################
5366 #
5367 # Our Web Server application (perl scripts) work in a distributed&nbsp;
5368 # environment. The technology we use is base upon the DBD::Oracle/DBI&nbsp;
5369 # CPAN perl modules.
5370 #
5371 # This script STOPS the required networking DBI Proxy daemon.
5372 #
5373 ########################################################################
5374
5375
5376 PIDS=$(ps ax | grep /usr/bin/dbiproxy | awk '{print $1}')
5377
5378 for pid in $PIDS
5379 do
5380 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; kill -9 $pid
5381 done
5382
5383 exit 0</font>
5384
5385 </pre>
5386 The following example shows how to use <b><font face="Courier New, Courier, mono">cluadmin</font></b>
5387 to add an Oracle service.
5388 <pre><font size=-1>cluadmin> <b>service add oracle
5389
5390 </b>&nbsp; The user interface will prompt you for information about the service.
5391 &nbsp; Not all information is required for all services.
5392
5393 &nbsp; Enter a question mark (?) at a prompt to obtain help.
5394
5395 &nbsp; Enter a colon (:) and a single-character command at a prompt to do
5396 &nbsp; one of the following:
5397
5398 &nbsp; c - Cancel and return to the top-level cluadmin command
5399 &nbsp; r - Restart to the initial prompt while keeping previous responses
5400 &nbsp; p - Proceed with the next prompt
5401 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5402 Preferred member [None]: <b><font face="Courier New, Courier, mono">ministor0
5403 </font></b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
5404 </font></b>User script (e.g., /usr/foo/script or None) [None]: <b><font face="Courier New, Courier, mono">/home/oracle/oracle
5405
5406 </font></b>Do you want to add an IP address to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
5407
5408 </font></b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IP Address Information&nbsp;&nbsp;&nbsp;
5409
5410 IP address: <b><font face="Courier New, Courier, mono">10.1.16.132
5411 </font></b>Netmask (e.g. 255.255.255.0 or None) [None]: <b><font face="Courier New, Courier, mono">255.255.255.0
5412 </font></b>Broadcast (e.g. X.Y.Z.255 or None) [None]: <b><font face="Courier New, Courier, mono">10.1.16.255
5413
5414 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
5415 &nbsp;or are you (f)inished adding IP addresses: <b><font face="Courier New, Courier, mono">f
5416
5417 </font></b>Do you want to add a disk device to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
5418
5419 </font></b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Disk Device Information
5420
5421 Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono">/dev/sda1
5422 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono">ext2
5423 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono">/u01
5424 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono">[Return]
5425 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
5426
5427 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,&nbsp;
5428 &nbsp; or are you (f)inished adding device information: <b><font face="Courier New, Courier, mono">a
5429
5430 </font></b>Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono">/dev/sda2
5431 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono">ext2
5432 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono">/u02
5433 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono">[Return]
5434 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
5435
5436
5437 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,&nbsp;
5438 &nbsp; or are you (f)inished adding devices: <b><font face="Courier New, Courier, mono">f
5439
5440 </font></b>Disable service (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">no
5441
5442 </font></b>name: oracle
5443 disabled: no
5444 preferred node: ministor0
5445 relocate: yes
5446 user script: /home/oracle/oracle
5447 IP address 0: 10.1.16.132
5448 &nbsp; netmask 0: 255.255.255.0
5449 &nbsp; broadcast 0: 10.1.16.255
5450 device 0: /dev/sda1
5451 &nbsp; mount point, device 0: /u01
5452 &nbsp; mount fstype, device 0: ext2
5453 &nbsp; force unmount, device 0: yes
5454 device 1: /dev/sda2
5455 &nbsp; mount point, device 1: /u02
5456 &nbsp; mount fstype, device 1: ext2
5457 &nbsp; force unmount, device 1: yes
5458
5459 Add oracle service as shown? (yes/no/?) <b>y
5460 </b>notice: Starting service oracle ...
5461 info: Starting IP address 10.1.16.132
5462 info: Sending Gratuitous arp for 10.1.16.132 (00:90:27:EB:56:B8)
5463 notice: Running user script '/home/oracle/oracle start'
5464 notice, Server starting
5465 Added oracle.
5466 cluadmin></font>
5467
5468 </pre>
5469 <a NAME="service-mysql"></a>
5470 <h3>
5471 4.1.6 Setting Up a MySQL Service</h3>
5472 A database service can serve highly-available data to a database application.
5473 The application can then provide network access to database client systems,
5474 such as Web servers. If the service fails over, the application accesses
5475 the shared database data through the new cluster system. A network-accessible
5476 database service is usually assigned an IP address, which is failed over
5477 along with the service to maintain transparent access for clients.
5478 <p>You can set up a MySQL database service in a cluster. Note that MySQL
5479 does not provide full transactional semantics; therefore, it may not be
5480 suitable for update-intensive applications.
5481 <p>An example of a MySQL database service is as follows:
5482 <ul>
5483 <li>
5484 The MySQL server and the database instance both reside on a file system
5485 that is located on a disk partition on shared storage. This allows the
5486 database data and its run-time state information, which is required for
5487 failover, to be accessed by both cluster systems. In the example, the file
5488 system is mounted as <b><font face="Courier New, Courier, mono">/var/mysql</font></b>,
5489 using the shared disk partition <b><font face="Courier New, Courier, mono">/dev/sda1</font></b>.</li>
5490
5491 <br>&nbsp;
5492 <li>
5493 An IP address is associated with the MySQL database to accommodate network
5494 access by clients of the database service. This IP address will automatically
5495 be migrated among the cluster members as the service fails over. In the
5496 example below, the IP address is 10.1.16.12.</li>
5497
5498 <br>&nbsp;
5499 <li>
5500 The script that is used to start and stop the MySQL database is the standard
5501 System V <b><font face="Courier New, Courier, mono">init</font></b> script,
5502 which has been modified with configuration parameters to match the file
5503 system on which the database is installed.</li>
5504
5505 <br>&nbsp;
5506 <li>
5507 By default, a client connection to a MySQL server will time out after eight
5508 hours of inactivity. You can modify this connection limit by setting the
5509 <b><font face="Courier New, Courier, mono">wait_timeout</font></b>
5510 variable when you start <b><font face="Courier New, Courier, mono">mysqld</font></b>.</li>
5511
5512 <br>&nbsp;
5513 <p>&nbsp;
5514 <p>To check if a MySQL server has timed out, invoke the <b><font face="Courier New, Courier, mono">mysqladmin
5515 version</font></b> command and examine the uptime. Invoke the query again
5516 to automatically reconnect to the server.
5517 <p>Depending on the Linux distribution, one of the following messages may
5518 indicate a MySQL server timeout:
5519 <br>&nbsp;
5520 <pre>CR_SERVER_GONE_ERROR
5521 CR_SERVER_LOST</pre>
5522 </ul>
5523 A sample script to start and stop the MySQL database is located in <b><font face="Courier New, Courier, mono">/usr/share/cluster/doc/services/examples/mysql.server</font></b>,
5524 and is shown below:
5525 <pre><font size=-1>#!/bin/sh
5526 # Copyright Abandoned 1996 TCX DataKonsult AB &amp; Monty Program KB &amp; Detron HB
5527 # This file is public domain and comes with NO WARRANTY of any kind
5528
5529 # Mysql daemon start/stop script.
5530
5531 # Usually this is put in /etc/init.d (at least on machines SYSV R4
5532 # based systems) and linked to /etc/rc3.d/S99mysql. When this is done
5533 # the mysql server will be started when the machine is started.
5534
5535 # Comments to support chkconfig on RedHat Linux
5536 # chkconfig: 2345 90 90
5537 # description: A very fast and reliable SQL database engine.
5538
5539 PATH=/sbin:/usr/sbin:/bin:/usr/bin
5540 basedir=/var/mysql
5541 bindir=/var/mysql/bin
5542 datadir=/var/mysql/var
5543 pid_file=/var/mysql/var/mysqld.pid
5544 mysql_daemon_user=root&nbsp; # Run mysqld as this user.
5545 export PATH
5546
5547 mode=$1
5548
5549 if test -w /&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # determine if we should look at the root config file
5550 then&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # or user config file
5551 &nbsp; conf=/etc/my.cnf
5552 else
5553 &nbsp; conf=$HOME/.my.cnf&nbsp;&nbsp;&nbsp; # Using the users config file
5554 fi
5555
5556 # The following code tries to get the variables safe_mysqld needs from the
5557 # config file.&nbsp; This isn't perfect as this ignores groups, but it should
5558 # work as the options doesn't conflict with anything else.
5559
5560 if test -f "$conf"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # Extract those fields we need from config file.
5561 then
5562 &nbsp; if grep "^datadir" $conf > /dev/null
5563 &nbsp; then
5564 &nbsp;&nbsp;&nbsp; datadir=`grep "^datadir" $conf | cut -f 2 -d= | tr -d ' '`
5565 &nbsp; fi
5566 &nbsp; if grep "^user" $conf > /dev/null
5567 &nbsp; then
5568 &nbsp;&nbsp;&nbsp; mysql_daemon_user=`grep "^user" $conf | cut -f 2 -d= | tr -d ' ' | head -1`
5569 &nbsp; fi
5570 &nbsp; if grep "^pid-file" $conf > /dev/null
5571 &nbsp; then
5572 &nbsp;&nbsp;&nbsp; pid_file=`grep "^pid-file" $conf | cut -f 2 -d= | tr -d ' '`
5573 &nbsp; else
5574 &nbsp;&nbsp;&nbsp; if test -d "$datadir"
5575 &nbsp;&nbsp;&nbsp; then
5576 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; pid_file=$datadir/`hostname`.pid
5577 &nbsp;&nbsp;&nbsp; fi
5578 &nbsp; fi
5579 &nbsp; if grep "^basedir" $conf > /dev/null
5580 &nbsp; then
5581 &nbsp;&nbsp;&nbsp; basedir=`grep "^basedir" $conf | cut -f 2 -d= | tr -d ' '`
5582 &nbsp;&nbsp;&nbsp; bindir=$basedir/bin
5583 &nbsp; fi
5584 &nbsp; if grep "^bindir" $conf > /dev/null
5585 &nbsp; then
5586 &nbsp;&nbsp;&nbsp; bindir=`grep "^bindir" $conf | cut -f 2 -d=| tr -d ' '`
5587 &nbsp; fi
5588 fi
5589
5590
5591 # Safeguard (relative paths, core dumps..)
5592 cd $basedir
5593
5594 case "$mode" in
5595 &nbsp; 'start')
5596 &nbsp;&nbsp;&nbsp; # Start daemon
5597
5598 &nbsp;&nbsp;&nbsp; if test -x $bindir/safe_mysqld
5599 &nbsp;&nbsp;&nbsp; then
5600 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # Give extra arguments to mysqld with the my.cnf file. This script may
5601 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # be overwritten at next upgrade.
5602 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; $bindir/safe_mysqld --user=$mysql_daemon_user --pid-file=$pid_file --datadir=$datadir &amp;
5603 &nbsp;&nbsp;&nbsp; else
5604 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; echo "Can't execute $bindir/safe_mysqld"
5605 &nbsp;&nbsp;&nbsp; fi
5606 &nbsp;&nbsp;&nbsp; ;;
5607
5608 &nbsp; 'stop')
5609 &nbsp;&nbsp;&nbsp; # Stop daemon. We use a signal here to avoid having to know the
5610 &nbsp;&nbsp;&nbsp; # root password.
5611 &nbsp;&nbsp;&nbsp; if test -f "$pid_file"
5612 &nbsp;&nbsp;&nbsp; then
5613 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; mysqld_pid=`cat $pid_file`
5614 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; echo "Killing mysqld with pid $mysqld_pid"
5615 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; kill $mysqld_pid
5616 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # mysqld should remove the pid_file when it exits.
5617 &nbsp;&nbsp;&nbsp; else
5618 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; echo "No mysqld pid file found. Looked for $pid_file."
5619 &nbsp;&nbsp;&nbsp; fi
5620 &nbsp;&nbsp;&nbsp; ;;
5621
5622 &nbsp; *)
5623 &nbsp;&nbsp;&nbsp; # usage
5624 &nbsp;&nbsp;&nbsp; echo "usage: $0 start|stop"
5625 &nbsp;&nbsp;&nbsp; exit 1
5626 &nbsp;&nbsp;&nbsp; ;;
5627 esac</font></pre>
5628 The following example shows how to use <b><font face="Courier New, Courier, mono">cluadmin</font></b>
5629 to add a MySQL service.
5630 <pre><font size=-1>cluadmin> <b>service add
5631
5632 </b>&nbsp; The user interface will prompt you for information about the service.
5633 &nbsp; Not all information is required for all services.
5634
5635 &nbsp; Enter a question mark (?) at a prompt to obtain help.
5636
5637 &nbsp; Enter a colon (:) and a single-character command at a prompt to do
5638 &nbsp; one of the following:
5639
5640 &nbsp; c - Cancel and return to the top-level cluadmin command
5641 &nbsp; r - Restart to the initial prompt while keeping previous responses
5642 &nbsp; p - Proceed with the next prompt
5643 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5644 Currently defined services:
5645
5646 &nbsp; databse1
5647 &nbsp; apache2
5648 &nbsp; dbase_home
5649 &nbsp; mp3_failover
5650
5651 Service name: <b><font face="Courier New, Courier, mono">mysql_1
5652 </font></b>Preferred member [None]: <b><font face="Courier New, Courier, mono">devel0
5653 </font></b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
5654 </font></b>User script (e.g., /usr/foo/script or None) [None]: <b><font face="Courier New, Courier, mono">/etc/rc.d/init.d/mysql.server
5655
5656 </font></b>Do you want to add an IP address to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
5657
5658 </font></b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IP Address Information&nbsp;&nbsp;&nbsp;
5659
5660 IP address: <b><font face="Courier New, Courier, mono">10.1.16.12
5661 </font></b>Netmask (e.g. 255.255.255.0 or None) [None]: <b><font face="Courier New, Courier, mono">[Return]
5662 </font></b>Broadcast (e.g. X.Y.Z.255 or None) [None]: <b><font face="Courier New, Courier, mono">[Return]
5663
5664 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
5665 &nbsp;or are you (f)inished adding IP addresses: <b><font face="Courier New, Courier, mono">f
5666
5667 </font></b>Do you want to add a disk device to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
5668
5669 </font></b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Disk Device Information
5670
5671 Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono">/dev/sda1
5672 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono">ext2
5673 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono">/var/mysql
5674 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono">rw
5675 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
5676
5677 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,&nbsp;
5678 &nbsp; or are you (f)inished adding device information: <b><font face="Courier New, Courier, mono">f
5679
5680 </font></b>Disable service (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
5681
5682 </font></b>name: mysql_1
5683 disabled: yes
5684 preferred node: devel0
5685 relocate: yes
5686 user script: /etc/rc.d/init.d/mysql.server
5687 IP address 0: 10.1.16.12
5688 &nbsp; netmask 0: None
5689 &nbsp; broadcast 0: None
5690 device 0: /dev/sda1
5691 &nbsp; mount point, device 0: /var/mysql
5692 &nbsp; mount fstype, device 0: ext2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5693 &nbsp; mount options, device 0: rw
5694 &nbsp; force unmount, device 0: yes
5695
5696 Add mysql_1 service as shown? (yes/no/?) <b>y
5697 </b>Added mysql_1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
5698 cluadmin></font></pre>
5699 &nbsp;
5700 <p>&nbsp;
5701 <br>&nbsp;
5702 <br>&nbsp;
5703 <br>&nbsp;
5704 <br>&nbsp;
5705 <br>&nbsp;
5706 <br>&nbsp;
5707 <br>&nbsp;
5708 <br>&nbsp;
5709 <p><a NAME="service-db2"></a>
5710 <h3>
5711 4.1.7 Setting Up an DB2 Service</h3>
5712 This section provides an example of setting up a cluster service that will
5713 fail over IBM DB2 Enterprise/Workgroup Edition on a cluster. This example
5714 assumes that NIS is not running on the cluster systems.
5715 <p>To install the software and database on the cluster systems, follow
5716 these steps:
5717 <ol>
5718 <li>
5719 On both cluster systems, log in as root and add the IP address and host
5720 name that will be used to access the DB2 service to <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
5721 file. For example:</li>
5722
5723 <pre>10.1.16.182&nbsp;&nbsp;&nbsp;&nbsp; ibmdb2.class.cluster.com&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ibmdb2</pre>
5724
5725 <li>
5726 Choose an unused partition on a shared disk to use for hosting DB2 administration
5727 and instance data, and create a file system on it. For example:</li>
5728
5729 <pre># <b>mke2fs /dev/sda3</b></pre>
5730
5731 <li>
5732 Create a mount point on both cluster systems for the file system created
5733 in Step 2. For example:</li>
5734
5735 <pre># <b>mkdir /db2home</b></pre>
5736
5737 <li>
5738 On the first cluster system, <b><font face="Courier New, Courier, mono">devel0</font></b>,
5739 mount the file system created in Step 2 on the mount point created in Step
5740 3. For example:</li>
5741
5742 <pre>devel0# <b>mount -t ext2 /dev/sda3 /db2home</b></pre>
5743
5744 <li>
5745 On the first cluster system, <b><font face="Courier New, Courier, mono">devel0</font></b>,
5746 mount the DB2 cdrom and copy the setup response file included in the distribution
5747 to <b><font face="Courier New, Courier, mono">/root</font></b>. For example:</li>
5748
5749 <pre>devel0% <b>mount -t iso9660 /dev/cdrom /mnt/cdrom
5750 </b>devel0% <b>cp /mnt/cdrom/IBM/DB2/db2server.rsp /root</b></pre>
5751
5752 <li>
5753 Modify the setup response file, <b><font face="Courier New, Courier, mono">db2server.rsp</font></b>,
5754 to reflect local configuration settings. Make sure that the UIDs and GIDs
5755 are reserved on both cluster systems. For example:</li>
5756
5757 <pre>-----------Instance Creation Settings------------
5758 -------------------------------------------------
5759 DB2.UID = 2001
5760 DB2.GID = 2001
5761 DB2.HOME_DIRECTORY = /db2home/db2inst1
5762
5763 -----------Fenced User Creation Settings----------
5764 --------------------------------------------------
5765 UDF.UID = 2000
5766 UDF.GID = 2000
5767 UDF.HOME_DIRECTORY = /db2home/db2fenc1
5768
5769 -----------Instance Profile Registry Settings------
5770 ---------------------------------------------------
5771 DB2.DB2COMM = TCPIP
5772
5773 ----------Administration Server Creation Settings---
5774 ----------------------------------------------------
5775 ADMIN.UID = 2002
5776 ADMIN.GID = 2002
5777 ADMIN.HOME_DIRECTORY = /db2home/db2as
5778
5779 ---------Administration Server Profile Registry Settings-
5780 ---------------------------------------------------------
5781 ADMIN.DB2COMM = TCPIP
5782
5783 ---------Global Profile Registry Settings-------------
5784 ------------------------------------------------------
5785 DB2SYSTEM = ibmdb2</pre>
5786
5787 <li>
5788 Start the installation. For example:</li>
5789
5790 <pre>devel0# <b>cd /mnt/cdrom/IBM/DB2
5791 </b>devel0# <b>./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &amp;</b></pre>
5792
5793 <li>
5794 Check for errors during the installation by examining the installation
5795 log file, <b><font face="Courier New, Courier, mono">/tmp/db2setup.log</font></b>.
5796 Every step in the installation must be marked as <b><font face="Courier New, Courier, mono">SUCCESS</font></b>
5797 at the end of the log file.</li>
5798
5799 <br>&nbsp;
5800 <li>
5801 Stop the DB2 instance and administration server on the first cluster system.
5802 For example:</li>
5803
5804 <pre>devel0# <b>su - db2inst1</b>&nbsp;
5805 devel0# <b>db2stop
5806 </b>devel0# <b>exit
5807 </b>devel0# <b>su - db2as</b>&nbsp;
5808 devel0# <b>db2admin stop
5809 </b>devel0# <b>exit</b></pre>
5810
5811 <li>
5812 Unmount the DB2 instance and administration data partition on the first
5813 cluster system. For example:</li>
5814
5815 <pre>devel0# <b>umount /db2home</b></pre>
5816
5817 <li>
5818 Mount the DB2 instance and administration data partition on the second
5819 cluster system, devel1. For example:</li>
5820
5821 <pre>devel1# <b>mount -t ext2 /dev/sda3 /db2home</b></pre>
5822
5823 <li>
5824 Mount the DB2 cdrom on the second cluster system and remotely copy the
5825 <b><font face="Courier New, Courier, mono">db2server.rsp</font></b>
5826 file to <b><font face="Courier New, Courier, mono">/root</font></b>. For
5827 example:</li>
5828
5829 <pre>devel1# <b>mount -t iso9660 /dev/cdrom /mnt/cdrom
5830 </b>devel1# <b>rcp devel0:/root/db2server.rsp /root</b></pre>
5831
5832 <li>
5833 Start the installation on the second cluster system, <b><font face="Courier New, Courier, mono">devel1</font></b>.
5834 For example:</li>
5835
5836 <pre>devel1# <b>cd /mnt/cdrom/IBM/DB2
5837 </b>devel1# <b>./db2setup -d -r /root/db2server.rsp 1>/dev/null 2>/dev/null &amp;</b></pre>
5838
5839 <li>
5840 Check for errors during the installation by examining the installation
5841 log file. Every step in the installation must be marked as <b><font face="Courier New, Courier, mono">SUCCESS</font></b>
5842 except for the following:</li>
5843
5844 <pre>DB2 Instance Creation&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FAILURE
5845 Update DBM configuration file for TCP/IP&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CANCEL
5846 Update parameter DB2COMM&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CANCEL
5847 Auto start DB2 Instance&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CANCEL
5848 DB2 Sample Database&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CANCEL
5849 Start DB2 Instance
5850 Administration Server Creation&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FAILURE
5851 Update parameter DB2COMM&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CANCEL
5852 Start Administration Serve&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; CANCEL</pre>
5853
5854 <li>
5855 Test the database installation by invoking the following commands, first
5856 on one cluster system, and then on the other cluster system:</li>
5857
5858 <pre># <b>mount -t ext2 /dev/sda3 /db2home
5859 </b># <b>su - db2inst1
5860 </b># <b>db2start
5861 </b># <b>db2 connect to sample
5862 </b># <b>db2 select tabname from syscat.tables
5863 </b># <b>db2 connect reset
5864 </b># <b>db2stop
5865 </b># <b>exit
5866 </b># <b>umount /db2home</b></pre>
5867
5868 <li>
5869 Create the DB2 cluster start/stop script on the DB2 administration and
5870 instance data partition. For example:</li>
5871
5872 <pre># vi /db2home/ibmdb2
5873 # chmod u+x /db2home/ibmdb2
5874
5875 #!/bin/sh
5876 #
5877 # IBM DB2 Database Cluster Start/Stop Script
5878 #
5879 &nbsp;
5880 DB2DIR=/usr/IBMdb2/V6.1
5881 &nbsp;
5882 case $1 in
5883 "start")
5884 &nbsp;&nbsp; $DB2DIR/instance/db2istrt
5885 &nbsp;&nbsp; ;;
5886 "stop")
5887 &nbsp;&nbsp; $DB2DIR/instance/db2ishut
5888 &nbsp;&nbsp; ;;
5889 esac</pre>
5890
5891 <li>
5892 Modify the <b><font face="Courier New, Courier, mono">/usr/IBMdb2/V6.1/instance/db2ishut</font></b>
5893 file on both cluster systems to forcefully disconnect active applications
5894 before stopping the database. For example:</li>
5895
5896 <pre>for DB2INST in ${DB2INSTLIST?}; do
5897 &nbsp;&nbsp;&nbsp; echo "Stopping DB2 Instance "${DB2INST?}"..." >> ${LOGFILE?}
5898 &nbsp;&nbsp;&nbsp; find_homedir ${DB2INST?}
5899 &nbsp;&nbsp;&nbsp; INSTHOME="${USERHOME?}"
5900 &nbsp;&nbsp;&nbsp; su ${DB2INST?} -c " \
5901 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; source ${INSTHOME?}/sqllib/db2cshrc&nbsp;&nbsp; 1> /dev/null 2> /dev/null; \
5902 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ${INSTHOME?}/sqllib/db2profile 1> /dev/null 2> /dev/null; \
5903 >>>>>>> db2 force application all; \
5904 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; db2stop&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "&nbsp; 1>> ${LOGFILE?} 2>> ${LOGFILE?}
5905 &nbsp;&nbsp;&nbsp; if [ $? -ne 0 ]; then
5906 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ERRORFOUND=${TRUE?}
5907 &nbsp;&nbsp;&nbsp; fi
5908 done</pre>
5909
5910 <li>
5911 Edit the <b><font face="Courier New, Courier, mono">inittab</font></b>
5912 file and comment out the DB2 line to enable the cluster service to handle
5913 starting and stopping the DB2 service. This is usually the last line in
5914 the file. For example:</li>
5915
5916 <pre># db:234:once:/etc/rc.db2 > /dev/console 2>&amp;1 # Autostart DB2 Services</pre>
5917 </ol>
5918 Use the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
5919 utility to create the DB2 service. Add the IP address from Step 1, the
5920 shared partition created in Step 2, and the start/stop script created in
5921 Step 16.
5922 <p>To install the DB2 client on a third system, invoke these commands:
5923 <pre>display# <b>mount -t iso9660 /dev/cdrom /mnt/cdrom
5924 </b>display# <b>cd /mnt/cdrom/IBM/DB2
5925 </b>display# <b>./db2setup -d -r /root/db2client.rsp</b></pre>
5926 To configure a DB2 client, add the service's IP address to the <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
5927 file on the client system. For example:
5928 <pre>10.1.16.182&nbsp;&nbsp; ibmdb2.lowell.mclinux.com&nbsp;&nbsp; ibmdb2</pre>
5929 Then, add the following entry to the <b><font face="Courier New, Courier, mono">/etc/services</font></b>
5930 file on the client system:
5931 <pre>db2cdb2inst1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 50000/tcp</pre>
5932 Invoke the following commands on the client system:
5933 <pre># <b>su - db2inst1
5934 </b># <b>db2 catalog tcpip node ibmdb2 remote ibmdb2 server db2cdb2inst1
5935 </b># <b>db2 catalog database sample as db2 at node ibmdb2
5936 </b># <b>db2 list node directory
5937 </b># <b>db2 list database directory</b></pre>
5938 To test the database from the DB2 client system, invoke the following commands:
5939 <pre># <b>db2 connect to db2 user db2inst1 using ibmdb2
5940 </b># <b>db2 select tabname from syscat.tables
5941 </b># <b>db2 connect reset</b></pre>
5942
5943 <p><br><a NAME="service-nfs"></a>
5944 <h3>
5945 4.1.8 Setting Up an NFS Service</h3>
5946 <i>(Editorial Note: the heading numbers need to be re-indexed.)</i>
5947 <p>Highly available network fileservers (NFS) are one of the key strengths
5948 of the clustering infrastructure.&nbsp; Advantages of clustered NFS services
5949 include:
5950 <ul>
5951 <li>
5952 Ensures that NFS clients maintain uninterrupted access to key data in the
5953 event of server failure.</li>
5954
5955 <li>
5956 Facilitates planned maintenance by allowing you to transparently relocate
5957 NFS services to one cluster member, allowing you to fix or upgrade the
5958 other cluster member.</li>
5959
5960 <li>
5961 Allows you to setup an active-active configuration to maximize equipment
5962 utilization.&nbsp; More details on active-active configurations appear
5963 below.</li>
5964 </ul>
5965
5966 <h4>
5967 NFS Server Requirements</h4>
5968 If you intend to create highly available NFS services, then there are a
5969 few requirements which must be met by each cluster server. <i>(Note: these
5970 requirements do not pertain to NFS client systems.)
5971 </i>These requirements
5972 include:
5973 <ul>
5974 <li>
5975 Kernel support for the NFS server must be enabled.&nbsp; NFS can be either
5976 configured statically or as a module.&nbsp; Both NFS V2 and NFS V3 are
5977 supported.</li>
5978
5979 <li>
5980 The kernel support for NFS provided with this Red Hat release incorporates
5981 enhancements (initially developed by Mission Critical Linux Inc.) which
5982 allow for transparent relocation of NFS services.&nbsp; These kernel enhancements
5983 prevent NFS clients from receiving <i>Stale file handle</i> errors after
5984 an NFS service has been relocated.&nbsp; If you are using kernel sources
5985 which do not include these NFS enhancements, you will still be able to
5986 configure and run NFS services within the cluster; but you will see warning
5987 messages emitted during service start and stop pointing out the absence
5988 of these kernel enhancements.</li>
5989
5990 <li>
5991 The NFS daemons must be running on all cluster servers.&nbsp; This is accomplished
5992 by enabling the <b>nfs</b> init.d run level script.&nbsp; For example:
5993 <b>chkconfig
5994 --level 345 nfs on</b>. NFS services will not start unless the following
5995 NFS daemons are running: <b>nfsd</b>, <b>rpc.mountd</b>,
5996 <b>rpc.statd</b>.</li>
5997
5998 <li>
5999 Filesystem mounts and their associated exports for clustered NFS services
6000 should not be included in <b>/etc/fstab</b> and <b>/etc/exports</b> respectively.&nbsp;
6001 Rather, for clustered NFS services, the parameters describing mounts and
6002 exports are entered via the <b>cluadmin</b> configuration utility.</li>
6003 </ul>
6004
6005 <h4>
6006 Gathering NFS Service Configuration Parameters</h4>
6007 In preparation of configuring NFS services you need to plan out how the
6008 filesystems will be exported and failed over.&nbsp; The following information
6009 is required in order to configure NFS services:
6010 <ul>
6011 <li>
6012 <b>Service Name</b> - A name used to uniquely identify this service within
6013 the cluster.</li>
6014
6015 <li>
6016 <b>Preferred Member</b> - Defines which system will be the NFS server for
6017 this service if more than one cluster member is operational.</li>
6018
6019 <li>
6020 <b>Relocation Policy</b> - whether to relocate the service to the preferred
6021 member if the preferred member wasn't running at the time the service was
6022 initially started.&nbsp; This parameter is useful as a means of load balancing
6023 the cluster members as NFS servers by assigning half the load to each.</li>
6024
6025 <li>
6026 <b>IP Address</b> - NFS clients access filesystems from an NFS server which
6027 is designated by its IP Address (or associated hostname).&nbsp; In order
6028 to abstract NFS clients from knowing which specific cluster member is the
6029 acting NFS server, the client systems should not use the cluster member's
6030 hostname as the IP address by which a service is mounted.&nbsp; Rather,
6031 clustered NFS services are assigned <i>floating</i> IP addresses which
6032 are distinct from the cluster server's IP addresses.&nbsp; This floating
6033 IP address is then configured on which ever cluster member is actively
6034 serving the NFS export.&nbsp; Following this approach, the NFS clients
6035 are only aware of the floating IP address and are unaware of the fact that
6036 clustered NFS server has been deployed.&nbsp; When you enter an NFS service's
6037 IP address, you will also be prompted to enter an associated netmask and
6038 broadcast address.&nbsp; If you select the default of None, then the assigned
6039 netmask and broadcast will be the same as what the network interface is
6040 currently configured to.</li>
6041
6042 <li>
6043 <b>Mount Information</b> - for non-clustered filesystems, the mount information
6044 is typically placed in /<b>etc/fstab</b>.&nbsp; In contrast, clustered
6045 filesystems must <b>not</b> be placed in <b>/etc/fstab</b>.&nbsp; This
6046 is necessary to ensure that only one cluster member at a time has the filesystem
6047 mounted.&nbsp; Failure to do so will result in filesystem corruption and
6048 likely system crashes.</li>
6049
6050 <ul>
6051 <li>
6052 <b>Device special file</b> - The mount information designates the disk's
6053 device special file and the directory on which the filesystem will be mounted.&nbsp;
6054 In the process of configuring an NFS service you will be prompted for this
6055 information.</li>
6056
6057 <li>
6058 <b>Mount point directory</b> - An NFS service can include more than one
6059 filesystem mount.&nbsp; In this manner, the filesystems will be grouped
6060 together as a single failover unit.</li>
6061
6062 <li>
6063 <b>Mount options</b> - The mount information also designates the mount
6064 options.&nbsp; Note: by default, the Linux NFS server does not guarantee
6065 that all write operations are synchronously written to disk.&nbsp; In order
6066 to ensure synchronous writes you must specify the <b>sync</b> mount option.&nbsp;
6067 Specifying the <b>sync</b> mount option favors data integrity at the expense
6068 of performance. Refer to <i>mount(8)</i> for detailed descriptions of the
6069 mount related parameters.</li>
6070
6071 <li>
6072 <b>Forced unmount </b>- As part of the mount information, you will be prompted
6073 as to whether forced unmount should be enabled or not.&nbsp; When forced
6074 unmount is enabled, if any applications running on the cluster server have
6075 the designated filesystem mounted when the service is being disabled or
6076 relocated, then that application will be killed off to allow the unmount
6077 to proceed.</li>
6078 </ul>
6079
6080 <li>
6081 <b>Export Information</b> - for non-clustered NFS services, export information
6082 is typically placed in <b>/etc/exports</b>.&nbsp; In contrast, clustered
6083 NFS services should <b>not </b>place export information in <b>/etc/exports</b>;
6084 rather you will be prompted for this information during service configuration.&nbsp;
6085 Export information includes:</li>
6086
6087 <ul>
6088 <li>
6089 <b>Export directory</b> - the export directory can be the same as the mount
6090 point specified with the mount information.&nbsp; In this case, the entire
6091 filesystem is accessible through NFS.&nbsp; Alternatively, you may wish
6092 to only export a portion (subdirectory) of a mounted filesystem.&nbsp;
6093 By exporting subdirectories of a mountpoint, you can also specify different
6094 access rights to different sets of NFS clients.</li>
6095
6096 <li>
6097 <b>Export client names</b> - this parameter defines which systems will
6098 be allowed to access the filesystem as NFS clients.&nbsp; Here you can
6099 individually designate systems (e.g. fred), or you can use wildcards to
6100 allow groups of systems (e.g. *.wizzbang.com).&nbsp; Entering a client
6101 name of * allows any client to mount the filesystem.</li>
6102
6103 <li>
6104 <b>Export client options</b> - this parameter defines the access rights
6105 afforded to the corresponding client(s).&nbsp; Examples include <b>ro</b>
6106 (read only), and <b>rw</b> (read write).&nbsp;&nbsp; Unless explicitly
6107 specified otherwise, the default export options are <b>ro,async,wdelay,root_squash</b>.</li>
6108 </ul>
6109 Refer to <i>exports(5)</i> for detailed descriptions of the export parameter
6110 syntax.</ul>
6111 When running the <b>cluadmin</b> utility to configure NFS services:
6112 <ul>
6113 <li>
6114 Please take care that you correctly enter the service parameters.&nbsp;
6115 The validation logic associated with NFS parameters is currently not very
6116 robust.</li>
6117
6118 <li>
6119 In response to most of the prompts, you can enter the <b>? </b>character
6120 to obtain descriptive help text.</li>
6121 </ul>
6122
6123 <h4>
6124 Example NFS Service Configuration</h4>
6125 In order to illustrate the configuration process for an NFS service, an
6126 example configuration is described in this section.&nbsp; This example
6127 consists of setting up a single NFS export which houses the home directories
6128 of 4 members of the accounting team.&nbsp; NFS client access will be restricted
6129 to these 4 user's systems.
6130 <p>The following are the service configuration parameters which will be
6131 used as well as some descriptive commentary.
6132 <br>&nbsp;
6133 <ul>
6134 <li>
6135 Service Name - <b>nfs_accounting</b>. This name was chosen as a reminder
6136 of the service's intended function to provide exports to the members of
6137 the accounting team.</li>
6138
6139 <li>
6140 Preferred Member - <b>clu4</b>.&nbsp; In this example cluster, the member
6141 names are clu3 and clu4.</li>
6142
6143 <li>
6144 IP Address - <b>10.0.0.10</b>.&nbsp; There is a corresponding hostname
6145 of clunfsacct associated with this IP address, by which NFS clients mount
6146 the filesystem.&nbsp; Note that this IP address is distinct from that of
6147 both cluster members (clu3 and clu4).&nbsp; The default netmask and broadcast
6148 address will be used.</li>
6149
6150 <li>
6151 Mount Information - /<b>dev/sdb10</b>, which refers to the partition on
6152 the shared storage RAID box on which the filesystem will be physically
6153 stored. <b>ext3 </b>- referring to the filesystem type which was specified
6154 when the filesystem was created.&nbsp; <b>/mnt/users/accounting</b> - specifies
6155 the filesystem mount point. <b>rw,nosuid,sync</b> - are the mount options.</li>
6156
6157 <li>
6158 Export Information - for this example, the entire mounted filesystem will
6159 be made accessible on a read write basis by four members of the accounting
6160 team.&nbsp; The names of the systems used by these four team members are
6161 <b>burke</b>,
6162 <b>stevens</b>,
6163 <b>needle</b>
6164 and <b>dwalsh</b>.</li>
6165 </ul>
6166 The following is an excerpt of the /etc/hosts file used to represent IP
6167 addresses and associated hostnames used within the cluster:
6168 <pre>10.0.0.3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clu3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # cluster member</pre>
6169
6170 <pre>10.0.0.4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clu4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # second cluster member</pre>
6171
6172 <pre>10.0.0.10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clunfsacct&nbsp;&nbsp;&nbsp; # floating IP address associated with accounting team NFS service</pre>
6173
6174 <pre>10.0.0.11&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clunfseng&nbsp;&nbsp;&nbsp;&nbsp; # floating IP address associated with engineering team NFS service</pre>
6175 The following is excerpted from running <b>cluadmin</b> to configure this
6176 example NFS service:
6177 <p>cluadmin> <b>service add</b>
6178 <pre>Service name: <b>nfs_accounting
6179 </b>Preferred member [None]: clu4
6180 Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b>yes
6181 </b>User script (e.g., /usr/foo/script or None) [None]:&nbsp;
6182 Do you want to add an IP address to the service (yes/no/?) [no]: <b>yes
6183
6184 </b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IP Address Information
6185
6186 IP address: <b>10.0.0.10
6187 </b>Netmask (e.g. 255.255.255.0 or None) [None]:&nbsp;&nbsp;&nbsp;&nbsp;
6188 Broadcast (e.g. X.Y.Z.255 or None) [None]:&nbsp;
6189 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses [f]:&nbsp;
6190 Do you want to add a disk device to the service (yes/no/?) [no]: <b>yes
6191
6192 </b>Disk Device Information
6193
6194 Device special file (e.g., /dev/sdb4): <b>/dev/sdb10
6195 </b>Filesystem type (e.g., ext2, ext3 or None): <b>ext3
6196 </b>Mount point (e.g., /usr/mnt/service1) [None]: <b>/mnt/users/accounting
6197 </b>Mount options (e.g., rw,nosuid,sync): <b>rw,nosuid,sync
6198 </b>Forced unmount support (yes/no/?) [yes]:&nbsp;
6199 Would you like to allow NFS access to this filesystem (yes/no/?)&nbsp; [no]: <b>yes
6200
6201 </b>You will now be prompted for the NFS export configuration:&nbsp;
6202
6203 Export directory name: <b>/mnt/users/accounting
6204
6205 </b>Authorized NFS clients
6206
6207 Export client name [*]: <b>burke
6208 </b>Export client options [None]: <b>rw
6209 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>a
6210
6211 </b>Export client name [*]: <b>stevens
6212 </b>Export client options [None]: <b>rw
6213 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>a
6214
6215 </b>Export client name [*]: <b>needle
6216 </b>Export client options [None]: <b>rw
6217 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>a
6218
6219 </b>Export client name [*]: <b>dwalsh
6220 </b>Export client options [None]: <b>rw
6221 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>f
6222 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]:&nbsp;
6223 Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]:&nbsp;
6224 Disable service (yes/no/?) [no]:&nbsp;
6225 name: nfs_eng
6226 disabled: no
6227 preferred node: clu4
6228 relocate: yes
6229 user script: None
6230 IP address 0: 10.0.0.10
6231 &nbsp; netmask 0: None
6232 &nbsp; broadcast 0: None
6233 device 0: /dev/sdb10
6234 &nbsp; mount point, device 0: /mnt/users/accounting
6235 &nbsp; mount fstype, device 0: ext3
6236 &nbsp; mount options, device 0: rw,nosuid,sync
6237 &nbsp; force unmount, device 0: yes
6238 NFS export 0: /mnt/users/accounting
6239 &nbsp; Client 0: burke, rw
6240 &nbsp; Client 1: stevens, rw
6241 &nbsp; Client 2: needle, rw
6242 &nbsp; Client 3: dwalsh, rw
6243 Add nfs_eng service as shown? (yes/no/?) yes
6244 Added nfs_eng.
6245 cluadmin></pre>
6246
6247 <h4>
6248 NFS Client Access</h4>
6249 The NFS usage model for clients is completely unchanged from its normal
6250 approach.&nbsp; Following the prior example, if a client system wishes
6251 to mount the highly available NFS service, it simply needs to have an entry
6252 like the following in its <b>/etc/fstab</b> file:
6253 <pre>clunfsacct:/mnt/users/accounting&nbsp; /mnt/users/&nbsp; nfs bg 0 0</pre>
6254
6255 <h4>
6256 Active-Active NFS Configuration</h4>
6257 In the previous section, an example configuration of a simple NFS service
6258 was discussed.&nbsp; This section describes how to setup a more complex
6259 NFS service.
6260 <p>The example in this section involves configuring a pair of highly available
6261 NFS services.&nbsp; In this example, suppose you had 2 separate teams of
6262 users who will be accessing NFS filesystems served by the cluster.&nbsp;
6263 To serve these users, two separate NFS services will be configured.&nbsp;
6264 Each service will have its own separate IP address and be preferred to
6265 distinct cluster members.&nbsp; In this manner, under normal operating&nbsp;
6266 circumstances, when both cluster members are running, each will be NFS
6267 exporting one of the filesystems.&nbsp; This enables you to most effectively
6268 utilize the capacity of your two server systems.&nbsp; In the event of
6269 a failure (or planned maintenance) on either of the cluster members, then
6270 both NFS services will be running on the surviving cluster member.
6271 <p>This example configuration will expand upon the NFS service created
6272 in the prior section by adding in a second service.&nbsp; The following
6273 service configuration parameters apply to this second service:
6274 <br>&nbsp;
6275 <blockquote>
6276 <li>
6277 Service Name - <b>nfs_engineering</b>. This name was chosen as a reminder
6278 of the service's intended function to provide NFS exports to the members
6279 of the engineering team.</li>
6280
6281 <li>
6282 Preferred Member - <b>clu3</b>.&nbsp; In this example cluster, the member
6283 names are clu3 and clu4.&nbsp; Note that here we specify clu3 because the
6284 other cluster service (nfs_accounting) has clu4 specified as its preferred
6285 server.</li>
6286
6287 <li>
6288 IP Address - <b>10.0.0.11</b>.&nbsp; There is a corresponding hostname
6289 of clunfseng associated with this IP address, by which NFS clients mount
6290 the filesystem.&nbsp; Note that this IP address is distinct from that of
6291 both cluster members (clu3 and clu4).&nbsp; Also note that this IP address
6292 is different from the one associated with the other NFS service (nfs_accounting).
6293 The default netmask and broadcast address will be used.</li>
6294
6295 <li>
6296 Mount Information - /<b>dev/sdb11</b>, which refers to the partition on
6297 the shared storage RAID box on which the filesystem will be physically
6298 stored. <b>ext2 </b>- referring to the filesystem type which was specified
6299 when the filesystem was created.&nbsp; <b>/mnt/users/engineering</b> -
6300 specifies the filesystem mount point. r<b>w,nosuid,sync</b> - are the mount
6301 options.</li>
6302
6303 <li>
6304 Export Information - for this example, individual subdirectories of the
6305 mounted filesystem will be made accessible on a read write basis by three
6306 members of the engineering team.&nbsp; The names of the systems used by
6307 these three team members are <b>ferris</b>,
6308 <b>denham</b>, and <b>brown</b>.&nbsp;
6309 Also to make this example more illustrative, you will see that each team
6310 member will only be able to NFS mount their specific subdirectory.</li>
6311 </blockquote>
6312 Shown below is an excerpt output from running cluadmin to create this second
6313 NFS service on the same cluster as used in the prior example when the service
6314 nfs_accounting was created.
6315 <br>&nbsp;
6316 <pre>cluadmin> <b>service add
6317
6318 </b>Service name: nfs_engineering
6319 Preferred member [None]: <b>clu3
6320 </b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b>yes
6321 </b>User script (e.g., /usr/foo/script or None) [None]:&nbsp;
6322 Do you want to add an IP address to the service (yes/no/?) [no]: <b>yes
6323
6324 </b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IP Address Information
6325
6326 IP address: <b>10.0.0.11
6327 </b>Netmask (e.g. 255.255.255.0 or None) [None]:&nbsp;
6328 Broadcast (e.g. X.Y.Z.255 or None) [None]:&nbsp;
6329 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses [f]: <b>f
6330 </b>Do you want to add a disk device to the service (yes/no/?) [no]: <b>yes
6331
6332 </b>Disk Device Information
6333
6334 Device special file (e.g., /dev/sdb4): <b>/dev/sdb11
6335 </b>Filesystem type (e.g., ext2, ext3 or None): <b>ext2
6336 </b>Mount point (e.g., /usr/mnt/service1) [None]: <b>/mnt/users/engineering
6337 </b>Mount options (e.g., rw,nosuid,sync): <b>rw,nosuid,sync
6338 </b>Forced unmount support (yes/no/?) [yes]:&nbsp;
6339 Would you like to allow NFS access to this filesystem (yes/no/?)&nbsp; [no]: <b>yes
6340
6341 </b>You will now be prompted for the NFS export configuration:&nbsp;
6342
6343 Export directory name: <b>/mnt/users/engineering/ferris
6344
6345 </b>Authorized NFS clients
6346
6347 Export client name [*]: <b>ferris
6348 </b>Export client options [None]: <b>rw
6349 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>f
6350 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: <b>a
6351
6352 </b>Export directory name: <b>/mnt/users/engineering/denham
6353
6354 </b>Authorized NFS clients
6355
6356 Export client name [*]: <b>denham
6357 </b>Export client options [None]: <b>rw
6358 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]:&nbsp;
6359 Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: <b>a
6360
6361 </b>Export directory name: <b>/mnt/users/engineering/brown
6362
6363 </b>Authorized NFS clients
6364
6365 Export client name [*]: <b>brown
6366 </b>Export client options [None]: <b>rw
6367 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS CLIENTS, or are you (f)inished adding CLIENTS [f]: <b>f
6368 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how NFS EXPORTS, or are you (f)inished adding EXPORTS [f]: <b>a
6369 </b>Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]:&nbsp;
6370 Disable service (yes/no/?) [no]:&nbsp;
6371 name: nfs_engineering
6372 disabled: no
6373 preferred node: clu3
6374 relocate: yes
6375 user script: None
6376 IP address 0: 10.0.0.11
6377 &nbsp; netmask 0: None
6378 &nbsp; broadcast 0: None
6379 device 0: /dev/sdb11
6380 &nbsp; mount point, device 0: /mnt/users/engineering
6381 &nbsp; mount fstype, device 0: ext2
6382 &nbsp; mount options, device 0: rw,nosuid,sync
6383 &nbsp; force unmount, device 0: yes
6384 NFS export 0: /mnt/users/engineering/ferris
6385 &nbsp; Client 0: ferris, rw
6386 NFS export 0: /mnt/users/engineering/denham
6387 &nbsp; Client 0: denham, rw
6388 NFS export 0: /mnt/users/engineering/brown
6389 &nbsp; Client 0: brown, rw
6390 Add nfs_engineering service as shown? (yes/no/?) yes
6391 Added nfs_engineering.
6392 cluadmin></pre>
6393
6394 <h4>
6395 NFS Caveats</h4>
6396 The following points need to be taken into consideration when clustered
6397 NFS services are configured.
6398 <h5>
6399 Avoid using `exportfs -r`</h5>
6400 Filesystems being NFS exported by cluster members do not get specified
6401 in the conventional <b>/etc/exports </b>file.&nbsp; Rather, the NFS exports
6402 associated&nbsp; with a cluster services are specified in the cluster configuration
6403 file (as established by <b>cluadmin</b>).
6404 <p>The command <b><i>exportfs -r </i></b>removes any exports which are
6405 not explicitly specified in the <b>/etc/exports</b> file.&nbsp; Running
6406 this command will cause the clustered NFS services to become unavailable
6407 until the service is restarted. For this reason you should avoid using
6408 the <b><i>exportfs -r </i></b>command on a cluster on which highly available
6409 NFS services are configured.&nbsp; To recover from unintended usage of
6410 <b>exportfs
6411 -r</b>, the NFS cluster service must be stopped and then restarted.
6412 <br>&nbsp;
6413 <h5>
6414 NFS File Locking</h5>
6415 NFS file locks are <b>not</b> preserved across a failover or service relocation.&nbsp;
6416 This is due to the fact that the Linux NFS implementation stores file locking
6417 information in system files.&nbsp; These system files representing NFS
6418 locking state are not replicated across the cluster.&nbsp; The implication
6419 is that locks may be regranted subsequent to the failover operation.
6420 <br>&nbsp;
6421 <h3>
6422 <a NAME="Setting up a Samba Service"></a></h3>
6423
6424 <h3>
6425 4.1.8 Setting Up a High Availability Samba Service</h3>
6426 <i>(Editorial Note: this is a preliminary writeup, its rough - needing
6427 editorial cleanup.)</i>
6428 <p>Highly available network file services are one of the key strengths
6429 of the clustering infrastructure.&nbsp; Advantages of high availibility
6430 Samba services include:
6431 <ul>
6432 <li>
6433 Provides heterogeneous file serving capabilities to Windows (trademark
6434 symbol?) based clients using the CIFS/SMB protocol.</li>
6435
6436 <li>
6437 Allows the same set of filesystems to be simultaneously network served
6438 to both NFS and Windows based clients.</li>
6439
6440 <li>
6441 Ensures that Windows based clients maintain access to key data, or allowed
6442 to quickly reestablish connection in the event of server failure.</li>
6443
6444 <li>
6445 Facilitates planned maintenance by allowing you to transparently relocate
6446 Samba services to one cluster member, allowing you to fix or upgrade the
6447 other cluster member.</li>
6448
6449 <li>
6450 Allows you to setup an active-active configuration to maximize equipment
6451 utilization.&nbsp; More details on active-active configurations appear
6452 below.</li>
6453 </ul>
6454 Note: a complete description of Samba configuration is beyond the scope
6455 of this document.&nbsp; Rather, this documention merely highlights aspects
6456 which are crucial for clustered operation.&nbsp; Refer to <i>&lt;&lt;tbd
6457 link in RH documentation>></i> for more details on Samba configuration.&nbsp;
6458 As a prerequisite to configuring high availability Samba services, you
6459 should know how to configure conventional non-clustered Samba fileserving.
6460 <br>&nbsp;
6461 <h4>
6462 Samba Server Requirements</h4>
6463 If you intend to create highly available Samba services, then there are
6464 a few requirements which must be met by each cluster server. These requirements
6465 include:
6466 <ul>
6467 <li>
6468 The Samba RPM packages must be installed.&nbsp; For example: <b>samba</b>,
6469 <b>samba-common</b>.
6470 There have been no modifications to the Samba RPMs themselves in support
6471 of high availability.</li>
6472
6473 <li>
6474 The Samba daemons will be started and stopped&nbsp; up by the cluster infrastructure
6475 on a per-service basis.&nbsp; Consequently, the Samba configuration information
6476 should not be specified in the conventional <b>/etc/samba/smb.conf</b>.
6477 The automated system startup of the samba daemons smbd and nmbd should
6478 not be enabled in&nbsp; init.d run levels&nbsp; For example:
6479 <b>chkconfig
6480 --del smb</b>.</li>
6481
6482 <li>
6483 Since the cluster infrastructure stops the cluster related samba deamons
6484 appropriately, system administrators should not manually run the conventional
6485 samba stop script (e.g. <b>service smb stop</b>) as this will terminate
6486 all cluster related samba daemons.</li>
6487
6488 <li>
6489 Filesystem mounts&nbsp; for clustered Samba services should not be included
6490 in <b>/etc/fstab.</b>&nbsp; Rather, for clustered services, the parameters
6491 describing mounts are entered via the <b>cluadmin</b> configuration utility.</li>
6492
6493 <li>
6494 Failover of samba printer shares is not currently supported.</li>
6495
6496 <li>
6497 <i>Editorial Note - need to describe the incorporation of kernel patches,
6498 once we work that out.</i></li>
6499 </ul>
6500
6501 <h4>
6502 Samba Operating Model</h4>
6503 This section provides background information describing the implementation
6504 model in support of Samba high availability services.&nbsp; Knowledge of
6505 this information will provide the context for understanding the configuration
6506 requirements of clustered Samba services.
6507 <p>The conventional non-clustered Samba configuration model consists of
6508 editing the <b>/etc/samba/smb.conf </b>file to designate which filesystems
6509 are to be made network accessible to the specified Windows clients.&nbsp;
6510 It also designates access permissions and other mapping capabilities.&nbsp;
6511 In the single system model, a single instance of each of the <b>smbd</b>
6512 and <b>nmbd </b>daemons are automatically started up by the init.d run
6513 level script <b>smb</b>.
6514 <p>In order to implement high availibility Samba services, rather than
6515 having a single /<b>etc/samba/smb.conf </b>file; there is an individual
6516 per-service samba configuration file.&nbsp; These files are called /<b>etc/samba/smb.conf.sharename</b>;
6517 where <b>sharename</b> is replaced by the specific name of the individual
6518 configuration file associated with a Samba service.&nbsp; For example,
6519 suppose you wished to call one share <b>eng </b>and another share <b>acct,</b>
6520 the corresponding Samba configuration files would be <b>/etc/samba/smb.conf.eng</b>
6521 and /<b>etc/samba/smb.conf.acct.</b>
6522 <p>The format of the <b>smb.conf.sharename</b> file is identical to the
6523 conventional <b>smb.conf</b> format.&nbsp; No additional fields have been
6524 created for clustered operation. There are several fields within the <b>smb.conf.sharename</b>
6525 file which are required for correct cluster operation; these fields will
6526 be described in an upcoming section.&nbsp; When a new Samba service is
6527 created using the <b>cluadmin</b> utility, a default template <b>smb.conf.sharename</b>
6528 file will be created based on the service specific parameters.&nbsp; This
6529 file should be used as a starting point from which the system administrator
6530 should then adjust to add in the appropriate Windows client systems, specific
6531 directories to share as well as permissions.
6532 <p>The system administrator is required to copy the <b>/etc/samba/smb.conf.sharename</b>
6533 files onto both cluster members.&nbsp; After the initial configuration
6534 time, should any changes be made to any <b>smb.conf.sharename</b> file,
6535 it is necessary to also copy this updated version to the other cluster
6536 member.
6537 <p>To facilitate high availibility Samba functionality, each individual
6538 Samba service configured within the cluster (via <b>cluadmin</b>) will
6539 have its own individual pair of <b>smbd</b>/<b>nmbd</b> daemons.&nbsp;
6540 Consequently, if there are more than one Samba services configured with
6541 the cluster, you may see multiple instances of these daemon pairs running
6542 on an individual cluster server.&nbsp; These Samba daemons <b>smbd</b>/<b>nmbd
6543 </b>are
6544 not initiated via the conventional init.d run level scripts; rather they
6545 are initiated by the cluster infrastructure based on which node is the
6546 active service provider.
6547 <p>In order to allow a single system to run multiple instances of the Samba
6548 daemons, each pair of daemons is required to have its own locking directory.&nbsp;
6549 Consequently, there will be a separate per-service Samba daemon locking
6550 directory.&nbsp; This directory is given a the name <b>/var/cache/samba/sharename</b>;
6551 where <b>sharename</b> is replaced by the Samba share name specified within
6552 the service configuration information (via <b>cluadmin</b>).&nbsp; Following
6553 the prior example, the corresponding lock directories would be <b>/var/cache/samba/eng</b>
6554 and <b>/var/cache/samba/acct</b>.
6555 <p>When the <b>cluadmin</b> utility is used to configure a Samba service,
6556 the <b>/var/cache/samba/sharename</b> directory will be automatically created
6557 on the system on which the <b>cluadmin</b> utility is running.&nbsp; At
6558 this time a reminder will be displayed that you need to manually create
6559 this lock directory on the other cluster member.&nbsp; For example: <b>mkdir
6560 /var/cache/samba/eng</b>.
6561 <br>&nbsp;
6562 <h4>
6563 Gathering Samba Service Configuration Parameters</h4>
6564 In preparation of configuring Samba services you need to determine configuration
6565 information such as which filesystems will be presented as shares to Windows
6566 based clients.&nbsp; The following information is required in order to
6567 configure NFS services:
6568 <ul>
6569 <li>
6570 <b>Service Name</b> - A name used to uniquely identify this service within
6571 the cluster.</li>
6572
6573 <li>
6574 <b>Preferred Member</b> - Defines which system will be the Samba server
6575 for this service when more than one cluster member is operational.</li>
6576
6577 <li>
6578 <b>Relocation Policy</b> - whether to relocate the service to the preferred
6579 member if the preferred member wasn't running at the time the service was
6580 initially started.&nbsp; This parameter is useful as a means of load balancing
6581 the cluster members as Sambaservers by assigning half the load to each.</li>
6582
6583 <li>
6584 <b>Status Check Interval - </b>specifies how often (in seconds) the cluster
6585 subsystem should verify that the pair of Samba daemons <b>smbd</b>/<b>nmbd</b>
6586 which are associated with this service are running.&nbsp; In the event
6587 that either of these daemons have unexpectedly exited, they will be automatically
6588 restarted to resume services.&nbsp; If you specify a value of 0, then no
6589 monitoring will be performed.&nbsp; For example, designating an interval
6590 of 90 seconds will result in monitoring at that interval.</li>
6591
6592 <li>
6593 <b>IP Address</b> - Windows clients access file shares from an server as
6594 designated by its IP Address (or associated hostname).&nbsp; In order to
6595 abstract Windows clients from knowing which specific cluster member is
6596 the acting Samba server, the client systems should not use the cluster
6597 member's hostname as the IP address by which a service is accessed.&nbsp;
6598 Rather, clustered Samba services are assigned <i>floating</i> IP addresses
6599 which are distinct from the cluster server's IP addresses.&nbsp; This floating
6600 IP address is then configured on which ever cluster member is actively
6601 serving the share.&nbsp; Following this approach, the Windows clients are
6602 only aware of the floating IP address and are unaware of the fact that
6603 clustered SAmba services have been deployed.&nbsp; When you enter a Samba
6604 service's IP address, you will also be prompted to enter an associated
6605 netmask and broadcast address.&nbsp; If you select the default of None,
6606 then the assigned netmask and broadcast will be the same as what the network
6607 interface is currently configured to.</li>
6608
6609 <li>
6610 <b>Mount Information</b> - for non-clustered filesystems, the mount information
6611 is typically placed in /<b>etc/fstab</b>.&nbsp; In contrast, clustered
6612 filesystems must <b>not</b> be placed in <b>/etc/fstab</b>.&nbsp; This
6613 is necessary to ensure that only one cluster member at a time has the filesystem
6614 mounted.&nbsp; Failure to do so will result in filesystem corruption and
6615 likely system crashes.</li>
6616
6617 <ul>
6618 <li>
6619 <b>Device special file</b> - The mount information designates the disk's
6620 device special file and the directory on which the filesystem will be mounted.&nbsp;
6621 In the process of configuring a Samba service you will be prompted for
6622 this information.</li>
6623
6624 <li>
6625 <b>Mount point directory</b> - A Samba service can include more than one
6626 filesystem mount.&nbsp; In this manner, the filesystems will be grouped
6627 together as a single failover unit.</li>
6628
6629 <li>
6630 <b>Mount options</b> - The mount information also designates the mount
6631 options.</li>
6632
6633 <li>
6634 <b>Forced unmount </b>- As part of the mount information, you will be prompted
6635 as to whether forced unmount should be enabled or not.&nbsp; When forced
6636 unmount is enabled, if any applications running on the cluster server have
6637 the designated filesystem mounted when the service is being disabled or
6638 relocated, then that application will be killed off to allow the unmount
6639 to proceed.</li>
6640 </ul>
6641
6642 <li>
6643 <b>Export Information</b> - this information is required for NFS services
6644 only.&nbsp; If you are only performing file serving to Windows based clients,
6645 answer <i>no</i> when prompted regarding NFS exports.&nbsp; Alternatively,&nbsp;
6646 you can configure a service to perform heterogeneous file serving by designating
6647 both NFS exports parameters and the Samba share parameter.</li>
6648
6649 <li>
6650 <b>Samba Share Name</b> - In the process of configuring&nbsp; a service
6651 you will be asked if you wish to share the filesystem to Windows clients.&nbsp;
6652 If you answer <i>yes</i> to this question, you will then be prompted for
6653 the Samba share name.&nbsp; Based on the name you specify here, there will
6654 be a corresponding <b>/etc/samba/smb.conf.sharename</b> file and lock directory
6655 <b>/var/cache/samba/sharename</b>.&nbsp;
6656 By convention the actual Windows share name specified within the smb.conf.sharename
6657 will be set in accordance with this parameter.&nbsp; In practice, you can
6658 designate more than one Samba share within an individual <b>smb.conf.sharename</b>
6659 file. There can be at most 1 samba configuration specified per service;
6660 which must be&nbsp; specified with the first device. For example, if you
6661 have multiple disk devices (and corresponding filesystem mounts) within
6662 a single service, then specify a single <b>sharename</b> for the service.&nbsp;
6663 Then within the <b>/etc/samab/smb.conf.sharename</b> file, designate multiple
6664 individual samba shares to share directories from the multiple devices.
6665 To disable samba sharing of a service, the share name should be set to
6666 <b>None</b>.</li>
6667 </ul>
6668 When running the <b>cluadmin</b> utility to configure Samba services:
6669 <ul>
6670 <li>
6671 Please take care that you correctly enter the service parameters.&nbsp;
6672 The validation logic associated with Samba parameters is currently not
6673 very robust.</li>
6674
6675 <li>
6676 In response to most of the prompts, you can enter the <b>? </b>character
6677 to obtain descriptive help text.</li>
6678
6679 <li>
6680 After configuring a Samba service via <b>cluadmin</b>, remember to tune
6681 the <b>/etc/samba/smb.conf.sharename</b> file for each service in accordance
6682 with the clients and authorization scheme you desire.</li>
6683
6684 <li>
6685 Remember to copy the <b>smb.conf.sharename</b> file over to the other cluster
6686 member.</li>
6687
6688 <li>
6689 Perform the recommended step to create the Samba daemon's lock directory
6690 on the other cluster member, eg <b>mkdir /var/cache/samba/acct.</b></li>
6691
6692 <li>
6693 If you delete a Samba service, be sure to manually remove the <b>/etc/samba/smb.conf/sharename
6694 </b>file.&nbsp;
6695 The <b>cluadmin</b> utility does not autmoatically delete this file in
6696 order to preserve your site specific configuration parameters for possible
6697 later usage.</li>
6698 </ul>
6699
6700 <h4>
6701 Example Samba Service Configuration</h4>
6702 In order to illustrate the configuration process for a Samba service, an
6703 example configuration is described in this section.&nbsp; This example
6704 consists of setting up a single Samba share which houses the home directories
6705 of 4 members of the accounting team.&nbsp; The accounting team will then
6706 access this share from their Windows based systems.
6707 <p>The following are the service configuration parameters which will be
6708 used as well as some descriptive commentary.
6709 <br>&nbsp;
6710 <ul>
6711 <li>
6712 Service Name - <b>samba_acct.</b> This name was chosen as a reminder of
6713 the service's intended function to provide exports to the members of the
6714 accounting team.</li>
6715
6716 <li>
6717 Preferred Member - <b>clu4</b>.&nbsp; In this example cluster, the member
6718 names are clu3 and clu4.</li>
6719
6720 <li>
6721 Monitoring Interval - <b>90</b> seconds.</li>
6722
6723 <li>
6724 IP Address - <b>10.0.0.10</b>.&nbsp; There is a corresponding hostname
6725 of cluacct associated with this IP address, by which Windows based clients
6726 access the share.&nbsp; Note that this IP address is distinct from that
6727 of both cluster members (clu3 and clu4).&nbsp; The default netmask and
6728 broadcast address will be used.</li>
6729
6730 <li>
6731 Mount Information - /<b>dev/sdb10</b>, which refers to the partition on
6732 the shared storage RAID box on which the filesystem will be physically
6733 stored. <b>ext2 </b>- referring to the filesystem type which was specified
6734 when the filesystem was created.&nbsp; <b>/mnt/users/accounting</b> - specifies
6735 the filesystem mount point. <b>rw,nosuid,sync</b> - are the mount options.</li>
6736
6737 <li>
6738 Export Information - for simplicity in this example, the filesystem is
6739 not being NFS exported.</li>
6740
6741 <li>
6742 Share Name - <b>acct</b> - this is the share name by which Windows based
6743 clients will access this Samba share, e.g. \\10.0.0.10\acct.</li>
6744 </ul>
6745 The following is an excerpt of the /etc/hosts file used to represent IP
6746 addresses and associated hostnames used within the cluster:
6747 <pre>10.0.0.3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clu3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # cluster member</pre>
6748
6749 <pre>10.0.0.4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; clu4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # second cluster member</pre>
6750
6751 <pre>10.0.0.10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; cluacct&nbsp;&nbsp;&nbsp; # floating IP address associated with accounting team NFS service</pre>
6752 The following is excerpted from running <b>cluadmin</b> to configure this
6753 example Samba service:
6754 <pre>Service name: <b>samba_acct
6755 </b>Preferred member [None]: <b>clu4
6756 </b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes
6757 User script (e.g., /usr/foo/script or None) [None]:&nbsp;
6758 Status check interval [0]: <b>90
6759 </b>Do you want to add an IP address to the service (yes/no/?) [no]: yes
6760
6761 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IP Address Information
6762
6763 IP address: <b>10.0.0.10
6764 </b>Netmask (e.g. 255.255.255.0 or None) [None]:&nbsp;
6765 Broadcast (e.g. X.Y.Z.255 or None) [None]:&nbsp;
6766 Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or are you (f)inished adding IP addresses [f]:&nbsp;
6767 Do you want to add a disk device to the service (yes/no/?) [no]: yes
6768
6769 Disk Device Information
6770
6771 Device special file (e.g., /dev/sdb4): <b>/dev/sdb12
6772 </b>Filesystem type (e.g., ext2, ext3 or None): <b>ext2
6773 </b>Mount point (e.g., /usr/mnt/service1) [None]: <b>/mnt/users/accounting
6774 </b>Mount options (e.g., rw,nosuid,sync): <b>rw,nosuid,sync
6775 </b>Forced unmount support (yes/no/?) [yes]:&nbsp;
6776 Would you like to allow NFS access to this filesystem (yes/no/?)&nbsp; [no]:&nbsp;
6777 Would you like to share to Windows clients (yes/no/?)&nbsp; [no]: <b>yes
6778
6779 </b>You will now be prompted for the Samba configuration:&nbsp;
6780 Samba share name: <b>acct
6781
6782 </b>The samba config file /etc/samba/smb.conf.acct does not exist.
6783
6784 Would you like a default config file created (yes/no/?)&nbsp; [no]: <b>yes
6785
6786 </b>Successfully created daemon lock directory /var/cache/samba/acct.
6787 Please run `mkdir /var/cache/samba/acct` on the other cluster member.
6788
6789 Successfully created /etc/samba/smb.conf.acct.
6790 Please remember to make necessary customizations and then copy the file
6791 over to the other cluster member.
6792
6793 Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or are you (f)inished adding DEVICES [f]: <b>f
6794 </b>name: samba_acct
6795 preferred node: clu4
6796 relocate: yes
6797 user script: None
6798 monitor interval: 90
6799 IP address 0: 10.0.0.10
6800 &nbsp; netmask 0: None
6801 &nbsp; broadcast 0: None
6802 device 0: /dev/sdb12
6803 &nbsp; mount point, device 0: /mnt/users/accounting
6804 &nbsp; mount fstype, device 0: ext2
6805 &nbsp; mount options, device 0: rw,nosuid,sync
6806 &nbsp; force unmount, device 0: yes
6807 &nbsp; samba share, device 0: acct
6808 Add samba_acct service as shown? (yes/no/?) <b>yes</b></pre>
6809 After running cluadmin as shown above to configure the service, remember
6810 to:
6811 <ul>
6812 <li>
6813 Customize <b>/etc/samba/smb.conf.sharename</b> accordingly.</li>
6814
6815 <li>
6816 Copy /<b>etc/samba/smb.conf.sharename</b> over to the other cluster member.</li>
6817
6818 <li>
6819 Create the suggested lock directory on the other cluster member, e.g. <b>mkdir
6820 /var/cache/samba/acct</b></li>
6821 </ul>
6822
6823 <h4>
6824 smb.conf.sharename File Fields</h4>
6825 This section describes the fieles within the <b>smb.conf.sharename</b>
6826 file which are most relevent to the correct operation of highly available
6827 Samba services.&nbsp; It is beyond the scope of this document to completely
6828 describe all of the fields within a Samba configuration file.&nbsp; There
6829 have been no additional field names added in support of clustering, the
6830 file format follows the normal Samba conventions.
6831 <p>Shown below is an example <b>smb.conf.sharename</b> file which was automatically
6832 generated by <b>cluadmin</b> in response to the service specific parameters.&nbsp;
6833 This example file matches the above <b>cluadmin</b> service configuration
6834 example. Following the file will be a description of the most relevent
6835 fields.
6836 <pre># Template samba service configuration file - please modify to specify
6837 # subdirectories and client access permissions.
6838 # Remember to copy this file over to other cluster member, and create
6839 # the daemon lock directory /var/cache/samba/acct.
6840 #
6841 # From a cluster perspective, the key fields are:
6842 # lock directory - must be unique per samba service.
6843 # bind interfaces only - must be present set to yes.
6844 # interfaces - must be set to service floating IP address.
6845 # path - must be the service mountpoint or subdirectory thereof.
6846 # Refer to the cluster documentation for details.
6847
6848 [global]
6849 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; workgroup = RHCLUSTER
6850 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; lock directory = /var/cache/samba/acct
6851 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; log file = /var/log/samba/%m.log
6852 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; encrypt passwords = yes
6853 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; bind interfaces only = yes
6854 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; interfaces = 10.0.0.10
6855
6856 [acct]
6857 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; comment = High Availability Samba Service
6858 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; browsable = yes
6859 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; writable = no
6860 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; public = yes
6861 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; path = /mnt/service12</pre>
6862 The following is a description of the most relevent fields, from a cluster
6863 perspective, in the <b>/etc/samba/smb.conf.sharename</b> file.&nbsp; In
6864 this example, the file&nbsp; is named <b>/etc/samba/smb.conf.acct </b>in
6865 accordance with the share name being specified as <b>acct</b> while running
6866 cluadmin.&nbsp; Only the cluster&nbsp; specific fields are described below.&nbsp;
6867 The remaining fields follow standard Samba convention and should be tailored
6868 accordingly.
6869 <p>Global Parameters - These parameters pertain to all shares which are
6870 specified in this smb.conf.sharename file.&nbsp; Remember that you are
6871 free to designate more than one share within this file; provided that the
6872 directories described within it are within the service's filesystem mounts.
6873 <p><b>lock directory</b> - dictates the name of the directory in which
6874 the Samba daemons <b>smbd</b>/<b>nmbd</b> will place thier locking files.&nbsp;
6875 This must be set to <b>/var/cache/samba/sharename</b>, where <b>sharename</b>
6876 varies based on the parameter specified in <b>cluadmin</b>.&nbsp; Specification
6877 of a lock directory is required in order to allow a separate per-service
6878 instance of <b>smbd</b>/<b>nmbd</b>.
6879 <br><b>bind interfaces only</b> - This parameter must be set to <b>yes</b>
6880 in order to allow each <b>smbd</b>/<b>nmbd</b> pair to bind to the floating
6881 IP address associated with this clustered Samba service.
6882 <br><b>interfaces</b> -specifies the IP address associated with the Samba
6883 service.&nbsp; If you specified a netmask within the service, this field
6884 would appear like the following example: i<b>nterfaces = 10.0.0.10/255.255.254.0</b>.
6885 <p>Share specific parameters - these parameters pertain to a specific Samba
6886 share.
6887 <br><b>writable</b> - by default, this share access permissions are conservatively
6888 set as non-writable.&nbsp; Tune according to your site-specific preferences.
6889 <br><b>path</b> - defaults to the first filesystem mount point specified
6890 within the service configuration.&nbsp; This should be adjusted to match
6891 the specific directory or subdirectory you intend to make available as
6892 a share to Windows clients.
6893 <br>&nbsp;
6894 <h4>
6895 Windows Client Access to Samba Shares</h4>
6896 Windows clients are oblivious to the fact that the shares are being served
6897 by a high availability cluster.&nbsp; From the windows client's perspective
6898 the only requirement is that they access the Samba share via its floating
6899 IP address (or associated hostname) which was configured using cluadmin,
6900 e.g. 10.0.0.10.&nbsp; The Windows clients should not directly access the
6901 share from either of the cluster member system's IP address (e.g. clu3
6902 or clu4).
6903 <p>Depending upon the authorization scheme you intend to utilize in your
6904 environment, you may have to use the <b>smbpasswd</b> command to establish
6905 Windows account information on the cluster servers. When establishing these
6906 accounts it is required that the same Samba related account information
6907 be setup on both cluster members.&nbsp; This can either be accomplished
6908 by running smbpassword similarly on both cluster members, or by copying
6909 over the resulting <b>/etc/smbpasswd</b> file. For example, to enable a
6910 Windows client system named <b>sarge</b> to access a Samba share served
6911 by the cluster members, you would run the following command on both cluster
6912 members, taking care to specify the same username and password each time:
6913 <b>smbpasswd
6914 -a sarge</b>.
6915 <p>On a Windows client, the Samba share can then be accessed in the conventional
6916 manner.&nbsp; For example you could click on the <b>Start</b> button on
6917 the main taskbar, followed by selecting <b>Run</b>.&nbsp; This brings up
6918 a dialog box in which you can specify the clustered Samba share name.&nbsp;
6919 For example: \<b>\10.0.0.10\acct </b>or equivalently <b>\\cluacct\acct</b>.
6920 To access the samba share from a Windows client you can also use the <b>Map
6921 Network Drive </b>feature. It is important to take care to ensure that
6922 the hostname portion of the share name refers to the floating service IP
6923 address.&nbsp; Following the hostname / IP addresses from the above <b>/etc/hosts</b>
6924 excerpt; the correct name to refer to this highly available cluster share
6925 is \<b>\cluacct\acct</b>.&nbsp; The share should not be accessed by referring
6926 to the name of the cluster server.&nbsp; For example, do not access this
6927 share as either <b>\\clu3\acct </b>or \<b>\clu4\acct</b>.&nbsp; If a share
6928 is incorrectly referred to by the cluster server name (e.g. \<b>\clu3\acct</b>),
6929 then the Windows client will only be able to access the share while it
6930 is being actively served by <b>clu3</b>; thereby subverting the high availability
6931 benfits.
6932 <p>Unlike the NFS protocol, the Windows based CIFS/SMB protocol is much
6933 more stateful.&nbsp; As a consequence, in the Windows environment, it is
6934 the responsibility of the individual application to take appropriate measures
6935 in response to lack of immediate response from the Samba server.&nbsp;
6936 In the case of either a planned service relocation or a true failover scenario,
6937 there is a period of time where the Windows clients will not get immediate
6938 response from the Samba server.&nbsp; Robust Windows applications will
6939 retry requests which timeout during this interval.
6940 <p>We have observed that well behaved applications correctly retry appropriately
6941 resulting in Windows clients being completely unaware of service relocations
6942 or failover operations.&nbsp; In contrast, poorly behaved Windows applications
6943 will result in error messages in the event of a failover or relocation
6944 indicating inability to access the share.&nbsp; It may be necessary to
6945 retry the operation or restart the application in order to enable Windows
6946 client systems to re-attach to a Samba share for poorly written applications.
6947 <p>The behavior of a Windows based client in response to either failover
6948 or relocation of a samba service also varies on which release of windows
6949 is installed on each client system.&nbsp; For example, Windows 98 based
6950 systems often enounter errors like <i>The network path was not found</i>.&nbsp;
6951 Whereas, later versions such as Windows 2000 transparently recover under
6952 the same set of circumstances.
6953 <p><i>Editorial comment: Add in description of the impact of the kernel
6954 patches for Stale File Handle errors once we determine whether that will
6955 be incorporated.</i>
6956 <br>&nbsp;
6957 <p><a NAME="service-apache"></a>
6958 <h3>
6959 Setting Up an Apache Service</h3>
6960 This section provides an example of setting up a cluster service that will
6961 fail over an Apache Web server. Although the actual variables that you
6962 use in the service depend on your specific configuration, the example may
6963 help you set up a service for your environment.
6964 <p><i>Editorial comment: Here the distinction of Piranha as a load balancer
6965 vs a highly available apache server for static content should be discussed.</i>
6966 <p>To set up an Apache service, you must configure both cluster systems
6967 as Apache servers. The cluster software ensures that only one cluster system
6968 runs the Apache software at one time.&nbsp; The Apache configuration will
6969 consist of installing the apache rpm's on both cluster members and configuring
6970 a shared filesystem to house the web site's content.
6971 <p>When you install the Apache software on the cluster systems, do not
6972 configure the cluster systems so that Apache automatically starts when
6973 the system boots. For example, running <b>chkconfig --del httpd</b>.&nbsp;
6974 Rather than having the system startup scripts spawn httpd, the cluster
6975 infrastructure will do that on the active cluster server for the Apache
6976 service.&nbsp; This will ensure that the corresponding IP address and filesystem
6977 mounts are active on only one cluster member at a time.
6978 <p>When you add an Apache service, you must assign it a "floating" IP address.
6979 The cluster infrastructure binds this IP address to the network interface
6980 on the cluster system that is currently running the Apache service. This
6981 IP address ensures that the cluster system running the Apache software
6982 is transparent to the HTTP clients accessing the Apache server.
6983 <p>The file systems that contain the Web content must not be automatically
6984 mounted on shared disk storage when the cluster systems boot. Instead,
6985 the cluster software must mount and unmount the file systems as the Apache
6986 service is started and stopped on the cluster systems. This prevents both
6987 cluster systems from accessing the same data simultaneously, which may
6988 result in data corruption. Therefore, do not include the file systems in
6989 the <b><font face="Courier New, Courier, mono">/etc/fstab </font></b>file.
6990 <p>Setting up an Apache service involves the following four steps:
6991 <ol>
6992 <li>
6993 Set up the shared file system for the service.&nbsp; This filesystem is
6994 used to house the web site's content.</li>
6995
6996 <li>
6997 Install the Apache software on both cluster systems.</li>
6998
6999 <li>
7000 Configure the Apache software on both cluster systems.</li>
7001
7002 <li>
7003 Add the service to the cluster database.</li>
7004 </ol>
7005 To set up the shared file systems for the Apache service, become root and
7006 perform the following tasks on one cluster system:
7007 <ol>
7008 <li>
7009 On a shared disk, use the interactive <b><font face="Courier New, Courier, mono">fdisk</font></b>
7010 command to create a partition that will be used for the Apache document
7011 root directory. Note that you can create multiple document root directories
7012 on different disk partitions. See <a href="#partition">Partitioning Disks</a>
7013 for more information.</li>
7014
7015 <br>&nbsp;
7016 <li>
7017 Use the <b><font face="Courier New, Courier, mono">mkfs</font></b> command
7018 to create an ext2 file system on the partition you created in the previous
7019 step. Specify the drive letter and the partition number. For example:</li>
7020
7021 <pre># <b>mkfs /dev/sde3</b></pre>
7022
7023 <li>
7024 Mount the file system that will contain the Web content on the Apache document
7025 root directory. For example:</li>
7026
7027 <pre># <b>mount /dev/sde3 /var/www/html</b></pre>
7028 Do not add this mount information to the <b><font face="Courier New, Courier, mono">/etc/fstab</font></b>
7029 file, because only the cluster software can mount and unmount file systems
7030 used in a service.
7031 <li>
7032 Copy all the required files to the document root directory.</li>
7033
7034 <br>&nbsp;
7035 <li>
7036 If you have CGI files or other files that must be in different directories
7037 or is separate partitions, repeat these steps, as needed.</li>
7038 </ol>
7039 You must install the Apache software on both cluster systems. Note that
7040 the basic Apache server configuration must be the same on both cluster
7041 systems in order for the service to fail over correctly. The following
7042 example shows a basic Apache Web server installation, with no third-party
7043 modules or performance tuning. To install Apache with modules, or to tune
7044 it for better performance, see the Apache documentation that is located
7045 in the Apache installation directory, or on the Apache Web site, <a href="http://www.apache.org" target="_blank">www.apache.org</a>.
7046 <p>On both cluster systems, install the Apache RPM's.&nbsp; For example:
7047 <b>apache-1.3.20-16</b>
7048 <p>To configure the cluster systems as Apache servers, customize the <b><font face="Courier New, Courier, mono">httpd.conf</font></b>
7049 Apache configuration file, and create a script that will start and stop
7050 the Apache service. Then, copy the files to the other cluster system. The
7051 files must be identical on both cluster systems in order for the Apache
7052 service to fail over correctly.
7053 <p>On one system, perform the following tasks:
7054 <ol>
7055 <li>
7056 Edit the <b><font face="Courier New, Courier, mono">/etc/httpd/conf/httpd.conf</font></b>
7057 Apache configuration file and customize the file according to your configuration.
7058 For example:</li>
7059
7060 <ul>
7061 <li>
7062 Specify the directory that will contain the HTML files. You will specify
7063 this mount point when you add the Apache service to the cluster database.
7064 You are only required to change this field if the mountpoint for the web
7065 site's content differs from the default setting of /<b>var/www/html.</b>
7066 For example:</li>
7067
7068 <pre>DocumentRoot "/mnt/apacheservice/html"</pre>
7069
7070 <li>
7071 If you have modified the script directory to reside in a non-standard location,
7072 specify the directory that will contain the CGI programs. For example:</li>
7073
7074 <pre>ScriptAlias /cgi-bin/ "/mnt/apacheservice/cgi-bin/"</pre>
7075
7076 <li>
7077 Specify the path that was used in the previous step, and set the access
7078 permissions to default for that directory. For example:</li>
7079
7080 <pre>&lt;Directory mnt/apacheservice/cgi-bin">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
7081 &nbsp; AllowOverride None
7082 &nbsp; Options None&nbsp;
7083 &nbsp; Order allow,deny&nbsp;
7084 &nbsp; Allow from all&nbsp;
7085 &lt;/Directory></pre>
7086 </ul>
7087 If you want to tune Apache or add third-party module functionality, you
7088 may have to make additional changes. For information on setting up other
7089 options, see the Apache project documentation.
7090 <br>&nbsp;
7091 <li>
7092 The standard Apache start script, <b>/etc/rc.d/init.d/httpd </b>will also
7093 be used within the cluster framework to start and stop the Apache server
7094 on the active cluster member.&nbsp; Accordingly, when configuring the service,
7095 specify that script when prompted for the <b>User script.</b> <i>Editorial
7096 comment: unclear if a modified status section is needed in the httpd init.d
7097 script.</i></li>
7098 </ol>
7099 Before you add the Apache service to the cluster database, ensure that
7100 the Apache directories are not mounted. Then, on one cluster system, add
7101 the service. You must specify an IP address, which the cluster infrastructure
7102 will bind to the network interface on the cluster system that runs the
7103 Apache service.
7104 <p>The following is an example of using the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7105 utility to add an Apache service.
7106 <pre><font size=-1>cluadmin> <b>service add apache
7107
7108 </b>&nbsp; The user interface will prompt you for information about the service.
7109 &nbsp; Not all information is required for all services.
7110
7111 &nbsp; Enter a question mark (?) at a prompt to obtain help.
7112
7113 &nbsp; Enter a colon (:) and a single-character command at a prompt to do
7114 &nbsp; one of the following:
7115
7116 &nbsp; c - Cancel and return to the top-level cluadmin command
7117 &nbsp; r - Restart to the initial prompt while keeping previous responses
7118 &nbsp; p - Proceed with the next prompt
7119 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
7120 Preferred member [None]: <b><font face="Courier New, Courier, mono">devel0
7121 </font></b>Relocate when the preferred member joins the cluster (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
7122 </font></b>User script (e.g., /usr/foo/script or None) [None]: <b><font face="Courier New, Courier, mono">/etc/rc.d/init.d/httpd
7123
7124 </font></b>Do you want to add an IP address to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
7125
7126 </font></b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; IP Address Information&nbsp;&nbsp;&nbsp;
7127
7128 IP address: <b><font face="Courier New, Courier, mono">10.1.16.150
7129 </font></b>Netmask (e.g. 255.255.255.0 or None) [None]: <b><font face="Courier New, Courier, mono">255.255.255.0
7130 </font></b>Broadcast (e.g. X.Y.Z.255 or None) [None]: <b><font face="Courier New, Courier, mono">10.1.16.255
7131
7132 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address,
7133 &nbsp;or are you (f)inished adding IP addresses: <b><font face="Courier New, Courier, mono">f
7134
7135 </font></b>Do you want to add a disk device to the service (yes/no/?): <b><font face="Courier New, Courier, mono">yes
7136
7137 </font></b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Disk Device Information
7138
7139 Device special file (e.g., /dev/sda1): <b><font face="Courier New, Courier, mono">/dev/sdb3
7140 </font></b>Filesystem type (e.g., ext2, reiserfs, ext3 or None): <b><font face="Courier New, Courier, mono">ext3
7141 </font></b>Mount point (e.g., /usr/mnt/service1 or None) [None]: <b><font face="Courier New, Courier, mono">/var/www/html
7142 </font></b>Mount options (e.g., rw, nosuid): <b><font face="Courier New, Courier, mono">rw
7143 </font></b>Forced unmount support (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">yes
7144
7145 </font></b>Do you want to (a)dd, (m)odify, (d)elete or (s)how devices,&nbsp;
7146 &nbsp; or are you (f)inished adding device information: <b><font face="Courier New, Courier, mono">f
7147
7148 </font></b>Disable service (yes/no/?) [no]: <b><font face="Courier New, Courier, mono">no
7149
7150 </font></b>name: apache&nbsp;
7151 disabled: no&nbsp;
7152 preferred node: node1&nbsp;
7153 relocate: yes&nbsp;
7154 user script: /etc/rc.d/init/httpd&nbsp;
7155 IP address 0: 10.1.16.150&nbsp;
7156 &nbsp;netmask 0: 255.255.255.0&nbsp;
7157 &nbsp;broadcast 0: 10.1.16.255&nbsp;
7158 device 0: /dev/sde3&nbsp;
7159 &nbsp;mount point, device 0: /var/www/html
7160 &nbsp;mount fstype, device 0: ext3
7161 &nbsp;mount options, device 0: rw,sync&nbsp;
7162 &nbsp;force unmount, device 0: yes&nbsp;
7163 &nbsp;owner, device 0: nobody&nbsp;
7164 &nbsp;group, device 0: nobody&nbsp;
7165 Add apache service as shown? (yes/no/?) <b>y
7166
7167 </b>Added apache.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
7168 cluadmin></font></pre>
7169
7170 <p><br><a NAME="service-status"></a>
7171 <h2>
7172 4.2 Displaying a Service Configuration</h2>
7173 You can display detailed information about the configuration of a service.
7174 This information includes the following:
7175 <ul>
7176 <li>
7177 Service name</li>
7178
7179 <li>
7180 Whether the service was disabled after it was added</li>
7181
7182 <li>
7183 Preferred member system</li>
7184
7185 <li>
7186 Whether the service will relocate to its preferred member when it joins
7187 the cluster</li>
7188
7189 <li>
7190 Service start script location</li>
7191
7192 <li>
7193 IP addresses</li>
7194
7195 <li>
7196 Disk partitions</li>
7197
7198 <li>
7199 File system type</li>
7200
7201 <li>
7202 Mount points and mount options</li>
7203
7204 <li>
7205 NFS exports</li>
7206 </ul>
7207 To display cluster service status, see <a href="#cluster-status">Displaying
7208 Cluster and Service Status</a>.
7209 <p>To display service configuration information, invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7210 utility and specify the <b><font face="Courier New, Courier, mono">service
7211 show config</font></b> command. For example:
7212 <pre><font size=-1>cluadmin> service show config
7213 &nbsp; 0) dummy
7214 &nbsp; 1) nfs_pref_clu4
7215 &nbsp; 2) nfs_pref_clu3
7216 &nbsp; 3) nfs_nopref
7217 &nbsp; 4) ext3
7218 &nbsp; 5) nfs_eng
7219 &nbsp; 6) nfs_engineering
7220 &nbsp; c) cancel
7221
7222 Choose service: 6
7223 name: nfs_engineering
7224 disabled: no
7225 preferred node: clu3
7226 relocate: yes
7227 IP address 0: 172.16.33.164
7228 device 0: /dev/sdb11
7229 &nbsp; mount point, device 0: /mnt/users/engineering
7230 &nbsp; mount fstype, device 0: ext2
7231 &nbsp; mount options, device 0: rw,nosuid,sync
7232 &nbsp; force unmount, device 0: yes
7233 NFS export 0: /mnt/users/engineering/ferris
7234 &nbsp; Client 0: ferris, rw
7235 NFS export 0: /mnt/users/engineering/denham
7236 &nbsp; Client 0: denham, rw
7237 NFS export 0: /mnt/users/engineering/brown
7238 &nbsp; Client 0: brown, rw
7239 cluadmin></font></pre>
7240 If you know the name of the service, you can specify the <b><font face="Courier New, Courier, mono">service
7241 show config <i>service_name</i></font></b> command.
7242 <br>&nbsp;
7243 <br>&nbsp;
7244 <br>&nbsp;
7245 <p><a NAME="service-disable"></a>
7246 <h2>
7247 4.3 Disabling a Service</h2>
7248 You can disable a running service to stop the service and make it unavailable.
7249 To start a disabled service, you must enable it. See <a href="#service-enable">Enabling
7250 a Service</a> for information.
7251 <p>There are several situations in which you may need to disable a running
7252 service:
7253 <ul>&nbsp;
7254 <li>
7255 You want to modify a service.</li>
7256
7257 <br>&nbsp;
7258 <p>&nbsp;
7259 <br>&nbsp;
7260 <br>&nbsp;
7261 <p>You must disable a running service before you can modify it. See
7262 <a href="#service-modify">Modifying
7263 a Service</a> for more information.
7264 <br>&nbsp;
7265 <li>
7266 You want to temporarily stop a service.</li>
7267
7268 <br>&nbsp;
7269 <p>&nbsp;
7270 <br>&nbsp;
7271 <br>&nbsp;
7272 <p>For example, you can disable a service to make it unavailable to clients,
7273 without having to delete the service.</ul>
7274 To disable a running service, invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7275 utility and specify the <font face="Courier New, Courier, mono"><b>service
7276 disable</b> <b><i>service_name</i></b></font> command. For example:
7277 <pre>cluadmin> <b>service disable user_home
7278 </b>Are you sure? (yes/no/?) <b>y
7279 </b>notice: Stopping service user_home&nbsp; ...
7280 notice: Service user_home is disabled
7281 service user_home disabled</pre>
7282 &nbsp;
7283 <p>&nbsp;
7284 <br>&nbsp;
7285 <br>&nbsp;
7286 <br>&nbsp;
7287 <br>&nbsp;
7288 <p><a NAME="service-enable"></a>
7289 <h2>
7290 4.4 Enabling a Service</h2>
7291 You can enable a disabled service to start the service and make it available.&nbsp;
7292 See <a href="#service-error">Handling Services in an Error State</a> for
7293 more information.
7294 <p>To enable a disabled service, invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7295 utility and specify the <b><font face="Courier New, Courier, mono">service
7296 enable <i>service_name</i></font></b> command. For example:
7297 <pre>cluadmin> <b>service enable user_home
7298 </b>Are you sure? (yes/no/?) <b>y
7299 </b>notice: Starting service user_home ...
7300 notice: Service user_home is running
7301 service user_home enabled</pre>
7302 <i>Editorial comment: probably need a new cluadmin output here as it probably
7303 prompts for which member to start the service on.</i>
7304 <br>&nbsp;
7305 <br>&nbsp;
7306 <br>&nbsp;
7307 <br>&nbsp;
7308 <br>&nbsp;
7309 <br>&nbsp;
7310 <br>&nbsp;
7311 <br>&nbsp;
7312 <br>&nbsp;
7313 <br>&nbsp;
7314 <br>&nbsp;
7315 <br>&nbsp;
7316 <br>&nbsp;
7317 <br>&nbsp;
7318 <p><a NAME="service-modify"></a>
7319 <h2>
7320 4.5 Modifying a Service</h2>
7321 You can modify any property that you specified when you created the service.
7322 For example, you can change the IP address. You can also add more resources
7323 to a service. For example, you can add more file systems. See <a href="#service-gather">Gathering
7324 Service Information</a> for information.
7325 <p>You must disable a service before you can modify it. If you attempt
7326 to modify a running service, you will be prompted to disable it. See <a href="#service-disable">Disabling
7327 a Service </a>for more information.
7328 <p>Because a service is unavailable while you modify it, be sure to gather
7329 all the necessary service information before you disable the service, in
7330 order to minimize service down time. In addition, you may want to back
7331 up the cluster database before modifying a service. See <a href="#cluster-backup">Backing
7332 Up and Restoring the Cluster Database</a> for more information.
7333 <p>To modify a disabled service, invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7334 utility and specify the <b><font face="Courier New, Courier, mono">service
7335 modify <i>service_name</i></font></b> command.
7336 <pre>cluadmin> <b>service modify web1</b></pre>
7337 You can then modify the service properties and resources, as needed. The
7338 cluster will check the service modifications, and allow you to correct
7339 any mistakes. If you submit the changes, the cluster verifies the service
7340 modification and then starts the service, unless you chose to keep the
7341 service disabled. If you do not submit the changes, the service will be
7342 started, if possible, using the original configuration.
7343 <br>&nbsp;
7344 <h2>
7345 <a NAME="service-relocate"></a></h2>
7346
7347 <h2>
7348 4.6 Relocating a Service</h2>
7349 In addition to providing automatic service failover, a cluster enables
7350 you to cleanly stop a service on one cluster system and then start it on
7351 the other cluster system. This service relocation functionality enables
7352 administrators to perform maintenance on a cluster system, while maintaining
7353 application and data availability.
7354 <p>To relocate a service by using the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7355 utility, invole the <b>service relocate</b> command.
7356 <p><i>Editorial comment: include cluadmin output example.</i>
7357 <br>&nbsp;
7358 <br>&nbsp;
7359 <p><a NAME="service-delete"></a>
7360 <h2>
7361 4.7 Deleting a Service</h2>
7362 You can delete a cluster service. You may want to back up the cluster database
7363 before deleting a service. See <a href="#cluster-backup">Backing Up and
7364 Restoring the Cluster Database</a> for information.
7365 <p>To delete a service by using the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7366 utility, follow these steps:
7367 <ol>
7368 <li>
7369 Invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7370 utility on the cluster system that is running the service, and specify
7371 the <b><font face="Courier New, Courier, mono">service disable <i>service_name</i></font></b>
7372 command. See <a href="#service-disable">Disabling a Service</a> for more
7373 information</li>
7374
7375 <br>&nbsp;
7376 <li>
7377 Specify the <b><font face="Courier New, Courier, mono">service delete <i>service_name</i></font></b>
7378 command to delete the service.</li>
7379 </ol>
7380 For example:
7381 <pre>cluadmin> <b>service disable user_home
7382 </b>Are you sure? (yes/no/?) <b>y
7383 </b>notice: Stopping service user_home&nbsp; ...
7384 notice: Service user_home is disabled
7385 service user_home disabled&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
7386
7387 cluadmin> <b>service delete user_home
7388 </b>Deleting user_home, are you sure? (yes/no/?): <b>y
7389 </b>user_home deleted.
7390 cluadmin></pre>
7391
7392 <p><br><a NAME="service-error"></a>
7393 <h2>
7394 4.8 Handling Services in an Error State</h2>
7395 <i>Editorial comment: the error state no longer exists.&nbsp; Need to rework
7396 this section somewhat and incorporate with the disabled service section.</i>
7397 <p>A service in the <b><font face="Courier New, Courier, mono">error</font></b>
7398 state is still owned by a cluster system, but the status of its resources
7399 cannot be determined (for example, part of the service has stopped, but
7400 some service resources are still configured on the owner system). See <a href="#cluster-status">Displaying
7401 Cluster and Service Status</a> for detailed information about service states.
7402 <p>The cluster puts a service into the <b><font face="Courier New, Courier, mono">error</font></b>
7403 state if it cannot guarantee the integrity of the service. An <b><font face="Courier New, Courier, mono">error</font></b>
7404 state can be caused by various problems, such as a service start did not
7405 succeed, and the subsequent service stop also failed.
7406 <p>You must carefully handle services in the <b><font face="Courier New, Courier, mono">error</font></b>
7407 state. If service resources are still configured on the owner system, starting
7408 the service on the other cluster system may cause significant problems.
7409 For example, if a file system remains mounted on the owner system, and
7410 you start the service on the other cluster system, the file system will
7411 be mounted on both systems, which can cause data corruption. Therefore,
7412 you can only enable or disable a service that is in the
7413 <b><font face="Courier New, Courier, mono">error</font></b>
7414 state on the system that owns the service. If the enable or disable fails,
7415 the service will remain in the <b><font face="Courier New, Courier, mono">error</font></b>
7416 state.
7417 <p>You can also modify a service that is in the <b><font face="Courier New, Courier, mono">error</font></b>
7418 state. You may need to do this in order to correct the problem that caused
7419 the <b><font face="Courier New, Courier, mono">error</font></b> state.
7420 After you modify the service, it will be enabled on the owner system, if
7421 possible, or it will remain in the <b><font face="Courier New, Courier, mono">error</font></b>
7422 state. The service will not be disabled.
7423 <p>If a service is in the <b><font face="Courier New, Courier, mono">error</font></b>
7424 state, follow these steps to resolve the problem:
7425 <ol>
7426 <li>
7427 Modify cluster event logging to log debugging messages. See <a href="#cluster-logging">Modifying
7428 Cluster Event Logging</a> for more information.</li>
7429
7430 <br>&nbsp;
7431 <li>
7432 Use the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7433 utility to attempt to enable or disable the service on the cluster system
7434 that owns the service. See <a href="#service-disable">Disabling a Service</a>
7435 and <a href="#service-enable">Enabling a Service</a> for more information.</li>
7436
7437 <br>&nbsp;
7438 <li>
7439 If the service does not start or stop on the owner system, examine the
7440 <b><font face="Courier New, Courier, mono">/var/log/cluster</font></b>
7441 log file, and diagnose and correct the problem. You may need to modify
7442 the service to fix incorrect information in the cluster database (for example,
7443 an incorrect start script), or you may need to perform manual tasks on
7444 the owner system (for example, unmounting file systems).</li>
7445
7446 <br>&nbsp;
7447 <li>
7448 Repeat the attempt to enable or disable the service on the owner system.
7449 If repeated attempts fail to correct the problem and enable or disable
7450 the service, reboot the owner system.</li>
7451 </ol>
7452
7453 <hr noshade width="80%">
7454 <p><a NAME="admin"></a>
7455 <h1>
7456 5 Cluster Administration</h1>
7457 After you set up a cluster and configure services, you may need to administer
7458 the cluster, as described in the following sections:
7459 <ul>
7460 <li>
7461 <a href="#cluster-status">Displaying Cluster and Service Status</a></li>
7462
7463 <li>
7464 <a href="#cluster-start">Starting and Stopping the Cluster Software</a></li>
7465
7466 <li>
7467 <a href="#cluster-config">Modifying the Cluster Configuration</a></li>
7468
7469 <li>
7470 <a href="#cluster-backup">Backing Up and Restoring the Cluster Database</a></li>
7471
7472 <li>
7473 <a href="#cluster-logging">Modifying Cluster Event Logging</a></li>
7474
7475 <li>
7476 <a href="#cluster-reinstall">Updating the Cluster Software</a></li>
7477
7478 <li>
7479 <a href="#cluster-reload">Reloading the Cluster Database</a></li>
7480
7481 <li>
7482 <a href="#cluster-name">Changing the Cluster Name</a></li>
7483
7484 <li>
7485 <a href="#cluster-init">Reinitializing the Cluster</a></li>
7486
7487 <li>
7488 <a href="#cluster-remove">Removing a Cluster Member</a></li>
7489
7490 <li>
7491 <a href="#diagnose">Diagnosing and Correcting Problems in a Cluster</a></li>
7492 </ul>
7493 <a NAME="cluster-status"></a>
7494 <br>&nbsp;
7495 <h2>
7496 5.1 Displaying Cluster and Service Status</h2>
7497 Monitoring cluster and service status can help you identify and solve problems
7498 in the cluster environment. You can display status by using the following
7499 tools:
7500 <ul>
7501 <li>
7502 The <b><font face="Courier New, Courier, mono">clustat</font></b> command</li>
7503
7504 <li>
7505 Log file messages</li>
7506 </ul>
7507 Note that status is always from the point of view of the cluster system
7508 on which you are running a tool. To obtain comprehensive cluster status,
7509 run a tool on all cluster systems.
7510 <p>Cluster and service status includes the following information:
7511 <ul>
7512 <li>
7513 Cluster member system status</li>
7514
7515 <li>
7516 Power switch status</li>
7517
7518 <li>
7519 Heartbeat channel status</li>
7520
7521 <li>
7522 Service status and which cluster system is running the service or owns
7523 the service</li>
7524
7525 <li>
7526 <i>Editorial comment: add bullett and subsequent description of service
7527 monitoring status.</i></li>
7528 </ul>
7529 The following table describes how to analyze the status information shown
7530 by the <b><font face="Courier New, Courier, mono">cluadmin</font></b> utility,
7531 the <b><font face="Courier New, Courier, mono">clustat</font></b> command,
7532 and the cluster GUI.
7533 <br>&nbsp;
7534 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="92%" >
7535 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#F8FCF8">
7536 <td WIDTH="22%" HEIGHT="39"><b>Member Status&nbsp;</b></td>
7537
7538 <td WIDTH="78%" HEIGHT="39"><b>Description</b></td>
7539 </tr>
7540
7541 <tr>
7542 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">UP</font></b></td>
7543
7544 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The member system is
7545 communicating with the other member system and accessing the quorum partitions.</td>
7546 </tr>
7547
7548 <tr>
7549 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">DOWN</font></b></td>
7550
7551 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The member system is
7552 unable to communicate with the other member system.</td>
7553 </tr>
7554 </table>
7555
7556 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="92%" >
7557 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#F8FCF8">
7558 <td WIDTH="22%" HEIGHT="9"><b>Power Switch Status</b></td>
7559
7560 <td WIDTH="78%" HEIGHT="9"><b>Description&nbsp;</b></td>
7561 </tr>
7562
7563 <tr>
7564 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">OK</font></b></td>
7565
7566 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The power switch is operating
7567 properly.</td>
7568 </tr>
7569
7570 <tr>
7571 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">Wrn</font></b></td>
7572
7573 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">Could not obtain power
7574 switch status.</td>
7575 </tr>
7576
7577 <tr>
7578 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">Err</font></b></td>
7579
7580 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">A failure or error has
7581 occurred.&nbsp;</td>
7582 </tr>
7583
7584 <tr>
7585 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">Good</font></b></td>
7586
7587 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The power switch is operating
7588 properly.</td>
7589 </tr>
7590
7591 <tr>
7592 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">Unknown</font></b></td>
7593
7594 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The other cluster member
7595 is <b><font face="Courier New, Courier, mono">DOWN</font></b>.</td>
7596 </tr>
7597
7598 <tr>
7599 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">Timeout</font></b></td>
7600
7601 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The power switch is not
7602 responding to power daemon commands, possibly because of a disconnected
7603 serial cable.</td>
7604 </tr>
7605
7606 <tr>
7607 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">Error</font></b></td>
7608
7609 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">A failure or error has
7610 occurred.&nbsp;</td>
7611 </tr>
7612
7613 <tr>
7614 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="29"><b><font face="Courier New, Courier, mono">None</font></b></td>
7615
7616 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="29">The cluster configuration
7617 does not include power switches.</td>
7618 </tr>
7619
7620 <tr>
7621 <td>
7622 <center><b>Initializing</b></center>
7623 </td>
7624
7625 <td>The switch is in the process of being initialized and its definitive
7626 status has not been concluded.</td>
7627 </tr>
7628 </table>
7629
7630 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="92%" >
7631 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#F8FCF8">
7632 <td WIDTH="22%" HEIGHT="27"><b>Heartbeat Channel Status</b></td>
7633
7634 <td WIDTH="78%" HEIGHT="27"><b>Description&nbsp;</b></td>
7635 </tr>
7636
7637 <tr>
7638 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">OK</font></b></td>
7639
7640 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">The heartbeat channel
7641 is operating properly.</td>
7642 </tr>
7643
7644 <tr>
7645 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">Wrn</font></b></td>
7646
7647 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">Could not obtain channel
7648 status.</td>
7649 </tr>
7650
7651 <tr>
7652 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">Err</font></b></td>
7653
7654 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">A failure or error has
7655 occurred.&nbsp;</td>
7656 </tr>
7657
7658 <tr>
7659 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">ONLINE</font></b></td>
7660
7661 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">The heartbeat channel
7662 is operating properly.</td>
7663 </tr>
7664
7665 <tr>
7666 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">OFFLINE</font></b></td>
7667
7668 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">The other cluster member
7669 appears to be <b><font face="Courier New, Courier, mono">UP</font></b>,
7670 but it is not responding to heartbeat requests on this channel.</td>
7671 </tr>
7672
7673 <tr>
7674 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="26"><b><font face="Courier New, Courier, mono">UNKNOWN</font></b></td>
7675
7676 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="26">Could not obtain the
7677 status of the other cluster member system over this channel, possibly because
7678 the system is <b><font face="Courier New, Courier, mono">DOWN</font></b>
7679 or the cluster daemons are not running.</td>
7680 </tr>
7681 </table>
7682
7683 <p><i>Editorial comment: many of these service states no longer exist.</i>
7684 <br>&nbsp;
7685 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="92%" >
7686 <tr ALIGN=CENTER VALIGN=CENTER BGCOLOR="#F8FCF8">
7687 <td WIDTH="22%" HEIGHT="43"><b>Service Status&nbsp;</b></td>
7688
7689 <td WIDTH="78%" HEIGHT="43"><b>Description</b></td>
7690 </tr>
7691
7692 <tr>
7693 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="48"><b><font face="Courier New, Courier, mono">running</font></b></td>
7694
7695 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="48">The service resources
7696 are configured and available on the cluster system that owns the service.
7697 The <b><font face="Courier New, Courier, mono">running</font></b> state
7698 is a persistent state. From this state, a service can enter the <b><font face="Courier New, Courier, mono">stopping</font></b>
7699 state (for example, if the preferred member rejoins the cluster), the <b><font face="Courier New, Courier, mono">disabling</font></b>
7700 state (if a user initiates a request to disable the service), or the <b><font face="Courier New, Courier, mono">error</font></b>
7701 state (if the status of the service resources cannot be determined).&nbsp;</td>
7702 </tr>
7703
7704 <tr>
7705 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="53"><b><font face="Courier New, Courier, mono">disabling</font></b></td>
7706
7707 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="53">The service is in the
7708 process of being disabled (for example, a user has initiated a request
7709 to disable the service). The <b><font face="Courier New, Courier, mono">disabling</font></b>
7710 state is a transient state. The service remains in the <b><font face="Courier New, Courier, mono">disabling</font></b>state
7711 until the service disable succeeds or fails. From this state, the service
7712 can enter the <b><font face="Courier New, Courier, mono">disabled</font></b>
7713 state (if the disable succeeds), the <b><font face="Courier New, Courier, mono">running</font></b>
7714 state (if the disable fails and the service is restarted), or the <b><font face="Courier New, Courier, mono">error</font></b>
7715 state (if the status of the service resources cannot be determined).&nbsp;</td>
7716 </tr>
7717
7718 <tr>
7719 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="77"><b><font face="Courier New, Courier, mono">disabled</font></b></td>
7720
7721 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="77">The service has been
7722 disabled, and does not have an assigned owner. The <b><font face="Courier New, Courier, mono">disabled</font></b>
7723 state is a persistent state. From this state, the service can enter the
7724 <b><font face="Courier New, Courier, mono">starting</font></b>
7725 state (if a user initiates a request to start the service), or the <b><font face="Courier New, Courier, mono">error</font></b>
7726 state (if a request to start the service failed and the status of the service
7727 resources cannot be determined).&nbsp;</td>
7728 </tr>
7729
7730 <tr>
7731 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="51"><b><font face="Courier New, Courier, mono">starting</font></b></td>
7732
7733 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="51">The service is in the
7734 process of being started. The <b><font face="Courier New, Courier, mono">starting</font></b>
7735 state is a transient state. The service remains in the <b><font face="Courier New, Courier, mono">starting</font></b>
7736 state until the service start succeeds or fails. From this state, the service
7737 can enter the <b><font face="Courier New, Courier, mono">running</font></b>
7738 state (if the service start succeeds), the <b><font face="Courier New, Courier, mono">stopped</font></b>
7739 state (if the service stop fails), or the <b><font face="Courier New, Courier, mono">error</font></b>
7740 state (if the status of the service resources cannot be determined).&nbsp;</td>
7741 </tr>
7742
7743 <tr>
7744 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="55"><b><font face="Courier New, Courier, mono">stopping</font></b></td>
7745
7746 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="55">The service is in the
7747 process of being stopped. The <b><font face="Courier New, Courier, mono">stopping</font></b>
7748 state is a transient state. The service remains in the <b><font face="Courier New, Courier, mono">stopping</font></b>
7749 state until the service stop succeeds or fails. From this state, the service
7750 can enter the <b><font face="Courier New, Courier, mono">stopped</font></b>
7751 state (if the service stop succeeds), the <b><font face="Courier New, Courier, mono">running</font></b>
7752 state (if the service stop failed and the service can be started), or the
7753 <b><font face="Courier New, Courier, mono">error</font></b>
7754 state (if the status of the service resources cannot be determined).&nbsp;</td>
7755 </tr>
7756
7757 <tr>
7758 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="52"><b><font face="Courier New, Courier, mono">stopped</font></b></td>
7759
7760 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="52">The service is not running
7761 on any cluster system, does not have an assigned owner, and does not have
7762 any resources configured on a cluster system. The <b><font face="Courier New, Courier, mono">stopped</font></b>
7763 state is a persistent state. From this state, the service can enter the
7764 <b><font face="Courier New, Courier, mono">disabled</font></b>
7765 state (if a user initiates a request to disable the service), or the <b><font face="Courier New, Courier, mono">starting</font></b>
7766 state (if the preferred member joins the cluster).&nbsp;</td>
7767 </tr>
7768
7769 <tr>
7770 <td ALIGN=CENTER VALIGN=CENTER WIDTH="22%" HEIGHT="81"><b><font face="Courier New, Courier, mono">error</font></b></td>
7771
7772 <td ALIGN=LEFT VALIGN=TOP WIDTH="78%" HEIGHT="81">The status of the service
7773 resources cannot be determined. For example, some resources associated
7774 with the service may still be configured on the cluster system that owns
7775 the service. The <b><font face="Courier New, Courier, mono">error</font></b>
7776 state is a persistent state. To protect data integrity, you must ensure
7777 that the service resources are no longer configured on a cluster system,
7778 before trying to start or stop a service in the <b><font face="Courier New, Courier, mono">error</font></b>
7779 state.&nbsp;</td>
7780 </tr>
7781 </table>
7782
7783 <p>To display a snapshot of the current cluster status, invoke the <b><font face="Courier New, Courier, mono">clustat</font></b>
7784 utility. For example:
7785 <pre># clustat
7786 <font size=-1>Thu Jul 20 16:23:54 EDT 2000
7787 Cluster Configuration (cluster_1):
7788
7789 Member status:
7790
7791 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Member&nbsp;&nbsp;&nbsp;&nbsp; Id&nbsp;&nbsp;&nbsp;&nbsp; System Status&nbsp; Power Switch
7792 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ---------- ------ -------------&nbsp; ------------
7793 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; stor4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Up&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Good
7794 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; stor5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Up&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Good
7795
7796 Channel status:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
7797
7798 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Type&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Status
7799 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -------------------------&nbsp; ---------- --------
7800 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; stor4 &lt;--> stor5&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; network&nbsp;&nbsp;&nbsp; ONLINE&nbsp;&nbsp;&nbsp;&nbsp;
7801 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; /dev/ttyS1 &lt;--> /dev/ttyS1&nbsp; serial&nbsp;&nbsp;&nbsp;&nbsp; OFFLINE&nbsp;&nbsp;&nbsp;&nbsp;
7802
7803
7804 Service status:
7805
7806 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Service&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Status&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Owner
7807 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ----------------&nbsp; ----------&nbsp; ----------------
7808 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; diskmount&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; disabled&nbsp;&nbsp;&nbsp; None
7809 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; database1&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; running&nbsp;&nbsp;&nbsp;&nbsp; stor5
7810 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; database2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; starting&nbsp;&nbsp;&nbsp; stor4
7811 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; user_mail&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; disabling&nbsp;&nbsp; None
7812 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; web_home&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; running&nbsp;&nbsp;&nbsp;&nbsp; stor4</font></pre>
7813 <i>Editorial comment: need a more recent screenshot of clustat output above.</i>
7814 <br>To monitor the cluster and display status at specific time intervals,
7815 invoke <b><font face="Courier New, Courier, mono">clustat</font></b> with
7816 the <b><font face="Courier New, Courier, mono">-i <i>time</i></font></b>
7817 command option, where <b><i><font face="Courier New, Courier, mono">time</font></i></b>
7818 specifies the number of seconds between status shapshots.
7819 <p><a NAME="cluster-start"></a>
7820 <br>&nbsp;
7821 <h2>
7822 5.2 Starting and Stopping the Cluster Software</h2>
7823 You can start the cluster software on a cluster system by invoking the
7824 <b><font face="Courier New, Courier, mono">cluster
7825 start </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
7826 directory. For example:
7827 <pre># <b>service cluster start</b></pre>
7828 You can stop the cluster software on a cluster system by invoking the <b><font face="Courier New, Courier, mono">cluster
7829 stop </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
7830 directory. For example:
7831 <pre># <b>service cluster stop</b></pre>
7832 The previous command willcause the cluster system's services to relocate
7833 over to the other cluster system.
7834 <br>&nbsp;
7835 <p><a NAME="cluster-config"></a>
7836 <h2>
7837 5.3 Modifying the Cluster Configuration</h2>
7838 You may need to modify the cluster configuration. For example, you may
7839 need to correct heartbeat channel or quorum partition entries in the cluster
7840 database, a copy of which is located in the <b><font face="Courier New, Courier, mono">/etc/cluster.conf</font></b>
7841 file.
7842 <p>You must use the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
7843 utility to modify the cluster configuration. Do not modify the <b><font face="Courier New, Courier, mono">cluster.conf</font></b>
7844 file. To modify the cluster configuration, stop the cluster software on
7845 one cluster system, as described in <a href="#cluster-start">Starting and
7846 Stopping the Cluster Software</a>.
7847 <p>Then, invoke the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
7848 utility, and specify the correct information at the prompts. After running
7849 the utility, restart the cluster software.
7850 <p><a NAME="cluster-backup"></a>
7851 <h2>
7852 5.4 Backing Up and Restoring the Cluster Database</h2>
7853 It is recommended that you regularly back up the cluster database. In addition,
7854 you should back up the database before making any significant changes to
7855 the cluster configuration.
7856 <p>To back up the cluster database to the <b><font face="Courier New, Courier, mono">/etc/cluster.conf.bak</font></b>
7857 file, invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7858 utility, and specify the <b><font face="Courier New, Courier, mono">cluster
7859 backup</font></b> command. For example:
7860 <pre>cluadmin> <b>cluster backup</b></pre>
7861 You can also save the cluster database to a different file by invoking
7862 the <b><font face="Courier New, Courier, mono">cluadmin</font></b> utility
7863 and specifying the <b><font face="Courier New, Courier, mono">cluster saveas
7864 <i>filename</i></font></b>command.
7865 <p>To restore the cluster database, follow these steps:
7866 <ol>
7867 <li>
7868 Stop the cluster software on one system by invoking the <b><font face="Courier New, Courier, mono">cluster
7869 stop </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
7870 directory. For example:</li>
7871
7872 <pre># <b>/etc/rc.d/init.d/cluster stop</b></pre>
7873 The previous command may cause the cluster system's services to fail over
7874 to the other cluster system.
7875 <br>&nbsp;
7876 <li>
7877 On the remaining cluster system, invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7878 utility and restore the cluster database. To restore the database from
7879 the <b><font face="Courier New, Courier, mono">/etc/cluster.conf.bak</font></b>
7880 file, specify the <b><font face="Courier New, Courier, mono">cluster restore</font></b>
7881 command. To restore the database from a different file, specify the <b><font face="Courier New, Courier, mono">cluster
7882 restorefrom <i>file_name</i></font></b> command.</li>
7883
7884 <br>&nbsp;
7885 <p>&nbsp;
7886 <br>&nbsp;
7887 <br>&nbsp;
7888 <br>&nbsp;
7889 <br>&nbsp;
7890 <br>&nbsp;
7891 <br>&nbsp;
7892 <br>&nbsp;
7893 <br>&nbsp;
7894 <br>&nbsp;
7895 <br>&nbsp;
7896 <br>&nbsp;
7897 <br>&nbsp;
7898 <br>&nbsp;
7899 <br>&nbsp;
7900 <br>&nbsp;
7901 <br>&nbsp;
7902 <br>&nbsp;
7903 <br>&nbsp;
7904 <br>&nbsp;
7905 <br>&nbsp;
7906 <br>&nbsp;
7907 <br>&nbsp;
7908 <br>&nbsp;
7909 <br>&nbsp;
7910 <br>&nbsp;
7911 <br>&nbsp;
7912 <br>&nbsp;
7913 <br>&nbsp;
7914 <br>&nbsp;
7915 <br>&nbsp;
7916 <br>&nbsp;
7917 <br>&nbsp;
7918 <br>&nbsp;
7919 <br>&nbsp;
7920 <br>&nbsp;
7921 <br>&nbsp;
7922 <br>&nbsp;
7923 <br>&nbsp;
7924 <br>&nbsp;
7925 <br>&nbsp;
7926 <br>&nbsp;
7927 <br>&nbsp;
7928 <p>The cluster will disable all running services, delete all the services,
7929 and then restore the database.
7930 <br>&nbsp;
7931 <li>
7932 Restart the cluster software on the stopped system by invoking the <b><font face="Courier New, Courier, mono">cluster
7933 start </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
7934 directory. For example:</li>
7935
7936 <pre># <b>service cluster start</b></pre>
7937
7938 <li>
7939 Restart each cluster service by invoking the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7940 utility on the cluster system on which you want to run the service and
7941 specifying the <b><font face="Courier New, Courier, mono">service enable
7942 <i>service_name</i></font></b>
7943 command.</li>
7944 </ol>
7945 <a NAME="cluster-logging"></a>
7946 <h2>
7947 5.5 Modifying Cluster Event Logging</h2>
7948 You can modify the severity level of the events that are logged by the
7949 <b><font face="Courier New, Courier, mono">clupowerd</font></b>,
7950 <b><font face="Courier New, Courier, mono">cluquorumd</font></b>,
7951 <b><font face="Courier New, Courier, mono">cluhbd</font></b>,
7952 and <b><font face="Courier New, Courier, mono">clusvcmgrd</font></b> daemons.
7953 You may want the daemons on the cluster systems to log messages at the
7954 same level.
7955 <p>To change a cluster daemon's logging level on all the cluster systems,
7956 invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7957 utility, and specify the <b><font face="Courier New, Courier, mono">cluster
7958 loglevel</font></b> command, the name of the daemon, and the severity level.
7959 You can specify the severity level by using the name or the number that
7960 corresponds to the severity level. The values 0 to 7 refer to the following
7961 severity levels:
7962 <blockquote><b><font face="Courier New, Courier, mono">0 - emerg</font></b>
7963 <br><b><font face="Courier New, Courier, mono">1 - alert</font></b>
7964 <br><b><font face="Courier New, Courier, mono">2 - crit</font></b>
7965 <br><b><font face="Courier New, Courier, mono">3 - err</font></b>
7966 <br><b><font face="Courier New, Courier, mono">4 - warning</font></b>
7967 <br><b><font face="Courier New, Courier, mono">5 - notice</font></b>
7968 <br><b><font face="Courier New, Courier, mono">6 - info</font></b>
7969 <br><b><font face="Courier New, Courier, mono">7 - debug</font></b></blockquote>
7970 Note that the cluster logs messages with the designated severity level
7971 and also messages of a higher severity. For example, if the severity level
7972 for quorum daemon messages is 2 (<b><font face="Courier New, Courier, mono">crit</font></b>),
7973 then the cluster logs messages or <b><font face="Courier New, Courier, mono">crit</font></b>,
7974 <b><font face="Courier New, Courier, mono">alert</font></b>,
7975 and <b><font face="Courier New, Courier, mono">emerg</font></b> severity
7976 levels. Be aware that setting the logging level to a low severity level,
7977 such as 7 (<b><font face="Courier New, Courier, mono">debug</font></b>),
7978 will result in large log files over time.
7979 <p>The following example enables the <b><font face="Courier New, Courier, mono">cluquorumd</font></b>
7980 daemon to log messages of all severity levels:
7981 <pre># <b>cluadmin
7982 </b>cluadmin> <b>cluster loglevel cluquorumd 7
7983 </b>cluadmin>
7984
7985 </pre>
7986 <a NAME="cluster-reinstall"></a>
7987 <h2>
7988 5.6 Updating the Cluster Software</h2>
7989 You can update the cluster software, but preserve the existing cluster
7990 database. Updating the cluster software on a system can take from 10 to
7991 20 minutes, depending on whether you must rebuild the kernel.
7992 <p>To update the cluster software while minimizing service downtime, follow
7993 these steps:
7994 <ol>
7995 <li>
7996 On a cluster system that you want to update, run the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
7997 utility and back up the current cluster database. For example:</li>
7998
7999 <pre>cluadmin> <b>cluster backup</b></pre>
8000
8001 <li>
8002 Stop the cluster software on the first cluster system that you want to
8003 update, by invoking the <b><font face="Courier New, Courier, mono">cluster
8004 stop </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8005 directory. For example:</li>
8006
8007 <pre># <b>service cluster stop</b></pre>
8008
8009 <li>
8010 Install the latest cluster software on the first cluster system that you
8011 want to update, by following the instructions described in <a href="#software-steps">Steps
8012 for Installing and Initializing the Cluster Software.</a> However, when
8013 prompted by the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8014 utility whether to use the existing cluster database, specify <b><font face="Courier New, Courier, mono">yes</font></b>.</li>
8015
8016 <br>&nbsp;
8017 <li>
8018 Stop the cluster software on the second cluster system that you want to
8019 update, by invoking the <b><font face="Courier New, Courier, mono">cluster
8020 stop </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8021 directory. At this point, no services are available.</li>
8022
8023 <br>&nbsp;
8024 <li>
8025 Start the cluster software on the first updated cluster system by invoking
8026 the <b><font face="Courier New, Courier, mono">cluster start</font></b>
8027 command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8028 directory. At this point, services may become available.</li>
8029
8030 <br>&nbsp;
8031 <li>
8032 Install the latest cluster software on the second cluster system that you
8033 want to update, by following the instructions described in <a href="#software-steps">Steps
8034 for Installing and Initializing the Cluster Software.</a> When prompted
8035 by the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8036 utility whether to use the existing cluster database, specify <b><font face="Courier New, Courier, mono">yes</font></b>.</li>
8037
8038 <br>&nbsp;
8039 <li>
8040 Start the cluster software on the second updated cluster system, by invoking
8041 the <b><font face="Courier New, Courier, mono">cluster start</font></b>
8042 command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8043 directory. For example: <b>service cluster start</b></li>
8044 </ol>
8045 <a NAME="cluster-reload"></a>
8046 <h2>
8047 5.7 Reloading the Cluster Database</h2>
8048 Invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
8049 utility and use the <b><font face="Courier New, Courier, mono">cluster
8050 reload </font></b>command to force the cluster to re-read the cluster database.
8051 For example:
8052 <pre>cluadmin> <b>cluster reload</b></pre>
8053 &nbsp;
8054 <p>&nbsp;
8055 <br>&nbsp;
8056 <br>&nbsp;
8057 <br>&nbsp;
8058 <br>&nbsp;
8059 <p><a NAME="cluster-name"></a>
8060 <h2>
8061 5.8 Changing the Cluster Name</h2>
8062 Invoke the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
8063 utility and use the <b><font face="Courier New, Courier, mono">cluster
8064 name</font></b> <b><i><font face="Courier New, Courier, mono">cluster_name</font></i></b>
8065 command to specify a name for the cluster. The cluster name is used in
8066 the display of the <b><font face="Courier New, Courier, mono">clustat </font></b>command.
8067 For example:
8068 <pre>cluadmin> <b>cluster name Accounting Team Fileserver
8069 Accounting Team Fileserver</b></pre>
8070
8071 <p><br><a NAME="cluster-init"></a>
8072 <h2>
8073 5.9 Reinitializing the Cluster</h2>
8074 In rare circumstances, you may want to reinitialize the cluster systems,
8075 services, and database. Be sure to back up the cluster database before
8076 reinitializing the cluster. See <a href="#cluster-backup">Backing Up and
8077 Restoring the Cluster Database</a> for information.
8078 <p>To completely reinitialize the cluster, follow these steps:
8079 <ol>
8080 <li>
8081 Disable all the running cluster services.</li>
8082
8083 <br>&nbsp;
8084 <li>
8085 Stop the cluster daemons on both cluster systems by invoking the <b><font face="Courier New, Courier, mono">cluster
8086 stop </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8087 directory on both cluster systems. For example:</li>
8088
8089 <pre># <b>service cluster stop</b></pre>
8090
8091 <li>
8092 Install the cluster software on both cluster systems. See <a href="#software-steps">Steps
8093 for Installing and Initializing the Cluster Software</a> for information.</li>
8094
8095 <br>&nbsp;
8096 <li>
8097 On one cluster system, run the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8098 utility. When prompted whether to use the existing cluster database, specify
8099 no.&nbsp; This will delete any state information and cluster database from
8100 the quorum partitions.</li>
8101
8102 <br>&nbsp;
8103 <li>
8104 After <b><font face="Courier New, Courier, mono">cluconfig</font></b> completes,
8105 follow the utility's instruction to run the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8106 command on the other cluster system. For example:</li>
8107
8108 <pre># <b>/sbin/cluconfig --init=/dev/raw/raw1</b></pre>
8109
8110 <li>
8111 Start the cluster daemons by invoking the <b><font face="Courier New, Courier, mono">cluster
8112 start </font></b>command located in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8113 directory on both cluster systems. For example:</li>
8114
8115 <pre># <b>service cluster start</b></pre>
8116 </ol>
8117 <a NAME="cluster-remove"></a>
8118 <h2>
8119 5.10 Removing a Cluster Member</h2>
8120 In some cases, you may want to temporarily remove a member system from
8121 the cluster. For example, if a cluster system experiences a hardware failure,
8122 you may want to reboot the system, but prevent it from rejoining the cluster,
8123 in order to perform maintenance on the system.
8124 <p>If you are running a Red Hat distribution, use the <b><font face="Courier New, Courier, mono">chkconfig</font></b>
8125 utility to be able to boot a cluster system, without allowing it to rejoin
8126 the cluster. For example:
8127 <pre># <b>chkconfig --del cluster</b></pre>
8128 When you want the system to rejoin the cluster, use the following command:
8129 <pre># <b>chkconfig --add cluster</b></pre>
8130 You can then reboot the system or run the cluster start command located
8131 in the System V <b><font face="Courier New, Courier, mono">init</font></b>
8132 directory. For example:
8133 <pre># <b>service cluster start</b></pre>
8134
8135 <p><br><a NAME="diagnose"></a>
8136 <h2>
8137 5.11 Diagnosing and Correcting Problems in a Cluster</h2>
8138 To ensure that you can identify any problems in a cluster, you must enable
8139 event logging. In addition, if you encounter problems in a cluster, be
8140 sure to set the severity level to <b><font face="Courier New, Courier, mono">debug</font></b>
8141 for the cluster daemons. This will log descriptive messages that may help
8142 you solve problems. Once you have resolved any problems, you should reset
8143 the debug level back down to its default value of <b>info</b> to avoid
8144 excessively large log message files from generating.
8145 <p>If you have problems while running the <b><font face="Courier New, Courier, mono">cluadmin</font></b>
8146 utility (for example, you cannot enable a service), set the severity level
8147 for the <b><font face="Courier New, Courier, mono">clusvcmgrd</font></b>
8148 daemon to <b><font face="Courier New, Courier, mono">debug</font></b>.
8149 This will cause debugging messages to be displayed while you are running
8150 the <b><font face="Courier New, Courier, mono">cluadmin</font></b> utility.
8151 See <a href="#cluster-logging">Modifying Cluster Event Logging</a> for
8152 more information.
8153 <p>Use the following table to diagnose and correct problems in a cluster.
8154 <br>&nbsp;
8155 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="95%" >
8156 <tr ALIGN=CENTER VALIGN=CENTER>
8157 <td WIDTH="19%" HEIGHT="42">
8158 <center><b><font size=+1>Problem</font></b></center>
8159 </td>
8160
8161 <td WIDTH="19%" HEIGHT="42">
8162 <center><b><font size=+1>Symptom</font></b></center>
8163 </td>
8164
8165 <td WIDTH="62%" HEIGHT="42">
8166 <center><b><font size=+1>Solution</font></b></center>
8167 </td>
8168 </tr>
8169
8170 <tr ALIGN=LEFT VALIGN=TOP>
8171 <td WIDTH="19%">SCSI bus not terminated</td>
8172
8173 <td WIDTH="19%">SCSI errors appear in the log file</td>
8174
8175 <td WIDTH="62%">Each SCSI bus must be terminated only at the beginning
8176 and end of the bus. Depending on the bus configuration, you may need to
8177 enable or disable termination in host bus adapters, RAID controllers, and
8178 storage enclosures. If you want to support hot plugging, you must use external
8179 termination to terminate a SCSI bus.&nbsp;
8180 <p>In addition, be sure that no devices are connected to a SCSI bus using
8181 a stub that is longer than 0.1 meter.
8182 <p>See <a href="#hardware-storage">Configuring Shared Disk Storage</a>
8183 and <a href="#scsi-term">SCSI Bus Termination</a> for information about
8184 terminating different types of SCSI buses.</td>
8185 </tr>
8186
8187 <tr ALIGN=LEFT VALIGN=TOP>
8188 <td WIDTH="19%">SCSI bus length greater than maximum limit</td>
8189
8190 <td WIDTH="19%">SCSI errors appear in the log file</td>
8191
8192 <td WIDTH="62%">Each type of SCSI bus must adhere to restrictions on length,
8193 as described in <a href="#scsi-length">SCSI Bus Length</a>.
8194 <p>In addition, ensure that no single-ended devices are connected to the
8195 LVD SCSI bus, because this will cause the entire bus to revert to a single-ended
8196 bus, which has more severe length restrictions than a differential bus.</td>
8197 </tr>
8198
8199 <tr ALIGN=LEFT VALIGN=TOP>
8200 <td WIDTH="19%">SCSI identification numbers not unique</td>
8201
8202 <td WIDTH="19%">SCSI errors appear in the log file</td>
8203
8204 <td WIDTH="62%">Each device on a SCSI bus must have a unique identification
8205 number. If you have a multi-initiator SCSI bus, you must modify the default
8206 SCSI identification number (7) for one of the host bust adapters connected
8207 to the bus, and ensure that all disk devices have unique identification
8208 numbers. See <a href="#scsi-ids">SCSI Identification Numbers</a> for more
8209 information.&nbsp;</td>
8210 </tr>
8211
8212 <tr ALIGN=LEFT VALIGN=TOP>
8213 <td WIDTH="19%">SCSI commands timing out before completion</td>
8214
8215 <td WIDTH="19%">SCSI errors appear in the log file</td>
8216
8217 <td WIDTH="62%">The prioritized arbitration scheme on a SCSI bus can result
8218 in low-priority devices being locked out for some period of time. This
8219 may cause commands to time out, if a low-priority storage device, such
8220 as a disk, is unable to win arbitration and complete a command that a host
8221 has queued to it. For some workloads, you may be able to avoid this problem
8222 by assigning low-priority SCSI identification numbers to the host bus adapters.
8223 <p>See <a href="#scsi-ids">SCSI Identification Numbers</a> for more information.</td>
8224 </tr>
8225
8226 <tr ALIGN=LEFT VALIGN=TOP>
8227 <td WIDTH="19%">Mounted quorum partition</td>
8228
8229 <td WIDTH="19%">Messages indicating checksum errors on a quorum partition
8230 appear in the log file</td>
8231
8232 <td WIDTH="62%">Be sure that the quorum partition raw devices are used
8233 only for cluster state information. They cannot be used for cluster services
8234 or for non-cluster purposes, and cannot contain a file system. See <a href="#state-partitions">Configuring
8235 the Quorum Partitions</a> for more information.&nbsp;
8236 <p>These messages could also indicate that the underlying block device
8237 special file for the quorum partition has been erroneously used for non-cluster
8238 purposes.&nbsp;</td>
8239 </tr>
8240
8241 <tr ALIGN=LEFT VALIGN=TOP>
8242 <td WIDTH="19%" HEIGHT="111">Service file system is unclean</td>
8243
8244 <td WIDTH="19%" HEIGHT="111">A disabled service cannot be enabled&nbsp;</td>
8245
8246 <td WIDTH="62%" HEIGHT="111">Manually run a checking program such as <b><font face="Courier New, Courier, mono">fsck</font></b>.
8247 Then, enable the service.&nbsp;
8248 <p>Note that the cluster infrastructure does by defaule run <b>fsck </b>with
8249 the <b>-p</b> option to automatically repair file system inconsistencies.
8250 For particularly egregious error types you may be required to manually
8251 initiate filesystem repair options.</td>
8252 </tr>
8253
8254 <tr ALIGN=LEFT VALIGN=TOP>
8255 <td WIDTH="19%">Quorum partitions not set up correctly</td>
8256
8257 <td WIDTH="19%">Messages indicating that a quorum partition cannot be accessed
8258 appear in the log file</td>
8259
8260 <td WIDTH="62%">Run the <b><font face="Courier New, Courier, mono">cludiskutil
8261 -t </font></b>command to check that the quorum partitions are accessible.
8262 If the command succeeds, run the <b><font face="Courier New, Courier, mono">cludiskutil
8263 -p</font></b> command on both cluster systems. If the output is different
8264 on the systems, the quorum partitions do not point to the same devices
8265 on both systems. Check to make sure that the raw devices exist and are
8266 correctly specified in the <b><font face="Courier New, Courier, mono">/etc/sysconfig/rawdevices</font></b>
8267 file. See <a href="#state-partitions">Configuring the Quorum Partitions</a>
8268 for more information.&nbsp;
8269 <p>These messages could also indicate that you did not specify <b><font face="Courier New, Courier, mono">yes</font></b>
8270 when prompted by the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8271 utility to initialize the quorum partitions. To correct this problem, run
8272 the utility again.&nbsp;</td>
8273 </tr>
8274
8275 <tr ALIGN=LEFT VALIGN=TOP>
8276 <td WIDTH="19%" HEIGHT="87">Cluster service operation fails</td>
8277
8278 <td WIDTH="19%" HEIGHT="87">Messages indicating the operation failed appear
8279 on the console or in the log file&nbsp;</td>
8280
8281 <td WIDTH="62%" HEIGHT="87">There are many different reasons for the failure
8282 of a service operation (for example, a service stop or start). To help
8283 you identify the cause of the problem, set the severity level for the cluster
8284 daemons to <b><font face="Courier New, Courier, mono">debug</font></b>
8285 in order to log descriptive messages. Then, retry the operation and examine
8286 the log file. See <a href="#cluster-logging">Modifying Cluster Event Logging</a>
8287 for more information.</td>
8288 </tr>
8289
8290 <tr ALIGN=LEFT VALIGN=TOP>
8291 <td WIDTH="19%" HEIGHT="151">Cluster service stop fails because a file
8292 system cannot be unmounted</td>
8293
8294 <td WIDTH="19%" HEIGHT="151">Messages indicating the operation failed appear
8295 on the console or in the log file&nbsp;</td>
8296
8297 <td WIDTH="62%" HEIGHT="151">Use the <b><font face="Courier New, Courier, mono">fuser</font></b>
8298 and <b><font face="Courier New, Courier, mono">ps</font></b> commands to
8299 identify the processes that are accessing the file system. Use the <b><font face="Courier New, Courier, mono">kill</font></b>
8300 command to stop the processes. You can also use the <b><font face="Courier New, Courier, mono">lsof
8301 -t <i>file_system</i></font></b> command to display the identification
8302 numbers for the processes that are accessing the specified file system.
8303 You can pipe the output to the <b><font face="Courier New, Courier, mono">kill</font></b>
8304 command.
8305 <p>To avoid this problem, be sure that only cluster-related processes can
8306 access shared storage data. In addition, you may want to modify the service
8307 and enable forced unmount for the file system. This enables the cluster
8308 service to unmount a file system even if it is being accessed by an application
8309 or user.</td>
8310 </tr>
8311
8312 <tr ALIGN=LEFT VALIGN=TOP>
8313 <td WIDTH="19%" HEIGHT="71">Incorrect entry in the cluster database</td>
8314
8315 <td WIDTH="19%" HEIGHT="71">Cluster operation is impaired</td>
8316
8317 <td WIDTH="62%" HEIGHT="71">The <b>cluadmin </b>utility can be used to
8318 examine and modify service configuration.&nbsp; Additionally, the <b>cluconfig
8319 </b>utility
8320 is used to modify cluster parameters.</td>
8321 </tr>
8322
8323 <tr ALIGN=LEFT VALIGN=TOP>
8324 <td WIDTH="19%" HEIGHT="265">Incorrect Ethernet heartbeat entry in the
8325 cluster database or <b><font face="Courier New, Courier, mono">/etc/hosts</font></b>
8326 file</td>
8327
8328 <td WIDTH="19%" HEIGHT="265">Cluster status indicates that a Ethernet heartbeat
8329 channel is <b><font face="Courier New, Courier, mono">OFFLINE</font></b>
8330 even though the interface is valid</td>
8331
8332 <td WIDTH="62%" HEIGHT="265">You can examine and, modify the cluster configuration
8333 by running the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8334 utility, as specified in <a href="#cluster-config">Modifying the Cluster
8335 Configuration</a>, and correct the problem.&nbsp;
8336 <p>In addition, be sure that you can use the <b><font face="Courier New, Courier, mono">ping</font></b>
8337 command to send a packet to all the network interfaces used in the cluster.</td>
8338 </tr>
8339
8340 <tr ALIGN=LEFT VALIGN=TOP>
8341 <td WIDTH="19%" HEIGHT="50">Loose cable connection to power switch</td>
8342
8343 <td WIDTH="19%" HEIGHT="50">Power switch status is <b><font face="Courier New, Courier, mono">Timeout</font></b></td>
8344
8345 <td WIDTH="62%" HEIGHT="50">Check the serial cable connection.</td>
8346 </tr>
8347
8348 <tr ALIGN=LEFT VALIGN=TOP>
8349 <td WIDTH="19%" HEIGHT="95">Power switch serial port incorrectly specified
8350 in the cluster database&nbsp;</td>
8351
8352 <td WIDTH="19%" HEIGHT="95">Power switch status indicates a problem&nbsp;</td>
8353
8354 <td WIDTH="62%" HEIGHT="95">You can examine the current settings and modify
8355 the cluster configuration by running the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8356 utility, as specified in <a href="#cluster-config">Modifying the Cluster
8357 Configuration</a>, and correct the problem.</td>
8358 </tr>
8359
8360 <tr ALIGN=LEFT VALIGN=TOP>
8361 <td WIDTH="19%" HEIGHT="50">Heartbeat channel problem&nbsp;</td>
8362
8363 <td WIDTH="19%" HEIGHT="50">Heartbeat channel status is <b><font face="Courier New, Courier, mono">OFFLINE</font></b></td>
8364
8365 <td WIDTH="62%" HEIGHT="50">You can examine the current settings and modify
8366 the cluster configuration by running the <b><font face="Courier New, Courier, mono">cluconfig</font></b>
8367 utility, as specified in <a href="#cluster-config">Modifying the Cluster
8368 Configuration</a>, and correct the problem.
8369 <p>Verify that the correct type of cable is used for each heartbeat channel
8370 connection.&nbsp;
8371 <p>Verify that you can "ping" each cluster system over the network interface
8372 for each Ethernet heartbeat channel.&nbsp;</td>
8373 </tr>
8374 </table>
8375
8376 <p>
8377 <hr noshade width="80%"><a NAME="supplement"></a>
8378 <h1>
8379 A Supplementary Hardware Information</h1>
8380 The information in the following sections can help you set up a cluster
8381 hardware configuration. In some cases, the information is vendor specific.
8382 <ul>
8383 <li>
8384 <a href="#rps-10">Setting Up an RPS-10 Power Switch</a></li>
8385
8386 <li>
8387 <a href="#scsi-reqs">SCSI Bus Configuration Requirements</a></li>
8388
8389 <li>
8390 <a href="#hba">Host Bus Adapter Features and Configuration Requirements</a></li>
8391
8392 <li>
8393 <a href="#adaptec">Adaptec Host Bus Adapter Requirement</a></li>
8394 </ul>
8395
8396 <br>&nbsp;
8397 <br>&nbsp;
8398 <p><a NAME="power-setup"></a>
8399 <h2>
8400 A.2 Setting Up&nbsp; Power Switches</h2>
8401
8402 <h3>
8403 <a NAME="rps-10"></a></h3>
8404
8405 <h3>
8406 Setting up RPS-10 Power Switches</h3>
8407 If you are using an RPS-10 Series power switch in your cluster, you must:
8408 <ul>
8409 <li>
8410 Set the rotary address on both power switches to 0. Be sure that the switch
8411 is positioned correctly and is not between settings.</li>
8412
8413 <br>&nbsp;
8414 <li>
8415 Toggle the four SetUp switches on both power switches, as follows:</li>
8416
8417 <br>&nbsp;
8418 <table BORDER CELLSPACING=0 CELLPADDING=3 WIDTH="49%" >
8419 <tr ALIGN=CENTER>
8420 <td WIDTH="15%"><b>Switch</b></td>
8421
8422 <td WIDTH="28%"><b>Function</b></td>
8423
8424 <td WIDTH="25%"><b>Up Position</b></td>
8425
8426 <td WIDTH="32%"><b>Down Position</b></td>
8427 </tr>
8428
8429 <tr>
8430 <td ALIGN=CENTER WIDTH="15%" HEIGHT="24">1</td>
8431
8432 <td WIDTH="28%" HEIGHT="24">Data rate</td>
8433
8434 <td ALIGN=CENTER WIDTH="25%" HEIGHT="24">&nbsp;</td>
8435
8436 <td ALIGN=CENTER WIDTH="32%" HEIGHT="24">X</td>
8437 </tr>
8438
8439 <tr>
8440 <td ALIGN=CENTER WIDTH="15%">2</td>
8441
8442 <td WIDTH="28%">Toggle delay</td>
8443
8444 <td ALIGN=CENTER WIDTH="25%">&nbsp;</td>
8445
8446 <td ALIGN=CENTER WIDTH="32%">X</td>
8447 </tr>
8448
8449 <tr>
8450 <td ALIGN=CENTER WIDTH="15%">3</td>
8451
8452 <td WIDTH="28%">Power up default</td>
8453
8454 <td ALIGN=CENTER WIDTH="25%">X</td>
8455
8456 <td ALIGN=CENTER WIDTH="32%">&nbsp;</td>
8457 </tr>
8458
8459 <tr>
8460 <td ALIGN=CENTER WIDTH="15%">4</td>
8461
8462 <td WIDTH="28%">Unused</td>
8463
8464 <td ALIGN=CENTER WIDTH="25%">&nbsp;</td>
8465
8466 <td ALIGN=CENTER WIDTH="32%">X</td>
8467 </tr>
8468 </table>
8469 </ul>
8470
8471 <ul>
8472 <li>
8473 Ensure that the serial port device special file (for example, <b><font face="Courier New, Courier, mono">/dev/ttyS1</font></b>)
8474 that is specified in the <b><font face="Courier New, Courier, mono">/etc/cluster.conf</font></b>
8475 file corresponds to the serial port to which the power switch's serial
8476 cable is connected.</li>
8477
8478 <br>&nbsp;
8479 <li>
8480 Connect the power cable for each cluster system to its own power switch.</li>
8481
8482 <br>&nbsp;
8483 <li>
8484 Use null modem cables to connect each cluster system to the serial port
8485 on the power switch that provides power to the other cluster system.</li>
8486 </ul>
8487 The following figure shows an example of an RPS-10 Series power switch
8488 configuration.
8489 <h4>
8490 RPS-10 Power Switch Hardware Configuration</h4>
8491 <img SRC="powerswitch.gif" height=259 width=360>
8492 <p>See the RPS-10 documentation supplied by the vendor for additional installation
8493 information. Note that the information provided in this document supersedes
8494 the vendor information.
8495 <br>&nbsp;
8496 <h3>
8497 <a NAME="power-wti-nps"></a></h3>
8498
8499 <h3>
8500 Setting up WTI NPS Power Switches</h3>
8501 The WTI NPS-115 and NPS-230 power switch is a network attached device.&nbsp;
8502 Essentially it is a power strip with network connectivity enabling power
8503 cycling of individual outlets.&nbsp; Only 1 NPS is needed within the cluster
8504 (unlike the RPS-10 model where a separate switch per cluster member is
8505 required).
8506 <p>Since there is no independent means whereby the cluster software can
8507 verify that you have plugged each cluster member system into the appropriate
8508 plug on the back of the NPS power switch, please take care to ensure correct
8509 setup.&nbsp; Failure to do so will cause the cluster software to incorrectly
8510 conclude a successful power cycle has occurred.
8511 <p>When setting up the NPS switch the following configuration guidelines
8512 should be followed.
8513 <p>When configuring the power switch itself:
8514 <ul>
8515 <li>
8516 You must assign a "System Password" (under the "General Parameters"menu).&nbsp;
8517 Note: this password is stored in clear text in the cluster configuration
8518 file, so choose a password which differs from your system's password.&nbsp;
8519 (Although, the file permissions for that file /etc/cluster.conf are only
8520 readable by root.)</li>
8521
8522 <li>
8523 Do not assign a password under the "Plug Parameters".</li>
8524
8525 <li>
8526 Assign system names to the Plug Parameters, (eg <i>clu1</i> to plug 1,<i>clu2</i>
8527 to plug 2 - assuming these are the cluster member names).</li>
8528 </ul>
8529
8530 <p><br>When running <b>cluconfig</b> to specify power switch parameters:
8531 <br>&nbsp;
8532 <ul>
8533 <li>
8534 Specify a switch type of wti_nps.</li>
8535
8536 <li>
8537 Specify the password you assigned to the NPS switch (ref step 1 in prior
8538 section).</li>
8539
8540 <li>
8541 When prompted for the plug/port number, specify the same name as assigned
8542 in step 3 in prior section.</li>
8543 </ul>
8544 Note: we have observed that the NPS power switch may become unresponsive
8545 when placed on networks which have high occurances of broadcast or multicast
8546 packets.&nbsp; In these cases you may have to isolate the power switch
8547 to a private subnet.
8548 <p>The NPS-115 power switch is has a very useful feature which can accommodate
8549 power cycling cluster members with dual power supplies.&nbsp; The NPS-115
8550 consists of 2 banks of power outlets, each of which is independently powered
8551 and has 4 plugs.&nbsp; Each power plug of the NPS-115 gets plugged into
8552 a separate power source (presumably a separate UPS).&nbsp; For cluster
8553 members with dual power supplies, you plug their power cords into an outlet
8554 in each bank.&nbsp; Then when you configure the NPS-115 and assign ports,
8555 simply assign the same name to outlets in each bank that you have plugged
8556 the corresponding cluster member into.&nbsp; For example, suppose the cluster
8557 members were clu3 and clu4, where clu3 is plugged into outlets 1 and 5,
8558 and clu4 is plugged into outlets 2 and 6:
8559 <p>Plug | Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8560 | Status&nbsp; | Boot Delay | Password&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8561 | Default |
8562 <br>-----+------------------+---------+------------+------------------+---------+
8563 <br>&nbsp;1&nbsp;&nbsp; | clu3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8564 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp;
8565 | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp;
8566 |
8567 <br>&nbsp;2&nbsp;&nbsp; | clu4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8568 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp;
8569 | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp;
8570 |
8571 <br>&nbsp;3&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;
8572 ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8573 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |
8574 <br>&nbsp;4&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;
8575 ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8576 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |
8577 <br>&nbsp;5&nbsp;&nbsp; | clu3&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8578 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp;
8579 | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp;
8580 |
8581 <br>&nbsp;6&nbsp;&nbsp; | clu4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8582 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp;
8583 | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp;
8584 |
8585 <br>&nbsp;7&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;
8586 ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8587 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |
8588 <br>&nbsp;8&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp;
8589 ON&nbsp;&nbsp;&nbsp; |&nbsp;&nbsp; 5&nbsp; sec&nbsp;&nbsp; | (undefined)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
8590 |&nbsp;&nbsp; ON&nbsp;&nbsp;&nbsp; |
8591 <br>-----+------------------+---------+------------+------------------+---------+
8592 <p>By specifying the same name to multiple outlets, in response to a power
8593 cycle command, all outlets with the same name will be power cycled.&nbsp;
8594 In this manner, a cluster member with dual power supplies can be successfully
8595 power cycled.&nbsp; Under this dual configuration, the parameters specified
8596 to <b>cluconfig </b>are the same as the single configuration described
8597 above.
8598 <h3>
8599 Setting up Baytech Power Switches</h3>
8600 The following information pertains to the RPC-3 and PRC-5 power switches.
8601 <p>The Baytech power switch is a network attached device.&nbsp; Essentially
8602 it is a power strip with network connectivity enabling power cycling of
8603 individual outlets.&nbsp; Only 1 Baytech switch is needed within the cluster
8604 (unlike the RPS-10 model where a separate switch per cluster member is
8605 required).
8606 <p>Since there is no independent means whereby the cluster software can
8607 verify that you have plugged each cluster member system into the appropriate
8608 plug on the back of the Baytech power switch, please take care to ensure
8609 correct setup.&nbsp; Failure to do so will cause the cluster software to
8610 incorrectly conclude a successful power cycle has occurred.
8611 <p>When setting up the Baytech switch the following configuration guidelines
8612 should be followed.
8613 <p>When configuring the Baytech power switch itself:
8614 <ul>
8615 <li>
8616 Usinga serial connection, assign the IP address related parameters.</li>
8617
8618 <li>
8619 You must assign a user name and password (under the "Manage Users"menu).&nbsp;
8620 Note: this password is stored in clear text in the cluster configuration
8621 file, so choose a password which differs from your system's password.&nbsp;
8622 (Although, the file permissions for that file /etc/cluster.conf are only
8623 readable by root.)</li>
8624
8625 <li>
8626 To assign system names to the corresponding outlets, go to the "Configuration"
8627 menu, followed by the "Outlets" menu., (eg <i>clu1</i> to outlet 1,<i>clu2</i>
8628 to outlet 2 - assuming these are the cluster member names).</li>
8629 </ul>
8630
8631 <p><br>When running <b>cluconfig</b> to specify power switch parameters:
8632 <br>&nbsp;
8633 <ul>
8634 <li>
8635 Specify a switch type of baytech.</li>
8636
8637 <li>
8638 Specify the password you assigned to the Baytech switch (ref step 2 in
8639 prior section).</li>
8640
8641 <li>
8642 When prompted for the plug/port number, specify the same name as assigned
8643 in step 3 in prior section.</li>
8644 </ul>
8645
8646 <h3>
8647 <a NAME="power-other"></a></h3>
8648
8649 <h3>
8650 Other Network Power Switches</h3>
8651 The cluster software includes support for a range of power switch types.&nbsp;
8652 This range of power switch module support originated from developers at
8653 Mission Critical Linux, Inc. and as part of the open source Linux-HA project.&nbsp;
8654 Time and hardware resource constraints did not allow us to fully test the
8655 complete range of switch types.&nbsp; As such the associated power switch
8656 STONITH modules are considered latent features.&nbsp; Examples of these
8657 other power switch modules include:
8658 <ul>
8659 <li>
8660 APC Master Switch,&nbsp; <a href="http://www.apc.com">www.apc.com</a>&nbsp;
8661 Note: we have observed that the Master Switch may become unresponsive when
8662 placed on networks which have high occurances of broadcast or multicast
8663 packets.&nbsp; In these cases you may have to isolate the power switch
8664 to a private subnet.</li>
8665
8666 <li>
8667 APC Serial On/Off Switch (partAP9211),&nbsp; <a href="http://www.apc.com">www.apc.com</a>&nbsp;
8668 Note: this switch type does not provide a means for the cluster to query
8669 its status.&nbsp; Therefore the cluster always assumes it it connected
8670 and operational.</li>
8671
8672 <li>
8673 Baytech RPC-3 and RPC-5,&nbsp; <a href="http://www.baytech.net">www.baytech.net</a>&nbsp;
8674 Note: this power switch performs well on networks which include high frequency
8675 of broadcast and multicast packets.</li>
8676 </ul>
8677
8678 <p><br><a NAME="scsi-reqs"></a>
8679 <h2>
8680 A.3 SCSI Bus Configuration Requirements</h2>
8681 SCSI buses must adhere to a number of configuration requirements in order
8682 to operate correctly. Failure to adhere to these requirements will adversely
8683 affect cluster operation and application and data availability.
8684 <p>You must adhere to the following <b>SCSI bus configuration requirements</b>:
8685 <ul>
8686 <li>
8687 Buses must be terminated at each end. In addition, how you terminate a
8688 SCSI bus affects whether you can use hot plugging. See <a href="#scsi-term">SCSI
8689 Bus Termination</a> for more information.</li>
8690
8691 <br>&nbsp;
8692 <li>
8693 TERMPWR (terminator power) must by provided by the host bus adapters connected
8694 to a bus. See <a href="#scsi-term">SCSI Bus Termination</a> for more information.</li>
8695
8696 <br>&nbsp;
8697 <li>
8698 Active SCSI terminators must be used in a multi-initiator bus. See <a href="#scsi-term">SCSI
8699 Bus Termination</a> for more information.</li>
8700
8701 <br>&nbsp;
8702 <li>
8703 Buses must not extend beyond the maximum length restriction for the bus
8704 type. Internal cabling must be included in the length of the SCSI bus.
8705 See <a href="#scsi-length">SCSI Bus Length</a> for more information.</li>
8706
8707 <br>&nbsp;
8708 <li>
8709 All devices (host bus adapters and disks) on a bus must have unique SCSI
8710 identification numbers. See <a href="#scsi-ids">SCSI Identification Numbers</a>
8711 for more information.</li>
8712
8713 <br>&nbsp;
8714 <li>
8715 The Linux device name for each shared SCSI device must be the same on each
8716 cluster system. For example, a device named <b><font face="Courier New, Courier, mono">/dev/sdc</font></b>
8717 on one cluster system must be named <b><font face="Courier New, Courier, mono">/dev/sdc</font></b>
8718 on the other cluster system. You can usually ensure that devices are named
8719 the same by using identical hardware for both cluster systems.</li>
8720
8721 <br>&nbsp;
8722 <li>
8723 Bus resets must be disabled for the host bus adapters used in a cluster.</li>
8724 </ul>
8725 To set SCSI identification numbers, disable host bus adapter termination,
8726 and disable bus resets, use the system's configuration utility. When the
8727 system boots, a message is displayed describing how to start the utility.
8728 For example, you may be instructed to press Ctrl-A, and follow the prompts
8729 to perform a particular task. To set storage enclosure and RAID controller
8730 termination, see the vendor documentation. See <a href="#scsi-term">SCSI
8731 Bus Termination</a> and <a href="#scsi-ids">SCSI Identification Numbers</a>
8732 for more information.
8733 <p>See <a href="http://www.scsita.org" target="_blank">www.scsita.org</a>
8734 and the following sections for detailed information about SCSI bus requirements.
8735 <br>&nbsp;
8736 <p><a NAME="scsi-term"></a>
8737 <h3>
8738 A.3.1 SCSI Bus Termination</h3>
8739 A SCSI bus is an electrical path between two terminators. A device (host
8740 bus adapter, RAID controller, or disk) attaches to a SCSI bus by a short
8741 <b>stub</b>,
8742 which is an unterminated bus segment that usually must be less than 0.1
8743 meter in length.
8744 <p>Buses must have only two terminators located at the ends of the bus.
8745 Additional terminators, terminators that are not at the ends of the bus,
8746 or long stubs will cause the bus to operate incorrectly. Termination for
8747 a SCSI bus can be provided by the devices connected to the bus or by external
8748 terminators, if the internal (onboard) device termination can be disabled.
8749 <p>Terminators are powered by a SCSI power distribution wire (or signal),
8750 TERMPWR, so that the terminator can operate as long as there is one powering
8751 device on the bus. In a cluster, TERMPWR must be provided by the host bus
8752 adapters, instead of the disks in the enclosure. You can usually disable
8753 TERMPWR in a disk by setting a jumper on the drive. See the disk drive
8754 documentation for information.
8755 <p>In addition, there are two types of SCSI terminators. Active terminators
8756 provide a voltage regulator for TERMPWR, while passive terminators provide
8757 a resistor network between TERMPWR and ground. Passive terminators are
8758 also susceptible to fluctuations in TERMPWR. Therefore, it is recommended
8759 that you use active terminators in a cluster.
8760 <p>For maintenance purposes, it is desirable for a storage configuration
8761 to support hot plugging (that is, the ability to disconnect a host bus
8762 adapter from a SCSI bus, while maintaining bus termination and operation).
8763 However, if you have a single-initiator SCSI bus, hot plugging is not necessary
8764 because the private bus does not need to remain operational when you remove
8765 a host. See <a href="#multiinit">Setting Up a Multi-Initiator SCSI Bus
8766 Configuration</a> for examples of hot plugging configurations.
8767 <p>If you have a multi-initiator SCSI bus, you must adhere to the following
8768 requirements for hot plugging:
8769 <ul>
8770 <li>
8771 SCSI devices, terminators, and cables must adhere to stringent hot plugging
8772 requirements described in the latest SCSI specifications described in SCSI
8773 Parallel Interface-3 (SPI-3), Annex D. You can obtain this document from<a href="http://www.t10.org" target="_blank">www.t10.org</a>.</li>
8774
8775 <br>&nbsp;
8776 <li>
8777 Internal host bus adapter termination must be disabled. Not all adapters
8778 support this feature.</li>
8779
8780 <br>&nbsp;
8781 <li>
8782 If a host bus adapter is at the end of the SCSI bus, an external terminator
8783 must provide the bus termination.</li>
8784
8785 <br>&nbsp;
8786 <li>
8787 The stub that is used to connect a host bus adapter to a SCSI bus must
8788 be less than 0.1 meter in length. Host bus adapters that use a long cable
8789 inside the system enclosure to connect to the bulkhead cannot support hot
8790 plugging. In addition, host bus adapters that have an internal connector
8791 and a cable that extends the bus inside the system enclosure cannot support
8792 hot plugging. Note that any internal cable must be included in the length
8793 of the SCSI bus.</li>
8794 </ul>
8795 When disconnecting a device from a single-initiator SCSI bus or from a
8796 multi-initiator SCSI bus that supports hot plugging, follow these guidelines:
8797 <ul>
8798 <li>
8799 Unterminated SCSI cables must not be connected to an operational host bus
8800 adapter or storage device.</li>
8801
8802 <br>&nbsp;
8803 <li>
8804 Connector pins must not bend or touch an electrical conductor while the
8805 SCSI cable is disconnected.</li>
8806
8807 <br>&nbsp;
8808 <li>
8809 To disconnect a host bus adapter from a single-initiator bus, you must
8810 disconnect the SCSI cable first from the RAID controller and then from
8811 the adapter. This ensures that the RAID controller is not exposed to any
8812 erroneous input.</li>
8813
8814 <br>&nbsp;
8815 <li>
8816 Protect connector pins from electrostatic discharge while the SCSI cable
8817 is disconnected by wearing a grounded anti-static wrist guard and physically
8818 protecting the cable ends from contact with other objects.</li>
8819
8820 <br>&nbsp;
8821 <li>
8822 Do not remove a device that is currently participating in any SCSI bus
8823 transactions.</li>
8824 </ul>
8825 To enable or disable an adapter's internal termination, use the system
8826 BIOS utility. When the system boots, a message is displayed describing
8827 how to start the utility. For example, you may be instructed to press Ctrl-A.
8828 Follow the prompts for setting the termination. At this point, you can
8829 also set the SCSI identification number, as needed, and disable SCSI bus
8830 resets. See <a href="#scsi-ids">SCSI Identification Numbers</a> for more
8831 information.
8832 <p>To set storage enclosure and RAID controller termination, see the vendor
8833 documentation.
8834 <br>&nbsp;
8835 <br>&nbsp;
8836 <p><a NAME="scsi-length"></a>
8837 <h3>
8838 A.3.2 SCSI Bus Length</h3>
8839 A SCSI bus must adhere to length restrictions for the bus type. Buses that
8840 do not adhere to these restrictions will not operate properly. The length
8841 of a SCSI bus is calculated from one terminated end to the other, and must
8842 include any cabling that exists inside the system or storage enclosures.
8843 <p>A cluster supports LVD (low voltage differential) buses. The maximum
8844 length of a single-initiator LVD bus is 25 meters. The maximum length of
8845 a multi-initiator LVD bus is 12 meters. According to the SCSI standard,
8846 a single-initiator LVD bus is a bus that is connected to only two devices,
8847 each within 0.1 meter from a terminator. All other buses are defined as
8848 multi-initiator buses.
8849 <p>Do not connect any single-ended devices to a LVD bus, or the bus will
8850 convert to a single-ended bus, which has a much shorter maximum length
8851 than a differential bus.
8852 <br>&nbsp;
8853 <br>&nbsp;
8854 <p><a NAME="scsi-ids"></a>
8855 <h3>
8856 A.3.3 SCSI Identification Numbers</h3>
8857 Each device on a SCSI bus must have a unique SCSI identification number.
8858 Devices include host bus adapters, RAID controllers, and disks.
8859 <p>The number of devices on a SCSI bus depends on the data path for the
8860 bus. A cluster supports wide SCSI buses, which have a 16-bit data path
8861 and support a maximum of 16 devices. Therefore, there are sixteen possible
8862 SCSI identification numbers that you can assign to the devices on a bus.
8863 <p>In addition, SCSI identification numbers are prioritized. Use the following
8864 priority order to assign SCSI identification numbers:
8865 <p>7 - 6 - 5 - 4 - 3 - 2 - 1 - 0 - 15 - 14 - 13 - 12 - 11 - 10 - 9 - 8
8866 <p>The previous order specifies that 7 is the highest priority, and 8 is
8867 the lowest priority. The default SCSI identification number for a host
8868 bus adapter is 7, because adapters are usually assigned the highest priority.
8869 On a multi-initiator bus, be sure to change the SCSI identification number
8870 of one of the host bus adapters to avoid duplicate values.
8871 <p>A disk in a JBOD enclosure is assigned a SCSI identification number
8872 either manually (by setting jumpers on the disk) or automatically (based
8873 on the enclosure slot number). You can assign identification numbers for
8874 logical units in a RAID subsystem by using the RAID management interface.
8875 <p>To modify an adapter's SCSI identification number, use the system BIOS
8876 utility. When the system boots, a message is displayed describing how to
8877 start the utility. For example, you may be instructed to press Ctrl-A,
8878 and follow the prompts for setting the SCSI identification number. At this
8879 point, you can also enable or disable the adapter's internal termination,
8880 as needed, and disable SCSI bus resets. See <a href="#scsi-term">SCSI Bus
8881 Termination</a> for more information.
8882 <p>The prioritized arbitration scheme on a SCSI bus can result in low-priority
8883 devices being locked out for some period of time. This may cause commands
8884 to time out, if a low-priority storage device, such as a disk, is unable
8885 to win arbitration and complete a command that a host has queued to it.
8886 For some workloads, you may be able to avoid this problem by assigning
8887 low-priority SCSI identification numbers to the host bus adapters.
8888 <br>&nbsp;
8889 <p><a NAME="hba"></a>
8890 <h2>
8891 A.4 Host Bus Adapter Features and Configuration Requirements</h2>
8892 Not all host bus adapters can be used with all cluster shared storage configurations.
8893 For example, some host bus adapters do not support hot plugging or cannot
8894 be used in a multi-initiator SCSI bus. You must use host bus adapters with
8895 the features and characteristics that your shared storage configuration
8896 requires. See <a href="#hardware-storage">Configuring Shared Disk Storage</a>
8897 for information about supported storage configurations.
8898 <p>The following table describes some recommended SCSI and Fibre Channel
8899 host bus adapters. It includes information about adapter termination and
8900 how to use the adapters in single and multi-initiator SCSI buses and Fibre
8901 Channel interconnects.
8902 <p>The specific product devices listed in the table have been tested. However,
8903 other devices may also work well in a cluster. If you want to use a host
8904 bus adapter other than a recommended one, the information in the table
8905 can help you determine if the device has the features and characteristics
8906 that will enable it to work in a cluster.
8907 <br>&nbsp;
8908 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
8909 <caption><col width=55*><col width=99*><col width=103*><thead>
8910 <br></thead></caption>
8911
8912 <tr ALIGN=CENTER VALIGN=CENTER>
8913 <th WIDTH="17%">Host Bus Adapter</th>
8914
8915 <th WIDTH="20%">Features&nbsp;</th>
8916
8917 <th WIDTH="22%">Single-Initiator Configuration</th>
8918
8919 <th WIDTH="41%">Multi-Initiator Configuration</th>
8920 </tr>
8921
8922 <tr VALIGN=TOP>
8923 <td WIDTH="17%" HEIGHT="217"><font size=-1>Adaptec 2940U2W (minimum driver:
8924 AIC7xxx V5.1.28)</font></td>
8925
8926 <td WIDTH="20%" HEIGHT="217"><font size=-1>Ultra2, wide, LVD</font>
8927 <p><font size=-1>HD68 external connector</font>
8928 <p><font size=-1>One channel, with two bus segments</font>
8929 <p><font size=-1>Set the onboard termination by using the BIOS utility.&nbsp;</font>
8930 <p><font size=-1>Onboard termination is disabled when the power is off.</font></td>
8931
8932 <td WIDTH="22%" HEIGHT="217"><font size=-1>Set the onboard termination
8933 to automatic (the default).&nbsp;</font>
8934 <p><font size=-1>You can use the internal SCSI connector for private (non-cluster)
8935 storage.</font></td>
8936
8937 <td WIDTH="41%" HEIGHT="217"><font size=-1>This configuration is not supported,
8938 because the adapter and its Linux driver do not reliably recover from SCSI
8939 bus resets that can be generated by the host bus adapter on the other cluster
8940 system.&nbsp;</font>
8941 <p><font size=-1>To use the adapter in a multi-initiator bus, the onboard
8942 termination must be disabled. This ensures proper termination when the
8943 power is off.</font>
8944 <p><font size=-1>For hot plugging support, disable the onboard termination
8945 for the Ultra2 segment, and connect an external terminator, such as a pass-through
8946 terminator, to the adapter. You cannot connect a cable to the internal
8947 Ultra2 connector.</font>
8948 <p><font size=-1>For no hot plugging support, disable the onboard termination
8949 for the Ultra2 segment, or set it to automatic. Connect a terminator to
8950 the end of the internal cable attached to the internal Ultra2 connector.&nbsp;</font></td>
8951 </tr>
8952 </table>
8953
8954 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
8955 <caption><col width=55*><col width=99*><col width=103*></caption>
8956
8957 <tr VALIGN=TOP>
8958 <td WIDTH="17%" HEIGHT="224"><font size=-1>Qlogic QLA1080 (minimum driver:
8959 QLA1x160 V3.12, obtained from <a href="http://www.qlogic.com/bbs-html/drivers.html" target="new_window">www.qlogic.com/
8960 bbs-html /drivers.html</a>)</font></td>
8961
8962 <td WIDTH="20%" HEIGHT="224"><font size=-1>Ultra2, wide, LVD</font>
8963 <p><font size=-1>VHDCI external connector</font>
8964 <p><font size=-1>One channel</font>
8965 <p><font size=-1>Set the onboard termination by using the BIOS utility.&nbsp;</font>
8966 <p><font size=-1>Onboard termination is disabled when the power is off,
8967 unless jumpers are used to enforce termination.</font></td>
8968
8969 <td WIDTH="22%" HEIGHT="224"><font size=-1>Set the onboard termination
8970 to automatic (the default).&nbsp;</font>
8971 <p><font size=-1>You can use the internal SCSI connector for private (non-cluster)
8972 storage.&nbsp;</font></td>
8973
8974 <td WIDTH="41%" HEIGHT="224"><font size=-1>This configuration is not supported,
8975 because the adapter and its Linux driver do not reliably recover from SCSI
8976 bus resets that can be generated by the host bus adapter on the other cluster
8977 system.</font>
8978 <p><font size=-1>&nbsp;For hot plugging support, disable the onboard termination,
8979 and use an external terminator, such as a VHDCI pass-through terminator,
8980 a VHDCI y-cable or a VHDCI trilink connector. You cannot connect a cable
8981 to the internal Ultra2 connector.&nbsp;</font>
8982 <p><font size=-1>For no hot plugging support, disable the onboard termination,
8983 or set it to automatic. Connect a terminator to the end of the internal
8984 cable connected to the internal Ultra2 connector.&nbsp;</font>
8985 <p><font size=-1>For an alternate configuration without hot plugging support,
8986 enable the onboard termination with jumpers, so the termination is enforced
8987 even when the power is off. You cannot connect a cable to the internal
8988 Ultra2 connector.</font></td>
8989 </tr>
8990 </table>
8991
8992 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
8993 <caption><col width=55*><col width=99*><col width=103*></caption>
8994
8995 <tr VALIGN=TOP>
8996 <td WIDTH="17%" HEIGHT="267"><font size=-1>Tekram DC-390U2W (minimum driver
8997 SYM53C8xx V1.3G)</font></td>
8998
8999 <td WIDTH="20%" HEIGHT="267"><font size=-1>Ultra2, wide, LVD</font>
9000 <p><font size=-1>HD68 external connector</font>
9001 <p><font size=-1>One channel, two segments</font>
9002 <p><font size=-1>Onboard termination for a bus segment is disabled if internal
9003 and external cables are connected to the segment. Onboard termination is
9004 enabled if there is only one cable connected to the segment.</font>
9005 <p><font size=-1>Termination is disabled when the power is off.</font></td>
9006
9007 <td WIDTH="22%" HEIGHT="267"><font size=-1>You can use the internal SCSI
9008 connector for private (non-cluster) storage.&nbsp;</font></td>
9009
9010 <td WIDTH="41%" HEIGHT="267"><font size=-1>Testing has shown that the adapter
9011 and its Linux driver reliably recover from SCSI bus resets that can be
9012 generated by the host bus adapter on the other cluster system.</font>
9013 <p><font size=-1>The adapter cannot be configured to use external termination,
9014 so it does not support hot plugging.</font>
9015 <p><font size=-1>Disable the onboard termination by connecting an internal
9016 cable to the internal Ultra2 connector, and then attaching a terminator
9017 to the end of the cable. This ensures proper termination when the power
9018 is off.</font>
9019 <p>&nbsp;</td>
9020 </tr>
9021 </table>
9022
9023 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
9024 <caption><col width=55*><col width=99*><col width=103*></caption>
9025
9026 <tr VALIGN=TOP>
9027 <td WIDTH="17%" HEIGHT="240"><font size=-1>Adaptec 29160 (minimum driver:
9028 AIC7xxx V5.1.28)</font></td>
9029
9030 <td WIDTH="20%" HEIGHT="240"><font size=-1>Ultra160</font>
9031 <p><font size=-1>HD68 external connector</font>
9032 <p><font size=-1>One channel, with two bus segments</font>
9033 <p><font size=-1>Set the onboard termination by using the BIOS utility.&nbsp;</font>
9034 <p><font size=-1>Termination is disabled when the power is off, unless
9035 jumpers are used to enforce termination.</font></td>
9036
9037 <td WIDTH="22%" HEIGHT="240"><font size=-1>Set the onboard termination
9038 to automatic (the default).&nbsp;</font>
9039 <p><font size=-1>You can use the internal SCSI connector for private (non-cluster)
9040 storage.</font></td>
9041
9042 <td WIDTH="41%" HEIGHT="240"><font size=-1>This configuration is not supported,
9043 because the adapter and its Linux driver do not reliably recover from SCSI
9044 bus resets that can be generated by the host bus adapter on the other cluster
9045 system.</font>
9046 <p><font size=-1>&nbsp;You cannot connect the adapter to an external terminator,
9047 such as a pass-through terminator, because the adapter does not function
9048 correctly with external termination. Therefore, the adapter does not support
9049 hot plugging.</font>
9050 <p><font size=-1>Use jumpers to enable the onboard termination for the
9051 Ultra160 segment. You cannot connect a cable to the internal Ultra160 connector.&nbsp;</font>
9052 <p><font size=-1>For an alternate configuration, disable the onboard termination
9053 for the Ultra160 segment, or set it to automatic. Then, attach a terminator
9054 to the end of an internal cable that is connected to the internal Ultra160
9055 connector.&nbsp;</font></td>
9056 </tr>
9057 </table>
9058
9059 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
9060 <caption><col width=55*><col width=99*><col width=103*></caption>
9061
9062 <tr VALIGN=TOP>
9063 <td WIDTH="17%" HEIGHT="221"><font size=-1>Adaptec 29160LP (minimum driver:
9064 AIC7xxx V5.1.28)</font></td>
9065
9066 <td WIDTH="20%" HEIGHT="221"><font size=-1>Ultra160</font>
9067 <p><font size=-1>VHDCI external connector</font>
9068 <p><font size=-1>One channel</font>
9069 <p><font size=-1>Set the onboard termination by using the BIOS utility.&nbsp;</font>
9070 <p><font size=-1>Termination is disabled when the power is off, unless
9071 jumpers are used to enforce termination.</font></td>
9072
9073 <td WIDTH="22%" HEIGHT="221"><font size=-1>Set the onboard termination
9074 to automatic (the default).&nbsp;</font>
9075 <p><font size=-1>You can use the internal SCSI connector for private (non-cluster)
9076 storage.&nbsp;</font></td>
9077
9078 <td WIDTH="41%" HEIGHT="221"><font size=-1>This configuration is not supported,
9079 because the adapter and its Linux driver do not reliably recover from SCSI
9080 bus resets that can be generated by the host bus adapter on the other cluster
9081 system.</font>
9082 <p><font size=-1>&nbsp;You cannot connect the adapter to an external terminator,
9083 such as a pass-through terminator, because the adapter does not function
9084 correctly with external termination. Therefore, the adapter does not support
9085 hot plugging.</font>
9086 <p><font size=-1>Use jumpers to enable the onboard termination. You cannot
9087 connect a cable to the internal Ultra160 connector.</font>
9088 <p><font size=-1>For an alternate configuration, disable the onboard termination,
9089 or set it to automatic. Then, attach a terminator to the end of an internal
9090 cable that is connected to the internal Ultra160 connector.</font></td>
9091 </tr>
9092 </table>
9093
9094 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
9095 <caption><col width=55*><col width=99*><col width=103*></caption>
9096
9097 <tr VALIGN=TOP>
9098 <td WIDTH="17%" HEIGHT="239"><font size=-1>Adaptec 39160 (minimum driver:
9099 AIC7xxx V5.1.28)</font>
9100 <p><font size=-1>Qlogic QLA12160 (minimum driver: QLA1x160 V3.12, obtained
9101 from <a href="http://www.qlogic.com/bbs-html/drivers.html" target="new_window">www.qlogic.com/
9102 bbs-html /drivers.html</a>)</font></td>
9103
9104 <td WIDTH="20%" HEIGHT="239"><font size=-1>Ultra160</font>
9105 <p><font size=-1>Two VHDCI external connectors</font>
9106 <p><font size=-1>Two channels</font>
9107 <p><font size=-1>Set the onboard termination by using the BIOS utility.&nbsp;</font>
9108 <p><font size=-1>Termination is disabled when the power is off, unless
9109 jumpers are used to enforce termination.</font></td>
9110
9111 <td WIDTH="22%" HEIGHT="239"><font size=-1>Set onboard termination to automatic
9112 (the default).&nbsp;</font>
9113 <p><font size=-1>You can use the internal SCSI connectors for private (non-cluster)
9114 storage.&nbsp;</font></td>
9115
9116 <td WIDTH="41%" HEIGHT="239"><font size=-1>This configuration is not supported,
9117 because the adapter and its Linux driver do not reliably recover from SCSI
9118 bus resets that can be generated by the host bus adapter on the other cluster
9119 system.</font>
9120 <p><font size=-1>&nbsp;You cannot connect the adapter to an external terminator,
9121 such as a pass-through terminator, because the adapter does not function
9122 correctly with external termination. Therefore, the adapter does not support
9123 hot plugging.</font>
9124 <p><font size=-1>Use jumpers to enable the onboard termination for a multi-initiator
9125 SCSI channel. You cannot connect a cable to the internal connector for
9126 the multi-initiator SCSI channel.&nbsp;</font>
9127 <p><font size=-1>For an alternate configuration, disable the onboard termination
9128 for the multi-initiator SCSI channel or set it to automatic. Then, attach
9129 a terminator to the end of an internal cable that is connected to the multi-initiator
9130 SCSI channel.&nbsp;</font></td>
9131 </tr>
9132 </table>
9133
9134 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
9135 <caption><col width=55*><col width=99*><col width=103*></caption>
9136
9137 <tr VALIGN=TOP>
9138 <td WIDTH="17%" HEIGHT="131"><font size=-1>LSI Logic SYM22915 (minimum
9139 driver: SYM53c8xx V1.6b, obtained from <a href="ftp://ftp.lsil.com/HostAdapterDrivers/linux" target="new_window">ftp.lsil.com
9140 /HostAdapter Drivers/linux</a>)</font></td>
9141
9142 <td WIDTH="20%" HEIGHT="239"><font size=-1>Ultra160</font>
9143 <p><font size=-1>Two VHDCI external connectors</font>
9144 <p><font size=-1>Two channels&nbsp;</font>
9145 <p><font size=-1>Set the onboard termination by using the BIOS utility.&nbsp;</font>
9146 <p><font size=-1>The onboard termination is automatically enabled or disabled,
9147 depending on the configuration, even when the module power is off. Use
9148 jumpers to disable the automatic termination.&nbsp;</font></td>
9149
9150 <td WIDTH="22%" HEIGHT="239"><font size=-1>Set onboard termination to automatic
9151 (the default).&nbsp;</font>
9152 <p><font size=-1>You can use the internal SCSI connectors for private (non-cluster)
9153 storage.&nbsp;</font></td>
9154
9155 <td WIDTH="41%" HEIGHT="239"><font size=-1>Testing has shown that the adapter
9156 and its Linux driver reliably recover from SCSI bus resets that can be
9157 generated by the host bus adapter on the other cluster system.</font>
9158 <p><font size=-1>For hot plugging support, use an external terminator,
9159 such as a VHDCI pass-through terminator, a VHDCI y-cable, or a VHDCI trilink
9160 connector. You cannot connect a cable to the internal connector.</font>
9161 <p><font size=-1>For no hot plugging support, connect a cable to the internal
9162 connector, and connect a terminator to the end of the internal cable attached
9163 to the internal connector.&nbsp;</font></td>
9164 </tr>
9165 </table>
9166
9167 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
9168 <caption><col width=55*><col width=99*><col width=103*></caption>
9169
9170 <tr VALIGN=TOP>
9171 <td WIDTH="17%" HEIGHT="131"><font size=-1>Adaptec AIC-7896 on the Intel
9172 L440GX+ motherboard (as used on the VA Linux 2200 series) (minimum driver:
9173 AIC7xxx V5.1.28)</font></td>
9174
9175 <td WIDTH="20%" HEIGHT="131"><font size=-1>One Ultra2, wide, LVD port,
9176 and one Ultra, wide port</font>
9177 <p><font size=-1>Onboard termination is permanently enabled, so the adapter
9178 must be located at the end of the bus.</font></td>
9179
9180 <td WIDTH="22%" HEIGHT="131"><font size=-1>Termination is permanently enabled,
9181 so no action is needed in order to use the adapter in a single-initiator
9182 bus.</font></td>
9183
9184 <td WIDTH="41%" HEIGHT="131"><font size=-1>The adapter cannot be used in
9185 a multi-initiator configuration, because it does not function correctly
9186 in this configuration.</font></td>
9187 </tr>
9188 </table>
9189
9190 <table BORDER CELLSPACING=0 CELLPADDING=4 WIDTH="100%" style="page-break-before: always" >
9191 <caption><col width=55*><col width=99*><col width=103*></caption>
9192
9193 <tr VALIGN=TOP>
9194 <td WIDTH="17%" HEIGHT="131"><font size=-1>QLA2200 (minimum driver: QLA2x00
9195 V2.23, obtained from <a href="http://www.qlogic.com/bbs-html/drivers.html" target="new_window">www.qlogic.com
9196 /bbs-html /drivers.html</a>)</font></td>
9197
9198 <td WIDTH="20%" HEIGHT="239"><font size=-1>Fibre Channel arbitrated loop
9199 and fabric</font>
9200 <p><font size=-1>One channel&nbsp;</font></td>
9201
9202 <td WIDTH="22%" HEIGHT="239"><font size=-1>Can be implemented with point-to-point
9203 links or with hubs. Configurations with switches have not been tested.&nbsp;</font>
9204 <p><font size=-1>Hubs are required for connection to a dual-controller
9205 RAID array or to multiple RAID arrays.</font></td>
9206
9207 <td WIDTH="41%" HEIGHT="239"><font size=-1>This configuration has not been
9208 tested.&nbsp;</font></td>
9209 </tr>
9210 </table>
9211
9212 <p><a NAME="adaptec"></a>
9213 <h2>
9214 A.5 Adaptec Host Bus Adapter Requirement</h2>
9215 If you are using Adaptec host bus adapters in multi-initiator shared disk
9216 storage connection, edit the <b><font face="Courier New, Courier, mono">/etc/lilo.conf</font></b>
9217 file and either add the following line or edit the <b><font face="Courier New, Courier, mono">append</font></b>
9218 line to match the following line:
9219 <p><b><font face="Courier New, Courier, mono">append="aic7xxx=no_reset"</font></b>
9220 <br>&nbsp;
9221 <p><a NAME="supp-software"></a>
9222 <h1>
9223 B Supplementary Software Information</h1>
9224 The information in the following sections can help you manage the cluster
9225 software configuration:
9226 <ul>
9227 <li>
9228 <a href="#cluster-com">Cluster Communication Mechanisms</a></li>
9229
9230 <li>
9231 <a href="#cluster-daemons">Cluster Daemons</a></li>
9232
9233 <li>
9234 <a href="#admin-scenarios">Failover and Recovery Scenarios</a></li>
9235
9236 <li>
9237 <a href="#app-tuning">Tuning Oracle Services</a></li>
9238
9239 <li>
9240 <a href="#lvs">Using a Cluster in an LVS Environment</a></li>
9241 </ul>
9242
9243 <br>&nbsp;
9244 <p><a NAME="cluster-com"></a>
9245 <h2>
9246 B.1 Cluster Communication Mechanisms</h2>
9247 A cluster uses several intracluster communication mechanisms to ensure
9248 data integrity and correct cluster behavior when a failure occurs. The
9249 cluster uses these mechanisms to:
9250 <ul>
9251 <li>
9252 Control when a system can become a cluster member</li>
9253
9254 <li>
9255 Determine the state of the cluster systems</li>
9256
9257 <li>
9258 Control the behavior of the cluster when a failure occurs</li>
9259 </ul>
9260 The cluster communication mechanisms are as follows:
9261 <ul>
9262 <li>
9263 Quorum disk partitions</li>
9264
9265 <br>&nbsp;
9266 <p>&nbsp;
9267 <br>&nbsp;
9268 <br>&nbsp;
9269 <p>Periodically, each cluster system writes a timestamp and system status
9270 (UP or DOWN) to the primary and backup quorum partitions, which are raw
9271 partitions located on shared storage. Each cluster system reads the system
9272 status and timestamp that were written by the other cluster system and
9273 determines if they are up to date. The cluster systems attempt to read
9274 the information from the primary quorum partition. If this partition is
9275 corrupted, the cluster systems read the information from the backup quorum
9276 partition and simultaneously repair the primary partition. Data consistency
9277 is maintained through checksums and any inconsistencies between the partitions
9278 are automatically corrected.
9279 <p>If a cluster system reboots but cannot write to both quorum partitions,
9280 the system will not be allowed to join the cluster. In addition, if an
9281 existing cluster system can no longer write to both partitions, it removes
9282 itself from the cluster by shutting down.
9283 <br>&nbsp;
9284 <li>
9285 Remote power switch monitoring</li>
9286
9287 <br>&nbsp;
9288 <p>&nbsp;
9289 <br>&nbsp;
9290 <br>&nbsp;
9291 <p>Periodically, each cluster system monitors the health of the remote
9292 power switch connection, if any. The cluster system uses this information
9293 to help determine the status of the other cluster system. The complete
9294 failure of the power switch communication mechanism does not automatically
9295 result in a failover.
9296 <br>&nbsp;
9297 <li>
9298 Ethernet and serial heartbeats</li>
9299
9300 <br>&nbsp;
9301 <p>&nbsp;
9302 <br>&nbsp;
9303 <br>&nbsp;
9304 <p>The cluster systems are connected together by using point-to-point Ethernet
9305 and serial lines. Periodically, each cluster system issues heartbeats (pings)
9306 across these lines. The cluster uses this information to help determine
9307 the status of the systems and to ensure correct cluster operation. The
9308 complete failure of the heartbeat communication mechanism does not automatically
9309 result in a failover.</ul>
9310 If a cluster system determines that the quorum timestamp from the other
9311 cluster system is not up-to-date, it will check the heartbeat status. If
9312 heartbeats to the system are still operating, the cluster will take no
9313 action at this time. If a cluster system does not update its timestamp
9314 after some period of time, and does not respond to heartbeat pings, it
9315 is considered down.
9316 <p>Note that the cluster will remain operational as long as one cluster
9317 system can write to the quorum disk partitions, even if all other communication
9318 mechanisms fail.
9319 <br>&nbsp;
9320 <p><a NAME="cluster-daemons"></a>
9321 <h2>
9322 B.2 Cluster Daemons</h2>
9323 The cluster daemons are as follows:
9324 <ul>
9325 <li>
9326 Quorum daemon</li>
9327
9328 <br>&nbsp;
9329 <p>&nbsp;
9330 <br>&nbsp;
9331 <br>&nbsp;
9332 <p>On each cluster system, the <b><font face="Courier New, Courier, mono">cluquorumd</font></b>
9333 quorum daemon periodically writes a timestamp and system status to a specific
9334 area on the primary and backup quorum disk partitions. The daemon also
9335 reads the other cluster system's timestamp and system status information
9336 from the primary quorum partition or, if the primary partition is corrupted,
9337 from the backup partition.
9338 <br>&nbsp;
9339 <li>
9340 Heartbeat daemon</li>
9341
9342 <br>&nbsp;
9343 <p>&nbsp;
9344 <br>&nbsp;
9345 <br>&nbsp;
9346 <p>On each cluster system, the <b><font face="Courier New, Courier, mono">cluhbd</font></b>
9347 heartbeat daemon issues pings across the point-to-point Ethernet and serial
9348 lines to which both cluster systems are connected.
9349 <br>&nbsp;
9350 <li>
9351 Power daemon</li>
9352
9353 <br>&nbsp;
9354 <p>&nbsp;
9355 <br>&nbsp;
9356 <br>&nbsp;
9357 <p>On each cluster system, the <b><font face="Courier New, Courier, mono">clupowerd</font></b>
9358 power daemon monitors the remote power switch connection, if any.&nbsp;
9359 You will notice that there are 2 separate <b>clupowerd </b>processes running.&nbsp;
9360 One is the <i>master</i> process which responds to message requests (e.g.
9361 status and power cycle); the other process does periodic polling of the
9362 power switch status.
9363 <br>&nbsp;
9364 <li>
9365 Service manager daemon</li>
9366
9367 <br>&nbsp;
9368 <p>&nbsp;
9369 <br>&nbsp;
9370 <br>&nbsp;
9371 <p>On each cluster system, the <b><font face="Courier New, Courier, mono">clusvcmgrd</font></b>
9372 service manager daemon responds to changes in cluster membership by stopping
9373 and starting services.
9374 <br>&nbsp;</ul>
9375
9376 <br>&nbsp;
9377 <p><a NAME="admin-scenarios"></a>
9378 <h2>
9379 B.3 Failover and Recovery Scenarios</h2>
9380 Understanding cluster behavior when significant events occur can help you
9381 manage a cluster. Note that cluster behavior depends on whether you are
9382 using power switches in the configuration. Power switches enable the cluster
9383 to maintain complete data integrity under all failure conditions.
9384 <p>The following sections describe how the system will respond to various
9385 failure and error scenarios:
9386 <ul>
9387 <li>
9388 <a href="#admin-failure">System Hang</a></li>
9389
9390 <li>
9391 <a href="#admin-panic">System Panic</a></li>
9392
9393 <li>
9394 <a href="#admin-storage">Inaccessible Quorum Partitions</a></li>
9395
9396 <li>
9397 <a href="#admin-network">Total Network Connection Failure</a></li>
9398
9399 <li>
9400 <a href="#admin-power">Remote Power Switch Connection Failure</a></li>
9401
9402 <li>
9403 <a href="#admin-quorum">Quorum Daemon Failure</a></li>
9404
9405 <li>
9406 <a href="#admin-heartbeat">Heartbeat Daemon Failure</a></li>
9407
9408 <li>
9409 <a href="#admin-powerd">Power Daemon Failure</a></li>
9410
9411 <li>
9412 <a href="#admin-serviceman">Service Manager Daemon Failure</a></li>
9413
9414 <br>&nbsp;</ul>
9415
9416 <br>&nbsp;
9417 <p><a NAME="admin-failure"></a>
9418 <h3>
9419 B.3.1 System Hang</h3>
9420 In a cluster configuration that uses power switches, if a system "hangs,"
9421 the cluster behaves as follows:
9422 <ol>
9423 <li>
9424 The functional cluster system detects that the "hung" cluster system is
9425 not updating its timestamp on the quorum partitions and is not communicating
9426 over the heartbeat channels.</li>
9427
9428 <br>&nbsp;
9429 <li>
9430 The functional cluster system power-cycles the "hung" system.</li>
9431
9432 <br>&nbsp;
9433 <li>
9434 The functional cluster system restarts any services that were running on
9435 the "hung" system.</li>
9436
9437 <br>&nbsp;
9438 <li>
9439 If the previously "hung" system reboots, and can join the cluster (that
9440 is, the system can write to both quorum partitions), services are re-balanced
9441 across the member systems, according to each service's placement policy.</li>
9442 </ol>
9443 In a cluster configuration that does not use power switches, if a system
9444 "hangs," the cluster behaves as follows:
9445 <ol>
9446 <li>
9447 The functional cluster system detects that the "hung" cluster system is
9448 not updating its timestamp on the quorum partitions and is not communicating
9449 over the heartbeat channels.</li>
9450
9451 <br>&nbsp;
9452 <li>
9453 The functional cluster system sets the status of the "hung" system to <b><font face="Courier New, Courier, mono">DOWN</font></b>
9454 on the quorum partitions, and then restarts the "hung" system's services.</li>
9455
9456 <br>&nbsp;
9457 <li>
9458 If the "hung" system becomes "unhung," it notices that its status is <b><font face="Courier New, Courier, mono">DOWN</font></b>,
9459 and initiates a system reboot.</li>
9460
9461 <br>&nbsp;
9462 <p>&nbsp;
9463 <br>&nbsp;
9464 <br>&nbsp;
9465 <p>If the system remains "hung," you must manually power-cycle the "hung"
9466 system in order for it to resume cluster operation.
9467 <br>&nbsp;
9468 <li>
9469 If the previously "hung" system reboots, and can join the cluster, services
9470 are re-balanced across the member systems, according to each service's
9471 placement policy.</li>
9472 </ol>
9473
9474 <br>&nbsp;
9475 <p><a NAME="admin-panic"></a>
9476 <h3>
9477 B.3.2 System Panic</h3>
9478 A system panic (crash) is a controlled response to a software-detected
9479 error. A panic attempts to return the system to a consistent state by shutting
9480 down the system. If a cluster system panics, the following occurs:
9481 <ol>
9482 <li>
9483 The functional cluster system detects that the cluster system that is experiencing
9484 the panic is not updating its timestamp on the quorum partitions and is
9485 not communicating over the heartbeat channels.</li>
9486
9487 <br>&nbsp;
9488 <li>
9489 The cluster system that is experiencing the panic initiates a system shut
9490 down and reboot.</li>
9491
9492 <br>&nbsp;
9493 <li>
9494 If you are using power switches, the functional cluster system power-cycles
9495 the cluster system that is experiencing the panic.</li>
9496
9497 <br>&nbsp;
9498 <li>
9499 The functional cluster system restarts any services that were running on
9500 the system that experienced the panic.</li>
9501
9502 <br>&nbsp;
9503 <li>
9504 When the system that experienced the panic reboots, and can join the cluster
9505 (that is, the system can write to both quorum partitions), services are
9506 re-balanced across the member systems, according to each service's placement
9507 policy.</li>
9508 </ol>
9509
9510 <br>&nbsp;
9511 <p><a NAME="admin-storage"></a>
9512 <h3>
9513 B.3.3 Inaccessible Quorum Partitions</h3>
9514 Inaccessible quorum partitions can be caused by the failure of a SCSI (or
9515 FibreChannel) adapter that is connected to the shared disk storage, or
9516 by a SCSI cable becoming disconnected to the shared disk storage. If one
9517 of these conditions occurs, and the SCSI bus remains terminated, the cluster
9518 behaves as follows:
9519 <ol>
9520 <li>
9521 The cluster system with the inaccessible quorum partitions notices that
9522 it cannot update its timestamp on the quorum partitions and initiates a
9523 reboot.</li>
9524
9525 <br>&nbsp;
9526 <li>
9527 If the cluster configuration includes power switches, the functional cluster
9528 system power-cycles the rebooting system.</li>
9529
9530 <br>&nbsp;
9531 <li>
9532 The functional cluster system restarts any services that were running on
9533 the system with the inaccessible quorum partitions.</li>
9534
9535 <br>&nbsp;
9536 <li>
9537 If the cluster system reboots, and can join the cluster (that is, the system
9538 can write to both quorum partitions), services are re-balanced across the
9539 member systems, according to each service's placement policy.</li>
9540 </ol>
9541
9542 <br>&nbsp;
9543 <h3>
9544 <a NAME="admin-network"></a></h3>
9545
9546 <h3>
9547 B.3.4 Total Network Connection Failure</h3>
9548 A total network connection failure occurs when all the heartbeat network
9549 connections between the systems fail. This can be caused by one of the
9550 following:
9551 <ul>
9552 <li>
9553 All the heartbeat network cables are disconnected from a system.</li>
9554
9555 <br>&nbsp;
9556 <li>
9557 All the serial connections and network interfaces used for heartbeat communication
9558 fail.</li>
9559 </ul>
9560 If a total network connection failure occurs, both systems detect the problem,
9561 but they also detect that the SCSI disk connections are still active. Therefore,
9562 services remain running on the systems and are not interrupted.
9563 <p>If a total network connection failure occurs, diagnose the problem and
9564 then do one of the following:
9565 <ul>
9566 <li>
9567 If the problem affects only one cluster system, relocate its services to
9568 the other system. You can then correct the problem, and relocate the services
9569 back to the original system.</li>
9570
9571 <br>&nbsp;
9572 <li>
9573 Manually stop the services on one cluster system. In this case, services
9574 do not automatically fail over to the other system. Instead, you must manually
9575 restart the services on the other system. After you correct the problem,
9576 you can re-balance the services across the systems.</li>
9577
9578 <br>&nbsp;
9579 <li>
9580 Shut down one cluster system. In this case, the following occurs:</li>
9581
9582 <br>&nbsp;
9583 <ol>
9584 <li>
9585 Services are stopped on the cluster system that is shut down.</li>
9586
9587 <br>&nbsp;
9588 <li>
9589 The remaining cluster system detects that the system is being shut down.</li>
9590
9591 <br>&nbsp;
9592 <li>
9593 Any services that were running on the system that was shut down are restarted
9594 on the remaining cluster system.</li>
9595
9596 <br>&nbsp;
9597 <li>
9598 If the system reboots, and can join the cluster (that is, the system can
9599 write to both quorum partitions), services are re-balanced across the member
9600 systems, according to each service's placement policy.</li>
9601 </ol>
9602 </ul>
9603
9604 <br>&nbsp;
9605 <p><a NAME="admin-power"></a>
9606 <h3>
9607 B.3.5 Remote Power Switch Connection Failure</h3>
9608 If a query to a remote power switch connection fails, but both systems
9609 continue to have power, there is no change in cluster behavior unless a
9610 cluster system attempts to use the failed remote power switch connection
9611 to power-cycle the other system. The power daemon will continually log
9612 high-priority messages indicating a power switch failure or a loss of connectivity
9613 to the power switch (for example, if a cable has been disconnected).
9614 <p>If a cluster system attempts to use a failed remote power switch, services
9615 running on the system that experienced the failure are stopped. However,
9616 to ensure data integrity, they are not failed over to the other cluster
9617 system. Instead, they remain stopped until the hardware failure is corrected.
9618 <br>&nbsp;
9619 <br>&nbsp;
9620 <p><a NAME="admin-quorum"></a>
9621 <h3>
9622 B.3.6 Quorum Daemon Failure</h3>
9623 If a quorum daemon fails on a cluster system, the system is no longer able
9624 to monitor the quorum partitions. If you are not using power switches in
9625 the cluster, this error condition may result in services being run on more
9626 than one cluster system, which can cause data corruption.
9627 <p>If a quorum daemon fails, and power switches are used in the cluster,
9628 the following occurs:
9629 <ol>
9630 <li>
9631 The functional cluster system detects that the cluster system whose quorum
9632 daemon has failed is not updating its timestamp on the quorum partitions,
9633 although the system is still communicating over the heartbeat channels.</li>
9634
9635 <br>&nbsp;
9636 <li>
9637 After a period of time, the functional cluster system power-cycles the
9638 cluster system whose quorum daemon has failed.</li>
9639
9640 <br>&nbsp;
9641 <li>
9642 The functional cluster system restarts any services that were running on
9643 the cluster system whose quorum daemon has failed.</li>
9644
9645 <br>&nbsp;
9646 <li>
9647 If the cluster system reboots and can join the cluster (that is, it can
9648 write to the quorum partitions), services are re-balanced across the member
9649 systems, according to each service's placement policy.</li>
9650 </ol>
9651 If a quorum daemon fails, and power switches are not used in the cluster,
9652 the following occurs:
9653 <ol>
9654 <li>
9655 The functional cluster system detects that the cluster system whose quorum
9656 daemon has failed is not updating its timestamp on the quorum partitions,
9657 although the system is still communicating over the heartbeat channels.</li>
9658
9659 <br>&nbsp;
9660 <li>
9661 The functional cluster system restarts any services that were running on
9662 the cluster system whose quorum daemon has failed. Both cluster systems
9663 may be running services simultaneously, which can cause data corruption.</li>
9664
9665 <br>&nbsp;</ol>
9666
9667 <br>&nbsp;
9668 <h3>
9669 <a NAME="admin-heartbeat"></a></h3>
9670
9671 <h3>
9672 B.3.7 Heartbeat Daemon Failure</h3>
9673 If the heartbeat daemon fails on a cluster system, service failover time
9674 will increase because the quorum daemon cannot quickly determine the state
9675 of the other cluster system. By itself, a heartbeat daemon failure will
9676 not cause a service failover.
9677 <p><a NAME="admin-powerd"></a>
9678 <h3>
9679 B.3.8 Power Daemon Failure</h3>
9680 If the power daemon fails on a cluster system and the other cluster system
9681 experiences a severe failure (for example, a system panic), the cluster
9682 system will not be able to power-cycle the failed system. Instead, the
9683 cluster system will continue to run its services, and the services that
9684 were running on the failed system will not fail over. Cluster behavior
9685 is the same as for a remote power switch connection failure.
9686 <br>&nbsp;
9687 <p><a NAME="admin-serviceman"></a>
9688 <h3>
9689 B.3.9 Service Manager Daemon Failure</h3>
9690 If the service manager daemon fails, services cannot be started or stopped
9691 until you restart the service manager daemon or reboot the system.
9692 <br>&nbsp;
9693 <br>&nbsp;
9694 <p><a NAME="app-tuning"></a>
9695 <h2>
9696 B.5 Tuning Oracle Services</h2>
9697 The Oracle database recovery time after a failover is directly proportional
9698 to the number of outstanding transactions and the size of the database.
9699 The following parameters control database recovery time:
9700 <ul>
9701 <li>
9702 <b><font face="Courier New, Courier, mono">LOG_CHECKPOINT_TIMEOUT</font></b></li>
9703
9704 <li>
9705 <b><font face="Courier New, Courier, mono">LOG_CHECKPOINT_INTERVAL</font></b></li>
9706
9707 <li>
9708 <b><font face="Courier New, Courier, mono">FAST_START_IO_TARGET</font></b></li>
9709
9710 <li>
9711 <b><font face="Courier New, Courier, mono">REDO_LOG_FILE_SIZES</font></b></li>
9712 </ul>
9713 To minimize recovery time, set the previous parameters to relatively low
9714 values. Note that excessively low values will adversely impact performance.
9715 You may have to try different values in order to find the optimal value.
9716 <p>Oracle provides additional tuning parameters that control the number
9717 of database transaction retries and the retry delay time. Be sure that
9718 these values are large enough to accommodate the failover time in your
9719 environment. This will ensure that failover is transparent to database
9720 client application programs and does not require programs to reconnect.
9721 <br><a NAME="lvs"></a>
9722 <h2>
9723 B.7 Using a Cluster in an LVS Environment</h2>
9724 <i>Editorial comment: Integrate with Piranha documentation.</i>
9725 <p>You can use a cluster in conjunction with Linux Virtual Server (LVS)
9726 to deploy a highly available e-commerce site that has complete data integrity
9727 and application availability, in addition to load balancing capabilities.
9728 Note that various commercial cluster offerings are LVS derivatives. See
9729 <a href="http://www.linuxvirtualserver.org" target="_blank">www.linuxvirtualserver.org</a>
9730 for detailed information about LVS and downloading the software.
9731 <p>The following figure shows how you could use a cluster in an LVS environment.
9732 It has a three-tier architecture, where the top tier consists of LVS load-balancing
9733 systems to distribute Web requests, the second tier consists of a set of
9734 Web servers to serve the requests, and the third tier consists of a cluster
9735 to serve data to the Web servers.
9736 <h4>
9737 Cluster in an LVS Environment</h4>
9738 <img SRC="lvs_cluster.jpg" >
9739 <p>In an LVS configuration, client systems issue requests on the World
9740 Wide Web. For security reasons, these requests enter a Web site through
9741 a firewall, which can be a Linux system serving in that capacity or a dedicated
9742 firewall device. For redundancy, you can configure firewall devices in
9743 a failover configuration. Behind the firewall are LVS load-balancing systems,
9744 which can be configured in an active-standby mode. The active load-balancing
9745 system forwards the requests to a set of Web servers.
9746 <p>Each Web server can independently process an HTTP request from a client
9747 and send the response back to the client. LVS enables you to expand a Web
9748 site's capacity by adding Web servers to the load-balancing systems' set
9749 of active Web servers. In addition, if a Web server fails, it can be removed
9750 from the set.
9751 <p>This LVS configuration is particularly suitable if the Web servers serve
9752 only static Web content, which consists of small amounts of infrequently
9753 changing data, such as corporate logos, that can be easily duplicated on
9754 the Web servers. However, this configuration is not suitable if the Web
9755 servers serve dynamic content, which consists of information that changes
9756 frequently. Dynamic content could include a product inventory, purchase
9757 orders, or customer database, which must be consistent on all the Web servers
9758 to ensure that customers have access to up-to-date and accurate information.
9759 <p>To serve dynamic Web content in an LVS configuration, you can add a
9760 cluster behind the Web servers, as shown in the previous figure. This combination
9761 of LVS and a cluster enables you to configure a high-integrity, no-single-point-of-failure
9762 e-commerce site. The cluster can run a highly-available instance of a database
9763 or a set of databases that are network-accessible to the web servers.
9764 <p>For example, the figure could represent an e-commerce site used for
9765 online merchandise ordering through a URL. Client requests to the URL pass
9766 through the firewall to the active LVS load-balancing system, which then
9767 forwards the requests to one of the three Web servers. The cluster systems
9768 serve dynamic data to the Web servers, which forward the data to the requesting
9769 client system.
9770 <p>
9771 <hr width="75%" noshade>
9772 <br>&nbsp;
9773 <br>&nbsp;
9774 </body>
9775 </html>