1 .\" $NetBSD: raidctl.8,v 1.58 2009/11/17 19:09:38 jld Exp $
3 .\" Copyright (c) 1998, 2002 The NetBSD Foundation, Inc.
4 .\" All rights reserved.
6 .\" This code is derived from software contributed to The NetBSD Foundation
9 .\" Redistribution and use in source and binary forms, with or without
10 .\" modification, are permitted provided that the following conditions
12 .\" 1. Redistributions of source code must retain the above copyright
13 .\" notice, this list of conditions and the following disclaimer.
14 .\" 2. Redistributions in binary form must reproduce the above copyright
15 .\" notice, this list of conditions and the following disclaimer in the
16 .\" documentation and/or other materials provided with the distribution.
18 .\" THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
19 .\" ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
20 .\" TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
21 .\" PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
22 .\" BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
23 .\" CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
24 .\" SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
25 .\" INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
26 .\" CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
27 .\" ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
28 .\" POSSIBILITY OF SUCH DAMAGE.
31 .\" Copyright (c) 1995 Carnegie-Mellon University.
32 .\" All rights reserved.
34 .\" Author: Mark Holland
36 .\" Permission to use, copy, modify and distribute this software and
37 .\" its documentation is hereby granted, provided that both the copyright
38 .\" notice and this permission notice appear in all copies of the
39 .\" software, derivative works or modified versions, and any portions
40 .\" thereof, and that both notices appear in supporting documentation.
42 .\" CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
43 .\" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
44 .\" FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.
46 .\" Carnegie Mellon requests users of this software to return to
48 .\" Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
49 .\" School of Computer Science
50 .\" Carnegie Mellon University
51 .\" Pittsburgh PA 15213-3890
53 .\" any improvements or extensions that they make and grant Carnegie the
54 .\" rights to redistribute these changes.
.Sh NAME
.Nm raidctl
.Nd configuration utility for the RAIDframe disk driver
.Sh SYNOPSIS
.Nm
.Op Fl v
.Fl a Ar component Ar dev
.Nm
.Op Fl v
.Fl A Op yes | no | root
.Ar dev
.Nm
.Op Fl v
.Fl c Ar config_file Ar dev
.Nm
.Op Fl v
.Fl C Ar config_file Ar dev
.Nm
.Op Fl v
.Fl f Ar component Ar dev
.Nm
.Op Fl v
.Fl F Ar component Ar dev
.Nm
.Op Fl v
.Fl g Ar component Ar dev
.Nm
.Op Fl v
.Fl I Ar serial_number Ar dev
.Nm
.Op Fl v
.Fl r Ar component Ar dev
.Nm
.Op Fl v
.Fl R Ar component Ar dev
.Sh DESCRIPTION
.Nm
is the user-land control program for
.Xr raid 4 ,
the RAIDframe disk device.
.Nm
is primarily used to dynamically configure and unconfigure RAIDframe disk
devices.
For more information about the RAIDframe disk device, see
.Xr raid 4 .
.Pp
This document assumes the reader has at least rudimentary knowledge of
RAID and RAID concepts.
.Pp
The command-line options for
.Nm
are as follows:
.Bl -tag -width indent
.It Fl a Ar component Ar dev
Add
.Ar component
as a hot spare for the device
.Ar dev .
Component labels (which identify the location of a given
component within a particular RAID set) are automatically added to the
hot spare after it has been used and are not required for
.Ar component
before it is used.
.It Fl A Ic yes Ar dev
Make the RAID set auto-configurable.
The RAID set will be automatically configured at boot, before
the root file system is mounted.
Note that all components of the set must be of type
.Dv RAID
in the disklabel.
.It Fl A Ic no Ar dev
Turn off auto-configuration for the RAID set.
.It Fl A Ic root Ar dev
Make the RAID set auto-configurable, and also mark the set as being
eligible to be the root partition.
A RAID set configured this way will override
the use of the boot disk as the root device.
All components of the set must be of type
.Dv RAID
in the disklabel.
Note that only certain architectures
.Pq currently alpha, i386, pmax, sparc, sparc64, and vax
support booting a kernel directly from a RAID set.
.It Fl B Ar dev
Initiate a copyback of reconstructed data from a spare disk to
its original disk.
This is performed after a component has failed,
and the failed drive has been reconstructed onto a spare drive.
.It Fl c Ar config_file Ar dev
Configure the RAIDframe device
.Ar dev
according to the configuration given in
.Ar config_file .
A description of the contents of
.Ar config_file
is given later.
.It Fl C Ar config_file Ar dev
As for
.Fl c ,
but forces the configuration to take place.
This is required the first time a RAID set is configured.
.It Fl f Ar component Ar dev
This marks the specified
.Ar component
as having failed, but does not initiate a reconstruction of that component.
.It Fl F Ar component Ar dev
Fails the specified
.Ar component
of the device, and immediately begins a reconstruction of the failed
disk onto an available hot spare.
This is one of the mechanisms used to start
the reconstruction process if a component does have a hardware failure.
.It Fl g Ar component Ar dev
Get the component label for the specified component.
.It Fl G Ar dev
Generate the configuration of the RAIDframe device in a format suitable for
use with the
.Fl c
or
.Fl C
options.
.It Fl i Ar dev
Initialize the RAID device.
In particular, (re-)write the parity on the selected device.
This
.Em MUST
be done for
.Em all
RAID sets before the RAID device is labeled and before
file systems are created on the RAID device.
.It Fl I Ar serial_number Ar dev
Initialize the component labels on each component of the device.
.Ar serial_number
is used as one of the keys in determining whether a
particular set of components belongs to the same RAID set.
While not strictly enforced, different serial numbers should be used for
different RAID sets.
This step
.Em MUST
be performed when a new RAID set is created.
.It Fl m Ar dev
Display status information about the parity map on the RAID set, if any.
If the
.Fl v
option is also used,
then the current contents of the parity map will be output (in
hexadecimal format) as well.
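.Pp
For example, the parity map status of raid0 could be displayed with:
.Bd -literal -offset indent
raidctl -m raid0
.Ed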
.It Fl M Ic yes Ar dev
.\"XXX should there be a section with more info on the parity map feature?
Enable the use of a parity map on the RAID set; this is the default,
and greatly reduces the time taken to check parity after unclean
shutdowns at the cost of some very slight overhead during normal
operation.
Changes to this setting will take effect the next time the set is
configured.
Note that RAID-0 sets, having no parity, will not use a parity map in
any case.
.It Fl M Ic no Ar dev
Disable the use of a parity map on the RAID set; doing this is not
recommended.
This will take effect the next time the set is configured.
.It Fl M Ic set Ar cooldown Ar tickms Ar regions Ar dev
Alter the parameters of the parity map; parameters to leave unchanged
can be given as 0, and trailing zeroes may be omitted.
.\"XXX should this explanation be deferred to another section as well?
The RAID set is divided into
.Ar regions
regions; each region is marked dirty for at most
.Ar cooldown
intervals of
.Ar tickms
milliseconds each after a write to it, and at least
.Ar cooldown
\- 1 such intervals.
Changes to
.Ar regions
take effect the next time the set is configured, while changes to the other
parameters are applied immediately.
The default parameters are expected to be reasonable for most workloads.
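.Pp
As an illustration only (the values here are arbitrary, not
recommendations), the following would set the cooldown to 8 intervals
while leaving the tick interval and the number of regions unchanged:
.Bd -literal -offset indent
raidctl -M set 8 0 0 raid0
.Ed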
.It Fl p Ar dev
Check the status of the parity on the RAID set.
Displays a status message,
and returns successfully if the parity is up-to-date.
.It Fl P Ar dev
Check the status of the parity on the RAID set, and initialize
(re-write) the parity if the parity is not known to be up-to-date.
This is normally used after a system crash (and before a
.Xr fsck 8 )
to ensure the integrity of the parity.
.It Fl r Ar component Ar dev
Remove the spare disk specified by
.Ar component
from the set of available spare components.
.It Fl R Ar component Ar dev
Fails the specified
.Ar component ,
if necessary, and immediately begins a reconstruction back to
.Ar component .
This is useful for reconstructing back onto a component after
it has been replaced following a failure.
.It Fl s Ar dev
Display the status of the RAIDframe device for each of the components
and spares.
.It Fl S Ar dev
Check the status of parity re-writing, component reconstruction, and
copyback.
The output indicates the amount of progress
achieved in each of these areas.
.It Fl u Ar dev
Unconfigure the RAIDframe device.
This does not remove any component labels or change any configuration
settings (e.g. auto-configuration settings) for the RAID set.
.It Fl v
Be more verbose.
For operations such as reconstructions, parity
re-writing, and copybacks, provide a progress indicator.
.El
.Pp
The device used by
.Nm
is specified by
.Ar dev .
.Ar dev
may be either the full name of the device, e.g.,
.Pa /dev/rraid0d ,
for the i386 architecture, or
.Pa /dev/rraid0c
for many others, or just simply
.Pa raid0
(for
.Pa /dev/rraid0[cd] ) .
It is recommended that the partitions used to represent the
RAID device are not used for file systems.
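.Pp
For example, on the i386 architecture the following two commands are
equivalent:
.Bd -literal -offset indent
raidctl -s raid0
raidctl -s /dev/rraid0d
.Ed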
.Ss Configuration file
The format of the configuration file is complex, and
only an abbreviated treatment is given here.
In the configuration files, a
.Sq #
indicates the beginning of a comment.
.Pp
There are 4 required sections of a configuration file, and 2
optional sections.
Each section begins with a
.Sq START ,
followed by the section name,
and the configuration parameters associated with that section.
The first section is the
.Sq array
section, and it specifies
the number of rows, columns, and spare disks in the RAID set.
For example:
.Bd -literal -offset indent
START array
1 3 0
.Ed
.Pp
indicates an array with 1 row, 3 columns, and 0 spare disks.
Note that although multi-dimensional arrays may be specified, they are
.Em NOT
supported in the driver.
.Pp
The second section, the
.Sq disks
section, specifies the actual components of the device.
For example:
.Bd -literal -offset indent
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
.Ed
.Pp
specifies the three component disks to be used in the RAID device.
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as
.Sq failed ,
and the system will operate in degraded mode.
Note that it is
.Em imperative
that the order of the components in the configuration file does not
change between configurations of a RAID device.
Changing the order of the components will result in data loss
if the set is configured with the
.Fl C
option.
In normal circumstances, the RAID set will not configure if only
.Fl c
is specified, and the components are out-of-order.
.Pp
The next section, which is the
.Sq spare
section, is optional, and, if present, specifies the devices to be used as
.Sq hot spares
\(em devices which are on-line,
but are not actively used by the RAID driver unless
one of the main components fails.
A simple
.Sq spare
section might be:
.Bd -literal -offset indent
START spare
/dev/sd3e
.Ed
.Pp
for a configuration with a single spare component.
If no spare drives are to be used in the configuration, then the
.Sq spare
section may be omitted.
.Pp
The next section is the
.Sq layout
section.
This section describes the general layout parameters for the RAID device,
and provides such information as
sectors per stripe unit,
stripe units per parity unit,
stripe units per reconstruction unit,
and the parity configuration to use.
This section might look like:
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
.Ed
.Pp
The sectors per stripe unit specifies, in blocks, the interleave
factor; i.e., the number of contiguous sectors to be written to each
component for a single stripe.
Appropriate selection of this value (32 in this example)
is the subject of much research in RAID architectures.
The stripe units per parity unit and
stripe units per reconstruction unit are normally each set to 1.
While certain values above 1 are permitted, a discussion of valid
values and the consequences of using anything other than 1 are outside
the scope of this document.
The last value in this section (5 in this example)
indicates the parity configuration desired.
Valid entries include:
.Bl -tag -width indent
.It 0
RAID level 0.
No parity, only simple striping.
.It 1
RAID level 1.
Mirroring.
The parity is the mirror.
.It 4
RAID level 4.
Striping across components, with parity stored on the last component.
.It 5
RAID level 5.
Striping across components, parity distributed across all components.
.El
.Pp
There are other valid entries here, including those for Even-Odd
parity, RAID level 5 with rotated sparing, Chained declustering,
and Interleaved declustering, but as of this writing the code for
those parity operations has not been tested with
.Nx .
.Pp
The next required section is the
.Sq queue
section.
This is most often specified as:
.Bd -literal -offset indent
START queue
fifo 100
.Ed
.Pp
where the queuing method is specified as fifo (first-in, first-out),
and the size of the per-component queue is limited to 100 requests.
Other queuing methods may also be specified, but a discussion of them
is beyond the scope of this document.
.Pp
The final section, the
.Sq debug
section, is optional.
For more details on this the reader is referred to
the RAIDframe documentation discussed in the
.Sx HISTORY
section.
.Pp
See
.Sx EXAMPLES
for a more complete configuration file example.
.Sh FILES
.Bl -tag -width /dev/XXrXraidX -compact
.It Pa /dev/{,r}raid*
.Cm raid
device special files.
.El
.Sh EXAMPLES
It is highly recommended that, before using the RAID driver for real
file systems, the system administrator(s) become quite familiar
with the way in which
.Nm
operates,
and that they understand how the component reconstruction process works.
The examples in this section will focus on configuring a
number of different RAID sets of varying degrees of redundancy.
By working through these examples, administrators should be able to
develop a good feel for how to configure a RAID set, and how to
initiate reconstruction of failed components.
.Pp
In the following examples
.Sq raid0
will be used to denote the RAID device.
Depending on the architecture,
.Pa /dev/rraid0c
or
.Pa /dev/rraid0d
may be used in place of
.Pa raid0 .
.Ss Initialization and Configuration
The initial step in configuring a RAID set is to identify the components
that will be used in the RAID set.
All components should be the same size.
Each component should have a disklabel type of
.Dv FS_RAID ,
and a typical disklabel entry for a RAID component might look like:
.Bd -literal -offset indent
f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
.Ed
.Pp
While
.Dv FS_BSDFFS
will also work as the component type, the type
.Dv FS_RAID
is preferred for RAIDframe use, as it is required for features such as
auto-configuration.
As part of the initial configuration of each RAID set,
each component will be given a
.Sq component label .
A
.Sq component label
contains important information about the component, including a
user-specified serial number, the row and column of that component in
the RAID set, the redundancy level of the RAID set, a
.Sq modification counter ,
and whether the parity information (if any) on that
component is known to be correct.
Component labels are an integral part of the RAID set,
since they are used to ensure that components
are configured in the correct order, and used to keep track of other
vital information about the RAID set.
Component labels are also required for the auto-detection
and auto-configuration of RAID sets at boot time.
For a component label to be considered valid, that
particular component label must be in agreement with the other
component labels in the set.
For example, the serial number,
.Sq modification counter ,
number of rows and number of columns must all be in agreement.
If any of these are different, then the component is
not considered to be part of the set.
See
.Xr raid 4
for more information about component labels.
.Pp
Once the components have been identified, and the disks have
appropriate labels,
.Nm
is then used to configure the
.Xr raid 4
device.
To configure the device, a configuration file which looks something like:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
.Ed
.Pp
is created in a file.
The above configuration file specifies a RAID 5
set consisting of the components
.Pa /dev/sd1e ,
.Pa /dev/sd2e ,
and
.Pa /dev/sd3e ,
with
.Pa /dev/sd4e
available as a
.Sq hot spare
in case one of the three main drives should fail.
A RAID 0 set would be specified in a similar way:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
.Ed
.Pp
In this case, devices
.Pa /dev/sd10e ,
.Pa /dev/sd11e ,
.Pa /dev/sd12e ,
and
.Pa /dev/sd13e
are the components that make up this RAID set.
Note that there are no hot spares for a RAID 0 set,
since there is no way to recover data if any of the components fail.
.Pp
For a RAID 1 (mirror) set, the following configuration might be used:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
In this case,
.Pa /dev/sd20e
and
.Pa /dev/sd21e
are the two components of the mirror set.
While no hot spares have been specified in this
configuration, they easily could be, just as they were specified in
the RAID 5 case above.
Note as well that RAID 1 sets are currently limited to only 2 components.
At present, n-way mirroring is not possible.
.Pp
The first time a RAID set is configured, the
.Fl C
option must be used:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
where
.Pa raid0.conf
is the name of the RAID configuration file.
The
.Fl C
option forces the configuration to succeed, even if any of the component
labels are incorrect.
The
.Fl C
option should not be used lightly in
situations other than initial configurations, since if
the system is refusing to configure a RAID set, there is probably a
very good reason for it.
After the initial configuration is done (and
appropriate component labels are added with the
.Fl I
option) then raid0 can be configured normally with:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
When the RAID set is configured for the first time, it is
necessary to initialize the component labels, and to initialize the
parity on the RAID set.
Initializing the component labels is done with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
.Ed
.Pp
where
.Sq 112341
is a user-specified serial number for the RAID set.
This initialization step is
.Em required
for all RAID sets.
As well, using different serial numbers between RAID sets is
.Em strongly encouraged ,
as using the same serial number for all RAID sets will only serve to
decrease the usefulness of the component label checking.
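.Pp
For example, two different RAID sets on the same machine might be given
distinct (but otherwise arbitrary) serial numbers with:
.Bd -literal -offset indent
raidctl -I 112341 raid0
raidctl -I 112342 raid1
.Ed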
.Pp
Initializing the RAID set is done via the
.Fl i
option.
This initialization
.Em MUST
be done for
.Em all
RAID sets, since among other things it verifies that the parity (if
any) on the RAID set is correct.
Since this initialization may be quite time-consuming, the
.Fl v
option may also be used in conjunction with
.Fl i :
.Bd -literal -offset indent
raidctl -iv raid0
.Ed
.Pp
This will give more verbose output on the
status of the initialization:
.Bd -literal -offset indent
Initiating re-write of parity
Parity Re-write status:
 10% |****      | ETA: 06:03 /
.Ed
.Pp
The output provides a
.Sq Percent Complete
in both a numeric and graphical format, as well as an estimated time
to completion of the operation.
.Pp
Since it is the parity that provides the
.Sq redundancy
part of RAID, it is critical that the parity is correct as much as possible.
If the parity is not correct, then there is no
guarantee that data will not be lost if a component fails.
.Pp
Once the parity is known to be correct, it is then safe to perform
.Xr disklabel 8 ,
.Xr newfs 8 ,
or
.Xr fsck 8
on the device or its file systems, and then to mount the file systems
for use.
.Pp
Under certain circumstances (e.g., the additional component has not
arrived, or data is being migrated off of a disk destined to become a
component) it may be desirable to configure a RAID 1 set with only
a single component.
This can be achieved by using the word
.Dq absent
to indicate that a particular component is not present.
In the following:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 2 0

START disks
absent
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
.Ed
.Pp
.Pa /dev/sd0e
is the real component, and will be the second disk of a RAID 1 set.
The first component is simply marked as being absent.
Configuration (using
.Fl C
and
.Fl I Ar 12345
as above) proceeds normally, but initialization of the RAID set will
have to wait until all physical components are present.
After configuration, this set can be used normally, but will be operating
in degraded mode.
Once a second physical component is obtained, it can be hot-added,
the existing data mirrored, and normal operation resumed.
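.Pp
One possible sequence for doing so is sketched below; the new disk name
is illustrative only, and the name passed to
.Fl F
should be whatever name the
.Fl s
output reports for the missing slot (shown here, hypothetically, as
component0):
.Bd -literal -offset indent
raidctl -a /dev/sd1e raid0
raidctl -F component0 raid0
.Ed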
.Pp
The size of the resulting RAID set will depend on the number of data
components in the set.
Space is automatically reserved for the component labels, and
the actual amount of space used
for data on a component will be rounded down to the largest possible
multiple of the sectors per stripe unit (sectPerSU) value.
Thus, the amount of space provided by the RAID set will be less
than the sum of the size of the components.
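.Pp
As a worked example using the representative numbers from this section
(a component partition of 1800000 sectors, and the numBlocks value of
1799936 shown in the status output below):
.Bd -literal -offset indent
1800000 - 64 = 1799936    (64 sectors reserved for the component label)
1799936 / 32 = 56248      (an exact multiple of sectPerSU)
.Ed
.Pp
so in this particular case no additional space is lost to rounding.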
.Ss Maintenance of the RAID set
After the parity has been initialized for the first time, the command:
.Bd -literal -offset indent
raidctl -p raid0
.Ed
.Pp
can be used to check the current status of the parity.
To check the parity and rebuild it if necessary (for example,
after an unclean shutdown) the command:
.Bd -literal -offset indent
raidctl -P raid0
.Ed
.Pp
is used.
Note that re-writing the parity can be done while
other operations on the RAID set are taking place (e.g., while doing a
.Xr fsck 8
on a file system on the RAID set).
However: for maximum effectiveness of the RAID set, the parity should be
known to be correct before any data on the set is modified.
.Pp
To see how the RAID set is doing, the following command can be used to
show the RAID set's status:
.Bd -literal -offset indent
raidctl -s raid0
.Ed
.Pp
The output will look something like:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd2e:
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd3e:
   Row: 0 Column: 2 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that all is well with the RAID set.
Of importance here are the component lines which read
.Sq optimal ,
and the
.Sq Parity status
line.
.Sq Parity status: clean
indicates that the parity is up-to-date for this RAID set,
whether or not the RAID set is in redundant or degraded mode.
.Sq Parity status: DIRTY
indicates that it is not known if the parity information is
consistent with the data, and that the parity information needs
to be checked.
Note that if there are file systems open on the RAID set,
the individual components will not be
.Sq clean ,
but the set as a whole can still be clean.
.Pp
To check the component label of
.Pa /dev/sd1e ,
the following is used:
.Bd -literal -offset indent
raidctl -g /dev/sd1e raid0
.Ed
.Pp
The output of this command will look something like:
.Bd -literal -offset indent
Component label for /dev/sd1e:
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 3
   Version: 2 Serial Number: 13432 Mod Counter: 65
   Clean: No Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
.Ed
.Ss Dealing with Component Failures
If for some reason
(perhaps to test reconstruction) it is necessary to pretend a drive
has failed, the following will perform that function:
.Bd -literal -offset indent
raidctl -f /dev/sd2e raid0
.Ed
.Pp
The system will then be performing all operations in degraded mode,
where missing data is re-computed from existing data and the parity.
In this case, obtaining the status of raid0 will return (in part):
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
Note that with the use of
.Fl f ,
a reconstruction has not been started.
To both fail the disk and start a reconstruction, the
.Fl F
option must be used:
.Bd -literal -offset indent
raidctl -F /dev/sd2e raid0
.Ed
.Pp
The
.Fl f
option may be used first, and then the
.Fl F
option used later, on the same disk, if desired.
Immediately after the reconstruction is started, the status will report:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
This indicates that a reconstruction is in progress.
To find out how the reconstruction is progressing the
.Fl S
option may be used.
This will indicate the progress in terms of the
percentage of the reconstruction that is completed.
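.Pp
For example:
.Bd -literal -offset indent
raidctl -S raid0
.Ed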
When the reconstruction is finished the
.Fl s
option will show:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: spared
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: used_spare
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
.Ed
.Pp
At this point there are at least two options.
First, if
.Pa /dev/sd2e
is known to be good (i.e., the failure was either caused by
.Fl f
or
.Fl F ,
or the failed disk was replaced), then a copyback of the data can
be initiated with the
.Fl B
option.
In this example, this would copy the entire contents of
.Pa /dev/sd4e
to
.Pa /dev/sd2e .
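.Pp
For example:
.Bd -literal -offset indent
raidctl -B raid0
.Ed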
Once the copyback procedure is complete, the
status of the device would be (in part):
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
and the system is back to normal operation.
.Pp
The second option after the reconstruction is to simply use
.Pa /dev/sd4e
in place of
.Pa /dev/sd2e
in the configuration file.
For example, the configuration file (in part) might now look like:
.Bd -literal -offset indent
START array
1 3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
.Ed
.Pp
This can be done as
.Pa /dev/sd4e
is completely interchangeable with
.Pa /dev/sd2e
at this point.
Note that extreme care must be taken when
changing the order of the drives in a configuration.
This is one of the few instances where the devices and/or
their orderings can be changed without loss of data!
In general, the ordering of components in a configuration file should
.Em never
be changed.
.Pp
If a component fails and there are no hot spares
available on-line, the status of the RAID set might (in part) look like:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
In this case there are a number of options.
The first option is to add a hot spare using:
.Bd -literal -offset indent
raidctl -a /dev/sd4e raid0
.Ed
.Pp
After the hot add, the status would then be:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: failed
           /dev/sd3e: optimal
Spares:
           /dev/sd4e: spare
.Ed
.Pp
Reconstruction could then take place using
.Fl F
as described above.
.Pp
A second option is to rebuild directly onto
.Pa /dev/sd2e .
Once the disk containing
.Pa /dev/sd2e
has been replaced, one can simply use:
.Bd -literal -offset indent
raidctl -R /dev/sd2e raid0
.Ed
.Pp
to rebuild the
.Pa /dev/sd2e
component.
As the rebuilding is in progress, the status will be:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: reconstructing
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
and when completed, will be:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           /dev/sd2e: optimal
           /dev/sd3e: optimal
No spares.
.Ed
.Pp
In circumstances where a particular component is completely
unavailable after a reboot, a special component name will be used to
indicate the missing component.
For example:
.Bd -literal -offset indent
Components:
           /dev/sd1e: optimal
           component1: failed
No spares.
.Ed
.Pp
indicates that the second component of this RAID set was not detected
at all by the auto-configuration code.
The name
.Sq component1
can be used anywhere a normal component name would be used.
For example, to add a hot spare to the above set, and rebuild to that hot
spare, the following could be done:
.Bd -literal -offset indent
raidctl -a /dev/sd3e raid0
raidctl -F component1 raid0
.Ed
.Pp
at which point the data missing from
.Sq component1
would be reconstructed onto
.Pa /dev/sd3e .
.Pp
When more than one component is marked as
.Sq failed
due to a non-component hardware failure (e.g., loss of power to two
components, adapter problems, termination problems, or cabling issues) it
is quite possible to recover the data on the RAID set.
The first thing to be aware of is that the first disk to fail will
almost certainly be out-of-sync with the remainder of the array.
If any IO was performed between the time the first component is considered
.Sq failed
and when the second component is considered
.Sq failed ,
then the first component to fail will
.Em not
contain correct data, and should be ignored.
When the second component is marked as failed, however, the RAID device will
(currently) panic the system.
At this point the data on the RAID set
(not including the first failed component) is still self-consistent,
and will be in no worse state of repair than had the power gone out in
the middle of a write to a file system on a non-RAID device.
The problem, however, is that the component labels may now have 3 different
.Sq modification counters
(one value on the first component that failed, one value on the second
component that failed, and a third value on the remaining components).
In such a situation, the RAID set will not autoconfigure,
and can only be forcibly re-configured
with the
.Fl C
option.
To recover the RAID set, one must first remedy whatever physical
problem caused the multiple-component failure.
After that is done, the RAID set can be restored by forcibly
configuring the raid set
.Em without
the component that failed first.
For example, if
.Pa /dev/sd1e
and
.Pa /dev/sd2e
fail (in that order) in a RAID set of the following configuration:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
then the following configuration (say "recover_raid0.conf")
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 3 0

START disks
absent
/dev/sd2e
/dev/sd3e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
.Ed
.Pp
can be used with:
.Bd -literal -offset indent
raidctl -C recover_raid0.conf raid0
.Ed
.Pp
to force the configuration of raid0.
A
.Bd -literal -offset indent
raidctl -I 12345 raid0
.Ed
.Pp
will be required in order to synchronize the component labels.
At this point the file systems on the RAID set can then be checked and
corrected.
To complete the re-construction of the RAID set,
.Pa /dev/sd1e
is simply hot-added back into the array, and reconstructed
as described earlier.
.Ss RAID on RAID
RAID sets can be layered to create more complex and much larger RAID sets.
A RAID 0 set, for example, could be constructed from four RAID 5 sets.
The following configuration file shows such a setup:
.Bd -literal -offset indent
START array
# numRow numCol numSpare
1 4 0

START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0

START queue
fifo 100
.Ed
.Pp
A similar configuration file might be used for a RAID 0 set
constructed from components on RAID 1 sets.
In such a configuration, the mirroring provides a high degree
of redundancy, while the striping provides additional speed benefits.
.Ss Auto-configuration and Root on RAID
RAID sets can also be auto-configured at boot.
To make a set auto-configurable,
simply prepare the RAID set as above, and then do a:
.Bd -literal -offset indent
raidctl -A yes raid0
.Ed
.Pp
to turn on auto-configuration for that set.
To turn off auto-configuration, use:
.Bd -literal -offset indent
raidctl -A no raid0
.Ed
.Pp
RAID sets which are auto-configurable will be configured before the
root file system is mounted.
These RAID sets are thus available for
use as a root file system, or for any other file system.
A primary advantage of using the auto-configuration is that RAID components
become more independent of the disks they reside on.
For example, SCSI ID's can change, but auto-configured sets will always be
configured correctly, even if the SCSI ID's of the component disks
have become scrambled.
.Pp
Having a system's root file system (/)
on a RAID set is also allowed, with the
.Sq a
partition of such a RAID set being used for /.
To use raid0a as the root file system, simply use:
.Bd -literal -offset indent
raidctl -A root raid0
.Ed
.Pp
To return raid0a to be just an auto-configuring set simply use the
.Fl A Ic yes
arguments.
.Pp
Note that kernels can only be directly read from RAID 1 components on
architectures that support that
.Pq currently alpha, i386, pmax, sparc, sparc64, and vax .
On those architectures, the
.Dv FS_RAID
file system is recognized by the bootblocks, and will properly load the
kernel directly from a RAID 1 component.
For other architectures, or to support the root file system
on other RAID sets, some other mechanism must be used to get a kernel booting.
For example, a small partition containing only the secondary boot-blocks
and an alternate kernel (or two) could be used.
Once a kernel is booting, however, and an auto-configuring RAID set is
found that is eligible to be root, then that RAID set will be
auto-configured and used as the root device.
If two or more RAID sets claim to be root devices, then the
user will be prompted to select the root device.
At this time, RAID 0, 1, 4, and 5 sets are all supported as root devices.
.Pp
A typical RAID 1 setup with root on RAID might be as follows:
.Bl -enum
.It
wd0a - a small partition, which contains a complete, bootable, basic
.Nx
installation.
.It
wd1a - also contains a complete, bootable, basic
.Nx
installation.
.It
wd0e and wd1e - a RAID 1 set, raid0, used for the root file system.
.It
wd0f and wd1f - a RAID 1 set, raid1, which will be used only for
swap space.
.It
wd0g and wd1g - a RAID 1 set, raid2, used for
.Pa /usr ,
.Pa /home ,
or other data, if desired.
.It
wd0h and wd1h - a RAID 1 set, raid3, if desired.
.El
.Pp
RAID sets raid0, raid1, and raid2 are all marked as auto-configurable.
raid0 is marked as being a root file system.
When new kernels are installed, the kernel is not only copied to
.Pa / ,
but also to wd0a and wd1a.
The kernel on wd0a is required, since that
is the kernel the system boots from.
The kernel on wd1a is also
required, since that will be the kernel used should wd0 fail.
The important point here is to have redundant copies of the kernel
available, in the event that one of the drives fails.
.Pp
There is no requirement that the root file system be on the same disk
as the kernel.
For example, obtaining the kernel from wd0a, and using
sd0e and sd1e for raid0, and the root file system, is fine.
It
.Em is
critical, however, that there be multiple kernels available, in the
event of media failure.
.Pp
Multi-layered RAID devices (such as a RAID 0 set made
up of RAID 1 sets) are
.Em not
supported as root devices or auto-configurable devices at this point.
(Multi-layered RAID devices
.Em are
supported in general, however, as mentioned earlier.)
Note that in order to enable component auto-detection and
auto-configuration of RAID devices, the line:
.Bd -literal -offset indent
options RAID_AUTOCONFIG
.Ed
.Pp
must be in the kernel configuration file.
.Ss Swapping on RAID
A RAID device can be used as a swap device.
In order to ensure that a RAID device used as a swap device
is correctly unconfigured when the system is shutdown or rebooted,
it is recommended that the line
.Bd -literal -offset indent
swapoff=YES
.Ed
.Pp
be added to
.Pa /etc/rc.conf .
.Ss Unconfiguration
The final operation performed by
.Nm
is to unconfigure the
.Xr raid 4
device.
This is accomplished via a simple:
.Bd -literal -offset indent
raidctl -u raid0
.Ed
.Pp
at which point the device is ready to be reconfigured.
.Ss Performance Tuning
Selection of the various parameter values which result in the best
performance can be quite tricky, and often requires a bit of
trial-and-error to get those values most appropriate for a given system.
A whole range of factors come into play, including:
.Bl -bullet
.It
Types of components (e.g., SCSI vs. IDE) and their bandwidth
.It
Types of controller cards and their bandwidth
.It
Distribution of components among controllers
.It
IO bandwidth
.It
file system access patterns
.It
CPU speed
.El
.Pp
As with most performance tuning, benchmarking under real-life loads
may be the only way to measure expected performance.
Understanding some of the underlying technology is also useful in tuning.
The goal of this section is to provide pointers to those parameters which may
make significant differences in performance.
.Pp
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient.
Since data in a RAID 1 set is arranged in a linear
fashion on each component, selecting an appropriate stripe size is
somewhat less critical than it is for a RAID 5 set.
However: a stripe size that is too small will cause large IO's to be
broken up into a number of smaller ones, hurting performance.
At the same time, a large stripe size may cause problems with
concurrent accesses to stripes, which may also affect performance.
Thus values in the range of 32 to 128 are often the most effective.
.Pp
Tuning RAID 5 sets is trickier.
In the best case, IO is presented to the RAID set one stripe at a time.
Since the entire stripe is available at the beginning of the IO,
the parity of that stripe can be calculated before the stripe is written,
and then the stripe data and parity can be written in parallel.
When the amount of data being written is less than a full stripe worth, the
.Sq small write
penalty comes into play.
Since a
.Sq small write
means only a portion of the stripe on the components is going to
change, the data (and parity) on the components must be updated
slightly differently.
First, the
.Sq old parity
and
.Sq old data
must be read from the components.
Then the new parity is constructed,
using the new data to be written, and the old data and old parity.
Finally, the new data and new parity are written.
All this extra data shuffling results in a serious loss of performance,
and is typically 2 to 4 times slower than a full stripe write (or read).
To combat this problem in the real world, it may be useful
to ensure that stripe sizes are small enough that a
.Sq large IO
from the system will use exactly one large stripe write.
As is seen later, there are some file system dependencies
which may come into play here as well.
.Pp
Since the size of a
.Sq large IO
is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may
be desirable to select a SectPerSU value of 16 blocks (8K) or 32
blocks (16K).
Since there are 4 data stripe units per stripe, the maximum
data per stripe is 64 blocks (32K) or 128 blocks (64K).
Again, empirical measurement will provide the best indicators of which
values will yield better performance.
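.Pp
A layout section implementing the 8K stripe unit choice described above
might read (a sketch only, for the 5-drive RAID 5 case):
.Bd -literal -offset indent
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
16 1 1 5
.Ed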
.Pp
The parameters used for the file system are also critical to good performance.
For
.Xr newfs 8 ,
for example, increasing the block size to 32K or 64K may improve
performance dramatically.
As well, changing the cylinders-per-group
parameter from 16 to 32 or higher is often not only necessary for
larger file systems, but may also have positive performance implications.
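.Pp
For example (an illustrative invocation only; the values should be
matched to the stripe size and the intended use of the file system), a
larger block size can be requested at file system creation time with
something like:
.Bd -literal -offset indent
newfs -b 65536 -f 8192 /dev/rraid0e
.Ed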
.Ss Summary
Despite the length of this man-page, configuring a RAID set is a
relatively straight-forward process.
All that needs to be done is the following steps:
.Bl -enum
.It
Use
.Xr disklabel 8
to create the components (of type RAID).
.It
Construct a RAID configuration file: e.g.,
.Pa raid0.conf
.It
Configure the RAID set with:
.Bd -literal -offset indent
raidctl -C raid0.conf raid0
.Ed
.Pp
.It
Initialize the component labels with:
.Bd -literal -offset indent
raidctl -I 123456 raid0
.Ed
.Pp
.It
Initialize other important parts of the set with:
.Bd -literal -offset indent
raidctl -i raid0
.Ed
.Pp
.It
Get the default label for the RAID set:
.Bd -literal -offset indent
disklabel raid0 \*[Gt] /tmp/label
.Ed
.Pp
.It
Edit the label:
.Bd -literal -offset indent
vi /tmp/label
.Ed
.Pp
.It
Put the new label on the RAID set:
.Bd -literal -offset indent
disklabel -R -r raid0 /tmp/label
.Ed
.Pp
.It
Create the file system:
.Bd -literal -offset indent
newfs /dev/rraid0e
.Ed
.Pp
.It
Mount the file system:
.Bd -literal -offset indent
mount /dev/raid0e /mnt
.Ed
.Pp
.It
Use:
.Bd -literal -offset indent
raidctl -c raid0.conf raid0
.Ed
.Pp
To re-configure the RAID set the next time it is needed, or put
.Pa raid0.conf
into
.Pa /etc/raid0.conf
where it will automatically be started by the
.Pa /etc/rc
scripts.
.El
.Sh HISTORY
RAIDframe is a framework for rapid prototyping of RAID structures
developed by the folks at the Parallel Data Laboratory at Carnegie
Mellon University (CMU).
A more complete description of the internals and functionality of
RAIDframe is found in the paper "RAIDframe: A Rapid Prototyping Tool
for RAID Systems", by William V. Courtright II, Garth Gibson, Mark
Holland, LeAnn Neal Reilly, and Jim Zelenka, and published by the
Parallel Data Laboratory of Carnegie Mellon University.
.Pp
The
.Nm
command first appeared as a program in CMU's RAIDframe v1.1 distribution.
This version of
.Nm
is a complete re-write, and first appeared in
.Nx 1.4 .
.Pp
The RAIDframe Copyright is as follows:
.Bd -literal
Copyright (c) 1994-1996 Carnegie-Mellon University.
All rights reserved.

Permission to use, copy, modify and distribute this software and
its documentation is hereby granted, provided that both the copyright
notice and this permission notice appear in all copies of the
software, derivative works or modified versions, and any portions
thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS"
CONDITION.  CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND
FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

 Software Distribution Coordinator  or  Software.Distribution@CS.CMU.EDU
 School of Computer Science
 Carnegie Mellon University
 Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the
rights to redistribute these changes.
.Ed
.Sh WARNINGS
Certain RAID levels (1, 4, 5, 6, and others) can protect against some
data loss due to component failure.
However the loss of two components of a RAID 4 or 5 system,
or the loss of a single component of a RAID 0 system, will
result in the entire file system being lost.
RAID is
.Em NOT
a substitute for good backup practices.
.Pp
Recomputation of parity
.Em MUST
be performed whenever there is a chance that it may have been compromised.
This includes after system crashes, or before a RAID
device has been used for the first time.
Failure to keep parity correct will be catastrophic should a
component ever fail \(em it is better to use RAID 0 and get the
additional space and speed, than it is to use parity, but
not keep the parity correct.
At least with RAID 0 there is no perception of increased data security.
.Sh BUGS
Hot-spare removal is currently not available.