 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 * http://www.apache.org/licenses/LICENSE-2.0
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
HBaseFsck (hbck) is a tool for checking region consistency and table integrity problems and for repairing a corrupted HBase cluster.
It works in two basic modes: a read-only mode that identifies inconsistencies and a multi-phase read-write repair mode.
=== Running hbck to identify inconsistencies
To check whether your HBase cluster has corruptions, run hbck against it:
$ ./bin/hbase hbck
At the end of the command's output, hbck prints OK or tells you the number of INCONSISTENCIES present.
You may also want to run hbck a few times, because some inconsistencies can be transient (for example, while the cluster is starting up or a region is splitting).
Operationally, you may want to run hbck regularly and set up an alert (for example, via Nagios) if it repeatedly reports inconsistencies.
A run of hbck reports a list of inconsistencies along with a brief description of the regions and tables affected.
Using the `-details` option reports more details, including a representative listing of all the splits present in all the tables.
$ ./bin/hbase hbck -details
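
Because some inconsistencies are transient, a monitoring job would typically raise an alert only when several consecutive runs report problems. The following is a minimal Python sketch of that idea and is not part of HBase. It assumes that the hbck summary contains a line of the form `N inconsistencies detected.`, which you should verify against the output of your own version. The helper names `run_hbck`, `count_inconsistencies` and `persistent_inconsistencies` are hypothetical.

```python
import re
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_hbck(cmd: Sequence[str] = ("./bin/hbase", "hbck")) -> str:
    """Run hbck in its read-only mode and return its combined output."""
    result = subprocess.run(
        list(cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return result.stdout


def count_inconsistencies(output: str) -> Optional[int]:
    """Extract N from 'N inconsistencies detected.' (assumed summary format).

    Returns None when no such line is found.
    """
    match = re.search(r"(\d+) inconsistencies detected", output)
    return int(match.group(1)) if match else None


def persistent_inconsistencies(
    runner: Callable[[], str],
    attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if every attempt fails to report a clean cluster.

    Output that cannot be parsed is treated as a failed attempt.
    """
    for attempt in range(attempts):
        if count_inconsistencies(runner()) == 0:
            return False  # a clean run: earlier problems were transient
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return True
```

A scheduled job could call `persistent_inconsistencies(run_hbck)` and notify the alerting system only when it returns `True`.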
If you just want to know whether some tables are corrupted, you can limit hbck to identify inconsistencies in only specific tables.
For example, the following command would only attempt to check tables TableFoo and TableBar.
The benefit is that hbck will run in less time.
$ ./bin/hbase hbck TableFoo TableBar
If inconsistencies continue to be reported after several runs, you may have encountered a corruption.
Corruptions should be rare, but in the event they occur, newer versions of HBase include an hbck tool with automatic repair options.
There are two invariants that, when violated, create inconsistencies in HBase:
* HBase's region consistency invariant is satisfied if every region is assigned and deployed on exactly one region server, and all places where this state is kept are in accordance.
* HBase's table integrity invariant is satisfied if, for each table, every possible row key resolves to exactly one region.
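
The table integrity invariant can be illustrated with a short Python sketch. It is a simplified model, not hbck's implementation: each region is described only by its start and end row keys, an empty key is assumed to mean "start of table" or "end of table", and a region is assumed to cover the half-open range from start key (inclusive) to end key (exclusive). The `Region` type and `check_table_integrity` function are hypothetical names.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    name: str
    start: bytes  # b"" is assumed to mean "start of table"
    end: bytes    # b"" is assumed to mean "end of table"


def check_table_integrity(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return (kind, detail) pairs for every violation found in one table."""
    problems: List[Tuple[str, str]] = []
    usable: List[Region] = []
    for r in regions:
        if r.end != b"" and r.start == r.end:
            problems.append(("degenerate", r.name))  # startkey == endkey
        elif r.end != b"" and r.start > r.end:
            problems.append(("backwards", r.name))   # startkey > endkey
        else:
            usable.append(r)

    # Walk the remaining regions in start-key order and track how far the
    # key space has been covered so far.
    usable.sort(key=lambda r: r.start)
    cursor = b""          # first row key that is not covered yet
    reached_end = False   # True once a region extends to the end of the table
    for r in usable:
        if reached_end or r.start < cursor:
            problems.append(("overlap", r.name))
        elif r.start > cursor:
            problems.append(("hole", f"{cursor!r} to {r.start!r}"))
        if r.end == b"":
            reached_end = True
        elif r.end > cursor:
            cursor = r.end
    if not reached_end:
        problems.append(("hole", f"{cursor!r} to end of table"))
    return problems
```

An empty result means that every row key resolves to exactly one region under this model; a "hole" means some keys resolve to no region, and an "overlap" means some keys resolve to more than one.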
Repairs generally work in three phases: a read-only information gathering phase that identifies inconsistencies, a table integrity repair phase that restores the table integrity invariant, and finally a region consistency repair phase that restores the region consistency invariant.
Starting from version 0.90.0, hbck could detect region consistency problems and report on a subset of possible table integrity problems.
It also included the ability to automatically fix the most common class of inconsistency: region assignment and deployment consistency problems.
This repair could be done by using the `-fix` command line option.
The repair closes regions if they are open on the wrong server or on multiple region servers, and assigns regions to region servers if they are not open.
Starting from HBase versions 0.90.7, 0.92.2 and 0.94.0, several new command line options were introduced to aid in repairing a corrupted HBase.
This hbck sometimes goes by the nickname ``uberhbck''.
Each particular version of uberhbck is compatible with HBase releases of the same major version (for example, the 0.90.7 uberhbck can repair a 0.90.4 cluster).
However, versions <=0.90.6 and versions <=0.92.1 may require restarting the master or failing over to a backup master.
When repairing a corrupted HBase, it is best to repair the lowest risk inconsistencies first.
These are generally region consistency repairs: localized single-region repairs that only modify in-memory data, ephemeral ZooKeeper data, or patch holes in the META table.
Region consistency requires that the state of the region's data in HDFS (.regioninfo files), the region's row in the hbase:meta table, and the region's deployment/assignments on region servers and the master are all in accordance.
Options for repairing region consistency include:
* `-fixAssignments` (equivalent to the 0.90 `-fix` option) repairs unassigned, incorrectly assigned or multiply assigned regions.
* `-fixMeta` removes meta rows when the corresponding regions are not present in HDFS and adds new meta rows if the regions are present in HDFS but not in META.

To fix deployment and assignment problems you can run this command:
$ ./bin/hbase hbck -fixAssignments
To fix deployment and assignment problems as well as repairing incorrect meta rows you can run this command:
$ ./bin/hbase hbck -fixAssignments -fixMeta
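
To make the mapping between a region consistency problem and the option that addresses it more concrete, here is a small Python sketch. It is an illustrative model only: it compares three views of a table (regions found in HDFS, regions with a meta row, and the servers each region is deployed on) and names the option that the descriptions above associate with each mismatch. The function name and its inputs are hypothetical, and the model ignores cases such as disabled tables or split parents, where an undeployed region is legitimate.

```python
from typing import Dict, List, Set, Tuple


def suggest_consistency_repairs(
    in_hdfs: Set[str],
    in_meta: Set[str],
    deployed_on: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Return (region, problem, option) for each mismatch between the views.

    Regions known only from deployment information are out of scope here.
    """
    findings: List[Tuple[str, str, str]] = []
    for region in sorted(in_hdfs | in_meta):
        servers = deployed_on.get(region, [])
        if region in in_meta and region not in in_hdfs:
            findings.append((region, "meta row without region in HDFS", "-fixMeta"))
        elif region in in_hdfs and region not in in_meta:
            findings.append((region, "region in HDFS without meta row", "-fixMeta"))
        elif len(servers) == 0:
            findings.append((region, "not deployed on any region server", "-fixAssignments"))
        elif len(servers) > 1:
            findings.append((region, "deployed on multiple region servers", "-fixAssignments"))
    return findings
```

A region that appears in HDFS and in meta and is deployed on exactly one server produces no finding in this model.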
There are a few classes of table integrity problems that are low risk repairs.
The first two are degenerate regions (startkey == endkey) and backwards regions (startkey > endkey).
These are automatically handled by sidelining the data to a temporary directory (/hbck/xxxx).
The third low-risk class is HDFS region holes.
These can be repaired by using the following option:
* `-fixHdfsHoles` option for fabricating new empty regions on the file system.

If holes are detected, you can use `-fixHdfsHoles`, and you should include `-fixMeta` and `-fixAssignments` to make the new region consistent.
$ ./bin/hbase hbck -fixAssignments -fixMeta -fixHdfsHoles
Since this is a common operation, we've added the `-repairHoles` flag that is equivalent to the previous command:
$ ./bin/hbase hbck -repairHoles
If inconsistencies still remain after these steps, you most likely have table integrity problems related to orphaned or overlapping regions.
=== Region Overlap Repairs
Table integrity problems can require repairs that deal with overlaps.
This is a riskier operation because it requires modifications to the file system, requires some decision making, and may require some manual steps.
For these repairs it is best to analyze the output of a `hbck -details` run so that you limit repair attempts to the problems the checks identify.
Because this is riskier, there are safeguards that should be used to limit the scope of the repairs.
WARNING: This is a relatively new feature and has only been tested on online but idle HBase instances (no reads/writes). Use at your own risk in an active production environment!

The options for repairing table integrity violations include:
* `-fixHdfsOrphans` option for ``adopting'' a region directory that is missing a region metadata file (the .regioninfo file).
* `-fixHdfsOverlaps` option for fixing overlapping regions.
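
Before running `-fixHdfsOrphans`, it can help to see which region directories would count as orphans. The Python sketch below is only an illustration: it assumes you are inspecting a table directory on a locally accessible file system (for example, a copy or a mount), that each region is a direct subdirectory of the table directory, and that directories whose names start with a dot are not regions. The function name `find_orphan_region_dirs` is hypothetical, and the real layout depends on your HBase version.

```python
import os
from typing import List


def find_orphan_region_dirs(table_dir: str) -> List[str]:
    """List region directories under table_dir that lack a .regioninfo file."""
    orphans: List[str] = []
    for entry in sorted(os.listdir(table_dir)):
        region_dir = os.path.join(table_dir, entry)
        # Skip plain files and dot-directories, which are assumed not to be regions.
        if entry.startswith(".") or not os.path.isdir(region_dir):
            continue
        if not os.path.isfile(os.path.join(region_dir, ".regioninfo")):
            orphans.append(entry)
    return orphans
```

The sketch is read-only; it reports candidates and leaves any repair to hbck.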
When repairing overlapping regions, a region's data can be modified on the file system in two ways: 1) by merging regions into a larger region or 2) by sidelining regions, that is, moving their data to a ``sideline'' directory from which it can be restored later.
Merging a large number of regions is technically correct but could result in an extremely large region that requires a series of costly compactions and splitting operations.
In these cases, it is probably better to sideline the regions that overlap with the most other regions (likely the largest ranges) so that merges can happen on a more reasonable scale.
Since these sidelined regions are already laid out in HBase's native directory and HFile format, they can be restored by using HBase's bulk load mechanism.
The default safeguard thresholds are conservative.
These options let you override the default thresholds and enable the large region sidelining feature.
* `-maxMerge <n>` maximum number of overlapping regions to merge.
* `-sidelineBigOverlaps` if more than maxMerge regions are overlapping, attempt to sideline the regions overlapping with the most other regions.
* `-maxOverlapsToSideline <n>` if sidelining large overlapping regions, sideline at most n regions.
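
The interaction of these three thresholds can be sketched in Python. This is a simplified model under stated assumptions, not hbck's actual algorithm: a group of mutually related overlapping regions is merged when it has at most `max_merge` members; otherwise, if sidelining is enabled, the regions that overlap the most other regions are sidelined (enough to bring the group down to `max_merge`, but never more than `max_overlaps_to_sideline`), and merging is left to a later pass. All names are hypothetical, and no default values are assumed.

```python
from typing import Dict, List, NamedTuple


class Region(NamedTuple):
    name: str
    start: bytes  # b"" is assumed to mean "start of table"
    end: bytes    # b"" is assumed to mean "end of table"


def overlaps(a: Region, b: Region) -> bool:
    """True if the half-open key ranges of a and b intersect."""
    a_starts_before_b_ends = b.end == b"" or a.start < b.end
    b_starts_before_a_ends = a.end == b"" or b.start < a.end
    return a_starts_before_b_ends and b_starts_before_a_ends


def plan_overlap_repair(
    group: List[Region],
    max_merge: int,
    sideline_big_overlaps: bool,
    max_overlaps_to_sideline: int,
) -> Dict[str, List[str]]:
    """Decide which regions of one overlap group to merge or sideline."""
    if len(group) <= max_merge:
        return {"merge": [r.name for r in group], "sideline": []}
    if not sideline_big_overlaps:
        # Too many regions to merge safely and sidelining is disabled.
        return {"merge": [], "sideline": []}

    def overlap_count(region: Region) -> int:
        return sum(1 for other in group if other is not region and overlaps(region, other))

    ranked = sorted(group, key=overlap_count, reverse=True)
    how_many = min(len(group) - max_merge, max_overlaps_to_sideline)
    return {"merge": [], "sideline": [r.name for r in ranked[:how_many]]}
```

In this model, one very wide region that covers many narrow ones is sidelined first, which matches the reasoning above that the regions overlapping the most others are likely the largest ranges.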
Since you often just want to get the tables repaired, you can use this option to turn on all repair options:
* `-repair` includes all the region consistency options and only the hole repairing table integrity options.

Finally, there are safeguards to limit repairs to only specific tables.
For example, the following command would only attempt to check and repair tables TableFoo and TableBar.
$ ./bin/hbase hbck -repair TableFoo TableBar
==== Special cases: Meta is not properly assigned
There are a few special cases that hbck can handle as well.
Sometimes the meta table's only region is inconsistently assigned or deployed.
In this case there is a special `-fixMetaOnly` option that can try to fix meta assignments.
$ ./bin/hbase hbck -fixMetaOnly -fixAssignments
==== Special cases: HBase version file is missing
HBase's data on the file system requires a version file in order to start.
If this file is missing, you can use the `-fixVersionFile` option to fabricate a new HBase version file.
This assumes that the version of hbck you are running is the appropriate version for the HBase cluster.
==== Special case: Root and META are corrupt
The most drastic corruption scenario is the case where the ROOT or META is corrupted and HBase will not start.
In this case you can use the OfflineMetaRepair tool to create new ROOT and META regions and tables.
This tool assumes that HBase is offline.
It marches through the existing HBase home directory and loads as much information as possible from the region metadata files (.regioninfo files) on the file system.
If the region metadata has proper table integrity, it sidelines the original root and meta table directories, and builds new ones with pointers to the region directories and their data.
$ ./bin/hbase org.apache.hadoop.hbase.util.hbck.OfflineMetaRepair
NOTE: This tool is not as clever as uberhbck but can be used to bootstrap repairs that uberhbck can complete.
If the tool succeeds, you should be able to start HBase and run online repairs if necessary.
==== Special cases: Offline split parent
Once a region is split, the offline parent is cleaned up automatically.
Sometimes daughter regions are split again before their parents are cleaned up, and HBase can normally clean up the parents in the right order.
However, some offline split parents can linger: they are in META and in HDFS and are not deployed, but HBase cannot clean them up.
In this case, you can use the `-fixSplitParents` option to reset them in META to be online and not split.
hbck can then merge them with other regions if the option for fixing overlapping regions is used.
This option should not normally be used, and it is not in `-fixAll`.