////
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
////

[[backuprestore]]
= Backup and Restore
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:

[[br.overview]]
== Overview

Backup and restore is a standard operation provided by many databases. An effective backup and restore
strategy helps ensure that users can recover data in case of unexpected failures. The HBase backup and restore
feature helps ensure that enterprises using HBase as a canonical data repository can recover from catastrophic
failures. Another important feature is the ability to restore the database to a particular
point-in-time, commonly referred to as a snapshot.

The HBase backup and restore feature provides the ability to create full backups and incremental backups on
tables in an HBase cluster. The full backup is the foundation on which incremental backups are applied
to build iterative snapshots. Incremental backups can be run on a schedule to capture changes over time,
for example by using a Cron task. Incremental backups are more cost-effective than full backups because they only capture
the changes since the last backup, and they also enable administrators to restore the database to any prior incremental backup.
The utilities also support table-level backup and recovery if you do not want to restore the entire dataset
of the backup.
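
The scheduled incremental backups mentioned above can be driven by a small wrapper script. The following is a minimal sketch that rests on several hypothetical choices: the script path (`/usr/local/bin/hbase-incremental-backup.sh`), the backup root (`hdfs://host5:8020/data/backup`), the backup set (`green_set`) and the schedule. It also assumes that a full backup of the same tables already exists. The script only calls the `hbase backup create` command described in <<br.creating.complete.backup,Creating a Backup Image>>. With `DRY_RUN=1`, it prints that command instead of running it.

[source,bash]
----
#!/usr/bin/env bash
# Hypothetical wrapper for a scheduled incremental backup.
# Example crontab entry for the hbase user (daily at 01:30). Cron runs with a
# minimal PATH, so use the full path to the hbase launcher if necessary:
#   30 1 * * * /usr/local/bin/hbase-incremental-backup.sh run

BACKUP_ROOT="${BACKUP_ROOT:-hdfs://host5:8020/data/backup}"  # assumed backup root
BACKUP_SET="${BACKUP_SET:-green_set}"                         # assumed backup set

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Assumes a full backup of the tables in BACKUP_SET already exists under BACKUP_ROOT.
incremental_backup() {
  run hbase backup create incremental "$BACKUP_ROOT" -s "$BACKUP_SET"
}

# Only act when invoked as "<script> run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
  incremental_backup
fi
----

Running the script without the `run` argument does nothing. You can therefore source it and check `incremental_backup` with `DRY_RUN=1` before installing the crontab entry.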

The backup and restore feature supplements the HBase Replication feature. While HBase replication is ideal for
creating "hot" copies of the data (where the replicated data is immediately available for query), the backup and
restore feature is ideal for creating "cold" copies of data (where a manual step must be taken to restore the system).
Previously, users only had the ability to create full backups via the ExportSnapshot functionality. The incremental
backup implementation is the novel improvement over the previous "art" provided by ExportSnapshot.

The backup and restore feature uses DistCp to transfer files between clusters.
link:https://issues.apache.org/jira/browse/HADOOP-15850[HADOOP-15850] fixes a bug where CopyCommitter#concatFileChunks
unconditionally tried to concatenate the files being DistCp'ed to the target cluster (even though the files are
independent). Without the fix from
link:https://issues.apache.org/jira/browse/HADOOP-15850[HADOOP-15850], the transfer would fail.
Therefore, the backup and restore feature requires one of the following Hadoop versions:

* 2.7.x
* 2.8.x
* 2.9.2+
* 2.10.0+
* 3.0.4+
* 3.1.2+
* 3.2.0+
* 3.3.0+
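
To check a cluster against this list, you can compare the version string one release line at a time. The following is a minimal sketch, and the function name `br_hadoop_version_ok` is hypothetical. Because it encodes the list literally, it reports release lines newer than those listed (for example 3.4) as unlisted. It does not assume that those releases include the HADOOP-15850 fix.

[source,bash]
----
# Returns 0 if a Hadoop version such as "3.1.2" matches the list above, 1 otherwise.
# Usage: br_hadoop_version_ok "$(hadoop version | head -n 1 | awk '{print $2}')"
br_hadoop_version_ok() {
  local major minor patch
  IFS=. read -r major minor patch _ <<< "$1"
  patch="${patch%%[!0-9]*}"   # drop suffixes such as "-SNAPSHOT"
  patch="${patch:-0}"         # treat "2.8" as "2.8.0"
  case "$major.$minor" in
    2.7|2.8|2.10|3.2|3.3) return 0 ;;   # every patch release in these lines is listed
    2.9) [ "$patch" -ge 2 ] ;;          # 2.9.2+
    3.0) [ "$patch" -ge 4 ] ;;          # 3.0.4+
    3.1) [ "$patch" -ge 2 ] ;;          # 3.1.2+
    *) return 1 ;;                      # release line not in the list
  esac
}
----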

[[br.terminology]]
== Terminology

The backup and restore feature introduces new terminology which can be used to understand how control flows through the
system.

* _A backup_: A logical unit of data and metadata which can restore a table to its state at a specific point in time.
* _Full backup_: A type of backup which wholly encapsulates the contents of the table at a point in time.
* _Incremental backup_: A type of backup which contains the changes in a table since the last backup.
* _Backup set_: A user-defined name which references one or more tables over which a backup can be executed.
* _Backup ID_: A unique name which identifies one backup from the rest, e.g. `backupId_1467823988425`.

[[br.planning]]
== Planning

There are some common strategies which can be used to implement backup and restore in your environment. The following sections
show how these strategies are implemented and identify potential tradeoffs with each.

WARNING: The backup and restore tools have not been tested on Transparent Data Encryption (TDE) enabled HDFS clusters.
This is related to the open issue link:https://issues.apache.org/jira/browse/HBASE-16178[HBASE-16178].

[[br.intracluster.backup]]
=== Backup within a cluster

This strategy stores the backups on the same cluster where the backup was taken. This approach is only appropriate for testing
as it does not provide any additional safety on top of what the software itself already provides.

.Intra-Cluster Backup
image::backup-intra-cluster.png[]

[[br.dedicated.cluster.backup]]
=== Backup using a dedicated cluster

This strategy provides greater fault tolerance and provides a path towards disaster recovery. In this setting, you will
store the backup on a separate HDFS cluster by supplying the backup destination cluster’s HDFS URL to the backup utility.
You should consider backing up to a different physical location, such as a different data center.

Typically, a backup-dedicated HDFS cluster uses a more economical hardware profile to save money.

.Dedicated HDFS Cluster Backup
image::backup-dedicated-cluster.png[]

[[br.cloud.or.vendor.backup]]
=== Backup to the Cloud or a storage vendor appliance

Another approach to safeguarding HBase incremental backups is to store the data on provisioned, secure servers that belong
to third-party vendors and that are located off-site. The vendor can be a public cloud provider or a storage vendor who uses
a Hadoop-compatible file system, such as S3 and other HDFS-compatible destinations.

.Backup to Cloud or Vendor Storage Solutions
image::backup-cloud-appliance.png[]

NOTE: The HBase backup utility does not support backup to multiple destinations. A workaround is to manually create copies
of the backup files from HDFS or S3.
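
One way to apply that workaround is to mirror the backup root to a second destination with DistCp after each backup completes. The following is a minimal sketch. The source and destination URIs are hypothetical, and so is the function name. With `-update`, DistCp copies the contents of the source directory into the destination and skips files that already match at the destination.

[source,bash]
----
# Hypothetical helper: mirror a backup root to a second location with DistCp.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

br_copy_backup_root() {
  if [ $# -ne 2 ]; then
    echo "usage: br_copy_backup_root <src_backup_root> <dst_backup_root>" >&2
    return 2
  fi
  # -update copies the contents of the source directory into the destination
  # and skips files that are already identical there.
  run hadoop distcp -update "$1" "$2"
}
----

For example, `br_copy_backup_root hdfs://host5:8020/data/backup s3a://my-backup-bucket/hbase` copies the backup root to a hypothetical S3 bucket.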

[[br.initial.setup]]
== First-time configuration steps

This section contains the necessary configuration changes that must be made in order to use the backup and restore feature.
As this feature makes significant use of YARN's MapReduce framework to parallelize these I/O heavy operations, configuration
changes extend outside of just `hbase-site.xml`.

=== Allow the "hbase" system user in YARN

The YARN *container-executor.cfg* configuration file must have the following property setting: _allowed.system.users=hbase_. No spaces
are allowed in entries of this configuration file.

WARNING: Skipping this step will result in runtime errors when executing the first backup tasks.

*Example of a valid container-executor.cfg file for backup and restore:*

[source]
----
yarn.nodemanager.log-dirs=/var/log/hadoop/mapred
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=hbase
min.user.id=500
----
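
Before the first backup, you can check the file for the two requirements above: the `hbase` user is listed in `allowed.system.users`, and no entry contains spaces. The following is a minimal sketch of such a pre-flight check. The function name is hypothetical, and the check assumes that lines starting with `#` are comments.

[source,bash]
----
# Hypothetical pre-flight check for container-executor.cfg. Returns 0 when
# allowed.system.users lists hbase and no entry contains whitespace.
# Blank lines and lines starting with '#' are ignored.
br_check_container_executor_cfg() {
  local cfg="$1" users
  if [ ! -r "$cfg" ]; then
    echo "error: cannot read $cfg" >&2
    return 1
  fi
  if grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$cfg" | grep -q '[[:space:]]'; then
    echo "error: $cfg has an entry containing whitespace" >&2
    return 1
  fi
  # Value of allowed.system.users (the last matching line, if repeated).
  users="$(sed -n 's/^allowed\.system\.users=//p' "$cfg" | tail -n 1)"
  case ",$users," in
    *,hbase,*) return 0 ;;
    *) echo "error: allowed.system.users does not include hbase" >&2; return 1 ;;
  esac
}
----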

=== HBase specific changes

Add the following properties to `hbase-site.xml` and restart HBase if it is already running.

NOTE: The ",..." is an ellipsis meant to imply that this is a comma-separated list of values, not literal text which should be added to `hbase-site.xml`.

[source]
----
<property>
  <name>hbase.backup.enable</name>
  <value>true</value>
</property>
<property>
  <name>hbase.master.logcleaner.plugins</name>
  <value>org.apache.hadoop.hbase.backup.master.BackupLogCleaner,...</value>
</property>
<property>
  <name>hbase.procedure.master.classes</name>
  <value>org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager,...</value>
</property>
<property>
  <name>hbase.procedure.regionserver.classes</name>
  <value>org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager,...</value>
</property>
<property>
  <name>hbase.coprocessor.region.classes</name>
  <value>org.apache.hadoop.hbase.backup.BackupObserver,...</value>
</property>
<property>
  <name>hbase.master.hfilecleaner.plugins</name>
  <value>org.apache.hadoop.hbase.backup.BackupHFileCleaner,...</value>
</property>
----

== Backup and Restore commands

This section covers the command-line utilities that administrators run to create, restore, and merge backups. Tools to
inspect details on specific backup sessions are covered in the next section, <<br.administration,Administration of Backup Images>>.

Run the command `hbase backup help <command>` to access the online help that provides basic information about a command
and its options. The information below is also captured in the help message for each command.

// hbase backup create

[[br.creating.complete.backup]]
=== Creating a Backup Image

[NOTE]
====
For HBase clusters also using Apache Phoenix: include the SQL system catalog tables in the backup. In the event that you
need to restore the HBase backup, access to the system catalog tables enables you to resume Phoenix interoperability with the
restored data.
====

The first step in running the backup and restore utilities is to perform a full backup and to store the data in a separate image
from the source. At a minimum, you must do this to get a baseline before you can rely on incremental backups.

Run the following command as the HBase superuser:

[source]
----
hbase backup create <type> <backup_path>
----

After the command finishes running, the console prints a SUCCESS or FAILURE status message. The SUCCESS message includes a backup ID.
The backup ID contains the Unix time (also known as Epoch time), in milliseconds, at which the HBase master received the backup request from the client.
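
Because the ID embeds this timestamp, the ID alone tells you when a backup was requested. The following is a minimal sketch of that conversion. It assumes, as in the examples in this chapter, that the ID ends with an underscore followed by the epoch time in milliseconds. The function name is hypothetical, and the milliseconds are dropped from the output.

[source,bash]
----
# Hypothetical helper: print the UTC time encoded in a backup ID such as
# backupId_1467823988425 (epoch milliseconds after the last underscore).
br_backup_id_to_utc() {
  local ms="${1##*_}" secs
  case "$ms" in
    ''|*[!0-9]*) echo "not a backup ID: $1" >&2; return 1 ;;
  esac
  secs=$(( 10#$ms / 1000 ))   # drop the milliseconds
  # GNU date accepts -d @SECONDS; BSD/macOS date uses -r SECONDS instead.
  date -u -d "@$secs" +%Y-%m-%dT%H:%M:%SZ 2>/dev/null ||
    date -u -r "$secs" +%Y-%m-%dT%H:%M:%SZ
}
----

For example, `br_backup_id_to_utc backupId_1467823988425` prints `2016-07-06T16:53:08Z`.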

[TIP]
====
Record the backup ID that appears at the end of a successful backup. In case the source cluster fails and you need to recover the
dataset with a restore operation, having the backup ID readily available can save time.
====

[[br.create.positional.cli.arguments]]
==== Positional Command-Line Arguments

_type_::
  The type of backup to execute: _full_ or _incremental_. As a reminder, an _incremental_ backup requires a _full_ backup to
  already exist.

_backup_path_::
  The _backup_path_ argument specifies the full filesystem URI of where to store the backup image. Valid prefixes are
  _hdfs:_, _webhdfs:_, _s3a:_ or other compatible Hadoop File System implementations.

[[br.create.named.cli.arguments]]
==== Named Command-Line Arguments

_-t <table_name[,table_name]>_::
  A comma-separated list of tables to back up. If no tables are specified, all tables are backed up. No regular-expression or
  wildcard support is present; all table names must be explicitly listed. See <<br.using.backup.sets,Backup Sets>> for more
  information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these
  named options is required.

_-s <backup_set_name>_::
  Identify tables to back up based on a backup set. See <<br.using.backup.sets,Using Backup Sets>> for the purpose and usage
  of backup sets. Mutually exclusive with the _-t_ option.

_-w <number_workers>_::
  (Optional) Specifies the number of parallel workers to copy data to the backup destination. Backups are currently executed by MapReduce jobs,
  so this value corresponds to the number of Mappers that will be spawned by the job.

_-b <bandwidth_per_worker>_::
  (Optional) Specifies the bandwidth of each worker in MB per second.

_-d_::
  (Optional) Enables "DEBUG" mode which prints additional logging about the backup creation.

_-q <name>_::
  (Optional) Specifies the name of the YARN queue in which the MapReduce job that creates the backup runs. This option
  is useful to prevent backup tasks from stealing resources away from other MapReduce jobs of high importance.

[[br.usage.examples]]
==== Example usage

[source]
----
$ hbase backup create full hdfs://host5:8020/data/backup -t SALES2,SALES3 -w 3
----

This command creates a full backup image of two tables, SALES2 and SALES3, in the HDFS instance whose NameNode is host5:8020
in the path _/data/backup_. The _-w_ option specifies that no more than three parallel workers complete the operation.
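
If you script backup creation, a thin wrapper can reject option combinations that the list above rules out before anything is submitted. The following is a minimal sketch, and the function name `br_backup_create` is hypothetical. It checks only two things: that the backup type is valid, and that `-t` and `-s` are not used together. It passes every other argument through unchanged.

[source,bash]
----
# Hypothetical wrapper around `hbase backup create`.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

br_backup_create() {
  if [ $# -lt 2 ]; then
    echo "usage: br_backup_create <full|incremental> <backup_path> [options]" >&2
    return 2
  fi
  local type="$1" dest="$2" arg has_t=0 has_s=0
  shift 2
  case "$type" in
    full|incremental) ;;
    *) echo "unknown backup type: $type" >&2; return 2 ;;
  esac
  # -t and -s are documented as mutually exclusive.
  for arg in "$@"; do
    case "$arg" in
      -t) has_t=1 ;;
      -s) has_s=1 ;;
    esac
  done
  if [ "$has_t" = 1 ] && [ "$has_s" = 1 ]; then
    echo "-t and -s are mutually exclusive" >&2
    return 2
  fi
  run hbase backup create "$type" "$dest" "$@"
}
----

With `DRY_RUN=1`, `br_backup_create full hdfs://host5:8020/data/backup -t SALES2,SALES3 -w 3` prints the example command shown above.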

// hbase backup restore

[[br.restoring.backup]]
=== Restoring a Backup Image

Run the following command as an HBase superuser. You can only restore a backup on a running HBase cluster because the data must be
redistributed to the RegionServers for the operation to complete successfully.

[source]
----
hbase restore <backup_path> <backup_id>
----

[[br.restore.positional.args]]
==== Positional Command-Line Arguments

_backup_path_::
  The _backup_path_ argument specifies the full filesystem URI of where the backup image is stored. Valid prefixes are
  _hdfs:_, _webhdfs:_, _s3a:_ or other compatible Hadoop File System implementations.

_backup_id_::
  The backup ID that uniquely identifies the backup image to be restored.


[[br.restore.named.args]]
==== Named Command-Line Arguments

_-t <table_name[,table_name]>_::
  A comma-separated list of tables to restore. See <<br.using.backup.sets,Backup Sets>> for more
  information about performing operations on collections of tables. Mutually exclusive with the _-s_ option; one of these
  named options is required.

_-s <backup_set_name>_::
  Identify tables to restore based on a backup set. See <<br.using.backup.sets,Using Backup Sets>> for the purpose and usage
  of backup sets. Mutually exclusive with the _-t_ option.

_-q <name>_::
  (Optional) Specifies the name of the YARN queue in which the MapReduce job for the restore runs. This option
  is useful to prevent restore tasks from stealing resources away from other MapReduce jobs of high importance.

_-c_::
  (Optional) Perform a dry-run of the restore. The actions are checked, but not executed.

_-m <target_tables>_::
  (Optional) A comma-separated list of tables to restore into. If this option is not provided, the original table name is used. When
  this option is provided, it must contain the same number of entries as the `-t` option.

_-o_::
  (Optional) Overwrites the target table for the restore if the table already exists.


[[br.restore.usage]]
==== Example of Usage

[source]
----
hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2
----

This command restores two tables of an incremental backup image. In this example:

* `/tmp/backup_incremental` is the path to the directory containing the backup image.
* `backupId_1467823988425` is the backup ID.
* `mytable1` and `mytable2` are the names of tables in the backup image to be restored.
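
When you restore into different table names with _-m_, the two lists must line up entry for entry. The following is a minimal sketch of a wrapper that checks this before calling `hbase restore`. The function names are hypothetical, as are the `copy1,copy2` target names used below. The wrapper passes every argument through unchanged.

[source,bash]
----
# Hypothetical wrapper around `hbase restore` that checks -t/-m pairing.
# Set DRY_RUN=1 to print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Print the number of entries in a comma-separated list.
br_count_csv() {
  local IFS=, items
  read -r -a items <<< "$1"
  echo "${#items[@]}"
}

br_restore() {
  if [ $# -lt 2 ]; then
    echo "usage: br_restore <backup_path> <backup_id> [options]" >&2
    return 2
  fi
  local arg prev="" tables="" targets=""
  # Remember the values that follow -t and -m.
  for arg in "$@"; do
    case "$prev" in
      -t) tables="$arg" ;;
      -m) targets="$arg" ;;
    esac
    prev="$arg"
  done
  if [ -n "$targets" ] && [ "$(br_count_csv "$targets")" != "$(br_count_csv "$tables")" ]; then
    echo "-m must list as many tables as -t" >&2
    return 2
  fi
  run hbase restore "$@"
}
----

For example, `br_restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2 -m copy1,copy2` is accepted. The same call with `-m copy1` is rejected before `hbase restore` runs.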

// hbase backup merge

[[br.merge.backup]]
=== Merging Incremental Backup Images

This command merges two or more incremental backup images into a single incremental
backup image, consolidating multiple small incremental backup images into one
larger image. For example, you can merge hourly incremental backups
into a daily incremental backup image, or daily incremental backups into a weekly incremental backup.

[source]
----
$ hbase backup merge <backup_ids>
----

[[br.merge.backup.positional.cli.arguments]]
==== Positional Command-Line Arguments

_backup_ids_::
  A comma-separated list of incremental backup image IDs that are to be combined into a single image.

[[br.merge.backup.named.cli.arguments]]
==== Named Command-Line Arguments

None.

[[br.merge.backup.example]]
==== Example usage

[source]
----
$ hbase backup merge backupId_1467823988425,backupId_1467827588425
----
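
To consolidate, for example, one day's hourly incremental backups, you first need their IDs as a single comma-separated argument. The following is a minimal sketch. It assumes, as in the examples in this chapter, that each ID ends with an underscore followed by epoch milliseconds. It also assumes that you have already collected the IDs of incremental backups, for example from `hbase backup history`. The function name is hypothetical.

[source,bash]
----
# Hypothetical helper: print, as one comma-separated argument, the backup IDs
# whose embedded epoch-millisecond timestamp falls in [start_ms, end_ms).
# IDs that do not end in _<digits> are skipped; input order is preserved.
br_ids_in_window() {
  if [ $# -lt 2 ]; then
    echo "usage: br_ids_in_window <start_ms> <end_ms> [backup_id...]" >&2
    return 2
  fi
  local start_ms="$1" end_ms="$2" id ts out=""
  shift 2
  for id in "$@"; do
    ts="${id##*_}"
    case "$ts" in ''|*[!0-9]*) continue ;; esac
    if [ "$ts" -ge "$start_ms" ] && [ "$ts" -lt "$end_ms" ]; then
      out="${out:+$out,}$id"
    fi
  done
  printf '%s\n' "$out"
}
----

For example, `hbase backup merge "$(br_ids_in_window 1467763200000 1467849600000 backupId_1467823988425 backupId_1467827588425)"` merges the two example backups. Both fall on 2016-07-06 (UTC).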

// hbase backup set

[[br.using.backup.sets]]
=== Using Backup Sets

Backup sets can ease the administration of HBase data backups and restores by reducing the amount of repetitive input
of table names. You can group tables into a named backup set with the `hbase backup set add` command. You can then use
the `-s` option to invoke the name of a backup set in the `hbase backup create` or `hbase restore` command rather than list
every table in the group individually. You can have multiple backup sets.

NOTE: Note the distinction between the `hbase backup set add` command and the _-s_ option. The `hbase backup set add`
command must be run before using the `-s` option in a different command because backup sets must be named and defined
before using backup sets as a shortcut.

If you run the `hbase backup set add` command and specify a backup set name that does not yet exist on your system, a new set
is created. If you run the command with the name of an existing backup set, then the tables that you specify are added
to the set.

In this command, the backup set name is case-sensitive.

NOTE: The metadata of backup sets is stored within HBase. If you do not have access to the original HBase cluster with the
backup set metadata, then you must specify individual table names to restore the data.

To create or manage a backup set, run the following command as the HBase superuser:

[source]
----
$ hbase backup set <subcommand> <backup_set_name> <tables>
----

[[br.set.subcommands]]
==== Backup Set Subcommands

The following list details subcommands of the `hbase backup set` command.

NOTE: You must enter one (and no more than one) of the following subcommands after `hbase backup set` to complete an operation.
Also, the backup set name is case-sensitive in the command-line utility.

_add_::
  Adds table[s] to a backup set. Specify a _backup_set_name_ value after this argument to create a backup set.

_remove_::
  Removes tables from the set. Specify the tables to remove in the tables argument.

_list_::
  Lists all backup sets.

_describe_::
  Displays a description of a backup set. The information includes whether the set has full
  or incremental backups, start and end times of the backups, and a list of the tables in the set. Specify a valid
  _backup_set_name_ value after this subcommand.

_delete_::
  Deletes a backup set. Enter the value for the _backup_set_name_ option directly after the `hbase backup set delete` command.

[[br.set.positional.cli.arguments]]
==== Positional Command-Line Arguments

_backup_set_name_::
  Use to assign or invoke a backup set name. The backup set name must contain only printable characters and cannot have any spaces.

_tables_::
  List of tables (or a single table) to include in the backup set. Enter the table names as a comma-separated list. If no tables
  are specified, all tables are included in the set.
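
A script can enforce the naming rule for _backup_set_name_ before calling `hbase backup set add`. The following is a minimal sketch, and the function names are hypothetical. It uses the `[[:graph:]]` character class, which matches printable characters other than space.

[source,bash]
----
# Hypothetical check for the naming rule above: one or more printable
# characters and no spaces ([[:graph:]] is "printable, excluding space").
br_valid_set_name() {
  [[ "$1" =~ ^[[:graph:]]+$ ]]
}

# Set DRY_RUN=1 to print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Create or extend a backup set only if its name passes the check.
# The tables argument is optional and is omitted when empty.
br_set_add() {
  local name="$1" tables="${2:-}"
  if ! br_valid_set_name "$name"; then
    echo "invalid backup set name: '$name'" >&2
    return 2
  fi
  run hbase backup set add "$name" ${tables:+"$tables"}
}
----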

TIP: Maintain a log or other record of the case-sensitive backup set names and the corresponding tables in each set on a separate
or remote cluster, as part of your backup strategy. This information can help you in case of failure on the primary cluster.

[[br.set.usage]]
==== Example of Usage

[source]
----
$ hbase backup set add Q1Data TEAM_3,TEAM_4
----

Depending on the environment, this command results in _one_ of the following actions:

* If the `Q1Data` backup set does not exist, a backup set containing tables `TEAM_3` and `TEAM_4` is created.
* If the `Q1Data` backup set exists already, the tables `TEAM_3` and `TEAM_4` are added to the `Q1Data` backup set.

[[br.administration]]
== Administration of Backup Images

The `hbase backup` command has several subcommands that help with administering backup images as they accumulate. Most production
environments require recurring backups, so it is necessary to have utilities to help manage the data of the backup repository.
Some subcommands enable you to find information that can help identify backups that are relevant in a search for particular data.
You can also delete backup images.

The following sections detail each `hbase backup` subcommand that can help administer backups. Run the full command-subcommand line as
the HBase superuser.

// hbase backup progress

[[br.managing.backup.progress]]
=== Managing Backup Progress

You can monitor a running backup in another terminal session by running the _hbase backup progress_ command and specifying the backup ID as an argument.

For example, run the following command as the HBase superuser to view the progress of a backup:

[source]
----
$ hbase backup progress <backup_id>
----

[[br.progress.positional.cli.arguments]]
==== Positional Command-Line Arguments

_backup_id_::
  Specifies the backup that you want to monitor by seeing the progress information. The backupId is case-sensitive.

[[br.progress.named.cli.arguments]]
==== Named Command-Line Arguments

None.

[[br.progress.example]]
==== Example usage

[source]
----
hbase backup progress backupId_1467823988425
----

// hbase backup history

[[br.managing.backup.history]]
=== Managing Backup History

This command displays a log of backup sessions. The information for each session includes backup ID, type (full or incremental), the tables
in the backup, status, and start and end time. Specify the number of backup sessions to display with the optional _-n_ argument.

[source]
----
$ hbase backup history <backup_id>
----

[[br.history.positional.cli.arguments]]
==== Positional Command-Line Arguments

_backup_id_::
  Specifies the backup that you want to monitor by seeing the progress information. The backupId is case-sensitive.

[[br.history.named.cli.arguments]]
==== Named Command-Line Arguments

_-n <num_records>_::
  (Optional) The maximum number of backup records (Default: 10).

_-p <backup_root_path>_::
  The full filesystem URI of where backup images are stored.

_-s <backup_set_name>_::
  The name of the backup set to obtain history for. Mutually exclusive with the _-t_ option.

_-t <table_name>_::
  The name of the table to obtain history for. Mutually exclusive with the _-s_ option.

[[br.history.backup.example]]
==== Example usage

[source]
----
$ hbase backup history
$ hbase backup history -n 20
$ hbase backup history -t WebIndexRecords
----

// hbase backup describe

[[br.describe.backup]]
=== Describing a Backup Image

This command can be used to obtain information about a specific backup image.

[source]
----
$ hbase backup describe <backup_id>
----

[[br.describe.backup.positional.cli.arguments]]
==== Positional Command-Line Arguments

_backup_id_::
  The ID of the backup image to describe.

[[br.describe.backup.named.cli.arguments]]
==== Named Command-Line Arguments

None.

[[br.describe.backup.example]]
==== Example usage

[source]
----
$ hbase backup describe backupId_1467823988425
----

// hbase backup delete

[[br.delete.backup]]
=== Deleting a Backup Image

This command can be used to delete a backup image which is no longer needed.

[source]
----
$ hbase backup delete <backup_id>
----

[[br.delete.backup.positional.cli.arguments]]
==== Positional Command-Line Arguments

_backup_id_::
  The ID of the backup image which should be deleted.

[[br.delete.backup.named.cli.arguments]]
==== Named Command-Line Arguments

None.

[[br.delete.backup.example]]
==== Example usage

[source]
----
$ hbase backup delete backupId_1467823988425
----

// hbase backup repair

[[br.repair.backup]]
=== Backup Repair Command

This command attempts to correct any inconsistencies in persisted backup metadata that result from
software errors or unhandled failure scenarios. While the backup implementation tries
to correct all errors on its own, this tool may be necessary in cases where the system cannot
recover automatically.

[source]
----
$ hbase backup repair
----

[[br.repair.backup.positional.cli.arguments]]
==== Positional Command-Line Arguments

None.

[[br.repair.backup.named.cli.arguments]]
==== Named Command-Line Arguments

None.

[[br.repair.backup.example]]
==== Example usage

[source]
----
$ hbase backup repair
----

[[br.backup.configuration]]
== Configuration keys

The backup and restore feature includes both required and optional configuration keys.

=== Required properties

_hbase.backup.enable_: Controls whether or not the feature is enabled (Default: `false`). Set this value to `true`.

_hbase.master.logcleaner.plugins_: A comma-separated list of classes invoked when cleaning logs in the HBase Master. Set
this value to `org.apache.hadoop.hbase.backup.master.BackupLogCleaner` or append it to the current value.

_hbase.procedure.master.classes_: A comma-separated list of classes invoked with the Procedure framework in the Master. Set
this value to `org.apache.hadoop.hbase.backup.master.LogRollMasterProcedureManager` or append it to the current value.

_hbase.procedure.regionserver.classes_: A comma-separated list of classes invoked with the Procedure framework in the RegionServer.
Set this value to `org.apache.hadoop.hbase.backup.regionserver.LogRollRegionServerProcedureManager` or append it to the current value.

_hbase.coprocessor.region.classes_: A comma-separated list of RegionObservers deployed on tables. Set this value to
`org.apache.hadoop.hbase.backup.BackupObserver` or append it to the current value.

_hbase.master.hfilecleaner.plugins_: A comma-separated list of HFileCleaners deployed on the Master. Set this value
to `org.apache.hadoop.hbase.backup.BackupHFileCleaner` or append it to the current value.
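
"Append it to the current value" means adding the class to the comma-separated list without dropping existing entries or adding it twice. The following is a minimal sketch of that rule. The function name and the `com.example.MyObserver` class are hypothetical. The comparison is exact, so existing values that contain spaces after commas are not normalized.

[source,bash]
----
# Hypothetical helper for "append it to the current value": print the
# comma-separated list with the class added, unless it is already present.
br_append_class() {
  local current="$1" class="$2"
  case ",$current," in
    *",$class,"*) printf '%s\n' "$current" ;;               # already listed
    *) printf '%s\n' "${current:+$current,}$class" ;;      # append (or start the list)
  esac
}
----

For example, `br_append_class com.example.MyObserver org.apache.hadoop.hbase.backup.BackupObserver` prints `com.example.MyObserver,org.apache.hadoop.hbase.backup.BackupObserver`.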

=== Optional properties

_hbase.backup.system.ttl_: The time-to-live in seconds of data in the `hbase:backup` tables (default: forever). This property
is only relevant prior to the creation of the `hbase:backup` table. Use the `alter` command in the HBase shell to modify the TTL
when this table already exists. See the <<br.filesystem.growth.warning,below section>> for more details on the impact of this
configuration property.

_hbase.backup.attempts.max_: The number of attempts to perform when taking HBase table snapshots (default: 10).

_hbase.backup.attempts.pause.ms_: The amount of time to wait between failed snapshot attempts in milliseconds (default: 10000).

_hbase.backup.logroll.timeout.millis_: The amount of time (in milliseconds) to wait for RegionServers to roll their WALs
in the Master's procedure framework (default: 30000).

[[br.best.practices]]
== Best Practices

=== Formulate a restore strategy and test it.

Before you rely on a backup and restore strategy for your production environment, identify how backups must be performed,
and more importantly, how restores must be performed. Test the plan to ensure that it is workable.
At a minimum, store backup data from a production cluster on a different cluster or server. To further safeguard the data,
use a backup location that is at a different physical location.

If you have an unrecoverable loss of data on your primary production cluster as a result of computer system issues, you may
be able to restore the data from a different cluster or server at the same site. However, a disaster that destroys the whole
site renders locally stored backups useless. Consider storing the backup data and necessary resources (both computing capacity
and operator expertise) to restore the data at a site sufficiently remote from the production site. In the case of a catastrophe
at the whole primary site (fire, earthquake, etc.), the remote backup site can be very valuable.

=== Secure a full backup image first.

As a baseline, you must complete a full backup of HBase data at least once before you can rely on incremental backups. The full
backup should be stored outside of the source cluster. To ensure complete dataset recovery, you must run the restore utility
with the option to restore the baseline full backup. The full backup is the foundation of your dataset. Incremental backup data
is applied on top of the full backup during the restore operation to return you to the point in time when the last backup was taken.

=== Define and use backup sets for groups of tables that are logical subsets of the entire dataset.

You can group tables into an object called a backup set. A backup set can save time when you have a particular group of tables
that you expect to repeatedly back up or restore.

When you create a backup set, you type table names to include in the group. The backup set includes not only groups of related
tables, but also retains the HBase backup metadata. Afterwards, you can invoke the backup set name to indicate what tables apply
to the command execution instead of entering all the table names individually.

=== Document the backup and restore strategy, and ideally log information about each backup.

Document the whole process so that the knowledge base can transfer to new administrators after employee turnover. As an extra
safety precaution, also log the calendar date, time, and other relevant details about the data of each backup. This metadata
can potentially help locate a particular dataset in case of source cluster failure or primary site disaster. Maintain duplicate
copies of all documentation: one copy at the production cluster site and another at the backup location or wherever it can be
accessed by an administrator remotely from the production cluster.
 711
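One lightweight way to keep the per-backup log recommended above is to wrap each backup command so that its UTC start time, exit status, and command line are appended to a log file stored away from the production cluster. This is a minimal sketch; the function name `run_logged_backup`, the tab-separated record format, and the log location are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a backup command, then append one
# tab-separated record to a log file:
#   <UTC start time> <exit status> <command line>
run_logged_backup() {
  local log_file="$1"
  shift
  local started status
  started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  "$@"
  status=$?
  printf '%s\t%s\t%s\n' "$started" "$status" "$*" >> "$log_file"
  # Propagate the backup command's own exit status to the caller.
  return "$status"
}
```

For example, `run_logged_backup /mnt/offsite/hbase-backup-log.tsv sudo -u hbase hbase backup create incremental <backup_path> -s green_set`, where the log path is hypothetical and should point at storage that survives a loss of the production site.
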
 712 [[br.s3.backup.scenario]]
 713 == Scenario: Safeguarding Application Datasets on Amazon S3
 714
 715 This scenario describes how a hypothetical retail business uses backups to safeguard application data and then restore the dataset
 716 after failure.
 717
 718 The HBase administration team uses backup sets to store data from a group of tables that have interrelated information for an
 719 application called green. In this example, one table contains transaction records and the other contains customer details. The
 720 two tables need to be backed up and be recoverable as a group.
 721
 722 The admin team also wants to ensure daily backups occur automatically.
 723
 724 .Tables Composing The Backup Set
 725 image::backup-app-components.png[]
 726
The following is an outline of the steps and examples of commands that are used to back up the data for the _green_ application and
to recover the data later. All commands are run when logged in as the HBase superuser.
 729
 730 * A backup set called _green_set_ is created as an alias for both the transactions table and the customer table. The backup set can
 731 be used for all operations to avoid typing each table name. The backup set name is case-sensitive and should be formed with only
 732 printable characters and without spaces.
 733
 734  $ hbase backup set add green_set transactions
 735  $ hbase backup set add green_set customer
 736
 737 * The first backup of green_set data must be a full backup. The following command example shows how credentials are passed to Amazon
 738 S3 and specifies the file system with the s3a: prefix.
 739
 $ ACCESS_KEY=ABCDEFGHIJKLMNOPQRST
 $ SECRET_KEY=123456789abcdefghijklmnopqrstuvwxyzABCD
 $ sudo -u hbase hbase backup create full \
   s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set
 744
* Incremental backups should be run according to a schedule that ensures essential data recovery in the event of a catastrophe. At
this retail company, the HBase admin team decides that automated daily backups secure the data sufficiently. The team decides that
they can implement this by modifying an existing Cron job that is defined in `/etc/crontab`. Consequently, IT modifies the Cron job
by adding the following line, which assumes that `ACCESS_KEY` and `SECRET_KEY` are also defined in the Cron environment (for example,
as variable assignments in `/etc/crontab`):
 749
 750  @daily hbase hbase backup create incremental s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups -s green_set
 751
 752 * A catastrophic IT incident disables the production cluster that the green application uses. An HBase system administrator of the
 753 backup cluster must restore the _green_set_ dataset to the point in time closest to the recovery objective.
 754 +
NOTE: If the administrator of the backup HBase cluster has the backup ID with relevant details in accessible records, the following
search with the `hdfs dfs -ls` command and the manual scan of the backup ID list can be skipped. Consider continuously maintaining
and protecting a detailed log of backup IDs outside the production cluster in your environment.
+
The HBase administrator runs the following command on the directory where backups are stored to print the list of successful backup
IDs on the console:

 $ hdfs dfs -ls -t s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups

* The admin scans the list to see which backup was created at a date and time closest to the recovery objective. To do this, the
admin converts the calendar timestamp of the recovery point in time to Unix time in milliseconds: each backup ID consists of the prefix
`backup_` followed by the Unix time, in milliseconds, at which the backup was taken. The `-t` option sorts the listing by modification
time, so the backup IDs appear in reverse chronological order with the most recent successful backup first.
+
The admin notices that the following line in the command output corresponds with the _green_set_ backup that needs to be restored:

 s3a://prodhbasebackups/backups/backup_1467823988425
 771
* The admin restores green_set by invoking the backup ID and the `-o` (overwrite) option. The overwrite option truncates all existing
data in the destination and populates the tables with data from the backup dataset. Without this flag, the backup data is appended to
the existing data in the destination. In this case, the admin decides to overwrite the data because it is corrupted.

 $ sudo -u hbase hbase restore -s green_set \
   s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups backup_1467823988425 -o
 778
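The timestamp conversion and backup ID scan described in the scenario above can be scripted. The following is a minimal sketch, assuming GNU `date` (for `-d`) and that each backup directory is named `backup_<Unix time in milliseconds>`, as in the example ID; the function names `to_epoch_ms` and `closest_backup_id` are hypothetical. It selects the newest backup taken at or before the recovery point.

```shell
#!/usr/bin/env bash
# Convert a calendar time (interpreted as UTC) to Unix time in milliseconds.
# Relies on GNU date's -d option.
to_epoch_ms() {
  local secs
  secs="$(date -u -d "$1" +%s)" || return 1
  echo $(( secs * 1000 ))
}

# Read backup paths or `hdfs dfs -ls` output lines on stdin and print the
# newest backup ID whose timestamp is at or before $1 (epoch milliseconds).
closest_backup_id() {
  local target="$1" best="" best_ts=-1 line id ts
  while IFS= read -r line; do
    id="${line##*/}"                                 # keep the last path component
    case "$id" in backup_*) ;; *) continue ;; esac   # skip "Found N items" etc.
    ts="${id#backup_}"
    case "$ts" in ''|*[!0-9]*) continue ;; esac      # timestamp must be all digits
    if [ "$ts" -le "$target" ] && [ "$ts" -gt "$best_ts" ]; then
      best="$id"
      best_ts="$ts"
    fi
  done
  [ -n "$best" ] || return 1                         # nothing at or before target
  printf '%s\n' "$best"
}
```

In the scenario, the listing could be piped in directly, for example `hdfs dfs -ls -t s3a://$ACCESS_KEY:$SECRET_KEY@prodhbasebackups/backups | closest_backup_id "$(to_epoch_ms '2016-07-06 17:00:00')"`, which would print `backup_1467823988425` if that is the newest backup taken at or before the recovery point.
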
 779 [[br.data.security]]
 780 == Security of Backup Data
 781
With this feature, which copies data to remote locations, it is important to take a moment to clearly state the procedural
concerns that exist around data security. Like the HBase replication feature, backup and restore provides the constructs to automatically
copy data from within a corporate boundary to some system outside of that boundary. When storing sensitive data with backup and restore,
or indeed with any feature that extracts data from HBase, it is imperative that the locations to which data is being sent have undergone
a security audit to ensure that only authenticated users are allowed to access that data.
 787
For example, when backing up data to S3 as in the scenario above, it is of the utmost importance that the proper permissions are assigned
to the S3 bucket to ensure that only a minimum set of authorized users are allowed to access this data. Because the data is no longer
accessed through HBase, and is therefore no longer protected by its authentication and authorization controls, we must ensure that the
filesystem storing that data provides a comparable level of security. This is a manual step which users *must* implement on their own.
 792
 793 [[br.technical.details]]
 794 == Technical Details of Incremental Backup and Restore
 795
 796 HBase incremental backups enable more efficient capture of HBase table images than previous attempts at serial backup and restore
 797 solutions, such as those that only used HBase Export and Import APIs. Incremental backups use Write Ahead Logs (WALs) to capture
 798 the data changes since the previous backup was created. A WAL roll (create new WALs) is executed across all RegionServers to track
 799 the WALs that need to be in the backup.
 800
After the incremental backup image is created, the source backup files usually reside on the same nodes as the data source. A process similar
to the DistCp (distributed copy) tool is used to move the source backup files to the target file systems. When a table restore operation
starts, a two-step process is initiated. First, the full backup is restored from the full backup image. Second, all WAL files from
incremental backups between the last full backup and the incremental backup being restored are converted to HFiles, which the HBase
Bulk Load utility automatically imports as restored data in the table.
 806
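The selection of backup images described above (the most recent full backup, then every incremental backup after it up to the one being restored) can be illustrated with a small shell function. This only illustrates the ordering logic and is not how HBase implements it internally; the input format (one `<backup_id> <FULL|INCREMENTAL>` record per line, oldest first) and the function name `restore_chain` are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of restore-chain selection.
# stdin: "<backup_id> <FULL|INCREMENTAL>" records, oldest first.
# $1:    the backup ID to restore to.
# stdout: the latest FULL backup at or before the target, followed by each
#         INCREMENTAL backup after it, up to and including the target.
restore_chain() {
  local target="$1" id type have_full=0 found=0
  local -a chain=()
  while read -r id type; do
    case "$type" in
      FULL)
        chain=("$id")          # a newer full backup starts a new chain
        have_full=1 ;;
      INCREMENTAL)
        if [ "$have_full" -eq 1 ]; then chain+=("$id"); fi ;;
      *) continue ;;
    esac
    if [ "$id" = "$target" ]; then
      found=1
      break
    fi
  done
  # The target must exist and be preceded by (or be) a full backup.
  if [ "$found" -ne 1 ] || [ "$have_full" -ne 1 ]; then
    return 1
  fi
  printf '%s\n' "${chain[@]}"
}
```

For a history of `backup_100 FULL`, `backup_200 INCREMENTAL`, `backup_300 INCREMENTAL`, restoring `backup_300` yields all three IDs in that order; a later full backup would start a new chain.
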
 807 You can only restore on a live HBase cluster because the data must be redistributed to complete the restore operation successfully.
 808
 809 [[br.filesystem.growth.warning]]
 810 == A Warning on File System Growth
 811
As a reminder, incremental backups are implemented by retaining the write-ahead logs which HBase primarily uses for data durability.
Thus, to ensure that all data needing to be included in a backup is still available in the system, the HBase backup and restore feature
retains all write-ahead logs since the last backup until the next incremental backup is executed.

Like HBase Snapshots, this can have a potentially large impact on the HDFS usage of HBase for high-volume tables. Take care when enabling
and using the backup and restore feature, paying particular attention to removing backup sessions when they are not actively being used.
 818
The only automated upper bound on retained write-ahead logs for backup and restore is based on the TTL of the `hbase:backup` system table which,
as of the time this document is written, is infinite (backup table entries are never automatically deleted). This requires that administrators
perform backups on a schedule whose frequency accounts for the amount of available space on HDFS (e.g. less available HDFS space requires
more aggressive backup merges and deletions). As a reminder, the TTL can be altered on the `hbase:backup` table using the `alter` command
in the HBase shell. Modifying the configuration property `hbase.backup.system.ttl` in hbase-site.xml after the system table exists has no effect.
 824
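HBase column family TTLs are specified in seconds, so a retention period expressed in days must be converted before it is passed to `alter`. The sketch below only prints the statement for review rather than running it. The helper name `backup_ttl_alter_cmd` is hypothetical, and because this chapter does not list the column families of the `hbase:backup` table, the family is a required argument; check the table layout first with `describe 'hbase:backup'` in the HBase shell.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (not run) an HBase shell "alter" statement that
# sets a TTL on one column family of the hbase:backup table.
# HBase TTL values are in seconds, so days are converted here.
# $1: retention in days (positive integer), $2: column family name.
backup_ttl_alter_cmd() {
  local days="$1" family="$2"
  case "$days" in
    ''|*[!0-9]*) echo "days must be a positive integer" >&2; return 2 ;;
  esac
  if [ "$((10#$days))" -le 0 ] || [ -z "$family" ]; then
    echo "usage: backup_ttl_alter_cmd <days> <family>" >&2
    return 2
  fi
  # 10# forces base 10 so that input such as "030" is not read as octal.
  printf "alter 'hbase:backup', {NAME => '%s', TTL => %d}\n" \
    "$family" "$(( 10#$days * 86400 ))"
}
```

For example, a 30-day retention produces `TTL => 2592000` in the printed statement.
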
 825 [[br.backup.capacity.planning]]
 826 == Capacity Planning
 827
When designing a distributed system deployment, it is critical that some basic mathematical rigor is applied to ensure sufficient computational
capacity is available given the data and software requirements of the system. For this feature, network capacity is the largest
bottleneck when estimating the performance of a backup and restore deployment. The second most costly factor is the speed at which
data can be read and written.
 832
 833 === Full Backups
 834
 835 To estimate the duration of a full backup, we have to understand the general actions which are invoked:
 836
* Write-ahead log roll on each RegionServer: ones to tens of seconds per RegionServer, in parallel. Depends on the load on each RegionServer.
* Take an HBase snapshot of the table(s): tens of seconds. Depends on the number of regions and files that comprise the table.
* Export the snapshot to the destination: see below. Depends on the size of the data and the network bandwidth to the destination.
 840
 841 [[br.export.snapshot.cost]]
To approximate how long the final step will take, we have to make some assumptions about hardware. Be aware that these will *not* be accurate for your
system; substitute the numbers that you or your administrator know for your system. Let's say the speed of reading data from HDFS on a single node is
capped at 80MB/s (across all Mappers that run on that host), a modern network interface controller (NIC) supports 10Gb/s, the top-of-rack switch can
handle 40Gb/s, and the WAN between your clusters is 10Gb/s. This means that you can only ship data to your remote site at a speed of 1.25GB/s, meaning
that 16 nodes (`1.25 * 1024 / 80 = 16`) participating in the ExportSnapshot should be able to fully saturate the link between clusters. With more
nodes in the cluster, we can still saturate the network but at a lesser impact on any one node, which helps ensure local SLAs are met. If the size
of the snapshot is 10TB, this full backup would take roughly 2.3 hours (`10 * 1024 / 1.25 / (60 * 60) = 2.28hrs`).
 849
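The arithmetic above can be packaged as a small calculator so that you can plug in your own measurements. This is a sketch under the same simplifying assumptions as the text (a single per-node read cap, the WAN as the only network bottleneck, 1GB = 1024MB, and Gb/s converted to GB/s by dividing by 8); the function name `estimate_full_backup` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough full backup (ExportSnapshot) estimate using the model in the text.
# $1: per-node HDFS read cap in MB/s, $2: WAN bandwidth in Gb/s,
# $3: snapshot size in TB. Uses 1GB = 1024MB and 1TB = 1024GB.
estimate_full_backup() {
  awk -v read_mb="$1" -v wan_gbit="$2" -v size_tb="$3" 'BEGIN {
    wan_gb = wan_gbit / 8                   # Gb/s -> GB/s
    need = wan_gb * 1024 / read_mb          # nodes needed to saturate the WAN
    nodes = int(need)
    if (nodes < need) nodes++               # round up to whole nodes
    hours = size_tb * 1024 / wan_gb / 3600  # transfer time at full WAN speed
    printf "wan=%.2fGB/s nodes=%d hours=%.2f\n", wan_gb, nodes, hours
  }'
}
```

With the example inputs, `estimate_full_backup 80 10 10` prints `wan=1.25GB/s nodes=16 hours=2.28`.
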
 850 As a general statement, it is very likely that the WAN bandwidth between your local cluster and the remote storage is the largest
 851 bottleneck to the speed of a full backup.
 852
When the concern is restricting the computational impact of backups on a "production system", the above formulas can be reused with the optional
command-line arguments to `hbase backup create`: `-b`, `-w`, `-q`. The `-b` option defines the bandwidth at which each worker (Mapper) would
write data. The `-w` argument limits the number of workers that would be spawned in the DistCp job. The `-q` option allows the user to specify a YARN
queue, which can limit the specific nodes where the workers will be spawned; this can quarantine the backup workers performing the copy to
a set of non-critical nodes. Relating the `-b` and `-w` options to our earlier equations: `-b` would be used to restrict each node from reading
data at the full 80MB/s, and `-w` would be used to prevent the job from spawning all 16 worker tasks.
 859
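To see how `-w` and `-b` change the estimate, the sketch below models the aggregate copy rate as the smaller of workers times per-worker bandwidth and the WAN bandwidth. It assumes `-b` is expressed in MB/s per worker and that every worker actually sustains that rate; it ignores per-node read limits and job start-up time. The function name `estimate_throttled_copy` is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of a throttled copy: the aggregate rate is the smaller of
# (workers * per-worker bandwidth) and the WAN bandwidth.
# $1: workers (-w), $2: per-worker MB/s (-b, unit assumed),
# $3: WAN bandwidth in Gb/s, $4: data size in TB.
estimate_throttled_copy() {
  awk -v w="$1" -v b="$2" -v wan_gbit="$3" -v size_tb="$4" 'BEGIN {
    wan_mb = wan_gbit / 8 * 1024        # Gb/s -> MB/s
    rate = w * b                        # requested aggregate rate
    if (rate > wan_mb) rate = wan_mb    # the WAN link caps the rate
    secs = size_tb * 1024 * 1024 / rate
    printf "rate=%dMB/s hours=%.2f\n", rate, secs / 3600
  }'
}
```

For example, `estimate_throttled_copy 8 40 10 10` (8 workers at 40MB/s each, 10TB over a 10Gb/s WAN) prints `rate=320MB/s hours=9.10`, compared with `rate=1280MB/s hours=2.28` for 16 unthrottled workers at 80MB/s.
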
 860 === Incremental Backup
 861
As we did for full backups, we have to understand the incremental backup process to approximate its runtime and cost.

* Identify new write-ahead logs since the last full or incremental backup: negligible. A priori knowledge from the backup system table(s).
* Read, filter, and write "minimized" HFiles equivalent to the WALs: dominated by the speed of writing data. Relative to write speed of HDFS.
* DistCp the HFiles to the destination: <<br.export.snapshot.cost,see above>>.
 867
For the second step, the dominating cost of this operation would be re-writing the data (under the assumption that a majority of the
data in the WAL is preserved). In this case, we can assume an aggregate write speed of 30MB/s per node. Continuing our 16-node cluster example,
this would require approximately 107 seconds, a little under 2 minutes, to perform this step for 50GB of data (`50 * 1024 / (30 * 16) = 106.7`).
The amount of time to start the DistCp MapReduce job would likely dominate the actual time taken to copy the data (`50 / 1.25 = 40` seconds),
so the copy time itself can be ignored.
 872
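The same style of estimate works for the incremental case. This sketch assumes the rewrite step runs in parallel on every node at a fixed per-node write rate and that the WAN is the only limit on the DistCp step; the function name `estimate_incremental` is hypothetical.

```shell
#!/usr/bin/env bash
# Rough incremental backup estimate.
# $1: per-node HDFS write rate in MB/s, $2: number of nodes,
# $3: WAL data to rewrite in GB, $4: WAN bandwidth in Gb/s.
estimate_incremental() {
  awk -v write_mb="$1" -v nodes="$2" -v size_gb="$3" -v wan_gbit="$4" 'BEGIN {
    rewrite = size_gb * 1024 / (write_mb * nodes)  # WAL -> HFile rewrite, seconds
    copy = size_gb / (wan_gbit / 8)                # DistCp over the WAN, seconds
    printf "rewrite_secs=%.0f copy_secs=%.0f\n", rewrite, copy
  }'
}
```

For example, `estimate_incremental 30 16 50 10` (30MB/s per node, 16 nodes, 50GB, 10Gb/s WAN) prints `rewrite_secs=107 copy_secs=40`.
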
 873 [[br.limitations]]
 874 == Limitations of the Backup and Restore Utility
 875
 876 *Serial backup operations*
 877
 878 Backup operations cannot be run concurrently. An operation includes actions like create, delete, restore, and merge. Only one active backup session is supported. link:https://issues.apache.org/jira/browse/HBASE-16391[HBASE-16391]
 879 will introduce multiple-backup sessions support.
 880
 881 *No means to cancel backups*
 882
Neither backup nor restore operations can be canceled (link:https://issues.apache.org/jira/browse/HBASE-15997[HBASE-15997], link:https://issues.apache.org/jira/browse/HBASE-15998[HBASE-15998]).
 884 The workaround to cancel a backup would be to kill the client-side backup command (`control-C`), ensure all relevant MapReduce jobs have exited, and then
 885 run the `hbase backup repair` command to ensure the system backup metadata is consistent.
 886
 887 *Backups can only be saved to a single location*
 888
 889 Copying backup information to multiple locations is an exercise left to the user. link:https://issues.apache.org/jira/browse/HBASE-15476[HBASE-15476] will
 890 introduce the ability to specify multiple-backup destinations intrinsically.
 891
 892 *HBase superuser access is required*
 893
Only an HBase superuser (e.g. hbase) is allowed to perform backup/restore, which can pose a problem for shared HBase installations. Current mitigations would require
coordination with system administrators to build and deploy a backup and restore strategy (link:https://issues.apache.org/jira/browse/HBASE-14138[HBASE-14138]).
 896
 897 *Backup restoration is an online operation*
 898
Restoring from a backup requires that the HBase cluster be online, a caveat of the current implementation (link:https://issues.apache.org/jira/browse/HBASE-16573[HBASE-16573]).
 900
 901 *Some operations may fail and require re-run*
 902
The HBase backup feature is primarily client-driven. While there is standard HBase retry logic built into the HBase Connection, persistent errors in executing operations
may propagate back to the client (e.g. snapshot failure due to region splits). The backup implementation should be moved from the client side into the ProcedureV2 framework
in the future, which would provide additional robustness around transient/retryable failures. The `hbase backup repair` command is meant to correct states which the system
cannot automatically detect and recover from.
 907
 908 *Avoidance of declaration of public API*
 909
 910 While the Java API to interact with this feature exists and its implementation is separated from an interface, insufficient rigor has been applied to determine if
it is exactly what we intend to ship to users. As such, it is marked with a `Private` audience with the expectation that, as users begin to try the feature, there
 912 will be modifications that would necessitate breaking compatibility (link:https://issues.apache.org/jira/browse/HBASE-17517[HBASE-17517]).
 913
 914 *Lack of global metrics for backup and restore*
 915
 916 Individual backup and restore operations contain metrics about the amount of work the operation included, but there is no centralized location (e.g. the Master UI)
that presents this information for consumption (link:https://issues.apache.org/jira/browse/HBASE-16565[HBASE-16565]).