src/main/asciidoc/_chapters/shell.adoc

   1 ////
   2 /**
   3  *
   4  * Licensed to the Apache Software Foundation (ASF) under one
   5  * or more contributor license agreements.  See the NOTICE file
   6  * distributed with this work for additional information
   7  * regarding copyright ownership.  The ASF licenses this file
   8  * to you under the Apache License, Version 2.0 (the
   9  * "License"); you may not use this file except in compliance
  10  * with the License.  You may obtain a copy of the License at
  11  *
  12  *     http://www.apache.org/licenses/LICENSE-2.0
  13  *
  14  * Unless required by applicable law or agreed to in writing, software
  15  * distributed under the License is distributed on an "AS IS" BASIS,
  16  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  17  * See the License for the specific language governing permissions and
  18  * limitations under the License.
  19  */
  20 ////
  21
  22 [[shell]]
  23 = The Apache HBase Shell
  24 :doctype: book
  25 :numbered:
  26 :toc: left
  27 :icons: font
  28 :experimental:
  29
  30
  31 The Apache HBase Shell is link:http://jruby.org[(J)Ruby]'s IRB with some HBase-specific commands added.
  32 Anything you can do in IRB, you should be able to do in the HBase Shell.
  33
  34 To run the HBase shell, do as follows:
  35
  36 [source,bash]
  37 ----
  38 $ ./bin/hbase shell
  39 ----
  40
  41 Type `help` and then `<RETURN>` to see a listing of shell commands and options.
  42 Browse at least the paragraphs at the end of the help output for the gist of how variables and command arguments are entered into the HBase shell; in particular note how table names, rows, and columns, etc., must be quoted.
  43
  44 See <<shell_exercises,shell exercises>> for examples of basic shell operation.
  45
  46 Here is a nicely formatted listing of link:http://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/[all shell commands]
  47 by Rajeshbabu Chintaguntla.
  48
  49 [[scripting]]
  50 == Scripting with Ruby
  51
  52 For examples of scripting Apache HBase, look in the HBase _bin_ directory.
  53 Look at the files that end in _*.rb_.
  54 To run one of these files, do as follows:
  55
  56 [source,bash]
  57 ----
  58 $ ./bin/hbase org.jruby.Main PATH_TO_SCRIPT
  59 ----
  60
  61
  62 == Running the Shell in Non-Interactive Mode
  63
  64 The HBase Shell has a non-interactive mode (link:https://issues.apache.org/jira/browse/HBASE-11658[HBASE-11658]).
  65 Non-interactive mode captures the exit status (success or failure) of HBase Shell commands and passes that status back to the command interpreter.
  66 If you use the normal interactive mode, the HBase Shell will only ever return its own exit status, which will nearly always be `0` for success.
  67
  68 To invoke non-interactive mode, pass the `-n` or `--non-interactive` option to HBase Shell.
  69
  70 [[hbase.shell.noninteractive]]
  71 == HBase Shell in OS Scripts
  72
  73 You can use the HBase shell from within operating system script interpreters such as the Bash shell, which is the default command interpreter for most Linux and UNIX distributions.
  74 The following guidelines use Bash syntax, but could be adjusted to work with C-style shells such as csh or tcsh, and could probably be modified to work with the Microsoft Windows script interpreter as well. Submissions are welcome.
  75
  76 NOTE: Spawning HBase Shell commands in this way is slow, so keep that in mind when you are deciding when combining HBase operations with the operating system command line is appropriate.
  77
  78 .Passing Commands to the HBase Shell
  79 ====
  80 You can pass commands to the HBase Shell in non-interactive mode (see <<hbase.shell.noninteractive,hbase.shell.noninteractive>>) using the `echo` command and the `|` (pipe) operator.
  81 Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell.
  82 Some debug-level output has been truncated from the example below.
  83
  84 [source,bash]
  85 ----
  86 $ echo "describe 'test1'" | ./hbase shell -n
  87
  88 Version 0.98.3-hadoop2, rd5e65a9144e315bb0a964e7730871af32f5018d5, Sat May 31 19:56:09 PDT 2014
  89
  90 describe 'test1'
  91
  92 DESCRIPTION                                          ENABLED
  93  'test1', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NON true
  94  E', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
  95   VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIO
  96  NS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =>
  97  'false', BLOCKSIZE => '65536', IN_MEMORY => 'false'
  98  , BLOCKCACHE => 'true'}
  99 1 row(s) in 3.2410 seconds
 100 ----
 101
 102 To suppress all output, redirect it to _/dev/null_:
 103
 104 [source,bash]
 105 ----
 106 $ echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1
 107 ----
 108 ====
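
The same pipe can also be driven from a Ruby script instead of Bash. The following is a minimal sketch, not an HBase API: the helper name `run_hbase_command` and its `argv` parameter are hypothetical, and the default assumes the shell is launched as `./bin/hbase shell -n` from the HBase install directory.

```ruby
require 'open3'

# Hypothetical helper (not part of HBase): feeds one command to a process on
# stdin and returns its combined stdout/stderr together with its exit status.
# The default argv is an assumption about where the hbase launcher lives.
def run_hbase_command(command, argv = ['./bin/hbase', 'shell', '-n'])
  output, status = Open3.capture2e(*argv, stdin_data: "#{command}\n")
  [output, status.exitstatus]
end

# Usage (requires a working HBase installation):
#   output, code = run_hbase_command("describe 'test1'")
#   puts "exit status: #{code}"
```

Because the command travels over stdin rather than through a Bash string, only Ruby's own quoting rules apply to it.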
 109
 110 .Checking the Result of a Scripted Command
 111 ====
 112 Since scripts are not designed to be run interactively, you need a way to check whether your command failed or succeeded.
 113 The HBase shell uses the standard convention of returning a value of `0` for successful commands, and some non-zero value for failed commands.
 114 Bash stores the exit status of the most recent command in the special parameter `$?`.
 115 Because that variable is overwritten each time the shell runs any command, you should store the result in a different, script-defined variable.
 116
 117 This is a naive script that shows one way to store the return value and make a decision based upon it.
 118
 119 [source,bash]
 120 ----
 121 #!/bin/bash
 122
 123 echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1
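 124 status=$?   # save the exit status before another command overwrites $?
 125 echo "The status was $status"
 126 if [ "$status" -eq 0 ]; then
 127     echo "The command succeeded"
 128 else
 129     echo "The command may have failed."
 130 fi
 131 exit $status   # a top-level script ends with exit; return only works in functions or sourced files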
 132 ----
 133 ====
 134
 135 === Checking for Success or Failure In Scripts
 136
 137 Getting an exit code of `0` means that the command you scripted definitely succeeded.
 138 However, getting a non-zero exit code does not necessarily mean the command failed.
 139 The command could have succeeded, but the client lost connectivity, or some other event obscured its success.
 140 This is because RPC commands are stateless.
 141 The only way to be sure of the status of an operation is to check.
 142 For instance, if your script creates a table, but returns a non-zero exit value, you should check whether the table was actually created before trying again to create it.
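
The check-before-retry advice can be sketched as follows. This is an illustrative outline under stated assumptions, not a supplied HBase utility: `create_table_with_check` is a hypothetical name, and `create` and `exists` are callables you provide yourself (for example, wrappers that pipe `create` and `exists` commands to `hbase shell -n`).

```ruby
# Hypothetical sketch: `create` returns an exit status (0 means success) and
# `exists` returns true or false. Both are supplied by the caller.
def create_table_with_check(create, exists, max_attempts = 3)
  max_attempts.times do
    return :created if create.call.zero?
    # A non-zero status is ambiguous: the table may have been created anyway,
    # so check before trying to create it again.
    return :created_despite_error if exists.call
  end
  :failed
end

# Example with stand-in callables: the create call reports failure,
# but the follow-up check finds the table.
result = create_table_with_check(-> { 1 }, -> { true })
puts result
```

Injecting the two callables keeps the retry logic separate from how the shell is actually invoked.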
 143
 144 == Read HBase Shell Commands from a Command File
 145
 146 You can enter HBase Shell commands into a text file, one command per line, and pass that file to the HBase Shell.
 147
 148 .Example Command File
 149 ----
 150 create 'test', 'cf'
 151 list 'test'
 152 put 'test', 'row1', 'cf:a', 'value1'
 153 put 'test', 'row2', 'cf:b', 'value2'
 154 put 'test', 'row3', 'cf:c', 'value3'
 155 put 'test', 'row4', 'cf:d', 'value4'
 156 scan 'test'
 157 get 'test', 'row1'
 158 disable 'test'
 159 enable 'test'
 160 ----
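
A command file like the one above can also be generated by a script. The following is a minimal sketch in plain Ruby; the function name `build_command_file` and the sample data are hypothetical, and it assumes that names and values contain no single quotes.

```ruby
# Builds the text of a command file: one HBase Shell command per line.
# Assumes table, family, row, qualifier, and value contain no single quotes.
def build_command_file(table, family, cells)
  lines = ["create '#{table}', '#{family}'"]
  cells.each do |row, (qualifier, value)|
    lines << "put '#{table}', '#{row}', '#{family}:#{qualifier}', '#{value}'"
  end
  lines << "scan '#{table}'"
  lines << 'exit' # without exit, the shell stays at its prompt afterwards
  lines.join("\n") + "\n"
end

cells = { 'row1' => %w[a value1], 'row2' => %w[b value2] }
puts build_command_file('test', 'cf', cells)
# To save it for the shell: File.write('sample_commands.txt', build_command_file('test', 'cf', cells))
```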
 161
 162 .Directing HBase Shell to Execute the Commands
 163 ====
 164 Pass the path to the command file as the only argument to the `hbase shell` command.
 165 Each command is executed and its output is shown.
 166 If you do not include the `exit` command in your script, you are returned to the HBase shell prompt.
 167 There is no way to programmatically check each individual command for success or failure.
 168 Also, though you see the output for each command, the commands themselves are not echoed to the screen, so it can be difficult to line up each command with its output.
 169
 170 [source,bash]
 171 ----
 172 $ ./hbase shell ./sample_commands.txt
 173 0 row(s) in 3.4170 seconds
 174
 175 TABLE
 176 test
 177 1 row(s) in 0.0590 seconds
 178
 179 0 row(s) in 0.1540 seconds
 180
 181 0 row(s) in 0.0080 seconds
 182
 183 0 row(s) in 0.0060 seconds
 184
 185 0 row(s) in 0.0060 seconds
 186
 187 ROW                   COLUMN+CELL
 188  row1                 column=cf:a, timestamp=1407130286968, value=value1
 189  row2                 column=cf:b, timestamp=1407130286997, value=value2
 190  row3                 column=cf:c, timestamp=1407130287007, value=value3
 191  row4                 column=cf:d, timestamp=1407130287015, value=value4
 192 4 row(s) in 0.0420 seconds
 193
 194 COLUMN                CELL
 195  cf:a                 timestamp=1407130286968, value=value1
 196 1 row(s) in 0.0110 seconds
 197
 198 0 row(s) in 1.5630 seconds
 199
 200 0 row(s) in 0.4360 seconds
 201 ----
 202 ====
 203
 204 == Passing VM Options to the Shell
 205
 206 You can pass VM options to the HBase Shell using the `HBASE_SHELL_OPTS` environment variable.
 207 You can set this in your environment, for instance by editing _~/.bashrc_, or set it as part of the command to launch HBase Shell.
 208 The following example sets several garbage-collection-related variables, just for the lifetime of the VM running the HBase Shell.
 209 The command should be run all on a single line, but is broken by the `\` character, for readability.
 210
 211 [source,bash]
 212 ----
 213 $ HBASE_SHELL_OPTS="-verbose:gc -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps \
 214   -XX:+PrintGCDetails -Xloggc:$HBASE_HOME/logs/gc-hbase.log" ./bin/hbase shell
 215 ----
 216
 217 == Overriding Configuration When Starting the HBase Shell
 218
 219 As of hbase-2.0.5/hbase-2.1.3/hbase-2.2.0/hbase-1.4.10/hbase-1.5.0, you can pass or override
 220 HBase configuration specified in `hbase-*.xml` by passing your key/values prefixed with `-D` on the command line, as follows:
 221
 222 [source,bash]
 223 ----
 224 $ ./bin/hbase shell -Dhbase.zookeeper.quorum=ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org -Draining=false
 225 ...
 226 hbase(main):001:0> @shell.hbase.configuration.get("hbase.zookeeper.quorum")
 227 => "ZK0.remote.cluster.example.org,ZK1.remote.cluster.example.org,ZK2.remote.cluster.example.org"
 228 hbase(main):002:0> @shell.hbase.configuration.get("raining")
 229 => "false"
 230 ----
 231
 232 == Shell Tricks
 233
 234 === Table variables
 235
 236 HBase 0.95 adds shell commands that provide JRuby-style object-oriented references for tables.
 237 Previously, all of the shell commands that act upon a table had a procedural style that always took the name of the table as an argument.
 238 HBase 0.95 introduces the ability to assign a table to a JRuby variable.
 239 The table reference can be used to perform data read/write operations such as puts, scans, and gets, as well as admin functionality such as disabling, dropping, and describing tables.
 240
 241 For example, previously you would always specify a table name:
 242
 243 ----
 244 hbase(main):000:0> create 't', 'f'
 245 0 row(s) in 1.0970 seconds
 246 hbase(main):001:0> put 't', 'rold', 'f', 'v'
 247 0 row(s) in 0.0080 seconds
 248
 249 hbase(main):002:0> scan 't'
 250 ROW                                COLUMN+CELL
 251  rold                              column=f:, timestamp=1378473207660, value=v
 252 1 row(s) in 0.0130 seconds
 253
 254 hbase(main):003:0> describe 't'
 255 DESCRIPTION                                                                           ENABLED
 256  't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
 257  SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
 258  147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
 259  ', BLOCKCACHE => 'true'}
 260 1 row(s) in 1.4430 seconds
 261
 262 hbase(main):004:0> disable 't'
 263 0 row(s) in 14.8700 seconds
 264
 265 hbase(main):005:0> drop 't'
 266 0 row(s) in 23.1670 seconds
 267
 268 hbase(main):006:0>
 269 ----
 270
 271 Now you can assign the table to a variable and use the results in JRuby shell code.
 272
 273 ----
 274 hbase(main):007 > t = create 't', 'f'
 275 0 row(s) in 1.0970 seconds
 276
 277 => Hbase::Table - t
 278 hbase(main):008 > t.put 'r', 'f', 'v'
 279 0 row(s) in 0.0640 seconds
 280 hbase(main):009 > t.scan
 281 ROW                           COLUMN+CELL
 282  r                            column=f:, timestamp=1331865816290, value=v
 283 1 row(s) in 0.0110 seconds
 284 hbase(main):010:0> t.describe
 285 DESCRIPTION                                                                           ENABLED
 286  't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
 287  SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
 288  147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
 289  ', BLOCKCACHE => 'true'}
 290 1 row(s) in 0.0210 seconds
 291 hbase(main):038:0> t.disable
 292 0 row(s) in 6.2350 seconds
 293 hbase(main):039:0> t.drop
 294 0 row(s) in 0.2340 seconds
 295 ----
 296
 297 If the table has already been created, you can assign it to a variable by using the `get_table` method:
 298
 299 ----
 300 hbase(main):011 > create 't','f'
 301 0 row(s) in 1.2500 seconds
 302
 303 => Hbase::Table - t
 304 hbase(main):012:0> tab = get_table 't'
 305 0 row(s) in 0.0010 seconds
 306
 307 => Hbase::Table - t
 308 hbase(main):013:0> tab.put 'r1' ,'f', 'v'
 309 0 row(s) in 0.0100 seconds
 310 hbase(main):014:0> tab.scan
 311 ROW                                COLUMN+CELL
 312  r1                                column=f:, timestamp=1378473876949, value=v
 313 1 row(s) in 0.0240 seconds
 314 hbase(main):015:0>
 315 ----
 316
 317 The `list` command has also been extended so that it returns a list of table names as strings.
 318 You can then use JRuby to script table operations based on these names.
 319 The `list_snapshots` command behaves similarly.
 320
 321 ----
 322 hbase(main):016 > tables = list('t.*')
 323 TABLE
 324 t
 325 1 row(s) in 0.1040 seconds
 326
 327 => ["t"]
 328 hbase(main):017:0> tables.map { |t| disable t ; drop  t}
 329 0 row(s) in 2.2510 seconds
 330
 331 => [nil]
 332 hbase(main):018:0>
 333 ----
 334
 335 [[irbrc]]
 336 === _irbrc_
 337
 338 Create an _.irbrc_ file for yourself in your home directory and add customizations.
 339 A useful one is command history, so that commands are saved across Shell invocations:
 340
 341 [source,bash]
 342 ----
 343 $ more .irbrc
 344 require 'irb/ext/save-history'
 345 IRB.conf[:SAVE_HISTORY] = 100
 346 IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"
 347 ----
 348
 349 If you'd like to avoid printing the result of evaluating each expression to stderr, for example the array of tables returned from the `list` command:
 350
 351 [source,bash]
 352 ----
 353 $ echo "IRB.conf[:ECHO] = false" >>~/.irbrc
 354 ----
 355
 356 See the `ruby` documentation of _.irbrc_ to learn about other possible configurations.
 357
 358 === LOG date to timestamp
 359
 360 To convert the date '08/08/16 20:56:29' from an hbase log into a timestamp, do:
 361
 362 ----
 363 hbase(main):021:0> import java.text.SimpleDateFormat
 364 hbase(main):022:0> import java.text.ParsePosition
 365 hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime() => 1218920189000
 366 ----
 367
 368 To go the other direction:
 369
 370 ----
 371 hbase(main):021:0> import java.util.Date
 372 hbase(main):022:0> Date.new(1218920189000).toString() => "Sat Aug 16 20:56:29 UTC 2008"
 373 ----
 374
 375 To output in a format that is exactly like that of the HBase log format will take a little messing with link:http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html[SimpleDateFormat].
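
As an alternative to the Java classes, the same round trip can be sketched with Ruby's own `Time` class. This is a minimal sketch, assuming the log timestamps are in UTC as in the example output above; the helper names `log_date_to_millis` and `millis_to_log_date` are hypothetical.

```ruby
require 'time'

# Pattern matching the 'yy/MM/dd HH:mm:ss' log dates used above.
LOG_FORMAT = '%y/%m/%d %H:%M:%S'

# Parses a log date (assumed to be UTC) into milliseconds since the epoch.
def log_date_to_millis(text)
  Time.strptime("#{text} +0000", "#{LOG_FORMAT} %z").to_i * 1000
end

# Formats milliseconds since the epoch back into the log date layout, in UTC.
def millis_to_log_date(millis)
  Time.at(millis / 1000).utc.strftime(LOG_FORMAT)
end

puts log_date_to_millis('08/08/16 20:56:29')   # 1218920189000
puts millis_to_log_date(1218920189000)         # 08/08/16 20:56:29
```

If your logs are written in a local time zone, replace the fixed `+0000` offset accordingly.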
 376
 377 === Query Shell Configuration
 378 ----
 379 hbase(main):001:0> @shell.hbase.configuration.get("hbase.rpc.timeout")
 380 => "60000"
 381 ----
 382 To set a config in the shell:
 383 ----
 384 hbase(main):005:0> @shell.hbase.configuration.setInt("hbase.rpc.timeout", 61010)
 385 hbase(main):006:0> @shell.hbase.configuration.get("hbase.rpc.timeout")
 386 => "61010"
 387 ----
 388
 389
 390 [[tricks.pre-split]]
 391 === Pre-splitting tables with the HBase Shell
 392 You can use a variety of options to pre-split tables when creating them via the HBase Shell `create` command.
 393
 394 The simplest approach is to specify an array of split points when creating the table. Note that string literals used as split points are interpreted by their underlying byte representation. So when specifying a split point of '10', we are actually specifying the byte split point '\x31\x30'.
 395
 396 The split points will define `n+1` regions where `n` is the number of split points. The lowest region will contain all keys from the lowest possible key up to but not including the first split point key.
 397 The next region will contain keys from the first split point up to, but not including the next split point key.
 398 This will continue for all split points up to the last. The last region will be defined from the last split point up to the maximum possible key.
 399
 400 [source]
 401 ----
 402 hbase>create 't1','f',SPLITS => ['10','20','30']
 403 ----
 404
 405 In the above example, the table 't1' will be created with column family 'f', pre-split to four regions. Note that the first region will contain all keys from the lowest possible key up to, but not including, '10', that is, the bytes '\x31\x30' ('\x31' is the ASCII code for '1' and '\x30' for '0').
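
To see how byte-wise split points assign keys to regions, here is a small plain-Ruby sketch. It only models the comparison rule described above and does not call HBase; the helper name `region_index` is hypothetical.

```ruby
# Hypothetical helper: returns the zero-based index of the region that would
# hold row_key, given split points sorted in ascending byte order.
# String#b gives a binary view so the comparison is strictly byte by byte.
def region_index(row_key, split_points)
  split_points.count { |split| row_key.b >= split.b }
end

splits = %w[10 20 30]
%w[1 10 15 100 2 9].each do |key|
  puts "#{key.inspect} -> region #{region_index(key, splits)}"
end
# "1" -> region 0, "10" -> region 1, "15" -> region 1,
# "100" -> region 1, "2" -> region 1, "9" -> region 3
```

The keys '2' and '9' show why byte order matters: '2' sorts before '20' and '9' sorts after '30', so numeric-looking string keys do not land where numeric ordering would suggest.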
 406
 407 You can pass the split points in a file using the following variation. In this example, the splits are read from a file at the given path on the local filesystem. Each line in the file specifies a split point key.
 408
 409 [source]
 410 ----
 411 hbase>create 't14','f',SPLITS_FILE=>'splits.txt'
 412 ----
 413
 414 The other options are to automatically compute splits based on a desired number of regions and a splitting algorithm.
 415 HBase supplies algorithms for splitting the key range based on uniform splits or based on hexadecimal keys, but you can provide your own splitting algorithm to subdivide the key range.
 416
 417 [source]
 418 ----
 419 # create table with four regions based on random bytes keys
 420 hbase>create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }
 421
 422 # create table with five regions based on hex keys
 423 hbase>create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }
 424 ----
 425
 426 As the HBase Shell is effectively a Ruby environment, you can use simple Ruby scripts to compute splits algorithmically.
 427
 428 [source]
 429 ----
 430 # generate splits for long (Ruby fixnum) key range from start to end key
 431 hbase(main):070:0> def gen_splits(start_key,end_key,num_regions)
 432 hbase(main):071:1>   results=[]
 433 hbase(main):072:1>   range=end_key-start_key
 434 hbase(main):073:1>   incr=(range/num_regions).floor
 435 hbase(main):074:1>   for i in 1 .. num_regions-1
 436 hbase(main):075:2>     results.push([i*incr+start_key].pack("N"))
 437 hbase(main):076:2>   end
 438 hbase(main):077:1>   return results
 439 hbase(main):078:1> end
 440 hbase(main):079:0>
 441 hbase(main):080:0> splits=gen_splits(1,2000000,10)
 442 => ["\000\003\r@", "\000\006\032\177", "\000\t'\276", "\000\f4\375", "\000\017B<", "\000\022O{", "\000\025\\\272", "\000\030i\371", "\000\ew8"]
 443 hbase(main):081:0> create 'test_splits','f',SPLITS=>splits
 444 0 row(s) in 0.2670 seconds
 445
 446 => Hbase::Table - test_splits
 447 ----
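
Note that `pack("N")` produces 4-byte, 32-bit big-endian values. If your row keys are instead written as 8-byte big-endian longs, a variation along the following lines may fit better. This is a sketch under that assumption only: `gen_long_splits` is a hypothetical name, the keys are assumed to be non-negative, and the `Q>` pack directive requires a reasonably recent Ruby.

```ruby
# Hypothetical variation: evenly spaced split points encoded as 8-byte
# big-endian unsigned integers. Assumes non-negative keys.
def gen_long_splits(start_key, end_key, num_regions)
  incr = (end_key - start_key) / num_regions
  (1...num_regions).map { |i| [start_key + i * incr].pack('Q>') }
end

splits = gen_long_splits(1, 2_000_000, 10)
puts splits.size                        # 9 split points for 10 regions
puts splits.first.bytesize              # 8 bytes per split point
puts splits.first.unpack('Q>').first    # 200000
```

In the HBase Shell, the resulting array would be passed to `create` through `SPLITS`, as in the example above.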
 448
 449 Note that the HBase Shell command `truncate` effectively drops and recreates the table with default options, which will discard any pre-splitting.
 450 If you need to truncate a pre-split table, you must drop and recreate the table explicitly to re-specify custom split options.
 451
 452 === Debug
 453
 454 ==== Shell debug switch
 455
 456 You can set a debug switch in the shell to see more output -- e.g.
 457 more of the stack trace on exception -- when you run a command:
 458
 459 [source]
 460 ----
 461 hbase> debug <RETURN>
 462 ----
 463
 464 ==== DEBUG log level
 465
 466 To enable DEBUG level logging in the shell, launch it with the `-d` option.
 467
 468 [source,bash]
 469 ----
 470 $ ./bin/hbase shell -d
 471 ----
 472
 473 === Commands
 474
 475 ==== count
 476
 477 The `count` command returns the number of rows in a table.
 478 It's quite fast when configured with the right CACHE:
 479
 480 [source]
 481 ----
 482 hbase> count '<tablename>', CACHE => 1000
 483 ----
 484
 485 The above count fetches 1000 rows at a time.
 486 Set CACHE lower if your rows are big.
 487 Default is to fetch one row at a time.
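
As a rough illustration of why CACHE matters, the number of fetches needed to walk a table can be estimated with simple arithmetic. This is a back-of-the-envelope sketch, assuming each fetch returns exactly CACHE rows and ignoring any other limits; `estimated_fetches` is a hypothetical helper, not a shell command.

```ruby
# Rough estimate only: rows divided by rows-per-fetch, rounded up.
def estimated_fetches(row_count, cache)
  (row_count + cache - 1) / cache
end

puts estimated_fetches(1_000_000, 1)      # 1000000 fetches, one row at a time
puts estimated_fetches(1_000_000, 1000)   # 1000 fetches with CACHE => 1000
```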