////
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
////

[[shell]]
= The Apache HBase Shell
:doctype: book
:numbered:
:toc: left
:icons: font
:experimental:


The Apache HBase Shell is link:http://jruby.org[(J)Ruby]'s IRB with some HBase-specific commands added.
Anything you can do in IRB, you should be able to do in the HBase Shell.

To run the HBase shell, do as follows:

[source,bash]
----
$ ./bin/hbase shell
----

Type `help` and then `<RETURN>` to see a listing of shell commands and options.
Browse at least the paragraphs at the end of the help output for the gist of how variables and command arguments are entered into the HBase shell; in particular, note how table names, rows, columns, etc., must be quoted.

See <<shell_exercises,shell exercises>> for examples of basic shell operation.

Here is a nicely formatted listing of link:http://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/[all shell commands] by Rajeshbabu Chintaguntla.

[[scripting]]
== Scripting with Ruby

For examples of scripting Apache HBase, look in the HBase _bin_ directory.
Look at the files that end in _*.rb_.
To run one of these files, do as follows:

[source,bash]
----
$ ./bin/hbase org.jruby.Main PATH_TO_SCRIPT
----

== Running the Shell in Non-Interactive Mode

A non-interactive mode was added to the HBase Shell in link:https://issues.apache.org/jira/browse/HBASE-11658[HBASE-11658].
Non-interactive mode captures the exit status (success or failure) of HBase Shell commands and passes that status back to the command interpreter.
If you use the normal interactive mode, the HBase Shell only ever returns its own exit status, which is nearly always `0` for success.

To invoke non-interactive mode, pass the `-n` or `--non-interactive` option to the HBase Shell.

[[hbase.shell.noninteractive]]
== HBase Shell in OS Scripts

You can use the HBase shell from within operating system script interpreters such as Bash, which is the default command interpreter for most Linux and UNIX distributions.
The following guidelines use Bash syntax, but could be adjusted to work with C-style shells such as csh or tcsh, and could probably be modified to work with the Microsoft Windows script interpreter as well. Submissions are welcome.

NOTE: Spawning HBase Shell commands in this way is slow, because each invocation starts a new JVM. Keep that in mind when deciding whether combining HBase operations with the operating system command line is appropriate.

.Passing Commands to the HBase Shell
====
You can pass commands to the HBase Shell in non-interactive mode (the `-n` option) using the `echo` command and the `|` (pipe) operator.
Be sure to escape characters in the HBase commands which would otherwise be interpreted by the shell.
Some debug-level output has been truncated from the example below.

[source,bash]
----
$ echo "describe 'test1'" | ./hbase shell -n

Version 0.98.3-hadoop2, rd5e65a9144e315bb0a964e7730871af32f5018d5, Sat May 31 19:56:09 PDT 2014

describe 'test1'

DESCRIPTION                                          ENABLED
 'test1', {NAME => 'cf', DATA_BLOCK_ENCODING => 'NON true
 E', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0',
  VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIO
 NS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS =>
 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false'
 , BLOCKCACHE => 'true'}
1 row(s) in 3.2410 seconds
----

To suppress all output, redirect it to _/dev/null_:

[source,bash]
----
$ echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1
----
====

.Checking the Result of a Scripted Command
====
Since scripts are not designed to be run interactively, you need a way to check whether your command failed or succeeded.
The HBase shell uses the standard convention of returning a value of `0` for successful commands, and some non-zero value for failed commands.
Bash stores a command's return value in a special parameter called `$?`.
Because that parameter is overwritten each time the shell runs any command, you should store the result in a different, script-defined variable.

This is a naive script that shows one way to store the return value and make a decision based upon it.

[source,bash]
----
#!/bin/bash

# $? holds the exit status of the last command in the pipeline (hbase shell)
echo "describe 'test'" | ./hbase shell -n > /dev/null 2>&1
status=$?
echo "The status was $status"
if [ "$status" -eq 0 ]; then
    echo "The command succeeded"
else
    echo "The command may have failed."
fi
# Use exit, not return: return is only valid inside a function or a sourced script
exit "$status"
----
====

=== Checking for Success or Failure In Scripts

Getting an exit code of `0` means that the command you scripted definitely succeeded.
However, getting a non-zero exit code does not necessarily mean the command failed.
The command could have succeeded on the server even though the client lost connectivity before receiving the response, or some other event obscured its success.
This is because RPC commands are stateless.
The only way to be sure of the status of an operation is to check.
For instance, if your script creates a table, but returns a non-zero exit value, you should check whether the table was actually created before trying again to create it.

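As a sketch of that check-before-retry approach, the following plain Ruby drives `hbase shell -n` through Ruby's standard `Open3` library instead of Bash. It is not part of HBase. The `./bin/hbase` launcher path, the helper names `hbase_shell` and `ensure_table`, and the assumption that the shell's `exists` command prints `Table <name> does exist` are all assumptions to verify against your HBase version.

```ruby
require 'open3'

# Hypothetical helper: pipe one command into `hbase shell -n` (the Ruby
# equivalent of `echo "..." | ./bin/hbase shell -n`) and return the combined
# stdout/stderr text plus the exit status. The default path is an assumption.
def hbase_shell(command, shell: ['./bin/hbase', 'shell', '-n'])
  output, status = Open3.capture2e(*shell, stdin_data: "#{command}\n")
  [output, status.exitstatus]
end

# Hypothetical helper: try to create a table; on a non-zero exit status, ask
# the shell whether the table exists before concluding that the create failed.
def ensure_table(table, family, shell: ['./bin/hbase', 'shell', '-n'])
  _, code = hbase_shell("create '#{table}', '#{family}'", shell: shell)
  return true if code == 0

  # Assumption: `exists` prints "Table <name> does exist" for an existing table.
  output, _ = hbase_shell("exists '#{table}'", shell: shell)
  output.include?("Table #{table} does exist")
end
```

For example, `ensure_table('test', 'cf')` returns `true` when the table is present after the attempt and `false` when a retry is reasonable. Each call starts at least one new shell process, so it is as slow as the Bash approach.
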
== Read HBase Shell Commands from a Command File

You can enter HBase Shell commands into a text file, one command per line, and pass that file to the HBase Shell.

.Example Command File
====
----
create 'test', 'cf'
list 'test'
put 'test', 'row1', 'cf:a', 'value1'
put 'test', 'row2', 'cf:b', 'value2'
put 'test', 'row3', 'cf:c', 'value3'
put 'test', 'row4', 'cf:d', 'value4'
scan 'test'
get 'test', 'row1'
disable 'test'
enable 'test'
----
====

.Directing HBase Shell to Execute the Commands
====
Pass the path to the command file as the only argument to the `hbase shell` command.
Each command is executed and its output is shown.
If you do not include the `exit` command in your script, you are returned to the HBase shell prompt.
There is no way to programmatically check each individual command for success or failure.
Also, though you see the output for each command, the commands themselves are not echoed to the screen, so it can be difficult to line up each command with its output.

[source,bash]
----
$ ./hbase shell ./sample_commands.txt
0 row(s) in 3.4170 seconds

TABLE
test
1 row(s) in 0.0590 seconds

0 row(s) in 0.1540 seconds

0 row(s) in 0.0080 seconds

0 row(s) in 0.0060 seconds

0 row(s) in 0.0060 seconds

ROW                   COLUMN+CELL
 row1                 column=cf:a, timestamp=1407130286968, value=value1
 row2                 column=cf:b, timestamp=1407130286997, value=value2
 row3                 column=cf:c, timestamp=1407130287007, value=value3
 row4                 column=cf:d, timestamp=1407130287015, value=value4
4 row(s) in 0.0420 seconds

COLUMN                CELL
 cf:a                 timestamp=1407130286968, value=value1
1 row(s) in 0.0110 seconds

0 row(s) in 1.5630 seconds

0 row(s) in 0.4360 seconds
----
====
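
If you do need a per-command status, one workaround is to run each line of the command file as its own non-interactive invocation. The sketch below is illustrative only. It assumes the `./bin/hbase` launcher path, and `run_each` and `report` are hypothetical helper names. It trades speed for visibility: every line starts a new shell and JVM, but each command is printed next to its output and exit status.

```ruby
require 'open3'

# Hypothetical helper: run every non-blank, non-comment line of a command file
# as a separate `hbase shell -n` invocation (the default path is an assumption)
# and return [command, output, exit status] for each line.
def run_each(path, shell: ['./bin/hbase', 'shell', '-n'])
  File.readlines(path, chomp: true)
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?('#') }
      .map do |command|
        output, status = Open3.capture2e(*shell, stdin_data: "#{command}\n")
        [command, output, status.exitstatus]
      end
end

# Print each command before its output so the two are easy to line up;
# return true only if every command exited with status 0.
def report(results)
  results.each do |command, output, code|
    puts "> #{command}  (exit #{code})"
    puts output
  end
  results.all? { |_, _, code| code == 0 }
end
```

For example, `report(run_each('sample_commands.txt'))` prints every command with its status. Because each line runs in a separate process, shell variables and multi-line Ruby definitions do not carry over between lines, so this only suits files of one-line commands like the example above.
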

== Passing VM Options to the Shell

You can pass VM options to the HBase Shell using the `HBASE_SHELL_OPTS` environment variable.
You can set this in your environment, for instance by editing _~/.bashrc_, or set it as part of the command to launch HBase Shell.
The following example sets several garbage-collection-related options, just for the lifetime of the VM running the HBase Shell.
It is a single command, split across two lines with the `\` character for readability.

[source,bash]
----
$ HBASE_SHELL_OPTS="-verbose:gc -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps \
  -XX:+PrintGCDetails -Xloggc:$HBASE_HOME/logs/gc-hbase.log" ./bin/hbase shell
----

The GC logging flags in this example are Java 8 options; Java 9 and later replaced most of them with unified logging (`-Xlog:gc*`), so adjust them for newer JVMs.
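
If you launch the shell from a Ruby script instead of Bash, the equivalent of the one-off `HBASE_SHELL_OPTS=... ./bin/hbase shell` prefix is to pass an environment hash to `Open3.capture2e`, which sets the variable for the child process only. This is a minimal sketch, assuming the `./bin/hbase` launcher path; `hbase_shell_with_opts` is a hypothetical helper name.

```ruby
require 'open3'

# Hypothetical helper: run one non-interactive shell command with
# HBASE_SHELL_OPTS set only for the spawned process. The calling
# process's own ENV is left unchanged.
def hbase_shell_with_opts(command, opts, shell: ['./bin/hbase', 'shell', '-n'])
  env = { 'HBASE_SHELL_OPTS' => opts }
  output, status = Open3.capture2e(env, *shell, stdin_data: "#{command}\n")
  [output, status.exitstatus]
end
```

For example, `hbase_shell_with_opts('status', '-verbose:gc')` runs `status` with GC logging enabled for that one shell JVM.
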

== Shell Tricks

=== Table variables

HBase 0.95 adds shell commands that provide JRuby-style object-oriented references for tables.
Previously, all of the shell commands that act upon a table had a procedural style that always took the name of the table as an argument.
HBase 0.95 introduces the ability to assign a table to a JRuby variable.
The table reference can be used to perform data read and write operations such as puts, scans, and gets, as well as admin functionality such as disabling, dropping, and describing tables.

For example, previously you would always specify a table name:

----
hbase(main):000:0> create 't', 'f'
0 row(s) in 1.0970 seconds
hbase(main):001:0> put 't', 'rold', 'f', 'v'
0 row(s) in 0.0080 seconds

hbase(main):002:0> scan 't'
ROW                                COLUMN+CELL
 rold                              column=f:, timestamp=1378473207660, value=v
1 row(s) in 0.0130 seconds

hbase(main):003:0> describe 't'
DESCRIPTION                                                                           ENABLED
 't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
 SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
 147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
 ', BLOCKCACHE => 'true'}
1 row(s) in 1.4430 seconds

hbase(main):004:0> disable 't'
0 row(s) in 14.8700 seconds

hbase(main):005:0> drop 't'
0 row(s) in 23.1670 seconds

hbase(main):006:0>
----

Now you can assign the table to a variable and use the results in JRuby shell code.

----
hbase(main):007 > t = create 't', 'f'
0 row(s) in 1.0970 seconds

=> Hbase::Table - t
hbase(main):008 > t.put 'r', 'f', 'v'
0 row(s) in 0.0640 seconds
hbase(main):009 > t.scan
ROW                           COLUMN+CELL
 r                            column=f:, timestamp=1331865816290, value=v
1 row(s) in 0.0110 seconds
hbase(main):010:0> t.describe
DESCRIPTION                                                                           ENABLED
 't', {NAME => 'f', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_ true
 SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2
 147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false
 ', BLOCKCACHE => 'true'}
1 row(s) in 0.0210 seconds
hbase(main):038:0> t.disable
0 row(s) in 6.2350 seconds
hbase(main):039:0> t.drop
0 row(s) in 0.2340 seconds
----

If the table has already been created, you can assign a Table to a variable by using the `get_table` method:

----
hbase(main):011 > create 't','f'
0 row(s) in 1.2500 seconds

=> Hbase::Table - t
hbase(main):012:0> tab = get_table 't'
0 row(s) in 0.0010 seconds

=> Hbase::Table - t
hbase(main):013:0> tab.put 'r1', 'f', 'v'
0 row(s) in 0.0100 seconds
hbase(main):014:0> tab.scan
ROW                                COLUMN+CELL
 r1                                column=f:, timestamp=1378473876949, value=v
1 row(s) in 0.0240 seconds
hbase(main):015:0>
----

The `list` command has also been extended so that it returns a list of table names as strings.
You can then use JRuby to script table operations based on these names.
The `list_snapshots` command acts similarly.

----
hbase(main):016 > tables = list('t.*')
TABLE
t
1 row(s) in 0.1040 seconds

=> #<#<Class:0x7677ce29>:0x21d377a4>
hbase(main):017:0> tables.map { |t| disable t ; drop  t}
0 row(s) in 2.2510 seconds

=> [nil]
hbase(main):018:0>
----

[[irbrc]]
=== _irbrc_

Create an _.irbrc_ file for yourself in your home directory.
Add customizations.
A useful one is command history, so commands are saved across Shell invocations:

[source,bash]
----
$ more .irbrc
require 'irb/ext/save-history'
IRB.conf[:SAVE_HISTORY] = 100
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"
----

To avoid printing the result of evaluating each expression, for example the array of tables returned from the `list` command, turn off IRB's echo:

[source,bash]
----
$ echo "IRB.conf[:ECHO] = false" >>~/.irbrc
----

See the `ruby` documentation of _.irbrc_ to learn about other possible configurations.

=== Log date to timestamp

To convert the date '08/08/16 20:56:29' from an HBase log into a timestamp, do:

----
hbase(main):021:0> import java.text.SimpleDateFormat
hbase(main):022:0> import java.text.ParsePosition
hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime()
=> 1218920189000
----

To go the other direction:

----
hbase(main):021:0> import java.util.Date
hbase(main):022:0> Date.new(1218920189000).toString()
=> "Sat Aug 16 20:56:29 UTC 2008"
----

Both conversions use the JVM's default time zone. The results shown were produced with the default time zone set to UTC, so expect different values in other zones.

Producing output that exactly matches the HBase log format takes a little more work with link:http://download.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html[SimpleDateFormat].

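The same conversion can also be done with Ruby's standard `Time` class, which is available in the shell's JRuby as well as in plain Ruby. This is an alternative to the Java classes above, not something the HBase shell does internally. The sketch assumes the `yy/MM/dd HH:mm:ss` log pattern from the example and makes the UTC offset explicit instead of relying on the JVM default; `log_date_to_millis` and `millis_to_log_date` are hypothetical helper names.

```ruby
require 'time'

# strptime/strftime equivalent of the Java pattern "yy/MM/dd HH:mm:ss".
LOG_DATE_FORMAT = '%y/%m/%d %H:%M:%S'.freeze

# Parse a log date written at the given UTC offset; return epoch milliseconds.
def log_date_to_millis(text, utc_offset = '+00:00')
  Time.strptime("#{text} #{utc_offset}", "#{LOG_DATE_FORMAT} %z").to_i * 1000
end

# Format epoch milliseconds as a log date at the given UTC offset
# (sub-second precision is dropped).
def millis_to_log_date(millis, utc_offset = '+00:00')
  Time.at(millis / 1000).getlocal(utc_offset).strftime(LOG_DATE_FORMAT)
end
```

With these definitions, `log_date_to_millis('08/08/16 20:56:29')` returns `1218920189000`, matching the UTC result of the Java example above.
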
=== Query Shell Configuration

To read a configuration value in the shell:

----
hbase(main):001:0> @shell.hbase.configuration.get("hbase.rpc.timeout")
=> "60000"
----

To set a config in the shell:

----
hbase(main):005:0> @shell.hbase.configuration.setInt("hbase.rpc.timeout", 61010)
hbase(main):006:0> @shell.hbase.configuration.get("hbase.rpc.timeout")
=> "61010"
----


[[tricks.pre-split]]
=== Pre-splitting tables with the HBase Shell
You can use a variety of options to pre-split tables when creating them via the HBase Shell `create` command.

The simplest approach is to specify an array of split points when creating the table. Note that when specifying string literals as split points, these will create split points based on the underlying byte representation of the string. So when specifying a split point of '10', we are actually specifying the byte split point '\x31\x30'.

The split points will define `n+1` regions where `n` is the number of split points. The lowest region will contain all keys from the lowest possible key up to, but not including, the first split point key.
The next region will contain keys from the first split point up to, but not including, the next split point key.
This will continue for all split points up to the last. The last region will be defined from the last split point up to the maximum possible key.

[source]
----
hbase>create 't1','f',SPLITS => ['10','20','30']
----

In the above example, the table 't1' will be created with column family 'f', pre-split to four regions. Note the first region will contain all keys up to, but not including, '\x31\x30' (the bytes of '10': '\x31' is the ASCII code for '1' and '\x30' for '0'), so keys such as '0999' and '1' both land in the first region.
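
To check which region a given key falls into, compare keys byte by byte, the way HBase orders row keys (lexicographically by unsigned bytes). The following plain-Ruby sketch uses a hypothetical `region_index` helper, not an HBase API. It counts how many split points are less than or equal to the key, and that count is the zero-based index of the region that holds the key.

```ruby
# Return the zero-based index of the region that would hold `key`, given
# sorted split points. Region 0 is [start, splits[0]), region i is
# [splits[i-1], splits[i]), and the last region is [splits[-1], end).
def region_index(key, splits)
  key = key.b # compare raw bytes, as HBase does for row keys
  splits.count { |split| split.b <= key }
end

splits = ['10', '20', '30']
region_index('0999', splits) # => 0
region_index('1', splits)    # => 0, because '1' sorts before '10'
region_index('10', splits)   # => 1, a split point starts its own region
region_index('9', splits)    # => 3, the last of the n+1 = 4 regions
```
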

You can pass the split points in a file using the following variation. In this example, the splits are read from a file corresponding to the local path on the local filesystem. Each line in the file specifies a split point key.

[source]
----
hbase>create 't14','f',SPLITS_FILE=>'splits.txt'
----
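
Because the shell is Ruby, a splits file can be generated in the same session before calling `create`. This is an illustrative sketch: `write_splits_file` is a hypothetical helper. Since each line of the file is read as one split point, it rejects empty keys and keys containing line breaks. It also de-duplicates the keys and sorts them in byte order so the file is easy to review.

```ruby
# Hypothetical helper: write one split point per line, in byte order, for use
# with create 't', 'f', SPLITS_FILE => path. Returns the keys written.
def write_splits_file(path, keys)
  keys = keys.map(&:to_s).uniq.sort_by(&:b)
  bad = keys.find { |k| k.empty? || k.include?("\n") || k.include?("\r") }
  raise ArgumentError, "unusable split key: #{bad.inspect}" if bad

  File.write(path, keys.join("\n") + "\n")
  keys
end
```

For example, `write_splits_file('splits.txt', %w[30 10 20])` writes the three keys `10`, `20`, and `30` on separate lines, ready for the `SPLITS_FILE` option above.
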

The other options are to automatically compute splits based on a desired number of regions and a splitting algorithm.
HBase supplies algorithms for splitting the key range based on uniform splits or based on hexadecimal keys, but you can provide your own splitting algorithm to subdivide the key range.

[source]
----
# create table with four regions based on random byte keys
hbase>create 't2','f1', { NUMREGIONS => 4 , SPLITALGO => 'UniformSplit' }

# create table with five regions based on hex keys
hbase>create 't3','f1', { NUMREGIONS => 5, SPLITALGO => 'HexStringSplit' }
----

As the HBase Shell is effectively a Ruby environment, you can use simple Ruby scripts to compute splits algorithmically.

[source]
----
# generate 4-byte big-endian split keys (pack "N") for an integer key range from start to end key
hbase(main):070:0> def gen_splits(start_key,end_key,num_regions)
hbase(main):071:1>   results=[]
hbase(main):072:1>   range=end_key-start_key
hbase(main):073:1>   incr=(range/num_regions).floor
hbase(main):074:1>   for i in 1 .. num_regions-1
hbase(main):075:2>     results.push([i*incr+start_key].pack("N"))
hbase(main):076:2>   end
hbase(main):077:1>   return results
hbase(main):078:1> end
hbase(main):079:0>
hbase(main):080:0> splits=gen_splits(1,2000000,10)
=> ["\000\003\r@", "\000\006\032\177", "\000\t'\276", "\000\f4\375", "\000\017B<", "\000\022O{", "\000\025\\\272", "\000\030i\371", "\000\ew8"]
hbase(main):081:0> create 'test_splits','f',SPLITS=>splits
0 row(s) in 0.2670 seconds

=> Hbase::Table - test_splits
----
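
`pack("N")` produces 4-byte unsigned big-endian keys, so those split points only line up with row keys that were written as 4-byte integers. If your row keys are 8-byte longs, as written by Java's `Bytes.toBytes(long)`, a variant along the following lines may fit better. This sketch assumes all keys are non-negative, because only then does big-endian `Q>` packing give the same bytes as `Bytes.toBytes(long)`. `gen_long_splits` is a hypothetical name, and it keeps the original's integer division.

```ruby
# Generate num_regions - 1 split points as 8-byte big-endian keys for the
# integer range from start_key to end_key. Assumes non-negative keys.
def gen_long_splits(start_key, end_key, num_regions)
  raise ArgumentError, 'need at least two regions' if num_regions < 2
  raise ArgumentError, 'keys must be non-negative' if start_key.negative?

  incr = (end_key - start_key) / num_regions # integer division, like gen_splits
  (1...num_regions).map { |i| [start_key + i * incr].pack('Q>') }
end

splits = gen_long_splits(1, 2_000_000, 10)
splits.length         # => 9
splits.first.bytesize # => 8
```

In the shell, the result can be passed the same way as before, for example `create 'test_long_splits', 'f', SPLITS => gen_long_splits(1, 2_000_000, 10)`.
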

Note that the HBase Shell command `truncate` effectively drops and recreates the table with default options, which discards any pre-splitting.
If you need to truncate a pre-split table, you must drop and recreate the table explicitly to re-specify custom split options.

=== Debug

==== Shell debug switch

You can set a debug switch in the shell to see more output, such as more of the stack trace on an exception, when you run a command:

[source]
----
hbase> debug <RETURN>
----

==== DEBUG log level

To enable DEBUG level logging in the shell, launch it with the `-d` option.

[source,bash]
----
$ ./bin/hbase shell -d
----

=== Commands

==== count

The `count` command returns the number of rows in a table.
It can be quite fast when configured with the right CACHE value:

[source]
----
hbase> count '<tablename>', CACHE => 1000
----

The above count fetches 1000 rows at a time.
Set CACHE lower if your rows are big.
The default has varied between releases (older releases fetched one row at a time), so run `help 'count'` to see the default for your version.