src/main/asciidoc/_chapters/cp.adoc

   1 ////
   2 /**
   3  *
   4  * Licensed to the Apache Software Foundation (ASF) under one
   5  * or more contributor license agreements.  See the NOTICE file
   6  * distributed with this work for additional information
   7  * regarding copyright ownership.  The ASF licenses this file
   8  * to you under the Apache License, Version 2.0 (the
   9  * "License"); you may not use this file except in compliance
  10  * with the License.  You may obtain a copy of the License at
  11  *
  12  *     http://www.apache.org/licenses/LICENSE-2.0
  13  *
  14  * Unless required by applicable law or agreed to in writing, software
  15  * distributed under the License is distributed on an "AS IS" BASIS,
  16  * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  17  * See the License for the specific language governing permissions and
  18  * limitations under the License.
  19  */
  20 ////
  21
  22 [[cp]]
  23 = Apache HBase Coprocessors
  24 :doctype: book
  25 :numbered:
  26 :toc: left
  27 :icons: font
  28 :experimental:
  29
  30 HBase Coprocessors are modeled after Google BigTable's coprocessor implementation
  31 (http://research.google.com/people/jeff/SOCC2010-keynote-slides.pdf pages 41-42).
  32
  33 The coprocessor framework provides mechanisms for running your custom code directly on
  34 the RegionServers managing your data. Efforts are ongoing to bridge gaps between HBase's
  35 implementation and BigTable's architecture. For more information see
  36 link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
  37
  38 The information in this chapter is primarily sourced and heavily reused from the following
  39 resources:
  40
  41 . Mingjie Lai's blog post
  42 link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor Introduction].
  43 . Gaurav Bhardwaj's blog post
  44 link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of HBase Coprocessors].
  45
  46 [WARNING]
  47 .Use Coprocessors At Your Own Risk
  48 ====
  49 Coprocessors are an advanced feature of HBase and are intended to be used by system
  50 developers only. Because coprocessor code runs directly on the RegionServer and has
  51 direct access to your data, coprocessors introduce the risk of data corruption, man-in-the-middle
  52 attacks, or other malicious data access. Currently, there is no mechanism to prevent
  53 data corruption by coprocessors, though work is underway on
  54 link:https://issues.apache.org/jira/browse/HBASE-4047[HBASE-4047].
  55 +
  56 In addition, there is no resource isolation, so a well-intentioned but misbehaving
  57 coprocessor can severely degrade cluster performance and stability.
  58 ====
  59
  60 == Coprocessor Overview
  61
  62 In HBase, you fetch data using a `Get` or `Scan`, whereas in an RDBMS you use a SQL
  63 query. In order to fetch only the relevant data, you filter it using an HBase
  64 link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/Filter.html[Filter],
  65 whereas in an RDBMS you use a `WHERE` predicate.
  66
  67 After fetching the data, you perform computations on it. This paradigm works well
  68 for "small data" with a few thousand rows and several columns. However, when you scale
  69 to billions of rows and millions of columns, moving large amounts of data across your
  70 network will create bottlenecks at the network layer, and the client needs to be powerful
  71 enough and have enough memory to handle the large amounts of data and the computations.
  72 In addition, the client code can grow large and complex.
  73
  74 In this scenario, coprocessors might make sense. You can put the business computation
  75 code into a coprocessor which runs on the RegionServer, in the same location as the
  76 data, and returns the result to the client.
  77
  78 This is only one scenario where using coprocessors can provide benefit. Following
  79 are some analogies which may help to explain some of the benefits of coprocessors.
  80
  81 [[cp_analogies]]
  82 === Coprocessor Analogies
  83
  84 Triggers and Stored Procedures::
  85   An Observer coprocessor is similar to a trigger in an RDBMS in that it executes
  86   your code either before or after a specific event (such as a `Get` or `Put`)
  87   occurs. An endpoint coprocessor is similar to a stored procedure in an RDBMS
  88   because it allows you to perform custom computations on the data on the
  89   RegionServer itself, rather than on the client.
  90
  91 MapReduce::
  92   MapReduce operates on the principle of moving the computation to the location of
  93   the data. Coprocessors operate on the same principle.
  94
  95 AOP::
  96   If you are familiar with Aspect Oriented Programming (AOP), you can think of a coprocessor
  97   as applying advice by intercepting a request and then running some custom code,
  98   before passing the request on to its final destination (or even changing the destination).
  99
 100
 101 === Coprocessor Implementation Overview
 102
 103 . Your class should implement one of the Coprocessor interfaces -
 104 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/Coprocessor.html[Coprocessor],
 105 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver],
 106 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/CoprocessorService.html[CoprocessorService] - to name a few.
 107
 108 . Load the coprocessor, either statically (from the configuration) or dynamically,
 109 using HBase Shell. For more details see <<cp_loading,Loading Coprocessors>>.
 110
 111 . Call the coprocessor from your client-side code. HBase handles the coprocessor
 112 transparently.
 113
 114 The framework API is provided in the
 115 link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/package-summary.html[coprocessor]
 116 package.
 117
 118 == Types of Coprocessors
 119
 120 === Observer Coprocessors
 121
 122 Observer coprocessors are triggered either before or after a specific event occurs.
 123 Observers that happen before an event use methods that start with a `pre` prefix,
 124 such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#prePut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`prePut`]. Observers that happen just after an event override methods that start
 125 with a `post` prefix, such as link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html#postPut-org.apache.hadoop.hbase.coprocessor.ObserverContext-org.apache.hadoop.hbase.client.Put-org.apache.hadoop.hbase.wal.WALEdit-org.apache.hadoop.hbase.client.Durability-[`postPut`].
 126
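The `pre`/`post` naming can be pictured with a small, self-contained sketch. It does not use
the HBase API: `PutObserver`, `Store`, and `AuditObserver` are hypothetical names invented for
this illustration, whereas a real observer would implement `RegionObserver`. The sketch only
shows the order in which hooks fire around an operation.

[source,java]
----
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ObserverHookSketch {

    /** Hypothetical stand-in for an observer interface; not part of HBase. */
    interface PutObserver {
        default void prePut(String row, String value) {}
        default void postPut(String row, String value) {}
    }

    /** Hypothetical in-memory store that calls its observers around each put. */
    static class Store {
        private final Map<String, String> rows = new HashMap<>();
        private final List<PutObserver> observers = new ArrayList<>();

        void register(PutObserver observer) {
            observers.add(observer);
        }

        void put(String row, String value) {
            for (PutObserver o : observers) {
                o.prePut(row, value);   // runs before the event
            }
            rows.put(row, value);       // the event itself
            for (PutObserver o : observers) {
                o.postPut(row, value);  // runs after the event
            }
        }

        String get(String row) {
            return rows.get(row);
        }
    }

    /** Records the order in which the hooks fire. */
    static class AuditObserver implements PutObserver {
        final List<String> log = new ArrayList<>();

        @Override
        public void prePut(String row, String value) {
            log.add("pre:" + row);
        }

        @Override
        public void postPut(String row, String value) {
            log.add("post:" + row);
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        AuditObserver audit = new AuditObserver();
        store.register(audit);
        store.put("jverne", "Jules");
        System.out.println(audit.log); // prints [pre:jverne, post:jverne]
    }
}
----

The caller of `put` never mentions the observer, which is the sense in which observer code
runs transparently.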
 127
 128 ==== Use Cases for Observer Coprocessors
 129 Security::
 130   Before performing a `Get` or `Put` operation, you can check for permission using
 131   `preGet` or `prePut` methods.
 132
 133 Referential Integrity::
 134   HBase does not directly support the RDBMS concept of referential integrity, also known
 135   as foreign keys. You can use a coprocessor to enforce such integrity. For instance,
 136   if you have a business rule that every insert to the `users` table must be followed
 137   by a corresponding entry in the `user_daily_attendance` table, you could implement
 138   a coprocessor to use the `prePut` method on `users` to insert a record into `user_daily_attendance`.
 139
 140 Secondary Indexes::
 141   You can use a coprocessor to maintain secondary indexes. For more information, see
 142   link:https://cwiki.apache.org/confluence/display/HADOOP2/Hbase+SecondaryIndexing[SecondaryIndexing].
 143
 144
 145 ==== Types of Observer Coprocessor
 146
 147 RegionObserver::
 148   A RegionObserver coprocessor allows you to observe events on a region, such as `Get`
 149   and `Put` operations. See
 150   link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver].
 151
 152 RegionServerObserver::
 153   A RegionServerObserver allows you to observe events related to the RegionServer's
 154   operation, such as starting, stopping, or performing merges, commits, or rollbacks.
 155   See
 156   link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionServerObserver.html[RegionServerObserver].
 157
 158 MasterObserver::
 159   A MasterObserver allows you to observe events related to the HBase Master, such
 160   as table creation, deletion, or schema modification. See
 161   link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/MasterObserver.html[MasterObserver].
 162
 163 WALObserver::
 164   A WALObserver allows you to observe events related to writes to the Write-Ahead
 165   Log (WAL). See
 166   link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/WALObserver.html[WALObserver].
 167
 168 <<cp_example,Examples>> provides working examples of observer coprocessors.
 169
 170
 171
 172 [[cpeps]]
 173 === Endpoint Coprocessor
 174
 175 Endpoint coprocessors allow you to perform computation at the location of the data.
 176 See <<cp_analogies, Coprocessor Analogies>>. An example is the need to calculate a running
 177 average or summation for an entire table which spans hundreds of regions.
 178
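The averaging case deserves a short sketch, because each region cannot simply return its own
average. The following stand-alone code uses no HBase classes; `Partial`, `summarize`, and
`average` are hypothetical names. It assumes each region returns a small (sum, count) pair and
the client merges those pairs.

[source,java]
----
import java.util.Arrays;
import java.util.List;

class RegionAverageSketch {

    /** Partial result computed where the data lives: a sum and a row count. */
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    /** Work done once per region (stands in for the server-side endpoint). */
    static Partial summarize(long[] regionValues) {
        long sum = 0;
        for (long v : regionValues) {
            sum += v;
        }
        return new Partial(sum, regionValues.length);
    }

    /** Work done once on the client: merge the small partial results. */
    static double average(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args) {
        List<Partial> partials = Arrays.asList(
            summarize(new long[] {10000, 12000}),  // first region holds two rows
            summarize(new long[] {5000}));         // second region holds one row
        System.out.println(average(partials));     // prints 9000.0
    }
}
----

Averaging the two per-region averages (11000 and 5000) would give 8000, which is wrong because
the regions hold different numbers of rows. Only the two numbers per region cross the network,
not the rows themselves.
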
 179 In contrast to observer coprocessors, where your code is run transparently, endpoint
 180 coprocessors must be explicitly invoked using the
 181 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/AsyncTable.html#coprocessorService-java.util.function.Function-org.apache.hadoop.hbase.client.ServiceCaller-byte:A-[coprocessorService()]
 182 method available in
 183 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/AsyncTable.html[AsyncTable].
 184
 185 [WARNING]
 186 .On using coprocessorService method with sync client
 187 ====
 188 The coprocessorService method in link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Table.html[Table]
 189 has been deprecated.
 190
 191 In link:https://issues.apache.org/jira/browse/HBASE-21512[HBASE-21512]
 192 we reimplemented the sync client on top of the async client. The coprocessorService
 193 method defined in the `Table` interface directly references a method from protobuf's
 194 `BlockingInterface`, which means we need a separate thread pool to execute
 195 the method so that it does not block the async client (we want to avoid blocking calls
 196 in our async implementation).
 197
 198 Since coprocessor is an advanced feature, we believe it is OK for coprocessor users to
 199 instead switch over to use `AsyncTable`. There is a lightweight
 200 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/client/Connection.html#toAsyncConnection--[toAsyncConnection]
 201 method to get an `AsyncConnection` from `Connection` if needed.
 202 ====
 203
 204 Starting with HBase 0.96, endpoint coprocessors are implemented using Google Protocol
 205 Buffers (protobuf). For more details on protobuf, see Google's
 206 link:https://developers.google.com/protocol-buffers/docs/proto[Protocol Buffer Guide].
 207 Endpoint coprocessors written for version 0.94 are not compatible with version 0.96 or later
 208 (see
 209 link:https://issues.apache.org/jira/browse/HBASE-5448[HBASE-5448]). To upgrade your
 210 HBase cluster from 0.94 or earlier to 0.96 or later, you need to reimplement your
 211 coprocessor.
 212
 213 In HBase 2.x, we made use of a shaded version of protobuf 3.x, but kept the
 214 protobuf for coprocessors on 2.5.0. In HBase 3.0.0, we removed all dependencies on
 215 non-shaded protobuf so you need to reimplement your coprocessor to make use of the
 216 shaded protobuf version provided in hbase-thirdparty. Please see
 217 the <<protobuf,protobuf>> section for more details.
 218
 219 Coprocessor Endpoints should make no use of HBase internals and
 220 should use only public APIs; ideally a CPEP should depend on Interfaces
 221 and data structures only. This is not always possible, but beware
 222 that depending on internals makes the Endpoint brittle and liable to break as HBase
 223 internals evolve. HBase internal APIs annotated as private or evolving
 224 do not have to respect semantic versioning rules or general Java rules on
 225 deprecation before removal. While generated protobuf files
 226 lack the HBase audience annotations -- they are created by the
 227 protobuf protoc tool, which knows nothing of how HBase works --
 228 they should be considered `@InterfaceAudience.Private` and so are liable to
 229 change.
 230
 231 <<cp_example,Examples>> provides working examples of endpoint coprocessors.
 232
 233 [[cp_loading]]
 234 == Loading Coprocessors
 235
 236 To make your coprocessor available to HBase, it must be _loaded_, either statically
 237 (through the HBase configuration) or dynamically (using HBase Shell or the Java API).
 238
 239 === Static Loading
 240
 241 Follow these steps to statically load your coprocessor. Keep in mind that you must
 242 restart HBase to unload a coprocessor that has been loaded statically.
 243
 244 . Define the Coprocessor in _hbase-site.xml_, with a <property> element with a <name>
 245 and a <value> sub-element. The <name> should be one of the following:
 246 +
 247 - `hbase.coprocessor.region.classes` for RegionObservers and Endpoints.
 248 - `hbase.coprocessor.wal.classes` for WALObservers.
 249 - `hbase.coprocessor.master.classes` for MasterObservers.
 250 +
 251 <value> must contain the fully-qualified class name of your coprocessor's implementation
 252 class.
 253 +
 254 For example, to load a coprocessor implemented in the class `SumEndPoint`, create the
 255 following entry in the RegionServer's _hbase-site.xml_ file (generally located under the _conf_ directory):
 256 +
 257 [source,xml]
 258 ----
 259 <property>
 260     <name>hbase.coprocessor.region.classes</name>
 261     <value>org.myname.hbase.coprocessor.endpoint.SumEndPoint</value>
 262 </property>
 263 ----
 264 +
 265 If multiple classes are specified for loading, the class names must be comma-separated.
 266 The framework attempts to load all the configured classes using the default class loader.
 267 Therefore, the jar file must reside on the server-side HBase classpath.
 268
 269 +
 270 Coprocessors which are loaded in this way will be active on all regions of all tables.
 271 These are also called system coprocessors.
 272 The first listed coprocessor will be assigned the priority `Coprocessor.Priority.SYSTEM`.
 273 Each subsequent coprocessor in the list will have its priority value incremented by one (which
 274 reduces its priority, because priorities have the natural sort order of Integers).
 275
 276 +
 277 These priority values can be manually overridden in _hbase-site.xml_. This can be useful if you
 278 want to guarantee that a coprocessor will execute after another. For example, in the following
 279 configuration `SumEndPoint` would be guaranteed to go last, except in the case of a tie with
 280 another coprocessor:
 281 +
 282 [source,xml]
 283 ----
 284 <property>
 285     <name>hbase.coprocessor.region.classes</name>
 286     <value>org.myname.hbase.coprocessor.endpoint.SumEndPoint|2147483647</value>
 287 </property>
 288 ----
 289
 290 +
 291 When calling out to registered observers, the framework executes their callback methods in the
 292 sorted order of their priority. +
 293 Ties are broken arbitrarily.
 294
 295 . Put your code on HBase's classpath. One easy way to do this is to drop the jar
 296   (containing your code and all the dependencies) into the `lib/` directory in the
 297   HBase installation.
 298
 299 . Restart HBase.
 300
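The priority rules from step 1 can be sketched in plain Java. This is not HBase's own loader:
`SystemPrioritySketch` and `executionOrder` are hypothetical names, the numeric value used for
the SYSTEM priority is an assumption to verify against your HBase version, and the sketch
assumes that an explicit `|priority` override does not consume one of the automatically
assigned values.

[source,java]
----
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SystemPrioritySketch {

    // Assumed value of the SYSTEM priority; check the Coprocessor interface of your version.
    static final int PRIORITY_SYSTEM = Integer.MAX_VALUE / 4;

    static final class Entry {
        final String className;
        final int priority;

        Entry(String className, int priority) {
            this.className = className;
            this.priority = priority;
        }
    }

    /** Parses "ClassA,ClassB|priority,..." and returns the entries in execution order. */
    static List<Entry> executionOrder(String propertyValue) {
        List<Entry> entries = new ArrayList<>();
        int nextDefault = PRIORITY_SYSTEM;
        for (String item : propertyValue.split(",")) {
            String spec = item.trim();
            if (spec.isEmpty()) {
                continue;
            }
            int bar = spec.indexOf('|');
            if (bar < 0) {
                // No override: the first such entry gets SYSTEM, each later one a value one higher.
                entries.add(new Entry(spec, nextDefault++));
            } else {
                int override = Integer.parseInt(spec.substring(bar + 1).trim());
                entries.add(new Entry(spec.substring(0, bar).trim(), override));
            }
        }
        // Natural integer order: the smallest value runs first. List.sort is stable,
        // so equal priorities keep their listed order in this sketch.
        entries.sort(Comparator.comparingInt((Entry e) -> e.priority));
        return entries;
    }

    public static void main(String[] args) {
        String value = "org.myname.SumEndPoint|2147483647,org.myname.A,org.myname.B";
        for (Entry e : executionOrder(value)) {
            System.out.println(e.priority + " " + e.className);
        }
    }
}
----

With the sample value, `org.myname.A` and `org.myname.B` run before `SumEndPoint`, even though
`SumEndPoint` is listed first, because its overridden priority value is the largest.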
 301
 302 === Static Unloading
 303
 304 . Delete the coprocessor's <property> element, including sub-elements, from `hbase-site.xml`.
 305 . Restart HBase.
 306 . Optionally, remove the coprocessor's JAR file from the classpath or HBase's `lib/`
 307   directory.
 308
 309
 310 === Dynamic Loading
 311
 312 You can also load a coprocessor dynamically, without restarting HBase. This may seem
 313 preferable to static loading, but dynamically loaded coprocessors are loaded on a
 314 per-table basis, and are only available to the table for which they were loaded. For
 315 this reason, dynamically loaded coprocessors are sometimes called *Table Coprocessors*.
 316
 317 In addition, dynamically loading a coprocessor acts as a schema change on the table,
 318 and the table must be taken offline to load the coprocessor.
 319
 320 There are three ways to dynamically load a coprocessor.
 321
 322 [NOTE]
 323 .Assumptions
 324 ====
 325 The instructions below make the following assumptions:
 326
 327 * A JAR called `coprocessor.jar` contains the Coprocessor implementation along with all of its
 328 dependencies.
 329 * The JAR is available in HDFS in some location like
 330 `hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar`.
 331 ====
 332
 333 [[load_coprocessor_in_shell]]
 334 ==== Using HBase Shell
 335
 336 . Load the Coprocessor, using a command like the following:
 337 +
 338 [source]
 339 ----
 340 hbase> alter 'users', METHOD => 'table_att', 'Coprocessor'=>'hdfs://<namenode>:<port>/
 341 user/<hadoop-user>/coprocessor.jar| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|
 342 arg1=1,arg2=2'
 343 ----
 344 +
 345 The Coprocessor framework will try to read the class information from the coprocessor table
 346 attribute value.
 347 The value contains four pieces of information which are separated by the pipe (`|`) character.
 348 +
 349 * File path: The jar file containing the Coprocessor implementation must be in a location where
 350 all region servers can read it. +
 351 You could copy the file onto the local disk on each region server, but it is recommended to store
 352 it in HDFS. +
 353 https://issues.apache.org/jira/browse/HBASE-14548[HBASE-14548] allows a directory containing the jars
 354 or some wildcards to be specified, such as: hdfs://<namenode>:<port>/user/<hadoop-user>/ or
 355 hdfs://<namenode>:<port>/user/<hadoop-user>/*.jar. Please note that if a directory is specified,
 356 all JAR files (.jar) in the directory are added. Sub-directories are not searched.
 357 Do not use a wildcard if you would like to specify a directory. This enhancement applies to the
 358 usage via the Java API as well.
 359 * Class name: The full class name of the Coprocessor.
 360 * Priority: An integer. The framework will determine the execution sequence of all configured
 361 observers registered at the same hook using priorities. This field can be left blank. In that
 362 case the framework will assign a default priority value.
 363 * Arguments (Optional): This field is passed to the Coprocessor implementation.
 364
 365 . Verify that the coprocessor loaded:
 366 +
 367 ----
 368 hbase(main):04:0> describe 'users'
 369 ----
 370 +
 371 The coprocessor should be listed in the `TABLE_ATTRIBUTES`.
 372
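To make the four pipe-separated fields from step 1 concrete, the following sketch splits such
an attribute value in plain Java. It is an illustration only, not the parser HBase uses:
`CoprocessorSpecSketch` is a hypothetical class, and treating the arguments as comma-separated
`key=value` pairs is an assumption based on the `arg1=1,arg2=2` example above.

[source,java]
----
import java.util.LinkedHashMap;
import java.util.Map;

class CoprocessorSpecSketch {

    final String jarPath;     // may be empty
    final String className;
    final Integer priority;   // null when the field is left blank
    final Map<String, String> arguments = new LinkedHashMap<>();

    CoprocessorSpecSketch(String attributeValue) {
        // A limit of -1 keeps trailing empty fields, such as a blank priority.
        String[] parts = attributeValue.split("\\|", -1);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected at least path|class: " + attributeValue);
        }
        jarPath = parts[0].trim();
        className = parts[1].trim();
        String prio = parts.length > 2 ? parts[2].trim() : "";
        priority = prio.isEmpty() ? null : Integer.valueOf(prio);
        if (parts.length > 3 && !parts[3].trim().isEmpty()) {
            for (String pair : parts[3].split(",")) {
                String[] kv = pair.split("=", 2);
                arguments.put(kv[0].trim(), kv.length > 1 ? kv[1].trim() : "");
            }
        }
    }

    public static void main(String[] args) {
        CoprocessorSpecSketch spec = new CoprocessorSpecSketch(
            "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar"
            + "| org.myname.hbase.Coprocessor.RegionObserverExample|1073741823|arg1=1,arg2=2");
        System.out.println(spec.className + " " + spec.priority + " " + spec.arguments);
    }
}
----

A blank priority is represented as `null` here, mirroring the case where the framework
assigns a default value.
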
 373 ==== Using the Java API (all HBase versions)
 374
 375 The following Java code shows how to use the `setValue()` method of `HTableDescriptor`
 376 to load a coprocessor on the `users` table.
 377
 378 [source,java]
 379 ----
 380 TableName tableName = TableName.valueOf("users");
 381 String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
 382 Configuration conf = HBaseConfiguration.create();
 383 Connection connection = ConnectionFactory.createConnection(conf);
 384 Admin admin = connection.getAdmin();
 385 HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
 386 HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
 387 columnFamily1.setMaxVersions(3);
 388 hTableDescriptor.addFamily(columnFamily1);
 389 HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
 390 columnFamily2.setMaxVersions(3);
 391 hTableDescriptor.addFamily(columnFamily2);
 392 hTableDescriptor.setValue("COPROCESSOR$1", path + "|"
 393 + RegionObserverExample.class.getCanonicalName() + "|"
 394 + Coprocessor.PRIORITY_USER);
 395 admin.modifyTable(tableName, hTableDescriptor);
 396 ----
 397
 398 ==== Using the Java API (HBase 0.96+ only)
 399
 400 In HBase 0.96 and newer, the `addCoprocessor()` method of `HTableDescriptor` provides
 401 an easier way to load a coprocessor dynamically.
 402
 403 [source,java]
 404 ----
 405 TableName tableName = TableName.valueOf("users");
 406 Path path = new Path("hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar");
 407 Configuration conf = HBaseConfiguration.create();
 408 Connection connection = ConnectionFactory.createConnection(conf);
 409 Admin admin = connection.getAdmin();
 410 HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
 411 HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
 412 columnFamily1.setMaxVersions(3);
 413 hTableDescriptor.addFamily(columnFamily1);
 414 HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
 415 columnFamily2.setMaxVersions(3);
 416 hTableDescriptor.addFamily(columnFamily2);
 417 hTableDescriptor.addCoprocessor(RegionObserverExample.class.getCanonicalName(), path,
 418 Coprocessor.PRIORITY_USER, null);
 419 admin.modifyTable(tableName, hTableDescriptor);
 420 ----
 421
 422 WARNING: There is no guarantee that the framework will load a given Coprocessor successfully.
 423 For example, the shell command neither guarantees a jar file exists at a particular location nor
 424 verifies whether the given class is actually contained in the jar file.
 425
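Because nothing verifies the JAR contents at load time, one option is a quick check on a local
copy of the JAR before uploading it to HDFS. The sketch below uses only the JDK;
`JarClassCheckSketch` and `containsClass` are hypothetical names, and the check only confirms
that a class file entry exists, not that the class will load or behave correctly.

[source,java]
----
import java.io.IOException;
import java.util.jar.JarFile;

class JarClassCheckSketch {

    /** Returns true if the JAR at localJarPath has a class file entry for className. */
    static boolean containsClass(String localJarPath, String className) throws IOException {
        // A class named a.b.C is stored in the JAR as the entry a/b/C.class.
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(localJarPath)) {
            return jar.getJarEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheckSketch <local-jar> <class-name>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]) ? "found" : "missing");
    }
}
----

For example, running it with `coprocessor.jar` and
`org.myname.hbase.Coprocessor.RegionObserverExample` as arguments reports whether that class
is packaged in the JAR.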
 426
 427 === Dynamic Unloading
 428
 429 ==== Using HBase Shell
 430
 431 . Alter the table to remove the coprocessor with `table_att_unset`.
 432 +
 433 [source]
 434 ----
 435 hbase> alter 'users', METHOD => 'table_att_unset', NAME => 'coprocessor$1'
 436 ----
 437
 438 . Alter the table to remove the coprocessor with `table_remove_coprocessor` introduced in
 439 link:https://issues.apache.org/jira/browse/HBASE-26524[HBASE-26524] by specifying an explicit
 440 classname:
 441 +
 442 [source]
 443 ----
 444 hbase> alter 'users', METHOD => 'table_remove_coprocessor', CLASSNAME =>
 445          'org.myname.hbase.Coprocessor.RegionObserverExample'
 446 ----
 447
 448
 449 ==== Using the Java API
 450
 451 Reload the table definition without setting the coprocessor value, that is, without
 452 calling the `setValue()` or `addCoprocessor()` methods. This will remove any coprocessor
 453 attached to the table.
 454
 455 [source,java]
 456 ----
 457 TableName tableName = TableName.valueOf("users");
 458 String path = "hdfs://<namenode>:<port>/user/<hadoop-user>/coprocessor.jar";
 459 Configuration conf = HBaseConfiguration.create();
 460 Connection connection = ConnectionFactory.createConnection(conf);
 461 Admin admin = connection.getAdmin();
 462 HTableDescriptor hTableDescriptor = new HTableDescriptor(tableName);
 463 HColumnDescriptor columnFamily1 = new HColumnDescriptor("personalDet");
 464 columnFamily1.setMaxVersions(3);
 465 hTableDescriptor.addFamily(columnFamily1);
 466 HColumnDescriptor columnFamily2 = new HColumnDescriptor("salaryDet");
 467 columnFamily2.setMaxVersions(3);
 468 hTableDescriptor.addFamily(columnFamily2);
 469 admin.modifyTable(tableName, hTableDescriptor);
 470 ----
 471
 472 In HBase 0.96 and newer, you can instead use the `removeCoprocessor()` method of the
 473 `HTableDescriptor` class.
 474
 475
 476 [[cp_example]]
 477 == Examples
 478 HBase ships with examples of observer coprocessors.
 479
 480 A more detailed example is given below.
 481
 482 These examples assume a table called `users`, which has two column families, `personalDet`
 483 and `salaryDet`, containing personal and salary details. The following table shows the layout
 484 of the `users` table.
 485
 486 .Users Table
 487 [width="100%",cols="7",options="header,footer"]
 488 |====================
 489 | 3+|personalDet  3+|salaryDet
 490 |*rowkey* |*name* |*lastname* |*dob* |*gross* |*net* |*allowances*
 491 |admin |Admin |Admin |  3+|
 492 |cdickens |Charles |Dickens |02/07/1812 |10000 |8000 |2000
 493 |jverne |Jules |Verne |02/08/1828 |12000 |9000 |3000
 494 |====================
 495
 496
 497 === Observer Example
 498
 499 The following Observer coprocessor prevents the details of the user `admin` from being
 500 returned in a `Get` or `Scan` of the `users` table.
 501
 502 . Write a class that implements the
 503 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionCoprocessor.html[RegionCoprocessor] and
 504 link:https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html[RegionObserver]
 505 interfaces.
 506
 507 . Override the `preGetOp()` method (the `preGet()` method is deprecated) to check
 508 whether the client has queried for the rowkey with value `admin`. If so, return an
 509 empty result. Otherwise, process the request as normal.
 510
 511 . Put your code and dependencies in a JAR file.
 512
 513 . Place the JAR in HDFS where HBase can locate it.
 514
 515 . Load the Coprocessor.
 516
 517 . Write a simple program to test it.
 518
 519 The following code implements the steps above:
 520
 521 [source,java]
 522 ----
 523 public class RegionObserverExample implements RegionCoprocessor, RegionObserver {
 524
 525     private static final byte[] ADMIN = Bytes.toBytes("admin");
 526     private static final byte[] COLUMN_FAMILY = Bytes.toBytes("details");
 527     private static final byte[] COLUMN = Bytes.toBytes("Admin_det");
 528     private static final byte[] VALUE = Bytes.toBytes("You can't see Admin details");
 529
 530     @Override
 531     public Optional<RegionObserver> getRegionObserver() {
 532       return Optional.of(this);
 533     }
 534
 535     @Override
 536     public void preGetOp(final ObserverContext<RegionCoprocessorEnvironment> e, final Get get, final List<Cell> results)
 537     throws IOException {
 538
 539         if (Bytes.equals(get.getRow(),ADMIN)) {
 540             Cell c = CellUtil.createCell(get.getRow(),COLUMN_FAMILY, COLUMN,
 541             System.currentTimeMillis(), (byte)4, VALUE); // 4 is the type code of KeyValue.Type.Put
 542             results.add(c);
 543             e.bypass();
 544         }
 545     }
 546 }
 547 ----
 548
 549 Overriding the `preGetOp()` will only work for `Get` operations. You also need to override
 550 the `preScannerOpen()` method to filter the `admin` row from scan results.
 551
 552 [source,java]
 553 ----
 554 @Override
 555 public RegionScanner preScannerOpen(final ObserverContext<RegionCoprocessorEnvironment> e, final Scan scan,
 556 final RegionScanner s) throws IOException {
 557
 558     Filter filter = new RowFilter(CompareOp.NOT_EQUAL, new BinaryComparator(ADMIN));
 559     scan.setFilter(filter);
 560     return s;
 561 }
 562 ----
 563
 564 This method works but there is a _side effect_. If the client has used a filter in
 565 its scan, that filter will be replaced by this filter. Instead, you can explicitly
 566 remove any `admin` results from the scan:
 567
 568 [source,java]
 569 ----
 570 @Override
 571 public boolean postScannerNext(final ObserverContext<RegionCoprocessorEnvironment> e, final InternalScanner s,
 572 final List<Result> results, final int limit, final boolean hasMore) throws IOException {
 573     Result result = null;
 574     Iterator<Result> iterator = results.iterator();
 575     while (iterator.hasNext()) {
 576         result = iterator.next();
 577         if (Bytes.equals(result.getRow(), ADMIN)) {
 578             iterator.remove();
 579             break;
 580         }
 581     }
 582     return hasMore;
 583 }
 584 ----
 585
 586 === Endpoint Example
 587
 588 Still using the `users` table, this example implements a coprocessor to calculate
 589 the sum of all employee salaries, using an endpoint coprocessor.
 590
 591 . Create a '.proto' file defining your service.
 592 +
 593 [source]
 594 ----
 595 option java_package = "org.myname.hbase.coprocessor.autogenerated";
 596 option java_outer_classname = "Sum";
 597 option java_generic_services = true;
 598 option java_generate_equals_and_hash = true;
 599 option optimize_for = SPEED;
 600 message SumRequest {
 601     required string family = 1;
 602     required string column = 2;
 603 }
 604
 605 message SumResponse {
 606   required int64 sum = 1 [default = 0];
 607 }
 608
 609 service SumService {
 610   rpc getSum(SumRequest)
 611     returns (SumResponse);
 612 }
 613 ----
 614
 615 . Execute the `protoc` command to generate the Java code from the above '.proto' file.
 616 +
 617 [source]
 618 ----
 619 $ mkdir src
 620 $ protoc --java_out=src ./sum.proto
 621 ----
 622 +
 623 This will generate a class called `Sum` (in the file `Sum.java`).
 624
 625 . Write a class that extends the generated service class, implements the `Coprocessor`
 626 and `CoprocessorService` interfaces, and overrides the service method.
 627 +
 628 WARNING: If you load a coprocessor from `hbase-site.xml` and then load the same coprocessor
 629 again using HBase Shell, it will be loaded a second time. The same class will
 630 exist twice, and the second instance will have a higher ID (and thus a lower priority).
 631 The effect is that the duplicate coprocessor is effectively ignored.
 632 +
 633 [source, java]
 634 ----
 635 public class SumEndPoint extends Sum.SumService implements Coprocessor, CoprocessorService {
 636
 637     private RegionCoprocessorEnvironment env;
 638
 639     @Override
 640     public Service getService() {
 641         return this;
 642     }
 643
 644     @Override
 645     public void start(CoprocessorEnvironment env) throws IOException {
 646         if (env instanceof RegionCoprocessorEnvironment) {
 647             this.env = (RegionCoprocessorEnvironment)env;
 648         } else {
 649             throw new CoprocessorException("Must be loaded on a table region!");
 650         }
 651     }
 652
 653     @Override
 654     public void stop(CoprocessorEnvironment env) throws IOException {
 655         // do nothing
 656     }
 657
 658     @Override
 659     public void getSum(RpcController controller, Sum.SumRequest request, RpcCallback<Sum.SumResponse> done) {
 660         Scan scan = new Scan();
 661         scan.addFamily(Bytes.toBytes(request.getFamily()));
 662         scan.addColumn(Bytes.toBytes(request.getFamily()), Bytes.toBytes(request.getColumn()));
 663
 664         Sum.SumResponse response = null;
 665         InternalScanner scanner = null;
 666
 667         try {
 668             scanner = env.getRegion().getScanner(scan);
 669             List<Cell> results = new ArrayList<>();
 670             boolean hasMore = false;
 671             long sum = 0L;
 672
 673             do {
 674                 hasMore = scanner.next(results);
 675                 for (Cell cell : results) {
 676                     sum = sum + Bytes.toLong(CellUtil.cloneValue(cell));
 677                 }
 678                 results.clear();
 679             } while (hasMore);
 680
 681             response = Sum.SumResponse.newBuilder().setSum(sum).build();
 682         } catch (IOException ioe) {
 683             ResponseConverter.setControllerException(controller, ioe);
 684         } finally {
 685             if (scanner != null) {
 686                 try {
 687                     scanner.close();
 688                 } catch (IOException ignored) {}
 689             }
 690         }
 691
 692         done.run(response);
 693     }
 694 }
 695 ----
 696 +
 697 [source, java]
 698 ----
Configuration conf = HBaseConfiguration.create();
TableName tableName = TableName.valueOf("users");

// Build the request once; the same request is sent to every region.
final Sum.SumRequest request = Sum.SumRequest.newBuilder()
    .setFamily("salaryDet")
    .setColumn("gross")
    .build();

// try-with-resources closes the Table and the Connection when the call completes.
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(tableName)) {
    // The result map holds one partial sum per region, keyed by region name.
    Map<byte[], Long> results = table.coprocessorService(
        Sum.SumService.class,
        null,  /* start key: null selects from the first region */
        null,  /* end   key: null selects through the last region */
        new Batch.Call<Sum.SumService, Long>() {
            @Override
            public Long call(Sum.SumService aggregate) throws IOException {
                BlockingRpcCallback<Sum.SumResponse> rpcCallback = new BlockingRpcCallback<>();
                aggregate.getSum(null, request, rpcCallback);
                Sum.SumResponse response = rpcCallback.get();

                return response.hasSum() ? response.getSum() : 0L;
            }
        }
    );

    for (Long sum : results.values()) {
        System.out.println("Sum = " + sum);
    }
} catch (ServiceException e) {
    e.printStackTrace();
} catch (Throwable e) {
    e.printStackTrace();
}
 730 ----
 731
 732 . Load the Coprocessor.
 733
. Write client code to call the Coprocessor.
 735
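The client call above returns one partial sum per region, and the example prints each of them separately. If a single table-wide total is needed, the partial results can be added up on the client. The following is a minimal stand-alone sketch of that last step; the class name `RegionSumTotal`, the `total` helper, and the sample region names and values are hypothetical and not part of HBase.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: combines per-region partial sums into one total.
final class RegionSumTotal {

    // Accepts any key type, so a Map<byte[], Long> keyed by region name also works.
    static long total(Map<?, Long> partialSums) {
        long total = 0L;
        for (Long partial : partialSums.values()) {
            // Assumption for this sketch: a missing (null) partial counts as zero.
            if (partial != null) {
                total += partial;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Illustrative values only; real values would come from the endpoint call.
        Map<String, Long> partialSums = new LinkedHashMap<>();
        partialSums.put("region-a", 1200L);
        partialSums.put("region-b", 800L);
        System.out.println("Total = " + total(partialSums)); // prints: Total = 2000
    }
}
```

Because `total` accepts `Map<?, Long>`, the `Map<byte[], Long>` returned by `coprocessorService` in the client example could be passed to it directly.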
 736
 737 == Guidelines For Deploying A Coprocessor
 738
 739 Bundling Coprocessors::
  For easy deployment, you can bundle all classes for a coprocessor into a
  single JAR on the RegionServer's classpath. Otherwise, place all of the
  coprocessor's dependencies on the RegionServer's classpath so that they can be
  loaded during RegionServer start-up. The classpath for a RegionServer is set
  in the RegionServer's `hbase-env.sh` file.
 745 Automating Deployment::
  To automate coprocessor deployment, you can use a tool such as Puppet, Chef, or
  Ansible to ship the coprocessor JAR to the required location on each
  RegionServer's filesystem and then restart each RegionServer. Details of such
  setups are outside the scope of this document.
 751 Updating a Coprocessor::
  Deploying a new version of a given coprocessor is not as simple as disabling it,
  replacing the JAR, and re-enabling the coprocessor. A JVM cannot reload a class
  while any references to it remain, and the running RegionServer's JVM holds
  references to the existing coprocessor. To replace the coprocessor, you must
  therefore restart the JVM by restarting the RegionServer. This behavior
  is not expected to change.
 758 Coprocessor Logging::
 759   The Coprocessor framework does not provide an API for logging beyond standard Java
 760   logging.
 761 Coprocessor Configuration::
 762   If you do not want to load coprocessors from the HBase Shell, you can add their configuration
 763   properties to `hbase-site.xml`. In <<load_coprocessor_in_shell>>, two arguments are
 764   set: `arg1=1,arg2=2`. These could have been added to `hbase-site.xml` as follows:
 765 [source,xml]
 766 ----
 767 <property>
 768   <name>arg1</name>
 769   <value>1</value>
 770 </property>
 771 <property>
 772   <name>arg2</name>
 773   <value>2</value>
 774 </property>
 775 ----
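A client that creates its `Configuration` with `HBaseConfiguration.create()` picks up the properties in the `hbase-site.xml` on its classpath. The following example creates such a configuration and then reads the `users` table with a `Get` and a `Scan`: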
 777 [source,java]
 778 ----
// HBaseConfiguration.create() loads hbase-site.xml from the classpath.
Configuration conf = HBaseConfiguration.create();
TableName tableName = TableName.valueOf("users");

// try-with-resources closes the Table and the Connection when done.
try (Connection connection = ConnectionFactory.createConnection(conf);
     Table table = connection.getTable(tableName)) {
    // Read a single row. Values are assumed to be stored as 8-byte longs.
    Get get = new Get(Bytes.toBytes("admin"));
    Result result = table.get(get);
    for (Cell c : result.rawCells()) {
        System.out.println(Bytes.toString(CellUtil.cloneRow(c))
            + " ==> " + Bytes.toString(CellUtil.cloneFamily(c))
            + " {" + Bytes.toString(CellUtil.cloneQualifier(c))
            + ":" + Bytes.toLong(CellUtil.cloneValue(c)) + "}");
    }

    // Scan the whole table; the scanner is closed automatically.
    try (ResultScanner scanner = table.getScanner(new Scan())) {
        for (Result res : scanner) {
            for (Cell c : res.rawCells()) {
                System.out.println(Bytes.toString(CellUtil.cloneRow(c))
                    + " ==> " + Bytes.toString(CellUtil.cloneFamily(c))
                    + " {" + Bytes.toString(CellUtil.cloneQualifier(c))
                    + ":" + Bytes.toLong(CellUtil.cloneValue(c)) + "}");
            }
        }
    }
}
 803 ----
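
In the shell example, the arguments are passed as the single string `arg1=1,arg2=2`, whereas `hbase-site.xml` expects one property per argument. As a rough illustration of that correspondence, the following stand-alone sketch splits such a string into key-value pairs. `CoprocessorArgs` and `parse` are hypothetical names, and this is not the parser HBase itself uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits "key=value,key=value" into an ordered map.
final class CoprocessorArgs {

    static Map<String, String> parse(String spec) {
        Map<String, String> props = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return props;
        }
        for (String rawPair : spec.split(",")) {
            String pair = rawPair.trim();
            int eq = pair.indexOf('=');
            // Reject entries with no '=' or with an empty key.
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            props.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return props;
    }

    public static void main(String[] args) {
        // Each entry corresponds to one <property> element in hbase-site.xml.
        for (Map.Entry<String, String> e : parse("arg1=1,arg2=2").entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Each resulting entry maps to one `<property>` element with the key as `<name>` and the value as `<value>`.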
 804
 805 == Restricting Coprocessor Usage
 806
 807 Restricting arbitrary user coprocessors can be a big concern in multitenant environments. HBase provides a continuum of options for ensuring only expected coprocessors are running:
 808
* `hbase.coprocessor.enabled`: Enables or disables all coprocessors. Disabling all coprocessors limits the functionality of HBase, because it also disables some security providers. One coprocessor affected in this way is `org.apache.hadoop.hbase.security.access.AccessController`.
 810 * `hbase.coprocessor.user.enabled`: Enables or disables loading coprocessors on tables (i.e. user coprocessors).
 811 * One can statically load coprocessors, and optionally tune their priorities, via the following tunables in `hbase-site.xml`:
 812 ** `hbase.coprocessor.regionserver.classes`: A comma-separated list of coprocessors that are loaded by region servers
 813 ** `hbase.coprocessor.region.classes`: A comma-separated list of RegionObserver and Endpoint coprocessors
 814 ** `hbase.coprocessor.user.region.classes`: A comma-separated list of coprocessors that are loaded by all regions
 815 ** `hbase.coprocessor.master.classes`: A comma-separated list of coprocessors that are loaded by the master (MasterObserver coprocessors)
 816 ** `hbase.coprocessor.wal.classes`: A comma-separated list of WALObserver coprocessors to load
* `hbase.coprocessor.abortonerror`: Whether to abort the daemon that has loaded the coprocessor if the coprocessor throws an error other than an `IOException`. If this is set to `false` and an access controller coprocessor has a fatal error, the coprocessor is circumvented, so in secure installations it is advisable to set this to `true`. You can, however, override this on a per-table basis for user coprocessors, so that they are unloaded on error instead of aborting the RegionServer they run on.
* `hbase.coprocessor.region.whitelist.paths`: A comma-separated list, available when `org.apache.hadoop.hbase.security.access.CoprocessorWhitelistMasterObserver` is loaded, of paths from which coprocessors may be loaded. The following options can be used to white-list paths:
 819 ** Coprocessors on the classpath are implicitly white-listed
 820 ** `*` to wildcard all coprocessor paths
 821 ** An entire filesystem (e.g. `hdfs://my-cluster/`)
 822 ** A wildcard path to be evaluated by link:https://commons.apache.org/proper/commons-io/javadocs/api-release/org/apache/commons/io/FilenameUtils.html[FilenameUtils.wildcardMatch]
 823 ** Note: Path can specify scheme or not (e.g. `file:///usr/hbase/lib/coprocessors` or for all filesystems `/usr/hbase/lib/coprocessors`)
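
To illustrate how a wildcard whitelist entry might be compared against a coprocessor JAR path, here is a minimal stand-alone sketch. It assumes the wildcard semantics documented for `FilenameUtils.wildcardMatch` (`?` matches a single character, `*` matches zero or more characters). The class `WhitelistSketch`, its methods, and the sample paths are hypothetical; HBase's actual check, including how it handles schemes and filesystems, is not reproduced here.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of '*' and '?' wildcard matching against a whitelist.
final class WhitelistSketch {

    // Translates the wildcard into a regex; every other character is literal.
    static boolean wildcardMatch(String path, String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(path).matches();
    }

    // Returns true if any entry of the comma-separated whitelist matches the path.
    static boolean isAllowed(String jarPath, String whitelist) {
        for (String rawEntry : whitelist.split(",")) {
            String entry = rawEntry.trim();
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.equals("*") || wildcardMatch(jarPath, entry)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String whitelist = "/usr/hbase/lib/coprocessors/*";
        System.out.println(isAllowed("/usr/hbase/lib/coprocessors/sum.jar", whitelist));
        System.out.println(isAllowed("/tmp/other.jar", whitelist));
    }
}
```

In this sketch `*` also matches path separators, so a single trailing `*` covers everything below a directory.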