* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
*     http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
In HBase, data is stored in tables, which have rows and columns.
This is a terminology overlap with relational databases (RDBMSs), but this is not a helpful analogy.
Instead, it can be helpful to think of an HBase table as a multi-dimensional map.

.HBase Data Model Terminology
An HBase table consists of multiple rows.

A row in HBase consists of a row key and one or more columns with values associated with them.
Rows are sorted alphabetically by the row key as they are stored.
For this reason, the design of the row key is very important.
The goal is to store data in such a way that related rows are near each other.
A common row key pattern is a website domain.
If your row keys are domains, you should probably store them in reverse (org.apache.www, org.apache.mail, org.apache.jira). This way, all of the Apache domains are near each other in the table, rather than being spread out based on the first letter of the subdomain.
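As a sketch of this pattern in plain Java (the helper `reverseDomain` below is hypothetical, not part of the HBase API):

```java
public class RowKeyUtil {
  /**
   * Reverse the dot-separated components of a domain so that row keys for
   * related domains sort near one another: "www.apache.org" becomes
   * "org.apache.www".
   */
  public static String reverseDomain(String domain) {
    String[] parts = domain.split("\\.");
    StringBuilder sb = new StringBuilder();
    for (int i = parts.length - 1; i >= 0; i--) {
      sb.append(parts[i]);
      if (i > 0) {
        sb.append('.');
      }
    }
    return sb.toString();
  }
}
```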
A column in HBase consists of a column family and a column qualifier, which are delimited by a `:` (colon) character.

Column families physically colocate a set of columns and their values, often for performance reasons.
Each column family has a set of storage properties, such as whether its values should be cached in memory, how its data is compressed or its row keys are encoded, and others.
Each row in a table has the same column families, though a given row might not store anything in a given column family.

A column qualifier is added to a column family to provide the index for a given piece of data.
Given a column family `content`, a column qualifier might be `content:html`, and another might be `content:pdf`.
Though column families are fixed at table creation, column qualifiers are mutable and may differ greatly between rows.
A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value's version.

A timestamp is written alongside each value, and is the identifier for a given version of a value.
By default, the timestamp represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell.
You can read a very understandable explanation of the HBase data model in the blog post link:https://dzone.com/articles/understanding-hbase-and-bigtab[Understanding HBase and BigTable] by Jim R. Wilson.
Another good explanation is available in the PDF link:http://0b4af6cdc2f0c5998459-c0245c5c937c5dedcca3f1764ecc9b2f.r43.cf2.rackcdn.com/9353-login1210_khurana.pdf[Introduction to Basic Schema Design] by Amandeep Khurana.

It may help to read different perspectives to get a solid understanding of HBase schema design.
The linked articles cover the same ground as the information in this section.

The following example is a slightly modified form of the one on page 2 of the link:http://research.google.com/archive/bigtable.html[BigTable] paper.
There is a table called `webtable` that contains two rows (`com.cnn.www` and `com.example.www`) and three column families named `contents`, `anchor`, and `people`.
In this example, for the first row (`com.cnn.www`), `anchor` contains two columns (`anchor:cnnsi.com`, `anchor:my.look.ca`) and `contents` contains one column (`contents:html`). This example contains 5 versions of the row with the row key `com.cnn.www`, and one version of the row with the row key `com.example.www`.
The `contents:html` column qualifier contains the entire HTML of a given website.
Qualifiers of the `anchor` column family each contain the external site which links to the site represented by the row, along with the text it used in the anchor of its link.
The `people` column family represents people associated with the site.
By convention, a column name is made of its column family prefix and a _qualifier_.
For example, the column _contents:html_ is made up of the column family `contents` and the `html` qualifier.
The colon character (`:`) delimits the column family from the column family _qualifier_.

[cols="1,1,1,1,1", frame="all", options="header"]
|===
|Row Key |Time Stamp |ColumnFamily `contents` |ColumnFamily `anchor`|ColumnFamily `people`
|"com.cnn.www" |t9 | |anchor:cnnsi.com = "CNN" |
|"com.cnn.www" |t8 | |anchor:my.look.ca = "CNN.com" |
|"com.cnn.www" |t6 | contents:html = "<html>..." | |
|"com.cnn.www" |t5 | contents:html = "<html>..." | |
|"com.cnn.www" |t3 | contents:html = "<html>..." | |
|"com.example.www"| t5 | contents:html = "<html>..." | | people:author = "John Doe"
|===
Cells in this table that appear to be empty take up no space and, in fact, do not exist in HBase.
This is what makes HBase "sparse." A tabular view is not the only possible way to look at data in HBase, or even the most accurate.
The following represents the same information as a multi-dimensional map.
This is only a mock-up for illustrative purposes and may not be strictly accurate.
{
  "com.cnn.www": {
    contents: {
      t6: contents:html: "<html>..."
      t5: contents:html: "<html>..."
      t3: contents:html: "<html>..."
    }
    anchor: {
      t9: anchor:cnnsi.com = "CNN"
      t8: anchor:my.look.ca = "CNN.com"
    }
    people: {}
  }
  "com.example.www": {
    contents: {
      t5: contents:html: "<html>..."
    }
    anchor: {}
    people: {
      t5: people:author: "John Doe"
    }
  }
}
Although at a conceptual level tables may be viewed as a sparse set of rows, they are physically stored by column family.
A new column qualifier (column_family:column_qualifier) can be added to an existing column family at any time.
.ColumnFamily `anchor`
[cols="1,1,1", frame="all", options="header"]
|===
|Row Key | Time Stamp |ColumnFamily `anchor`
|"com.cnn.www" |t9 |`anchor:cnnsi.com = "CNN"`
|"com.cnn.www" |t8 |`anchor:my.look.ca = "CNN.com"`
|===
.ColumnFamily `contents`
[cols="1,1,1", frame="all", options="header"]
|===
|Row Key |Time Stamp |ColumnFamily `contents`
|"com.cnn.www" |t6 |contents:html = "<html>..."
|"com.cnn.www" |t5 |contents:html = "<html>..."
|"com.cnn.www" |t3 |contents:html = "<html>..."
|===
The empty cells shown in the conceptual view are not stored at all.
Thus a request for the value of the `contents:html` column at time stamp `t8` would return no value.
Similarly, a request for an `anchor:my.look.ca` value at time stamp `t9` would return no value.
However, if no timestamp is supplied, the most recent value for a particular column would be returned.
Given multiple versions, the most recent is also the first one found, since timestamps are stored in descending order.
Thus a request for the values of all columns in the row `com.cnn.www` if no timestamp is specified would be: the value of `contents:html` from timestamp `t6`, the value of `anchor:cnnsi.com` from timestamp `t9`, the value of `anchor:my.look.ca` from timestamp `t8`.

For more information about the internals of how Apache HBase stores data, see <<regions.arch,regions.arch>>.
A namespace is a logical grouping of tables analogous to a database in relational database systems.
This abstraction lays the groundwork for upcoming multi-tenancy related features:

* Quota Management (link:https://issues.apache.org/jira/browse/HBASE-8410[HBASE-8410]) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
* Namespace Security Administration (link:https://issues.apache.org/jira/browse/HBASE-9206[HBASE-9206]) - Provide another level of security administration for tenants.
* Region server groups (link:https://issues.apache.org/jira/browse/HBASE-6721[HBASE-6721]) - A namespace/table can be pinned onto a subset of RegionServers thus guaranteeing a coarse level of isolation.
[[namespace_creation]]
=== Namespace management

A namespace can be created, removed or altered.
Namespace membership is determined during table creation by specifying a fully-qualified table name of the form:

<table namespace>:<table qualifier>
create_namespace 'my_ns'

#create my_table in my_ns namespace
create 'my_ns:my_table', 'fam'

drop_namespace 'my_ns'

alter_namespace 'my_ns', {METHOD => 'set', 'PROPERTY_NAME' => 'PROPERTY_VALUE'}
[[namespace_special]]
=== Predefined namespaces

There are two predefined special namespaces:

* hbase - system namespace, used to contain HBase internal tables
* default - tables with no explicitly specified namespace will automatically fall into this namespace

#namespace=foo and table qualifier=bar
create 'foo:bar', 'fam'

#namespace=default and table qualifier=bar
create 'bar', 'fam'
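As an illustrative sketch of this naming rule in plain Java (the client-side equivalent in the real API is `TableName.valueOf`; the `parse` helper below is hypothetical):

```java
public class TableNameDemo {
  /**
   * Split a table name of the form "namespace:qualifier" into its parts.
   * A name with no namespace portion falls into the "default" namespace.
   */
  public static String[] parse(String name) {
    int i = name.indexOf(':');
    if (i < 0) {
      return new String[] { "default", name };
    }
    return new String[] { name.substring(0, i), name.substring(i + 1) };
  }
}
```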
Tables are declared up front at schema definition time.

Row keys are uninterpreted bytes.
Rows are lexicographically sorted with the lowest order appearing first in a table.
The empty byte array is used to denote both the start and end of a table's namespace.
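Because the sort is lexicographic over bytes rather than numeric, keys that embed numbers can order in surprising ways. A minimal plain-Java demonstration (for ASCII keys, `String.compareTo` matches HBase's byte-by-byte comparison):

```java
public class LexicographicDemo {
  public static void main(String[] args) {
    // "row10" sorts BEFORE "row2": at the first differing byte,
    // '1' (0x31) < '2' (0x32), regardless of the numeric values 10 and 2.
    System.out.println("row10".compareTo("row2") < 0);  // true
    // Zero-padding the numeric part restores the intended order.
    System.out.println("row02".compareTo("row10") < 0); // true
  }
}
```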
Columns in Apache HBase are grouped into _column families_.
All column members of a column family have the same prefix.
For example, the columns _courses:history_ and _courses:math_ are both members of the _courses_ column family.
The colon character (`:`) delimits the column family from the column family qualifier.
The column family prefix must be composed of _printable_ characters.
The qualifying tail, the column family _qualifier_, can be made of any arbitrary bytes.
Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running.

Physically, all column family members are stored together on the filesystem.
Because tunings and storage specifications are done at the column family level, it is advised that all column family members have the same general access pattern and size characteristics.

A _{row, column, version}_ tuple exactly specifies a `cell` in HBase.
Cell content is uninterpreted bytes.
== Data Model Operations

The four primary data model operations are Get, Put, Scan, and Delete.
Operations are applied via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html[Table] instances.

link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] returns attributes for a specified row.
Gets are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#get-org.apache.hadoop.hbase.client.Get-[Table.get].

link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Put.html[Put] either adds new rows to a table (if the key is new) or can update existing rows (if the key already exists). Puts are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#put-org.apache.hadoop.hbase.client.Put-[Table.put] (non-writeBuffer) or link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#batch-java.util.List-java.lang.Object:A-[Table.batch] (non-writeBuffer).
link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scan] allows iteration over multiple rows for specified attributes.

The following is an example of a Scan on a Table instance.
Assume that a table is populated with rows with keys "row1", "row2", "row3", and then another set of rows with the keys "abc1", "abc2", and "abc3". The following example shows how to set a Scan instance to return the rows beginning with "row".
public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();

Table table = ...      // instantiate a Table instance

Scan scan = new Scan();
scan.addColumn(CF, ATTR);
scan.setStartStopRowForPrefixScan(Bytes.toBytes("row"));
ResultScanner rs = table.getScanner(scan);
try {
  for (Result r = rs.next(); r != null; r = rs.next()) {
    // process result r
  }
} finally {
  rs.close();  // always close the ResultScanner!
}
Note that generally the easiest way to specify a specific stop point for a scan is by using the link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/InclusiveStopFilter.html[InclusiveStopFilter] class.

link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Delete.html[Delete] removes a row from a table.
Deletes are executed via link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Table.html#delete-org.apache.hadoop.hbase.client.Delete-[Table.delete].

HBase does not modify data in place, and so deletes are handled by creating new markers called _tombstones_.
These tombstones, along with the dead values, are cleaned up on major compactions.

See <<version.delete,version.delete>> for more information on deleting versions of columns, and see <<compaction,compaction>> for more information on compactions.
A _{row, column, version}_ tuple exactly specifies a `cell` in HBase.
It's possible to have an unbounded number of cells where the row and column are the same but the cell address differs only in its version dimension.

While rows and column keys are expressed as bytes, the version is specified using a long integer.
Typically this long contains time instances such as those returned by `java.util.Date.getTime()` or `System.currentTimeMillis()`, that is: [quote]_the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC_.

The HBase version dimension is stored in decreasing order, so that when reading from a store file, the most recent values are found first.
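This newest-first ordering can be modeled with a reverse-ordered map in plain Java. The sketch below is purely illustrative; it is not HBase's actual store-file format:

```java
import java.util.Comparator;
import java.util.TreeMap;

public class VersionOrderDemo {
  /** One cell's versions (timestamp -> value), ordered newest first. */
  public static TreeMap<Long, String> newVersionMap() {
    return new TreeMap<>(Comparator.reverseOrder());
  }

  public static void main(String[] args) {
    TreeMap<Long, String> versions = newVersionMap();
    versions.put(3L, "value-at-t3");
    versions.put(6L, "value-at-t6");
    versions.put(5L, "value-at-t5");
    // The first entry found on a read is the most recent version.
    System.out.println(versions.firstKey()); // 6
  }
}
```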
There is a lot of confusion over the semantics of `cell` versions in HBase.
In particular:

* If multiple writes to a cell have the same version, only the last written is fetchable.
* It is OK to write cells in a non-increasing version order.

Below we describe how the version dimension in HBase currently works.
See link:https://issues.apache.org/jira/browse/HBASE-2406[HBASE-2406] for discussion of HBase versions. link:https://www.ngdata.com/bending-time-in-hbase/[Bending time in HBase] makes for a good read on the version, or time, dimension in HBase.
It has more detail on versioning than is provided here.

As of this writing, the limitation _Overwriting values at existing timestamps_ mentioned in the article no longer holds in HBase.
This section is basically a synopsis of this article by Bruno Dumon.
[[specify.number.of.versions]]
=== Specifying the Number of Versions to Store

The maximum number of versions to store for a given column is part of the column schema and is specified at table creation, or via an `alter` command, via `HColumnDescriptor.DEFAULT_VERSIONS`.
Prior to HBase 0.96, the default number of versions kept was `3`, but in 0.96 and newer has been changed to `1`.

.Modify the Maximum Number of Versions for a Column Family

This example uses HBase Shell to keep a maximum of 5 versions of all columns in column family `f1`.
You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
hbase> alter 't1', NAME => 'f1', VERSIONS => 5
.Modify the Minimum Number of Versions for a Column Family

You can also specify the minimum number of versions to store per column family.
By default, this is set to 0, which means the feature is disabled.
The following example sets the minimum number of versions on all columns in column family `f1` to `2`, via HBase Shell.
You could also use link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html[HColumnDescriptor].
hbase> alter 't1', NAME => 'f1', MIN_VERSIONS => 2
Starting with HBase 0.98.2, you can specify a global default for the maximum number of versions kept for all newly-created columns, by setting `hbase.column.max.version` in _hbase-site.xml_.
See <<hbase.column.max.version,hbase.column.max.version>>.

=== Versions and HBase Operations

In this section we look at the behavior of the version dimension for each of the core HBase operations.

Gets are implemented on top of Scans.
The below discussion of link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html[Get] applies equally to link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html[Scans].

By default, i.e. if you specify no explicit version, when doing a `get`, the cell whose version has the largest value is returned (which may or may not be the latest one written, see later). The default behavior can be modified in the following ways:

* to return more than one version, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setMaxVersions--[Get.setMaxVersions()]
* to return versions other than the latest, see link:https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Get.html#setTimeRange-long-long-[Get.setTimeRange()]

To retrieve the latest version that is less than or equal to a given value, thus giving the 'latest' state of the record at a certain point in time, just use a range from 0 to the desired version and set the max versions to 1.
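This "state at a point in time" read can be modeled in plain Java with a map of one cell's versions ordered newest first: take the portion of the map at or below the desired timestamp and return its first entry. This is an illustrative sketch of the semantics, not the client API; with the real API you would combine a time range ending at `t + 1` (the upper bound is exclusive) with a max-versions of 1:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

public class AsOfDemo {
  /**
   * Latest value whose version is <= asOfTs, or null if none.
   * versions maps timestamp -> value and is ordered newest first
   * (illustrative model only).
   */
  public static String valueAsOf(TreeMap<Long, String> versions, long asOfTs) {
    // Under the reversed ordering, tailMap(asOfTs, true) keeps exactly the
    // entries with timestamp <= asOfTs, still newest first.
    Map.Entry<Long, String> e = versions.tailMap(asOfTs, true).firstEntry();
    return e == null ? null : e.getValue();
  }
}
```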
==== Default Get Example

The following Get will only retrieve the current version of the row.

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();

Get get = new Get(Bytes.toBytes("row1"));
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR);  // returns current version of value
==== Versioned Get Example

The following Get will return the last 3 versions of the row.

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();

Get get = new Get(Bytes.toBytes("row1"));
get.setMaxVersions(3);  // will return last 3 versions of row
Result r = table.get(get);
byte[] b = r.getValue(CF, ATTR);  // returns current version of value
List<Cell> cells = r.getColumnCells(CF, ATTR);  // returns all versions of this column
Doing a put always creates a new version of a `cell`, at a certain timestamp.
By default the system uses the server's `currentTimeMillis`, but you can specify the version (= the long integer) yourself, on a per-column level.
This means you could assign a time in the past or the future, or use the long value for non-time purposes.

To overwrite an existing value, do a put at exactly the same row, column, and version as that of the cell you want to overwrite.
===== Implicit Version Example

The following Put will be implicitly versioned by HBase with the current time.

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();

Put put = new Put(Bytes.toBytes(row));
put.add(CF, ATTR, Bytes.toBytes(data));
table.put(put);
===== Explicit Version Example

The following Put has the version timestamp explicitly set.

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();

Put put = new Put(Bytes.toBytes(row));
long explicitTimeInMs = 555;  // just an example
put.add(CF, ATTR, explicitTimeInMs, Bytes.toBytes(data));
table.put(put);
Caution: the version timestamp is used internally by HBase for things like time-to-live calculations.
It's usually best to avoid setting this timestamp yourself.
Prefer using a separate timestamp attribute of the row, or have the timestamp as a part of the row key, or both.
===== Cell Version Example

The following Put uses a method getCellBuilder() to get a CellBuilder instance
that already has relevant Type and Row set.

public static final byte[] CF = "cf".getBytes();
public static final byte[] ATTR = "attr".getBytes();

Put put = new Put(Bytes.toBytes(row));
put.add(put.getCellBuilder().setQualifier(ATTR)
    .setFamily(CF)
    .setValue(Bytes.toBytes(data))
    .build());
table.put(put);
There are three different types of internal delete markers.
See Lars Hofhansl's blog for discussion of his attempt adding another, link:http://hadoop-hbase.blogspot.com/2012/01/scanning-in-hbase.html[Scanning in HBase: Prefix Delete Marker].

* Delete: for a specific version of a column.
* Delete column: for all versions of a column.
* Delete family: for all columns of a particular ColumnFamily.

When deleting an entire row, HBase will internally create a tombstone for each ColumnFamily (i.e., not each individual column).
Deletes work by creating _tombstone_ markers.
For example, let's suppose we want to delete a row.
For this you can specify a version, or else by default the `currentTimeMillis` is used.
What this means is _delete all cells where the version is less than or equal to this version_.
HBase never modifies data in place, so for example a delete will not immediately delete (or mark as deleted) the entries in the storage file that correspond to the delete condition.
Rather, a so-called _tombstone_ is written, which will mask the deleted values.
When HBase does a major compaction, the tombstones are processed to actually remove the dead values, together with the tombstones themselves.
If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted.

For an informative discussion on how deletes and versioning interact, see the thread link:http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28421[Put w/timestamp -> Deleteall -> Put w/ timestamp fails] up on the user mailing list.

Also see <<keyvalue,keyvalue>> for more information on the internal KeyValue format.
Delete markers are purged during the next major compaction of the store, unless the `KEEP_DELETED_CELLS` option is set in the column family (See <<cf.keep.deleted>>).
To keep the deletes for a configurable amount of time, you can set the delete TTL via the `hbase.hstore.time.to.purge.deletes` property in _hbase-site.xml_.
If `hbase.hstore.time.to.purge.deletes` is not set, or set to 0, all delete markers, including those with timestamps in the future, are purged during the next major compaction.
Otherwise, a delete marker with a timestamp in the future is kept until the major compaction which occurs after the time represented by the marker's timestamp plus the value of `hbase.hstore.time.to.purge.deletes`, in milliseconds.
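The purge rule reduces to a small arithmetic check. The sketch below is an illustrative model of `hbase.hstore.time.to.purge.deletes` (here `ttlMs`), not HBase's actual compaction code:

```java
public class PurgeDemo {
  /**
   * Whether a delete marker may be purged by a major compaction running
   * at compactionTimeMs (illustrative model only).
   */
  public static boolean purgeable(long markerTs, long compactionTimeMs, long ttlMs) {
    if (ttlMs <= 0) {
      // TTL unset or 0: every marker, even a future-dated one, is purged.
      return true;
    }
    // Otherwise the marker survives until markerTs + ttlMs has passed.
    return compactionTimeMs >= markerTs + ttlMs;
  }
}
```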
NOTE: This behavior represents a fix for an unexpected change that was introduced in HBase 0.94, and was fixed in link:https://issues.apache.org/jira/browse/HBASE-10118[HBASE-10118].
The change has been backported to HBase 0.94 and newer branches.
[[new.version.behavior]]
=== Optional New Version and Delete behavior in HBase-2.0.0

In `hbase-2.0.0`, the operator can specify an alternate version and delete treatment by setting the column descriptor property `NEW_VERSION_BEHAVIOR` to true (to set a property on a column family descriptor, you must first disable the table and then alter the column family descriptor; see <<cf.keep.deleted>> for an example of editing an attribute on a column family descriptor).
The 'new version behavior' undoes the limitations listed below, whereby a `Delete` ALWAYS overshadows a `Put` at the same location -- i.e. same row, column family, qualifier and timestamp -- regardless of which arrived first. Version accounting is also changed, as deleted versions are counted toward the total version count.
This is done to ensure results are not changed should a major compaction intercede. See `HBASE-15968` and linked issues for background.
Running with this new configuration currently has a cost; we factor the Cell MVCC into every compare, so we burn more CPU. The slowdown will depend on your workload; in testing we've seen between 0% and 25% degradation.
If replicating, it is advised that you run with the new serial replication feature (see `HBASE-9465`; the serial replication feature did NOT make it into `hbase-2.0.0` but should arrive in a subsequent hbase-2.x release), as now the order in which Mutations arrive is a factor.
=== Current Limitations

The below limitations are addressed in hbase-2.0.0. See the section above, <<new.version.behavior>>.
==== Deletes mask Puts

Deletes mask puts, even puts that happened after the delete was entered.
See link:https://issues.apache.org/jira/browse/HBASE-2256[HBASE-2256].
Remember that a delete writes a tombstone, which only disappears after the next major compaction has run.
Suppose you do a delete of everything <= T.
After this you do a new put with a timestamp <= T.
This put, even if it happened after the delete, will be masked by the delete tombstone.
Performing the put will not fail, but when you do a get you will notice the put had no effect.
It will start working again after the major compaction has run.
These issues should not be a problem if you use always-increasing versions for new puts to a row.
But they can occur even if you do not care about time: just do delete and put immediately after each other, and there is some chance they happen within the same millisecond.
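The interaction can be captured in a tiny model. This is an illustrative sketch of the pre-2.0 default behavior described above, not the real store implementation:

```java
public class TombstoneDemo {
  /**
   * Is a put with version putTs visible, given a "delete everything <= deleteTs"
   * tombstone? tombstoneCollected means a major compaction removed the
   * tombstone before this put was written. (Illustrative model only.)
   */
  public static boolean visible(long putTs, long deleteTs, boolean tombstoneCollected) {
    if (tombstoneCollected) {
      return true; // the tombstone is gone, so it can no longer mask the put
    }
    // Masked when putTs <= deleteTs, even if the put arrived after the delete.
    return putTs > deleteTs;
  }
}
```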
[[major.compactions.change.query.results]]
==== Major compactions change query results

_...create three cell versions at t1, t2 and t3, with a maximum-versions setting of 2. So when getting all versions, only the values at t2 and t3 will be returned. But if you delete the version at t2 or t3, the one at t1 will appear again. Obviously, once a major compaction has run, such behavior will not be the case anymore..._ (See _Garbage Collection_ in link:https://www.ngdata.com/bending-time-in-hbase/[Bending time in HBase].)
All data model operations in HBase return data in sorted order.
First by row, then by ColumnFamily, followed by column qualifier, and finally timestamp (sorted in reverse, so newest records are returned first).
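That sort can be sketched as a Java comparator over a simplified cell key. The real ordering is implemented by HBase's internal `CellComparator`, so this is only an illustrative model:

```java
import java.util.Arrays;
import java.util.Comparator;

public class CellOrderDemo {
  /** A simplified cell key: row, column family, qualifier, timestamp. */
  public record CellKey(byte[] row, byte[] family, byte[] qualifier, long ts) {}

  /** Row asc, then family asc, then qualifier asc, then timestamp DESC. */
  public static final Comparator<CellKey> ORDER =
      Comparator.<CellKey, byte[]>comparing(CellKey::row, Arrays::compareUnsigned)
          .thenComparing(CellKey::family, Arrays::compareUnsigned)
          .thenComparing(CellKey::qualifier, Arrays::compareUnsigned)
          .thenComparing(Comparator.comparingLong(CellKey::ts).reversed());
}
```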
[[dm.column.metadata]]

There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily.
Thus, while HBase can support not only a wide number of columns per row, but a heterogeneous set of columns between rows as well, it is your responsibility to keep track of the column names.

The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
For more information about how HBase stores data internally, see <<keyvalue,keyvalue>>.
Whether HBase supports joins is a common question on the dist-list, and there is a simple answer: it doesn't, at least not in the way that RDBMS' support them (e.g., with equi-joins or outer-joins in SQL). As has been illustrated in this chapter, the read data model operations in HBase are Get and Scan.

However, that doesn't mean that equivalent join functionality can't be supported in your application; you just have to do it yourself.
The two primary strategies are denormalizing the data upon writing to HBase, or maintaining lookup tables and doing the join between HBase tables in your application or MapReduce code (and as RDBMS' demonstrate, there are several strategies for this depending on the size of the tables, e.g., nested loops vs. hash-joins). So which is the best approach? It depends on what you are trying to do, and as such there isn't a single answer that works for every use case.
See link:/acid-semantics.html[ACID Semantics].
Lars Hofhansl has also written a note on link:http://hadoop-hbase.blogspot.com/2012/03/acid-in-hbase.html[ACID in HBase].
ifdef::backend-docbook[]
// Generated automatically by the DocBook toolchain.
endif::backend-docbook[]