<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<document xmlns="http://maven.apache.org/XDOC/2.0"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
<properties>
<title>Powered By Apache HBase™</title>
</properties>
<body>
<section name="Powered By Apache HBase™">
<p>This page lists some institutions and projects that use HBase. To
have your organization added, file a documentation JIRA or email
<a href="mailto:dev@hbase.apache.org">hbase-dev</a> with the relevant
information. If you notice out-of-date information, use the same avenues to
report it.</p>
<p><b>These items are user-submitted and the HBase team assumes no responsibility for their accuracy.</b></p>
<dl>
<dt><a href="http://www.adobe.com">Adobe</a></dt>
<dd>We currently have about 30 nodes running HDFS, Hadoop and HBase in clusters
ranging from 5 to 14 nodes in both production and development. We plan a
deployment on an 80-node cluster. We use HBase in several areas, from
social services to structured data and processing for internal use. We constantly
write data to HBase and run MapReduce jobs to process it, then store it back to
HBase or external systems. Our production cluster has been running since October 2008.</dd>
<dt><a href="http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase">Project Astro</a></dt>
<dd>
Astro provides fast Spark SQL/DataFrame capabilities to HBase data,
featuring super-efficient access to multi-dimensional HBase rows through
native Spark execution in HBase coprocessors, plus systematic and accurate
partition pruning and predicate pushdown from arbitrarily complex data-filtering
logic. The batch load is optimized to run on the Spark execution
engine. Note that <a href="http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase">Spark-SQL-on-HBase</a>
is the release site. Interested parties are free to make clones and claim
them to be "latest (and active)", but such clones are not endorsed by the owner.
</dd>
<dt><a href="http://axibase.com/products/axibase-time-series-database/">Axibase
Time Series Database (ATSD)</a></dt>
<dd>ATSD runs on top of HBase to collect, analyze and visualize time series
data at scale. ATSD capabilities include an optimized storage schema, a built-in
rule engine, forecasting algorithms (Holt-Winters and ARIMA) and next-generation
graphics designed for high-frequency data. Primary use cases: IT infrastructure
monitoring, data consolidation, and operational historian in OPC environments.</dd>
<dt><a href="http://www.benipaltechnologies.com">Benipal Technologies</a></dt>
<dd>We have a 35-node cluster used for HBase and MapReduce with Lucene/Solr
and Katta integration to create and fine-tune our search databases. Currently,
our HBase installation has over 10 billion rows with hundreds of data points per row.
We compute over 10<sup>18</sup> calculations daily using MapReduce directly on HBase.</dd>
<dt><a href="https://github.com/ermanpattuk/BigSecret">BigSecret</a></dt>
<dd>BigSecret is a security framework designed to secure key-value data
while preserving efficient processing capabilities. It achieves cell-level
security by combining different cryptographic techniques in an
efficient and secure manner. It provides a wrapper library around HBase.</dd>
<dt><a href="http://caree.rs">Caree.rs</a></dt>
<dd>Accelerated hiring platform for high-tech companies. We use HBase and Hadoop
for all aspects of our backend: job and company data storage, analytics
processing, and machine learning algorithms for our hire recommendation engine.
Our live production site is served directly from HBase. We use Cascading for
running offline data processing jobs.</dd>
<dt><a href="http://www.celer-tech.com/">Celer Technologies</a></dt>
<dd>Celer Technologies is a global financial software company that creates
modular systems with the flexibility to meet tomorrow's business
environment today. The Celer framework uses Hadoop/HBase to store all
financial data for trading, risk, and clearing in a single data store. With our
flexible framework and all the data in Hadoop/HBase, clients can build new
features that quickly extract data about their trading, risk, and clearing
activities from one single location.</dd>
<dt><a href="http://www.explorys.net">Explorys</a></dt>
<dd>Explorys uses an HBase cluster containing over a billion anonymized clinical
records to enable subscribers to search and analyze patient populations,
treatment protocols, and clinical outcomes.</dd>
<dt><a href="http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919">Facebook</a></dt>
<dd>Facebook uses HBase to power their Messages infrastructure.</dd>
<dt><a href="http://www.filmweb.pl">Filmweb</a></dt>
<dd>Filmweb is a film web portal with a large dataset of films, people and
movie-related entities. We have just started a small cluster of 3 HBase nodes
to handle our web cache persistence layer. We plan to increase the cluster
size and to start migrating some of the data from our databases that
have demanding scalability requirements.</dd>
<dt><a href="http://www.flurry.com">Flurry</a></dt>
<dd>Flurry provides mobile application analytics. We use HBase and Hadoop for
all of our analytics processing, and serve all of our live requests directly
out of HBase on our 50-node production cluster with tens of billions of rows
across several tables.</dd>
<dt><a href="http://gumgum.com">GumGum</a></dt>
<dd>GumGum is an In-Image Advertising Platform. We use HBase on a 15-node
Amazon EC2 High-CPU Extra Large (c1.xlarge) cluster for both real-time data
and analytics. Our production cluster has been running since June 2010.</dd>
<dt><a href="http://helprace.com/help-desk/">Helprace</a></dt>
<dd>Helprace is a customer service platform that uses Hadoop for analytics
and internal searching and filtering. Being on HBase, we can share our HBase
and Hadoop cluster with other Hadoop processes, which particularly helps in
keeping community speeds up. We use Hadoop and HBase on a small cluster of
nodes with 4 cores and 32 GB of RAM each.</dd>
<dt><a href="http://hubspot.com">HubSpot</a></dt>
<dd>HubSpot is an online marketing platform providing analytics, email, and
segmentation of leads/contacts. HBase is our primary datastore for our customers'
customer data, with multiple HBase clusters powering the majority of our
product. We have nearly 200 RegionServers across the various clusters, and
2 Hadoop clusters, also with nearly 200 TaskTrackers. We use c1.xlarge instances
in EC2 for both, but are starting to move some of that to bare-metal hardware.
We've been running HBase for over 2 years.</dd>
<dt><a href="http://www.infolinks.com/">Infolinks</a></dt>
<dd>Infolinks is an In-Text ad provider. We use HBase to process advertisement
selection and user events for our In-Text ad network. The reports generated
from HBase are used as feedback for our production system to optimize ad
selection.</dd>
<dt><a href="http://www.kalooga.com">Kalooga</a></dt>
<dd>Kalooga is a discovery service for image galleries. We use Hadoop, HBase
and Pig on a 20-node cluster for our crawling, analysis and events
processing.</dd>
<dt><a href="http://www.leanxcale.com/">LeanXcale</a></dt>
<dd>LeanXcale provides an ultra-scalable transactional &amp; SQL database that
stores its data on HBase and is able to scale to thousands of nodes. It
also provides a standalone, fully ACID HBase with transactions across
arbitrary sets of rows and tables.</dd>
<dt><a href="http://www.mahalo.com">Mahalo</a></dt>
<dd>Mahalo, "...the world's first human-powered search engine". All the markup
that powers the wiki is stored in HBase. It has been in use for a few months now.
MediaWiki, the same software that powers Wikipedia, has version/revision control.
Mahalo's in-house editors produce a lot of revisions per day, which was not
working well in an RDBMS. An HBase-based solution for this was built and tested,
and the data was migrated out of MySQL and into HBase. Right now there are about
6 million items in HBase. The upload tool runs every hour from a shell
script to back up that data, and on 6 nodes it takes about 5-10 minutes to run
and does not slow down production at all.</dd>
<dt><a href="http://www.meetup.com">Meetup</a></dt>
<dd>Meetup is on a mission to help the world’s people self-organize into local
groups. We use Hadoop and HBase to power a site-wide, real-time activity
feed system for all of our members and groups. Group activity is written
directly to HBase and indexed per member, with the member's custom feed
served directly from HBase for incoming requests. We're running HBase
0.20.0 on an 11-node cluster.</dd>
<dt><a href="http://www.mendeley.com">Mendeley</a></dt>
<dd>Mendeley is creating a platform for researchers to collaborate and share
their research online. HBase is helping us create the world's largest
research paper collection and is used to store all our raw imported data.
We use many MapReduce jobs to process these papers into pages displayed
on the site. We also use HBase with Pig to do analytics and produce the article
statistics shown on the website. You can find out more about how we use HBase
in the <a href="http://www.slideshare.net/danharvey/hbase-at-mendeley">HBase
At Mendeley</a> slide presentation.</dd>
<dt><a href="http://www.ngdata.com">NGDATA</a></dt>
<dd>NGDATA delivers <a href="http://www.ngdata.com/site/products/lily.html">Lily</a>,
a consumer intelligence solution that combines Big Data management, machine
learning technologies, and consumer intelligence applications in one integrated
solution to enable better and more dynamic consumer insights. Lily allows
companies to process and analyze massive amounts of structured and unstructured
data, scale storage elastically, and quickly locate actionable data in large
data sources in near real time.</dd>
<dt><a href="http://ning.com">Ning</a></dt>
<dd>Ning uses HBase to store and serve the results of processing user events
and log files, which allows us to provide near-real-time analytics and
reporting. We use a small cluster of commodity machines with 4 cores and 16 GB
of RAM per machine to handle all our analytics and reporting needs.</dd>
<dt><a href="http://www.worldcat.org">OCLC</a></dt>
<dd>OCLC uses HBase as the main data store for WorldCat, a union catalog that
aggregates the collections of 72,000 libraries in 112 countries and territories.
WorldCat currently comprises nearly 1 billion records with nearly 2
billion library ownership indications. We're running a 50-node HBase cluster
and a separate offline MapReduce cluster.</dd>
<dt><a href="http://olex.openlogic.com">OpenLogic</a></dt>
<dd>OpenLogic stores all the world's open source packages, versions, files,
and lines of code in HBase for both near-real-time access and analytical
purposes. The production cluster has well over 100 TB of disk spread across
nodes with 32 GB+ of RAM and dual quad-core or dual hex-core CPUs.</dd>
<dt><a href="http://www.openplaces.org">Openplaces</a></dt>
<dd>Openplaces is a search engine for travel that uses HBase to store terabytes
of web pages and travel-related entity records (countries, cities, hotels,
etc.). We have dozens of MapReduce jobs that crunch data on a daily basis.
We use a 20-node cluster for development, a 40-node cluster for offline
production processing and an EC2 cluster for the live website.</dd>
<dt><a href="http://www.pnl.gov">Pacific Northwest National Laboratory</a></dt>
<dd>Hadoop and HBase (Cloudera distribution) are being used within PNNL's
Computational Biology &amp; Bioinformatics Group for a systems biology data
warehouse project that integrates high-throughput proteomics and transcriptomics
data sets coming from instruments in the Environmental Molecular Sciences
Laboratory, a US Department of Energy national user facility located at PNNL.
The data sets are being merged and annotated with other public genomics
information in the data warehouse environment, with Hadoop analysis programs
operating on the annotated data in the HBase tables. This work is hosted by
<a href="http://www.pnl.gov/news/release.aspx?id=908">olympus</a>, a large PNNL
institutional computing cluster, with the HBase tables being stored in olympus's
Lustre file system.</dd>
<dt><a href="http://www.readpath.com/">ReadPath</a></dt>
<dd>ReadPath uses HBase to store several hundred million RSS items and a dictionary
for its RSS newsreader. ReadPath is currently running on an 8-node cluster.</dd>
<dt><a href="http://resu.me/">resu.me</a></dt>
<dd>Career network for the net generation. We use HBase and Hadoop for all
aspects of our backend: user and resume data storage, analytics processing,
and machine learning algorithms for our job recommendation engine. Our live
production site is served directly from HBase. We use Cascading for running
offline data processing jobs.</dd>
<dt><a href="http://www.runa.com/">Runa Inc.</a></dt>
<dd>Runa Inc. offers a SaaS that enables online merchants to offer dynamic
per-consumer, per-product promotions embedded in their website. To implement
this, we collect the click streams of all their visitors to determine, together
with the merchant's rules, what promotion to offer the visitor at different
points while they browse the merchant's website. So we have lots of data and have
to do lots of offline and real-time analytics. HBase is the core for us.
We also use Clojure and our own open-source distributed processing framework,
Swarmiji. The HBase community has been key to our forward movement with HBase.
We're looking for experienced developers to join us and help make things go even
faster!</dd>
<dt><a href="http://www.sematext.com/">Sematext</a></dt>
<dd>Sematext runs
<a href="http://www.sematext.com/search-analytics/index.html">Search Analytics</a>,
a service that uses HBase to store search activity and MapReduce to produce
reports showing user search behaviour and experience. Sematext also runs
<a href="http://www.sematext.com/spm/index.html">Scalable Performance Monitoring (SPM)</a>,
a service that uses HBase to store performance data over time, crunch it with
the help of MapReduce, and display it in a visually rich browser-based UI.
Interestingly, SPM features
<a href="http://www.sematext.com/spm/hbase-performance-monitoring/index.html">SPM for HBase</a>,
which is specifically designed to monitor all HBase performance metrics.</dd>
<dt><a href="http://www.socialmedia.com/">SocialMedia</a></dt>
<dd>SocialMedia uses HBase to store and process user events, which allows us to
provide near-real-time user metrics and reporting. HBase forms the heart of
our Advertising Network data storage and management system. We use HBase both
as a data source and sink for real-time request cycle queries and as a
backend for MapReduce analysis.</dd>
<dt><a href="http://www.splicemachine.com/">Splice Machine</a></dt>
<dd>Splice Machine is built on top of HBase. Splice Machine is a full-featured
ANSI SQL database that provides real-time updates, secondary indices, ACID
transactions, optimized joins, triggers, and UDFs.</dd>
<dt><a href="http://www.streamy.com/">Streamy</a></dt>
<dd>Streamy is a recently launched real-time social news site. We use HBase
for all of our data storage, query, and analysis needs, replacing an existing
SQL-based system. This includes hundreds of millions of documents, sparse
matrices, logs, and everything else once done in the relational system. We
perform significant in-memory caching of query results, similar to a traditional
Memcached/SQL setup, and use other external components to perform joining
and sorting. We also run thousands of daily MapReduce jobs using HBase tables
for log analysis, attention data processing, and feed crawling. HBase has
helped us scale and distribute in ways we could not otherwise, and the
community has provided consistent and invaluable assistance.</dd>
<dt><a href="http://www.stumbleupon.com/">Stumbleupon</a></dt>
<dd>Stumbleupon and <a href="http://su.pr">Su.pr</a> use HBase as a real-time
data storage and analytics platform. Various site features and statistics are
served directly out of HBase and kept up to date in real time. We also
use HBase as a MapReduce data source to overcome traditional query speed limits.</dd>
<dt><a href="http://www.tokenizer.org">Shopping Engine at Tokenizer</a></dt>
<dd>Shopping Engine at Tokenizer is a web crawler; it uses HBase to store more
than a billion URLs and outlinks (AnchorText + LinkedURL). It was initially
designed as a Nutch-Hadoop extension, then (due to a very specific 'shopping'
scenario) moved to SOLR + MySQL (InnoDB) (tens of thousands of queries per second),
and now to HBase. HBase is significantly faster due to: no need for huge
transaction logs, a column-oriented design that exactly matches 'lazy' business
logic, data compression, and MapReduce support. The number of mutable 'indexes'
(a term from RDBMS) is significantly reduced because each 'row::column' structure
is physically sorted by 'row'. The MySQL InnoDB engine is the best DB choice for
highly concurrent updates. However, the need to flush a block of data to the
hard drive even if only a few bytes changed is an obvious bottleneck. HBase
greatly helps: the 'delete-insert', 'mutable primary key', and 'natural primary
key' patterns, which are not so popular in modern DBMSs, become a big advantage with HBase.</dd>
<dt><a href="http://traackr.com/">Traackr</a></dt>
<dd>Traackr uses HBase to store and serve online influencer data in real time.
We use MapReduce to frequently re-score our entire data set as we keep updating
influencer metrics on a daily basis.</dd>
<dt><a href="http://trendmicro.com/">Trend Micro</a></dt>
<dd>Trend Micro uses HBase as a foundation for cloud-scale storage for a variety
of applications. We have been developing with HBase since version 0.1 and
in production since version 0.20.0.</dd>
<dt><a href="http://www.twitter.com">Twitter</a></dt>
<dd>Twitter runs HBase across its entire Hadoop cluster. HBase provides a
distributed, read/write backup of all MySQL tables in Twitter's production
backend, allowing engineers to run MapReduce jobs over the data while maintaining
the ability to apply periodic row updates (something that is more difficult
to do with vanilla HDFS). A number of applications, including people search,
rely on HBase internally for data generation. Additionally, the operations
team uses HBase as a time-series database for cluster-wide monitoring/performance
data.</dd>
<dt><a href="http://www.udanax.org">Udanax.org</a></dt>
<dd>Udanax.org is a URL shortener that uses a 10-node HBase cluster to store URLs
and web log data and to respond to real-time requests on its web server. This
application is now used by some Twitter clients and a number of websites.
Currently, API requests are almost 30 per second and web redirection requests
are about 300 per second.</dd>
<dt><a href="http://www.veoh.com/">Veoh Networks</a></dt>
<dd>Veoh Networks uses HBase to store and process visitor (human) and entity
(non-human) profiles, which are used for behavioral targeting, demographic
detection, and personalization services. Our site reads this data in
real time (heavily cached) and submits updates via various batch MapReduce
jobs. With 25 million unique visitors a month, storing this data in a traditional
RDBMS is not an option. We currently have a 24-node Hadoop/HBase cluster, and
our profiling system shares this cluster with our other Hadoop data
pipeline processes.</dd>
<dt><a href="http://www.videosurf.com/">VideoSurf</a></dt>
<dd>VideoSurf - "The video search engine that has taught computers to see".
We're using HBase to persist various large graphs of data and other statistics.
HBase was a real win for us because it let us store substantially larger
datasets without the need to manually partition the data, and its
column-oriented nature allowed us to create schemas that were substantially
more efficient for storing and retrieving data.</dd>
<dt><a href="http://www.visibletechnologies.com/">Visible Technologies</a></dt>
<dd>Visible Technologies uses Hadoop, HBase, Katta, and more to collect, parse,
store, and search hundreds of millions of pieces of social media content. We get
incredibly fast throughput and very low latency on commodity hardware. HBase enables
our business to exist.</dd>
<dt><a href="http://www.worldlingo.com/">WorldLingo</a></dt>
<dd>The WorldLingo Multilingual Archive. We use HBase to store millions of
documents that we scan using MapReduce jobs to machine translate them into
all or selected target languages from our set of available machine translation
languages. We currently store 12 million documents but plan to eventually
reach the 450 million mark. HBase allows us to scale out as we need to grow
our storage capacities. Combined with Hadoop to keep the data replicated and
therefore fail-safe, we have the backbone our service can rely on now and in
the future. WorldLingo has been using HBase since December 2007 and, along with
a few others, has one of the longest-running HBase installations. Currently we are
running the latest HBase 0.20 and serving directly from it at
<a href="http://www.worldlingo.com/ma/enwiki/en/HBase">MultilingualArchive</a>.</dd>
<dt><a href="http://www.yahoo.com/">Yahoo!</a></dt>
<dd>Yahoo! uses HBase to store document fingerprints for detecting near-duplicates.
We have a cluster of a few nodes that runs HDFS, MapReduce, and HBase. The table
contains millions of rows. We use this for querying duplicated documents with
real-time traffic.</dd>
<dt><a href="http://h50146.www5.hp.com/products/software/security/icewall/eng/">HP IceWall SSO</a></dt>
<dd>HP IceWall SSO is a web-based single sign-on solution that uses HBase to store
user data for authenticating users. We previously supported RDB and LDAP, and
have newly added support for HBase with a view to authenticating tens of millions
of users and devices.</dd>
<dt><a href="http://www.ymc.ch/en/big-data-analytics-en?utm_source=hadoopwiki&amp;utm_medium=poweredbypage&amp;utm_campaign=ymc.ch">YMC AG</a></dt>
<dd><ul>
<li>operating a Cloudera Hadoop/HBase cluster for media monitoring purposes</li>
<li>offering technical and operative consulting for the Hadoop stack and ecosystem</li>
<li>editor of <a href="http://www.ymc.ch/en/hbase-split-visualisation-introducing-hannibal?utm_source=hadoopwiki&amp;utm_medium=poweredbypage&amp;utm_campaign=ymc.ch">Hannibal</a>, an open-source tool
to visualize HBase region sizes and splits that helps in running HBase in production</li>