4 * Licensed to the Apache Software Foundation (ASF) under one
5 * or more contributor license agreements. See the NOTICE file
6 * distributed with this work for additional information
7 * regarding copyright ownership. The ASF licenses this file
8 * to you under the Apache License, Version 2.0 (the
9 * "License"); you may not use this file except in compliance
10 * with the License. You may obtain a copy of the License at
12 * http://www.apache.org/licenses/LICENSE-2.0
14 * Unless required by applicable law or agreed to in writing, software
15 * distributed under the License is distributed on an "AS IS" BASIS,
16 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
17 * See the License for the specific language governing permissions and
18 * limitations under the License.
22 = Coprocessor Design Improvements (link:https://issues.apache.org/jira/browse/HBASE-17732[HBASE-17732])
24 Author: Apekshit Sharma
28 This doc explains current design of Coprocessor feature in brief, few issues I noticed, and
29 suggestions on how to fix them & further improve overall design.
34 * Observer *is* Coprocessor
35 * FooService *is* CoprocessorService
39 * Coprocessor *has* Observer
40 * Coprocessor *has* Service
42 See code example in <<main.design.change.suggestions>>.
46 hooks = functions in observers. Named because third-party use these functions to “hook up” custom
47 logic to internal code paths.
52 Coprocessors are well link:http://hbase.apache.org/book.html#cp[documented in the refguide].
54 Here we give a little background information on involved classes, their responsibilities, and
55 relationship to each other.
58 ** Coprocessor (interface)
59 *** All *Observer* interfaces derive from Coprocessor interface.
60 **** Coprocessor Interface is a _Marker _Interface. It just has start/stop methods and enums for
61 stages in the Coprocessor Lifecycle.
62 ** http://hbase.apache.org/book.html#_observer_coprocessors[Observers] (interface)
63 *** Contain hooks which third-party programs can override to inject functionality in various
64 internal code paths. For e.g preCreateTable(...) will be called just before any table is created.
65 *** Current set of observers: _MasterObserver, RegionObserver, RegionServerObserver, WALObserver,
66 EndpointObserver, BulkLoadObserver._
67 ** CoprocessorEnvironment (interface)
68 *** Encapsulates a coprocessor instance and other information like versions, priority, etc.
69 *** Coprocessor implementations use it to get access to tables.
70 *** Four main implementations: _MasterEnvironment, RegionEnvironment, RegionServerEnvironment,
72 ** CoprocessorHost (abstract class)
73 *** Responsible for loading coprocessors
74 *** Four concrete sub-classes: MasterCoprocessorHost, RegionCoprocessorHost,
75 RegionServerCoprocessorHost, WALCoprocessorHost
76 *** Each host is tied to corresponding environment type using template argument ‘E’.
80 * CoprocessorEnvironment has `Coprocessor getInstance()`. Since Observer types which can be
81 handled by an environment are not statically tied to it, coprocessor hosts (which are statically
82 tied to Environment) don’t know which kind of coprocessors are relevant to them, i.e.
83 MasterCoprocessorHost is tied to MasterEnvironment, but it doesn’t know that it can only handle
84 MasterObserver(s). As a result:
85 ** Problem 1: All hosts load all observers i.e. MasterCoprocessorHost will also load RegionObserver
87 ** Problem 2: Hosts use runtime checks likes `observer instanceOf ExpectedObserver` in
88 execOperation and other functions to filter out incompatible observers.
89 ** Problem 3: Many redundant functions in every implementation of coprocessor host.
90 * Observer *extends* Coprocessor (inheritance)
91 ** Problem 4: Any third-party coprocessor which wants to use many observers will have to extend all
92 of them in same class. For eg.
93 class AccessController implements MasterObserver,
94 RegionObserver, RegionServerObserver, EndpointObserver,
95 BulkLoadObserver, AccessControlService.Interface,
97 That results in big classes with 100+ functions.
101 * There are 6 types of observers (listed in <<background>> section above), but just 4 types of
102 CoprocessorEnvironment. So some XEnvironment has to be handling multiple Observers
103 (RegionEnvironment serves RegionObserver, EndpointObserver and BulkLoadObservers). Our aim is to
104 statically tie environment to types of observers it can serve. There are two alternative choices
106 ** Option 1: Limit to 4 types of Observers. That fits nicely in our pattern-of-4
107 (4 hosts, 4 environments, 4 observers) and will make the overall design simpler. Although it may
108 look simple at surface, it’ll actually lead to a single large observer interface which will only
109 grow and may contain 100s of hooks in future (master already has 100+)
110 ** Option 2: Use multiple observers to group together similar kinds of hooks.
111 Like we have RegionObserver, EndpointObserver and BulkLoadObserver; we can have ScannerObserver,
112 AMObserver, etc instead of single MasterObserver. Benefits being
113 *** Proper encapsulation of related hooks and separation from unrelated hooks
114 *** We can give different Stability guarantees for different set of hooks. Something which’ll make
115 our CP compat management much easier.
117 I believe Option 2 to be better than Option 1, and the design changes suggested later are based on
120 * For problem 4, we should replace inheritance with composition, so developers have choice to
121 break out observer implementations into separate classes.
123 [[main.design.change.suggestions]]
124 == Main Design Change suggestions
126 * Extend pattern-of-4 up to coprocessor.
128 CoprocessorHost → Env → Coprocessor
129 * Tie each CoprocessorEnvironment to corresponding Coprocessor
130 * Use composition instead of inheritance between Coprocessor and Observers.
134 Only changing parts are mentioned here. Anything not changing is represented by “...”
138 interface Coprocessor {
142 interface CoprocessorEnvironment {
143 Coprocessor getInstance();
147 interface CoprocessorService {
148 Service getService();
151 abstract class CoprocessorHost<E extends CoprocessorEnvironment> {
155 interface RegionObserver extends Coprocessor {
159 interface BulkLoadObserver extends Coprocessor {
163 interface EndpointObserver extends Coprocessor {
172 interface Coprocessor {
176 // Extend pattern-of-4 to coprocessors.
177 interface RegionCoprocessor extends Coprocessor {
178 RegionObserver getRegionObserver();
179 BulkLoadObserver getBulkLoadObserver();
180 EndpointObserver getEndpointObserver();
181 Service getService();
184 // Tie coprocessor to environment
185 interface CoprocessorEnvironment<C extends Coprocessor> {
190 abstract class CoprocessorHost<C extends Coprocessor, E extends CoprocessorEnvironment<C>> {
194 // Doesn’t extend coprocessor
195 interface RegionObserver extends Coprocessor {
199 // Doesn’t extend coprocessor
200 interface BulkLoadObserver extends Coprocessor {
206 == How does it fix our issues?
208 * Fix #1: CoprocessorHost is now tied to types of coprocessors it can serve by template argument C.
209 During load time, it can ignore any coprocessors which don’t match.
210 * Fix #2 and #3: Pull the duplicate functions into CoprocessorHost class. Individual host subclasses
211 can use these directly. One interesting part here is ObserverGetter<C, O>. For any specific hook,
212 say in observer O, subclasses specify ObserverGetter<C, O> so that the shared methods can extract
213 observer O from coprocessor C.
214 * Fix #4: Choosing composition over inheritance, by adding getter functions in coprocessors (e.g.
215 getRegionObserver()), implementations can now break up observer implementations into separate
216 classes. For e.g. our AccessController will now just be:
217 `class AccessController implements AccessControlService.Interface, CoprocessorService`
219 [[migrating.existing.cps.to.new.design]]
220 == Migrating existing CPs to new design
222 There’s a simple and small fix that can migrate existing coprocessors to the new design. +
223 If we had the following observer in the old design: +
225 class FooObserver implements RegionObserver {
231 It can be “made to work” with the new design like this: +
233 class FooObserver implements RegionCoprocessor, RegionObserver {
234 public RegionObserver getRegionObserver() { return this; }
240 However, note that this is only a workaround to quickly migrate ~100 CPs in our code base to new
241 design without creating new classes and files. *New CPs should NOT follow this pattern.*
243 == Additional Benefit
245 * Cleaner solution to https://issues.apache.org/jira/browse/HBASE-17106[HBASE-17106].
246 * We can have multiple observer interfaces for each environment now. For e.g We can break our single
247 monolithic MasterObsever (~150 functions) to multiple observer interfaces - ScannerObserver,
249 * These observers can be assigned different compatibility guarantees. For instances, new hooks by
250 Backup feature could have been put into separate Observer and marked IS.Unstable, while features
251 which have hardened over years can be marked IS.Stable.
252 * Only the coprocessors corresponding to hosts which support endpoint service will have
253 “getService()” method. So WALCoprocessor which cannot support service doesn’t have one. That may
254 look minor thing. But if our design can cleanly convey what is and isn’t supported, that’s beautiful
255 and powerful and helpful for downstream developers.