HBASE-20137 TestRSGroups is flakey
commit 1f5e93a8f85c702f65c27c6162cd10c2035e481d
author Michael Stack <stack@apache.org>
Tue, 6 Mar 2018 05:20:23 +0000 (5 21:20 -0800)
committer Michael Stack <stack@apache.org>
Tue, 6 Mar 2018 18:55:40 +0000 (6 10:55 -0800)
tree dd7c75c493cc1798d0b72410a1609eb6a1e21209
parent 7889df37118a0593dfb0156c6edf1bb96d4c94b9

On a failed RPC we expire the server and suspend, expecting the
resultant ServerCrashProcedure to wake us back up again. In tests,
TestRSGroups hung because it failed to schedule a server expiration:
the server was already expired and undergoing processing (the
test was shutting down). Deal with this case by having expireServer
return false if it is unable to expire the server. Callers will then
know whether a ServerCrashProcedure has been scheduled or not.
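
The contract described above can be sketched as follows. This is a
minimal, self-contained illustration and not the actual patch:
SketchServerManager, NextStep and onFailedRemoteCall are hypothetical
names, and the exact conditions under which the real expireServer
returns false are assumed here to be "shutting down", "already
undergoing crash processing" or "unknown server".

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only; none of these types are the real HBase classes.
final class ExpireServerSketch {

  /** Hypothetical stand-in for the master's server manager. */
  static final class SketchServerManager {
    private final Set<String> onlineServers = new HashSet<>();
    private final Set<String> serversInCrashProcessing = new HashSet<>();
    private boolean shuttingDown;

    synchronized void addOnlineServer(String serverName) {
      onlineServers.add(serverName);
    }

    synchronized void setShuttingDown(boolean shuttingDown) {
      this.shuttingDown = shuttingDown;
    }

    /**
     * Returns true only if crash handling was newly scheduled for the server.
     * Returns false if we are going down, the server is already being
     * processed, or the server is unknown (assumed conditions).
     */
    synchronized boolean expireServer(String serverName) {
      if (shuttingDown || serversInCrashProcessing.contains(serverName)) {
        return false;
      }
      if (!onlineServers.remove(serverName)) {
        return false; // Unknown server: nothing to expire.
      }
      // Stands in for scheduling a crash-handling procedure.
      serversInCrashProcessing.add(serverName);
      return true;
    }
  }

  /** What a caller does after a failed remote call (hypothetical enum). */
  enum NextStep { SUSPEND_UNTIL_CRASH_HANDLED, WAKE_AND_MARK_CLOSED }

  /** Suspend only when crash handling was scheduled and will wake us up. */
  static NextStep onFailedRemoteCall(SketchServerManager manager, String serverName) {
    return manager.expireServer(serverName)
        ? NextStep.SUSPEND_UNTIL_CRASH_HANDLED
        : NextStep.WAKE_AND_MARK_CLOSED;
  }

  public static void main(String[] args) {
    SketchServerManager manager = new SketchServerManager();
    manager.addOnlineServer("rs1");
    // First failure schedules crash handling, so the caller may suspend.
    System.out.println(onFailedRemoteCall(manager, "rs1")); // SUSPEND_UNTIL_CRASH_HANDLED
    // Second failure finds the server already being processed: do not suspend.
    System.out.println(onFailedRemoteCall(manager, "rs1")); // WAKE_AND_MARK_CLOSED
  }
}
```

The point of the boolean is that a caller never suspends unless
something has been scheduled that will wake it again; with a void
return, the second call above would suspend with nothing to wake it.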

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
  Have expireServer return true if successful.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
  The log that included an exception whose message was the current
  procedure as a String totally baffled me. Make it more obvious what
  the exception is.

M hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
  If we fail to expire a server, wake our procedure -- do not suspend --
  and presume it is ok to move the region to CLOSED state (because we
  are going down or concurrent crashed-server processing is ongoing).

hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/FailedRemoteDispatchException.java
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionTransitionProcedure.java
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/UnassignProcedure.java
hbase-server/src/test/java/org/apache/hadoop/hbase/master/assignment/TestAssignmentManager.java