6 TMCast (stands for Transaction MultiCast) is an implementation of a
7 transactional multicast protocol. In essence, the idea is to represent
8 each message delivery to members of a multicast group as a transaction
9 - an atomic, consistent and isolated action. A multicast transaction
10 can be viewed as an atomic transition of the group members to a new
11 state. If we define [Mo] as a set of operational (non-faulty) members
12 of the group, [Mf] as a set of faulty members of the group, [Ma] as a
13 set of members that view transition [Tn] as aborted and [Mc] as a set
14 of members that view transition [Tn] as committed, then this atomic
15 transition [Tn] should satisfy one of the following equations:
17 Mo(Tn-1) = Ma(T) U Mf(T)
18 Mo(Tn-1) = Mc(T) U Mf(T)
20 Or, in other words, after transaction T has been committed (aborted),
21 all operational (before transaction T) members are either in the
22 committed (aborted) or failed state.
24 Thus, for each member of the group, outcome of the transaction can be
25 commit, abort or a member failure. It is important for a member to
26 exhibit a failfast (error latency is less than transaction cycle)
27 behavior. Or, in other words, if a member transitioned into a wrong
28 state, it is guaranteed to fail instead of delivering a wrong result.
30 In order to achieve such an error detection in a decentralized
31 environment, certain limitations were imposed. One of the most
32 user-visible limitation is the fact that the lifetime of the group
33 with only one member is very short. This is because there is not way
34 for a member to distinguish "no members yet" case from "my link to the
35 group is down". In such a situation, the member assumes the latter
36 case. There is also a military saying that puts it quite nicely: two
37 is one, one is nothing.
41 State of Implementation
42 -----------------------
44 The current implementation is in a prototypical stage. The following
45 parts are not implemented or still under development:
47 * Handling of network partitioning (TODO)
49 * Redundant network support (TODO)
51 * Member failure detection (partial implementation)
57 There is a simple example available in examples/TMCast/Member with
58 the corresponding README.
64 Primary goals of the protocol are to (1) mask transient failures of the
65 underlying multicast protocol (or, more precisely, allow to recover
66 from transient failures) and (2) exhibit failfast behavior in cases of
69 The distinction between transient and permanent failures is based on
70 timeouts thus what can be a transient failure in one configuration of
71 the protocol could be a permanent failure in the other.
73 [Maybe talk more about a transient/permanent threshold and its effect
74 on performance/resource utilization/etc.]
76 [Maybe add a terminology section.]
78 Each member of a multicast group has its unique (group-wise) id:
82 char id[MEMBER_ID_LENGTH];
85 Each payload delivery is part of a transaction. Each transaction is
86 identified by TransactionId:
88 typedef unsigned short TransactionId;
91 Each transaction has a status code which identifies its state, as viewed by
95 typedef unsigned char TransactionStatus;
97 TransactionStatus const TS_BEGIN = 1;
98 TransactionStatus const TS_COMMIT = 2;
99 TransactionStatus const TS_ABORT = 3;
100 TransactionStatus const TS_COMMITTED = 4;
101 TransactionStatus const TS_ABORTED = 5;
103 Thus each transaction is described by its id and status:
108 TransactionStatus status;
111 The outcome of some predefined number of recent transactions is stored
114 typedef Transaction TransactionList[TL_LENGTH];
117 Each message sent to a multicast group has the following header:
121 unsigned long length;
122 unsigned long check_sum;
125 TransactionList transaction_list;
128 [Maybe describe each field here.]
130 A new member joins the group with transaction id 0 and status
133 Each member sends a periodic 'pulse' messages with some predefined interval
134 advertising its current view of the group. This includes the state of the
135 current transaction and the history of the recent transactions.
138 If a member of the group needs a payload delivery it starts a new
139 transaction by sending a message with current transaction set to
141 {++current_id, TS_BEGIN}
143 and payload appended after the header.
146 Each member joins a transaction in one of the following ways:
148 * A member that began the transaction joins it 'to commit' (TS_COMMIT)
150 * A member that received TS_BEGIN joins current transaction 'to commit'
153 * A member that received TS_COMMIT or TS_ABORT but did not receive TS_BEGIN
154 joins current transaction 'to abort' (TS_ABORT).
157 After a member has joined the transaction it starts participating in the
158 transaction's voting phase. On this phase members of the group decide the
159 fate of the transaction. Each member sends a predefined number of messages
160 where it announces its vote. In between those messages the member is receiving
161 and processing votes from other members and can be influenced by their
164 In their decision-making members follow the principle of the majority. As
165 the voting progresses (and comes close to an end) members become more and
166 more reluctant to deviate from the decision of the majority.
168 [Maybe add an equation that measures member's willingness to change
171 At the end of the voting phase each member declares the current transaction
172 either committed (TS_COMMITTED) or aborted (TS_ABORTED). If this decision does
173 not agree with the majority the member declares itself failed.
175 In addition, each member builds a 'majority view' of the transaction history
176 (based on transaction_list). If it deviates from the member's own history the
177 member declares itself failed.
179 Here are some example scenarios of how the protocol behaves in different
180 situations. Let's say we have three members of the group S, R1, R2. S
181 initiates a transaction. R1 and R2 join it.
183 Scenario 1. (two-step voting)
185 1. S initiates a transaction (TS_BEGIN)
186 2a. R1 receives TS_BEGIN, joins for commit
187 2b. R2 receives TS_BEGIN, joins for commit
188 3a. S announces TS_COMMIT (first vote)
189 3b. R1 announces TS_COMMIT (first vote)
190 3c. R2 announces TS_COMMIT (first vote)
191 4a. S announces TS_COMMIT (second vote)
192 4b. R1 announces TS_COMMIT (second vote)
193 4c. R2 announces TS_COMMIT (second vote)
194 5a. S announces TS_COMMITTED (end of vote)
195 5b. R1 announces TS_COMMITTED (end of vote)
196 5c. R2 announces TS_COMMITTED (end of vote)
199 Scenario 2. (two-step voting)
201 1. S initiates a transaction (TS_BEGIN)
202 2a. R1 receives TS_BEGIN, joins for commit
203 2b. R2 didn't receive TS_BEGIN
204 3a. S announces TS_COMMIT (first vote)
205 3b. R1 announces TS_COMMIT (first vote)
206 3c. R2 received R1's TS_COMMIT announces TS_ABORT (first vote)
207 4a. S received R2's TS_ABORT announces TS_ABORT (second vote)
208 4b. R1 received R2's TS_ABORT announces TS_ABORT (second vote)
209 4c. R2 announces TS_ABORT (second vote)
210 5a. S announces TS_ABORTED (end of vote)
211 5b. R1 announces TS_ABORTED (end of vote)
212 5c. R2 announces TS_ABORTED (end of vote)
215 Scenario 3. (three-step voting)
217 1. S initiates a transaction (TS_BEGIN)
218 2a. R1 receives TS_BEGIN, joins for commit
219 2b. R2 didn't receive TS_BEGIN
220 3a. S announces TS_COMMIT (first vote)
221 3b. R1 announces TS_COMMIT (first vote)
222 3c. R2 still didn't receive anything
223 4a. S announces TS_COMMIT (second vote)
224 4b. R1 announces TS_COMMIT (second vote)
225 4c. R2 received R1's TS_COMMIT, announces TS_ABORT (first vote)
227 5a. S received R2's TS_ABORT but it is the end of the voting phase and
228 majority (S and R1) vote for commit, announces TS_COMMIT (third vote)
229 5b. R1 received R2's TS_ABORT but it is the end of the voting phase and
230 majority (S and R1) vote for commit, announces TS_COMMIT (third vote)
231 5c. R2 announces TS_ABORT (second vote)
233 6a. S announces TS_COMMITTED (end of vote)
234 6b. R1 announces TS_COMMITTED (end of vote)
235 6c. R2 discovers that the the majority has declared current transaction
236 committed and thus declares itself failed.
240 Boris Kolpackov <boris@dre.vanderbilt.edu>