1. Importing bridge network statuses and bridge descriptors
   1.1. Parsing bridge network statuses
   1.2. Parsing bridge descriptors
   1.3. Parsing extra-info documents
2. Assigning bridges to distributors
3. Giving out bridges upon requests
4. Selecting bridges to be given out based on IP addresses
5. Selecting bridges to be given out based on email addresses
6. Selecting unallocated bridges to be stored in file buckets
7. Displaying Bridge Information
8. Writing bridge assignments for statistics

This document specifies how BridgeDB processes bridge descriptor files
to learn about new bridges, maintains persistent assignments of bridges
to distributors, and decides which bridges to give out upon user
requests.

Some of the decisions here may be suboptimal: this document is meant to
specify current behavior as of August 2013, not to specify ideal
behavior.

1. Importing bridge network statuses and bridge descriptors

BridgeDB learns about bridges by parsing bridge network statuses,
bridge descriptors, and extra-info documents as specified in Tor's
directory protocol. BridgeDB first parses one bridge network status
file, and afterwards at least one bridge descriptor file and
potentially one extra-info file.

BridgeDB scans its files on SIGHUP.

BridgeDB does not validate signatures on descriptors or network status
files: the operator needs to make sure that these documents have come
from a Tor instance that did the validation for us.

1.1. Parsing bridge network statuses

Bridge network status documents contain the information of which bridges
are known to the bridge authority and which flags the bridge authority
assigns to them.
We expect bridge network statuses to contain at least the following
lines for every bridge in the given order (format fully specified in Tor's
directory protocol):

    "r" SP nickname SP identity SP digest SP publication SP IP SP ORPort
        SP DirPort NL
    "a" SP address ":" port NL (no more than 8 instances)
    "s" SP Flags NL

BridgeDB parses the identity and the publication timestamp from the "r"
line, the OR address(es) and ORPort(s) from the "a" line(s), and the
assigned flags from the "s" line, specifically checking the assignment
of the "Running" and "Stable" flags.
BridgeDB memorizes all bridges that have the Running flag as the set of
running bridges that can be given out to bridge users.
BridgeDB memorizes assigned flags if it wants to ensure that sets of
bridges given out should contain at least a given number of bridges
with these flags.
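
The parsing rule above can be illustrated with a minimal sketch. It assumes the network status is available as a list of text lines, interprets only the "r", "a", and "s" lines, and converts the base64 identity into an uppercase hex fingerprint. The function name `parse_networkstatus` and the dictionary layout are hypothetical, not BridgeDB's actual code.

```python
import base64
import binascii


def parse_networkstatus(lines):
    """Return {fingerprint: info} for bridges whose "s" line includes Running.

    Hypothetical helper: only "r", "a", and "s" lines are interpreted;
    every other keyword is ignored.
    """
    bridges = {}
    current = None
    for line in lines:
        keyword, _, rest = line.rstrip("\n").partition(" ")
        if keyword == "r":
            # r nickname identity digest YYYY-MM-DD HH:MM:SS IP ORPort DirPort
            fields = rest.split()
            identity = fields[1] + "=" * (-len(fields[1]) % 4)  # re-add padding
            fingerprint = binascii.hexlify(base64.b64decode(identity)).decode().upper()
            current = {"nickname": fields[0],
                       "published": fields[3] + " " + fields[4],
                       "addresses": [], "flags": set()}
            bridges[fingerprint] = current
        elif keyword == "a" and current is not None:
            if len(current["addresses"]) < 8:  # "no more than 8 instances"
                address, _, port = rest.rpartition(":")
                current["addresses"].append((address.strip("[]"), int(port)))
        elif keyword == "s" and current is not None:
            current["flags"] = set(rest.split())
    # Only bridges with the Running flag form the set that may be given out.
    return {fp: info for fp, info in bridges.items() if "Running" in info["flags"]}
```

Flags other than Running (such as Stable) are kept in the result so that later selection steps can require them.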

1.2. Parsing bridge descriptors

BridgeDB learns about a bridge's most recent IP address and OR port
from parsing bridge descriptors.
In theory, both the IP address and the OR port of a bridge are also
contained in the "r" line of the bridge network status, so there is no
mandatory reason for parsing bridge descriptors. But the functionality
described in this section is still implemented in case we need data
from the bridge descriptor in the future.

Bridge descriptor files may contain one or more bridge descriptors.
We expect a bridge descriptor to contain at least the following lines in
the stated order:

    "@purpose" SP purpose NL
    "router" SP nickname SP IP SP ORPort SP SOCKSPort SP DirPort NL
    "published" SP timestamp NL
    ["opt" SP] "fingerprint" SP fingerprint NL
    "router-signature" NL Signature NL

BridgeDB parses the purpose, IP, ORPort, nickname, and fingerprint
from these lines.
BridgeDB skips bridge descriptors if the fingerprint is not contained
in the bridge network status parsed earlier or if the bridge does not
have the Running flag.
BridgeDB discards bridge descriptors which have a purpose other than
"bridge". BridgeDB can be configured to only accept descriptors with
another purpose or not to discard descriptors based on purpose at all.
BridgeDB memorizes the IP addresses and OR ports of the remaining
bridges.
If there is more than one bridge descriptor with the same fingerprint,
BridgeDB memorizes the IP address and OR port of the most recently
parsed bridge descriptor.
If BridgeDB does not find a bridge descriptor for a bridge contained in
the bridge network status parsed before, it does not add that bridge
to the set of bridges to be given out to bridge users.
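
The filtering rules above can be sketched as follows. This is a minimal illustration, assuming descriptors arrive as one text blob and that "router-signature" marks the end of each descriptor. The function `parse_descriptors` and its `running` and `accepted_purposes` parameters are hypothetical stand-ins for BridgeDB's internals and configuration.

```python
def parse_descriptors(text, running, accepted_purposes=("bridge",)):
    """Return {fingerprint: (ip, orport)} for acceptable bridge descriptors.

    running: fingerprints that had the Running flag in the most recently
    parsed network status (hypothetical parameter).
    accepted_purposes: purposes to keep; None disables purpose filtering.
    """
    result = {}
    purpose = ip = orport = fingerprint = None
    for line in text.splitlines():
        if line.startswith("opt "):
            line = line[len("opt "):]  # the "opt" prefix is optional
        keyword, _, rest = line.partition(" ")
        if keyword == "@purpose":
            purpose = rest.strip()
        elif keyword == "router":
            _nickname, ip, orport = rest.split()[:3]
        elif keyword == "fingerprint":
            # Written as ten space-separated groups of four hex digits.
            fingerprint = rest.replace(" ", "").upper()
        elif keyword == "router-signature":
            # End of one descriptor: decide whether to keep it.
            purpose_ok = accepted_purposes is None or purpose in accepted_purposes
            if fingerprint in running and purpose_ok:
                # A later descriptor overwrites an earlier one with the same
                # fingerprint, so the most recently parsed one wins.
                result[fingerprint] = (ip, int(orport))
            purpose = ip = orport = fingerprint = None
    return result
```

Running bridges that never appear in the result have no descriptor and would therefore not be given out.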

1.3. Parsing extra-info documents

BridgeDB learns if a bridge supports a pluggable transport by parsing
extra-info documents.
Extra-info documents contain the name of the bridge (but only if it is
named), the bridge's fingerprint, the type of pluggable transport(s) it
supports, and the IP address and port number on which each transport
listens.

Extra-info documents may contain zero or more entries per bridge. We expect
an extra-info entry to contain the following lines in the stated order:

    "extra-info" SP name SP fingerprint NL
    "transport" SP transport SP IP ":" port [SP arguments] NL

BridgeDB parses the fingerprint, transport type, IP address, port, and any
arguments that are specified on these lines. BridgeDB skips the name. If
the fingerprint is invalid, BridgeDB skips the entry. BridgeDB memorizes
the transport type, IP address, port number, and any arguments that may be
provided, and then assigns them to the corresponding bridge based on the
fingerprint. Arguments are comma-separated and are of the form k=v,k=v.
Bridges that do not have an associated extra-info entry are not invalid.
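
A minimal sketch of the extra-info rules above, assuming that a valid fingerprint is exactly 40 hex digits and that arguments are a single comma-separated token after the address. The names `parse_extrainfo` and `FINGERPRINT_RE` are hypothetical.

```python
import re

# Assumption: a valid fingerprint is exactly 40 hexadecimal digits.
FINGERPRINT_RE = re.compile(r"^[0-9A-Fa-f]{40}$")


def parse_extrainfo(text):
    """Return {fingerprint: [transport dict, ...]} from extra-info entries."""
    transports = {}
    fingerprint = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "extra-info" and len(fields) >= 3:
            # The name (fields[1]) is skipped; an invalid fingerprint
            # causes every following line of this entry to be skipped.
            fp = fields[2]
            fingerprint = fp.upper() if FINGERPRINT_RE.match(fp) else None
        elif fields[0] == "transport" and fingerprint is not None and len(fields) >= 3:
            address, _, port = fields[2].rpartition(":")
            args = {}
            if len(fields) > 3:
                # Arguments look like k=v,k=v.
                for pair in fields[3].split(","):
                    key, _, value = pair.partition("=")
                    args[key] = value
            transports.setdefault(fingerprint, []).append(
                {"type": fields[1], "address": address.strip("[]"),
                 "port": int(port), "args": args})
    return transports
```

Bridges missing from the returned mapping simply have no known transports; they remain valid.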

2. Assigning bridges to distributors

A "distributor" is a mechanism by which bridges are given (or not
given) to clients. The current distributors are "email", "https",
and "unallocated".

BridgeDB assigns bridges to distributors based on an HMAC hash of the
bridge's ID and a secret and makes these assignments persistent.
Persistence is achieved by using a database to map node ID to
distributor.
Each bridge is assigned to exactly one distributor (including
the "unallocated" distributor).
BridgeDB may be configured to support only a non-empty subset of the
distributors specified in this document.
BridgeDB may be configured to use different probabilities for assigning
new bridges to distributors.
BridgeDB does not change existing assignments of bridges to
distributors, even if probabilities for assigning bridges to
distributors change or distributors are disabled entirely.
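
The assignment rule can be sketched as follows, assuming HMAC-SHA1 over the fingerprint, integer weights standing in for the configured probabilities, and a plain dict standing in for the persistent database. The function `assign_distributor` and the weight format are hypothetical.

```python
import hashlib
import hmac


def assign_distributor(fingerprint, secret, weights, assignments):
    """Assign a bridge to exactly one distributor and persist the choice.

    weights: ordered (distributor_name, positive_int_weight) pairs,
             a hypothetical stand-in for configured probabilities.
    assignments: dict standing in for the persistent database.
    """
    if fingerprint in assignments:
        # Existing assignments never change, even if weights change
        # or a distributor is disabled.
        return assignments[fingerprint]
    digest = hmac.new(secret, fingerprint.encode(), hashlib.sha1).digest()
    point = int.from_bytes(digest, "big") % sum(w for _, w in weights)
    # Walk the cumulative weights until the HMAC-derived point falls inside one.
    for name, weight in weights:
        if point < weight:
            break
        point -= weight
    assignments[fingerprint] = name
    return name
```

Because the HMAC is keyed with a secret, the assignment is deterministic for BridgeDB but not predictable by an outsider who knows only the fingerprint.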

3. Giving out bridges upon requests

Upon receiving a client request, a BridgeDB distributor provides a
subset of the bridges assigned to it.
BridgeDB only gives out bridges that are contained in the most recently
parsed bridge network status and that have the Running flag set (see
Section 1.1).
BridgeDB may be configured to give out a different number of bridges
(typically 4) depending on the distributor.
BridgeDB may define an arbitrary number of rules. These rules may
specify the criteria by which a bridge is selected. Specifically,
the available rules restrict the IP address version, OR port number,
transport type, bridge relay flag, or country in which the bridge
should not be blocked.
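
One way to picture these rules is as composable predicates that a bridge must all satisfy. The sketch below is illustrative only: the `Bridge` record, the rule constructors, and the per-country `blocked_in` set (whose data source this document does not specify) are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Bridge:
    """Hypothetical bridge record used only for this sketch."""
    address: str
    port: int
    flags: set = field(default_factory=set)
    transports: set = field(default_factory=set)
    blocked_in: set = field(default_factory=set)  # assumed country codes


# Each rule constructor returns a predicate over a Bridge.
def by_ip_version(version):
    return lambda b: ipaddress.ip_address(b.address).version == version


def by_port(port):
    return lambda b: b.port == port


def by_transport(name):
    return lambda b: name in b.transports


def by_flag(flag):
    return lambda b: flag in b.flags


def not_blocked_in(country):
    return lambda b: country not in b.blocked_in


def matches_all(bridge, rules):
    """A bridge is eligible only if it satisfies every rule of the request."""
    return all(rule(bridge) for rule in rules)
```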

4. Selecting bridges to be given out based on IP addresses

BridgeDB may be configured to support one or more distributors which
give out bridges based on the requestor's IP address. Currently, this
is how the HTTPS distributor works.
The goal is to avoid handing out all the bridges to users in a similar
IP space and time.
# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
# to see if this section is missing relevant pieces from it. -KL

BridgeDB fixes the set of bridges to be returned for a defined time
period.
BridgeDB considers all IP addresses coming from the same /24 network
as the same IP address and returns the same set of bridges. From here on,
this non-unique address will be referred to as the IP address's "area".
BridgeDB divides the IP address space equally into a small number of
disjoint clusters (typically 4) and returns different results for requests
coming from addresses that are placed into different clusters.
# Note, changed term from "areas" to "disjoint clusters" -MF
# I found that BridgeDB is not strict in returning only bridges for a
# given area. If a ring is empty, it considers the next one. Is this
# expected behavior? -KL
#
# This no longer appears to be the case. If a ring is empty, then
# BridgeDB simply returns an empty set of bridges. -MF
#
# I also found that BridgeDB does not make the assignment to areas
# persistent in the database. So, if we change the number of rings, it
# will assign bridges to other rings. I assume this is okay? -KL
BridgeDB maintains a list of proxy IP addresses and returns the same
set of bridges to requests coming from these IP addresses.
The bridges returned to proxy IP addresses do not come from the same
set as those for the general IP address space.

BridgeDB can be configured to include bridge fingerprints in replies
along with bridge IP addresses and OR ports.
BridgeDB can be configured to display a CAPTCHA which the user must solve
prior to returning the requested bridges.

The current algorithm is as follows. An IP-based distributor splits
the bridges uniformly into a set of "rings" based on an HMAC of their
ID. Some of these rings are "area" rings for parts of IP space; some
are "category" rings for categories of IPs (like proxies). When a
client makes a request from an IP, the distributor first checks whether
the IP is in one of the categories it knows. If so, the distributor
returns bridges from the corresponding category ring. If not, the
distributor maps the IP into an "area" (that is, a /24), and then uses
an HMAC to map the area to one of the area rings.
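
The ring-selection step can be sketched as below. This assumes IPv4 client addresses (a /24 is only meaningful there), HMAC-SHA1, and, as a simplification, a modulo reduction in place of the ring-position mapping described next. The function `choose_ring` and its `categories` argument are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def choose_ring(client_ip, key, num_area_rings, categories):
    """Return ("category", name) or ("area", ring_index) for a client IP.

    categories: hypothetical dict mapping a category name (e.g. "proxies")
    to a set of IP address strings. Assumes IPv4 addresses.
    """
    # Known categories (such as proxies) take precedence over areas.
    for name, members in categories.items():
        if client_ip in members:
            return ("category", name)
    # Collapse the address to its /24 "area" so neighbours look identical.
    area = ipaddress.ip_network(client_ip + "/24", strict=False)
    digest = hmac.new(key, str(area.network_address).encode(), hashlib.sha1).digest()
    # Simplification: reduce the 160-bit HMAC modulo the number of area rings.
    return ("area", int.from_bytes(digest, "big") % num_area_rings)
```

Every address in the same /24 produces the same HMAC input and therefore lands in the same area ring.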

When the IP-based distributor determines from which area ring it is handing
out bridges, it identifies which rules it will use to choose appropriate
bridges. Using this information, it searches its cache of rings for one
that already adheres to the criteria specified in this request. If one
exists, then BridgeDB maps the current "epoch" (N-hour period) and the
IP's area (/24) to a point on the ring based on HMAC, and hands out
bridges at that point. If no existing ring satisfies the request, a
new ring is created and filled with bridges that fulfill the
requirements. This ring is then used to select bridges as described.

"Mapping X to Y based on an HMAC" above means the following:

- We keep all of the elements of Y in some order, with a mapping
  from all 160-bit strings to positions in Y.
- We take an HMAC of X using some fixed string as a key to get a
  160-bit value. We then map that value to the next position of Y.
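
A minimal sketch of this mapping, assuming HMAC-SHA1 (which yields the 160-bit values mentioned above), that each bridge's position on the ring is the HMAC of its ID, and a hypothetical `request_key` encoding for the epoch and area. The `Ring` class is illustrative, not BridgeDB's implementation.

```python
import bisect
import hashlib
import hmac


def hmac160(key, data):
    """160-bit HMAC-SHA1 of data, as an integer."""
    return int.from_bytes(hmac.new(key, data, hashlib.sha1).digest(), "big")


class Ring:
    def __init__(self, key, bridge_ids):
        # Elements of Y kept in order of their own 160-bit HMAC positions.
        self.key = key
        self.entries = sorted((hmac160(key, b.encode()), b) for b in bridge_ids)
        self.positions = [pos for pos, _ in self.entries]

    def walk_from(self, x):
        """Yield every element once, starting at the next position >= HMAC(x)."""
        if not self.entries:
            return
        start = bisect.bisect_left(self.positions, hmac160(self.key, x))
        for i in range(len(self.entries)):
            yield self.entries[(start + i) % len(self.entries)][1]


def request_key(epoch, area):
    """Hypothetical encoding of X for the IP distributor: epoch plus /24 area."""
    return f"{epoch}|{area}".encode()
```

For example, with a 3-hour epoch one might compute `epoch = int(now // (3 * 3600))`; all requests from the same /24 within one epoch then start at the same ring position and receive the same bridges.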

When giving out bridges based on a position in a ring, BridgeDB first
looks at flag requirements and port requirements. For example,
BridgeDB may be configured to "Give out at least L bridges with port
443, and at least M bridges with Stable, and at most N bridges
total." To do this, BridgeDB combines the following results:

- The first L bridges in the ring after the position that have
  port 443,
- The first M bridges in the ring after the position that have the
  Stable flag and that it has not already decided to give out, and
- The first N-L-M bridges in the ring after the position that it
  has not already decided to give out.
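
The three-part combination above can be sketched as follows, assuming `ordered` is a list of bridges already in ring order starting at the request position. The function name and the predicate parameters are hypothetical.

```python
def select_with_requirements(ordered, L, M, N, has_port_443, is_stable):
    """Combine the L/M/N result sets described above.

    ordered: list of bridges in ring order, starting at the request position.
    has_port_443, is_stable: hypothetical predicates over a bridge.
    """
    chosen = []

    def take(count, predicate):
        # Take the first `count` not-yet-chosen bridges that satisfy predicate.
        for bridge in ordered:
            if count <= 0:
                break
            if bridge not in chosen and predicate(bridge):
                chosen.append(bridge)
                count -= 1

    take(L, has_port_443)           # at least L bridges on port 443
    take(M, is_stable)              # at least M Stable bridges
    take(N - L - M, lambda b: True)  # fill up to N with any remaining bridges
    return chosen
```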

After BridgeDB selects appropriate bridges to return to the requestor, it
then prioritises their ordering in a list so that as many criteria as
possible are fulfilled within the first few bridges. This list is then
truncated to N bridges, if possible. N is currently defined as a
piecewise function of the number of bridges in the ring:

        / 1, if len(ring) < 20
    N = | 2, if 20 <= len(ring) < 100
        \ 3, if 100 <= len(ring)

The bridges in this sublist, containing no more than N bridges, are the
bridges returned to the requestor.
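
The piecewise function and the final truncation can be sketched as below. The ordering step is an assumption: this sketch approximates "prioritises the ordering" with a stable sort by the number of criteria each bridge meets, which may differ from BridgeDB's actual heuristic. Both function names are hypothetical.

```python
def bridges_per_response(ring_size):
    """N as a piecewise function of the ring size."""
    if ring_size < 20:
        return 1
    if ring_size < 100:
        return 2
    return 3


def finalize(selected, criteria, ring_size):
    """Put bridges meeting more criteria first, then truncate to N.

    Assumption: criteria is a list of predicates; ties keep ring order
    because sorted() is stable.
    """
    ranked = sorted(selected, key=lambda b: -sum(1 for c in criteria if c(b)))
    return ranked[:bridges_per_response(ring_size)]
```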

5. Selecting bridges to be given out based on email addresses

BridgeDB can be configured to support one or more distributors that
give out bridges based on the requestor's email address. Currently,
this is how the email distributor works.
The goal is to bootstrap based on one or more popular email services'
sybil prevention algorithms.
# Someone else should look at proposals/ideas/old/xxx-bridge-disbursement
# to see if this section is missing relevant pieces from it. -KL

BridgeDB rejects email addresses containing characters other than those
that RFC 2822 allows.
BridgeDB may be configured to reject email addresses containing other
characters it might not process correctly.
# I don't think we do this; is it worthwhile? -MF
BridgeDB rejects email addresses coming from domains other than a
configured set of permitted domains.
BridgeDB normalizes email addresses by removing "." characters and by
removing everything in the local part after the first "+" character.
BridgeDB can be configured to discard requests that do not have the
value "pass" in their X-DKIM-Authentication-Result header or that lack
this header entirely. The X-DKIM-Authentication-Result header is set by
the incoming mail stack that needs to check DKIM authentication.

BridgeDB does not return a new set of bridges to the same email address
until a given time period (typically a few hours) has passed.
# Why don't we fix the bridges we give out for a global 3-hour time period
# like we do for IP addresses? This way we could avoid storing email
# addresses. -KL
# The 3-hour value is probably much too short anyway. If we take longer
# time values, then people get new bridges when bridges show up, as
# opposed to when we decide to reset the bridges we give them. (Yes, this
# problem exists for the IP distributor). -NM
# I'm afraid I don't fully understand what you mean here. Can you
# elaborate? -KL
#
# Assuming an average churn rate, if we use short time periods, then a
# requestor will receive new bridges based on rate-limiting and will (likely)
# eventually work their way around the ring, eventually exhausting all
# bridges available to them from this distributor. If we use a longer time
# period, then each time the period expires there will be more bridges in
# the ring, thus reducing the likelihood of all bridges being blocked and
# increasing the time and effort required to enumerate all bridges. (This
# is my understanding, not from Nick) -MF
# Also, we presently need the cache to prevent replays and because if a user
# sent multiple requests with different criteria in each then we would leak
# additional bridges otherwise. -MF
BridgeDB can be configured to include bridge fingerprints in replies
along with bridge IP addresses and OR ports.
BridgeDB can be configured to sign all replies using a PGP signing key.
BridgeDB periodically discards old email-address-to-bridge mappings.
BridgeDB rejects too frequent email requests coming from the same
normalized address.
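
The time-period rule, the rejection of too frequent requests, and the periodic discarding of old mappings can be sketched together. This is a minimal illustration assuming addresses are already normalized and timestamps are in seconds; the class name `EmailRequestCache` and its methods are hypothetical.

```python
import time


class EmailRequestCache:
    """Remembers when each normalized address last received bridges."""

    def __init__(self, period_seconds):
        self.period = period_seconds
        self.last_seen = {}

    def allow(self, address, now=None):
        """Return False if this address already got bridges within the period."""
        now = time.time() if now is None else now
        last = self.last_seen.get(address)
        if last is not None and now - last < self.period:
            return False  # too frequent: reject instead of handing out bridges
        self.last_seen[address] = now
        return True

    def expire(self, now=None):
        """Periodically discard mappings older than the period."""
        now = time.time() if now is None else now
        self.last_seen = {a: t for a, t in self.last_seen.items()
                          if now - t < self.period}
```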

To map previously unseen email addresses to a set of bridges, BridgeDB
proceeds as follows:

- It normalizes the email address as above, by stripping out dots,
  removing all of the local part after the "+", and putting it all
  in lowercase. (Example: "John.Doe+bridges@example.COM" becomes
  "johndoe@example.com".)
- It maps an HMAC of the normalized address to a position on its ring
  of bridges.
- It hands out bridges starting at that position, based on the
  port/flag requirements, as specified at the end of section 4.
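
The first two steps above can be sketched as follows, assuming HMAC-SHA1 for the 160-bit ring position and a configured set of permitted domains. The RFC 2822 character check is omitted, and both function names are hypothetical.

```python
import hashlib
import hmac


def normalize_email(address, allowed_domains):
    """Normalize an address; return None if its domain is not permitted."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or domain not in allowed_domains:
        return None
    # Drop everything after the first "+" in the local part, then strip dots.
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def ring_position(normalized, key):
    """160-bit HMAC position of the normalized address on the ring."""
    return int.from_bytes(hmac.new(key, normalized.encode(), hashlib.sha1).digest(), "big")
```

Since variants such as "j.o.h.n.doe@Example.com" normalize to the same string, they map to the same ring position and receive the same bridges.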

See section 4 for the details of how bridges are selected from the ring
and returned to the requestor.

6. Selecting unallocated bridges to be stored in file buckets

# Kaner should have a look at this section. -NM

BridgeDB can be configured to reserve a subset of bridges and not give
them out via any of the distributors.
BridgeDB assigns reserved bridges to one or more file buckets of fixed
sizes and writes these file buckets to disk for manual distribution.
BridgeDB ensures that a file bucket always contains the requested
number of running bridges.
If the requested number of bridges in a file bucket is reduced or the
file bucket is no longer required, the unassigned bridges are
returned to the reserved set of bridges.
If a bridge stops running, BridgeDB replaces it with another bridge
from the reserved set of bridges.
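
The file bucket rules can be sketched as a single refill pass. Following the -MF comment in this section, the sketch assumes that a bridge which stops running is returned to the reserved set, exactly as when a bucket shrinks. The function `refill_buckets` and its data layout are hypothetical.

```python
def refill_buckets(buckets, sizes, reserved, running):
    """Keep each file bucket filled with running bridges from the reserve.

    buckets:  {bucket_name: [bridge_id, ...]}, modified in place
    sizes:    {bucket_name: requested_size} for buckets still required
    reserved: list of reserved, unassigned bridge ids, modified in place
    running:  set of bridge ids that currently have the Running flag
    """
    for name in list(buckets):
        if name not in sizes:
            # Bucket no longer required: return its bridges to the reserve.
            reserved.extend(buckets.pop(name))
    for name, size in sizes.items():
        bucket = buckets.setdefault(name, [])
        # Bridges that stopped running go back to the reserve (assumption).
        reserved.extend(b for b in bucket if b not in running)
        bucket[:] = [b for b in bucket if b in running]
        # Bucket shrank: return the surplus bridges to the reserve.
        while len(bucket) > size:
            reserved.append(bucket.pop())
        # Top up with running bridges from the reserve, if any are available.
        candidates = [b for b in reserved if b in running]
        for b in candidates[:max(0, size - len(bucket))]:
            reserved.remove(b)
            bucket.append(b)
    return buckets
```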
# I'm not sure if there's a design bug in file buckets. What happens if
# we add a bridge X to file bucket A, and X goes offline? We would add
# another bridge Y to file bucket A. OK, but what if X comes back? We
# cannot put it back in file bucket A, because it's full. Are we going to
# add it to a different file bucket? Doesn't that mean that most bridges
# will be contained in most file buckets over time? -KL
#
# This should be handled the same as if the file bucket is reduced in size.
# If X returns, then it should be added to the appropriate distributor. -MF

7. Displaying Bridge Information

After bridges are selected using one of the methods described in
Sections 4-6, they are output in one of two formats. Bridges are
formatted as:

    <address:port> NL

Pluggable transports are formatted as:

    <transportname> SP <address:port> [SP arglist] NL

where arglist is an optional space-separated list of key-value pairs in
the form of "k=v".

Previously, each line was prepended with the "bridge" keyword, such as:

    "bridge" SP <address:port> NL

    "bridge" SP <transportname> SP <address:port> [SP arglist] NL

# We don't do this anymore because Vidalia and TorLauncher don't expect it.
# See the commit message for b70347a9c5fd769c6d5d0c0eb5171ace2999a736.
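
A minimal sketch of the current output format (without the old "bridge" prefix). It assumes IPv6 addresses are wrapped in square brackets, as in Tor's usual bridge-line syntax, and it returns the line without the trailing NL. The function `format_bridge_line` is hypothetical.

```python
def format_bridge_line(address, port, transport=None, args=None):
    """Format one bridge line; args is an optional {key: value} dict."""
    # Assumption: IPv6 literals are bracketed so the port stays unambiguous.
    host = f"[{address}]" if ":" in address else address
    line = f"{host}:{port}"
    if transport:
        line = f"{transport} {line}"
    if args:
        # Space-separated k=v pairs.
        line += " " + " ".join(f"{k}={v}" for k, v in args.items())
    return line
```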

8. Writing bridge assignments for statistics

BridgeDB can be configured to write bridge assignments to disk for
statistical analysis.
The start of a bridge assignment is marked by the following line:

    "bridge-pool-assignment" SP YYYY-MM-DD HH:MM:SS NL

YYYY-MM-DD HH:MM:SS is the time, in UTC, when BridgeDB completed
loading new bridges and assigning them to distributors.

For every running bridge there is a line with the following format:

    fingerprint SP distributor (SP key "=" value)* NL

The distributor is one of "email", "https", or "unallocated".

Both the "email" and "https" distributors support adding keys for "port",
"flag", and "transport". The values are, respectively, the port number,
flag name, and transport type. These are used to indicate that a bridge
matches certain port, flag, or transport criteria of requests.

The "https" distributor also allows the key "ring" with a number as
value to indicate to which IP address area the bridge is returned.

The "unallocated" distributor allows the key "bucket" with the file
bucket name as value to indicate which file bucket a bridge is assigned
to.
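
A minimal sketch of writing one assignment block in the format above. It assumes the caller supplies a timezone-aware completion time and, per bridge, a list of (key, value) pairs so that a key such as "port" or "flag" may repeat. The function `format_assignments` is hypothetical.

```python
from datetime import datetime, timezone


def format_assignments(completed_at, assignments):
    """Render one bridge-pool-assignment block as text.

    completed_at: timezone-aware datetime when loading/assignment finished.
    assignments:  list of (fingerprint, distributor, [(key, value), ...]).
    """
    stamp = completed_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    lines = ["bridge-pool-assignment " + stamp]
    for fingerprint, distributor, extras in assignments:
        parts = [fingerprint, distributor] + [f"{k}={v}" for k, v in extras]
        lines.append(" ".join(parts))
    return "\n".join(lines) + "\n"
```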