\documentclass{article}
\usepackage[pdftex]{graphicx}
\author{Karsten Loesing\\
{\tt karsten@torproject.org}}
\title{Case study:\\
Learning whether a Tor bridge is blocked\\
by looking at its aggregate usage statistics\\
-- Part one --}

\section{Introduction}

Tor bridges\footnote{\url{https://www.torproject.org/docs/bridges}} are
relays that are not listed in the main directory.
Clients that cannot access the Tor network directly can try to learn a
few bridge addresses and use these bridges to connect to the Tor network.
Bridges were introduced to make censoring the Tor network harder, but in
the past bridges have been blocked successfully in a few countries.

In this report we investigate whether we can learn that a bridge is
blocked in a given country only by looking at its reported aggregate
statistics on usage by country.
By knowing that a bridge is blocked, we can, for example, avoid giving
out its address to users from that country.

Learning whether a bridge is blocked is somewhat related to our recent
efforts to detect censorship of direct access to the Tor
network.\footnote{\url{https://metrics.torproject.org/papers/detector-2011-09-09.pdf}}
The main difference is that we want to know which bridges are blocked and
which are not, whereas we don't care which relays are accessible in the
case of blocked direct access.
It's easy to block all relays, but it should be difficult to block all
bridges.

This report can only be seen as a first step towards researching bridge
blocking.
Even if a bridge reports that it had zero users from a country, we're
lacking the confirmation that the bridge was really blocked.
There can be other reasons for low user numbers that may be completely
unrelated to blocking.
The results of this analysis should be considered when actively scanning
bridge reachability from inside a country, both to decide how frequently a
bridge should be scanned and to evaluate how reliable an analysis of
passive usage statistics can be.

\section{Bridge usage statistics}

Bridges report aggregate usage statistics on the number of connecting
clients by country.
Bridges gather these statistics by memorizing unique IP addresses of
connecting clients over 24-hour periods and resolving IP addresses to
country codes using an internal GeoIP database.
Archives of these statistics are available for analysis from the metrics
website.\footnote{\url{https://metrics.torproject.org/data.html#bridgedesc}}
Figure~\ref{fig:bridgeextrainfo} shows an example of bridge usage
statistics.
This bridge observed 41 to 48 connecting clients from Saudi Arabia
(all numbers are rounded up to the next multiple of 8), 33 to 40
connecting clients from the U.S.A., 25 to 32 from Germany, 25 to 32 from
Iran, and so on.
These connecting clients were observed in the 24~hours (86,400 seconds)
before December 27, 2010, 14:56:29 UTC.

\begin{figure}
\begin{verbatim}
extra-info Unnamed A5FA7F38B02A415E72FE614C64A1E5A92BA99BBD
published 2010-12-27 18:55:01
bridge-stats-end 2010-12-27 14:56:29 (86400 s)
bridge-ips sa=48,us=40,de=32,ir=32,[...]
\end{verbatim}
\caption{Example of aggregate bridge usage statistics}
\label{fig:bridgeextrainfo}
\end{figure}
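
The rounding described above means that each reported number stands for a
range of eight possible client counts.
The following Python sketch illustrates how such a range could be recovered
from a \verb+bridge-ips+ line.
The function name \verb+parse_bridge_ips+ and the parameter
\verb+bin_size+ are assumptions made for this illustration; they are not
part of Tor or of the analysis code behind this report.

```python
def parse_bridge_ips(line, bin_size=8):
    """Parse a 'bridge-ips' line into {country: (low, high)} client ranges.

    Reported counts are rounded up to the next multiple of bin_size, so a
    reported value v stands for a true count between v - bin_size + 1 and v.
    """
    keyword, _, rest = line.partition(" ")
    if keyword != "bridge-ips":
        raise ValueError("not a bridge-ips line")
    ranges = {}
    for entry in rest.split(","):
        country, sep, count = entry.partition("=")
        if not sep or not count.isdigit():
            continue  # skip malformed or elided entries such as "[...]"
        reported = int(count)
        ranges[country.strip()] = (reported - bin_size + 1, reported)
    return ranges


# The countries shown in the example descriptor above.
example = parse_bridge_ips("bridge-ips sa=48,us=40,de=32,ir=32")
print(example["sa"])  # (41, 48)
```

Entries that do not have the form \verb+cc=number+ are skipped rather than
rejected, which is a design choice of this sketch and not a statement about
how descriptors should be validated.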

An obvious limitation of these bridge usage statistics is that we can only
learn about connecting clients from bridges with at least 24 hours of
uptime.
It's still unclear how many bridge users are not included in the
statistics because of this, which is left for a different analysis.

We further decided to exclude bridges running Tor versions 0.2.2.3-alpha
or earlier.
These bridges report statistics similar to those of the later Tor versions
that we're considering here, but do not enforce a measurement interval of
exactly 24 hours, which would have slightly complicated the analysis.
We don't expect the bridge version to have an influence on bridge usage
or on the likelihood of the bridge to be blocked in a given country.

\section{Case study: China in the first half of 2010}

The major limitation of this analysis is that we don't have data
confirming that a bridge was actually blocked.
We can only decide on a case-by-case basis whether blocking is a plausible
explanation for the change in observed users from a given country.
Anything more objective requires additional data, e.g., data obtained from
active reachability scans.

We decided to investigate bridge usage from China in the first half of
2010 as a case study.
Figure~\ref{fig:bridge-users} shows estimated daily bridge users from
China.
The steep increase in September and October 2009 is very likely a result
of China blocking direct access to the Tor network.
It seems plausible that the drops in March and May 2010 result from
attempts to block access to bridges, too.
We're going to focus only on the interval from January to June 2010, which
promises the most interesting results.
We should be able to detect these blockings in the reported statistics of
single bridges.
Obviously, it may be hard or impossible to transfer the findings from this
case study to other countries or situations.

\begin{figure}
\includegraphics[width=\textwidth]{bridge-users.png}
\caption{Estimated daily bridge users from China}
\label{fig:bridge-users}
\end{figure}

\paragraph{Definition of bridge blocking}

We have a few options to define when we consider a bridge to be blocked
from a given country on a given day.

\begin{itemize}
\item \textbf{Absolute threshold:}
The absolute number of connecting clients from a country falls below a
fixed threshold.
\item \textbf{Relative threshold compared to other countries:}
The fraction of connecting clients from a country drops below a fixed
threshold.
\item \textbf{Estimated interval based on history:}
The absolute or relative number of connecting clients falls outside an
estimated interval based on the recent history.
\end{itemize}

For this case study we decided to stick with the simplest solution, an
absolute threshold.
We define a somewhat arbitrary threshold of 32 users to decide whether a
bridge is potentially blocked.
A blocked bridge does not necessarily report zero users per day.
A likely explanation for reporting users from a country that blocks a
bridge is that our GeoIP database is not 100~\% accurate and reports a few
users who in fact come from other countries.
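
As a minimal sketch of the absolute threshold, the following Python
function marks the days on which a single bridge reports 32 or fewer users
from the country of interest.
The function name, the dictionary layout, and the sample counts are
assumptions made for illustration only; they are not taken from the code
used for this report.

```python
def potential_blocking_days(daily_users, threshold=32):
    """Return the sorted days on which a bridge reported at most
    `threshold` users from the country of interest.

    daily_users maps a day (ISO date string) to the reported user count.
    """
    return sorted(day for day, users in daily_users.items()
                  if users <= threshold)


# Made-up counts for one bridge, not real measurements.
usage = {"2010-03-10": 120, "2010-03-11": 24, "2010-03-12": 8}
print(potential_blocking_days(usage))  # ['2010-03-11', '2010-03-12']
```

Because the comparison is ``at most 32'', a bridge that still reports a
handful of misattributed users is flagged as well, which matches the GeoIP
caveat above.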

We decided against a relative threshold because it depends on
developments in other countries.
As we can see in the example of China, bridge usage can depend on the
ability to directly access the Tor network.
A sudden increase in country $A$ could significantly lower the relative
usage in country $B$.
We should probably consider both absolute and relative thresholds in
future investigations.
Maybe we also need to take direct usage numbers into account.
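
To illustrate with made-up numbers, assume a bridge that only sees users
from countries $A$ and $B$, with $u_A = 40$ and $u_B = 40$ users per day.
If usage in $A$ rises to $u_A = 360$ while $u_B$ stays the same, the
relative usage in $B$ changes as follows:
\[
  \frac{u_B}{u_A + u_B} = \frac{40}{40 + 40} = 0.5
  \quad\longrightarrow\quad
  \frac{40}{360 + 40} = 0.1 .
\]
A relative threshold anywhere between these two values would flag the
bridge as blocked in $B$ even though nothing changed there.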

We also didn't build our analysis upon an estimated interval based on the
recent history, because it's unclear how fast a bridge will be blocked.
If it only takes the censor a few hours, the bridge may never see much use
from a country at all.
An estimate based on the bridge's history may not detect the censorship at
all, because it may look like a bridge with only few users from that
country.

We plan to reconsider other options for deciding that a bridge is blocked
once we have data confirming this.

\paragraph{Visualization of bridge blockings}

Figure~\ref{fig:bridge-blockings} shows a subset of the raw bridge usage
statistics for clients connecting from China in the first half of 2010.
Possible blocking events are days on which the bridge reports 32 or fewer
connecting clients.
These events are marked with red dots.

We decided to only include bridges in the figure that report at least
100~Chinese clients on at least one day in the whole interval.
Bridges with fewer users than that have a usage pattern that makes it much
more difficult to detect blockings at all.
The figure also shows only bridges reporting statistics on at least 30
days in the measurement interval.
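
The two inclusion criteria can be sketched in Python as follows.
The function name \verb+select_bridges+, the nested dictionary layout, and
the sample data are assumptions made for this illustration.

```python
def select_bridges(usage_by_bridge, min_peak_users=100, min_days=30):
    """Keep bridges that reported on at least `min_days` days and saw at
    least `min_peak_users` users on at least one of those days.

    usage_by_bridge maps a bridge identifier to {day: reported users}.
    """
    selected = {}
    for bridge, daily_users in usage_by_bridge.items():
        peak = max(daily_users.values(), default=0)
        if len(daily_users) >= min_days and peak >= min_peak_users:
            selected[bridge] = daily_users
    return selected


# Made-up data: only "busy" satisfies both criteria.
busy = {f"day{i:02d}": 104 if i == 5 else 16 for i in range(40)}
quiet = {f"day{i:02d}": 16 for i in range(40)}
short = {f"day{i:02d}": 200 for i in range(10)}
kept = select_bridges({"busy": busy, "quiet": quiet, "short": short})
print(sorted(kept))  # ['busy']
```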

\begin{figure}
\includegraphics[width=\textwidth]{bridge-blockings.png}
\caption{Subset of bridge usage statistics for Chinese clients in the
first half of 2010}
\label{fig:bridge-blockings}
\end{figure}

The usage plots of single bridges indicate how difficult it is to detect
blockings from usage statistics alone.
About 10 of the displayed 27 plots have a pattern similar to the expected
pattern from Figure~\ref{fig:bridge-users}.
The best examples are probably bridges \verb+C037+ and \verb+D795+.
Interestingly, bridge \verb+A5FA+ was unaffected by the blocking in March
2010, but affected by the blocking in May 2010.

\paragraph{Aggregating blocking events}

As the last step of this case study we want to compare observed bridge
users to the number of blocked bridges as detected by our simple
threshold.
We would expect most of our bridges to exhibit blockings in March 2010 and
May 2010.
Figure~\ref{fig:bridge-users-blockings} plots users and blocked bridges.
The two plots indicate that our detection algorithm is at least not
entirely wrong.

\begin{figure}
\includegraphics[width=\textwidth]{bridge-users-blockings.png}
\caption{Estimated users and assumed bridge blockings in China in the
first half of 2010}
\label{fig:bridge-users-blockings}
\end{figure}
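
The aggregation step can be sketched as counting, for every day, the
bridges that report 32 or fewer users.
The following Python code is only an illustration under assumed names and
made-up data; it is not the code that produced the figure.

```python
from collections import Counter


def blocked_bridges_per_day(usage_by_bridge, threshold=32):
    """Count, per day, the bridges reporting at most `threshold` users.

    usage_by_bridge maps a bridge identifier to {day: reported users}.
    Days on which no reporting bridge is below the threshold get a zero.
    """
    counts = Counter()
    for daily_users in usage_by_bridge.values():
        for day, users in daily_users.items():
            counts[day] += int(users <= threshold)
    return dict(sorted(counts.items()))


# Made-up counts for three hypothetical bridges.
usage_by_bridge = {
    "bridge1": {"2010-03-09": 120, "2010-03-10": 16},
    "bridge2": {"2010-03-09": 80, "2010-03-10": 24},
    "bridge3": {"2010-03-10": 96},
}
print(blocked_bridges_per_day(usage_by_bridge))
# {'2010-03-09': 0, '2010-03-10': 2}
```

A bridge that does not report on a given day is simply not counted for that
day, so the result depends on how many bridges report at all.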

Passively collected bridge usage statistics seem to be a useful tool to
detect whether a bridge is blocked from a country.
However, the main conclusion from this analysis is that we're lacking the
data to conduct it usefully.
One way to obtain the data we need is to run active scans.
When conducting such scans, passively collected statistics may help reduce
the total number and frequency of scans.
For example, when selecting a bridge to scan, the reciprocal of the last
reported number of connecting clients could be used as a probability
weight.

Once we have better data confirming bridge blocking we shall revisit the
criteria for deriving the blocking from usage statistics.
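
The idea of weighting scan candidates by the reciprocal of their last
reported client count could look roughly like the following Python sketch.
All names are hypothetical, and the \verb+floor+ parameter, which avoids a
division by zero for bridges that reported no clients, is an assumption of
this sketch rather than something specified in this report.

```python
import random


def scan_probabilities(last_reported, floor=1):
    """Turn last reported client counts into selection probabilities.

    Each bridge gets weight 1 / max(count, floor); the weights are then
    normalized so that they sum to one.
    """
    weights = {bridge: 1.0 / max(count, floor)
               for bridge, count in last_reported.items()}
    total = sum(weights.values())
    return {bridge: weight / total for bridge, weight in weights.items()}


def pick_bridge_to_scan(last_reported, rng=random):
    """Draw one bridge at random according to scan_probabilities()."""
    probabilities = scan_probabilities(last_reported)
    bridges = list(probabilities)
    weights = [probabilities[bridge] for bridge in bridges]
    return rng.choices(bridges, weights=weights, k=1)[0]


# Made-up counts: bridgeA reported 8 clients, bridgeB reported 32.
last_reported = {"bridgeA": 8, "bridgeB": 32}
probabilities = scan_probabilities(last_reported)
chosen = pick_bridge_to_scan(last_reported, rng=random.Random(0))
```

With these numbers the bridge with fewer reported clients is four times as
likely to be scanned, so scans concentrate on bridges that look blocked
while busy bridges are still checked occasionally.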