README: Perform an ad-hoc analysis of Tor's sanitized web logs
==============================================================

Sanitized versions of Tor's Apache web logs are available at
https://webstats.torproject.org/.  Let's perform an ad-hoc analysis of
these logs to decide which graphs we'll want to put on Tor Metrics.  And
let's do so by loading everything into a PostgreSQL database that we
might later want to re-use for Tor Metrics.

Steps to import sanitized web logs into the database:

Create PostgreSQL database user webstats and a database of the same name:

  $ sudo -u postgres createuser -P webstats
  $ sudo -u postgres createdb -O webstats webstats

Import the database schema:

  $ psql -f webstats.sql webstats

Fetch sanitized web logs and put them under webstats.torproject.org/:

  $ wget --recursive --reject "index.html*" --no-parent \
      --accept "*201608*" https://webstats.torproject.org/
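
Before importing, it may be worth checking that the fetch actually
produced files.  The following is a minimal sketch, assuming wget wrote
into webstats.torproject.org/ below the current directory;
count_fetched_logs is a hypothetical helper and not part of this
repository:

```shell
# Hypothetical helper: print the number of regular files below a directory.
# Defaults to webstats.torproject.org, the directory wget creates above.
count_fetched_logs() {
  dir="${1:-webstats.torproject.org}"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  # wc pads its output on some systems, so strip the spaces.
  find "$dir" -type f | wc -l | tr -d ' '
}
```

Calling count_fetched_logs without arguments prints the file count.  A
count of 0, or a missing directory, suggests that the --accept pattern
did not match any logs.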

Fetch the following required libraries and put them in the lib/
directory:

  - lib/commons-compress-1.9.jar
  - lib/postgresql-jdbc3-9.2.jar
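
A quick way to confirm that the libraries are in place could look like
the sketch below.  It only checks the two jar files named above, and
check_libs is a hypothetical helper introduced here for illustration:

```shell
# Hypothetical helper: verify that the jar files listed above exist.
# Takes the library directory as optional argument (default: lib).
check_libs() {
  libdir="${1:-lib}"
  missing=0
  for jar in commons-compress-1.9.jar postgresql-jdbc3-9.2.jar; do
    if [ ! -f "$libdir/$jar" ]; then
      echo "missing: $libdir/$jar" >&2
      missing=1
    fi
  done
  # Exit status 0 if both jars were found, 1 otherwise.
  return "$missing"
}
```

Running check_libs from the directory containing lib/ prints one line
per missing file and returns a non-zero status if any is absent.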

Log into the database and run a simple query:

  webstats=> SELECT log_date, SUM(count) AS hits FROM requests
      NATURAL JOIN resources NATURAL JOIN files
      WHERE method = 'GET' AND response_code = 200
      AND (resource_string = '/' OR resource_string LIKE '/index%')
      AND site = 'www.torproject.org' GROUP BY log_date ORDER BY log_date;
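
To run the same query for other sites without retyping it, one option
is to generate the SQL in the shell and pipe it into psql.  This is only
a sketch: homepage_hits_query is a hypothetical helper, and it inserts
the site name into the SQL without any escaping, so it should only be
used with trusted values:

```shell
# Hypothetical helper: print the query above for a given site.
# The site name is interpolated unescaped; use trusted input only.
homepage_hits_query() {
  site="${1:-www.torproject.org}"
  cat <<EOF
SELECT log_date, SUM(count) AS hits FROM requests
  NATURAL JOIN resources NATURAL JOIN files
  WHERE method = 'GET' AND response_code = 200
  AND (resource_string = '/' OR resource_string LIKE '/index%')
  AND site = '$site' GROUP BY log_date ORDER BY log_date;
EOF
}
```

For example, `homepage_hits_query www.torproject.org | psql webstats`
feeds the query to psql on standard input, assuming the webstats
database created above.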