Subject: [PATCH] gitweb: support caching projects list

On repo.or.cz (permanently I/O overloaded and hosting 1050 projects +
forks), the projects list (the default gitweb page) can take more than
a minute to generate. This naive patch adds simple support for caching
the projects list data structure so that all the projects do not need
to get rescanned at every page access.

A $projlist_cache_lifetime gitweb configuration variable is introduced,
set to zero by default. If set to non-zero, it gives the number of
minutes for which the cache remains valid. Only a single project root
per system can use the cache. Any script running with the same uid as
gitweb can change the cache trivially - this is for secure
installations only.

The cache itself is stored in $cache_dir/$projlist_cache_name, using
Storable to store() a Perl data structure with the list of project
details. When reusing the cache, the data is retrieve()'d back into
a Perl data structure.

To prevent contention when multiple accesses coincide with cache
expiration, the timeout is postponed to time()+120 when we start
refreshing. When showing a cached version, a disclaimer is shown
at the top of the projects list.
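
As an illustration of the expiry arithmetic described above, here is a
minimal sketch, not the patch's own code. The helper names
postponed_mtime and cache_is_fresh and the sample numbers are
assumptions made for this example; only the 120-second postponement and
the lifetime expressed in minutes come from the description.

```perl
use strict;
use warnings;

# Hypothetical helper: the mtime to stamp on the cache file so that it
# is treated as expired exactly $grace seconds after $now.
sub postponed_mtime {
    my ($now, $lifetime_minutes, $grace) = @_;
    return $now - $lifetime_minutes * 60 + $grace;
}

# Hypothetical helper: a cache is fresh while mtime + lifetime is still
# in the future.
sub cache_is_fresh {
    my ($mtime, $lifetime_minutes, $now) = @_;
    return $mtime + $lifetime_minutes * 60 > $now ? 1 : 0;
}

my $now      = 1_000_000;   # sample epoch time
my $lifetime = 10;          # minutes
my $mtime    = postponed_mtime($now, $lifetime, 120);

# Other requests arriving within the next two minutes keep using the
# old cache while one request regenerates it.
print cache_is_fresh($mtime, $lifetime, $now + 119), "\n";
print cache_is_fresh($mtime, $lifetime, $now + 120), "\n";
```

Adjusting the file's mtime rather than keeping a separate lock record
means every gitweb process sees the postponement through a plain stat().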

[jn: moved from Data::Dumper to Storable for serialization of data]
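
A minimal sketch of a Storable round trip guarded by a format tag, in
the spirit of the serialization mentioned above. The tag string, the
helper names save_cache and load_cache, and the sample project entry are
hypothetical; the patch itself may lay out and validate the dump
differently.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Storable qw(store retrieve);

my $FORMAT = "Example Cache Format 1";   # hypothetical format tag

# Store the payload together with the tag so that stale or foreign
# files can be rejected when reading.
sub save_cache {
    my ($file, $data) = @_;
    return store([$FORMAT, $data], $file);
}

# Return the payload, or undef if the file is missing, unreadable,
# or does not carry the expected tag.
sub load_cache {
    my ($file) = @_;
    my $dump = eval { retrieve($file) };
    return undef unless ref($dump) eq 'ARRAY' && @$dump == 2;
    return undef unless defined $$dump[0] && $$dump[0] eq $FORMAT;
    return $$dump[1];
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.cache";
save_cache($file, [ { path => 'a.git', descr => 'demo' } ]);

my $projects = load_cache($file);
print scalar(@$projects), " project(s) loaded\n";
```

Wrapping retrieve() in eval means a corrupt or missing file simply reads
as a cache miss instead of aborting the request.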

The $cache_grpshared gitweb configuration variable can be set to 1 to
create the cache file group-readable and group-writable to facilitate
external re-generation of the cache.
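
For illustration only, a gitweb_config.perl fragment that enables the
cache might look like the following. The values are arbitrary examples,
not recommendations; the variable names are the ones introduced by this
patch.

```perl
# gitweb_config.perl (illustrative values only)
our $projlist_cache_lifetime = 5;                  # minutes; 0 disables caching
our $cache_dir               = "/var/cache/gitweb"; # must be writable by the gitweb uid
our $cache_grpshared         = 1;                  # group-readable/writable cache file
```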

Signed-off-by: Petr Baudis <pasky@ucw.cz>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Documentation/gitweb.conf.txt | 51 +++++++++-
 gitweb/gitweb.perl | 212 ++++++++++++++++++++++++++++++++++++++----
 gitweb/static/gitweb.css | 8 ++
 3 files changed, 251 insertions(+), 20 deletions(-)
diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index 841e7dad..2ed86bf3 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -428,7 +428,8 @@ $frontpage_no_project_list::
 If 0, the gitweb frontpage will contain the project list; if 1 instead,
 it will contain just the index text, search form, tag cloud (if enabled)
 and a link to the actual project list. The page is reduced, but all
- projects still need to be scanned for the tag cloud construction. If the
+ projects still need to be scanned for the tag cloud construction (but
+ the project info cache is used if enabled, of course). If the
 option is set to 2, not even the tag cloud will be shown; this is fastest.
 This option is useful for sites with large amount of projects. The default
@@ -467,6 +468,54 @@ CPU-intensive. Note also that non Git tools can have problems with
 patches generated with options mentioned above, especially when they
 involve file copies (\'-C') or criss-cross renames (\'-B').
+These configuration variables control caching in gitweb. If you don't
+run gitweb on a busy site with a large number of repositories
+(projects) you probably don't need caching; by default caching is
+$projlist_cache_lifetime::
+ Lifetime of in-gitweb cache for projects list page, in minutes.
+ By default set to 0, which means that projects list caching is
+ The cached list version (cache of Perl structure, not of final
+ output) is stored in "$cache_dir/$projlist_cache_name". $cache_dir
+ should be writable only by processes with the same uid as gitweb
+ (usually web server uid); if $cache_dir does not exist gitweb will
+$projlist_cache_name::
+ The cached list version (cache of Perl structure, not of final
+ output) is stored in "$cache_dir/$projlist_cache_name". Only a single
+ gitweb project root per system is supported, unless gitweb instances
+ for different project roots have different configurations.
+By default $cache_dir is set to "$TMPDIR/gitweb" if the $TMPDIR
+environment variable exists, "/tmp/gitweb" otherwise.
+Default name for $projlist_cache_name is 'gitweb.index.cache';
+*Note* projects list cache file can be tweaked by other scripts
+running with the same uid as gitweb; use this ONLY at secure
+ By default, $cache_grpshared is 0 and the cache file is accessible
+ only by the webserver uid; however, when it is set to 1, it will
+ also be set group-readable and group-writable. You can use that
+ to externally trigger cache re-generation before users may have
+ a chance to trigger it (and wait a long time). For example, you
+ could use this script:
+----------------------------------------------------------------------
+REQUEST_METHOD=HEAD perl -e 'do "./gitweb.cgi"; END {
+ fill_project_list_info([], "rebuild-cache") }' >/dev/null 2>&1
+----------------------------------------------------------------------
+(You need to run it in the directory of gitweb.cgi and, if
+gitweb_config.perl is not located in that same directory, also
+set GITWEB_CONFIG for gitweb_config.perl to be loaded properly.)
 Some optional features and policies
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 76815232..4c122c69 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -19,6 +19,7 @@ use File::Find qw();
 use File::Basename qw(basename);
 use Time::HiRes qw(gettimeofday tv_interval);
+use constant GITWEB_CACHE_FORMAT => "Gitweb Cache Format 3";
 binmode STDOUT, ':utf8';
 if (!defined($CGI::VERSION) || $CGI::VERSION < 4.08) {
@@ -209,9 +210,25 @@ our $highlight_bin = "++HIGHLIGHT_BIN++";
 # Whether to include project list on the gitweb front page; 0 means yes,
 # 1 means no list but show tag cloud if enabled (all projects still need
-# to be scanned), 2 means no list and no tag cloud (very fast)
+# to be scanned, unless the info is cached), 2 means no list and no tag cloud
 our $frontpage_no_project_list = 0;
+# projects list cache for busy sites with many projects;
+# if you set this to non-zero, it will be used as the cached
+# index lifetime in minutes
+# the cached list version is stored in $cache_dir/$projlist_cache_name and can
+# be tweaked by other scripts running with the same uid as gitweb -
+# use this ONLY at secure installations; only single gitweb project
+# root per system is supported, unless you tweak configuration!
+our $projlist_cache_lifetime = 0; # in minutes
+# FHS compliant $cache_dir would be "/var/cache/gitweb"
+ (defined $ENV{'TMPDIR'} ? $ENV{'TMPDIR'} : '/tmp').'/gitweb';
+our $projlist_cache_name = 'gitweb.index.cache';
+our $cache_grpshared = 0;
 # information about snapshot formats that gitweb is capable of serving
 our %known_snapshot_formats = (
@@ -1243,8 +1260,13 @@ sub handle_errors_html {
 set_message(\&handle_errors_html);
+our $shown_stale_message = 0;
+our $cache_dump = undef;
+our $cache_dump_mtime = undef;
+ $shown_stale_message = 0;
 if (!defined $action) {
 $action = git_get_type($hash);
@@ -3329,29 +3351,27 @@ sub git_get_last_activity {
 if ($lastactivity_file && open($fd, "<", "$git_dir/$lastactivity_file")) {
 my $activity = <$fd>;
- return (undef, undef) unless defined $activity;
+ return (undef) unless defined $activity;
- return (undef, undef) if $activity eq '';
+ return (undef) if $activity eq '';
 if (my $timestamp = parse_activity_date($activity)) {
- my $age = time - $timestamp;
- return ($age, age_string($age));
+ return ($timestamp);
- return (undef, undef) if $quick;
+ return (undef) if $quick;
 open($fd, "-|", git_cmd(), 'for-each-ref',
 '--format=%(committer)',
 '--sort=-committerdate',
 map { "refs/$_" } get_branch_refs ()) or return;
 my $most_recent = <$fd>;
- close $fd or return;
+ close $fd or return (undef);
 if (defined $most_recent &&
 $most_recent =~ / (\d+) [-+][01]\d\d\d$/) {
- my $age = time - $timestamp;
- return ($age, age_string($age));
+ return ($timestamp);
- return (undef, undef);
 # Implementation note: when a single remote is wanted, we cannot use 'git
@@ -5682,12 +5702,99 @@ sub project_info_needs_filling {
+sub git_cache_file_format {
+ return GITWEB_CACHE_FORMAT .
+ (gitweb_check_feature('forks') ? " (forks)" : "");
+sub git_retrieve_cache_file {
+ my $cache_file = shift;
+ use Storable qw(retrieve);
+ if ((my $dump = eval { retrieve($cache_file) })) {
+ return $$dump[1] if
+ ref($dump) eq 'ARRAY' &&
+ ref($$dump[1]) eq 'ARRAY' &&
+ @{$$dump[1]} == 2 &&
+ ref(${$$dump[1]}[0]) eq 'ARRAY' &&
+ ref(${$$dump[1]}[1]) eq 'HASH' &&
+ $$dump[0] eq git_cache_file_format();
+sub git_store_cache_file {
+ my ($cache_file, $cachedata) = @_;
+ use File::Basename qw(dirname);
+ use POSIX qw(:fcntl_h);
+ use Storable qw(store_fd);
+ my $result = undef;
+ my $cache_d = dirname($cache_file);
+ my $mask = umask();
+ umask($mask & ~0070) if $cache_grpshared;
+ if ((-d $cache_d || mkdir($cache_d, $cache_grpshared ? 0770 : 0700)) &&
+ sysopen(my $fd, "$cache_file.lock", O_WRONLY|O_CREAT|O_EXCL, $cache_grpshared ? 0660 : 0600)) {
+ store_fd([git_cache_file_format(), $cachedata], $fd);
+ rename "$cache_file.lock", $cache_file;
+ $result = stat($cache_file)->mtime;
+ umask($mask) if $cache_grpshared;
+sub verify_cached_project {
+ my ($hashref, $path) = @_;
+ return undef unless $path;
+ delete $$hashref{$path}, return undef unless is_valid_project($path);
+ return $$hashref{$path} if exists $$hashref{$path};
+ # A valid project was requested but it's not yet in the cache
+ # Manufacture a minimal project entry (path, name, description)
+ # Also provide age, but only if it's available via $lastactivity_file
+ my %proj = ('path' => $path);
+ my $val = git_get_project_description($path);
+ defined $val or $val = '';
+ $proj{'descr_long'} = $val;
+ $proj{'descr'} = chop_str($val, $projects_list_description_width, 5);
+ unless ($omit_owner) {
+ $val = git_get_project_owner($path);
+ defined $val or $val = '';
+ $proj{'owner'} = $val;
+ unless ($omit_age_column) {
+ ($val) = git_get_last_activity($path, 1);
+ $proj{'age_epoch'} = $val if defined $val;
+ $$hashref{$path} = \%proj;
+sub git_filter_cached_projects {
+ my ($cache, $projlist, $verify) = @_;
+ my $hashref = $$cache[1];
+ my $sub = $verify ?
+ sub {verify_cached_project($hashref, $_[0])} :
+ sub {$$hashref{$_[0]}};
+ my $c = &$sub($_->{'path'});
+ defined $c ? ($_ = $c) : ()
 # fills project list info (age, description, owner, category, forks, etc.)
 # for each project in the list, removing invalid projects from
 # returned list, or fill only specified info.
 # Invalid projects are removed from the returned list if and only if you
-# ask 'age' or 'age_string' to be filled, because they are the only fields
+# ask 'age_epoch' to be filled, because they are the only fields
 # that run unconditionally git command that requires repository, and
 # therefore do always check if project repository is invalid.
@@ -5700,6 +5807,66 @@ sub project_info_needs_filling {
 # NOTE: modifies $projlist, but does not remove entries from it
 sub fill_project_list_info {
 my ($projlist, @wanted_keys) = @_;
+ my $rebuild = @wanted_keys && $wanted_keys[0] eq 'rebuild-cache' && shift @wanted_keys;
+ return fill_project_list_info_uncached($projlist, @wanted_keys)
+ unless $projlist_cache_lifetime && $projlist_cache_lifetime > 0;
+ my $cache_lifetime = $rebuild ? 0 : $projlist_cache_lifetime;
+ my $cache_file = "$cache_dir/$projlist_cache_name";
+ if ($cache_lifetime && -f $cache_file) {
+ $cache_mtime = stat($cache_file)->mtime;
+ $cache_dump = undef if $cache_mtime &&
+ (!$cache_dump_mtime || $cache_dump_mtime != $cache_mtime);
+ if (defined $cache_mtime && # caching is on and $cache_file exists
+ $cache_mtime + $cache_lifetime*60 > $now &&
+ ($cache_dump || ($cache_dump = git_retrieve_cache_file($cache_file)))) {
+ $cache_dump_mtime = $cache_mtime;
+ $stale = $now - $cache_mtime;
+ my $verify = ($action eq 'summary' || $action eq 'forks') &&
+ gitweb_check_feature('forks');
+ @projects = git_filter_cached_projects($cache_dump, $projlist, $verify);
+ } else { # Cache miss.
+ if (defined $cache_mtime) {
+ # Postpone timeout by two minutes so that we get
+ # enough time to do our job, or to be more exact
+ # make cache expire after two minutes from now.
+ my $time = $now - $cache_lifetime*60 + 120;
+ utime $time, $time, $cache_file;
+ my @all_projects = git_get_projects_list();
+ my %all_projects_filled = map { ( $_->{'path'} => $_ ) }
+ fill_project_list_info_uncached(\@all_projects);
+ map { $all_projects_filled{$_->{'path'}} = $_ }
+ filter_forks_from_projects_list([values(%all_projects_filled)])
+ if gitweb_check_feature('forks');
+ $cache_dump = [[sort {$a->{'path'} cmp $b->{'path'}} values(%all_projects_filled)],
+ \%all_projects_filled];
+ $cache_dump_mtime = git_store_cache_file($cache_file, $cache_dump);
+ @projects = git_filter_cached_projects($cache_dump, $projlist);
+ if ($cache_lifetime && $stale > 0) {
+ print "<div class=\"stale_info\">Cached version (${stale}s old)</div>\n"
+ unless $shown_stale_message;
+ $shown_stale_message = 1;
+sub fill_project_list_info_uncached {
+ my ($projlist, @wanted_keys) = @_;
 my $filter_set = sub { return @_; };
@@ -5710,12 +5877,12 @@ sub fill_project_list_info {
 my $show_ctags = gitweb_check_feature('ctags');
 foreach my $pr (@$projlist) {
- if (project_info_needs_filling($pr, $filter_set->('age', 'age_string'))) {
+ if (project_info_needs_filling($pr, $filter_set->('age_epoch'))) {
 my (@activity) = git_get_last_activity($pr->{'path'});
- ($pr->{'age'}, $pr->{'age_string'}) = @activity;
+ ($pr->{'age_epoch'}) = @activity;
 if (project_info_needs_filling($pr, $filter_set->('descr', 'descr_long'))) {
 my $descr = git_get_project_description($pr->{'path'}) || "";
@@ -5751,11 +5918,11 @@ sub sort_projects_list {
 return sub { $a->{$key} cmp $b->{$key} };
- sub order_num_then_undef {
+ sub order_reverse_num_then_undef {
- (defined $b->{$key} ? $a->{$key} <=> $b->{$key} : -1) :
+ (defined $b->{$key} ? $b->{$key} <=> $a->{$key} : -1) :
 (defined $b->{$key} ? 1 : 0)
@@ -5764,7 +5931,7 @@ sub sort_projects_list {
 project => order_str('path'),
 descr => order_str('descr_long'),
 owner => order_str('owner'),
- age => order_num_then_undef('age'),
+ age => order_reverse_num_then_undef('age_epoch'),
 my $ordering = $orderings{$order};
@@ -5817,6 +5984,7 @@ sub git_project_list_rows {
 $from = 0 unless defined $from;
 $to = $#$projlist if (!defined $to || $#$projlist < $to);
 for (my $i = $from; $i <= $to; $i++) {
 my $pr = $projlist->[$i];
@@ -5857,8 +6025,14 @@ sub git_project_list_rows {
 print "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
 unless ($omit_age_column) {
- print "<td class=\"". age_class($pr->{'age'}) . "\">" .
- (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n";
+ my ($age, $age_string, $age_epoch);
+ if (defined($age_epoch = $pr->{'age_epoch'})) {
+ $age = $now - $age_epoch;
+ $age_string = age_string($age);
+ $age_string = "No commits";
+ print "<td class=\"". age_class($age) . "\">" . $age_string . "</td>\n";
 print"<td class=\"link\">" .
 $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary") . " | " .
@@ -5891,7 +6065,7 @@ sub git_project_list_body {
 if ($tagfilter || $search_regexp);
 my @all_fields = ('descr', 'descr_long', 'ctags', 'category');
- push @all_fields, ('age', 'age_string') unless($omit_age_column);
+ push @all_fields, 'age_epoch' unless($omit_age_column);
 push @all_fields, 'owner' unless($omit_owner);
 @projects = fill_project_list_info(\@projects, @all_fields);
diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 1710b06f..1b7a01bb 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -655,6 +655,14 @@ div.remote {
 display: inline-block;
+ font-style: italic;
 /* JavaScript-based timezone manipulation */
 .popup { /* timezone selection UI */