Subject: [PATCH] gitweb: support caching projects list

On repo.or.cz (permanently I/O overloaded and hosting 1050 projects +
forks), the projects list (the default gitweb page) can take more than
a minute to generate. This naive patch adds simple support for caching
the projects list data structure so that all the projects do not need
to get rescanned at every page access.

A $projlist_cache_lifetime gitweb configuration variable is introduced,
by default set to zero. If set to non-zero, it describes the number of
minutes for which the cache remains valid. Only a single project root
per system can use the cache. Any script running with the same uid as
gitweb can change the cache trivially - this is for secure
installations only.

The cache itself is stored in $cache_dir/$projlist_cache_name using
Storable to store() a Perl data structure with the list of project
details. When reusing the cache, the data is retrieve()'d back into
@projects.

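The store()/retrieve() round trip described above can be sketched as
follows. This is a minimal illustration, not the code in the patch: the
format tag, the file name and the sample project entries are made up
for the example, and the patch itself writes through store_fd() on a
lock file rather than calling store() directly.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Temp qw(tempdir);

# Hypothetical format tag and sample data, for illustration only.
my $format   = "Example Cache Format 1";
my @projects = (
	{ path => 'a.git', descr => 'first' },
	{ path => 'b.git', descr => 'second' },
);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/example.cache";

# Serialize a two-element array: [format tag, payload].
store([$format, \@projects], $file);

# Read it back. The eval guards against a missing or corrupt file,
# and the tag check rejects caches written in a different format.
my $dump   = eval { retrieve($file) };
my @cached = (ref($dump) eq 'ARRAY' && @$dump == 2 && $$dump[0] eq $format)
	? @{ $$dump[1] }
	: ();

print scalar(@cached), " project entries restored\n";
```

Keeping a format tag next to the payload lets a reader discard a cache
written by an incompatible version instead of misinterpreting it.
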
To prevent contention when multiple accesses coincide with cache
expiration, the timeout is postponed to time()+120 when we start
refreshing. When showing a cached version, a disclaimer is shown
at the top of the projects list.

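The postponement works by backdating the cache file's mtime so that
mtime plus the lifetime lands 120 seconds in the future. A small sketch
of that arithmetic, assuming a hypothetical 5-minute lifetime and a
throwaway temporary file (is_fresh() is a helper invented for this
example, not a gitweb function):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $lifetime_min = 5;    # hypothetical cache lifetime, in minutes
my ($fh, $cache_file) = tempfile(UNLINK => 1);
close $fh;

# Example helper: a cache is fresh while mtime + lifetime is still
# in the future relative to the moment $at.
sub is_fresh {
	my ($file, $lifetime, $at) = @_;
	my $mtime = (stat($file))[9];
	return $mtime + $lifetime * 60 > $at ? 1 : 0;
}

my $now = time();

# Pretend the cache was written an hour ago, so it has expired.
utime $now - 3600, $now - 3600, $cache_file;
my $fresh_before = is_fresh($cache_file, $lifetime_min, $now);

# Backdate the mtime so that mtime + lifetime == now + 120.
my $postponed = $now - $lifetime_min * 60 + 120;
utime $postponed, $postponed, $cache_file;
my $fresh_after = is_fresh($cache_file, $lifetime_min, $now);
my $fresh_later = is_fresh($cache_file, $lifetime_min, $now + 121);

print "before=$fresh_before after=$fresh_after later=$fresh_later\n";
```

Other requests arriving during those two minutes see a fresh-looking
file and serve the old data instead of starting their own rescan.
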
[jn: moved from Data::Dumper to Storable for serialization of data]

The $cache_grpshared gitweb configuration variable can be set to 1 to
create the cache file as group-readable and group-writable, to
facilitate external re-generation of the cache.

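How a group-shared cache file can be written safely is sketched below,
assuming a POSIX system. The path, the payload and the $grpshared
variable (standing in for $cache_grpshared) are made up for the
example; only the general pattern of clearing the umask group bits,
creating an O_EXCL lock file and renaming it into place follows the
patch.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

my $dir        = tempdir(CLEANUP => 1);
my $cache_file = "$dir/example.cache";    # hypothetical cache path
my $grpshared  = 1;                        # stands in for $cache_grpshared

# Temporarily clear the group bits of the umask so that the 0660 mode
# passed to sysopen is not reduced by the process umask.
my $mask = umask();
umask($mask & ~0070) if $grpshared;

my ($written, $second_failed) = (0, 0);
# O_EXCL makes creation fail if another writer already holds the lock.
if (sysopen(my $fd, "$cache_file.lock", O_WRONLY | O_CREAT | O_EXCL,
            $grpshared ? 0660 : 0600)) {
	# A second writer arriving now cannot create the same lock file.
	$second_failed = sysopen(my $fd2, "$cache_file.lock",
	                         O_WRONLY | O_CREAT | O_EXCL, 0600) ? 0 : 1;
	print {$fd} "payload\n";
	close $fd;
	# rename() swaps the finished file into place in one step.
	$written = rename("$cache_file.lock", $cache_file) ? 1 : 0;
}
umask($mask) if $grpshared;

printf "mode %04o\n", (stat($cache_file))[2] & 07777 if $written;
```

Writing to a lock file first means readers never see a half-written
cache, and the exclusive create keeps two refreshers from interleaving.
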
Signed-off-by: Petr Baudis <pasky@ucw.cz>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
---
 Documentation/gitweb.conf.txt |  51 +++++++++-
 gitweb/gitweb.perl            | 212 ++++++++++++++++++++++++++++++++++++++----
 gitweb/static/gitweb.css      |   8 ++
 3 files changed, 251 insertions(+), 20 deletions(-)

diff --git a/Documentation/gitweb.conf.txt b/Documentation/gitweb.conf.txt
index 841e7dad..2ed86bf3 100644
--- a/Documentation/gitweb.conf.txt
+++ b/Documentation/gitweb.conf.txt
@@ -428,7 +428,8 @@ $frontpage_no_project_list::
 	If 0, the gitweb frontpage will contain the project list; if 1 instead,
 	it will contain just the index text, search form, tag cloud (if enabled)
 	and a link to the actual project list. The page is reduced, but all
-	projects still need to be scanned for the tag cloud construction. If the
+	projects still need to be scanned for the tag cloud construction (but
+	the project info cache is used if enabled, of course). If the
 	option is set to 2, not even the tag cloud will be shown; this is fastest.
 	This option is useful for sites with large amount of projects. The default
 	is 0.
@@ -467,6 +468,54 @@ CPU-intensive. Note also that non Git tools can have problems with
 patches generated with options mentioned above, especially when they
 involve file copies (\'-C') or criss-cross renames (\'-B').

+These configuration variables control caching in gitweb. If you don't
+run a gitweb installation on a busy site with a large number of
+repositories (projects) you probably don't need caching; by default
+caching is turned off.
+
+$projlist_cache_lifetime::
+	Lifetime of in-gitweb cache for projects list page, in minutes.
+	By default set to 0, which means that projects list caching is
+	turned off.
+
+$cache_dir::
+	The cached list version (cache of Perl structure, not of final
+	output) is stored in "$cache_dir/$projlist_cache_name". $cache_dir
+	should be writable only by processes with the same uid as gitweb
+	(usually web server uid); if $cache_dir does not exist gitweb will
+	try to create it.
+
+$projlist_cache_name::
+	The cached list version (cache of Perl structure, not of final
+	output) is stored in "$cache_dir/$projlist_cache_name". Only a single
+	gitweb project root per system is supported, unless gitweb instances
+	for different project roots have different configuration.
+
+By default $cache_dir is set to "$TMPDIR/gitweb" if the $TMPDIR
+environment variable exists, "/tmp/gitweb" otherwise.
+The default name for $projlist_cache_name is 'gitweb.index.cache'.
+
+*Note* the projects list cache file can be tweaked by other scripts
+running with the same uid as gitweb; use this ONLY at secure
+installations!!!
+
+$cache_grpshared::
+	By default, $cache_grpshared is 0 and the cache file is accessible
+	only by the webserver uid; however, when it is set to 1, it will
+	also be set group-readable and group-writable. You can use that
+	to externally trigger cache re-generation before users may have
+	a chance to trigger it (and wait a long time). For example, you
+	could use this script:
+
+----------------------------------------------------------------------
+REQUEST_METHOD=HEAD perl -e 'do "./gitweb.cgi"; END {
+  fill_project_list_info([], "rebuild-cache") }' >/dev/null 2>&1
+----------------------------------------------------------------------
+
+(You need to run it in the directory of gitweb.cgi and, if
+gitweb_config.perl is not located in that same directory, also
+set GITWEB_CONFIG for gitweb_config.perl to be loaded properly.)
+

 Some optional features and policies
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 76815232..4c122c69 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -19,6 +19,7 @@ use File::Find qw();
 use File::Basename qw(basename);
 use Time::HiRes qw(gettimeofday tv_interval);
 use Time::Local;
+use constant GITWEB_CACHE_FORMAT => "Gitweb Cache Format 3";
 binmode STDOUT, ':utf8';

 if (!defined($CGI::VERSION) || $CGI::VERSION < 4.08) {
@@ -209,9 +210,25 @@ our $highlight_bin = "++HIGHLIGHT_BIN++";

 # Whether to include project list on the gitweb front page; 0 means yes,
 # 1 means no list but show tag cloud if enabled (all projects still need
-# to be scanned), 2 means no list and no tag cloud (very fast)
+# to be scanned, unless the info is cached), 2 means no list and no tag cloud
+# (very fast)
 our $frontpage_no_project_list = 0;

+# projects list cache for busy sites with many projects;
+# if you set this to non-zero, it will be used as the cached
+# index lifetime in minutes
+#
+# the cached list version is stored in $cache_dir/$projlist_cache_name and
+# can be tweaked by other scripts running with the same uid as gitweb -
+# use this ONLY at secure installations; only single gitweb project
+# root per system is supported, unless you tweak configuration!
+our $projlist_cache_lifetime = 0; # in minutes
+# FHS compliant $cache_dir would be "/var/cache/gitweb"
+our $cache_dir =
+	(defined $ENV{'TMPDIR'} ? $ENV{'TMPDIR'} : '/tmp').'/gitweb';
+our $projlist_cache_name = 'gitweb.index.cache';
+our $cache_grpshared = 0;
+
 # information about snapshot formats that gitweb is capable of serving
 our %known_snapshot_formats = (
 	# name => {
@@ -1243,8 +1260,13 @@ sub handle_errors_html {
 }
 set_message(\&handle_errors_html);

+our $shown_stale_message = 0;
+our $cache_dump = undef;
+our $cache_dump_mtime = undef;
+
 # dispatch
 sub dispatch {
+	$shown_stale_message = 0;
 	if (!defined $action) {
 		if (defined $hash) {
 			$action = git_get_type($hash);
@@ -3329,29 +3351,27 @@ sub git_get_last_activity {
 	if ($lastactivity_file && open($fd, "<", "$git_dir/$lastactivity_file")) {
 		my $activity = <$fd>;
 		close $fd;
-		return (undef, undef) unless defined $activity;
+		return (undef) unless defined $activity;
 		chomp $activity;
-		return (undef, undef) if $activity eq '';
+		return (undef) if $activity eq '';
 		if (my $timestamp = parse_activity_date($activity)) {
-			my $age = time - $timestamp;
-			return ($age, age_string($age));
+			return ($timestamp);
 		}
 	}
-	return (undef, undef) if $quick;
+	return (undef) if $quick;
 	open($fd, "-|", git_cmd(), 'for-each-ref',
 	     '--format=%(committer)',
 	     '--sort=-committerdate',
 	     '--count=1',
 	     map { "refs/$_" } get_branch_refs ()) or return;
 	my $most_recent = <$fd>;
-	close $fd or return;
+	close $fd or return (undef);
 	if (defined $most_recent &&
 	    $most_recent =~ / (\d+) [-+][01]\d\d\d$/) {
 		my $timestamp = $1;
-		my $age = time - $timestamp;
-		return ($age, age_string($age));
+		return ($timestamp);
 	}
-	return (undef, undef);
+	return (undef);
 }

 # Implementation note: when a single remote is wanted, we cannot use 'git
@@ -5682,12 +5702,99 @@ sub project_info_needs_filling {
 	return;
 }

+sub git_cache_file_format {
+	return GITWEB_CACHE_FORMAT .
+		(gitweb_check_feature('forks') ? " (forks)" : "");
+}
+
+sub git_retrieve_cache_file {
+	my $cache_file = shift;
+
+	use Storable qw(retrieve);
+
+	if ((my $dump = eval { retrieve($cache_file) })) {
+		return $$dump[1] if
+			ref($dump) eq 'ARRAY' &&
+			@$dump == 2 &&
+			ref($$dump[1]) eq 'ARRAY' &&
+			@{$$dump[1]} == 2 &&
+			ref(${$$dump[1]}[0]) eq 'ARRAY' &&
+			ref(${$$dump[1]}[1]) eq 'HASH' &&
+			$$dump[0] eq git_cache_file_format();
+	}
+
+	return undef;
+}
+
+sub git_store_cache_file {
+	my ($cache_file, $cachedata) = @_;
+
+	use File::Basename qw(dirname);
+	use File::stat;
+	use POSIX qw(:fcntl_h);
+	use Storable qw(store_fd);
+
+	my $result = undef;
+	my $cache_d = dirname($cache_file);
+	my $mask = umask();
+	umask($mask & ~0070) if $cache_grpshared;
+	if ((-d $cache_d || mkdir($cache_d, $cache_grpshared ? 0770 : 0700)) &&
+	    sysopen(my $fd, "$cache_file.lock", O_WRONLY|O_CREAT|O_EXCL, $cache_grpshared ? 0660 : 0600)) {
+		store_fd([git_cache_file_format(), $cachedata], $fd);
+		close $fd;
+		rename "$cache_file.lock", $cache_file;
+		$result = stat($cache_file)->mtime;
+	}
+	umask($mask) if $cache_grpshared;
+	return $result;
+}
+
+sub verify_cached_project {
+	my ($hashref, $path) = @_;
+	return undef unless $path;
+	delete $$hashref{$path}, return undef unless is_valid_project($path);
+	return $$hashref{$path} if exists $$hashref{$path};
+
+	# A valid project was requested but it's not yet in the cache
+	# Manufacture a minimal project entry (path, name, description)
+	# Also provide age, but only if it's available via $lastactivity_file
+
+	my %proj = ('path' => $path);
+	my $val = git_get_project_description($path);
+	defined $val or $val = '';
+	$proj{'descr_long'} = $val;
+	$proj{'descr'} = chop_str($val, $projects_list_description_width, 5);
+	unless ($omit_owner) {
+		$val = git_get_project_owner($path);
+		defined $val or $val = '';
+		$proj{'owner'} = $val;
+	}
+	unless ($omit_age_column) {
+		($val) = git_get_last_activity($path, 1);
+		$proj{'age_epoch'} = $val if defined $val;
+	}
+	$$hashref{$path} = \%proj;
+	return \%proj;
+}
+
+sub git_filter_cached_projects {
+	my ($cache, $projlist, $verify) = @_;
+	my $hashref = $$cache[1];
+	my $sub = $verify ?
+		sub {verify_cached_project($hashref, $_[0])} :
+		sub {$$hashref{$_[0]}};
+	return map {
+		my $c = &$sub($_->{'path'});
+		defined $c ? ($_ = $c) : ()
+	} @$projlist;
+}
+
 # fills project list info (age, description, owner, category, forks, etc.)
 # for each project in the list, removing invalid projects from
 # returned list, or fill only specified info.
 #
 # Invalid projects are removed from the returned list if and only if you
-# ask 'age' or 'age_string' to be filled, because they are the only fields
+# ask 'age_epoch' to be filled, because they are the only fields
 # that run unconditionally git command that requires repository, and
 # therefore do always check if project repository is invalid.
 #
@@ -5700,6 +5807,66 @@ sub project_info_needs_filling {
 # NOTE: modifies $projlist, but does not remove entries from it
 sub fill_project_list_info {
 	my ($projlist, @wanted_keys) = @_;
+
+	my $rebuild = @wanted_keys && $wanted_keys[0] eq 'rebuild-cache' && shift @wanted_keys;
+	return fill_project_list_info_uncached($projlist, @wanted_keys)
+		unless $projlist_cache_lifetime && $projlist_cache_lifetime > 0;
+
+	use File::stat;
+
+	my $cache_lifetime = $rebuild ? 0 : $projlist_cache_lifetime;
+	my $cache_file = "$cache_dir/$projlist_cache_name";
+
+	my @projects;
+	my $stale = 0;
+	my $now = time();
+	my $cache_mtime;
+	if ($cache_lifetime && -f $cache_file) {
+		$cache_mtime = stat($cache_file)->mtime;
+		$cache_dump = undef if $cache_mtime &&
+			(!$cache_dump_mtime || $cache_dump_mtime != $cache_mtime);
+	}
+	if (defined $cache_mtime && # caching is on and $cache_file exists
+	    $cache_mtime + $cache_lifetime*60 > $now &&
+	    ($cache_dump || ($cache_dump = git_retrieve_cache_file($cache_file)))) {
+		# Cache hit.
+		$cache_dump_mtime = $cache_mtime;
+		$stale = $now - $cache_mtime;
+		my $verify = ($action eq 'summary' || $action eq 'forks') &&
+			gitweb_check_feature('forks');
+		@projects = git_filter_cached_projects($cache_dump, $projlist, $verify);
+
+	} else { # Cache miss.
+		if (defined $cache_mtime) {
+			# Postpone timeout by two minutes so that we get
+			# enough time to do our job, or to be more exact
+			# make cache expire after two minutes from now.
+			my $time = $now - $cache_lifetime*60 + 120;
+			utime $time, $time, $cache_file;
+		}
+		my @all_projects = git_get_projects_list();
+		my %all_projects_filled = map { ( $_->{'path'} => $_ ) }
+			fill_project_list_info_uncached(\@all_projects);
+		map { $all_projects_filled{$_->{'path'}} = $_ }
+			filter_forks_from_projects_list([values(%all_projects_filled)])
+			if gitweb_check_feature('forks');
+		$cache_dump = [[sort {$a->{'path'} cmp $b->{'path'}} values(%all_projects_filled)],
+			       \%all_projects_filled];
+		$cache_dump_mtime = git_store_cache_file($cache_file, $cache_dump);
+		@projects = git_filter_cached_projects($cache_dump, $projlist);
+	}
+
+	if ($cache_lifetime && $stale > 0) {
+		print "<div class=\"stale_info\">Cached version (${stale}s old)</div>\n"
+			unless $shown_stale_message;
+		$shown_stale_message = 1;
+	}
+
+	return @projects;
+}
+
+sub fill_project_list_info_uncached {
+	my ($projlist, @wanted_keys) = @_;
 	my @projects;
 	my $filter_set = sub { return @_; };
 	if (@wanted_keys) {
@@ -5710,12 +5877,12 @@ sub fill_project_list_info {
 	my $show_ctags = gitweb_check_feature('ctags');
  PROJECT:
 	foreach my $pr (@$projlist) {
-		if (project_info_needs_filling($pr, $filter_set->('age', 'age_string'))) {
+		if (project_info_needs_filling($pr, $filter_set->('age_epoch'))) {
 			my (@activity) = git_get_last_activity($pr->{'path'});
 			unless (@activity) {
 				next PROJECT;
 			}
-			($pr->{'age'}, $pr->{'age_string'}) = @activity;
+			($pr->{'age_epoch'}) = @activity;
 		}
 		if (project_info_needs_filling($pr, $filter_set->('descr', 'descr_long'))) {
 			my $descr = git_get_project_description($pr->{'path'}) || "";
@@ -5751,11 +5918,11 @@ sub sort_projects_list {
 		return sub { $a->{$key} cmp $b->{$key} };
 	}

-	sub order_num_then_undef {
+	sub order_reverse_num_then_undef {
 		my $key = shift;
 		return sub {
 			defined $a->{$key} ?
-				(defined $b->{$key} ? $a->{$key} <=> $b->{$key} : -1) :
+				(defined $b->{$key} ? $b->{$key} <=> $a->{$key} : -1) :
 				(defined $b->{$key} ? 1 : 0)
 		}
 	}
@@ -5764,7 +5931,7 @@ sub sort_projects_list {
 		project => order_str('path'),
 		descr => order_str('descr_long'),
 		owner => order_str('owner'),
-		age => order_num_then_undef('age'),
+		age => order_reverse_num_then_undef('age_epoch'),
 	);

 	my $ordering = $orderings{$order};
@@ -5817,6 +5984,7 @@ sub git_project_list_rows {
 	$from = 0 unless defined $from;
 	$to = $#$projlist if (!defined $to || $#$projlist < $to);

+	my $now = time;
 	my $alternate = 1;
 	for (my $i = $from; $i <= $to; $i++) {
 		my $pr = $projlist->[$i];
@@ -5857,8 +6025,14 @@ sub git_project_list_rows {
 			print "<td><i>" . chop_and_escape_str($pr->{'owner'}, 15) . "</i></td>\n";
 		}
 		unless ($omit_age_column) {
-			print "<td class=\"". age_class($pr->{'age'}) . "\">" .
-			    (defined $pr->{'age_string'} ? $pr->{'age_string'} : "No commits") . "</td>\n";
+			my ($age, $age_string, $age_epoch);
+			if (defined($age_epoch = $pr->{'age_epoch'})) {
+				$age = $now - $age_epoch;
+				$age_string = age_string($age);
+			} else {
+				$age_string = "No commits";
+			}
+			print "<td class=\"". age_class($age) . "\">" . $age_string . "</td>\n";
 		}
 		print"<td class=\"link\">" .
 		      $cgi->a({-href => href(project=>$pr->{'path'}, action=>"summary")}, "summary") . " | " .
@@ -5891,7 +6065,7 @@ sub git_project_list_body {
 		if ($tagfilter || $search_regexp);
 	# fill the rest
 	my @all_fields = ('descr', 'descr_long', 'ctags', 'category');
-	push @all_fields, ('age', 'age_string') unless($omit_age_column);
+	push @all_fields, 'age_epoch' unless($omit_age_column);
 	push @all_fields, 'owner' unless($omit_owner);
 	@projects = fill_project_list_info(\@projects, @all_fields);

diff --git a/gitweb/static/gitweb.css b/gitweb/static/gitweb.css
index 1710b06f..1b7a01bb 100644
--- a/gitweb/static/gitweb.css
+++ b/gitweb/static/gitweb.css
@@ -655,6 +655,14 @@ div.remote {
 	display: inline-block;
 }

+div.stale_info {
+	display: block;
+	text-align: right;
+	font-style: italic;
+	margin-top: 6px;
+	margin-right: 8px;
+}
+
 /* JavaScript-based timezone manipulation */

 .popup { /* timezone selection UI */