Added Test directory and TestUpload.html
[CGIscriptor.git] / CGIservlet.pl
blob313c328bab0b5bd8d1ce97fe2242c629a2fd0be5
1 #! /usr/bin/perl
3 # Put the full path to perl on the first line, run the program with
4 # `perl CGIservlet`, or put a symbolic link to perl in
5 # the startup directory if you need a special version of perl.
7 if(grep(/\-\-help/i, @ARGV))
9 print << 'ENDOFHELPTEXT';
10 # CGIservlet:
11 # A HTTPd "connector" for running CGI scripts on unix systems as WWW
12 # accessible Web sites. The servlet starts a true HTTP daemon that channels
13 # HTTP requests to forked daughter processes. CGIservlet.pl is NOT a
14 # full fledged server. Moreover, this servlet is definitely NOT intended
15 # as a replacement of a real server (e.g., Apache). It's design goal was
16 # SIMPLICITY, and not mileage.
18 # Note that a HTTP server can be accessed on your local machine WITHOUT
19 # internet access (but WITH a DNS?):
20 # use "http://localhost[:port]/[path]" or "http://127.0.0.1[:port]/[path]"
21 # as the URL. It is also easy to restrict access to the servlet to localhost
22 # users (i.e., the computer running the servlet).
24 # Suggested uses:
25 # - A testbed for CGI-scripts and document-trees outside the primary server.
26 # When developing new scripts and services, you don't want to mess up your
27 # current Web-site. CGIservlet is an easy way to start a temporary (private)
28 # server. CGIservlet allows to test separate HTTP server components, e.g.,
29 # user authentication, in isolation.
31 # - A special purpose temporary server (WWW everywhere/anytime).
32 # We run identification and other experiments over the inter-/intra-net using
33 # CGI-scripts. This means a lot of development and changes and only little
34 # actual run-time. The people doing this do not want "scripting" access to our
35 # departmental server with all its restrictions and security. So we need a
36 # small, lightweigth, easy-to-configure server that can be run by each
37 # investigator on her own account (and risk).
39 # - Interactive WWW presentations.
40 # Not everyone is content with the features of "standard" office presentation
41 # software. HTML and its associated browsers are an alternative (especially
42 # under Linux). However, you need a server to realize the full interactive
43 # nature of the WWW. CGIservlet with the necessary scripts can be run from
44 # a floppie (a Web server in 100 kB). The CGIservlet can actually run a
45 # (small) web site from RAM, without disk access (if you DO NOT use the
46 # 2>pid.log redirection on startup).
47 # With the "localhost" or "127.0.0.1" id in your browser you can use the
48 # servlet standalone.
50 # When the servlet is started with the -r option, only requests from "localhost"
51 # or "127.0.0.1" are accepted (default) or from addresses indicated after the
52 # -r switch.
54 # Running demo's and more information can be found at
55 # http://www.fon.hum.uva.nl/rob/OSS/OSS.html
58 ############################################################################
60 # Changes (document ALL changes with date, name and email here):
62 # 22 Jul 2003 - Plain output using binary print io. `cat ...`
63 # 22 Jul 2003 - Added 'use CGI::Carp qw(fatalsToBrowser);' line
64 # for debugging. Standard this is commented out
65 # for security reasons (suggested by Jochen_Hayek@ACM.org).
66 # 22 Jul 2003 - Added error checking to doarg (suggested by Jochen_Hayek@ACM.org)
67 # 22 Jul 2003 - Removed SERVER_PORT from HTTP_HOST (Bug found by Jochen_Hayek@ACM.org)
68 # 22 Jul 2003 - Updated documentation. Added CGIservlet directory
69 # to the search path of CGIservletSETUP.pl
70 # 20 May 2003 - Made sure recycled (double) pid's do not mess up the
71 # @brood list and added a --help switch.
72 # 20 May 2003 - Added a maximum running time for child processes
73 # with command line switch -xterm.
74 # 15 Jan 2002 - Version 1.3
75 # 19 Oct 2001 - Included browsing of directories and a new -s
76 # security switch. With security toggled of
77 # directories can be browsed and all mime-types
78 # are served, either as 'text/plain' or as
79 # 'application/octed-stream'.
80 # 18 May 2001 - Added some HTTP header lines.
81 # 13 Jun 2000 - Included the possibility to add POST request
82 # to GET query-strings (and change the request
83 # method). The -l ($Maxlength) maximum length
84 # option now covers POST requests too.
85 # 8 Dec 1999 - Included hooks for compression when running from RAM.
86 # 2 Dec 1999 - Autoflush enabled.
87 # 2 Dec 1999 - Allow running a Web Site from RAM.
88 # 2 Dec 1999 - Changed the behavior of CGIservletSETUP. CGIservlet
89 # will eval ALL setup files, the one in the CGIscriptor
90 # subdirectory (if any) AND the one in the current
91 # directory. (also added a close(SETUP) command)
92 # 26 Nov 1999 - Added some minimal security for 'automatic', out of
93 # the box installation.
94 # 26 Nov 1999 - Made the text/osshell mime-type functional (i.e.,
95 # without any scripts, implement a dynamic web server)
96 # Linited to '.cgi' extension.
97 # 26 Nov 1999 - Added aliasing of URL paths, both one-to-one lookups
98 # and full regular expression, i.e., $Path =~ s/.../.../g
99 # replace commands
100 # 28 Sep 1999 - Made all client supplied HTTP parameter names lowercase
101 # to handle inconsistencies in case use.
102 # 29 Jul 1999 - Allowed for a SETUP configuration file 'CGIservletSETUP.pl'.
103 # Use $beginarg from the 'CGIscriptor/' directory if it exists.
104 # (R.J.J.H.vanSon@uva.nl)
107 ############################################################################
109 # Known bugs
111 # 23 Mar 2000 - An odd server side network error is reported by Netscape
112 # when a Post is initiated from a Javascript Submit of a
113 # <FORM>. This was found on Red Hat 6.1 Linux with perl 5.00503,
114 # 5.00503 and 5.6.0. But not on IRIX or Red Hat 5.0, 7.x.
116 ############################################################################
119 # Inner workings:
120 # Whenever an HTTP request is received, the specified CGI script is
121 # started inside a child process as if it was inside a real server (e.g.,
122 # Apache). The evironment variables are set more or less as in Apache.
123 # Note that CGIservlet only uses a SINGLE script for ALL requests.
124 # No attemps for security are made, it is the script's responsibility to
125 # check access rights and the validity of the request.
126 # When no scripts are given, CGIservlet runs as a bare bone WWW server
127 # configurable to execute scripts (the default setting is as a
128 # STATIC server).
130 # Author and copyright (c) :
131 # Rob van Son
132 # email:
133 # R.J.J.H.vanSon@uva.nl
134 # Institute of Phonetic Sciences/ACLC
135 # University of Amsterdam
137 # copying freely from the mhttpd server by Jerry LeVan (levan@eagle.eku.edu)
138 # Date: July 22, 2003
139 # Version:1.301
140 # Env: Perl 5.002 and later
143 ################################################################################
145 # LICENSE #
147 # This program is free software; you can redistribute it and/or #
148 # modify it under the terms of the GNU General Public License #
149 # as published by the Free Software Foundation; either version 2 #
150 # of the License, or (at your option) any later version. #
152 # This program is distributed in the hope that it will be useful, #
153 # but WITHOUT ANY WARRANTY; without even the implied warranty of #
154 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the #
155 # GNU General Public License for more details. #
157 # You should have received a copy of the GNU General Public License #
158 # along with this program; if not, write to the Free Software #
159 # Foundation, Inc., 59 Temple Place - Suite 330, #
160 # Boston, MA 02111-1307, USA. #
162 ################################################################################
164 # Note: CGIservlet.pl was directly inspired by Jerry LeVan's
165 # (levan@eagle.eku.edu) simple mhttpd server which again was
166 # inspired by work of others. CGIservlet is used as a bare bones
167 # socket server for a single CGI script at a time.
169 # Use: CGIservlet.pl -<switch> <argument> 2>pid.log & (sh/bash)
170 # CGIservlet.pl -<switch> <argument> >&pid.log & (csh)
172 # The servlet prints out pid and port number on STDERR. It is
173 # adviced to store these in a separate file (this will become the
174 # error log).
175 # NOTE: When running CGIservlet from a Memmory Image (i.e. RAM),
176 # do NOT redirect the error output to a file, but use something
177 # like MAILTO or /dev/null!
179 # Stop: sh pid.log (kills the server process)
181 # The first line in the file that receives STDERR output is a command
182 # to stop CGIservlet.
184 # examples:
185 # CGIservlet.pl -p 2345 -d /cgi-bin/CGIscriptor.pl -t /WWW 2>pid.log &
186 # CGIservlet.pl -p 8080 -b 'require "CGIscriptor.pl";' -t $PWD -e \
187 # 'Handle_Request();' 2>pid.log &
189 # The following example settings implement a static WWW server using 'cat'
190 # (and prohibiting Queries):
191 # -p 8008
192 # -t `pwd`
193 # -b ''
194 # -e
195 # '$ENV{QUERY_STRING}="";$ENV{PATH_INFO}=~/\.([\w]+)$/; "Content-type: ".$mimeType{uc($1)}."\n\n";'
196 # -d 'cat -u -s'
197 # -w '/index.html'
198 # -c 32
199 # -l 512
201 # This is identical to the (static) behaviour of CGIservlet when
202 # -e '' -d '' -x '' is used.
203 # The CGIservlet command should be run from the intended server-root directory.
205 # Another setting will use a package 'CGIscriptor.pl' with a function
206 # 'HandleRequest()' to implement an interactive WWW server with inline
207 # Perl scripting:
208 # -p 8080
209 # -t `pwd`
210 # -b 'require "CGIscriptor.pl";'
211 # -e 'HandleRequest();'
212 # -d ''
213 # -w '/index.html'
214 # -c 32
215 # -l 32767
217 # Look below or in the CGIservletSETUP.pl file for the current default
218 # settings.
221 # ###############################################################################
223 # There are many switches to tailor the workings of CGIservlet.pl.
224 # Some are fairly esoteric and you should only look for them if you
225 # need something special urgently. When building a Web site,
226 # the specific options you need will "suggest" themselves (e.g., port
227 # number, script, or server-root directory). Most default settings
228 # should work fine.
230 # You can add your own configuration in a file called
231 # 'CGIservletSETUP.pl'. This file will be executed ("eval"-ed)
232 # after the default setup, but before the command line options take
233 # effect. CGIservlet looks for the SETUP file in the startup directory
234 # and in the CGIscriptor subdirectory.
235 # (Note that the $beginarg variable is evaluated AFTER the setup file).
237 # In any case, it is best to change the default settings instead of
238 # using the option switches. All defaults are put in a single block.
240 # switches and arguments:
241 # Realy important
242 # -p[ort] port number
243 # For example -p 2345
244 # Obviously the port CGIservlet listenes to. Suggested Default: -p 8008
246 # -a[lias] Alias1 RealURL1 ...
247 # For example -a '/Stimulus.aifc' '/catAIFC.xmr'
248 # Replaces the given Alias URL path by its real URL path. Accepts full
249 # regular expressions too (identified by NON-URL characters).
250 # That is, on each request it performs (in order):
251 # if($AliasTranslation{$Path})
252 # {
253 # $Path = $AliasTranslation{$Path};
255 # elsif(@RegAliasTranslation)
256 # {
257 # my $i;
258 # for($i=0; $i<scalar(@RegAliasTranslation); ++$i)
259 # {
260 # my $Alias = $RegAliasTranslation[$i];
261 # my $RealURL = $RegURLTranslation[$i];
262 # last if ($Path =~ s#$Alias#$RealURL#g);
263 # };
264 # };
265 # The effects can be quite drastic, so be
266 # carefull. Note also, that entering many Regular Expression
267 # aliases could slow down your servlet. Checking stops after
268 # the first match.
269 # Full regular expression alias translations are done in the
270 # order given! They are recognized as Aliases containing
271 # regexp's (i.e., non-URL) operator characters like '^' and
272 # '$'.
273 # Note: The command line is NOT a good place for entering
274 # Aliases, change the code below or add aliases to
275 # CGIservletSETUP.pl.
277 # --help
278 # Prints the manual
280 # Script related
281 # -b[egin] perl commands
282 # For example -b 'require "CGIscriptor.pl";' or
283 # 'require "/WWW/cgi-bin/XMLelement.pl";'
284 # Perl commands evaluated at server startup
286 # -d[o] perl script file
287 # For example -d '/WWW/cgi-bin/CGIscriptor.pl'
288 # The actual CGI-script started as a perl {do "scriptfile"} command.
289 # The PATH_INFO and the QUERY are pushed on @ARGV.
291 # -x shell command
292 # -qx shell command
293 # -exec shell command
294 # OS shell script or command, e.g., -x 'CGIscriptor.pl' or
295 # -x '/WWW/cgi-bin/my-script'
296 # The actual CGI-script started as `my-script \'$Path\' \'$QueryString\'`.
297 # -qx and -exec[ute] are aliases of -x. For security reasons, Paths or
298 # queries containing '-quotes are rejected.
300 # -e[val] perl commands
301 # For example -e 'Handle_Request();'
302 # The argument is evaluated as perl code. The actual CGI-script
303 # can be loaded once with -b 'require module.pm' and you only have to
304 # call the central function(s).
306 # WWW-tree related
307 # -t[extroot] path
308 # For example -t "$PWD" or -t "/WWW/documents"
309 # The root of the server hierachy. Defaults to the working directory
310 # at startup time (`pwd`)
312 # -w[elcome] filepath
313 # For example -w "/index.html" (default)
314 # The default welcome page used when no path is entered. Note that
315 # this path can point to anything (or nothing real at all).
317 # Security related
318 # The following arguments supply some rudimentary security. It is the
319 # responsibility of the script to ensure that the requests are indeed
320 # "legal".
322 # -c[hildren] maximum child processes
323 # For example -c 32
324 # The maximum number of subprocesses started. If there are more requests,
325 # the oldest requests are "killed". This should take care of "zombie"
326 # processes and server overloading. Note that new requests will be
327 # serviced irrespective of the success of killing it's older siblings.
329 # -xtime maximum running time of a child
330 # For example -xtime 36000
331 # The maximum time a child may run in seconds. After a new request has
332 # been servised, all children that have run for longer than this time
333 # will be killed. This stops runaway processes, often connected to
334 # web-crawlers.
336 # -l[ength] maximum length of HTTP request in bytes
337 # For example -l 32768
338 # This prevents overloading the server with enormous queries. Reading of
339 # requests simply stops when this limit is reached. This DOES affect
340 # POST requests. If the combined length of the COMPLETE HTTP request,
341 # including headers, exceeds this limit, the whole request is dropped.
343 # -r[estrict] [Remote-address [Remote-host]]
344 # For example -r 127.0.0.1 (default of -r)
345 # A space separated list of client IP addresses and/or domain names that
346 # should be serviced. Default, i.e., '-r' without any addresses or domain
347 # names, is the localhost IP address '127.0.0.1'.
348 # When using CGIservlet for local purposes only (e.g., development or a
349 # presentation), it would be unsafe to allow others to access the servlet.
350 # If -r is used (or the corresponding @RemoteAddr or @RemoteHost lists are
351 # filled in the code below), all requests from clients whose Remote-address
352 # or Remote-host do not match the indicated addresses will be rejected.
353 # Partial addresses and domain names are allowed. Matching is done according
354 # to Remote-addr =~ /^\Q$pattern\E/ (front to back) and
355 # Remote-host =~ /\Q$pattern\E$/ (back to front)
357 # -s[ecure]
358 # No arguments.
359 # A toggle switch that blocks all access to files with undefined
360 # mime-types (or to serve ascii files as "text/plain"), and blocking directory
361 # browsing. Defaults to blocking what is not explicitely allowed.
363 # -m[emory]
364 # No arguments.
365 # Reads complete Web site into memory and runs from this image.
366 # Set $UseRAMimage = 1; to activate memory-only running.
367 # Under certain circumstance, this can improve security.
368 # Note, however, that running osshellscripts from this image
369 # makes any "security" related claims very shaky.
371 # Speedup
372 # -n[oname]
373 # No arguments.
374 # Retrieving the domain name of the Client (i.e., Remote-host) is a
375 # very slow process and normally useless. To skip it, enter this
376 # option. Note that you cannot use '-r Remote-host' anymore after
377 # you enter -n, only IP addresses will work.
379 # Configuration with the CGIservletSETUP.pl file
381 # You can add your own configuration in a file
382 # called 'CGIservletSETUP.pl'. This file will be executed ("eval"-ed)
383 # after the default setup, but before the command line options take
384 # effect. CGIservlet looks for the SETUP file in the startup directory
385 # and in the CGIservlet and CGIscriptor subdirectories.
386 # (Note that the $beginarg variable is evaluated even later).
388 # Changing POST to GET requests
390 # CGIservlet normally only handles requests with the GET method. Processing
391 # the input from POST requests is left to the reading application. POST
392 # requests add some extra complexity to processing requests. Sometimes,
393 # the reading application doesn't handle POST requests. CGIservlet
394 # already has to manage the HTTP request. Therefore, it can easily
395 # handle the POST request. If the variable $POSTtoGET is set to any
396 # non-false value, the content of whole POST request is added to the
397 # QUERY_STRING environment variable (preceeded by a '&' if necessary).
398 # The content-length is set to 0. If $POSTtoGET equals 'GET', the method
399 # will also be changed to 'GET'.
401 # remarks:
402 # All of the arguments of -d, -e, and -x are processed sequentially
403 # in this order. This might not be what you want so you should be
404 # carefull when using multiple executable arguments.
405 # If none of the executable arguments is DEFINED (i.e., they are entered
406 # as -d '' -e '' -x ''), each request is treated as a simple
407 # text-retrieval. THIS CAN BE A SECURITY RISK!
409 # The wiring of an interactive web-server, which also calls shell
410 # scripts with the extension '.cgi', is in place. You can
411 # "activate" it by changing the "my $ExecuteOSshell = 0;" line to
412 # "my $ExecuteOSshell = 1;".
413 # If you have trouble doing this, it might be a good idea
414 # to reconsider using a dynamic web server. Executing shell
415 # scripts inside a web server is a rather dangerous practise.
417 # CGIservlet can run its "standard" web server from memory.
418 # At startup, all files are read into a hash table. Upon
419 # request, the contents of the file are placed in the
420 # environment variable: CGI_FILE_CONTENTS.
421 # No further disk access is necessary. This means that:
422 # 1 CGIservlet can run a WWW site from a removable disk,
423 # e.g., a floppy
424 # 2 The web servlet can run without any read or write privilege.
425 # 3 The integrity of the Web-site contents can be secured at the
426 # level you want
428 # To compres the memory (RAM) immage, you should hook the
429 # compression function to
430 # $CompressRAMimage = sub { return shift;};
431 # and the decompression function to
432 # $DecompressRAMimage = sub { return shift;};
435 ENDOFHELPTEXT
436 exit;
438 ###################################################################################
440 require 5.002;
441 use strict; # Should realy be used!
442 use Socket;
443 use Carp; # could come in handy (can be missed, I think)
445 # For debugging: uncommenting the use-line below will send
446 # nicely formanted output to the client. However, it is
447 # generally not a good idea to enable clients to test your
448 # scripts and look for holes (SECURITY).
449 # use CGI::Carp qw(fatalsToBrowser);
451 $| = 1; # Autoflush (i'm not sure whether this is usefull)
453 my $version = "1.301";
454 my $program = "CGIservlet.pl";
456 ##################################################################
458 # print some information to STDERR, e.g., the process number #
460 ##################################################################
461 sub logmsg { print STDERR "kill -KILL $$;exit;\n", # Stop CGIservlet
462 "$0 $$: @_ at ", scalar localtime, "\n" }
464 ############################################################
466 # Parse arguments (you can define DEFAULT VALUES here) #
468 ############################################################
470 my $port = 8008; # The port number
472 # Add POST requests to the QUERY_STRING, change method to
473 # GET if the value is 'GET'
474 my $POSTtoGET = 0; # Add POST requests to the query string
476 # (Fast) direct translation of full URL paths
477 my %AliasTranslation = (); # Alias => RealURL pairs (ONLY paths)
478 # Regular expression alias translation, in order of application
479 # (this can be quite slow)
480 my @RegAliasTranslation = ('(^|.*/)CVS(/.*|$)'); # Full regular expression alias/url pairs: URL
481 my @RegURLTranslation = ('/index.html'); # Full regular expression alias/url pairs: PATH
483 my $textroot = $ENV{'PWD'} || `pwd`; # current working directory
484 chomp($textroot); # Remove nasty newline, if present
485 my $doarg = ''; # do "filename",
487 my $beginarg = ''; # eval($Argument) at the start of the program
488 my $evalarg = ''; # eval($Argument) for each request
489 my $execarg = ''; # execute `command \'$textroot$Path\' \'$QueryString\'`
491 my $welcome = '/index.html'; # Default path
493 # Rudimentary security, overflow detection
494 my $MaxBrood = 32; # Maximum number of running children
495 my $MaxTime = 36000; # Maximum time a child may run in seconds
496 my $MaxLength = 2**15; # Maximum Request Length
497 my $Secure = 1; # Block browsing directories and text files or not
499 # If one of the following lists contains any client addresses or names, all others are
500 # blocked (be carefull, your site will be inaccessible if you misspell them).
501 my @RemoteHost = (); # Accepted Hosts, suggest: localhost
502 my @RemoteAddr = (); # Accepted IP addresses, suggest: @RemoteAddr=('127.0.0.1')
503 my $DefaultRemoteAddr = '127.0.0.1'; # default, use localhost IP address
504 my $NONAME = 0; # if 1, do NOT ask for REMOTE_HOST (faster)
506 # Store the whole Web Site in a hash table and use this RAM memory image (if non-zero)
507 my $UseRAMimage = 0;
508 # Empty function handlers for data compression
509 # In general, these must be redefined in the $beginarg
510 my $CompressRAMimage = sub { return shift;};
511 my $DecompressRAMimage = sub { return shift;};
513 # Execute shell CGI scripts when no -d, -e, or -x are supplied
514 my $ExecuteOSshell = 0; # Do you REALY want this? It is dangerous
516 #################################################################
518 # Configure CGIservlet with a setup file (overides the #
519 # default settings, but not the command line options). #
520 # Note that, if it exists, the setup file in the CGIscriptor #
521 # subdirectory is processed EVEN if there is a SETUP file #
522 # in the current directory. #
524 #################################################################
525 # There exists a CGIservlet subdirectory and it contains
526 # a CGIservletSETUP.pl file
527 if((-e './CGIservlet/CGIservletSETUP.pl') &&
528 open(SETUP, '<./CGIservlet/CGIservletSETUP.pl'))
530 # Get the setup code
531 my $SetupCode = join("", <SETUP>);
532 # 'Eval' is used to ensure that the values are entered in the current
533 # package (contrary to what 'do' and 'require' do).
534 (eval $SetupCode) || die "$! $@\n";
535 close(SETUP);
537 # There exists a CGIscriptor subdirectory and it contains
538 # a CGIservletSETUP.pl file
539 if((-e './CGIscriptor/CGIservletSETUP.pl') &&
540 open(SETUP, '<./CGIscriptor/CGIservletSETUP.pl'))
542 # Get the setup code
543 my $SetupCode = join("", <SETUP>);
544 # 'Eval' is used to ensure that the values are entered in the current
545 # package (contrary to what 'do' and 'require' do).
546 (eval $SetupCode) || die "$! $@\n";
547 close(SETUP);
549 # There is a CGIservletSETUP.pl file in the current directory
550 if((-e './CGIservletSETUP.pl') &&
551 open(SETUP, '<./CGIservletSETUP.pl'))
553 # Get the setup code
554 my $SetupCode = join("", <SETUP>);
555 # 'Eval' is used to ensure that the values are entered in the current
556 # package (contrary to what 'do' and 'require' do).
557 (eval $SetupCode) || die "-e $SetupCode: $! $@\n";
558 close(SETUP);
561 ######################################
563 # process arguments and defaults #
565 ######################################
567 while ($_ = shift(@ARGV))
569 # With switches
570 if(/\-p/is) # Port
572 $port = shift(@ARGV);
574 elsif(/\-d/is) # Do
576 $doarg = shift(@ARGV);
578 elsif(/\-(x|qx|exec)/is) # Execute
580 $execarg = shift(@ARGV);
582 elsif(/\-b/is) # Begin
584 $beginarg = shift(@ARGV);
586 elsif(/\-e/is) # Evaluate
588 $evalarg = shift(@ARGV);
590 elsif(/\-t/is) # Textroot
592 $textroot = shift(@ARGV);
594 elsif(/\-w/is) # Default welcome page
596 $welcome = shift(@ARGV);
598 elsif(/\-c/is) # Maximum Children
600 $MaxBrood = shift(@ARGV) || $MaxBrood;
602 elsif(/\-xtime/is) # Maximum running time
604 $MaxTime = shift(@ARGV) || $MaxTime;
606 elsif(/\-l/is) # Maximum Length
608 $MaxLength = shift(@ARGV) || $MaxLength;
610 elsif(/\-m/is) # Run from RAM
612 $UseRAMimage = 1;
614 elsif(/\-a/is) # Aliases
616 while(@ARGV && $ARGV[0] !~ /^\-/) # while not a parameter
618 my $Alias = shift(@ARGV);
619 my $RealURL = $ARGV[0] !~ /^\-/ ? shift(@ARGV) : "";
620 next unless $Alias && $RealURL;
621 # Store the alias
622 # Simple straight translations
623 unless($Alias =~ m/[\Q^$*&@!\?(){}[];:\E]/)
625 $AliasTranslation{$Alias} = $RealURL;
627 else # Full regular expressions
629 push(@RegAliasTranslation, $Alias);
630 push(@RegURLTranslation, $RealURL);
635 elsif(/\-r/is) # Remote host or address
637 while(@ARGV && $ARGV[0] !~ /^\-/) # while not a parameter
639 my $Remote = shift(@ARGV);
640 if($Remote =~ /[\d\.]+/) # A host IP address
642 push(@RemoteAddr, $Remote);
644 else # A host domain name, less secure
646 push(@RemoteHost, $Remote);
650 # Use the default Remote Host (Client) IP address (e.g., localhost)
651 # if no addresses or domain names are entered.
652 push(@RemoteAddr, $DefaultRemoteAddr) unless @RemoteAddr || @RemoteHost;
654 elsif(/\-s/is) # Secure or not
656 $Secure = !$Secure; # Toggle blocking directory browsing and ASCII file access
658 elsif(/\-n/is) # Do NOT extract Remote host
660 $NONAME = 1;
662 else # perform unreliable magick without switches
664 if(/^[0-9]+$/ && $_ > 1024) # A (large) number must be a port
666 $port = $_;
668 elsif(-T && /\.pl$/) # Text file with extension .pl is a Perl file
670 $doarg = $_;
672 elsif(-T && /\.pm$/) # Text file with extension .pm is a Perl module file
674 $beginarg = $_;
676 elsif(-x) # Executables can be executed
678 $execarg = $_;
680 elsif(-d) # A directory can only be the root
682 $textroot = $_;
684 elsif(-T && /^\// && /\.html$/) # An html file path is the default path
686 $welcome = $_;
688 elsif(-T) # A text file is something to do
690 $doarg = $_;
692 elsif(/[\s\{\`\[\@\%]/) # I give up, just try it
694 $evalarg = shift(@ARGV);
699 ################################################
701 # All argument values are known. #
702 # Initialize environment variables. #
703 # (should be accessible to eval($beginarg)) #
705 ################################################
707 # Initialize %ENV
708 $ENV{'SERVER_SOFTWARE'} = "$program $version";
709 $ENV{'GATEWAY_INTERFACE'} = "CGI/1.1";
710 $ENV{'SERVER_PORT'} = "$port";
711 $ENV{'CGI_HOME'} = $textroot;
712 $ENV{'SERVER_ROOT'} = $textroot; # Server Root Directory
713 $ENV{'DOCUMENT_ROOT'} = $textroot; # Server Root Directory
714 $ENV{'SCRIPT_NAME'} = $doarg.$execarg.$evalarg; # Combine executable arguments
716 ################################################
718 # The initial argument should be evaluated #
720 ################################################
722 eval($beginarg) if $beginarg;
724 ################################################
726 # The initial argument has been evaluated #
728 ################################################
730 # Socket related code
731 my $proto = getprotobyname('tcp');
732 $port = $1 if $port =~ /(\d+)/; # untaint port number
734 socket(Server, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
735 setsockopt(Server, &SOL_SOCKET, &SO_REUSEADDR,
736 pack("l", 1)) || die "setsockopt: $!";
737 bind(Server, sockaddr_in($port, INADDR_ANY)) || die "bind: $!";
738 listen(Server,SOMAXCONN) || die "listen: $!";
741 # Report start of server
742 logmsg "server started on port $port";
744 # Set up SIG vector (every signal will kill the process that receives it)
745 $SIG{CHLD} = 'IGNORE';
746 $SIG{'KILL'} = "SigHandler";
747 $SIG{'TERM'} = "SigHandler";
748 $SIG{'QUIT'} = "SigHandler";
749 $SIG{'HUP'} = "SigHandler";
751 # Define text mime types served if no scripts are defined
752 # Note that the "text/osshell" mime-type is executed by CGIservlet ITSELF!
753 # You should remove it if you don't want that!
754 my %mimeType = (
755 'HTML'=> "text/html",
756 'TXT' => "text/plain",
757 'PL' => "text/plain", # This is incorrect, of course
758 'JPG' => "image/jpeg",
759 'JPEG' => "image/jpeg",
760 'GIF' => "image/gif",
761 'AU' => "audio/basic",
762 'AIF' => "audio/aiff",
763 'AIFC' => "audio/aiff",
764 'AIFF' => "audio/aiff",
765 'GZ' => "application/gzip",
766 'TGZ' => "application/tar",
767 #'CGI' => "text/osshell", # Executes SERVER side shell scripts, HIGHLY DANGEROUS
768 'WAV' => "audio/wav",
769 'OGG' => "audio/x-vorbis",
770 'PDF' => "application/pdf",
771 'PS' => "application/postscript"
774 ################################################
776 # Fill the RAM image of the web site #
778 ################################################
780 my %WWWramImage = ();
781 if($UseRAMimage)
783 my $TotalSize = 0;
784 my @WWWfilelist = `find $textroot ! -type l ! -type d -print`;
785 my $WWWfile;
786 foreach $WWWfile (@WWWfilelist)
788 chomp($WWWfile);
789 # Skip unsupported file types
790 $WWWfile =~ /\.(\w+)$/;
791 my $WWWfileExtension = uc($1);
792 next unless $mimeType{$WWWfileExtension};
793 # Store GnuZipped image of file
794 $WWWramImage{$WWWfile} = "";
795 open(FILEIN, "<$WWWfile") || die "$WWWfile could not be opened: $!\n";
796 my $Buffer;
797 while(sysread(FILEIN, $Buffer, 1024))
799 $WWWramImage{$WWWfile} .= $Buffer;
801 # Apply compression
802 my $CompressedPtr = &$CompressRAMimage(\${WWWramImage{$WWWfile}});
803 $WWWramImage{$WWWfile} = $$CompressedPtr;
804 $TotalSize += length($WWWramImage{$WWWfile});
807 # Report size of Web RAM image
808 print STDERR "Total number of $TotalSize bytes read in memory image\n";
811 ################################################
813 # The RAM image of the web site has been #
814 # filled. #
816 ################################################
818 # Map HTTP request parameters to Environment variables
819 # HTTP request => Environment variable
820 my %HTTPtype = (
821 'content-length' => 'CONTENT_LENGTH', # Necessary for POST
822 'user-agent' => 'HTTP_USER_AGENT',
823 'accept' => 'HTTP_ACCEPT',
824 'content-type' => 'CONTENT_TYPE',
825 'auth-type' => 'AUTH_TYPE',
826 'ident' => 'REMOTE_IDENT',
827 'referer' => 'HTTP_REFERER',
828 'user' => 'REMOTE_USER',
829 'address' => 'REMOTE_ADDR',
830 'connection' => 'HTTP_CONNECTION',
831 'accept-language' => 'HTTP_ACCEPT_LANGUAGE',
832 'accept-encoding' => 'HTTP_ACCEPT_ENCODING',
833 'accept-charset' => 'HTTP_ACCEPT_CHARSET',
834 'host' => 'HTTP_HOST'
837 ###############################################################################
839 # Now we start with the real work. When there is a request, get the required #
840 # values and fork a child to service it. #
842 ###############################################################################
844 my @brood = ();
845 my %StartTime = (); # Start time of the children
846 my $child;
848 # When someone knocks on the door
849 for (;;)
851 my $paddr;
853 if(!($paddr = accept(Client,Server)) ) # Knock knock
855 exit 1; # This went terrribly wrong
858 # Fork to child and parent
859 if(($child =fork()) == 0)
861 # this is the child
862 my ($port,$iaddr) = sockaddr_in($paddr);
863 my $address = inet_ntoa($iaddr); # The IP address of the Client
864 # The following is EXTREMELY slow and generally unnecessary.
865 # Use -n or set $NONAME = 1; if you don't need it.
866 my $name = $NONAME ? '' : gethostbyaddr($iaddr,AF_INET);
867 my @Input = ();
870 # Before doing anything else, check whether the client should be
871 # served at all.
872 # Is IP addr on the list?
873 if(@RemoteAddr && !grep(/^\Q$address\E/, @RemoteAddr))
875 print STDERR "Reject $address $name\n";
876 exit 1;
878 # Is name on the list?
879 if(@RemoteHost && !grep(/\Q$name\E$/, @RemoteHost))
881 print STDERR "Reject $name $address\n";
882 exit 1;
886 # Grab a line without using buffered input... Important for
887 # Post methods since they have to read the Client input stream.
889 my $string = "";
890 my $ch = "";
891 my $HTTPlength = 0;
892 alarm 120 ; # prevent deadly spin if other end goes away
893 while(sysread(Client, $ch, 1)>0)
895 $string .= $ch;
896 ++$HTTPlength;
897 last if $HTTPlength > $MaxLength; # Protect against overflow
899 next if $ch eq "\r"; # skip <cr>
900 if($ch eq "\n")
902 last unless $string =~ /\S/; # stop if empty line
903 push (@Input, split(' ', $string)); # Collect input in list
904 $string = "";
907 alarm 0; # clear alarm
909 # Extract input arguments
910 my $method = shift(@Input);
911 my $Request = shift(@Input);
912 my $protocol = shift(@Input);
913 my ($Path, $QueryString) = split('\?', $Request);
915 # Get rest of Input
916 my $HTTPparameter;
917 my %HTTPtable = ();
918 while($HTTPparameter = lc(shift(@Input)))
920 chop($HTTPparameter);
921 $HTTPtable{$HTTPparameter} = "";
922 while(@Input && $Input[0] !~ /\:$/)
924 $HTTPtable{$HTTPparameter} .= " " if $HTTPtable{$HTTPparameter};
925 $HTTPtable{$HTTPparameter} .= shift(@Input);
928 # Host can get the :SERVER_PORT appended. Set the correct SERVER_PORT
929 # and remove it from the host.
930 if($HTTPtable{'host'})
932 # Store current port number
933 if($HTTPtable{'host'} =~ /\:(\d+)\s*$/)
935 $ENV{'SERVER_PORT'} = $1;
937 # Remove port number from host
938 $HTTPtable{'host'} =~ s/\:(\d+)\s*$//g;
941 # Translate the Aliases
942 $Path = GetAlias($Path);
944 # HTTP servers should always add the default path
945 $Path = $welcome if !$Path || $Path eq '/'; # The common default path
947 # Set fixed environment variables
948 $ENV{'PATH_INFO'} = "$Path";
949 $ENV{'QUERY_STRING'} = "$QueryString";
950 $ENV{'PATH_TRANSLATED'} = "$textroot$Path";
951 $ENV{'SERVER_PROTOCOL'} = "$protocol";
952 $ENV{'REQUEST_METHOD'} = "$method";
953 $ENV{'REMOTE_ADDR'} = "$address"; # The IP address of the Client
954 $ENV{'REMOTE_HOST'} = "$name";
956 # Load all request information in the %ENV.
957 # MUST be done with a pre-defined list of parameter names (security).
958 foreach $HTTPparameter (keys(%HTTPtype))
960 my $Label = $HTTPtype{$HTTPparameter};
961 # The following adds environment variables FROM THE REQUEST.
962 # It is a VERY, VERY bad idea to just use the client supplied
963 # parameter names!
964 $ENV{$Label} = $HTTPtable{$HTTPparameter} unless exists($ENV{$Label});
965 # (The last part prevents overwriting existing environment variables)
968 # SECURITY: Check length of POST request. Stop if request is too long
969 die if $HTTPlength + $ENV{'CONTENT_LENGTH'} > $MaxLength;
971 # If POST requests are unwanted, they can be added tot the query string
972 # NOTE: the method is set to GET if $POSTtoGET equals 'GET', otherwise,
973 # the method stays POST and only the content length is set to 0
974 if($POSTtoGET && $ENV{'REQUEST_METHOD'} =~ /^POST$/i)
976 my $POSTlength = $ENV{'CONTENT_LENGTH'} || 0;
977 my $ReadBytes = 1;
979 # Add '&' if there is a query string already
980 if($ENV{'QUERY_STRING'})
982 # Before we add something to the string, check length again
983 die if $HTTPlength + $ENV{'CONTENT_LENGTH'} + 1 > $MaxLength;
984 # Now add the '&'
985 $ENV{'QUERY_STRING'} .= '&';
988 # Read Client
989 while($POSTlength > 0 && $ReadBytes > 0)
991 my $Read = "";
992 $ReadBytes = sysread(Client, $Read, $POSTlength);
993 $ENV{'QUERY_STRING'} .= $Read;
994 $POSTlength -= $ReadBytes;
997 # All has been read, the content length becomes 0
998 $ENV{'CONTENT_LENGTH'} = 0;
999 # Method can change
1000 $ENV{'REQUEST_METHOD'} = 'GET' if $POSTtoGET eq 'GET';
1004 # Connect STDOUT and STDIN to the client
1005 open(STDIN, "<&Client");
1006 open(STDOUT, ">&Client");
1007 print STDOUT "HTTP/1.1 200 OK\n"; # Supply HTTP protocol information
1008 print STDOUT "Date: ".gmtime()." GMT\n"; # Current date
1009 print STDOUT "Server: $program $version\n"; # This program
1010 print STDOUT "Connection: close\n"; # Don't allow persistent connections
1012 # Start processing of request (note that ALL scripts will be executed if
1013 # present, i.e., if -d, -x, and -e are entered, they are alle processed).
1015 # If in memory-only mode, store the requested file in an environment
1016 # variable: CGI_FILE_CONTENTS
1017 undef($ENV{'CGI_FILE_CONTENTS'}); # Make sure the ENV var doesn't exist
1018 if($UseRAMimage)
1020 my $DecompressedPtr = &$DecompressRAMimage(\${WWWramImage{"$textroot$Path"}});
1021 $ENV{'CGI_FILE_CONTENTS'} = $$DecompressedPtr;
1022 # Decompression does not seem to work
1025 # do perl script
1026 @ARGV = ("$textroot$Path", $QueryString);
1027 # This was suggested by Jochen_Hayek@ACM.org
1028 if($doarg)
1030 # The perl script should do the printing
1031 my ($return) = do "$doarg";
1033 warn "couldn't parse $doarg: $@" if $@;
1034 warn "couldn't $doarg: $!" unless defined $return;
1035 warn "couldn't run $doarg" unless $return;
1038 # evaluate perl command
1039 print STDOUT eval($evalarg) if $evalarg;
1041 # execute shell command
1042 if($execarg)
1044 my $shellscript = $execarg;
1046 # Attempts to use Paths or Queries containing '-quotes are rejected.
1047 # Executing these would compromise security.
1048 die "Quotes in path: $textroot$Path\n" if "$textroot$Path" =~ /\'/;
1049 $shellscript .= " '$textroot$Path'" if $Path;
1051 die "Quotes in query: $QueryString\n" if $QueryString =~ /\'/;
1052 $shellscript .= " '$QueryString'" if $QueryString;
1053 $shellscript = qx{$shellscript};
1054 print STDOUT $shellscript;
1057 # Output files if no scripts are given (actually, this should be
1058 # handled by a script). Unknown mimetypes are killed.
1059 # This is more or less a functional (dynamic) Web server in itself.
1060 unless($doarg || $execarg || $evalarg) # Request not already handled
1062 die ".. trick: $address $name $Path $QueryString\n"
1063 if $Path =~ m@\.\./@ ; # No tricks!
1065 # Handle mime-types and directory browsing
1066 $Path =~ /\.([\w]+)$/; # Get extension
1067 my $extension = uc($1);
1068 my $browse = ($Path =~ m@/\s*$@ || -d "$textroot$Path") ? 1 : 0;
1069 my $mime = $browse ? "" : $mimeType{$extension};
1071 # Serve up text and binary files unless they the $Secure option is given
1072 $mime = "text/plain" if !$mime && !$browse && (-T "$textroot$Path") && !$Secure;
1073 $mime = "application/octet-stream" if !$mime && !$browse && (-B "$textroot$Path") && !$Secure;
1075 # Remove final / in directory paths
1076 $Path =~ s@/\s*$@@g;
1078 # Block illegal mime-types
1079 die "Illegal mime type:$extension\n" unless $mime || $browse; # illegal mime's are killed
1081 # Print out the document
1082 if(($mime eq 'text/osshell') && $ExecuteOSshell) # Don't use this unless you know what you're doing
1084 # Note that CGI scripts must supply their own content type
1085 # Some rudimentary security tests
1086 # Kill child if the path contains any non-URL characters
1087 die "ATTACK: ADDR:$ENV{'REMOTE_ADDR'} HOST:$ENV{'REMOTE_HOST'} URL=$Path '$QueryString'\n"
1088 if $Path =~ m@[^\w\-\.\/]@; # Exclusive list of allowed characters
1089 # If you want to execute server side shell scripts, use the 'text/osshell'
1090 # mime-type (see above) but remember that there is NO SECURITY implemented
1091 # whatsoever.
1092 # IF YOU DIDN'T GET THE MESSAGE YET, YOU COULD NOW OPEN YOUR COMPUTER TO THE WHOLE
1093 # INTERNET TO PLAY WITH!
1094 # Plain Web site from DISK
1095 unless($UseRAMimage)
1097 print STDOUT `$textroot$Path`; # This is Russian Roulette
1099 else # Use a RAM image of the web site
1101 my $ShellInterpreter = '/usr/bin/sh';
1102 if($ENV{'CGI_FILE_CONTENTS'} =~ /^\#\!\s*([^\r\n]+)/isg)
1104 $ShellInterpreter = $1;
1106 # Execute shell script
1107 open(RAMOUT, "| $ShellInterpreter") || die "ERROR open RAMOUT $ShellInterpreter $textroot$Path $! $@\n";
1108 (print RAMOUT $ENV{'CGI_FILE_CONTENTS'}) || die "ERROR print RAMOUT $ShellInterpreter $textroot$Path $! $@\n";
1109 close(RAMOUT);
1112 elsif($mime)
1114 # Content-type and document
1115 print STDOUT "Content-type: $mime\n\n";
1116 # Plain Web site from DISK
1117 unless($UseRAMimage)
1119 my $String = "";
1120 my $number_of_bytes = 0;
1121 open(BINARY, "<$textroot$Path") || die "$textroot$Path: $!";
1123 # read and write block of 1024 bytes
1124 while($number_of_bytes = sysread(BINARY, $String, 1024))
1126 syswrite(STDOUT, $String, $number_of_bytes); # Actually print the file content
1128 close(BINARY);
1130 # Alternative output using the UNIX shell
1131 # print STDOUT `cat '$textroot$Path'`; # lazy, let the OS do the work
1133 else # Use a RAM image of the web site
1135 print STDOUT $ENV{'CGI_FILE_CONTENTS'};
1139 elsif($browse && !$Secure) # Block directory browsing in the Secure setup
1141 # Content-type and document
1142 print STDOUT "Content-type: text/html\n\n";
1143 opendir(BROWSE, "$textroot$Path") || die "<$textroot$Path: $!\n";
1145 print "<HTML>\n<HEAD>\n<TITLE>$Path</TITLE></HEAD>\n<BODY>\n<H1>$Path</H1>\n<pre>\n<dl>";
1147 my $DirEntry;
1148 foreach $DirEntry (sort {lc($a) cmp lc($b)} readdir(BROWSE))
1150 my $CurrentPath = $Path;
1151 # Handle '..'
1152 if($DirEntry eq '..')
1154 my $ParentDir = $CurrentPath;
1155 $ParentDir =~ s@/[^/]+$@@g;
1156 $ParentDir = '/' unless $ParentDir;
1157 print "<dt> <a href='$ParentDir'><h3>Parent directory</h3></a></dt>\n";
1159 next if $DirEntry !~ /[^\.\/\\\:]/;
1161 # Get aliases
1162 my $Alias = GetAlias("$CurrentPath/$DirEntry");
1163 if($Alias ne "$CurrentPath/$DirEntry")
1165 $Alias =~ m@/([^/]+)$@;
1166 $CurrentPath = $`;
1167 $DirEntry = $1;
1170 my $Date = localtime($^T - (-M "$textroot$CurrentPath/$DirEntry")*3600*24);
1171 my $Size = -s "$textroot$CurrentPath/$DirEntry";
1172 $Size = sprintf("%6.0F kB", $Size/1024);
1173 my $Type = `file $textroot$CurrentPath/$DirEntry`;
1174 $Type =~ s@\s*$textroot$CurrentPath/$DirEntry\s*\:\s*@@ig;
1175 chomp($Type);
1176 print "<dt> <a href='$CurrentPath/$DirEntry'>";
1177 printf("%-40s", $DirEntry."</a>");
1178 print "\t$Size\t$Date\t$Type</dt>\n";
1180 close(BROWSE);
1181 print "</dl></pre></BODY>\n</HTML>\n";
1186 close(STDOUT) || die "STDOUT: $!\n";
1187 close(STDIN) || die "STDIN: $!\n";
1188 close(Client) || die "Client: $!\n";
1190 exit 0; # Kill Child
1192 else
1195 # parent code...some systems will have to worry about waiting
1196 # before they can actually close the link to the Client
1197 my $current_time = time();
1199 # Determine which of the children are actually still alive
1200 # and kill those that have run for too long (probably not connected anymore)
1201 my @old_brood = @brood;
1202 @brood = (); # empty brood
1203 foreach (@old_brood)
1205 # Kill the child if it runs for longer than MaxTime
1206 if(($StartTime{$_} - $current_time) > $MaxTime)
1208 kill "KILL", $_;
1211 # Store children that are alive
1212 if(kill (0, $_)) # Alive?
1214 push(@brood, $_);
1216 else
1218 delete($StartTime{$_});
1222 # Weed out overflow of children (zombies etc.), keep pid for
1223 # removing the StartTime later on
1224 my $oldest;
1225 for($oldest=0; $oldest < scalar(@brood)-$MaxBrood; ++$oldest)
1227 kill "KILL", $brood[$oldest] if $brood[$oldest]; # Remove
1230 # Child pid could be recycled, i.e., $child could be stored
1231 # in @brood already. Remove it
1232 @brood = grep($_ != $child, @brood);
1234 # Push new child on the list
1235 push (@brood, $child);
1236 $StartTime{$child} = $current_time;
1238 close Client; # This is it, ready!
1242 # Interupt handler for shutting down
1243 sub SigHandler
1245 my $sig = shift;
1246 exit 1;
1249 # Subroutine for Aliases
1250 # Uses Global variables: %AliasTranslation, @RegAliasTranslation, and @RegURLTranslation
1251 sub GetAlias # ($Path)->AliasURL
1253 my $Path = shift;
1255 # Translate the Aliases
1256 if($AliasTranslation{$Path})
1258 $Path = $AliasTranslation{$Path};
1260 elsif(@RegAliasTranslation)
1262 my $i;
1263 for($i=0; $i<scalar(@RegAliasTranslation); ++$i)
1265 my $Alias = $RegAliasTranslation[$i];
1266 my $RealURL = $RegURLTranslation[$i];
1267 last if ($Path =~ s#$Alias#$RealURL#g);
1270 return $Path;
1273 =head1 NAME
1275 CGIservlet - a HTTPd "connector" for running CGI scripts on unix systems as WWW
1276 accessible Web sites.
1278 =head1 DESCRIPTION
1280 The servlet starts a true HTTP daemon that channels
1281 HTTP requests to forked daughter processes. Can run
1282 a (small) WWW-site from memory.
1284 =head1 README
1286 Whenever an HTTP request is received, the specified CGI script is
1287 started inside a child process as if it was inside a real server (e.g.,
1288 Apache). The evironment variables are set more or less as in Apache.
1289 Note that CGIservlet only uses a SINGLE script for ALL requests.
1290 No attemps for security are made, it is the script's responsibility to
1291 check access rights and the validity of the request.
1292 Can store the files of Web site in memory and serve them
1293 on request.
1295 =head1 PREREQUISITES
1297 This script requires the C<strict>, Socket and Carp modules.
1299 =head1 COREQUISITES
1301 =pod OSNAMES
1303 Unix
1305 =pod SCRIPT CATEGORIES
1310 =cut