this program is inspired by the functionality of GNU parallel, but tries
to keep overhead low and follow the UNIX philosophy of doing one thing well.

basically, it works by reading stdin and launching one process per line.
the line itself can be passed to the started program as an argument (argv).
this allows for easy parallelization of standard unix tasks.

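
the model described above can be sketched in plain shell. this is only an
illustration, not jobflow's implementation: `run_per_line` is a hypothetical
helper, and waiting for a whole batch to finish is a simplification chosen to
keep the sketch portable.

```shell
# run_per_line MAX CMD [ARGS...]   (hypothetical helper, not part of jobflow)
# reads stdin line by line and starts CMD once per line, with the line
# appended as the last argument. at most MAX processes run per batch.
run_per_line() {
    max=$1; shift
    running=0
    while IFS= read -r line; do
        "$@" "$line" &              # one process per input line
        running=$((running + 1))
        if [ "$running" -ge "$max" ]; then
            wait                    # crude: wait for the whole batch
            running=0
        fi
    done
    wait                            # wait for the final, partial batch
}

# the order of the output lines depends on scheduling
printf 'a\nb\nc\n' | run_per_line 2 echo
```

since the processes run concurrently, their output can arrive in any order.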
it is possible to save the currently processed line, so a task that was
killed can be continued later.

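
the idea of continuing after a kill can be illustrated with a sequential
sketch. `process_with_state` is a hypothetical helper, and it records the
last *finished* job number, which is an assumption made to keep the loop
simple; it does not claim to match what jobflow writes to its statefile.

```shell
# process_with_state STATEFILE CMD [ARGS...]   (hypothetical helper)
# runs CMD once per stdin line and records the number of the last finished
# line in STATEFILE. on the next run, lines up to that number are skipped.
process_with_state() {
    statefile=$1; shift
    last=$(cat "$statefile" 2>/dev/null)
    last=${last:-0}                 # missing or empty file: start at 0
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
        if [ "$n" -le "$last" ]; then
            continue                # already handled in an earlier run
        fi
        "$@" "$line" </dev/null     # keep CMD from eating our stdin
        echo "$n" > "$statefile"
    done
}

state=$(mktemp)
printf 'a\nb\n'    | process_with_state "$state" echo   # prints a and b
printf 'a\nb\nc\n' | process_with_state "$state" echo   # prints only c
rm -f "$state"
```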
you have a list of things, and a tool that processes a single thing.

    cat things.list | jobflow -threads=8 -exec ./mytask {}

    seq 100 | jobflow -threads=100 -exec echo {}

    cat urls.txt | jobflow -threads=32 -exec wget {}

    find . -name '*.bmp' | jobflow -threads=8 -exec bmp2jpeg {.}.bmp {.}.jpg

run jobflow without arguments to see a list of possible command line options
and argument permutations.

Comparison with GNU parallel
----------------------------

GNU parallel is written in perl, which has the following disadvantages:
- requires a perl installation.
  even though most people already have perl installed anyway, installing it
  just for this purpose requires up to 50 MB of storage (and potentially up
  to several hours to compile it from source on slow devices)
- requires a lot of time on startup (parsing sources, etc)
- requires a lot of memory (typically between 5 and 60 MB)

jobflow OTOH is written in C, which has numerous advantages:
- once compiled to a tiny static binary, can be used without any 3rd party
  dependencies
- very little and constant memory usage (typically a few KB)
- much higher execution speed

apart from the chosen language and the related performance differences, the
following other differences exist between GNU parallel and jobflow:

+ supports rlimits passed to started processes
- doesn't support ssh (usage of remote cpus)
- doesn't support all kinds of argument permutations:
  while GNU parallel has a rich set of options to permute the input,
  this doesn't adhere to the UNIX philosophy.
  jobflow can achieve the same result by passing the unmodified input
  to a user-created script that does the required permutations with other
  standard tools.

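
as an illustration of such a user-created script, the following sketch
derives an output file name from the unmodified input line using only
`basename` and shell parameter expansion. the helper name `derive_target`
and the `out/` directory are assumptions made for this example.

```shell
# derive_target LINE   (hypothetical helper)
# maps an input path like "photos/cat.bmp" to "out/cat.jpg":
# strips the directory part, drops the last extension, appends ".jpg".
derive_target() {
    base=$(basename "$1")
    printf 'out/%s.jpg\n' "${base%.*}"
}

derive_target photos/cat.bmp    # prints out/cat.jpg
```

saved in a script (say, a hypothetical `./convert.sh` that runs
`bmp2jpeg "$1" "$(derive_target "$1")"`), it could be driven with:

    find . -name '*.bmp' | jobflow -threads=8 -exec ./convert.sh {}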
available command line options
------------------------------

    -skip N -threads N -resume -statefile=/tmp/state -delayedflush
    -delayedspinup N -buffered -joinoutput -limits mem=16M,cpu=10

-skip N
    N=number of entries to skip

-threads N (alternative: -j N)
    N=number of parallel processes to spawn

-resume
    resume from the last job number stored in the statefile

-eof XXX
    use XXX as the EOF marker on stdin.
    if the marker is encountered, behave as if stdin was closed.
    not compatible with pipe/bulk mode

-statefile=/tmp/state
    saves the last launched job number into a file

-delayedflush
    only write to statefile whenever all processes are busy,

-delayedspinup N
    N=maximum number of milliseconds to wait when spinning up a fresh set
    of processes.
    a random value between 0 and the chosen amount is used to delay initial
    spinup.
    this can be handy to circumvent an I/O lockdown caused by a burst of
    activity on program startup

-buffered
    store the stdout and stderr of launched processes in a temporary file,
    which is printed after the process has finished.
    this prevents the output of different processes from getting mixed up.

-joinoutput
    if -buffered, write both stdout and stderr into the same file.
    this preserves the chronological order of the output, and the combined
    output will only be printed to stdout.

-bulk N
    do bulk copies with a buffer of N bytes. only usable in pipe mode.
    this passes (almost) the entire buffer to the next scheduled job.
    the passed buffer will be truncated to the last line break boundary,
    so jobs always get entire lines to work with.
    this option is useful when you have huge input files and relatively short
    task runtimes. by using it, syscall overhead can be reduced to a minimum.
    N must be a multiple of 4KB. the suffixes G/M/K are detected.
    the actual memory allocation will be twice the amount passed.
    note that the pipe buffer size is limited to 64K on linux, so anything
    higher than that probably doesn't make sense.

-limits [mem=N,cpu=N,stack=N,fsize=N,nofiles=N]
    sets the rlimits of the newly created processes.
    see "man setrlimit" for an explanation. the suffixes G/M/K are detected.

-exec command with args
    everything past -exec is treated as the command to execute on each line of
    stdin received. the line can be passed as an argument using {}.
    {.} passes everything before the last dot in a line as an argument.
    it is possible to use multiple substitutions inside a single argument,
    but currently only of one type.
    if -exec is omitted, input will merely be dumped to stdout (like cat).

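
the two placeholders can be modelled in plain shell. this is only a sketch
of the substitution rules as described above, not jobflow's code:
`substitute` and `expand_arg` are hypothetical helpers, and a line without
any dot is passed through unchanged here, which is an assumption.

```shell
# substitute TEMPLATE PLACEHOLDER VALUE   (hypothetical helper)
# replaces every occurrence of PLACEHOLDER in TEMPLATE with VALUE.
substitute() {
    rest=$1 out=
    while :; do
        case $rest in
            *"$2"*) out=$out${rest%%"$2"*}$3   # text before the placeholder
                    rest=${rest#*"$2"} ;;      # continue after it
            *)      break ;;
        esac
    done
    printf '%s\n' "$out$rest"
}

# expand_arg TEMPLATE LINE   (hypothetical helper)
# {}  becomes the whole line, {.} everything before the last dot.
# only one placeholder type is handled per argument.
expand_arg() {
    case $1 in
        *"{.}"*) substitute "$1" "{.}" "${2%.*}" ;;
        *)       substitute "$1" "{}" "$2" ;;
    esac
}

expand_arg '{.}.jpg' 'img/a.bmp'    # prints img/a.jpg
expand_arg '{}-{}' 'x'              # prints x-x
```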
you may override variables used in the Makefile and set optimization
CFLAGS and similar things using a file called `config.mak`, e.g.:

    echo "CFLAGS=-O2 -g" > config.mak