use blocking wait() instead of sleeping
when jobflow was created, it was mainly used for network jobs whose
runtimes were dominated by network latency, so it went unnoticed that
a polling loop with an arbitrary sleep time is a poor design,
especially when running many short jobs with a small process count.
an example is bulk conversion of the posix manpages package[0] with gzip.
converting the ~1100 manpages it contains with a single-process shell
script took about 0.7s, but jobflow with -threads=1 wasted 25 seconds
on the same task.
the old code looped through all jobs, tested in a non-blocking way
whether each one had terminated yet, then went to sleep.
now we just wait until any subprocess terminates.
this has the desired effect that we never sleep a nanosecond longer than
necessary, and indeed the gzip conversion with a single thread now takes
exactly the same time as the shell script, and with two threads (and
2 physical CPUs) takes exactly half the time.
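for illustration, a minimal sketch of the blocking approach described
above. this is not jobflow's actual code: run_jobs_blocking() and its
parameters are hypothetical, the children just exit immediately as a
stand-in for real jobs, and plain fork() is assumed as the way to start
them. the only point is that waitpid(-1, ..., 0) returns as soon as any
child terminates, so no sleep interval is needed.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical helper, not taken from jobflow: runs njobs trivial child
 * processes with at most max_parallel alive at once. returns the number
 * of children that exited with status 0, or -1 if fork/waitpid failed
 * (in which case already started children are not reaped). */
static int run_jobs_blocking(int njobs, int max_parallel)
{
	int started = 0, running = 0, succeeded = 0;

	assert(max_parallel > 0);

	while (started < njobs || running > 0) {
		/* fill every free slot with a new child */
		while (started < njobs && running < max_parallel) {
			pid_t pid = fork();
			if (pid == -1) return -1;
			if (pid == 0) _exit(0); /* stand-in for the real job */
			started++;
			running++;
		}

		/* block until any child terminates: no polling, no sleep.
		 * retry if the call was interrupted by a signal. */
		int status;
		pid_t done;
		do {
			done = waitpid(-1, &status, 0);
		} while (done == -1 && errno == EINTR);
		if (done == -1) return -1;

		running--;
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
			succeeded++;
	}
	return succeeded;
}
```

calling run_jobs_blocking(1100, 1) would correspond roughly to the
-threads=1 case: each finished child immediately frees the slot for the
next one. note that waitpid(-1, ...) reaps any child of the calling
process, so the sketch assumes nothing else is spawning children.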
[0] http://www.kernel.org/pub/linux/docs/man-pages/man-pages-posix/