prevent reqid mismatches (and queryworker death)
On certain errors, the queryworker may send two "ERR" responses for a
single request. If the queryworker is immediately fed another query,
the ProcManager terminates it upon reading the second response.
This can affect busy setups, but is also easy to reproduce with a single
queryworker that receives pipelined requests for an
invalid/non-existent domain:
    (
        printf 'list_keys domain=\r\nlist_keys domain=\r\n'
        sleep 2
    ) | socat - TCP:127.0.0.1:7001
The queryworker strace will look like this (writing 4 lines):
    write(14, "4981-1 0.0005 ERR no_domain No+domain+provided\r\n", 48) = 48
    write(14, "4981-1 ERR domain_not_found Domain+not+found\r\n", 46) = 46
    write(14, "4981-2 0.0005 ERR no_domain No+domain+provided\r\n", 48) = 48
    write(14, "4981-2 ERR domain_not_found Domain+not+found\r\n", 46) = 46
And a message like this will appear for "!watch" users:
    Worker responded with id <undef> (line: [4981-1 ERR domain_not_found Domain+not+found]), but expected id 4981-2, killing
This is because ProcManager immediately calls NoteIdleQueryWorker upon
writing the first ERR response to the client (at the end of
HandleQueryWorkerResponse). The idle queryworker may therefore be
handed a second request before the ProcManager has had a chance to
read the second ERR response line (from the first request), which
then fails the reqid check for the second request.
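The ordering problem can be illustrated with a toy model in Python. This
is not MogileFS code: run_manager, the response regex, and the handling
of lines that arrive with no request in flight are assumptions made for
illustration. Only the response lines are taken from the strace output.

```python
import re

# Assumed response shape "<reqid> <elapsed> <rest>". This is inferred from
# the log message (id <undef> for the line without a time field), not
# copied from the ProcManager source.
RESPONSE_RE = re.compile(r"^(\d+-\d+)\s+(\d+\.\d+)\s+(.+)$")


def run_manager(worker_output, request_ids):
    """Feed worker lines to a toy manager; return (delivered, kill_reason)."""
    pending = list(request_ids)
    expected = pending.pop(0) if pending else None  # request in flight
    delivered = []
    for line in worker_output:
        if expected is None:
            # No request in flight: the toy model simply ignores the line.
            continue
        m = RESPONSE_RE.match(line)
        got = m.group(1) if m else None
        if got != expected:
            return delivered, f"got id {got!r}, expected {expected!r}: {line}"
        delivered.append(line)
        # The worker is treated as idle right after its first response, so
        # the next queued request is dispatched before any second line from
        # the previous request has been read.
        expected = pending.pop(0) if pending else None
    return delivered, None


LINES = [
    "4981-1 0.0005 ERR no_domain No+domain+provided",
    "4981-1 ERR domain_not_found Domain+not+found",
    "4981-2 0.0005 ERR no_domain No+domain+provided",
    "4981-2 ERR domain_not_found Domain+not+found",
]

delivered, reason = run_manager(LINES, ["4981-1", "4981-2"])
print(delivered)
print(reason)
```

In this model the first line is delivered, and the second line is then
checked against 4981-2, which mirrors the "killing" message above.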
Preventing err_line() from calling send_to_parent() with "ERR"
if querystarttime is undef prevents this issue, but there may be
better ways to fix this bug. A similar, preventative fix may be
appropriate for ok_line().
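The shape of that guard can be sketched in Python. ToyWorker and its
attributes are hypothetical stand-ins for the Perl worker, and clearing
the start time after the first response is an assumption inferred from
the second strace line lacking a time field. The real err_line() may
differ.

```python
import time


class ToyWorker:
    """Hypothetical stand-in for a queryworker; not the MogileFS class."""

    def __init__(self):
        self.sent = []                # lines "written" to the parent
        self.reqid = None
        self.query_start_time = None  # None plays the role of undef

    def start_query(self, reqid):
        self.reqid = reqid
        self.query_start_time = time.monotonic()

    def send_to_parent(self, line):
        self.sent.append(line)

    def err_line(self, code, text):
        # Guard: with no start time, a response has already gone out for
        # this request, so a second ERR line is not sent.
        if self.query_start_time is None:
            return
        elapsed = time.monotonic() - self.query_start_time
        self.query_start_time = None
        self.send_to_parent(f"{self.reqid} {elapsed:.4f} ERR {code} {text}")


worker = ToyWorker()
worker.start_query("4981-1")
worker.err_line("no_domain", "No+domain+provided")
worker.err_line("domain_not_found", "Domain+not+found")  # suppressed
print(worker.sent)
```

With the guard in place the model emits exactly one ERR line per
request, so the parent never sees a stray second response.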