usbi_handle_disconnect: Fix race condition leading to double completion
It took me quite a while to debug this, here is a step by step for the race
which I believe is happening in some cases:
1) app calls libusb_submit_transfer
2) libusb_submit_transfer locks itransfer->lock
3) libusb_submit_transfer adds the transfer to the flying list
4) *thread switch*
5) other thread notices POLL_ERR on device fd, calls usbi_handle_disconnect
6) usbi_handle_disconnect find the transfer which is in progress of being
submitted in the flying list
7) usbi_handle_disconnect calls usbi_backend->clear_transfer_priv on the
transfer, this blocks waiting on itransfer->lock
8) *thread switch*
9) libusb_submit_transfer actually tries to submit the transfer now,
calls usbi_backend->submit_transfer, which fails with -ENODEV
10) libusb_submit_transfer *removes* the transfer from the flying list,
unlocks itransfer->lock and returns an error to its caller
11) the caller frees the transfer, meaning the to_cancel pointer in
usbi_handle_disconnect now points to free-ed memory, for extra mayhem
12) *thread switch*
13) usbi_handle_disconnect calls usbi_handle_transfer_completion
14) usbi_handle_transfer_completion tries to remove the transfer from
the flying list *for the 2nd time*
But the first call done from libusb_submit_transfer has already done
this. libusb's list_del looks like this:
static inline void list_del(struct list_head *entry)
{
entry->next->prev = entry->prev;
entry->prev->next = entry->next;
entry->next = entry->prev = NULL;
}
So the first call sets it next and prev to NULL, and then the 2nd call
tries to deref next -> BOOM
For an example backtrace caused by this, see:
https://bugs.freedesktop.org/show_bug.cgi?id=55619#c7
This patch fixes this by letting libusb_submit keep the flying transfers list
locked during submission, so the submission flow changes from:
1) lock flying transfers
add to flying transfers
unlock
2) submit
3) on submission error:
lock flying transfers
remove from flying transfers
unlock
to:
1) lock flying transfers
2) add to flying transfers
3) submit
4) on submission error:
remove from flying transfers
5) unlock
This means that the os backends submit handler now gets called with the
flying transfers lock held! I've looked at all the backends and this should
not be a problem. Only the windows and win-ce backends care about the
flying transfers list at all, and then only in their handle_events handler.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>