Zpool can start allocating from metaslab before TRIMs have completed
commit 2bba9fd479f5dce01df31bceb532c5a9e9d5c5ca
author Jason King <jasonbking@users.noreply.github.com>
Thu, 12 Oct 2023 18:01:54 +0000 (12 13:01 -0500)
committer Brian Behlendorf <behlendorf1@llnl.gov>
Thu, 12 Oct 2023 18:05:20 +0000 (12 11:05 -0700)
tree 92c198529439c1bc3d5c27c3cdb170c14dd33b68
parent 30ee2ee8ecabe75a5a011e2355747114df7f7bee

When doing a manual TRIM on a zpool, the metaslab being TRIMmed can be
re-enabled before all queued TRIM zios for that metaslab have
completed. Since TRIM zios have the lowest priority, allocations from
the just re-enabled metaslab can be issued ahead of TRIMs still
queued for the same metaslab. If the ranges overlap, the late TRIM
discards newly written data and corrupts the pool.
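
The ordering requirement described above can be sketched with a small
standalone model. This is an illustration only, not the code in
module/zfs/vdev_trim.c: the model_metaslab_t type and the model_*
functions are hypothetical names, and the sketch assumes a simple
counter of in-flight TRIMs. It shows the rule that a metaslab must not
accept allocations again until every TRIM queued against it has
completed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one metaslab; this is not the OpenZFS metaslab_t. */
typedef struct {
	bool disabled;        /* true while allocations are blocked */
	int  trims_inflight;  /* TRIM I/Os issued but not yet completed */
} model_metaslab_t;

/* Block allocations, then queue n TRIM I/Os against the metaslab. */
void
model_trim_start(model_metaslab_t *ms, int n)
{
	ms->disabled = true;
	ms->trims_inflight += n;
}

/* Completion callback for a single TRIM I/O. */
void
model_trim_done(model_metaslab_t *ms)
{
	assert(ms->trims_inflight > 0);
	ms->trims_inflight--;
}

/*
 * Re-enable the metaslab only once every queued TRIM has completed.
 * Returns true if the metaslab was re-enabled, false if TRIMs are
 * still outstanding and the caller has to try again later.
 */
bool
model_metaslab_enable(model_metaslab_t *ms)
{
	if (ms->trims_inflight > 0)
		return (false);
	ms->disabled = false;
	return (true);
}

/* Allocations are only permitted while the metaslab is enabled. */
bool
model_can_allocate(const model_metaslab_t *ms)
{
	return (!ms->disabled);
}
```

In this model, calling model_metaslab_enable() while TRIMs are still
outstanding leaves the metaslab disabled, so no allocation can overlap
a pending TRIM. Threaded code would typically block on a condition
variable until the counter reaches zero instead of returning false.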

We were able to trigger this fairly consistently on a small zpool with
a single top-level vdev (i.e. a small number of metaslabs) under heavy
parallel write activity while performing a manual TRIM against a
somewhat 'slow' device (so TRIMs took a while to complete). With the
patch, we have not been able to reproduce it. This was observed on
illumos, but from inspection the relevant pieces of the OpenZFS TRIM
code are largely unchanged, so OpenZFS appears to be vulnerable to
the same issue.

Reviewed-by: Igor Kozhukhov <igor@dilos.org>
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Jason King <jking@racktopsystems.com>
Illumos-issue: https://www.illumos.org/issues/15939
Closes #15395
module/zfs/vdev_trim.c