Add slow disk diagnosis to ZED
commit c1c26a77ff38770b80ed1c97aea867d3ad9bf6ee
author Don Brady <don.brady@delphix.com>
Thu, 8 Feb 2024 17:19:52 +0000 (8 10:19 -0700)
committer Tony Hutter <hutter2@llnl.gov>
Mon, 29 Apr 2024 20:50:05 +0000 (29 13:50 -0700)
tree 694326890ff3acb5e4fc2b7122f438a9e02429ce
parent db65272aef3d380d2bd1c94907826f2b9ec9205e

Slow disk response times can indicate a failing drive. ZFS
currently tracks slow I/Os (those slower than zio_slow_io_ms) and
generates events (ereport.fs.zfs.delay) for them. However, ZED takes
no action on these events, as it does for checksum or I/O errors.
This change adds slow disk diagnosis to ZED. It is opt-in and is
enabled through two new VDEV properties:
  VDEV_PROP_SLOW_IO_N
  VDEV_PROP_SLOW_IO_T
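
The following is a minimal sketch of how a count/time threshold pair
could drive a diagnosis. It assumes the two properties follow the
"N events within T seconds" convention; the commit message itself does
not spell this out. The type and function names (slow_io_window_t,
slow_io_window_init, slow_io_window_record) and the SLOW_IO_MAX_N cap
are hypothetical and are not taken from fmd_serd.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SLOW_IO_MAX_N	64	/* hypothetical cap on N for this sketch */

/* Hypothetical detector: trips when n events fall within t_sec seconds. */
typedef struct slow_io_window {
	uint32_t n;			/* event count threshold (assumed N) */
	uint64_t t_sec;			/* window length in seconds (assumed T) */
	uint64_t stamps[SLOW_IO_MAX_N];	/* ring buffer of recent event times */
	size_t count;			/* number of valid entries in the ring */
	size_t head;			/* index of the oldest entry */
} slow_io_window_t;

/* Returns false if the thresholds cannot be represented by this sketch. */
static bool
slow_io_window_init(slow_io_window_t *w, uint32_t n, uint64_t t_sec)
{
	if (n == 0 || n > SLOW_IO_MAX_N || t_sec == 0)
		return (false);
	w->n = n;
	w->t_sec = t_sec;
	w->count = 0;
	w->head = 0;
	return (true);
}

/*
 * Record one delay event observed at now_sec (assumed non-decreasing).
 * Returns true when at least n events lie within the last t_sec seconds.
 */
static bool
slow_io_window_record(slow_io_window_t *w, uint64_t now_sec)
{
	/* Discard events that have aged out of the window. */
	while (w->count > 0 && now_sec - w->stamps[w->head] > w->t_sec) {
		w->head = (w->head + 1) % w->n;
		w->count--;
	}
	/* The ring holds at most n events; overwrite the oldest if full. */
	if (w->count == w->n) {
		w->head = (w->head + 1) % w->n;
		w->count--;
	}
	w->stamps[(w->head + w->count) % w->n] = now_sec;
	w->count++;
	return (w->count >= w->n);
}
```

With n = 3 and t_sec = 10 in this sketch, events at 0 and 5 seconds
followed by one at 20 seconds do not trip the detector, because the
first two have aged out by then.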

If multiple VDEVs in a pool are experiencing slow I/Os, ZED skips
the zpool_vdev_degrade() call.
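
The rule above can be sketched as a simple check. This is an
illustration only: the commit message does not say how ZED counts slow
VDEVs, so the per-VDEV flag, the vdev_slow_state_t type, and the
should_degrade() helper are all hypothetical and do not reflect the
code in zfs_retire.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VDEV state for this sketch. */
typedef struct vdev_slow_state {
	const char *name;	/* VDEV name, for illustration only */
	bool slow;		/* true if this VDEV is seeing slow I/Os */
} vdev_slow_state_t;

/*
 * Return true only if the candidate VDEV is slow and no other VDEV in
 * the pool is. When several VDEVs are slow at once, the cause is less
 * likely to be a single failing drive, so the degrade is skipped.
 */
static bool
should_degrade(const vdev_slow_state_t *vdevs, size_t nvdevs,
    size_t candidate)
{
	if (candidate >= nvdevs || !vdevs[candidate].slow)
		return (false);
	for (size_t i = 0; i < nvdevs; i++) {
		if (i != candidate && vdevs[i].slow)
			return (false);
	}
	return (true);
}
```

A caller in this sketch would invoke the degrade path only when
should_degrade() returns true for the VDEV that crossed its threshold.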

Sponsored-By: OpenDrives Inc.
Sponsored-By: Klara Inc.
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Allan Jude <allan@klarasystems.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Rob Wing <rob.wing@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes #15469
29 files changed:
cmd/zed/agents/fmd_api.c
cmd/zed/agents/fmd_api.h
cmd/zed/agents/fmd_serd.c
cmd/zed/agents/fmd_serd.h
cmd/zed/agents/zfs_diagnosis.c
cmd/zed/agents/zfs_retire.c
cmd/zinject/zinject.c
cmd/zpool/zpool_main.c
include/sys/fm/fs/zfs.h
include/sys/fs/zfs.h
include/sys/vdev_impl.h
lib/libzfs/libzfs.abi
lib/libzfs/libzfs_pool.c
lib/libzfs/libzfs_util.c
man/man7/vdevprops.7
man/man7/zpoolconcepts.7
man/man8/zinject.8
module/zcommon/zpool_prop.c
module/zfs/vdev.c
module/zfs/zfs_fm.c
module/zfs/zio_inject.c
tests/runfiles/linux.run
tests/zfs-tests/tests/Makefile.am
tests/zfs-tests/tests/functional/cli_root/zpool_get/vdev_get.cfg
tests/zfs-tests/tests/functional/events/cleanup.ksh
tests/zfs-tests/tests/functional/events/zed_slow_io.ksh [new file with mode: 0755]
tests/zfs-tests/tests/functional/events/zed_slow_io_many_vdevs.ksh [new file with mode: 0755]
tests/zfs-tests/tests/functional/fault/cleanup.ksh
tests/zfs-tests/tests/functional/fault/setup.ksh