======================
ioctl based interfaces
======================

ioctl() is the most common way for applications to interface
with device drivers. It is flexible and easily extended by adding new
commands and can be passed through character devices, block devices as
well as sockets and other special file descriptors.

However, it is also very easy to get ioctl command definitions wrong,
and hard to fix them later without breaking existing applications,
so this documentation tries to help developers get it right.

Command number definitions
==========================

The command number, or request number, is the second argument passed to
the ioctl system call. While this can be any 32-bit number that uniquely
identifies an action for a particular driver, there are a number of
conventions around defining them.

``include/uapi/asm-generic/ioctl.h`` provides four macros for defining
ioctl commands that follow modern conventions: ``_IO``, ``_IOR``,
``_IOW``, and ``_IOWR``. These should be used for all new commands,
with the correct parameters:

_IO/_IOR/_IOW/_IOWR
   The macro name specifies how the argument will be used.  It may be a
   pointer to data to be passed into the kernel (_IOW), out of the kernel
   (_IOR), or both (_IOWR).  _IO can indicate either commands with no
   argument or those passing an integer value instead of a pointer.
   It is recommended to only use _IO for commands without arguments,
   and use pointers for passing data.

type
   An 8-bit number, often a character literal, specific to a subsystem
   or driver, and listed in Documentation/userspace-api/ioctl/ioctl-number.rst.

nr
  An 8-bit number identifying the specific command, unique for a given
  value of 'type'.

data_type
  The name of the data type pointed to by the argument. The command number
  encodes the ``sizeof(data_type)`` value in a 13-bit or 14-bit integer,
  leading to a limit of 8191 bytes for the maximum size of the argument.
  Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that
  will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t).
  _IO does not have a data_type parameter.


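The parameters above can be sketched from the user space side as follows.
This is only an illustration: ``struct foo_config``, the ``'f'`` type value
and all ``FOO_*`` names are hypothetical, and a real driver would pick its
type value from ioctl-number.rst.

```c
#include <linux/ioctl.h>	/* _IO, _IOR, _IOW, _IOWR and the _IOC_* decoders */
#include <linux/types.h>	/* __u32, __u64 */

/* Hypothetical argument structure with fixed-width members only. */
struct foo_config {
	__u32 mode;
	__u32 flags;
	__u64 timeout_ns;
};

/* Hypothetical 'type' value, assumed to be unused by other drivers. */
#define FOO_IOC_MAGIC		'f'

/* No argument at all. */
#define FOO_RESET		_IO(FOO_IOC_MAGIC, 0)
/* Data is copied out of the kernel. */
#define FOO_GET_CONFIG		_IOR(FOO_IOC_MAGIC, 1, struct foo_config)
/* Data is passed into the kernel. */
#define FOO_SET_CONFIG		_IOW(FOO_IOC_MAGIC, 2, struct foo_config)
/* Data is passed in and results are copied back out. */
#define FOO_XCHG_CONFIG		_IOWR(FOO_IOC_MAGIC, 3, struct foo_config)

/*
 * The mistake described in the note: passing sizeof() instead of the type
 * name encodes sizeof(size_t) rather than the structure size.
 */
#define FOO_SET_CONFIG_BAD	_IOW(FOO_IOC_MAGIC, 2, sizeof(struct foo_config))
```

The ``_IOC_TYPE()``, ``_IOC_NR()``, ``_IOC_SIZE()`` and ``_IOC_DIR()``
macros from the same header decode a command number again, which is a
convenient way to double-check a definition.
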
Interface versions
==================

Some subsystems use version numbers in data structures to overload
commands with different interpretations of the argument.

This is generally a bad idea, since changes to existing commands tend
to break existing applications.

A better approach is to add a new ioctl command with a new number. The
old command still needs to be implemented in the kernel for compatibility,
but this can be a wrapper around the new implementation.

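A minimal sketch of the "new command, old command as a wrapper" approach,
written as plain C so that it builds outside the kernel. The structures,
the ``FOO_*`` command names and both helper functions are hypothetical.

```c
#include <errno.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Original argument layout, kept unchanged for existing applications. */
struct foo_params {
	__u32 rate;
	__u32 channels;
};

/* Extended layout used only by the new command. */
struct foo_params_v2 {
	__u32 rate;
	__u32 channels;
	__u64 flags;
};

#define FOO_IOC_MAGIC		'f'	/* hypothetical 'type' value */
#define FOO_SET_PARAMS		_IOW(FOO_IOC_MAGIC, 1, struct foo_params)
#define FOO_SET_PARAMS_V2	_IOW(FOO_IOC_MAGIC, 2, struct foo_params_v2)

/* New implementation: the only place that validates and applies settings. */
static inline int foo_apply_params_v2(const struct foo_params_v2 *p)
{
	if (p->rate == 0 || p->channels == 0)
		return -EINVAL;
	return 0;
}

/* Old command: widen the old layout, then reuse the new code path. */
static inline int foo_apply_params(const struct foo_params *old)
{
	struct foo_params_v2 p = {
		.rate = old->rate,
		.channels = old->channels,
		.flags = 0,	/* old callers get the default behaviour */
	};

	return foo_apply_params_v2(&p);
}
```

Because the two structures differ in size, the two commands would get
different numbers even with the same ``nr``, but a distinct ``nr`` keeps the
intent obvious.
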
Return code
===========

ioctl commands can return negative error codes as documented in errno(3);
these get turned into errno values in user space. On success, the return
code should be zero. It is also possible but not recommended to return
a positive 'long' value.

When the ioctl callback is called with an unknown command number, the
handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in
-ENOTTY being returned from the system call. Some subsystems return
-ENOSYS or -EINVAL here for historic reasons, but this is wrong.

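The return value conventions can be illustrated with a simplified command
dispatcher. This is a user space stand-in for a driver's ioctl handler, not
kernel code: ``struct foo_dev``, the ``FOO_*`` commands and ``foo_ioctl()``
are hypothetical, and a real handler takes a ``struct file *`` and an
``unsigned long`` argument.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define FOO_IOC_MAGIC	'f'	/* hypothetical 'type' value */
#define FOO_RESET	_IO(FOO_IOC_MAGIC, 0)
#define FOO_START	_IO(FOO_IOC_MAGIC, 1)

/* Hypothetical device state, standing in for per-file driver data. */
struct foo_dev {
	int running;
};

/* Returns zero on success and a negative errno value on failure. */
static inline long foo_ioctl(struct foo_dev *dev, unsigned int cmd)
{
	switch (cmd) {
	case FOO_RESET:
		dev->running = 0;
		return 0;
	case FOO_START:
		if (dev->running)
			return -EBUSY;
		dev->running = 1;
		return 0;
	default:
		/* Unknown command: -ENOTTY, never -EINVAL or -ENOSYS. */
		return -ENOTTY;
	}
}
```
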
Prior to Linux 5.5, compat_ioctl handlers were required to return
-ENOIOCTLCMD in order to use the fallback conversion into native
commands. As all subsystems are now responsible for handling compat
mode themselves, this is no longer needed, but it may be important to
consider when backporting bug fixes to older kernels.

Timestamps
==========

Traditionally, timestamps and timeout values are passed as ``struct
timespec`` or ``struct timeval``, but these are problematic because of
incompatible definitions of these structures in user space after the
move to 64-bit time_t.

The ``struct __kernel_timespec`` type can be used instead to be embedded
in other data structures when separate second/nanosecond values are
desired, or passed to user space directly. This is still not ideal though,
as the structure matches neither the kernel's timespec64 nor the user
space timespec exactly. The get_timespec64() and put_timespec64() helper
functions can be used to ensure that the layout remains compatible with
user space and the padding is treated correctly.

As it is cheap to convert seconds to nanoseconds, but the opposite
requires an expensive 64-bit division, a plain __u64 nanosecond value
can be simpler and more efficient.

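The cost asymmetry can be seen in a small sketch. The names below
(``struct foo_event``, ``struct foo_split_time`` and both helpers) are
hypothetical and only show how a single ``__u64`` nanosecond field might be
used in an interface.

```c
#include <linux/types.h>	/* __u32, __u64 */

#define FOO_NSEC_PER_SEC	1000000000ULL

/* Hypothetical event record: the timestamp is one __u64 nanosecond count. */
struct foo_event {
	__u64 timestamp_ns;
	__u32 code;
	__u32 reserved;		/* explicit padding, always zero */
};

/* Split representation, needed only where seconds must be shown separately. */
struct foo_split_time {
	__u64 sec;
	__u32 nsec;
};

/* Seconds and nanoseconds to nanoseconds: one multiplication and one addition. */
static inline __u64 foo_split_to_ns(__u64 sec, __u32 nsec)
{
	return sec * FOO_NSEC_PER_SEC + nsec;
}

/* The opposite direction needs a 64-bit division and a remainder. */
static inline struct foo_split_time foo_ns_to_split(__u64 ns)
{
	struct foo_split_time t = {
		.sec = ns / FOO_NSEC_PER_SEC,
		.nsec = (__u32)(ns % FOO_NSEC_PER_SEC),
	};

	return t;
}
```
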
Timeout values and timestamps should ideally use CLOCK_MONOTONIC time,
as returned by ktime_get_ns() or ktime_get_ts64().  Unlike
CLOCK_REALTIME, this makes the timestamps immune from jumping backwards
or forwards due to leap second adjustments and clock_settime() calls.

ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that
need to be persistent across a reboot or between multiple machines.

32-bit compat mode
==================

In order to support 32-bit user space running on a 64-bit machine, each
subsystem or driver that implements an ioctl callback handler must also
implement the corresponding compat_ioctl handler.

As long as all the rules for data structures are followed, this is as
easy as setting the .compat_ioctl pointer to a helper function such as
compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().

compat_ptr()
------------

On the s390 architecture, 31-bit user space has ambiguous representations
for data pointers, with the upper bit being ignored. When running such
a process in compat mode, the compat_ptr() helper must be used to
clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit
pointer.  On other architectures, this macro only performs a cast to a
``void __user *`` pointer.

In a compat_ioctl() callback, the last argument is an unsigned long,
which can be interpreted as either a pointer or a scalar depending on
the command. If it is a scalar, then compat_ptr() must not be used, to
ensure that the 64-bit kernel behaves the same way as a 32-bit kernel
for arguments with the upper bit set.

The compat_ptr_ioctl() helper can be used in place of a custom
compat_ioctl file operation for drivers that only take arguments that
are pointers to compatible data structures.

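The pointer versus scalar distinction can be modelled in user space as
follows. This is an illustration of the behaviour described above, not the
kernel's definition of compat_ptr(): the ``model_*`` helpers and the two
command values are hypothetical, and addresses are represented as plain
integers.

```c
#include <stdint.h>

/*
 * Model of compat_ptr() on s390: the upper bit of a 31-bit user space
 * pointer is ignored, so it is cleared before widening to 64 bits.
 */
static inline uint64_t model_compat_ptr_s390(uint32_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffU);
}

/* Model of the other architectures: a plain widening conversion. */
static inline uint64_t model_compat_ptr_generic(uint32_t uptr)
{
	return (uint64_t)uptr;
}

/* Hypothetical commands: one takes a pointer, the other a scalar. */
enum {
	FOO_CMD_TAKES_POINTER,
	FOO_CMD_TAKES_SCALAR,
};

/* Decode the last compat_ioctl argument according to the command. */
static inline uint64_t model_compat_arg_s390(int cmd, uint32_t arg)
{
	if (cmd == FOO_CMD_TAKES_POINTER)
		return model_compat_ptr_s390(arg);

	/* Scalar: all 32 bits are significant and must be preserved. */
	return (uint64_t)arg;
}
```

Applying the pointer conversion to a scalar such as ``0x80001000`` would
silently turn it into ``0x1000`` on s390, which is why the conversion has to
depend on the command.
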
Structure layout
----------------

Compatible data structures have the same layout on all architectures,
avoiding all problematic members:

* ``long`` and ``unsigned long`` are the size of a register, so
  they can be either 32-bit or 64-bit wide and cannot be used in portable
  data structures. Fixed-length replacements are ``__s32``, ``__u32``,
  ``__s64`` and ``__u64``.

* Pointers have the same problem, in addition to requiring the
  use of compat_ptr(). The best workaround is to use ``__u64``
  in place of pointers, which requires a cast to ``uintptr_t`` in user
  space, and the use of u64_to_user_ptr() in the kernel to convert
  it back into a user pointer.

* On the x86-32 (i386) architecture, the alignment of 64-bit variables
  is only 32-bit, but they are naturally aligned on most other
  architectures including x86-64. This means a structure like::

    struct foo {
        __u32 a;
        __u64 b;
        __u32 c;
    };

  has four bytes of padding between a and b on x86-64, plus another four
  bytes of padding at the end, but no padding on i386, and it needs a
  compat_ioctl conversion handler to translate between the two formats.

  To avoid this problem, all structures should have their members
  naturally aligned, or explicit reserved fields added in place of the
  implicit padding. The ``pahole`` tool can be used for checking the
  alignment.

* On ARM OABI user space, structures are padded to multiples of 32 bits,
  making some structs incompatible with modern EABI kernels if they
  do not end on a 32-bit boundary.

* On the m68k architecture, struct members are not guaranteed to have an
  alignment greater than 16 bits, which is a problem when relying on
  implicit padding.

* Bitfields and enums generally work as one would expect them to,
  but some properties of them are implementation-defined, so it is better
  to avoid them completely in ioctl interfaces.

* ``char`` members can be either signed or unsigned, depending on
  the architecture, so the __u8 and __s8 types should be used for 8-bit
  integer values, though char arrays are clearer for fixed-length strings.

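The rules in the list above can be combined into one sketch of a structure
that is laid out identically on i386 and x86-64. ``struct foo_fixed``, its
reserved fields and ``foo_ptr_to_u64()`` are hypothetical names. The
compile-time checks play a similar role to inspecting the layout with
``pahole``.

```c
#include <stddef.h>		/* offsetof */
#include <stdint.h>		/* uintptr_t */
#include <linux/types.h>	/* __u32, __u64 */

/* Every member is naturally aligned; implicit padding is made explicit. */
struct foo_fixed {
	__u32 a;
	__u32 reserved1;	/* replaces the padding before b */
	__u64 b;
	__u32 c;
	__u32 reserved2;	/* replaces the padding at the end */
	__u64 buf_ptr;		/* user pointer carried as __u64 */
};

/* These hold regardless of whether __u64 is 4-byte or 8-byte aligned. */
_Static_assert(offsetof(struct foo_fixed, b) == 8, "b must be at offset 8");
_Static_assert(offsetof(struct foo_fixed, buf_ptr) == 24, "buf_ptr must be at offset 24");
_Static_assert(sizeof(struct foo_fixed) == 32, "no implicit padding expected");

/*
 * User space side of passing a pointer in a __u64 field: cast through
 * uintptr_t so that a 32-bit pointer is zero-extended. The kernel side
 * would convert the value back with u64_to_user_ptr().
 */
static inline __u64 foo_ptr_to_u64(const void *ptr)
{
	return (__u64)(uintptr_t)ptr;
}
```
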
Information leaks
=================

Uninitialized data must not be copied back to user space, as this can
cause an information leak, which can be used to defeat kernel address
space layout randomization (KASLR), helping in an attack.

For this reason (and for compat support) it is best to avoid any
implicit padding in data structures.  Where there is implicit padding
in an existing structure, kernel drivers must be careful to fully
initialize an instance of the structure before copying it to user
space.  This is usually done by calling memset() before assigning to
individual members.

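The memset() pattern might look like the sketch below, shown as plain C
so that it builds outside the kernel. ``struct foo_reply`` and
``foo_fill_reply()`` are hypothetical; in a driver, the filled structure
would then be handed to copy_to_user().

```c
#include <string.h>		/* memset */
#include <linux/types.h>	/* __u8, __u32 */

/* Hypothetical existing structure with implicit padding after 'status'. */
struct foo_reply {
	__u8  status;
	__u32 value;
};

/*
 * Clear the whole object first so that the padding bytes hold zeroes
 * rather than stale stack contents, then assign the members.
 */
static inline void foo_fill_reply(struct foo_reply *reply, __u8 status,
				  __u32 value)
{
	memset(reply, 0, sizeof(*reply));
	reply->status = status;
	reply->value = value;
}
```

A designated initializer would set the named members as well, but it is not
guaranteed to clear the padding, which is one reason memset() is the usual
choice.
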
Subsystem abstractions
======================

While some device drivers implement their own ioctl function, most
subsystems implement the same command for multiple drivers.  Ideally the
subsystem has an .ioctl() handler that copies the arguments from and
to user space, passing them into subsystem specific callback functions
through normal kernel pointers.

This helps in various ways:

* Applications written for one driver are more likely to work for
  another one in the same subsystem if there are no subtle differences
  in the user space ABI.

* The complexity of user space access and data structure layout is handled
  in one place, reducing the potential for implementation bugs.

* It is more likely to be reviewed by experienced developers
  that can spot problems in the interface when the ioctl is shared
  between multiple drivers than when it is only used in a single driver.

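The split between a subsystem handler and driver callbacks can be sketched
as below. This is a user space approximation: every name is hypothetical,
and memcpy() stands in for copy_to_user(), whose failure a real handler
would report as -EFAULT.

```c
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical structure shared by all drivers of the subsystem. */
struct foo_info {
	__u32 version;
	__u32 nr_channels;
};

#define FOO_IOC_MAGIC	'f'	/* hypothetical 'type' value */
#define FOO_GET_INFO	_IOR(FOO_IOC_MAGIC, 0, struct foo_info)

/* Per-driver callbacks only ever see normal kernel pointers. */
struct foo_driver_ops {
	int (*get_info)(struct foo_info *info);
};

/* Subsystem handler: the single place that touches the user buffer. */
static inline long foo_subsys_ioctl(const struct foo_driver_ops *ops,
				    unsigned int cmd, void *uarg)
{
	struct foo_info info;
	int ret;

	switch (cmd) {
	case FOO_GET_INFO:
		memset(&info, 0, sizeof(info));
		ret = ops->get_info(&info);
		if (ret)
			return ret;
		/* Stand-in for copy_to_user(uarg, &info, sizeof(info)). */
		memcpy(uarg, &info, sizeof(info));
		return 0;
	default:
		return -ENOTTY;
	}
}

/* One driver's callback: no user space access, no layout concerns. */
static inline int bar_get_info(struct foo_info *info)
{
	info->version = 1;
	info->nr_channels = 2;
	return 0;
}
```

A second driver would only supply its own ``get_info`` callback, so the
command number, the structure layout and the copy to user space stay
identical across drivers.
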
Alternatives to ioctl
=====================

There are many cases in which ioctl is not the best solution for a
problem. Alternatives include:

* System calls are a better choice for a system-wide feature that
  is not tied to a physical device or constrained by the file system
  permissions of a character device node.

* netlink is the preferred way of configuring any network related
  objects through sockets.

* debugfs is used for ad-hoc interfaces for debugging functionality
  that does not need to be exposed as a stable interface to applications.

* sysfs is a good way to expose the state of an in-kernel object
  that is not tied to a file descriptor.

* configfs can be used for more complex configuration than sysfs.

* A custom file system can provide extra flexibility with a simple
  user interface but adds a lot of complexity to the implementation.