usr/src/man/man5/byteorder.5

   1 .\"
   2 .\" This file and its contents are supplied under the terms of the
   3 .\" Common Development and Distribution License ("CDDL"), version 1.0.
   4 .\" You may only use this file in accordance with the terms of version
   5 .\" 1.0 of the CDDL.
   6 .\"
   7 .\" A full copy of the text of the CDDL should have accompanied this
   8 .\" source.  A copy of the CDDL is also available via the Internet at
   9 .\" http://www.illumos.org/license/CDDL.
  10 .\"
  11 .\"
  12 .\" Copyright 2016 Joyent, Inc.
  13 .\"
  14 .Dd January 31, 2016
  15 .Dt BYTEORDER 5
  16 .Os
  17 .Sh NAME
  18 .Nm byteorder ,
  19 .Nm endian
  20 .Nd byte order and endianness
  21 .Sh DESCRIPTION
  22 Integer values which occupy more than 1 byte in memory can be laid out
  23 in different ways on different platforms.
  24 In particular, there is a major split between those which place the least
  25 significant byte of an integer at the lowest address, and those which place the
  26 most significant byte there instead.
  27 As this difference relates to which end of the integer is found in memory first,
  28 the term
  29 .Em endian
  30 is used to refer to a particular byte order.
  31 .Pp
  32 A platform is referred to as using a
  33 .Em big-endian
  34 byte order when it places the most significant byte at the lowest
  35 address, and
  36 .Em little-endian
  37 when it places the least significant byte first.
  38 Some platforms may also switch between big- and little-endian mode and run code
  39 compiled for either.
  40 .Pp
  41 Historically, there have also been some systems that utilized
  42 .Em middle-endian
  43 byte orders for integers larger than 2 bytes.
  44 Such orderings are not in common use today.
  45 .Pp
  46 Endianness is also of particular importance when dealing with values
  47 that are being read into memory from an external source.
  48 For example, network protocols such as IP conventionally define the fields in a
  49 packet as being always stored in big-endian byte order.
  50 This means that a little-endian machine will have to perform transformations on
  51 these fields in order to process them.
  52 .Ss Examples
  53 To illustrate endianness in memory, let us consider the decimal integer
  54 2864434397, which is 0xAABBCCDD in hexadecimal.
  55 This number fits in 32 bits of storage (4 bytes).
  56 .Pp
  57 On a big-endian system, this integer would be written into memory as
  58 the bytes 0xAA, 0xBB, 0xCC, 0xDD, in order from lowest memory address to
  59 highest.
  60 .Pp
  61 On a little-endian system, it would be written instead as the bytes
  62 0xDD, 0xCC, 0xBB, 0xAA, in that order.
  63 .Pp
  64 If both the big- and little-endian systems were asked to store this
  65 integer at address 0x100, we would see the following in each system's
  66 memory:
  67 .Bd -literal
  68
  69                     Big-Endian
  70
  71         ++------++------++------++------++
  72         || 0xAA || 0xBB || 0xCC || 0xDD ||
  73         ++------++------++------++------++
  74             ^^      ^^      ^^      ^^
  75           0x100   0x101   0x102   0x103
  76             vv      vv      vv      vv
  77         ++------++------++------++------++
  78         || 0xDD || 0xCC || 0xBB || 0xAA ||
  79         ++------++------++------++------++
  80
  81                   Little-Endian
  82 .Ed
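.Pp
The layout shown above can be observed directly by copying an integer's
in-memory representation into a byte array.
The following is a minimal sketch; the helper name
.Fn dump_u32
is hypothetical and is not part of any system header.
.Bd -literal
```c
#include <stdint.h>
#include <string.h>

/*
 * Copy the in-memory representation of a 32-bit integer into out[],
 * lowest address first. dump_u32 is a hypothetical helper name.
 */
static void
dump_u32(uint32_t value, unsigned char out[4])
{
	memcpy(out, &value, sizeof (value));
}
```
.Ed
.Pp
Calling
.Fn dump_u32
with 0xAABBCCDD should fill the array with 0xAA, 0xBB, 0xCC, 0xDD on a
big-endian host and with 0xDD, 0xCC, 0xBB, 0xAA on a little-endian host.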
  83 .Pp
  84 It is particularly important to note that even though the byte order is
  85 different between these two machines, the bit ordering within each byte,
  86 by convention, is still the same.
  87 .Pp
  88 For example, take the decimal integer 4660 (0x1234 in hexadecimal), which
  89 occupies 16 bits (2 bytes).
  90 .Pp
  91 On a big-endian system, this would be written into memory as 0x12, then
  92 0x34.
  93 .Pp
  94 On a little-endian system, it would be written as 0x34, then 0x12.
  95 Note that this is not at all the same as seeing 0x43 then 0x21 in memory --
  96 only the bytes are re-ordered, not any bits (or nybbles) within them.
  97 .Pp
  98 As before, storing this at address 0x100:
  99 .Bd -literal
 100                     Big-Endian
 101
 102                 ++------++------++
 103                 || 0x12 || 0x34 ||
 104                 ++------++------++
 105                     ^^      ^^
 106                   0x100   0x101
 107                     vv      vv
 108                 ++------++------++
 109                 || 0x34 || 0x12 ||
 110                 ++------++------++
 111
 112                    Little-Endian
 113 .Ed
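.Pp
The distinction between re-ordering bytes and re-ordering bits can be
sketched in code.
The helper below, whose name
.Fn swap16
is hypothetical, exchanges the two bytes of a 16-bit value and leaves the bits
within each byte alone.
.Bd -literal
```c
#include <stdint.h>

/*
 * Exchange the two bytes of a 16-bit value. The bits (and nybbles)
 * inside each byte keep their positions. swap16 is a hypothetical name.
 */
static uint16_t
swap16(uint16_t v)
{
	return ((uint16_t)((v << 8) | (v >> 8)));
}
```
.Ed
.Pp
With this sketch, 0x1234 becomes 0x3412 rather than 0x4321.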
 114 .Pp
 115 This example shows how an eight-byte number, 0xBADCAFFEDEADBEEF, is stored
 116 in both big- and little-endian:
 117 .Bd -literal
 118                         Big-Endian
 119
 120     +------+------+------+------+------+------+------+------+
 121     | 0xBA | 0xDC | 0xAF | 0xFE | 0xDE | 0xAD | 0xBE | 0xEF |
 122     +------+------+------+------+------+------+------+------+
 123        ^^     ^^     ^^     ^^     ^^     ^^     ^^     ^^
 124      0x100  0x101  0x102  0x103  0x104  0x105  0x106  0x107
 125        vv     vv     vv     vv     vv     vv     vv     vv
 126     +------+------+------+------+------+------+------+------+
 127     | 0xEF | 0xBE | 0xAD | 0xDE | 0xFE | 0xAF | 0xDC | 0xBA |
 128     +------+------+------+------+------+------+------+------+
 129
 130                        Little-Endian
 131
 132 .Ed
 133 .Pp
 134 The treatment of different endian values would not be complete without
 135 discussing
 136 .Em PDP-endian ,
 137 which is also known as
 138 .Em middle-endian .
 139 While the PDP-11 was a 16-bit little-endian system, it laid out 32-bit
 140 values in a different way from current little-endian systems.
 141 First, it would divide a 32-bit number into two 16-bit words.
 142 Each 16-bit word would be stored in little-endian; however, the two 16-bit
 143 words would be stored with the more significant word appearing first in memory,
 144 followed by the less significant word.
 145 .Pp
 146 The following image illustrates PDP-endian and compares it against
 147 little-endian values.
 148 Here, we'll start with the value 0xAABBCCDD and show how the four bytes for it
 149 will be laid out, starting at 0x100.
 150 .Bd -literal
 151                     PDP-Endian
 152
 153         ++------++------++------++------++
 154         || 0xBB || 0xAA || 0xDD || 0xCC ||
 155         ++------++------++------++------++
 156             ^^      ^^      ^^      ^^
 157           0x100   0x101   0x102   0x103
 158             vv      vv      vv      vv
 159         ++------++------++------++------++
 160         || 0xDD || 0xCC || 0xBB || 0xAA ||
 161         ++------++------++------++------++
 162
 163                   Little-Endian
 164
 165 .Ed
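.Pp
As a rough illustration of the PDP-endian layout described above, the
following sketch writes a 32-bit value into a byte array by hand.
The function name
.Fn store_pdp32
is hypothetical; nothing like it is provided by the system.
.Bd -literal
```c
#include <stdint.h>

/*
 * Lay out a 32-bit value in PDP-endian order: the more significant
 * 16-bit word comes first, and each word is stored little-endian.
 * store_pdp32 is a hypothetical helper name.
 */
static void
store_pdp32(uint32_t v, unsigned char out[4])
{
	uint16_t hi = (uint16_t)(v >> 16);	/* more significant word */
	uint16_t lo = (uint16_t)(v & 0xFFFF);	/* less significant word */

	out[0] = (unsigned char)(hi & 0xFF);	/* low byte of high word */
	out[1] = (unsigned char)(hi >> 8);	/* high byte of high word */
	out[2] = (unsigned char)(lo & 0xFF);	/* low byte of low word */
	out[3] = (unsigned char)(lo >> 8);	/* high byte of low word */
}
```
.Ed
.Pp
For 0xAABBCCDD this produces the bytes 0xBB, 0xAA, 0xDD, 0xCC, matching the
PDP-endian row of the diagram.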
 166 .Ss Network Byte Order
 167 The term 'network byte order' refers to big-endian ordering, and
 168 originates from the IEEE.
 169 Early disagreements over which byte ordering to use for network traffic prompted
 170 RFC 1700 to define that all IETF-specified network protocols use big-endian
 171 ordering unless noted explicitly otherwise.
 172 The Internet protocol family (IP, and thus TCP, UDP, etc.) in particular adheres
 173 to this convention.
 174 .Ss Determining the System's Byte Order
 175 The operating system supports both big-endian and little-endian CPUs.
 176 To make it easier for programs to determine the endianness of the platform they
 177 are being compiled for, functions and macro constants are provided in the system
 178 header files.
 179 .Pp
 180 The endianness of the system can be obtained by including the header
 181 .In sys/types.h
 182 and using the pre-processor macros
 183 .Sy _LITTLE_ENDIAN
 184 and
 185 .Sy _BIG_ENDIAN .
 186 See
 187 .Xr types.h 3HEAD
 188 for more information.
 189 .Pp
 190 Additionally, the header
 191 .In endian.h
 192 defines an alternative means for determining the endianness of the
 193 current system.
 194 See
 195 .Xr endian.h 3HEAD
 196 for more information.
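.Pp
A minimal sketch of a byte order check follows.
It assumes that exactly one of
.Sy _LITTLE_ENDIAN
and
.Sy _BIG_ENDIAN
is defined after including
.In sys/types.h ,
and falls back to a run-time probe when that assumption does not hold, as may
be the case on other systems.
The function names are hypothetical.
.Bd -literal
```c
#include <sys/types.h>
#include <stdint.h>
#include <string.h>

/* Run-time probe: inspect the byte stored at the lowest address. */
static int
probe_little_endian(void)
{
	uint16_t probe = 1;
	unsigned char first;

	memcpy(&first, &probe, 1);
	return (first == 1);
}

/* Return nonzero when the host is little-endian. */
static int
host_is_little_endian(void)
{
#if defined(_LITTLE_ENDIAN) && !defined(_BIG_ENDIAN)
	return (1);
#elif defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
	return (0);
#else
	/* Macros absent or ambiguous: assumed to be a different system. */
	return (probe_little_endian());
#endif
}
```
.Ed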
 197 .Pp
 198 illumos runs on both big- and little-endian systems.
 199 When writing software for which the endianness is important, one must always
 200 check the byte order and convert data appropriately.
 201 .Ss Converting Between Byte Orders
 202 The system provides two different sets of functions to convert values
 203 between big-endian and little-endian.
 204 They are defined in
 205 .Xr byteorder 3SOCKET
 206 and
 207 .Xr endian 3C .
 208 .Pp
 209 The
 210 .Xr byteorder 3SOCKET
 211 family of functions converts data between the host's native byte order
 212 and network byte order, which is big-endian.
 213 The functions operate on either 16-bit, 32-bit, or 64-bit values.
 214 Functions that convert from network byte order to the host's byte order
 215 start with the string
 216 .Sy ntoh ,
 217 while functions that convert from the host's byte order to network byte
 218 order begin with
 219 .Sy hton .
 220 For example, to convert a 32-bit value, a long, from network byte order
 221 to the host's, one would use the function
 222 .Xr ntohl 3SOCKET .
 223 .Pp
 224 These functions have been standardized by POSIX.
 225 However, the 64-bit variants,
 226 .Xr ntohll 3SOCKET
 227 and
 228 .Xr htonll 3SOCKET ,
 229 are not standardized and may not be found on other systems.
 230 For more information on these functions, see
 231 .Xr byteorder 3SOCKET .
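.Pp
The following sketch shows one way these functions might be used to read and
write a 32-bit field in a packet buffer.
The helper names
.Fn read_net32
and
.Fn write_net32
are hypothetical, and the sketch assumes that
.In arpa/inet.h
declares
.Fn ntohl
and
.Fn htonl ,
as POSIX specifies.
.Bd -literal
```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit network byte order field from the start of buf. */
static uint32_t
read_net32(const unsigned char *buf)
{
	uint32_t raw;

	memcpy(&raw, buf, sizeof (raw));	/* avoids unaligned access */
	return (ntohl(raw));
}

/* Write value to buf as a 32-bit network byte order field. */
static void
write_net32(unsigned char *buf, uint32_t value)
{
	uint32_t raw = htonl(value);

	memcpy(buf, &raw, sizeof (raw));
}
```
.Ed
.Pp
Because the conversion is expressed through
.Fn ntohl
and
.Fn htonl ,
the same source should behave identically on big- and little-endian hosts.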
 232 .Pp
 233 The second family of functions,
 234 .Xr endian 3C ,
 235 provides a means to convert between the host's byte order
 236 and either big-endian or little-endian specifically.
 237 While these functions are similar to those in
 238 .Xr byteorder 3SOCKET ,
 239 they more explicitly cover different data conversions.
 240 Like them, these functions operate on either 16-bit, 32-bit, or 64-bit values.
 241 When converting from big-endian to the host's endianness, the functions
 242 begin with
 243 .Sy betoh .
 244 If, instead, one is converting data from the host's native endianness to
 245 big-endian, then the functions begin with
 246 .Sy htobe .
 247 When working with little-endian data, the prefixes
 248 .Sy letoh
 249 and
 250 .Sy htole
 251 convert little-endian data to the host's endianness and from the host's
 252 to little-endian respectively.
 253 .Pp
 254 These functions are not standardized and the header they appear in varies
 255 between the BSDs and GNU/Linux.
 256 Applications that wish to be portable should instead use the
 257 .Xr byteorder 3SOCKET
 258 functions.
 259 .Pp
 260 All of these functions in both families simply return their input when
 261 the host's native byte order is the same as the desired order.
 262 For example, when calling
 263 .Xr htonl 3SOCKET
 264 on a big-endian system the original data is returned with no conversion
 265 or modification.
 266 .Sh SEE ALSO
 267 .Xr endian 3C ,
 268 .Xr endian.h 3HEAD ,
 269 .Xr inet 3HEAD ,
 270 .Xr byteorder 3SOCKET