manual/io.texi

@node I/O Overview, I/O on Streams, Pattern Matching, Top
@chapter Input/Output Overview

Most programs need to do either input (reading data) or output (writing
data), or most frequently both, in order to do anything useful.  The GNU
C library provides such a large selection of input and output functions
that the hardest part is often deciding which function is most
appropriate!

This chapter introduces concepts and terminology relating to input
and output.  Other chapters relating to the GNU I/O facilities are:

@itemize @bullet
@item
@ref{I/O on Streams}, which covers the high-level functions
that operate on streams, including formatted input and output.

@item
@ref{Low-Level I/O}, which covers the basic I/O and control
functions on file descriptors.

@item
@ref{File System Interface}, which covers functions for operating on
directories and for manipulating file attributes such as access modes
and ownership.

@item
@ref{Pipes and FIFOs}, which includes information on the basic interprocess
communication facilities.

@item
@ref{Sockets}, which covers a more complicated interprocess communication
facility with support for networking.

@item
@ref{Low-Level Terminal Interface}, which covers functions for changing
how input and output to terminals or other serial devices are processed.
@end itemize


@menu
* I/O Concepts::       Some basic information and terminology.
* File Names::         How to refer to a file.
@end menu

@node I/O Concepts, File Names,  , I/O Overview
@section Input/Output Concepts

Before you can read or write the contents of a file, you must establish
a connection or communications channel to the file.  This process is
called @dfn{opening} the file.  You can open a file for reading, writing,
or both.
@cindex opening a file

The connection to an open file is represented either as a stream or as a
file descriptor.  You pass this as an argument to the functions that do
the actual read or write operations, to tell them which file to operate
on.  Certain functions expect streams, and others are designed to
operate on file descriptors.

When you have finished reading from or writing to the file, you can
terminate the connection by @dfn{closing} the file.  Once you have
closed a stream or file descriptor, you cannot do any more input or
output operations on it.

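The following C sketch runs through that life cycle twice. It assumes a POSIX system such as GNU/Linux. First it opens a file as a stream with @code{fopen}, writes to it, and closes it with @code{fclose}. Then it opens the same file as a file descriptor with @code{open}, reads the text back with @code{read}, and closes it with @code{close}. The helper name @code{write_then_read} is made up for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, O_RDONLY */
#include <stdio.h>     /* FILE, fopen, fputs, fclose */
#include <stdlib.h>
#include <unistd.h>    /* read, close, ssize_t */

/* Hypothetical helper: writes TEXT to PATH through a stream, closes it,
   then reopens the file as a descriptor and reads the text back into
   BUF (at most BUFSIZE - 1 bytes, followed by a terminating null).
   Returns the number of bytes read, or -1 on any error.  */
ssize_t
write_then_read (const char *path, const char *text,
                 char *buf, size_t bufsize)
{
  if (bufsize == 0)
    return -1;

  /* Opening: "w" creates the file or truncates an existing one.  */
  FILE *stream = fopen (path, "w");
  if (stream == NULL)
    return -1;
  if (fputs (text, stream) == EOF)
    {
      fclose (stream);
      return -1;
    }
  /* Closing: flushes buffered output and ends the connection.  */
  if (fclose (stream) != 0)
    return -1;

  /* A second, independent connection, this time a file descriptor.  */
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;
  /* For a small regular file one read normally returns all the data;
     a general-purpose reader would loop until read returns 0.  */
  ssize_t n = read (fd, buf, bufsize - 1);
  if (n >= 0)
    buf[n] = '\0';
  close (fd);
  return n;
}
```

A caller might pass a temporary name created with @code{mkstemp}. With the text @code{"hello\n"}, the helper should return 6.
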
@menu
* Streams and File Descriptors::    The GNU Library provides two ways
                                    to access the contents of files.
* File Position::                   The number of bytes from the
                                    beginning of the file.
@end menu

@node Streams and File Descriptors, File Position,  , I/O Concepts
@subsection Streams and File Descriptors

When you want to do input or output to a file, you have a choice of two
basic mechanisms for representing the connection between your program
and the file: file descriptors and streams.  File descriptors are
represented as objects of type @code{int}, while streams are represented
as @code{FILE *} objects.

File descriptors provide a primitive, low-level interface to input and
output operations.  Both file descriptors and streams can represent a
connection to a device (such as a terminal), or a pipe or socket for
communicating with another process, as well as a normal file.  But if
you want to do control operations that are specific to a particular kind
of device, you must use a file descriptor; there are no facilities to
use streams in this way.  You must also use file descriptors if your
program needs to do input or output in special modes, such as
nonblocking (or polled) input (@pxref{File Status Flags}).

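Here is a minimal sketch of one such special mode, assuming a POSIX system such as GNU/Linux. The code uses @code{fcntl} to set @code{O_NONBLOCK} on the read end of a pipe, which the stream functions offer no way to do. The helpers @code{set_nonblocking} and @code{read_empty_pipe_nonblocking} are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno, EAGAIN, EWOULDBLOCK */
#include <fcntl.h>     /* fcntl, F_GETFL, F_SETFL, O_NONBLOCK */
#include <stdlib.h>
#include <unistd.h>    /* pipe, read, close */

/* Hypothetical helper: turns on O_NONBLOCK for FD while keeping its
   other file status flags.  Returns 0 on success, -1 on error.  */
int
set_nonblocking (int fd)
{
  int flags = fcntl (fd, F_GETFL);
  if (flags == -1)
    return -1;
  return fcntl (fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Creates a pipe, makes its read end nonblocking, and reads from it
   while it is still empty.  A blocking read would wait forever here;
   the nonblocking one fails at once.  Returns the errno value from the
   failed read (EAGAIN, which equals EWOULDBLOCK on GNU/Linux), 0 if the
   read unexpectedly succeeded, or -1 if the setup failed.  */
int
read_empty_pipe_nonblocking (void)
{
  int fds[2];
  if (pipe (fds) == -1)
    return -1;

  int result = -1;
  if (set_nonblocking (fds[0]) == 0)
    {
      char c;
      /* Capture errno before close can overwrite it.  */
      result = read (fds[0], &c, 1) == -1 ? errno : 0;
    }
  close (fds[0]);
  close (fds[1]);
  return result;
}
```

Calling @code{read_empty_pipe_nonblocking} should return @code{EAGAIN} rather than hang, because nothing has been written to the pipe yet and its write end is still open.
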
Streams provide a higher-level interface, layered on top of the
primitive file descriptor facilities.  The stream interface treats all
kinds of files largely alike---the sole exception being the three
styles of buffering that you can choose (@pxref{Stream Buffering}).

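The buffering style is chosen with @code{setvbuf}. The sketch below assumes GNU/Linux and uses a hypothetical helper named @code{bytes_written_before_close}. It writes a short string under one of the modes @code{_IOFBF}, @code{_IOLBF} or @code{_IONBF}. Before closing the stream, it uses @code{stat} to check how much of the string has reached the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* fopen, setvbuf, fputs, _IOFBF, _IOLBF, _IONBF */
#include <stdlib.h>
#include <sys/stat.h>  /* stat, struct stat */

/* Hypothetical helper: writes TEXT to PATH through a stream that uses
   buffering MODE and, before the stream is closed, returns how many
   bytes have actually reached the file, or -1 on error.  */
long
bytes_written_before_close (const char *path, int mode, const char *text)
{
  FILE *stream = fopen (path, "w");
  if (stream == NULL)
    return -1;
  /* setvbuf must come before any other operation on the stream; a null
     buffer lets the library allocate one itself.  */
  if (setvbuf (stream, NULL, mode, BUFSIZ) != 0
      || fputs (text, stream) == EOF)
    {
      fclose (stream);
      return -1;
    }

  struct stat st;
  long size = stat (path, &st) == 0 ? (long) st.st_size : -1;
  fclose (stream);   /* flushes whatever is still buffered */
  return size;
}
```

On GNU/Linux, one would expect @code{"abc\n"} to show 4 bytes for @code{_IONBF}. It should also show 4 bytes for @code{_IOLBF}, because the newline triggers a flush. For @code{_IOFBF} it should show 0 bytes: a fully buffered stream holds the text until its buffer fills or the stream is flushed or closed.
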
The main advantage of using the stream interface is that the set of
functions for performing actual input and output operations (as opposed
to control operations) on streams is much richer and more powerful than
the corresponding facilities for file descriptors.  The file descriptor
interface provides only simple functions for transferring blocks of
characters, but the stream interface also provides powerful formatted
input and output functions (@code{printf} and @code{scanf}) as well as
functions for character- and line-oriented input and output.

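To make the difference concrete, here is a small C sketch with two hypothetical helpers that produce the same line of text. On a stream, @code{fprintf} formats and outputs in one call. On a descriptor, the program must first format into its own buffer (here with @code{snprintf}) and then hand the bytes to @code{write}. A POSIX system is assumed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>     /* fprintf, snprintf */
#include <stdlib.h>
#include <unistd.h>    /* write, ssize_t */

/* With a stream, one call formats the values and outputs the text.  */
int
report_to_stream (FILE *stream, const char *name, int value)
{
  return fprintf (stream, "%s = %d\n", name, value);
}

/* With a file descriptor, only a block of bytes can be written, so the
   text has to be formatted into a buffer first (here with snprintf).
   A robust version would also retry after a short write.  */
ssize_t
report_to_descriptor (int fd, const char *name, int value)
{
  char buf[128];
  int len = snprintf (buf, sizeof buf, "%s = %d\n", name, value);
  if (len < 0 || (size_t) len >= sizeof buf)
    return -1;   /* formatting error, or the text did not fit */
  return write (fd, buf, (size_t) len);
}
```

Either helper writes a line such as @samp{x = 42}. The stream version returns the count from @code{fprintf}, and the descriptor version returns the count from @code{write}.
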
Since streams are implemented in terms of file descriptors, you can
extract the file descriptor from a stream and perform low-level
operations directly on the file descriptor.  You can also initially open
a connection as a file descriptor and then make a stream associated with
that file descriptor.

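Both directions can be sketched in C, assuming a POSIX system. @code{fileno} returns the descriptor underlying a stream, and @code{fdopen} creates a stream for an existing descriptor. The helper names @code{write_both_ways} and @code{open_as_descriptor_then_stream} are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, O_RDONLY */
#include <stdio.h>     /* fileno, fdopen, fflush, fputs */
#include <stdlib.h>
#include <string.h>    /* strlen */
#include <unistd.h>    /* write, close */

/* Hypothetical helper: writes a line through STREAM, then a second line
   directly through the descriptor underneath it.  Returns 0 on success,
   -1 on error.  */
int
write_both_ways (FILE *stream)
{
  /* Flush first so the stream's buffered bytes reach the descriptor
     before the low-level write.  */
  if (fputs ("from stream\n", stream) == EOF || fflush (stream) == EOF)
    return -1;

  int fd = fileno (stream);          /* extract the descriptor */
  if (fd == -1)
    return -1;
  const char *msg = "from descriptor\n";
  size_t len = strlen (msg);
  return write (fd, msg, len) == (ssize_t) len ? 0 : -1;
}

/* The other direction: open a descriptor first, then associate a stream
   with it.  The stream takes ownership, so a later fclose also closes
   the descriptor.  */
FILE *
open_as_descriptor_then_stream (const char *path)
{
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return NULL;
  FILE *stream = fdopen (fd, "r");
  if (stream == NULL)
    close (fd);
  return stream;
}
```

The @code{fflush} call in @code{write_both_ways} matters. Without it, the line written through the stream could still be sitting in the stream's buffer when the descriptor write happens, and the two lines would reach the file in the wrong order.
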
In general, you should stick with using streams rather than file
descriptors, unless there is some specific operation you want to do that
can only be done on a file descriptor.  If you are a beginning
programmer and aren't sure what functions to use, we suggest that you
concentrate on the formatted input functions (@pxref{Formatted Input})
and formatted output functions (@pxref{Formatted Output}).

If you are concerned about portability of your programs to systems other
than GNU, you should also be aware that file descriptors are not as
portable as streams.  You can expect any system running ANSI C to
support streams, but non-GNU systems may not support file descriptors at
all, or may only implement a subset of the GNU functions that operate on
file descriptors.  Most of the file descriptor functions in the GNU
library are included in the POSIX.1 standard, however.

@node File Position,  , Streams and File Descriptors, I/O Concepts
@subsection File Position

One of the attributes of an open file is its @dfn{file position},
which keeps track of where in the file the next character is to be read
or written.  In the GNU system, the file position is simply an integer
representing the number of bytes from the beginning of the file.

The file position is normally set to the beginning of the file when it
is opened, and each time a character is read or written, the file
position is incremented.  In other words, access to the file is normally
@dfn{sequential}.
@cindex file position
@cindex sequential-access files

Ordinary files permit read or write operations at any position within
the file.  Some other kinds of files may also permit this.  Files which
do permit this are sometimes referred to as @dfn{random-access} files.
You can change the file position using the @code{fseek} function on a
stream (@pxref{File Positioning}) or the @code{lseek} function on a file
descriptor (@pxref{I/O Primitives}).  If you try to change the file
position on a file that doesn't support random access, you get an error.
@cindex random-access files

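Here is a brief C sketch of both calls, assuming a POSIX system. The hypothetical helper @code{byte_at_descriptor} positions a descriptor with @code{lseek}, and @code{byte_at_stream} does the same for a stream with @code{fseek}. A pipe does not support random access, so @code{lseek} on a pipe fails with @code{errno} set to @code{ESPIPE}.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* ESPIPE, reported when seeking on a pipe */
#include <stdio.h>     /* fseek, ftell, getc */
#include <stdlib.h>
#include <sys/types.h> /* off_t */
#include <unistd.h>    /* lseek, read */

/* Hypothetical helper: sets the file position of FD to OFFSET bytes
   from the beginning and reads the byte found there.  Returns that
   byte, or -1 on failure; for a descriptor that does not support
   random access, such as a pipe, lseek fails and sets errno to ESPIPE.  */
int
byte_at_descriptor (int fd, off_t offset)
{
  if (lseek (fd, offset, SEEK_SET) == (off_t) -1)
    return -1;
  unsigned char c;
  return read (fd, &c, 1) == 1 ? c : -1;
}

/* The same on a stream: fseek sets the position, and getc then reads
   from there and advances it by one.  Returns the byte, or EOF.  */
int
byte_at_stream (FILE *stream, long offset)
{
  if (fseek (stream, offset, SEEK_SET) != 0)
    return EOF;
  return getc (stream);
}
```

With a file containing @samp{abcdef}, @code{byte_at_descriptor (fd, 3)} should return @code{'d'}, and the descriptor's position afterwards is 4. Likewise, @code{byte_at_stream} at offset 5 returns @code{'f'} and leaves @code{ftell} reporting 6.
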
Streams and descriptors that are opened for @dfn{append access} are
treated specially for output: output to such files is @emph{always}
appended sequentially to the @emph{end} of the file, regardless of the
file position.  However, the file position still controls where in the
file reading is done.
@cindex append-access files

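The following C sketch shows this with a descriptor opened with the @code{O_APPEND} flag. The helper @code{append_demo} and its output parameters are hypothetical. The sketch assumes a POSIX system and an existing file whose contents it may discard.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, O_RDWR, O_APPEND, O_TRUNC */
#include <stdlib.h>
#include <sys/stat.h>  /* fstat, struct stat */
#include <unistd.h>    /* write, lseek, read, close */

/* Hypothetical demonstration of append access.  PATH is opened with
   O_APPEND (O_TRUNC discards its old contents), "abc" is written, the
   file position is moved back to 0, and "XYZ" is written.  Stores the
   first byte read after repositioning and the final file size;
   returns 0 on success or -1 on error.  */
int
append_demo (const char *path, char *first_byte, off_t *final_size)
{
  int fd = open (path, O_RDWR | O_APPEND | O_TRUNC);
  if (fd == -1)
    return -1;

  struct stat st;
  int ok = write (fd, "abc", 3) == 3
           && lseek (fd, 0, SEEK_SET) == 0     /* back to the start...  */
           && write (fd, "XYZ", 3) == 3        /* ...but appended anyway */
           && lseek (fd, 0, SEEK_SET) == 0     /* reading honors position */
           && read (fd, first_byte, 1) == 1
           && fstat (fd, &st) == 0;
  if (ok)
    *final_size = st.st_size;
  close (fd);
  return ok ? 0 : -1;
}
```

The file should end up containing @samp{abcXYZ} (6 bytes), and the byte read back from position 0 should be @samp{a}. For streams, you request append access with an @code{fopen} mode beginning with @samp{a}, such as @code{"a"} or @code{"a+"}.
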
If you think about it, you'll realize that several programs can read
a given file at the same time.  In order for each program to be able to
read the file at its own pace, each program must have its own file
position, which is not affected by anything the other programs do.

In fact, each opening of a file creates a separate file position.
Thus, if you open a file twice, even in the same program, you get two
streams or descriptors with independent file positions.

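Here is a minimal C sketch, assuming a POSIX system. The hypothetical helper @code{positions_after_two_opens} opens the same file twice and reads through only the first descriptor. It then asks each descriptor for its position with @code{lseek (fd, 0, SEEK_CUR)}.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, O_RDONLY */
#include <stdlib.h>
#include <unistd.h>    /* read, lseek, close */

/* Hypothetical helper: opens PATH twice, reads N bytes through the
   first descriptor only, and reports both file positions afterwards.
   Each open creates its own file position, so *POS1 becomes N while
   *POS2 stays 0.  Returns 0 on success, -1 on error.  */
int
positions_after_two_opens (const char *path, size_t n,
                           off_t *pos1, off_t *pos2)
{
  char buf[64];
  if (n > sizeof buf)
    return -1;
  int fd1 = open (path, O_RDONLY);
  if (fd1 == -1)
    return -1;
  int fd2 = open (path, O_RDONLY);   /* a second, separate opening */
  if (fd2 == -1)
    {
      close (fd1);
      return -1;
    }

  int result = -1;
  if (read (fd1, buf, n) == (ssize_t) n)
    {
      *pos1 = lseek (fd1, 0, SEEK_CUR);   /* moved by the read */
      *pos2 = lseek (fd2, 0, SEEK_CUR);   /* untouched: still 0 */
      result = 0;
    }
  close (fd1);
  close (fd2);
  return result;
}
```

On a file of at least 3 bytes, calling it with @var{n} equal to 3 should leave the first position at 3 and the second at 0.
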
By contrast, if you open a descriptor and then duplicate it to get
another descriptor, these two descriptors share the same file position:
changing the file position of one descriptor will affect the other.

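Here is a minimal C sketch, assuming a POSIX system. The hypothetical helper @code{positions_after_dup} opens a file once and duplicates the descriptor with @code{dup}. It reads through the original descriptor and then queries both positions with @code{lseek (fd, 0, SEEK_CUR)}.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, O_RDONLY */
#include <stdlib.h>
#include <unistd.h>    /* dup, read, lseek, close */

/* Hypothetical helper: opens PATH once, duplicates the descriptor,
   reads N bytes through the original, and reports both file positions.
   The two descriptors share one file position, so *POS1 and *POS2
   both become N.  Returns 0 on success, -1 on error.  */
int
positions_after_dup (const char *path, size_t n, off_t *pos1, off_t *pos2)
{
  char buf[64];
  if (n > sizeof buf)
    return -1;
  int fd1 = open (path, O_RDONLY);
  if (fd1 == -1)
    return -1;
  int fd2 = dup (fd1);               /* same open file, new number */
  if (fd2 == -1)
    {
      close (fd1);
      return -1;
    }

  int result = -1;
  if (read (fd1, buf, n) == (ssize_t) n)
    {
      *pos1 = lseek (fd1, 0, SEEK_CUR);
      *pos2 = lseek (fd2, 0, SEEK_CUR);   /* moved by fd1's read too */
      result = 0;
    }
  close (fd1);
  close (fd2);
  return result;
}
```

Reading 3 bytes through the original descriptor should leave both positions at 3.
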
@node File Names,  , I/O Concepts, I/O Overview
@section File Names

In order to open a connection to a file, or to perform other operations
such as deleting a file, you need some way to refer to the file.  Nearly
all files have names that are strings---even files which are actually
devices such as tape drives or terminals.  These strings are called
@dfn{file names}.  You specify the file name to say which file you want
to open or operate on.

This section describes the conventions for file names and how the
operating system works with them.
@cindex file name

@menu
* Directories::                 Directories contain entries for files.
* File Name Resolution::        A file name specifies how to look up a file.
* File Name Errors::            Error conditions relating to file names.
* File Name Portability::       File name portability and syntax issues.
@end menu


@node Directories, File Name Resolution,  , File Names
@subsection Directories

In order to understand the syntax of file names, you need to understand
how the file system is organized into a hierarchy of directories.

@cindex directory
@cindex link
@cindex directory entry
A @dfn{directory} is a file that contains information to associate other
files with names; these associations are called @dfn{links} or
@dfn{directory entries}.  Sometimes, people speak of ``files in a
directory'', but in reality, a directory only contains pointers to
files, not the files themselves.

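Because a directory entry is only a link, one file can have several of them. The C sketch below assumes a POSIX system and a file system that supports hard links. It uses @code{link} to give an existing file a second name and then compares the two names with @code{stat}. The helper @code{two_names_one_file} is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>  /* stat, struct stat */
#include <unistd.h>    /* link, unlink */

/* Hypothetical helper: adds a second directory entry NEWNAME for the
   existing file OLDNAME, then uses stat to confirm that both names lead
   to one file (same device and inode number, link count of at least 2).
   Returns 1 if so, 0 if not, -1 on error.  The extra name is removed
   again before returning.  */
int
two_names_one_file (const char *oldname, const char *newname)
{
  if (link (oldname, newname) == -1)
    return -1;

  struct stat a, b;
  int result = -1;
  if (stat (oldname, &a) == 0 && stat (newname, &b) == 0)
    result = a.st_dev == b.st_dev && a.st_ino == b.st_ino
             && a.st_nlink >= 2;
  unlink (newname);   /* removes only this entry; the file remains */
  return result;
}
```

Removing the extra name with @code{unlink} deletes only that directory entry. The file itself remains reachable through its original name.
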
@cindex file name component
The name of a file contained in a directory entry is called a @dfn{file
name component}.  In general, a file name consists of a sequence of one
or more such components, separated by the slash character (@samp{/}).  A
file name which is just one component names a file with respect to its
directory.  A file name with multiple components names a directory, and
then a file in that directory, and so on.

Some other documents, such as the POSIX standard, use the term
@dfn{pathname} for what we call a file name, and either
@dfn{filename} or @dfn{pathname component} for what this manual calls a
file name component.  We don't use this terminology because a ``path''
is something completely different (a list of directories to search), and
we think that using ``pathname'' for something else will confuse users.
We always use ``file name'' and ``file name component'' (or sometimes
just ``component'', where the context is obvious) in GNU documentation.

You can find more detailed information about operations on directories
in @ref{File System Interface}.

@node File Name Resolution, File Name Errors, Directories, File Names
@subsection File Name Resolution

A file name consists of file name components separated by slash
(@samp{/}) characters.  Multiple successive @samp{/} characters are
equivalent to a single @samp{/} character.

@cindex file name resolution
The process of determining what file a file name refers to is called
@dfn{file name resolution}.  This is performed by examining the
components that make up a file name in left-to-right order, and locating
each successive component in the directory named by the previous
component.  Of course, each of the files that are referenced as
directories must actually exist, be directories instead of regular
files, and have the appropriate permissions to be accessible by the
process; otherwise the file name resolution fails.

@cindex root directory
@cindex absolute file name
If a file name begins with a @samp{/}, the first component in the file
name is located in the @dfn{root directory} of the process.  Such a file
name is called an @dfn{absolute file name}.

@cindex relative file name
Otherwise, the first component in the file name is located in the
current working directory (@pxref{Working Directory}).  This kind of
file name is called a @dfn{relative file name}.

@cindex parent directory
The file name components @file{.} (``dot'') and @file{..} (``dot-dot'')
have special meanings.  Every directory has entries for these file name
components.  The file name component @file{.} refers to the directory
itself, while the file name component @file{..} refers to its
@dfn{parent directory} (the directory that contains the link for the
directory in question).

Here are some examples of file names:

@table @file
@item /a
The file named @file{a}, in the root directory.

@item /a/b
The file named @file{b}, in the directory named @file{a} in the root directory.

@item a
The file named @file{a}, in the current working directory.

@item /a/./b
This is the same as @file{/a/b}.

@item ./a
The file named @file{a}, in the current working directory.

@item ../a
The file named @file{a}, in the parent directory of the current working
directory.
@end table

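One way to check that different spellings resolve to the same file is to compare the device and inode numbers that @code{stat} reports. The hypothetical helper @code{same_file} below does that, assuming a POSIX system such as GNU/Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>  /* stat, struct stat */

/* Hypothetical helper: returns 1 if NAME1 and NAME2 resolve to the same
   file (same device and inode number), 0 if they resolve to different
   files, and -1 if either name cannot be resolved.  */
int
same_file (const char *name1, const char *name2)
{
  struct stat a, b;
  if (stat (name1, &a) == -1 || stat (name2, &b) == -1)
    return -1;
  return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

For example, assuming @file{/tmp} is an ordinary directory, @file{/tmp}, @file{/tmp/}, @file{/tmp/.}, @file{/tmp///.} and @file{/tmp/../tmp} should all resolve to the same directory. For any pair of them, @code{same_file} returns 1. Since the root directory is its own parent, @file{/} and @file{/..} also compare equal.
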
A file name that names a directory may optionally end in a @samp{/}.  You
can specify a file name of @file{/} to refer to the root directory, but
you can't have an empty file name.  If you want to refer to the current
working directory, use a file name of @file{.} or @file{./}.

Unlike some other operating systems, the GNU system doesn't have any
built-in support for file types (or extensions) or file versions as part
of its file name syntax.  Many programs and utilities use conventions
for file names---for example, files containing C source code usually
have names suffixed with @samp{.c}---but there is nothing in the file
system itself that enforces this kind of convention.

@node File Name Errors, File Name Portability, File Name Resolution, File Names
@subsection File Name Errors

@cindex file name syntax errors
@cindex usual file name syntax errors

Functions that accept file name arguments usually detect these
@code{errno} error conditions relating to file name syntax.  These
errors are referred to throughout this manual as the @dfn{usual file
name syntax errors}.

@table @code
@item EACCES
The process does not have search permission for a directory component
of the file name.

@item ENAMETOOLONG
This error is used when either the total length of a file name is
greater than @code{PATH_MAX}, or when an individual file name component
has a length greater than @code{NAME_MAX}.  @xref{Limits for Files}.

In the GNU system, there is no imposed limit on overall file name
length, but some file systems may place limits on the length of a
component.

@item ENOENT
This error is reported when a file referenced as a directory component
in the file name doesn't exist.  It is also used when an empty file name
string is supplied.

@item ENOTDIR
A file that is referenced as a directory component in the file name
exists, but it isn't a directory.
@end table


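A short C sketch, assuming GNU/Linux, shows how such errors surface. The hypothetical helper @code{open_error} tries @code{open} and returns the resulting @code{errno} value. It does not cover @code{EACCES}, which needs a directory without search permission and an unprivileged process. It also skips @code{ENAMETOOLONG}, whose limits vary by file system.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno, ENOENT, ENOTDIR */
#include <fcntl.h>     /* open, O_RDONLY */
#include <stdlib.h>
#include <unistd.h>    /* close */

/* Hypothetical helper: tries to open NAME for reading and returns 0 if
   that works, or else the errno value explaining why it failed, such
   as ENOENT or ENOTDIR.  */
int
open_error (const char *name)
{
  int fd = open (name, O_RDONLY);
  if (fd == -1)
    return errno;
  close (fd);
  return 0;
}
```

On a typical GNU/Linux system, @code{open_error ("")} and @code{open_error ("/no/such/dir/x")} would be expected to return @code{ENOENT}. @code{open_error ("/dev/null/x")} would be expected to return @code{ENOTDIR}, since @file{/dev/null} exists but is not a directory.
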
@node File Name Portability,  , File Name Errors, File Names
@subsection Portability of File Names

The rules for the syntax of file names discussed in @ref{File Names},
are the rules normally used by the GNU system and by other POSIX
systems.  However, other operating systems may use other conventions.

There are two reasons why it can be important for you to be aware of
file name portability issues:

@itemize @bullet
@item
If your program makes assumptions about file name syntax, or contains
embedded literal file name strings, it is more difficult to get it to
run under other operating systems that use different syntax conventions.

@item
Even if you are not concerned about running your program on machines
that run other operating systems, it may still be possible to access
files that use different naming conventions.  For example, you may be
able to access file systems on another computer running a different
operating system over a network, or read and write disks in formats used
by other operating systems.
@end itemize

The ANSI C standard says very little about file name syntax, only that
file names are strings.  In addition to varying restrictions on the
length of file names and what characters can validly appear in a file
name, different operating systems use different conventions and syntax
for concepts such as structured directories and file types or
extensions.  Some concepts such as file versions might be supported by
some operating systems and not by others.

The POSIX.1 standard allows implementations to put additional
restrictions on file name syntax, concerning what characters are
permitted in file names and the length of file name and file name
component strings.  However, in the GNU system, you do not need to worry
about these restrictions; any character except the null character is
permitted in a file name string, and there are no limits on the length
of file name strings.


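A program that must also run on other POSIX systems might check names conservatively. The C sketch below shows one such check, under stated assumptions; the helpers @code{is_portable_component} and @code{name_max_in} are hypothetical. It accepts only the POSIX portable file name character set (letters, digits, @samp{.}, @samp{_} and @samp{-}). It also limits names to at most @code{_POSIX_NAME_MAX} (14) bytes, the smallest component limit POSIX permits. In addition, it rejects a leading @samp{-}, which POSIX advises applications to avoid. To learn the limit that actually applies in a given directory, it calls @code{pathconf} with @code{_PC_NAME_MAX}.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>     /* errno */
#include <limits.h>    /* _POSIX_NAME_MAX */
#include <stdlib.h>
#include <string.h>    /* strlen */
#include <unistd.h>    /* pathconf, _PC_NAME_MAX */

/* Hypothetical helper: returns 1 if COMPONENT is nonempty, at most
   _POSIX_NAME_MAX (14) bytes long, uses only A-Z, a-z, 0-9, '.', '_'
   and '-', and does not begin with '-'; returns 0 otherwise.  The
   letter and digit ranges assume an ASCII-compatible character set.  */
int
is_portable_component (const char *component)
{
  size_t len = strlen (component);
  if (len == 0 || len > _POSIX_NAME_MAX || component[0] == '-')
    return 0;
  for (size_t i = 0; i < len; i++)
    {
      char c = component[i];
      if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '.' || c == '_' || c == '-'))
        return 0;
    }
  return 1;
}

/* Asks for the component length limit that actually applies to files
   in directory DIR.  A result of -1 with errno still 0 means there is
   no fixed limit; -1 with errno set means the query failed.  */
long
name_max_in (const char *dir)
{
  errno = 0;
  return pathconf (dir, _PC_NAME_MAX);
}
```

For example, @samp{report_1.txt} passes the check. @samp{my file} fails because it contains a space, and @samp{a_very_long_file_name} fails because it is too long.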