<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995-7 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>17.1: Text Data Files</title>
<link href="sx3.html" rev=precedes>
<link href="sx3b.html" rel=precedes>
<link href="sx3.html" rev=subdocument>
<H2>17.1: Text Data Files</H2>
<p>Text data files are not always as compact
or as efficient to read and write as binary files,
and it can be a bit more work to set up the code which reads and writes them.
But they have some powerful advantages.
First, you can look at them using ordinary text editors and other tools.
If program A is writing a data file
which program B is supposed to be able to read but cannot,
you can immediately look at the file
to see if it's in the correct format,
and so determine whether it's program A's or B's fault.
If program A has not been written yet,
you can easily create a data file by hand to test program B with.
Text files are also automatically portable between machines,
even those where integers and other data types
are of different sizes or are laid out differently in memory.
Finally, because text files are not expected to have the rigid formats of binary files,
it tends to be more natural to arrange things so that,
as the data file format changes slightly,
newer (or older) versions of the software
can read older (or newer) versions of the data file.
Text data files are the focus of this chapter;
they're what I use all the time,
and they're what I recommend you use
unless you have compelling reasons not to.
</p><p>When we're using text data files, we acknowledge
that the <em>internal</em> and <em>external</em> representations
of our data are different.
For example, a value of type <TT>int</TT>
will usually be represented internally as a 2- or 4-byte binary quantity,
but in a text data file
that integer will be represented as a string of characters
representing its decimal or hexadecimal value.
Converting back and forth
between the internal and external representations
is something we have to take care of explicitly.
To go from the internal representation to the external,
we'll almost always use <TT>printf</TT> or <TT>fprintf</TT>;
for example, to convert an <TT>int</TT> we might use
<TT>%d</TT> or <TT>%x</TT> format.
To convert from the external representation back to the internal,
we could use <TT>scanf</TT> or <TT>fscanf</TT>,
or read the characters in some other way
and then convert them with functions such as
<TT>atoi</TT>, <TT>strtol</TT>, or <TT>sscanf</TT>.
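As a small, self-contained illustration of the two directions of conversion,
here is a sketch which uses <TT>sprintf</TT> (the in-memory cousin of <TT>fprintf</TT>)
to produce the external form of an <TT>int</TT> in decimal and in hexadecimal,
and <TT>strtol</TT> and <TT>strtoul</TT> to convert each string back.
The function name <TT>int_round_trips</TT> is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: convert n to its external (text) form,
   in decimal and in hexadecimal, then convert each string back
   to the internal form. Returns 1 if both round trips recover n. */
int int_round_trips(int n)
{
	char dec[32], hex[32];

	sprintf(dec, "%d", n);			/* internal -> external, decimal */
	sprintf(hex, "%x", (unsigned int)n);	/* %x expects an unsigned int */

	if(strtol(dec, NULL, 10) != n)		/* external -> internal */
		return 0;
	if((unsigned int)strtoul(hex, NULL, 16) != (unsigned int)n)
		return 0;
	return 1;
}
```

Unlike <TT>atoi</TT>, <TT>strtol</TT> can also report where the number ended
(through its second argument) and whether the value overflowed
(by setting <TT>errno</TT> to <TT>ERANGE</TT>),
which is handy when checking a data file for damage.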
</p><p>We have a great many options
when it comes to performing this mapping
between the internal and external representations.
Our choice may be determined by the layout we want the data file to have,
or by what's easiest to implement,
or by some combination of these factors.
Some of the choices are pretty arbitrary;
what matters most is that
the reading and writing code ``match'':
that the code which writes the data file writes the data in a format
which the code which reads the data file can accurately read.
For the rest of this section,
we'll explore several ways of writing and reading data
to and from text data files,
using various combinations of the stdio functions
(and perhaps one or two of our own).
</p><p>Suppose we had an array of integers, declared as <TT>int a[10];</TT>,
and suppose it had been filled up with values,
and suppose we wanted to write them out to a data file.
We could write them all on one line, separated by spaces:
<pre>
	fprintf(ofp, "%d %d %d %d %d %d %d %d %d %d\n",
		a[0], a[1], a[2], a[3], a[4], a[5],
		a[6], a[7], a[8], a[9]);
</pre>
We could write them on 10 separate lines:
<pre>
	for(i = 0; i < 10; i++)
		fprintf(ofp, "%d\n", a[i]);
</pre>
Realizing that the loop is easier and more flexible,
we could go back to writing them all on one line, using a loop:
<pre>
	for(i = 0; i < 10; i++)
		fprintf(ofp, "%d ", a[i]);
	fprintf(ofp, "\n");
</pre>
If we were worried about the trailing space at the end of the line,
we could arrange to eliminate it:
<pre>
	for(i = 0; i < 10; i++)
		{
		if(i > 0)
			fprintf(ofp, " ");
		fprintf(ofp, "%d", a[i]);
		}
	fprintf(ofp, "\n");
</pre>
Recognizing that <TT>fprintf</TT> is overkill
for printing single, fixed characters,
we could replace two of the calls with <TT>putc</TT>:
<pre>
	for(i = 0; i < 10; i++)
		{
		if(i > 0)
			putc(' ', ofp);
		fprintf(ofp, "%d", a[i]);
		}
	putc('\n', ofp);
</pre>
</p><p>When it came time to read the numbers in,
we would have at least as many choices.
We could read the ten values all at once, using <TT>fscanf</TT>:
<pre>
	int r = fscanf(ifp, "%d %d %d %d %d %d %d %d %d %d",
		&a[0], &a[1], &a[2], &a[3], &a[4], &a[5],
		&a[6], &a[7], &a[8], &a[9]);
	if(r != 10)
		fprintf(stderr, "error in data file\n");
</pre>
Since the <TT>scanf</TT> family treats all whitespace
(spaces, tabs, and newlines) alike,
this code would read either the format with all the numbers on one line
or the format with one number per line.
Notice that we check <TT>fscanf</TT>'s return value
to make sure that it successfully read in
all the numbers we expected it to.
Since data files come in from the outside world,
it's possible for them to be corrupted,
and programs should not blindly read them assuming that they're perfect.
A program that crashes when it attempts to read a damaged data file
is terribly frustrating;
a program that diagnoses the problem is much more polite.
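The return value of <TT>fscanf</TT> can tell us a little more than success or failure.
Here is a sketch (<TT>read_n_ints</TT> is a made-up name)
which reads the values one at a time, so that it can report how far it got,
and which distinguishes a file that ended early (<TT>fscanf</TT> returns <TT>EOF</TT>)
from one containing something that isn't a number (<TT>fscanf</TT> returns 0):

```c
#include <stdio.h>

/* Read n integers from ifp into a[], one fscanf call per value.
   Returns n on success. Otherwise prints a message saying whether
   the file ended early or contained something non-numeric, and
   returns how many values were read before the problem. */
int read_n_ints(FILE *ifp, int a[], int n)
{
	int i, r;

	for(i = 0; i < n; i++)
		{
		r = fscanf(ifp, "%d", &a[i]);
		if(r == EOF)		/* end-of-file (or a read error) */
			{
			fprintf(stderr, "data file ended after %d values\n", i);
			return i;
			}
		if(r != 1)		/* r == 0: next text isn't an integer */
			{
			fprintf(stderr, "non-numeric data after %d values\n", i);
			return i;
			}
		}
	return n;
}
```

A caller would compare the result against <TT>n</TT>;
the extra detail in the message is what makes a damaged file easy to track down.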
</p><p>We could also read the data file a line at a time,
converting the text to integers via other means.
If the integers were stored one per line,
we could use code like this:
<pre>
	char line[MAXLINE];

	for(i = 0; i < 10; i++)
		{
		if(fgets(line, MAXLINE, ifp) == NULL)
			{
			fprintf(stderr, "error in data file\n");
			break;
			}
		a[i] = atoi(line);
		}
</pre>
(We could also use our own <TT>getline</TT> or <TT>fgetline</TT> function
instead of <TT>fgets</TT>.)
If the integers were stored all on one line,
we could use the <TT>getwords</TT> function from chapter 10
to separate the numbers at the whitespace boundaries:
<pre>
	char *av[10];

	if(fgets(line, MAXLINE, ifp) == NULL)
		fprintf(stderr, "error in data file\n");
	else if(getwords(line, av, 10) != 10)
		fprintf(stderr, "error in data file\n");
	else	{
		for(i = 0; i < 10; i++)
			a[i] = atoi(av[i]);
		}
</pre>
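The <TT>getwords</TT> function itself is in chapter 10 and isn't repeated here.
So that the fragments in this section can be tried out,
here is a stand-in written to match the behavior this section relies on:
it splits a line in place at whitespace,
and once it has filled the last slot it leaves the rest of the line there, unsplit.
It is a sketch under those assumptions, not chapter 10's actual code.

```c
#include <ctype.h>
#include <string.h>

/* Stand-in for chapter 10's getwords(): split line in place at
   whitespace, storing at most maxwords word pointers in words[],
   and return the number stored. When the last slot is filled, the
   rest of the line (minus trailing whitespace such as fgets's \n)
   goes into it unsplit. */
int getwords(char *line, char *words[], int maxwords)
{
	char *p = line;
	char *end;
	int nwords = 0;

	while(nwords < maxwords)
		{
		while(isspace((unsigned char)*p))	/* skip leading blanks */
			p++;
		if(*p == '\0')
			break;
		words[nwords++] = p;

		if(nwords == maxwords)
			{
			/* last slot: keep the remainder, trimming trailing space */
			end = p + strlen(p);
			while(end > p && isspace((unsigned char)end[-1]))
				end--;
			*end = '\0';
			break;
			}

		while(*p != '\0' && !isspace((unsigned char)*p))
			p++;			/* find the end of this word */
		if(*p == '\0')
			break;
		*p++ = '\0';			/* terminate it and move on */
		}
	return nwords;
}
```

Note that, like the real thing, it modifies the line in place,
so the <TT>av</TT> pointers are only valid until the line buffer is reused.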
</p><p>Suppose, now, that
there were not always 10 elements in the array <TT>a</TT>;
suppose we had a separate integer variable <TT>na</TT>
to record how many elements the array <TT>a</TT> currently contains.
When writing the data out,
we would then certainly write only the <TT>na</TT> elements in use,
and we might also want to precede the data by the count,
in case that will make it easier for the reading program:
<pre>
	fprintf(ofp, "%d\n", na);
	for(i = 0; i < na; i++)
		fprintf(ofp, "%d\n", a[i]);
</pre>
We could also print all of the data, count included, on one line:
<pre>
	fprintf(ofp, "%d", na);
	for(i = 0; i < na; i++)
		fprintf(ofp, " %d", a[i]);
	fprintf(ofp, "\n");
</pre>
(Notice that the presence of the extra value at the beginning of the line
makes the space separator game easier to play:
every array value can simply be preceded by a space.)
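For a one-line format without a count,
the separator logic of the <TT>putc</TT> version shown earlier
could be packaged as a function like the following sketch.
The name <TT>write_ints</TT>, and the choice to report output errors
by checking <TT>ferror</TT> once at the end, are assumptions of this example.

```c
#include <stdio.h>

/* Write the n values in a[] on one line, separated by single
   spaces, with no trailing space. Returns 0 on success, or EOF
   if the stream reported an output error. */
int write_ints(FILE *ofp, const int a[], int n)
{
	int i;

	for(i = 0; i < n; i++)
		{
		if(i > 0)
			putc(' ', ofp);
		fprintf(ofp, "%d", a[i]);
		}
	putc('\n', ofp);

	/* the error indicator is sticky, so one check covers every call above */
	return ferror(ofp) ? EOF : 0;
}
```

Since output is usually buffered, some errors only show up when the stream is flushed,
so a careful program would also check the return value of <TT>fclose</TT>.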
</p><p>Now, when reading the data in, we would simply read the count first,
and then that many values.
Using <TT>fscanf</TT>:
<pre>
	if(fscanf(ifp, "%d", &na) != 1)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	if(na > 10)
		{
		fprintf(stderr, "too many items in data file\n");
		return;
		}
	for(i = 0; i < na; i++)
		{
		if(fscanf(ifp, "%d", &a[i]) != 1)
			{
			fprintf(stderr, "error in data file\n");
			return;
			}
		}
</pre>
(These fragments assume that
the code to read the array from the data file is part of a function,
and that when we detect an error,
we return early from the function.
In a real program,
we would probably return some error code to the caller.)
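Putting the writing and reading halves of the count-prefixed format together,
here is a sketch with made-up names
(<TT>write_counted</TT>, <TT>read_counted</TT>, and <TT>MAXA</TT> for the array's capacity).
Instead of simply returning early, the reader returns the count, or -1 on error,
which is one way of passing an error code back to the caller:

```c
#include <stdio.h>

#define MAXA 10		/* capacity of the array, as with a[10] */

/* Write the count and then the values, all on one line. */
void write_counted(FILE *ofp, const int a[], int na)
{
	int i;

	fprintf(ofp, "%d", na);
	for(i = 0; i < na; i++)
		fprintf(ofp, " %d", a[i]);
	fprintf(ofp, "\n");
}

/* Read a count and then that many values into a[] (which must have
   room for MAXA). Since fscanf treats all whitespace alike, the
   values may be on one line or several. Returns the count, or -1
   (after printing a message) if the data file is bad. */
int read_counted(FILE *ifp, int a[])
{
	int na, i;

	if(fscanf(ifp, "%d", &na) != 1)
		{
		fprintf(stderr, "error in data file\n");
		return -1;
		}
	if(na < 0 || na > MAXA)
		{
		fprintf(stderr, "bad item count %d in data file\n", na);
		return -1;
		}
	for(i = 0; i < na; i++)
		{
		if(fscanf(ifp, "%d", &a[i]) != 1)
			{
			fprintf(stderr, "error in data file\n");
			return -1;
			}
		}
	return na;
}
```

The reader also rejects a negative count, which the fragment above does not check for.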
</p><p>If we chose to use <TT>fgets</TT>
(or <TT>fgetline</TT>),
the code might look like this for data on separate lines:
<pre>
	if(fgets(line, MAXLINE, ifp) == NULL)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	na = atoi(line);
	if(na > 10)
		{
		fprintf(stderr, "too many items in data file\n");
		return;
		}
	for(i = 0; i < na; i++)
		{
		if(fgets(line, MAXLINE, ifp) == NULL)
			{
			fprintf(stderr, "error in data file\n");
			return;
			}
		a[i] = atoi(line);
		}
</pre>
Or, if the data were all on one line, like this:
<pre>
	char *av[11];	/* room for the count plus 10 values */

	if(fgets(line, MAXLINE, ifp) == NULL)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	ac = getwords(line, av, 11);
	if(ac < 1)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	na = atoi(av[0]);
	if(na > 10)
		{
		fprintf(stderr, "too many items in data file\n");
		return;
		}
	if(ac != na + 1)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	for(i = 0; i < na; i++)
		a[i] = atoi(av[i+1]);
</pre>
</p><p>But sometimes you don't need to save the count explicitly;
the reading program can deduce the number of items
simply by counting them as it reads.
If the file contains <em>only</em> the integers in this array,
then we can simply read integers until we reach end-of-file.
For example, using <TT>fscanf</TT>:
<pre>
	na = 0;
	while(na < 10 && fscanf(ifp, "%d", &a[na]) == 1)
		na++;
</pre>
(This code is deceptively simple;
we haven't carefully dealt with appropriate error messages
for a data file with more than 10 values,
or a data file with a non-numeric ``value''
for which <TT>fscanf</TT> returns 0.)
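Here is one way of filling in those missing error checks, as a sketch
(<TT>read_until_eof</TT> is a made-up name).
It reads each value into a temporary, so that it can notice a value beyond the array's capacity
without storing it, and after the loop it looks at why <TT>fscanf</TT> stopped:

```c
#include <stdio.h>

/* Read integers until end-of-file, storing at most max of them in a[].
   Returns the number read, or -1 (after printing a message) if there
   are more than max values or something that isn't a number. */
int read_until_eof(FILE *ifp, int a[], int max)
{
	int na = 0;
	int value;
	int r;

	while((r = fscanf(ifp, "%d", &value)) == 1)
		{
		if(na >= max)
			{
			fprintf(stderr, "too many items in data file\n");
			return -1;
			}
		a[na++] = value;
		}

	/* fscanf returns 0 if it hit non-numeric text, and EOF at
	   end-of-file (or on a read error, which ferror detects) */
	if(r == 0 || ferror(ifp))
		{
		fprintf(stderr, "error in data file\n");
		return -1;
		}
	return na;
}
```

Only a clean end-of-file, reached after at most <TT>max</TT> values, counts as success.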
</p><p>Again, we could also use <TT>fgets</TT>.
If the data is on separate lines:
<pre>
	na = 0;
	while(na < 10 && fgets(line, MAXLINE, ifp) != NULL)
		a[na++] = atoi(line);
</pre>
If the data is all on one line:
<pre>
	char *av[11];	/* one extra, to detect too many items */

	if(fgets(line, MAXLINE, ifp) == NULL)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	na = getwords(line, av, 11);
	if(na > 10)
		{
		fprintf(stderr, "too many items in data file\n");
		return;
		}
	for(i = 0; i < na; i++)
		a[i] = atoi(av[i]);
</pre>
Notice that this last implementation does not require
that the file consist only of data for the array <TT>a</TT>.
One <em>line</em> of the file consists of data for the array <TT>a</TT>,
but other lines of the file could contain other data.
</p><p>We could also scatter <TT>a</TT>'s data on multiple lines,
without using an explicit count,
and with the ability for the file to contain other data as well,
if we marked the end of the array data with an explicit marker in the file,
rather than assuming that the array's data continued until end-of-file.
For example, we could write the data out like this:
<pre>
	for(i = 0; i < na; i++)
		fprintf(ofp, "%d\n", a[i]);
	fprintf(ofp, "end\n");
</pre>
and read it like this:
<pre>
	na = 0;
	while(fgets(line, MAXLINE, ifp) != NULL)
		{
		if(strncmp(line, "end", 3) == 0)
			break;
		if(na >= 10)
			{
			fprintf(stderr, "too many items in data file\n");
			return;
			}
		a[na++] = atoi(line);
		}
</pre>
(There's just one nuisance here in checking for the ``end'' marker:
<TT>fgets</TT> leaves the <TT>\n</TT> in the line it reads,
so a simple <TT>strcmp</TT> against <TT>"end"</TT> would fail.
Here we use <TT>strncmp</TT>, which compares at most <TT>n</TT> characters,
and we pass the third argument, <TT>n</TT>, as 3;
a side effect is that any line which merely begins with ``end'' would also match.
Other solutions would be
to use <TT>strcmp</TT> against the string <TT>"end\n"</TT>,
or to strip the <TT>\n</TT> somehow,
or to use our old <TT>getline</TT> or <TT>fgetline</TT> functions instead of <TT>fgets</TT>,
since they strip the <TT>\n</TT> for us.)
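Stripping the <TT>\n</TT> first lets us compare the whole line,
so that a line such as ``ending'' is not mistaken for the marker.
This sketch uses <TT>strcspn</TT>, which returns the length of the initial part of a string
containing none of the given characters, to find the <TT>\n</TT>
(or the end of the string, if there is no <TT>\n</TT>).
The function name, the <TT>MAXLINE</TT> value of 100,
and the choice to treat a missing marker as an error are assumptions of this example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 100	/* assumed maximum line length */

/* Read integers, one per line, up to a line consisting of exactly
   "end". Returns the number read, or -1 (after printing a message)
   if there are more than max of them or the marker never appears. */
int read_to_end_marker(FILE *ifp, int a[], int max)
{
	char line[MAXLINE];
	int na = 0;

	while(fgets(line, sizeof(line), ifp) != NULL)
		{
		line[strcspn(line, "\n")] = '\0';	/* strip the \n, if any */
		if(strcmp(line, "end") == 0)
			return na;
		if(na >= max)
			{
			fprintf(stderr, "too many items in data file\n");
			return -1;
			}
		a[na++] = atoi(line);
		}
	fprintf(stderr, "data file has no \"end\" marker\n");
	return -1;
}
```

Because the function stops right after the marker,
whatever follows it in the file is left for other code to read.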
</p><p>Now that we've seen many
options for writing and reading the array,
how do you decide which to use?
Should you use <TT>fscanf</TT>,
or the slightly more <I>ad hoc</I> methods
involving <TT>fgets</TT>, <TT>getwords</TT>, <TT>atoi</TT>, etc.?
It's largely a matter of personal preference.
In the code fragments we've looked at so far,
the ones using <TT>fscanf</TT> have seemed shorter,
although in some cases that was because
they weren't doing as much error checking
as the ones that used <TT>fgets</TT>.
In general,
the methods using <TT>fgets</TT> allow somewhat more flexibility,
as we saw when checking for the explicit ``end'' marker,
which would have been difficult or impossible
using <TT>scanf</TT> or <TT>fscanf</TT>.
</p><p>Now let's move to another example,
a user-defined data structure.
Suppose we have this structure,
<TT>struct s { int i; float f; char s[20]; };</TT>,
and a variable <TT>x</TT> of that type.
To write an instance of this structure out,
we could simply print its fields on one line:
<pre>
	fprintf(ofp, "%d %g %s\n", x.i, x.f, x.s);
</pre>
or on three separate lines:
<pre>
	fprintf(ofp, "%d\n", x.i);
	fprintf(ofp, "%g\n", x.f);
	fprintf(ofp, "%s\n", x.s);
</pre>
or, equivalently, with a single call:
<pre>
	fprintf(ofp, "%d\n%g\n%s\n", x.i, x.f, x.s);
</pre>
(We use <TT>%g</TT> format for the <TT>float</TT> field
because <TT>%g</TT> switches between ordinary and exponential notation
depending on the value's magnitude, and drops trailing zeros,
which keeps the output short while printing six significant digits:
for example, it prints <TT>1.23e+06</TT> where <TT>%f</TT> would print <TT>1230000.000000</TT>,
and <TT>1.23e-06</TT> where <TT>%f</TT> would print <TT>0.000001</TT>,
losing most of the value's precision.)
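One caution about <TT>%g</TT>: its default of six significant digits
is not always enough to reproduce a <TT>float</TT> exactly when the file is read back in.
Assuming that <TT>float</TT> is an IEEE 754 single-precision type
(as it is on nearly all current machines),
nine significant digits, <TT>%.9g</TT>, are enough.
The following sketch (<TT>float_survives</TT> is a made-up name)
checks whether a value survives the trip through text:

```c
#include <stdio.h>
#include <stdlib.h>

/* Print f using the printf-style format fmt, convert the text back
   with strtod, and report whether the original value came back. */
int float_survives(const char *fmt, float f)
{
	char buf[64];

	sprintf(buf, fmt, f);		/* the float is promoted to double */
	return (float)strtod(buf, NULL) == f;
}
```

For example, with <TT>%g</TT> a value such as <TT>1.2345678f</TT> is written as <TT>1.23457</TT>
and does not come back unchanged, while with <TT>%.9g</TT> it does.
Values like 0.5, which need only a few digits, survive either way.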
</p><p>To read this structure back in,
we could again use either <TT>fscanf</TT>,
or <TT>fgets</TT> and some other functions.
As before, <TT>fscanf</TT> seems easier:
<pre>
	if(fscanf(ifp, "%d %g %s", &x.i, &x.f, x.s) != 3)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
</pre>
(Note that there's no <TT>&</TT> in front of <TT>x.s</TT>:
since it's an array, it already yields a pointer to its first character.)
Here we have a problem, though:
what if the third, string field contains a space?
In the <TT>scanf</TT> family,
the <TT>%s</TT> format stops reading at whitespace,
so if <TT>x.s</TT> had contained the string <TT>"Hello, world!"</TT>,
it would be read back in as <TT>"Hello,"</TT>.
Since the string is the last field,
we could fix this by using the less-obvious format string
<TT>"%d %g %[^\n]"</TT>,
where <TT>%[^\n]</TT> means
``match any string of characters not including <TT>\n</TT>''.
But we also have another problem:
what if the string is too long to fit
in the 20 characters we allocated for the <TT>s</TT> field?
We could fix this by using <TT>%19s</TT> or <TT>%19[^\n]</TT>
(19, not 20, since the width doesn't count the terminating <TT>\0</TT>),
although we'd have to remember to change
the <TT>scanf</TT> format string
if we ever changed the size of the array.
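Combining both fixes, a reader for the one-line format might look like this sketch
(the function name <TT>read_s_fscanf</TT> is made up; the structure is the <TT>struct s</TT> described above):

```c
#include <stdio.h>

struct s
	{
	int i;
	float f;
	char s[20];
	};

/* Read one struct s written on one line as "%d %g %s\n".
   %19[^\n] takes the rest of the line, spaces and all, but at most
   19 characters, leaving room for the \0 in s[20].
   Returns 1 on success, 0 (after printing a message) on error. */
int read_s_fscanf(FILE *ifp, struct s *xp)
{
	if(fscanf(ifp, "%d %g %19[^\n]", &xp->i, &xp->f, xp->s) != 3)
		{
		fprintf(stderr, "error in data file\n");
		return 0;
		}
	return 1;
}
```

Two caveats remain: if the string is longer than 19 characters,
the excess stays unread in the stream;
and since <TT>%[</TT> must match at least one character,
an empty string field makes the read fail.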
</p><p>Let's leave <TT>fscanf</TT> for a moment
and look at our other alternatives.
If we'd printed the data all on one line, we could use
<pre>
	#include <stdlib.h>	/* for atof() */

	char line[MAXLINE];
	char *av[3];

	if(fgets(line, MAXLINE, ifp) == NULL)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	if(getwords(line, av, 3) != 3)
		{
		fprintf(stderr, "error in data file\n");
		return;
		}
	x.i = atoi(av[0]);
	x.f = atof(av[1]);
	strcpy(x.s, av[2]);	/* XXX */
</pre>
This code does better
on the question of what happens if the string contains a space,
because our version of <TT>getwords</TT>
(see chapter 10, p. 13)
stops breaking up the line once it has found as many words as we told it to find
(that is, as many as its third argument, which gives the size of the <TT>av</TT> array),
and leaves the rest of the line in the last word.
Here, we told it that it could only look for 3 words,
so if the string contains spaces,
making the line appear to have 4 or more words,
words 3, 4, etc. will all be pointed to by <TT>av[2]</TT>.
However, we still have the problem
that we haven't guarded against overflow of <TT>x.s</TT>
if the third (plus fourth, etc.) word on the data line
is 20 or more characters long.
(The comment <TT>/* XXX */</TT> is a traditional marker which means
``this line is inadequate
and definitely won't work reliably in all situations,
but for one reason or another
the person writing it is
not going to take the trouble to do it right just yet.'')
</p><p>If the data is written on three lines,
we obviously have to call <TT>fgets</TT> three times to read it:
<pre>
	if(fgets(line, MAXLINE, ifp) == NULL)
		{ fprintf(stderr, "error in data file\n"); return; }
	x.i = atoi(line);

	if(fgets(line, MAXLINE, ifp) == NULL)
		{ fprintf(stderr, "error in data file\n"); return; }
	x.f = atof(line);

	if(fgets(line, MAXLINE, ifp) == NULL)
		{ fprintf(stderr, "error in data file\n"); return; }
	strcpy(x.s, line);	/* XXX */
</pre>
Now the last line has two problems:
besides the lingering problem of overflow
(if the line is more than 18 characters long),
we have the problem that <TT>fgets</TT> retains the <TT>\n</TT>
(which is why <TT>x.s</TT> will overflow if
the line is longer than 18 characters, not 19).
In this case, one way to fix the overflow problem
would be to have <TT>fgets</TT> read into <TT>x.s</TT> directly:
<pre>
	if(fgets(x.s, 20, ifp) == NULL)
		{ fprintf(stderr, "error in data file\n"); return; }
</pre>
If we didn't want to have to remember
to change that 20 in the call to <TT>fgets</TT>
if we ever resized the array,
we could get clever and write
<TT>fgets(x.s, sizeof(x.s), ifp)</TT>.
(One remaining wrinkle is that if the line is too long to fit,
<TT>fgets</TT> leaves the rest of it unread,
and the next read will pick up where this one left off.)
Also, we might as well figure out how to get rid of that pesky <TT>\n</TT>.
One way is by calling the standard library function <TT>strchr</TT>,
which searches for a certain character in a string.
Doing so will require that we <TT>#include <string.h></TT>
and declare an extra <TT>char *</TT> variable:
<pre>
	#include <string.h>

	char *p;

	p = strchr(x.s, '\n');
	if(p != NULL)
		*p = '\0';
</pre>
<TT>strchr</TT> returns a pointer to the character that it finds,
or a null pointer if it doesn't find the character.
If there's a <TT>\n</TT> in the line at all,
we know it's at the end,
so it's safe to overwrite it with a <TT>\0</TT>,
making the string one character shorter.
(Since we know that the <TT>\n</TT> is at the end,
we could also use <TT>strrchr</TT>,
which finds a character starting from the right.)
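Here is a sketch that gathers these pieces into one helper
(<TT>read_field_line</TT> is a made-up name).
It reads with <TT>fgets</TT> directly into the destination and removes the <TT>\n</TT> with <TT>strchr</TT>;
and, since <TT>fgets</TT> leaves the rest of a too-long line unread,
it discards that remainder so that the next read starts at the beginning of the next line:

```c
#include <stdio.h>
#include <string.h>

/* Read one line into buf (of size bufsize), strip the \n, and, if
   the line was too long to fit, discard the rest of it so that the
   next read starts on the next line. Returns 1 if a line was read,
   0 at end-of-file. Overlong lines are silently truncated. */
int read_field_line(FILE *ifp, char *buf, int bufsize)
{
	char *p;
	int c;

	if(fgets(buf, bufsize, ifp) == NULL)
		return 0;

	p = strchr(buf, '\n');
	if(p != NULL)
		*p = '\0';		/* the whole line fit: drop the \n */
	else	{
		/* no \n: the line was too long (or was the last line and
		   lacked one); skip whatever remains of it */
		while((c = getc(ifp)) != EOF && c != '\n')
			;
		}
	return 1;
}
```

A call such as <TT>read_field_line(ifp, x.s, sizeof(x.s))</TT>
could then replace the third <TT>fgets</TT> call and the <TT>strchr</TT> fix-up above.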
</p><p>For any of the methods we've been using so far,
what if one day we add a new field to <TT>struct s</TT>?
Obviously, we'll have to rewrite the code which writes the structure out
and also the code which reads it in.
Also, unless we're careful,
the modified code won't be able to read in
any data files we might happen to have lying around
which were written before the structure was changed.
Depending on the nature of the data file and the way it's used,
this can be a real problem.
(Of course,
it's possible to write a utility program
to convert the old data files to the new format,
but it can be a nuisance to write that program,
and it can be a <em>real</em> nuisance to track down
all of the old data files that need converting.)
Therefore,
when a data file format must be changed,
it's often a good idea if the new version of the program
can be made to automatically detect and read old-format files as well.
(Automatic detection isn't a strict necessity,
but it's certainly a nicety.)
But it's <em>much</em> easier to write a new & improved data file reader
that can read both old and new formats
if the possibility was thought of
back when the original data file format was designed.
</p><p>One thing that helps a lot is if data file formats have version numbers,
and if each data file begins with a number,
in a simple format and known location
which won't change even if the rest of the format changes,
indicating which version of the format this file uses.
Having a file format version number
at the beginning of each data file leads to two immediate advantages:
<OL><li>Whenever a new program reads a data file,
it can immediately and unambiguously decide how it's going to read it:
whether it can use its new & improved reading routines
or whether it has to fall back
on its backwards-compatible, old-style reader.
<li>If there is a suite of several programs,
all of which read the same data files,
and if for some reason
there's an old version of one of the programs still in use,
the old program can print an unambiguous message such as
``this is a new data file which I am too old to read'',
rather than printing the
(misleading, in this case)
``error in data file'' message.
</OL></p><p>Another technique
which can be immensely useful
and which we'll explore next
is to define a data file format in such a way
that the overall format doesn't change
even if new data is added to it.
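Before moving on, here is a sketch of the version-number idea described above.
It assumes, hypothetically, that every data file begins with a line of the form <TT>version N</TT>;
the keyword, the <TT>FORMAT_VERSION</TT> constant,
and the function names <TT>write_header</TT> and <TT>read_header</TT> are made up for this example:

```c
#include <stdio.h>

#define FORMAT_VERSION 2	/* hypothetical current format version */

/* Write the line every data file begins with. */
void write_header(FILE *ofp)
{
	fprintf(ofp, "version %d\n", FORMAT_VERSION);
}

/* Read and check the first line. Returns the file's format version,
   or -1 (after printing a message) if the line is missing or
   malformed, or if the file is newer than this program understands. */
int read_header(FILE *ifp)
{
	char line[100];
	int v;

	if(fgets(line, sizeof(line), ifp) == NULL ||
			sscanf(line, "version %d", &v) != 1)
		{
		fprintf(stderr, "error in data file: no version line\n");
		return -1;
		}
	if(v > FORMAT_VERSION)
		{
		fprintf(stderr, "this is a version %d data file, "
			"which I am too old to read\n", v);
		return -1;
		}
	return v;
}
```

Reading the header with <TT>fgets</TT> consumes the whole first line,
so line-at-a-time code can take over cleanly afterwards;
a reader that supports several versions could then switch on the returned number
to choose between its new and old-style reading code.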
</p><p>It's easy to see why
the simple data file fragments we've been looking at so far
are not resilient in the face of newly-introduced data fields.
In the case of <TT>struct s</TT>,
the reader always assumed that
the first field in the data file was <TT>i</TT>,
the second field was <TT>f</TT>,
and the third field was <TT>s</TT>.
If we ever add any new fields,
unless we're careful to add them at the end of the file
(and lucky on top of that),
the simpleminded reader will get confused.
</p><p>One powerful way of getting around this problem
is to <dfn>tag</dfn> each piece of data in the file,
so that the reader knows unambiguously what it is.
For example,
suppose that we wrote instances of our <TT>struct s</TT> out like this:
<pre>
	fprintf(ofp, "i %d\n", x.i);
	fprintf(ofp, "f %g\n", x.f);
	fprintf(ofp, "s %s\n", x.s);
</pre>
Now, each line begins with a little code which identifies it.
(The code in the data file happens to match
the name of the corresponding structure member,
but that's not necessary,
and in any case there isn't any way
of getting the compiler to make any correspondence automatically.)
</p><p>If we simply modified one of our previous file-reading code fragments
to read this new, tagged format,
we might quickly end up with a mess.
We'd be continually checking the tag on the line we just read
against the tag we expected to read,
and constantly printing error messages or trying to resynchronize.
But since each line is tagged,
there's no reason to expect the lines to come in a certain order,
and it turns out that it's easier to read such a file a line at a time,
without that assumption,
taking each line as it comes
and not worrying what order the lines come in.
Here is how we might do it:
<pre>
	char line[MAXLINE];
	char *av[2];
	int ac;

	x.i = 0; x.f = 0.0; x.s[0] = '\0';

	while(fgets(line, MAXLINE, ifp) != NULL)
		{
		if(*line == '#')
			continue;
		ac = getwords(line, av, 2);
		if(ac == 0)
			continue;
		if(strcmp(av[0], "i") == 0)
			x.i = atoi(av[1]);
		else if(strcmp(av[0], "f") == 0)
			x.f = atof(av[1]);
		else if(strcmp(av[0], "s") == 0)
			strcpy(x.s, av[1]);	/* XXX */
		}
</pre>
This example also throws in a few new little features:
a line beginning with <TT>#</TT> is ignored,
so we will be able to place comment lines in data files
by beginning them with <TT>#</TT>.
The code also ignores blank lines
(those for which <TT>getwords</TT> returns 0).
</p><p>We're now treating the ``data file''
almost like a ``command file'':
the first word on each line is almost like a ``command''
telling us to do something.
<TT>i</TT> means store this value in <TT>x.i</TT>,
<TT>f</TT> means store this value in <TT>x.f</TT>,
and so on.
Since we don't have any easy way
of telling whether we ever got around to setting a particular field,
we initialize each one to an appropriate default value
before reading the file.
Notice that we did <em>not</em> have a last line
in the <TT>if</TT>/<TT>else</TT>/<TT>if</TT>/<TT>else</TT> chain
like
<pre>
	else	fprintf(stderr, "error in data file\n");
</pre>
Instead, we quietly <em>ignore</em> lines we don't recognize!
This strategy is admittedly on the simpleminded side,
and it would not be adequate under all circumstances,
but it means that an old program can read a new data file
containing fields it's never heard of.
The old program will still be able to pluck out the data
it does recognize and can use,
while (deliberately) ignoring the (new) data it doesn't know about.
</p><p>This code is not perfect.
We still have the same sorts of problems with that string field, <TT>s</TT>:
it might contain spaces,
which we get around (this time)
by calling <TT>getwords</TT> with a third argument of 2,
so that everything following
the first word on the line
ends up ``in'' <TT>av[1]</TT>.
Also, the code does not check
to see that there actually was a second word on the line
before using it to set <TT>x.i</TT>, <TT>x.f</TT>, or <TT>x.s</TT>.
(We could fix that by complaining
if <TT>getwords</TT> did not return 2.)
</p><p>Finally, we still have the potential for overflow,
and we might as well grit our teeth now and figure out how to fix it.
Since we already initialized <TT>x.s</TT> to the empty string
with the assignment <TT>x.s[0] = '\0'</TT>,
one way around the problem
is to replace the call to <TT>strcpy</TT> with a call to <TT>strncat</TT>:
<pre>
	else if(strcmp(av[0], "s") == 0)
		strncat(x.s, av[1], 19);
</pre>
(or, again, perhaps <TT>strncat(x.s, av[1], sizeof(x.s)-1)</TT>).
The <TT>strcat</TT> and <TT>strncat</TT> functions
are slightly misleadingly named:
what they actually do is <em>append</em>
the second string you hand them
(i.e. the second argument)
to the first, in place.
In the case of <TT>strncat</TT>,
it never copies more than <TT>n</TT> characters,
where <TT>n</TT> is its third argument,
although it does always append a <TT>\0</TT>,
which is why we tell it to copy at most 19 characters, not 20.
(Since <TT>x.s</TT> starts out empty,
there's definitely room for 19,
although we would still have to worry about the possibility
of a corrupted data file which contained two <TT>s</TT> lines,
since the second would be appended to the first.
You might wonder why we couldn't simply use <TT>strncpy</TT>,
but it turns out that,
for obscure historical reasons,
<TT>strncpy</TT> does <em>not</em> always append the <TT>\0</TT>.)
</p><p>Although it has a few imperfections
(which are easily remedied, and are left as exercises),
the general structure of this last reader
(using <TT>fgets</TT>, <TT>getwords</TT>,
and an <TT>if</TT>/<TT>strcmp</TT>/<TT>else</TT>... chain)
is an excellent basis
for a flexible, robust data file reader.
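Here is a sketch of such a reader with a few of those imperfections remedied:
it complains about (and skips) a line with a tag but no value,
it never overflows <TT>s</TT>,
and a second <TT>s</TT> line replaces the first rather than being appended to it.
To keep the example self-contained it splits each line with the standard
<TT>strspn</TT> and <TT>strcspn</TT> functions rather than <TT>getwords</TT>;
the names <TT>write_s_tagged</TT> and <TT>read_s_tagged</TT>,
and the <TT>MAXLINE</TT> value of 100, are assumptions of this example:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXLINE 100	/* assumed maximum line length */

struct s
	{
	int i;
	float f;
	char s[20];
	};

/* Write one struct s in the tagged format. */
void write_s_tagged(FILE *ofp, const struct s *xp)
{
	fprintf(ofp, "i %d\n", xp->i);
	fprintf(ofp, "f %g\n", xp->f);
	fprintf(ofp, "s %s\n", xp->s);
}

/* Read tagged lines until end-of-file. Blank lines, lines beginning
   with #, and lines with unknown tags are skipped, so that an old
   reader can cope with a newer file. (Lines longer than MAXLINE-1
   characters are not handled specially.) */
void read_s_tagged(FILE *ifp, struct s *xp)
{
	char line[MAXLINE];
	char *tag, *val;
	size_t n;

	xp->i = 0;			/* defaults, in case a line is missing */
	xp->f = 0.0f;
	xp->s[0] = '\0';

	while(fgets(line, sizeof(line), ifp) != NULL)
		{
		line[strcspn(line, "\n")] = '\0';	/* drop the \n */
		tag = line + strspn(line, " \t");	/* skip leading blanks */
		if(*tag == '\0' || *tag == '#')
			continue;			/* blank line or comment */

		n = strcspn(tag, " \t");		/* length of the tag */
		if(tag[n] == '\0')
			{
			fprintf(stderr, "error in data file: no value for \"%s\"\n", tag);
			continue;
			}
		tag[n] = '\0';
		val = tag + n + 1;
		val += strspn(val, " \t");		/* rest of line, spaces and all */

		if(strcmp(tag, "i") == 0)
			xp->i = atoi(val);
		else if(strcmp(tag, "f") == 0)
			xp->f = (float)atof(val);
		else if(strcmp(tag, "s") == 0)
			{
			xp->s[0] = '\0';	/* a later s line replaces an earlier one */
			strncat(xp->s, val, sizeof(xp->s) - 1);
			}
		/* any other tag is quietly ignored */
		}
}
```

A file containing an unknown tag such as <TT>newfield 123</TT>
is read without complaint, with the known fields picked out as usual.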
</p><p>One footnote about the troublesome string field, <TT>s</TT>:
to get around the problem of fixed-size arrays,
you might one day decide
to declare the <TT>s</TT> field of <TT>struct s</TT>
as a pointer rather than a fixed-size array.
You would have to be careful while reading, however.
It might seem that you could just write
<pre>
	x.s = av[1];	/* assumes char *s, but also WRONG */
</pre>
but this would <em>not</em> work;
remember that whenever you use pointers
you have to worry about memory allocation.
If you assigned <TT>x.s</TT> in that way,
where would the memory it points to be?
It would be wherever <TT>av[1]</TT> points,
which is back into the <TT>line</TT> array.
Not only is that (probably) a local array,
valid only while the file-reading function is active,
but it's also overwritten with each new line in the data file.
You'll obviously want <TT>x.s</TT>
to retain a useful pointer value
pointing to the text read from the file,
which means that you'll still have to make a copy,
after allocating some memory.
In this case, you might do
<pre>
	x.s = malloc(strlen(av[1]) + 1);
	if(x.s == NULL)
		{ fprintf(stderr, "out of memory\n"); return; }
	strcpy(x.s, av[1]);
</pre>
In general,
the problems we've been having with field <TT>s</TT> are fundamental:
any time you use text formats
which are based on whitespace-separated ``words,''
string fields which might <em>contain</em> spaces are always
going to need special handling.
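Since making a copy of a string in freshly allocated memory comes up so often,
it's handy to wrap the <TT>malloc</TT>, the check, and the <TT>strcpy</TT> in a little function.
Standard C had no such function until <TT>strdup</TT> was added in C23
(POSIX has long provided <TT>strdup</TT>),
so here is a sketch of our own; the name <TT>copy_string</TT> is made up:

```c
#include <stdlib.h>
#include <string.h>

/* Return a freshly malloc'ed copy of str, or NULL if out of memory.
   The caller owns the copy and should eventually free() it. */
char *copy_string(const char *str)
{
	char *p = malloc(strlen(str) + 1);	/* +1 for the \0 */

	if(p != NULL)
		strcpy(p, str);
	return p;
}
```

With it, the three lines in the footnote above could become
<TT>x.s = copy_string(av[1]);</TT> followed by a check for a null pointer.
Because the copy lives in its own memory,
later lines read into the <TT>line</TT> buffer no longer disturb it.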
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1996-1999
// <a href="mailto:scs@eskimo.com">mail feedback</a>