   1 <!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
   2 <!-- This collection of hypertext pages is Copyright 1995, 1996 by Steve Summit. -->
   3 <!-- This material may be freely redistributed and used -->
   4 <!-- but may not be republished or sold without permission. -->
   5 <html>
   6 <head>
   7 <link rev="owner" href="mailto:scs@eskimo.com">
   8 <link rev="made" href="mailto:scs@eskimo.com">
   9 <title>section 4.3: External Variables</title>
  10 <link href="sx7b.html" rev=precedes>
  11 <link href="sx7d.html" rel=precedes>
  12 <link href="sx7.html" rev=subdocument>
  13 </head>
  14 <body>
  15 <H2>section 4.3: External Variables</H2>
  16
  17 <p>The word ``external'' is, roughly speaking, equivalent to ``global.''
  18 </p><p>page 74
  19 </p><p>A program with
  20 ``too many data connections between functions''
  21 hasn't managed to achieve
  22 the desirable attributes
  23 we were talking about earlier,
  24 in particular that a function's
  25 ``interface to the rest of the program is clean and narrow.''
  26 Another bit of jargon you may hear is the word ``coupling,''
  27 which refers to how much one piece of a program has to know
  28 about another.
</p><p>As we have mentioned,
the connections between functions should generally be few and well-defined,
in which case they will be amenable to regular old function arguments,
and you won't be tempted to pass lots of data around in global variables.
  34 (On the other hand,
  35 global variables are fine for some things,
  36 such as
  37 configuration information which the whole program cares about
  38 and which is set just once at program startup and then doesn't change.)
  39 </p><p>The word ``lifetime'' refers to how long a variable and its value stick around.
  40 (The jargon term is ``duration.'')
  41 So far, we've seen that global variables persist for the life of the program,
  42 while local variables last only as long
  43 as the functions defining them are active.
  44 However, lifetime (duration) is a separate and orthogonal concept from scope;
  45 we'll soon be meeting local variables which persist for the life of the program.
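</p><p>As a preview, here is a minimal sketch of that distinction.
The function names <TT>next_id</TT> and <TT>fresh_count</TT> are made up for illustration;
the keyword <TT>static</TT> is what gives the local variable <TT>calls</TT>
a lifetime as long as the program's,
while its scope remains the one function.

```c
/* Hypothetical example: the names are invented for illustration. */

/* `calls` is local in scope (only next_id can refer to it)
 * but static in duration (it keeps its value between calls). */
int next_id(void)
{
    static int calls = 0;   /* initialized once, not on every call */
    return ++calls;
}

/* `n` is an ordinary local variable: created afresh on every call. */
int fresh_count(void)
{
    int n = 0;
    return ++n;
}
```

Successive calls to <TT>next_id</TT> return 1, 2, 3, and so on,
while <TT>fresh_count</TT> returns 1 every time.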
  46 </p><p>Deep sentence:
  47 <blockquote>Thus if two functions must share some data,
  48 yet neither calls the other,
  49 it is often most convenient
  50 if the shared data is kept in external variables
  51 rather than passed in and out via arguments.
  52 </blockquote>(Later,
  53 though,
  54 we'll learn about data structures
  55 which can make it more convenient
  56 to pass certain data around via function arguments,
  57 so we'll have less reason for using external variables
  58 for these sorts of purposes.)
  59 </p><p>``Reverse Polish'' is used by some
  60 (earlier, all)
  61 Hewlett-Packard calculators.
(The name refers to the nationality of the logician Jan &#321;ukasiewicz,
who devised the original, operator-first form of this notation;
the ``reverse'' form puts each operator after its operands.)
  64 It may seem strange at first,
  65 but it's natural if you observe that you need both numbers (operands)
  66 before you can carry out an operation on them.
  67 (This fact is one of the reasons that reverse Polish notation
  68 is ``easier to implement.'')
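</p><p>To see why, here is a rough sketch of a reverse Polish evaluator,
separate from the book's calculator.
The name <TT>rpn_eval</TT>, the <TT>MAXDEPTH</TT> limit,
and the idea of passing the input as an array of already-separated strings
are all assumptions made to keep the example short;
it does no error checking and assumes well-formed input.

```c
#include <stdlib.h>   /* for atof() */
#include <string.h>   /* for strlen(), strchr() */

#define MAXDEPTH 32   /* assumed limit on pending operands */

/* Hypothetical helper: evaluate tokens given in reverse Polish order.
 * Assumes well-formed input; no overflow, underflow, or
 * division-by-zero checks are made. */
double rpn_eval(const char *tokens[], int ntokens)
{
    double stack[MAXDEPTH];
    int sp = 0;       /* next free stack position */
    int i;

    for (i = 0; i < ntokens; i++) {
        const char *t = tokens[i];

        if (strlen(t) == 1 && strchr("+-*/", t[0]) != NULL) {
            /* an operator: both operands are already on the stack */
            double right = stack[--sp];
            double left = stack[--sp];

            switch (t[0]) {
            case '+': stack[sp++] = left + right; break;
            case '-': stack[sp++] = left - right; break;
            case '*': stack[sp++] = left * right; break;
            case '/': stack[sp++] = left / right; break;
            }
        } else {
            stack[sp++] = atof(t);    /* an operand: just save it */
        }
    }
    return stack[sp - 1];
}
```

Given the seven tokens <TT>1 2 - 4 5 + *</TT>,
which is the book's example (1 - 2) * (4 + 5),
the function returns -9.
Notice that an operator never has to wait for anything:
by the time it appears, its operands have already been seen.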
  69 </p><p>The calculator example is a bit long and a bit involved,
  70 but I urge you to work through and understand it.
  71 A calculator is something that everyone's likely to be familiar with;
  72 it's interesting to see how one might work inside;
  73 and the techniques used here are generally useful in all sorts of programs.
  74 </p><p>A ``stack'' is simply a last-in, first-out list.
  75 You ``push'' data items onto a stack,
  76 and whenever you ``pop'' an item from the stack,
  77 you get the one most recently pushed.
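</p><p>A bare-bones sketch of the idea, using an array and an index, might look like this.
The names <TT>stack_push</TT>, <TT>stack_pop</TT>, and <TT>STACKSIZE</TT> are invented here
(the book's own versions appear on page 77),
and this version reports overflow and underflow through its return values
rather than by printing messages.

```c
#define STACKSIZE 100            /* assumed capacity */

int items[STACKSIZE];            /* external variables shared by both functions */
int count = 0;                   /* number of items currently on the stack */

/* Push x; returns 1 on success, 0 if the stack is full. */
int stack_push(int x)
{
    if (count >= STACKSIZE)
        return 0;
    items[count++] = x;
    return 1;
}

/* Pop the most recently pushed item into *x;
 * returns 1 on success, 0 if the stack is empty. */
int stack_pop(int *x)
{
    if (count <= 0)
        return 0;
    *x = items[--count];
    return 1;
}
```

Pushing 1, 2, and 3 and then popping three times yields 3, 2, and 1, in that order.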
  78 </p><p>pages 76-79
  79 </p><p>The code for the calculator may seem daunting at first,
  80 but it's much easier to follow if you look at each part in isolation
  81 (as good functions are meant to be looked at),
  82 and notice that the routines fall into three levels.
  83 At the top level is the calculator itself,
  84 which resides in the function <TT>main</TT>.
  85 The main function calls three lower-level functions:
  86 <TT>push</TT>, <TT>pop</TT>, and <TT>getop</TT>.
  87 <TT>getop</TT>, in turn,
  88 is written in terms of the still lower-level functions
  89 <TT>getch</TT> and <TT>ungetch</TT>.
  90 </p><p>A few details of the communication among these functions deserve mention.
  91 The <TT>getop</TT> routine actually returns two values.
  92 Its formal return value is a character
  93 representing the next operation to be performed.
  94 Usually,
  95 that character is just
  96 the character the user typed,
  97 that is, <TT>+</TT>, <TT>-</TT>, <TT>*</TT>, or <TT>/</TT>.
  98 In the case of a number typed by the user,
  99 the special code <TT>NUMBER</TT> is returned
 100 (which happens to be <TT>#define</TT>d to be the character <TT>'0'</TT>,
 101 but that's arbitrary).
 102 A return value of <TT>NUMBER</TT>
 103 indicates that an entire string of digits has been typed,
 104 and the string itself is copied into the array <TT>s</TT>
 105 passed to <TT>getop</TT>.
 106 In this case,
 107 therefore,
 108 the array <TT>s</TT> is the second return value.
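</p><p>The same two-return-value pattern can be sketched in miniature.
The function <TT>next_op</TT> below is a hypothetical, simplified cousin of <TT>getop</TT>:
it reads from a string (using the position <TT>*pos</TT>) instead of from standard input,
it handles only unsigned integers,
and it takes an extra size argument <TT>lim</TT> which the book's version does not have.

```c
#include <ctype.h>    /* for isdigit() */

#define NUMBER '0'    /* signal that a number was found, as in the text */

/* Hypothetical helper: get the next operator or number from `in`,
 * starting at index *pos.  The formal return value is the operator
 * character, NUMBER, or '\0' at end of input; when NUMBER is returned,
 * the digits themselves come back in s (at most lim-1 of them). */
int next_op(const char *in, int *pos, char s[], int lim)
{
    int i = 0;

    while (in[*pos] == ' ' || in[*pos] == '\t')
        (*pos)++;                        /* skip white space */
    if (in[*pos] == '\0')
        return '\0';                     /* end of input */
    if (!isdigit((unsigned char)in[*pos]))
        return in[(*pos)++];             /* operator: the character itself */
    while (isdigit((unsigned char)in[*pos]) && i < lim - 1)
        s[i++] = in[(*pos)++];           /* copy digits to the caller's array */
    s[i] = '\0';
    return NUMBER;                       /* "second return value" is now in s */
}
```

The caller looks at the formal return value first,
and consults <TT>s</TT> only when that value is <TT>NUMBER</TT>.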
 109 </p><p>In some printings,
 110 the second line on page 76 reads
 111 <pre>   #include &lt;math.h&gt; /* for atof() */
 112 </pre>which is incorrect;
 113 it should be
 114 <pre>   #include &lt;stdlib.h&gt;       /* for atof() */
 115 </pre></p><p>page 77
 116 </p><p>Make sure you understand why the code
 117 <pre>   push(pop() - pop());    /* WRONG */
 118 </pre>might not work correctly.
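</p><p>The trouble is that C does not specify the order
in which the two operands of <TT>-</TT> are evaluated,
so the two calls to <TT>pop</TT> may happen in either order,
and for subtraction the order matters.
Here is a sketch of the fix, which forces the order with a temporary variable.
The little <TT>push</TT> and <TT>pop</TT> below are simplified stand-ins
for the versions on page 77 (they fail silently instead of printing messages),
and <TT>subtract_top_two</TT> is a made-up name.

```c
#define MAXVAL 100      /* maximum depth of the value stack */

double val[MAXVAL];     /* value stack */
int sp = 0;             /* next free stack position */

/* Simplified stand-in: silently ignores a push onto a full stack. */
void push(double f)
{
    if (sp < MAXVAL)
        val[sp++] = f;
}

/* Simplified stand-in: returns 0.0 if the stack is empty. */
double pop(void)
{
    return (sp > 0) ? val[--sp] : 0.0;
}

/* Replace the top two values a, b (b pushed last) with a - b. */
void subtract_top_two(void)
{
    double op2 = pop();     /* right operand: pushed last, so popped first */
    push(pop() - op2);      /* only one pop() left in the expression */
}
```

After pushing 5 and then 3, calling <TT>subtract_top_two</TT> leaves 2 on the stack,
no matter what order the compiler chooses for evaluating operands.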
</p><p>``The representation can be hidden''
means that the declarations of these variables can follow
<TT>main</TT> in the file, so that <TT>main</TT> can't ``see'' them
(that is, can't attempt to refer to them).
 123 Furthermore,
 124 as we'll see,
 125 the declarations might be moved to a separate source file,
 126 and <TT>main</TT> won't care.
 127 </p><p>pages 77-78
 128 </p><p>Note that <TT>getop</TT> does not incorporate the
 129 functionality of <TT>atoi</TT> or <TT>atof</TT>--it
 130 collects and returns the digits as a string,
 131 and <TT>main</TT> calls <TT>atof</TT> to convert the string
 132 to a floating-point number (prior to pushing it on the stack).
 133 (There's nothing profound about this arrangement;
 134 there's no particular reason why
 135 <TT>getop</TT> couldn't have been set up to do the conversion itself.)
 136 </p><p>The reasons for using a routine like <TT>ungetch</TT> are good and sufficient,
 137 but they may not be obvious at first.
 138 The essential motivation,
 139 as the authors explain,
 140 is that when we're reading a string of digits,
 141 we don't know when we've reached the end of the string of digits
 142 until we've read a non-digit,
 143 and that non-digit is not part of the string of digits,
 144 so we really shouldn't have read it yet,
 145 after all.
 146 The rest of the program is set up based on the assumption that
 147 one call to <TT>getop</TT> will return the string of digits,
 148 and the <em>next</em> call will return whatever operator
 149 followed the string of digits.
 150 </p><p>
 151 To understand why the surprising
 152 and perhaps kludgey-sounding
 153 <TT>getch</TT>/<TT>ungetch</TT>
 154 approach is in fact a good one,
 155 let's consider the alternatives.
 156 <TT>getop</TT> could keep track of the one-too-far character somehow,
 157 and remember to use it next time instead of reading a new character.
 158 (Exercise 4-11 asks you to implement exactly this.)
 159 But this arrangement of <TT>getop</TT> is considerably less
 160 clean from the standpoint of the ``invariants'' we were discussing
 161 earlier.
 162 <TT>getop</TT> can be written relatively cleanly if one of its
 163 invariants is that the operator it's getting is always formed
 164 by reading the next character(s) from the input stream.
 165 <TT>getop</TT> would be considerably messier if it always had
 166 to remember to use an old character if it had one,
 167 or read a new character otherwise.
 168 If <TT>getop</TT> were modified later
 169 to read new kinds of operators,
 170 and if reading them involved reading more characters,
 171 it would be easy to forget to take into account the possibility
 172 of an old character each time a new character was needed.
 173 In other words,
 174 everywhere that
 175 <TT>getop</TT> wanted to do the operation
 176 <pre>   <I>read the next character</I>
 177 </pre>it would instead have to do
<pre>    if (<I>there's an old character</I>)
        <I>use it</I>
    else
        <I>read the next character</I>
 182 </pre>It's much cleaner to push the checking for an old character
 183 down into the <TT>getch</TT> routine.
 184 </p><p>Devising a pair of routines like <TT>getch</TT> and <TT>ungetch</TT>
 185 is an excellent example of the process of <dfn>abstraction</dfn>.
 186 We had a problem:
 187 while reading a string of digits,
 188 we always read one character too far.
 189 The obvious solution--remembering the one-too-far character
 190 and using it later--would have been clumsy if we'd
 191 implemented it directly within <TT>getop</TT>.
 192 So we invented some new functions to centralize and encapsulate
 193 the functionality of remembering accidentally-read characters,
 194 so that <TT>getop</TT> could be written cleanly in terms of a
 195 simple ``get next character'' operation.
 196 By centralizing the functionality,
 197 we make it easy for <TT>getop</TT> to use it consistently,
 198 and by encapsulating it,
 199 we hide the (potentially ugly) details from the rest of the program.
 200 <TT>getch</TT> and <TT>ungetch</TT> may be tricky to write,
 201 but once we've written them,
 202 we can seal up the little black boxes they're in
 203 and not worry about them any more,
 204 and the rest of the program
 205 (especially <TT>getop</TT>)
 206 is cleaner.
 207 </p><p>page 79
 208 </p><p>If you're not used to the conditional operator <TT>?:</TT> yet,
 209 here's how <TT>getch</TT> would look without it:
<pre>    int getch(void)
    {
        if (bufp &gt; 0)
            return buf[--bufp];
        else
            return getchar();
    }
</pre>Also, the extra generality of these two routines
(namely, that they can push back and remember several characters,
a feature which the calculator program doesn't even use)
makes them a bit harder to follow.
Exercise 4-8 asks you to write simpler versions which allow
only one character of pushback.
 222 (Also, as the text notes,
 223 we don't really have to be writing <TT>ungetch</TT> at all,
 224 because the standard library already provides an
 225 <TT>ungetc</TT> which
 226 can provide one character of pushback for
 227 <TT>getchar</TT>.)
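</p><p>For instance, here is a sketch of the read-one-too-far pattern
written with the standard <TT>getc</TT> and <TT>ungetc</TT>.
The function name <TT>read_digits</TT> and its parameters are invented for illustration;
only <TT>getc</TT>, <TT>ungetc</TT>, and <TT>isdigit</TT> come from the standard library.

```c
#include <ctype.h>    /* for isdigit() */
#include <stdio.h>    /* for getc(), ungetc(), EOF */

/* Hypothetical helper: read a run of digits from fp into s,
 * storing at most lim-1 of them, and push the first non-digit
 * back onto the stream.  Returns the number of digits stored. */
int read_digits(FILE *fp, char s[], int lim)
{
    int c, i = 0;

    while ((c = getc(fp)) != EOF && isdigit(c)) {
        if (i < lim - 1)
            s[i++] = c;
    }
    s[i] = '\0';
    if (c != EOF)
        ungetc(c, fp);    /* we read one character too far: put it back */
    return i;
}
```

Called as <TT>read_digits(stdin, s, sizeof s)</TT>,
it leaves the operator (or whatever else followed the digits)
waiting to be read by the next <TT>getchar</TT>.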
 228 </p><p>When we defined a stack,
 229 we said that it was ``last-in, first-out.''
 230 Are the versions of <TT>getch</TT> and <TT>ungetch</TT> on
page 79 last-in, first-out or first-in, first-out?
 232 Do you agree with this choice?
 233 </p><p>One last note:
 234 the name of the variable <TT>bufp</TT> suggests that it is a pointer,
 235 but it's actually an index into the <TT>buf</TT> array.
 236 </p><hr>
 237 <p>
 238 Read sequentially:
 239 <a href="sx7b.html" rev=precedes>prev</a>
 240 <a href="sx7d.html" rel=precedes>next</a>
 241 <a href="sx7.html" rev=subdocument>up</a>
 242 <a href="top.html">top</a>
 243 </p>
 244 <p>
 245 This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
 246 // <a href="copyright.html">Copyright</a> 1995, 1996
 247 // <a href="mailto:scs@eskimo.com">mail feedback</a>
 248 </p>
 249 </body>
 250 </html>