<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995-7 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<html>
<head>
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>Chapter 21: Pointer Allocation Strategies</title>
<link href="sx6b.html" rev=precedes>
<link href="sx8.html" rel=precedes>
<link href="top.html" rev=subdocument>
</head>
<body>
<H1>Chapter 21: Pointer Allocation Strategies</H1>
<p>Pointers are viewed by many as the bane of C programming,
because out-of-control pointers can do a lot of damage,
and can be hard to track down.
But real programs tend to make heavy use of pointers.
How can we keep pointers under control?
</p><p>The big problem with pointers, of course,
is that they can point anywhere,
including to places they're not supposed to.
When a pointer points to the wrong place
(perhaps because it was never initialized properly,
such that it essentially points to a random place),
a fetch of the data it ``points'' to will result in garbage
(or may cause the program to crash with a memory access violation),
and a <em>write</em> of some new data to the location it
``points'' to will damage some other part of your
program, or of some other program, or of the operating system
(or may cause the program to crash).
Crashes, in fact, though they're frustrating and annoying,
may be preferable to the alternatives,
namely performing quiet but meaningless computations
or damaging other code,
both of which can be even more annoying and even harder to track down.
</p><p>Our goal, then, is to make sure that our pointers are always
<dfn>valid</dfn>,
or when they are not,
to make sure that we can know that they are not.
First, then, let's discuss what we mean by a ``valid pointer.''
</p><p>A valid pointer
(more precisely, a valid pointer value)
is one that does in fact point to an object of the
type that the pointer is declared to point to.
Furthermore, if the pointer will be used to store new values,
the object it points to must be sitting in writable memory
(that is, it must not be a variable that was declared <TT>const</TT>,
or a string that results from a string literal).
In contrast to valid pointers,
we may distinguish among several kinds of invalid pointers:
null pointers,
uninitialized pointers,
pointers to memory that used to exist but has disappeared,
and pointers to memory that once came from <TT>malloc</TT> but has
since been freed.
</p><p>The tricky thing about valid and invalid pointers
is that there's no simple way in C to ask ``is this pointer
valid?'' or ``is this pointer invalid?''.
The only questions we can ask about pointers are
``is this pointer equal to this other pointer?'',
``is this pointer unequal to this other pointer?'',
and, for pointers into the same array,
``is this pointer greater or less than this other pointer?''.
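</p><p>To make those three questions concrete, here is a minimal sketch.
The function names <TT>within_array</TT> and <TT>same_place</TT> are made up for illustration,
and neither function is a general validity test:
the ordering comparison only makes sense when the caller already knows
that the pointer points into (or just past the end of) the array in question.

```c
#include <stddef.h>

/* Returns 1 if p points at one of the n elements of the array starting
   at base, and 0 otherwise.  The caller must already know that p points
   somewhere into that array (or just past its end); comparing pointers
   into unrelated objects with < or >= is not meaningful in C. */
int within_array(const int *base, size_t n, const int *p)
{
    return p >= base && p < base + n;
}

/* Returns 1 if the two pointers point at the same place, 0 otherwise. */
int same_place(const int *p1, const int *p2)
{
    return p1 == p2;
}
```

Notice that nothing here can tell us whether an arbitrary pointer is valid;
all we can do is compare one pointer against another.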
</p><p>Part one of our strategy for managing pointers,
then,
will be to arrange that all or most invalid pointers are null pointers.
Whenever we do anything which would cause a pointer to be invalid,
that is,
whenever we declare one
(such that it would otherwise have a garbage initial value),
or whenever we do something that causes the memory which one of
our pointers used to point to to disappear,
we'll set the pointer to <TT>NULL</TT>.
Having done so, we can test whether the pointer is currently
valid by checking if it's not equal to the null pointer,
or contrariwise,
we can test whether it's invalid by checking if it's equal to the
null pointer.
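</p><p>As a small sketch of this discipline
(the functions <TT>first_negative</TT> and <TT>first_negative_or</TT> are hypothetical,
invented just for this illustration),
a pointer can start out as <TT>NULL</TT>,
be set to something valid only when there is something valid to point to,
and be tested before it is used:

```c
#include <stddef.h>

/* Returns a pointer to the first negative element of a[0..n-1],
   or NULL if there is none. */
const int *first_negative(const int *a, size_t n)
{
    const int *found = NULL;    /* starts out known-invalid, not garbage */
    size_t i;

    for (i = 0; i < n; i++) {
        if (a[i] < 0) {
            found = &a[i];      /* now it points at a real element */
            break;
        }
    }
    return found;
}

/* Returns the first negative element, or fallback if there is none. */
int first_negative_or(const int *a, size_t n, int fallback)
{
    const int *p = first_negative(a, n);

    if (p != NULL)              /* test before fetching through p */
        return *p;
    return fallback;
}
```

Because <TT>found</TT> is either null or pointing at a real array element,
the test in the caller really does mean ``is this pointer valid?''.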
</p><p>Remember that C doesn't generally do any of this
automatically.
It does not guarantee that all newly-allocated pointers are
initialized to null pointers,
and it does not insert automatic validity checks before you try
to use a pointer.
If you want to be sure that a pointer is initialized to a null
pointer,
<em>you</em> must generally set it to <TT>NULL</TT>.
If you have a pointer which you're thinking of using but which
might or might not be valid
(and if it's a pointer which you believe you'd have set to
<TT>NULL</TT> if it were invalid),
<em>you</em> must precede your use of the pointer with a test of
the form
<pre>
        if(p != NULL)
</pre>
</p><p>Furthermore,
if you write the test
<TT>if(p != NULL)</TT>,
it does not in the general case mean
``is <TT>p</TT> valid?''.
The test
<TT>if(p != NULL)</TT>
can only be used to mean
``is <TT>p</TT> valid?''
<em>if</em> you have taken care
to make sure that
all non-valid pointers
have been set to null.
</p><p>(There is one condition under which C does guarantee that a
pointer variable will be initialized to a null pointer,
and that is when the pointer variable is a global variable or a
member of a global structure,
or more precisely,
when it is part of a variable, array, or structure which has
static duration.)
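</p><p>Here is a minimal sketch of that one guarantee.
The variable <TT>saved_name</TT> and the two functions are hypothetical names chosen for the example;
the point is only that a pointer with static duration
and no explicit initializer starts out as a null pointer.

```c
#include <stddef.h>

/* Static duration and no initializer: guaranteed to start out null. */
static char *saved_name;

/* Returns 1 if a name has been remembered, 0 if not. */
int have_saved_name(void)
{
    return saved_name != NULL;
}

/* Remembers name; the caller must keep the string alive. */
void remember_name(char *name)
{
    saved_name = name;
}
```

A local (automatic) pointer declared the same way inside a function
would get no such treatment and would have to be set to <TT>NULL</TT> by hand.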
</p><p>Remember, too,
that the shorthand form
<pre>
        if(p)
</pre>
is precisely equivalent to
<TT>if(p != NULL)</TT>.
So you may be able to read
<TT>if(p)</TT>
as
``if <TT>p</TT> is valid'',
but again,
only if
you've ensured that
whenever <TT>p</TT> is not valid,
it is set to null.
</p><p>The degree of care with which you have to implement a pointer
management strategy may be different for
different pointer variables you use.
If a pointer variable is immediately set to a valid pointer value,
and if nothing ever happens which could make it become invalid,
then there's no need to check it before each time you use it.
Similarly, if a pointer is set to point to different locations
from time to time,
but it can be shown that it will always be valid,
there's again no reason to test it all the time.
However, if a particular pointer is valid some of the time and
invalid the rest of the time,
or in particular,
if it records some optional data which might or might not be
present,
then you'll want to be very careful to set the pointer to
<TT>NULL</TT> whenever it's not valid
(or whenever the optional data is not present),
and to test the pointer before using it (that is, before fetching
or writing to the location that it points to).
</p><p>Everything we've just said about ``pointer variables''
is equally true,
and perhaps more important,
for pointer fields within structures.
When you define a structure, you will typically be allocating
many instances of that structure,
so you will have many instances of that pointer.
You will typically have central pieces of code which operate on
instances of that structure,
meaning that each time the piece of code runs,
it may be operating on a different instance of the structure,
so if the pointer field is one that isn't always valid
(that is,
isn't valid in all instances of the structure),
the code had better test it before using it.
Similarly, the code had better set the pointer field to
<TT>NULL</TT> if it ever invalidates it.
</p><p>For example, one of the first features we added to the adventure game
was a long description for objects and rooms.
But the long description is optional;
not all objects and rooms have one.
Suppose
we chose to use a <TT>char *</TT>
within <TT>struct object</TT> and <TT>struct room</TT>
to point at a dynamically-allocated string containing the long description.
(This choice would be preferable to a fixed-size array of <TT>char</TT>
because it may be the case that some long descriptions will be elaborately long,
and we'd neither want to limit the potential length of descriptions
by having a too-small array
nor waste space for objects with short or empty descriptions
by always using a too-large array.)
For each instance of an object or room structure,
we'd initialize
the description field to contain a null pointer.
For each room or object with a long description,
we'd set the description field to contain a pointer to the appropriate
(and appropriately-allocated)
string.
Finally, when it came time to print the description,
we'd use code like
<pre>
        if(objp-&gt;desc != NULL)
                printf("%s\n", objp-&gt;desc);
        else    printf("You see nothing special about the %s.\n", objp-&gt;name);
</pre>
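</p><p>Putting those three steps together might look something like the following sketch.
The pared-down <TT>struct object</TT> and the functions
<TT>object_init</TT>, <TT>object_set_desc</TT>, and <TT>object_describe</TT>
are assumptions made for this illustration;
they are not the adventure game's actual definitions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, pared-down object structure. */
struct object {
    char name[32];
    char *desc;         /* long description, or NULL if there is none */
};

/* Step 1: every new object starts out with no long description. */
void object_init(struct object *objp, const char *name)
{
    strncpy(objp->name, name, sizeof objp->name - 1);
    objp->name[sizeof objp->name - 1] = '\0';
    objp->desc = NULL;
}

/* Step 2: attach a malloc'ed copy of text as the long description.
   Returns 0 on success, or -1 (leaving the object unchanged) if
   memory could not be allocated. */
int object_set_desc(struct object *objp, const char *text)
{
    char *copy = malloc(strlen(text) + 1);

    if (copy == NULL)
        return -1;
    strcpy(copy, text);
    free(objp->desc);   /* harmless if desc was NULL */
    objp->desc = copy;
    return 0;
}

/* Step 3: test the field before using it. */
void object_describe(const struct object *objp, FILE *fp)
{
    if (objp->desc != NULL)
        fprintf(fp, "%s\n", objp->desc);
    else
        fprintf(fp, "You see nothing special about the %s.\n", objp->name);
}
```

Since <TT>object_init</TT> is the only place objects get set up,
<TT>object_describe</TT> can trust that a non-null <TT>desc</TT> is a usable string.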
</p><p>Particular care is needed when pointers point to
dynamically-allocated memory,
managed with the standard library functions <TT>malloc</TT>,
<TT>free</TT>,
and <TT>realloc</TT>.
Somehow,
it's easier to make mistakes here,
and their consequences tend to be more damaging and harder to
track down.
</p><p>First of all,
of course,
you must always ensure that the allocation functions
<TT>malloc</TT> and <TT>realloc</TT> succeed.
These functions return null pointers
when they are unable to allocate the requested memory,
so you must <em>always</em> check the return value
to see that it is not a null pointer,
before using it.
(If the return value is a null pointer,
you will generally print some kind of error message
and abort at least
the particular function that needed the allocated memory,
or perhaps abort the entire program.)
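</p><p>A minimal sketch of the ``print a message and abort the particular function'' approach follows.
The function <TT>make_counts</TT> is hypothetical,
and it assumes the caller asks for at least one element.

```c
#include <stdio.h>
#include <stdlib.h>

/* Returns a newly allocated array of n ints, all set to zero, or NULL
   (after printing a message) if the memory could not be obtained.
   Assumes n is nonzero and small enough that n * sizeof(int) does not
   overflow. */
int *make_counts(size_t n)
{
    int *counts = malloc(n * sizeof *counts);
    size_t i;

    if (counts == NULL) {
        fprintf(stderr, "make_counts: out of memory\n");
        return NULL;    /* give up on this function only */
    }
    for (i = 0; i < n; i++)
        counts[i] = 0;
    return counts;
}
```

The caller, of course, inherits the obligation
to check <TT>make_counts</TT>'s return value in turn.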
</p><p>Don't get in the habit of assuming that
a single, simple call to <TT>malloc</TT> will ``always'' succeed.
Don't make excuses like
``this program doesn't use much memory to begin with,
and I'm only allocating 10 bytes here,
so how can it possibly fail?''
For one thing, there are more reasons for <TT>malloc</TT>
to fail--and return a null pointer--than
that there was no more memory.
Some implementations of <TT>malloc</TT> will also return a null pointer
if they are able to detect that you have misused
some of the memory that you have previously allocated,
perhaps by writing to more of it than you asked for.
In this case, <TT>malloc</TT> is trying to tell you something,
something you need to know,
and although its voice is small
(and although tracking down the problem that it's complaining about
may be difficult),
you will only have more problems,
and ones that are more difficult to track down,
if <TT>malloc</TT> returns a null pointer
but you then use that pointer as if it were valid.
(As an example of how it can be alarmingly easy
to misuse the memory that <TT>malloc</TT> gives you,
consider this hypothetical scrap of code
for making a dynamically-allocated copy of a string:
<pre>
        char *copystring = malloc(strlen(originalstring));      /* Beware... */
        if(copystring != NULL)
                strcpy(copystring, originalstring);
</pre>
Hint:
what about the <TT>\0</TT> that terminates the string?)
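</p><p>For comparison, here is one way the scrap might look once the hint is taken to heart.
The function name <TT>copy_string</TT> is made up for this sketch;
the only substantive change is the extra byte requested for the terminating <TT>\0</TT>.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'ed copy of original, or NULL if memory could not
   be allocated.  The caller is responsible for freeing the copy. */
char *copy_string(const char *original)
{
    /* strlen does not count the terminating \0, so ask for one more */
    char *copy = malloc(strlen(original) + 1);

    if (copy != NULL)
        strcpy(copy, original);
    return copy;
}
```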
</p><p>In a program that allocates a lot of different pieces of memory
for a lot of different things,
it can be a real nuisance to have to check
each pointer returned
from each call to <TT>malloc</TT>
to make sure it's not null.
One popular shortcut is to define
a ``wrapper'' function around <TT>malloc</TT>,
which calls <TT>malloc</TT>
and checks the return value
in one central place.
For example,
the adventure game uses the function
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "chkmalloc.h"

void *
chkmalloc(size_t sz)
{
void *ret = malloc(sz);
if(ret == NULL)
        {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
        }
return ret;
}
</pre>
One way to think about <TT>chkmalloc</TT>
is that it centralizes the test on
<TT>malloc</TT>'s return value.
Another way of thinking about it
is that it is a special, alternate version of <TT>malloc</TT>
that never returns <TT>NULL</TT>.
(The fact that it never returns <TT>NULL</TT>
does not mean that it never fails,
but just that if/when it does fail,
it signifies this
by calling <TT>exit</TT>
instead of
returning <TT>NULL</TT>.)
Aborting the entire program
when a call to <TT>malloc</TT> fails
may seem draconian,
and there are programs
(e.g. text editors)
for which it would be a completely unacceptable strategy,
but it's fine for our purposes,
especially if it doesn't happen very often.
(In any case, aborting the program cleanly
with a message like ``Out of memory''
is still vastly preferable to crashing horribly and mysteriously,
which is what programs that don't check
<TT>malloc</TT>'s return value
eventually do.)
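</p><p>To see what the wrapper buys us at the point of call, consider this sketch.
The caller <TT>join_words</TT> is a hypothetical function invented for the example,
and <TT>chkmalloc</TT> is repeated here
(without its header file) only so that the fragment stands on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same idea as the chkmalloc shown above: never returns NULL. */
void *chkmalloc(size_t sz)
{
    void *ret = malloc(sz);

    if (ret == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ret;
}

/* Returns a newly allocated string holding a, a space, and b.
   No NULL test is needed here, because chkmalloc exits instead of
   returning a null pointer. */
char *join_words(const char *a, const char *b)
{
    /* room for a, one space, b, and the terminating \0 */
    char *s = chkmalloc(strlen(a) + 1 + strlen(b) + 1);

    strcpy(s, a);
    strcat(s, " ");
    strcat(s, b);
    return s;
}
```

The allocation, and the decision about what to do if it fails,
each take up exactly one line of the caller.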
</p><p>Another area of concern is that
when you're calling <TT>free</TT> and <TT>realloc</TT>,
there are more ways for pointers to become invalid.
For example,
consider the code
<pre>
        /* p is known to have come from malloc() */
        free(p);
</pre>
After calling <TT>free</TT>, is <TT>p</TT> valid or invalid?
C uses pass-by-value, so <TT>p</TT>'s value hasn't changed.
(The <TT>free</TT> function couldn't change it if it tried.)
But <TT>p</TT> is most definitely now <em>invalid</em>;
it no longer points to memory which the program can use.
However, it does still point just where it used to,
so if the program accidentally uses it,
there will still seem to be data there,
except that the data will be sitting in memory which may now
have been allocated to ``someone else''!
Therefore, if the variable <TT>p</TT> persists
(that is, if it's something other than a local variable that's
about to disappear when its function returns,
or a pointer field within a structure which is all about to
disappear),
it would probably be a good idea to set <TT>p</TT> to <TT>NULL</TT>:
<pre>
        free(p);
        p = NULL;
</pre>
(Of course, setting <TT>p</TT> to <TT>NULL</TT> only accomplishes
something if later uses
of <TT>p</TT> check it before using it.)
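</p><p>If you find yourself writing that pair of statements often,
one possibility is to bundle them up.
The macro <TT>FREE_AND_NULL</TT> and the little <TT>struct room</TT> below
are hypothetical, offered only as a sketch of the idea;
a macro is used rather than a function
precisely because a function could not change the caller's pointer.

```c
#include <stdlib.h>

/* Hypothetical helper: free the block and null the pointer in one step.
   p must be a modifiable pointer variable, and it is evaluated twice,
   so it should not have side effects. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical room structure with an optional, malloc'ed description. */
struct room {
    char *desc;
};

/* Throws away the room's long description, if it has one. */
void room_discard_desc(struct room *rp)
{
    FREE_AND_NULL(rp->desc);
}
```

Since <TT>free(NULL)</TT> does nothing,
calling <TT>room_discard_desc</TT> a second time on the same room is harmless.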
</p><p>Finally, let's think about <TT>realloc</TT>.
<TT>realloc</TT>, remember,
attempts to enlarge a chunk of memory which we originally
obtained from <TT>malloc</TT>.
(It lets us change our mind about how much memory we had asked for.)
But <TT>realloc</TT> is not always able to enlarge a chunk of memory in-place;
sometimes it must go elsewhere in memory to find a contiguous
piece of memory big enough to satisfy the enlargement request.
So what about this code?
<pre>
        newp = realloc(oldp, newsize);
</pre>
Is <TT>oldp</TT> valid or invalid after this call?
It depends on whether <TT>realloc</TT> returned the old pointer
value or not
(that is, on whether it was able to enlarge the memory block
in-place or had to go elsewhere).
Most of the time,
you will use <TT>realloc</TT> something like this:
<pre>
        newp = realloc(p, newsize);
        if(newp != NULL)
                {
                /* success; got newsize */
                p = newp;
                }
        else    {
                /* failure; p still points to block of old size */
                }
</pre>
With a setup like this,
<TT>p</TT> remains valid,
and <TT>newp</TT> is a temporary variable which we don't use further
after testing it and perhaps assigning it to <TT>p</TT>.
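</p><p>The same pattern can be packaged as a function, roughly as in the sketch below.
The name <TT>resize_ints</TT> and its return convention are assumptions made for the example;
the function takes the address of the caller's pointer
so that it can update that pointer when <TT>realloc</TT> succeeds.

```c
#include <stdlib.h>

/* Resizes the int array *arrp to hold newcount elements.
   Returns 0 on success.  On failure returns -1 and leaves *arrp alone,
   so the caller's pointer is still valid for the old size.
   Assumes newcount is nonzero. */
int resize_ints(int **arrp, size_t newcount)
{
    int *newp = realloc(*arrp, newcount * sizeof **arrp);

    if (newp == NULL)
        return -1;      /* failure; *arrp is untouched */
    *arrp = newp;       /* success; the old value must not be used again */
    return 0;
}
```

Either way, the caller's pointer is valid when the function returns;
the return value says only which size it is valid for.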
</p><p>A final issue concerns pointer <dfn>aliases</dfn>.
If several pointers point into the same block of memory,
and if that block of memory moves or disappears,
<em>all</em> the old pointers become invalid.
If you have a sequence of code which amounts to
<pre>
        p2 = p;
        ...
        free(p);
        p = NULL;
</pre>
then setting <TT>p</TT> to <TT>NULL</TT> may not have been sufficient,
because <TT>p2</TT> just became invalid, too, and may also need
setting to <TT>NULL</TT>.
The situation is particularly tricky with <TT>realloc</TT>:
suppose that you have a pointer to a chunk of memory:
<pre>
        char *p = malloc(10);
</pre>
and another pointer which points within that chunk:
<pre>
        char *p2 = p + 5;
</pre>
Now, if you reallocate <TT>p</TT>,
and if <TT>realloc</TT>
has to go elsewhere
and so
returns a different pointer value
which you assign to <TT>p</TT>,
you've also got to fix up <TT>p2</TT>,
because it just had the rug yanked out from under it,
and is now invalid.
To keep <TT>p2</TT> up-to-date,
you might use code like this:
<pre>
        ptrdiff_t p2offset = p2 - p;    /* ptrdiff_t is declared in &lt;stddef.h&gt; */
        newp = realloc(p, newsize);
        if(newp != NULL)
                {
                /* success; got newsize */
                p = newp;
                p2 = p + p2offset;
                }
        else    {
                /* failure; p and p2 still point to block of old size */
                }
</pre>
Before calling <TT>realloc</TT>,
we record (in the <TT>ptrdiff_t</TT> variable <TT>p2offset</TT>)
how far beyond <TT>p</TT>
the secondary pointer
<TT>p2</TT> used to point,
so that we can generate a corresponding new value of <TT>p2</TT>
if <TT>p</TT> moves.
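</p><p>When the two pointers always travel together,
it can help to keep them in one structure and do the fix-up in one place.
The <TT>struct textbuf</TT> and <TT>textbuf_resize</TT> below are hypothetical,
a sketch of the idea rather than code from the adventure game.
The offset is held in a <TT>ptrdiff_t</TT>, the type of a pointer difference,
and the new size is assumed to be large enough
that the secondary pointer still falls inside the block.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical text buffer: base points at a malloc'ed block,
   and cur points somewhere within that same block. */
struct textbuf {
    char *base;
    char *cur;
    size_t size;
};

/* Resizes the block to newsize bytes, keeping cur at the same offset.
   Returns 0 on success; on failure returns -1 and leaves both pointers
   valid for the old block.  Assumes newsize is greater than cur's
   offset from base. */
int textbuf_resize(struct textbuf *tb, size_t newsize)
{
    ptrdiff_t curoffset = tb->cur - tb->base;   /* record before realloc */
    char *newp = realloc(tb->base, newsize);

    if (newp == NULL)
        return -1;
    tb->base = newp;
    tb->cur = newp + curoffset;                 /* rebuild the alias */
    tb->size = newsize;
    return 0;
}
```

Any other alias into the block that the structure doesn't know about
would, of course, still be left dangling.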
</p><hr>
<p>
Read sequentially:
<a href="sx6b.html" rev=precedes>prev</a>
<a href="sx8.html" rel=precedes>next</a>
<a href="top.html" rev=subdocument>up</a>
<a href="top.html">top</a>
</p>
<p>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1996-1999
// <a href="mailto:scs@eskimo.com">mail feedback</a>
</p>
</body>
</html>