<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995-7 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<html>
<head>
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>Chapter 21: Pointer Allocation Strategies</title>
<link href="sx6b.html" rev=precedes>
<link href="sx8.html" rel=precedes>
<link href="top.html" rev=subdocument>
</head>
<body>
<H1>Chapter 21: Pointer Allocation Strategies</H1>
<p>Pointers are viewed by many as the bane of C programming,
because out-of-control pointers can do a lot of damage,
and can be hard to track down.
But real programs tend to make heavy use of pointers.
How can we keep pointers under control?
</p><p>The big problem with pointers, of course,
is that they can point anywhere,
including to places they're not supposed to.
When a pointer points to the wrong place
(perhaps because it was never initialized properly,
such that it essentially points to a random place),
a fetch of the data it ``points'' to will result in
garbage
(or may cause the program to crash with a memory access
violation),
and a <em>write</em> of some new data to the location it
``points'' to will damage some other part of your
program, or of some other program, or of the operating system
(or may cause the program to crash).
Crashes, in fact, though they're frustrating and annoying,
may be preferable to the alternatives,
namely performing quiet but meaningless computations
or damaging other code,
both of which can be even more annoying and even harder to track down.
</p><p>Our goal, then, is to make sure that our pointers are always
<dfn>valid</dfn>,
or when they are not,
to make sure that
we can
know that they are not.
First, then, let's discuss what we mean by a ``valid pointer.''
</p><p>A valid pointer
(more precisely, a valid pointer value)
is one that does in fact point to an object of the
type that the pointer is declared to point to.
Furthermore, if the pointer will be used to store new values,
the object it points to must be sitting in writable memory
(that is, it must not be a variable that was declared <TT>const</TT>,
or a string that results from a string literal).
In contrast to valid pointers,
we may distinguish among several kinds of invalid pointers:
null pointers,
uninitialized pointers,
pointers to memory that used to exist but has disappeared,
and pointers to memory that once came from <TT>malloc</TT> but has
since been freed.
</p><p>The tricky thing about valid and invalid pointers
is that there's no simple way in C to ask ``is this pointer
valid?'' or ``is this pointer invalid?''.
The only questions we can ask about pointers are
``is this pointer equal to this other pointer?'',
``is this pointer unequal to this other pointer?'',
and, for pointers into the same array,
``is this pointer greater or less than this other pointer?''.
</p><p>Part one of our strategy for managing pointers,
then,
will be to arrange that all or most invalid pointers are null pointers.
Whenever we do anything which would cause a pointer to be invalid,
that is,
whenever we declare one
(such that it would otherwise have a garbage initial value),
or whenever we do something that causes the memory which one of
our pointers used to point to to disappear,
we'll set the pointer to <TT>NULL</TT>.
Having done so, we can test whether the pointer is currently
valid by checking if it's not equal to the null pointer,
or contrariwise,
we can test whether it's invalid by checking if it's equal to the
null pointer.
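</p><p>As a rough sketch of this discipline
(the <TT>scratch</TT> buffer and the three function names are invented for illustration
and are not part of the adventure game),
here is a pointer that is set to null where it is declared
and again whenever its memory goes away,
so that a test against <TT>NULL</TT> really does answer the question
``is there a buffer right now?'':

```c
#include <stdlib.h>

/* Hypothetical example: a scratch buffer that exists only some of the time. */
static char *scratch = NULL;    /* nothing allocated yet, so say so explicitly */

/* Make sure a buffer exists.  Returns 1 if one does, 0 if malloc failed.
   (If a buffer is already open, it is left alone, whatever its size.) */
int scratch_open(size_t size)
{
    if(scratch == NULL)
        scratch = malloc(size);     /* stays null if malloc fails */
    return scratch != NULL;
}

/* Release the buffer and record the fact that it is gone. */
void scratch_close(void)
{
    free(scratch);      /* free(NULL) is harmless */
    scratch = NULL;     /* the memory has disappeared, so the pointer becomes null */
}

/* Because invalid always means null here, this test is meaningful. */
int scratch_is_open(void)
{
    return scratch != NULL;
}
```

Calling <TT>scratch_close</TT> twice in a row is safe in this sketch,
precisely because the pointer is null after the first call.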
</p><p>Remember that C doesn't generally do any of this
automatically.
It does not guarantee that all newly-allocated pointers are
initialized to null pointers,
and it does not insert automatic validity checks before you try
to use a pointer.
If you want to be sure that a pointer is initialized to a null
pointer,
<em>you</em> must generally set it to <TT>NULL</TT>.
If you have a pointer which you're thinking of using but which
might or might not be valid
(and if it's a pointer which you believe you'd have set to
<TT>NULL</TT> if it was invalid),
<em>you</em> must precede your use of the pointer with a test of
the form
<pre>
	if(p != NULL)
</pre>
</p><p>Furthermore,
if you write the test
<TT>if(p != NULL)</TT>,
it does not in the general case mean
``is <TT>p</TT> valid?''.
The test
<TT>if(p != NULL)</TT>
can only be used to mean
``is <TT>p</TT> valid?''
<em>if</em> you have taken care
to make sure that
all non-valid pointers
have been set to null.
</p><p>(There is one condition under which C does guarantee that a
pointer variable will be initialized to a null pointer,
and that is when the pointer variable is a global variable or a
member of a global structure,
or more precisely,
when it is part of a variable, array, or structure which has
static duration.)
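</p><p>A small sketch of that guarantee follows.
The <TT>struct node</TT> type and all of the names are made up for the example.
None of the objects below has an initializer,
yet every pointer in them starts out null because each object has static duration.
Automatic (local) variables and memory obtained from <TT>malloc</TT>
get no such treatment.

```c
#include <stddef.h>

/* Hypothetical list node, used only to have some pointers to look at. */
struct node {
    int value;
    struct node *next;
};

struct node *head;          /* file scope: starts out as a null pointer */
struct node table[4];       /* every table[i].next starts out null, too */

/* Returns 1 if all of the uninitialized static-duration pointers are null. */
int statics_start_null(void)
{
    static struct node *cached;     /* static local: same guarantee */
    int i;

    if(head != NULL || cached != NULL)
        return 0;
    for(i = 0; i < 4; i++)
        if(table[i].next != NULL)
            return 0;
    return 1;
}
```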
</p><p>Remember, too,
that the shorthand form
<pre>
	if(p)
</pre>
is precisely equivalent to
<TT>if(p != NULL)</TT>.
So you may be able to read
<TT>if(p)</TT>
as
``if <TT>p</TT> is valid'',
but again,
only if
you've ensured that
whenever <TT>p</TT> is not valid,
it is set to null.
</p><p>The degree of care with which you have to implement a pointer
management strategy may be different for
different pointer variables
you use.
If a pointer variable is immediately set to a valid pointer
value,
and if nothing ever happens which could make it become invalid,
then there's no need to check it before each time you use it.
Similarly, if a pointer is set to point to different locations
from time to time,
but it can be shown that it will always be valid,
there's again no reason to test it all the time.
However, if a particular pointer is valid some of the time and
invalid at other times,
or in particular,
if it records some optional data which might or might not be
present,
then you'll want to be very careful to set the pointer to
<TT>NULL</TT> whenever it's not valid
(or whenever the optional data is not present),
and to test the pointer before using it (that is, before fetching
or writing to the location that it points to).
</p><p>Everything we've just said about ``pointer variables''
is equally true,
and perhaps more important,
for pointer fields within structures.
When you define a structure, you will typically be allocating
many instances of that structure,
so you will have many instances of that pointer.
You will typically have central pieces of code which operate on
instances of that structure,
meaning that each time the
piece of
code runs,
it may be operating on a different instance of the structure,
so if the pointer field is one that isn't always valid
(that is,
isn't valid in all instances of the structure),
the code had better test it before using it.
Similarly, the code had better set the pointer field to
<TT>NULL</TT> if it ever invalidates it.
</p><p>For example, one of the first features we added to the adventure game
was a long description for objects and rooms.
But the long description is optional;
not all objects and rooms have one.
Suppose
we chose to use a <TT>char *</TT>
within <TT>struct object</TT> and <TT>struct room</TT>
to point at a dynamically-allocated string containing the long description.
(This choice would be preferable to a fixed-size array of <TT>char</TT>
because it may be the case that some long descriptions will be elaborately long,
and we'd neither want to limit the potential length of descriptions
by having a too-small array
nor waste space for objects with short or empty descriptions
by always using a too-large array.)
For each
instance of an
object or room structure,
we'd initialize
the description field to contain a null pointer.
For each
room or object
with a long description,
we'd set the description field to contain a pointer to the appropriate
(and appropriately-allocated)
string.
Finally, when it came time to print the description,
we'd use code like
<pre>
	if(objp-&gt;desc != NULL)
		printf("%s\n", objp-&gt;desc);
	else	printf("You see nothing special about the %s.\n", objp-&gt;name);
</pre>
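</p><p>Putting the first two of those steps into code might look something like the sketch below.
The <TT>name</TT> and <TT>desc</TT> fields come from the fragment above;
everything else
(the <TT>MAXNAME</TT> size, the use of a fixed array for <TT>name</TT>,
and the functions <TT>initobject</TT> and <TT>setdesc</TT>)
is assumed here for illustration,
and the game's real structure may well differ.

```c
#include <stdlib.h>
#include <string.h>

#define MAXNAME 20      /* assumed size, for illustration only */

struct object {
    char name[MAXNAME];
    char *desc;         /* long description, or NULL if there is none */
};

/* Set up a new object.  It has no long description yet, so desc is null. */
void initobject(struct object *objp, const char *name)
{
    strncpy(objp->name, name, MAXNAME - 1);
    objp->name[MAXNAME - 1] = '\0';
    objp->desc = NULL;
}

/* Give the object a long description, stored as a private copy of text.
   Returns 1 on success, or 0 if memory ran out, in which case any
   earlier description is left in place. */
int setdesc(struct object *objp, const char *text)
{
    char *copy = malloc(strlen(text) + 1);  /* +1 for the terminating '\0' */
    if(copy == NULL)
        return 0;
    strcpy(copy, text);
    free(objp->desc);   /* discard any earlier description; free(NULL) is fine */
    objp->desc = copy;
    return 1;
}
```

Because <TT>initobject</TT> always stores a null pointer in <TT>desc</TT>,
the printing code's test <TT>objp-&gt;desc != NULL</TT>
can be trusted for every object set up this way.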
</p><p>Particular care is needed when pointers point to
dynamically-allocated memory,
managed with the standard library functions <TT>malloc</TT>,
<TT>free</TT>,
and <TT>realloc</TT>.
Somehow,
it's easier to make mistakes here,
and their consequences tend to be more damaging and harder to
track down.
</p><p>First of all,
of course,
you must always ensure that the allocation functions
<TT>malloc</TT> and <TT>realloc</TT> succeed.
These functions return null pointers
when they are unable to allocate the requested memory,
so you must <em>always</em> check the return value
to see that it is not a null pointer,
before using it.
(If the return value is a null pointer,
you will generally print some kind of error message
and abort
at least
the particular function that needed the allocated memory,
or perhaps abort the entire program.)
</p><p>Don't get in the habit of assuming that
a single, simple call to <TT>malloc</TT> will ``always'' succeed.
Don't make excuses like
``this program doesn't use much memory to begin with,
and I'm only allocating 10 bytes here,
so how can it possibly fail?''
For one thing, there are more reasons for <TT>malloc</TT>
to fail (and return a null pointer)
than simply running out of memory.
Some implementations of <TT>malloc</TT> will also return a null pointer
if they are able to detect that you have misused
some of the memory that you have previously allocated,
perhaps by writing to more of it than you asked for.
In this case, <TT>malloc</TT> is trying to tell you something,
something you need to know,
and although its voice is small
(and although tracking down the problem that it's complaining about
may be difficult),
you will only have more problems,
and ones more difficult to track down,
if <TT>malloc</TT> returns a null pointer
but you then use that pointer as if it were valid.
(As an example of how it can be alarmingly easy
to misuse the memory that <TT>malloc</TT> gives you,
consider this hypothetical scrap of code
for making a dynamically-allocated copy of a string:
<pre>
	char *copystring = malloc(strlen(originalstring));	/* Beware... */
	if(copystring != NULL)
		strcpy(copystring, originalstring);
</pre>
Hint:
what about the <TT>\0</TT> that terminates the string?)
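</p><p>For the record, one corrected version is sketched below,
wrapped in a function whose name, <TT>copystr</TT>, is invented for this example.
<TT>strlen</TT> does not count the terminating <TT>\0</TT>,
but <TT>strcpy</TT> copies it,
so the allocation has to be one byte larger than the length.

```c
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of originalstring, or a null pointer if
   memory could not be allocated.  The caller frees the copy. */
char *copystr(const char *originalstring)
{
    /* strlen() excludes the terminating '\0', so ask for one more byte */
    char *copystring = malloc(strlen(originalstring) + 1);
    if(copystring != NULL)
        strcpy(copystring, originalstring);
    return copystring;
}
```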
</p><p>In a program that allocates a lot of different pieces of memory
for a lot of different things,
it can be a real nuisance to have to check
each pointer returned
from each call to <TT>malloc</TT>
to make sure it's not null.
One popular shortcut is to define
a ``wrapper'' function around <TT>malloc</TT>,
which calls <TT>malloc</TT>
and checks the return value
in one central place.
For example,
the adventure game uses the function
<pre>
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include "chkmalloc.h"

void *
chkmalloc(size_t sz)
{
void *ret = malloc(sz);
if(ret == NULL)
	{
	fprintf(stderr, "Out of memory\n");
	exit(EXIT_FAILURE);
	}
return ret;
}
</pre>
One way to think about <TT>chkmalloc</TT>
is that it centralizes the test on
<TT>malloc</TT>'s return value.
Another way of thinking about it
is that it is a special, alternate version of <TT>malloc</TT>
that never returns <TT>NULL</TT>.
(The fact that it never returns <TT>NULL</TT>
does not mean that it never fails,
but just that if/when it does fail,
it signifies this
by calling <TT>exit</TT>
instead of
returning <TT>NULL</TT>.)
Aborting the entire program
when a call to <TT>malloc</TT> fails
may seem draconian,
and there are programs
(e.g. text editors)
for which it would be a completely unacceptable strategy,
but it's fine for our purposes,
especially if it doesn't happen very often.
(In any case, aborting the program cleanly
with a message like ``Out of memory''
is still vastly preferable to crashing horribly and mysteriously,
which is what programs that don't check
<TT>malloc</TT>'s return value
eventually
do.)
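</p><p>The same idea carries over to <TT>realloc</TT>.
The function below is a hypothetical companion,
named <TT>chkrealloc</TT> here by analogy;
it is not claimed to be part of the adventure game.
It is only a sketch,
and it assumes that callers always pass a nonzero size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical companion to chkmalloc: like realloc, but it never
   returns a null pointer.  Assumes sz is nonzero. */
void *chkrealloc(void *p, size_t sz)
{
    void *ret = realloc(p, sz);
    if(ret == NULL) {
        fprintf(stderr, "Out of memory\n");
        exit(EXIT_FAILURE);
    }
    return ret;
}
```

With a wrapper like this,
writing <TT>p = chkrealloc(p, newsize)</TT> is reasonable,
because the assignment is only reached when the reallocation succeeded.
The same line written with plain <TT>realloc</TT>
would overwrite <TT>p</TT> with a null pointer on failure
and lose track of the old block.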
</p><p>Another
area of concern
is that
when you're calling <TT>free</TT> and <TT>realloc</TT>,
there are more ways for pointers to become invalid.
For example,
consider the code
<pre>
	/* p is known to have come from malloc() */
	free(p);
</pre>
After calling <TT>free</TT>, is <TT>p</TT> valid or invalid?
C uses pass-by-value, so <TT>p</TT>'s value hasn't changed.
(The <TT>free</TT> function couldn't change it if it tried.)
But <TT>p</TT> is most definitely now <em>invalid</em>;
it no longer points to memory which the program can use.
However, it does still point just where it used to,
so if the program accidentally uses it,
there will still seem to be data there,
except that the data will be sitting in memory which may now
have been allocated to ``someone else''!
Therefore, if the variable <TT>p</TT> persists
(that is, if it's something other than a local variable that's
about to disappear when its function returns,
or a pointer field within a structure which is all about to
disappear),
it would probably be a good idea to set <TT>p</TT> to <TT>NULL</TT>:
<pre>
	free(p);
	p = NULL;
</pre>
(Of course, setting <TT>p</TT> to <TT>NULL</TT> only accomplishes
something if later uses
of <TT>p</TT> check it before using it.)
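</p><p>Since the two statements so often travel together,
some programmers bundle them into a macro.
The sketch below shows one way to do that;
the macro name <TT>FREE_AND_NULL</TT>,
the <TT>struct player</TT> type,
and the function <TT>clearnickname</TT>
are all invented for the example.
It has to be a macro (or take the address of the pointer),
because a function that received <TT>p</TT> by value
could no more change the caller's copy than <TT>free</TT> can.

```c
#include <stdlib.h>

/* Hypothetical convenience macro: free the block and null the pointer
   in one step.  The argument must be a modifiable pointer variable,
   and it is evaluated twice, so it should have no side effects. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while(0)

/* Hypothetical structure with an optional, dynamically-allocated field. */
struct player {
    char *nickname;     /* NULL when the player has no nickname */
};

/* Discard the nickname, if any, leaving the field in a testable state. */
void clearnickname(struct player *pl)
{
    FREE_AND_NULL(pl->nickname);
}
```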
</p><p>Finally, let's think about <TT>realloc</TT>.
<TT>realloc</TT>, remember,
attempts to enlarge a chunk of memory which we originally
obtained from <TT>malloc</TT>.
(It lets us change our mind about how much memory we had asked for.)
But <TT>realloc</TT> is not always able to enlarge a chunk of memory in-place;
sometimes it must go elsewhere in memory to find a contiguous
piece of memory big enough to satisfy the enlargement request.
So what
about this code?
<pre>
	newp = realloc(oldp, newsize);
</pre>
Is <TT>oldp</TT> valid or invalid after this call?
It depends on whether <TT>realloc</TT> returned the old pointer
value or not
(that is, on whether it was able to enlarge the memory block
in-place or had to go elsewhere).
Most of the time,
you will use <TT>realloc</TT> something like this:
<pre>
	newp = realloc(p, newsize);
	if(newp != NULL)
		{
		/* success; got newsize */
		p = newp;
		}
	else	{
		/* failure; p still points to block of old size */
		}
</pre>
With a setup like this,
<TT>p</TT> remains valid,
and <TT>newp</TT> is a temporary variable which we don't use further
after testing it and perhaps assigning it to <TT>p</TT>.
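</p><p>To show that pattern at work in a complete function,
here is a sketch of a growable byte buffer.
The <TT>struct buf</TT> type, its field names,
the function <TT>bufappend</TT>,
and the doubling growth policy
are all assumptions made for the example, not something from the adventure game.
The point to notice is that the buffer's own pointer is only overwritten
after <TT>realloc</TT> is known to have succeeded.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. */
struct buf {
    char *data;     /* NULL until the first successful allocation */
    size_t len;     /* number of bytes in use */
    size_t cap;     /* number of bytes allocated */
};

/* Append n bytes from src.  Returns 1 on success, or 0 if memory ran
   out, in which case the buffer is unchanged and still valid. */
int bufappend(struct buf *b, const char *src, size_t n)
{
    if(n == 0)
        return 1;
    if(b->len + n > b->cap) {
        size_t newcap = (b->cap == 0) ? 16 : b->cap;
        char *newp;
        while(newcap < b->len + n)
            newcap *= 2;
        newp = realloc(b->data, newcap);    /* realloc(NULL, n) acts like malloc(n) */
        if(newp == NULL)
            return 0;       /* failure; b->data still points to the old block */
        b->data = newp;     /* success; only now replace the old pointer */
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}
```

A buffer is started off with <TT>data</TT> set to <TT>NULL</TT>
and <TT>len</TT> and <TT>cap</TT> set to 0.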
</p><p>A final issue concerns pointer <dfn>aliases</dfn>.
If several pointers point into the same block of memory,
and if that block of memory moves or disappears,
<em>all</em> the old pointers become invalid.
If you have a sequence of code which amounts to
<pre>
	p2 = p;
	free(p);
	p = NULL;
</pre>
then setting <TT>p</TT> to <TT>NULL</TT> may not have been sufficient,
because <TT>p2</TT> just became invalid, too, and may also need
setting to <TT>NULL</TT>.
The situation is particularly tricky with <TT>realloc</TT>:
suppose that you have a pointer to a chunk of memory:
<pre>
	char *p = malloc(10);
</pre>
and another pointer which points within that chunk:
<pre>
	char *p2 = p + 5;
</pre>
Now, if you reallocate <TT>p</TT>,
and if <TT>realloc</TT>
has to go elsewhere
and so
returns a different pointer value
which you assign to <TT>p</TT>,
you've also got to fix up <TT>p2</TT>,
because it just had the rug yanked out from under it,
and is now invalid.
To keep <TT>p2</TT> up-to-date,
you might use code like this:
<pre>
	int p2offset = p2 - p;
	newp = realloc(p, newsize);
	if(newp != NULL)
		{
		/* success; got newsize */
		p = newp;
		p2 = p + p2offset;
		}
	else	{
		/* failure; p and p2 still point to block of old size */
		}
</pre>
Before calling <TT>realloc</TT>,
we record (in the <TT>int</TT> variable <TT>p2offset</TT>)
how far beyond <TT>p</TT>
the secondary pointer
<TT>p2</TT> used to point,
so that we can generate a corresponding new value of <TT>p2</TT>
if <TT>p</TT> moves.
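</p><p>The same fix-up can be packaged as a function,
as in the sketch below.
The function name <TT>resize_with_alias</TT> and its interface are invented for this example.
It uses <TT>ptrdiff_t</TT>, the type that pointer subtraction actually yields,
where the fragment above uses <TT>int</TT>,
and it assumes that the secondary pointer points somewhere within the block
and that its offset is still within the new size.
The offset is computed before <TT>realloc</TT> is called,
since afterwards the old pointer values can no longer safely be used.

```c
#include <stddef.h>
#include <stdlib.h>

/* Resize the block at *pp to newsize bytes, keeping *p2p at the same
   offset within the block.  Returns 1 on success, or 0 on failure, in
   which case *pp and *p2p are unchanged and still valid.
   Assumes *p2p points within the block at *pp, at an offset that is
   less than newsize. */
int resize_with_alias(char **pp, char **p2p, size_t newsize)
{
    ptrdiff_t p2offset = *p2p - *pp;    /* record this before realloc */
    char *newp = realloc(*pp, newsize);
    if(newp == NULL)
        return 0;
    *pp = newp;
    *p2p = newp + p2offset;     /* regenerate the alias from the new base */
    return 1;
}
```

Another way to avoid the problem altogether
is to keep the offset itself
in place of the secondary pointer,
and compute <TT>p + offset</TT> each time it's needed.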
</p><hr>
<p>
Read sequentially:
<a href="sx6b.html" rev=precedes>prev</a>
<a href="sx8.html" rel=precedes>next</a>
<a href="top.html" rev=subdocument>up</a>
<a href="top.html">top</a>
</p>
<p>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1996-1999
// <a href="mailto:scs@eskimo.com">mail feedback</a>
</p>
</body>
</html>