<!DOCTYPE HTML PUBLIC "-//W3O//DTD W3 HTML 2.0//EN">
<!-- This collection of hypertext pages is Copyright 1995, 1996 by Steve Summit. -->
<!-- This material may be freely redistributed and used -->
<!-- but may not be republished or sold without permission. -->
<link rev="owner" href="mailto:scs@eskimo.com">
<link rev="made" href="mailto:scs@eskimo.com">
<title>section 5.3: Pointers and Arrays</title>
<link href="sx8b.html" rev=precedes>
<link href="sx8d.html" rel=precedes>
<link href="sx8.html" rev=subdocument>
<H2>section 5.3: Pointers and Arrays</H2>
<p>For many readers,
section 5.3 is evidently the hardest section in this book,
or, even for those who haven't read this book,
the most confusing aspect of the language.
C introduces a novel and elegant integration of pointers and arrays,
but there are a distressing number of ways of misunderstanding arrays,
pointers, or both.
Take this section very slowly,
learn the things it does say,
and <em>don't</em> learn anything it doesn't say
(i.e. don't make any false assumptions).
</p><p>It's not necessarily true that
``the pointer version will in general be faster'';
in any case, speed ought to be
a secondary concern when considering the use of pointers.
</p><p>On the top half of this page,
we aren't seeing anything we haven't seen before.
We already knew
(or should have known)
that a declaration like <TT>int a[10];</TT>
declares an array of ten contiguous <TT>int</TT>'s.
We saw on page 94 and again on page 96
that <TT>&</TT> can be used to take the address of one cell of an array.
</p><p>What's new on this page are first the nice pictures
(and they <em>are</em> nice pictures;
I think they're the right way of thinking about arrays and pointers in C)
and the definition of pointer arithmetic.
Don't worry about the phrase
``then by definition <TT>pa+1</TT> points to the next element''
if you hadn't known that <TT>pa+1</TT> points to the next element;
nothing so far has said so,
and you aren't expected even to have suspected it:
the reason that <TT>pa+1</TT> points to the next element
is simply that it's defined that way.
Subtraction works in an exactly analogous way:
if we were to say
<pre>	pa = &a[5];
</pre>then <TT>*(pa-1)</TT> would refer to the contents of <TT>a[4]</TT>,
and <TT>*(pa-i)</TT> would refer to the contents of the location
<TT>i</TT> elements before cell 5
(as long as <TT>i</TT> &lt;= 5).
</p><p>Note furthermore that we do <em>not</em> have to worry
about the size of the objects pointed to.
Adding 1 to a pointer
always means to move over one object of the type pointed to,
to get to the next element.
(If you're too worried about machine addresses,
or the actual address values stored in pointers,
or the actual sizes of things,
it's easy to mistakenly assume that adding or subtracting 1
adds or subtracts 1 from the machine address,
but it doesn't, and
you don't have to think at this low level.
We'll see in section 5.4 how pointer arithmetic is actually
scaled
by the size of the object pointed to,
but we don't have to worry about it if we don't want to.)
<blockquote>The meaning of ``adding 1 to a pointer,''
and by extension,
all pointer arithmetic,
is that <TT>pa+1</TT> points to the next object,
and <TT>pa+i</TT> points to the <TT>i</TT>-th object beyond <TT>pa</TT>.
</blockquote></p><p>This aspect of pointers--that arithmetic works on them,
and in this way--is one of
several vital facts about pointers in C.
On the next page, we'll see the others.
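</p><p>As a minimal sketch of these rules
(the helper names <TT>step_in_bytes</TT> and <TT>neighbor_of_cell5</TT>
are made up for illustration and appear nowhere in the book),
we can confirm that <TT>pa+1</TT> moves over one whole <TT>int</TT>,
and that offsets from <TT>&a[5]</TT> work in both directions:

```c
#include <stddef.h>

/* Hypothetical helper: the distance, in bytes, from pa to pa+1. */
size_t step_in_bytes(void)
{
    int a[10];
    int *pa = &a[0];

    /* pa+1 points to the next int, so the byte distance is sizeof(int);
       casting to char * lets us measure in bytes. */
    return (size_t)((char *)(pa + 1) - (char *)pa);
}

/* Hypothetical helper: the contents of the cell i elements away from a[5].
   Assumes a has at least 10 cells and -5 <= i <= 4. */
int neighbor_of_cell5(const int *a, int i)
{
    const int *pa = &a[5];

    return *(pa + i);   /* a negative i moves toward the start of the array */
}
```

Called on an array whose cells hold 0, 10, 20, and so on up to 90,
<TT>neighbor_of_cell5(a, -1)</TT> would yield 40
and <TT>neighbor_of_cell5(a, 1)</TT> would yield 60.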
</p><p>Deep sentences:
<blockquote>The correspondence between indexing and pointer arithmetic is very close.
By definition,
the value of a variable or expression of type array
is the address of element zero of the array.
</blockquote>This is a fundamental definition,
which we'll now spend several pages discussing.
</p><p>Don't worry too much yet about the assertion that
``<TT>pa</TT> and <TT>a</TT> have identical values.''
We're not surprised about the value of <TT>pa</TT> after the assignment
<TT>pa = &a[0]</TT>;
we've been taking the address of array elements for several pages now.
What we don't yet know,
and so are
not yet in a position to be surprised about or
not,
is
what the ``value'' of the array <TT>a</TT> is.
What <em>is</em> the value of the array <TT>a</TT>?
</p><p>In some languages,
the value of an array is the entire array.
If an array appears on the right-hand side of an assignment,
the entire array is assigned,
and the left-hand side had better be an array, too.
C does not work this way;
C never lets you manipulate entire arrays.
In C, by definition,
the value of an array,
when it appears in an expression,
is a pointer to its first element.
In other words,
the value of the array <TT>a</TT> simply <em>is</em> <TT>&a[0]</TT>.
If this statement makes any kind of intuitive sense to you at
this point, great;
if it doesn't,
please just take it on faith for a while.
This statement is a fundamental
(in fact <em>the</em> fundamental)
definition about arrays and pointers in C,
and if you don't remember it,
or don't believe it,
pointers and arrays will never make proper sense.
(You will also need to know another bit of jargon:
we often say that when an array appears in an expression,
it <dfn>decays</dfn> into a pointer to its first element.)
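</p><p>Here is a small check of this definition, offered as a sketch
(the function name <TT>array_decays_to_first_element</TT> is hypothetical).
It tests that the array name, sitting in an expression,
has the same value as <TT>&a[0]</TT>,
and that a pointer initialized from it reaches <TT>a[0]</TT>:

```c
/* Hypothetical check: returns 1 if the array a, used in an expression,
   has the value &a[0]. */
int array_decays_to_first_element(void)
{
    int a[10] = {42};     /* a[0] is 42, the other cells are 0 */
    int *pa = a;          /* a decays here; same as pa = &a[0] */

    return pa == &a[0] && *pa == 42 && *a == 42;
}
```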
</p><p>Given the above definition,
let's explore some of the consequences.
First of all,
though we've been saying
<pre>	pa = &a[0];
</pre>we could also say
<pre>	pa = a;
</pre>because by definition the value of <TT>a</TT> in an expression
(i.e. as it sits there all alone on the right-hand side)
is <TT>&a[0]</TT>.
Secondly,
anywhere we've been using square brackets <TT>[]</TT> to subscript an array,
we could also have used the pointer dereferencing operator <TT>*</TT>.
That is, instead of writing
<pre>	i = a[5];
</pre>we could, if we wanted to, write
<pre>	i = *(a+5);
</pre>Why would this possibly work?
How could this possibly work?
Let's look at the expression <TT>*(a+5)</TT> step by step.
It contains a reference to the array <TT>a</TT>,
which is by definition a pointer to its first element.
So <TT>*(a+5)</TT> is equivalent to <TT>*(&a[0]+5)</TT>.
To make things clear,
let's pretend that we'd assigned the pointer to the first element
to an actual pointer variable:
<pre>	int *pa = &a[0];
</pre>Now we have
<TT>*(a+5)</TT> is equivalent to
<TT>*(&a[0]+5)</TT> is equivalent to <TT>*(pa+5)</TT>.
But we learned on page 98 that <TT>*(pa+5)</TT> is simply the
contents of the location 5 cells past where <TT>pa</TT> points to.
Since <TT>pa</TT> points to <TT>a[0]</TT>,
<TT>*(pa+5)</TT> is <TT>a[5]</TT>.
Thus,
for whatever it's worth,
any time you have an array subscript <TT>a[i]</TT>,
you could write it as <TT>*(a+i)</TT>.
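</p><p>The equivalence can be checked mechanically.
The following sketch
(the function name <TT>subscript_matches_pointer_form</TT> is hypothetical)
compares the two notations for every valid index of a ten-element array:

```c
/* Hypothetical check: fills a with squares, then compares the two notations
   for every valid index. Returns 1 if they always agree. */
int subscript_matches_pointer_form(void)
{
    int a[10];
    int i;

    for (i = 0; i < 10; i++)
        a[i] = i * i;

    for (i = 0; i < 10; i++) {
        if (a[i] != *(a + i))      /* same contents ... */
            return 0;
        if (&a[i] != a + i)        /* ... because it is the same cell */
            return 0;
    }
    return 1;
}
```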
</p><p>The idea of the previous paragraph isn't worth much,
because if you've got an array <TT>a</TT>,
indexing it using the notation <TT>a[i]</TT>
is considerably more natural and convenient than the alternative <TT>*(a+i)</TT>.
The significant fact is that this little correspondence between
the expressions <TT>a[i]</TT> and <TT>*(a+i)</TT>
holds for more than just arrays.
If <TT>pa</TT> is a pointer,
we can get at locations near it by using <TT>*(pa+i)</TT>,
as we learned on page 98,
but we can <em>also</em> use <TT>pa[i]</TT>.
This time, using the ``other'' notation
(array instead of pointer,
when we thought we had a pointer)
can be more convenient.
</p><p>At this point,
you may be asking <em>why</em> you can write <TT>pa[i]</TT>
instead of <TT>*(pa+i)</TT>.
You may be wondering how you're going to remember that you can do this,
or remember what it means if you see it in someone else's code,
when it's such a surprising fact in the first place.
There are several ways to remember it;
pick whichever one suits you:
<OL><li>It's an arbitrary fact,
true by definition;
just memorize it.
<li>If,
for an array <TT>a</TT>,
instead of writing <TT>a[i]</TT>,
you can also write <TT>*(a+i)</TT>
(as we proved a few paragraphs back),
then it's only fair that for a pointer <TT>pa</TT>,
instead of writing <TT>*(pa+i)</TT>,
you can also write <TT>pa[i]</TT>.
<li>The authors say,
``In evaluating <TT>a[i]</TT>,
C converts it to <TT>*(a+i)</TT> immediately;
the two forms are equivalent.''
<li>An array is
a contiguous block of elements of a particular type.
A pointer often points to
a contiguous block of elements of a particular type.
Therefore, it's very handy to treat
a pointer to a contiguous block of elements
as if it were an array,
by saying things like <TT>pa[i]</TT>.
<li>[This is the most radical explanation,
though it's also the most true;
but if it offends your sensibilities
or only seems to make things more confusing,
please ignore it.]
When you said <TT>a[i]</TT>,
you weren't really subscripting an array at all,
because an array like <TT>a</TT> in an expression always turns
into a pointer to its first element.
So the array subscripting operator <TT>[]</TT> <em>always</em>
finds itself working on pointers,
and it's a simple identity
(by definition)
that <TT>pa[i]</TT> is <TT>*(pa+i)</TT>.
</OL>(But do pick at least one reason to remember this fact,
as it's a fact you'll need to remember;
expressions like <TT>pa[i]</TT> are quite common.)
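</p><p>As a sketch of the subscript notation applied to a pointer
(the helper name <TT>third_after</TT> is hypothetical),
note that the function below never sees an array at all,
only a pointer, and yet <TT>pa[3]</TT> works:

```c
/* Hypothetical helper: the cell three elements past wherever pa points.
   Assumes at least three more cells follow *pa. */
int third_after(const int *pa)
{
    return pa[3];    /* by definition the same as *(pa + 3) */
}
```

If <TT>a</TT> is an array whose cells hold their own indices,
<TT>third_after(a)</TT> yields 3
and <TT>third_after(&a[4])</TT> yields 7.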
</p><p>The authors point out that
``There is one difference between an array name and a pointer
that must be kept in mind,''
and this is quite true,
but note very carefully that there is
in fact
<em>every</em> difference
between an array and a pointer.
When an array name appears in most expressions,
it turns into a pointer
(to the array's first element),
but that does <em>not</em> mean that the array <em>is</em> a pointer.
You may hear it stated that ``an array is just a constant pointer,''
and this is a convenient explanation,
but it is a simplified and potentially misleading explanation.
</p><p>With that said,
do make sure you understand why <TT>a=pa</TT> and <TT>a++</TT>
(where <TT>a</TT> is an array)
cannot mean anything.
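</p><p>One way to see that an array is not a pointer is to ask for their sizes.
The <TT>sizeof</TT> operator isn't discussed in this section;
it is brought in here only as an illustration,
and the two function names are hypothetical.
The assignments that cannot mean anything are left as comments,
since a compiler would reject them:

```c
#include <stddef.h>

/* Hypothetical helper: the size of an array of ten ints. */
size_t size_of_the_array(void)
{
    int a[10];

    /* a = pa;   would not compile: an array cannot be assigned to */
    /* a++;      would not compile: an array is not a modifiable lvalue */
    return sizeof a;       /* the whole array: 10 * sizeof(int) */
}

/* Hypothetical helper: the size of a pointer that points into such an array. */
size_t size_of_the_pointer(void)
{
    int a[10];
    int *pa = a;

    pa++;                  /* fine: pa is a variable holding an address */
    return sizeof pa;      /* just the pointer: sizeof(int *) */
}
```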
</p><p>Deep sentence:
<blockquote>When an array name is passed to a function,
what is passed is the location of the initial element.
</blockquote>Though perhaps surprising,
this sentence doesn't say anything new.
A function call,
and more importantly,
each of its arguments,
is an expression,
and in an expression,
a reference to an array is always replaced by a pointer to its first element.
So given
<pre>	int a[10];
	f(a);
</pre>it is not the entire array <TT>a</TT> that is passed to <TT>f</TT>
but rather just a pointer to its first element.
For an example closer to the text on page 99,
given
<pre>	char string[] = "Hello, world!";
	int len = strlen(string);
</pre>it is not the entire array <TT>string</TT> that is passed to <TT>strlen</TT>
(recall that C never lets you do anything with
an array all at once),
but rather just a pointer to its first element.
</p><p>We now realize that we've been operating under a gentle fiction
during the first four chapters of the book.
Whenever we wrote a function like <TT>getline</TT> or <TT>getop</TT>
which seemed to accept an array of characters,
and whenever we thought we were passing arrays of characters to these routines,
we were actually passing pointers.
This explains, among other things,
how <TT>getline</TT> and <TT>getop</TT> were able to modify
the arrays in the caller,
even though we said that call-by-value meant that functions
can't modify variables in their callers since they receive
copies of the parameters.
When a function receives a pointer,
it cannot modify the original pointer in the caller,
since it receives only a copy of it,
but it can definitely modify what the pointer points <em>to</em>.
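</p><p>A minimal sketch of this distinction follows.
The function <TT>capitalize_first</TT> is hypothetical,
and it uses the standard library's <TT>toupper</TT>.
It changes the caller's array through the pointer,
and then changes its own copy of the pointer,
which the caller never notices:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical function: receives a copy of a pointer to the caller's
   first character. */
void capitalize_first(char *s)
{
    /* Modifies what the pointer points to: the caller's array changes. */
    s[0] = (char)toupper((unsigned char)s[0]);

    /* Modifies only this function's copy of the pointer:
       the caller's array and the caller's pointer are unaffected. */
    s = NULL;
    (void)s;    /* silences "set but not used" warnings */
}
```

After <TT>char word[] = "hello"; capitalize_first(word);</TT>
the array <TT>word</TT> begins with <TT>'H'</TT>,
and <TT>word</TT> itself is still the same array in the same place.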
</p><p>If that doesn't make sense,
make sure you appreciate the full difference
between a pointer and what it points to!
It is entirely possible to modify one without modifying the other.
Let's illustrate this with an example.
If we say
<pre>	char a[] = "hello";
	char b[] = "world";
</pre>we've declared two character arrays,
<TT>a</TT> and <TT>b</TT>,
each containing a string.
If we say
<pre>	char *p = a;
</pre>we've declared <TT>p</TT> as a pointer-to-<TT>char</TT>,
and initialized it to point to the first character of the array <TT>a</TT>.
If we then say
<pre>	*p = 'H';
</pre>we've modified what <TT>p</TT> points to.
We have not modified <TT>p</TT> itself.
After saying <TT>*p = 'H';</TT>
the string in the array <TT>a</TT>
has been modified to contain <TT>"Hello"</TT>.
If we say
<pre>	p = b;
</pre>on the other hand,
we have modified the pointer <TT>p</TT> itself.
We have not really modified what <TT>p</TT> points to.
In a sense,
``what <TT>p</TT> points to''
has changed:
it
used to be the string in the array <TT>a</TT>,
and now it's the string in the array <TT>b</TT>.
But saying <TT>p = b</TT> didn't modify either of the strings.
</p><p>If
functions never receive arrays as parameters,
but instead always receive pointers,
how have we been able to get away with defining functions
(like <TT>getline</TT> and <TT>getop</TT>)
which seemed to accept arrays?
The answer is that whenever you declare an array parameter to a function,
the compiler pretends that you actually declared a pointer.
(It does this mostly so that
we can get away with the ``gentle fiction''
of pretending that we can pass arrays to functions.)
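</p><p>Here is a sketch in the spirit of the <TT>strlen</TT> example mentioned above
(the name <TT>my_strlen</TT> is hypothetical, chosen to avoid clashing with the library function).
The parameter is declared with array notation,
yet the function can increment it,
which would be impossible if <TT>s</TT> were really an array:

```c
#include <stddef.h>

/* Hypothetical function: s is declared as an array parameter,
   but the compiler treats the declaration as char *s. */
size_t my_strlen(char s[])
{
    size_t n = 0;

    while (*s != '\0') {
        s++;        /* legal only because s is really a pointer */
        n++;
    }
    return n;
}
```

Given <TT>char string[] = "Hello, world!";</TT>,
the call <TT>my_strlen(string)</TT> returns 13,
and the caller's array is left untouched
because only the function's copy of the pointer was incremented.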
</p><p>When you see a statement like
``<TT>char s[];</TT> and <TT>char *s;</TT> are equivalent''
(as in fact you see at the top of page 100),
you can be sure
(and you must remember that)
it is <em>only</em> function formal parameters that are being
talked about.
Anywhere else,
arrays and pointers are quite different,
as we've discussed.
</p><p>Expressions like <TT>p[-1]</TT>
(at the end of section 5.3)
may be easier to understand
if we convert them back to the pointer form <TT>*(p + -1)</TT>
and thence to <TT>*(p-1)</TT>,
which, as we've seen,
is the object one before what <TT>p</TT> points to.
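</p><p>A one-line sketch of a negative subscript
(the helper name <TT>previous</TT> is hypothetical):

```c
/* Hypothetical helper: the object one before what p points to.
   Assumes p does not point at the first cell of its array. */
int previous(const int *p)
{
    return p[-1];    /* the same as *(p + -1), i.e. *(p - 1) */
}
```

With <TT>int a[3] = {10, 20, 30};</TT>,
<TT>previous(&a[1])</TT> yields 10
and <TT>previous(&a[2])</TT> yields 20.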
</p><p>With the examples in this section,
we begin to see how pointer manipulations can go awry.
In sections 5.1 and 5.2,
most of our pointers were to simple variables.
When we use pointers into arrays,
and when we begin using pointer arithmetic
to access nearby cells of the array,
we must be careful never to go off the end of the array.
A pointer is only valid if it points to one of the allocated
cells of the array.
(There is also an exception for a pointer just past the end of an array,
which we'll talk about later.)
Given declarations of a ten-element array <TT>a</TT>
and of the pointers <TT>pa</TT> and <TT>pa2</TT>,
valid statements set the pointer <TT>pa</TT>
pointing to various cells of the array <TT>a</TT>,
and modify some of those cells
by <dfn>indirecting on</dfn> the pointer <TT>pa</TT>.
(When reading such statements,
verify that each cell of <TT>a</TT> that receives a value
receives the value of its own index.
For example, <TT>a[6]</TT> is set to 6.)
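</p><p>A sketch of valid statements of this kind follows.
It is an illustration built from the description above,
wrapped in a hypothetical function <TT>fill_some_cells</TT> so that it can be compiled,
and it assumes the caller passes the first cell of an array of ten <TT>int</TT>'s.
Every pointer it forms stays inside the array,
and every cell that receives a value receives its own index:

```c
/* Hypothetical function: a is assumed to point to the first of ten ints. */
void fill_some_cells(int *a)
{
    int *pa;

    pa = a;          /* pa points to a[0] */
    *pa = 0;
    *(pa + 1) = 1;
    pa[2] = 2;

    pa = &a[5];      /* pa points to a[5] */
    *pa = 5;
    *(pa - 1) = 4;
    pa[1] = 6;       /* one cell past a[5], so a[6] is set to 6 */

    pa = &a[9];      /* pa points to the last cell */
    *pa = 9;
    pa[-1] = 8;
}
```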
</p><p>On the other hand, statements like
<pre>	pa = a;
	*(pa+10) = 0;	/* WRONG */
	*(pa-1) = 0;	/* WRONG */
	pa = &a[5];
	*(pa+10) = 0;	/* WRONG */
</pre>and
<pre>	pa = &a[5];
	pa2 = pa + 10;	/* WRONG */
	pa2 = pa - 10;	/* WRONG */
</pre>are all invalid.
The first examples set <TT>pa</TT> to point into the array <TT>a</TT>
but then use overly-large offsets
which end up trying to store a value outside of the array <TT>a</TT>.
The statements in the last set of examples
set <TT>pa2</TT> to point outside of the array <TT>a</TT>.
Even though no attempt is made to access the nonexistent cells,
these statements are illegal, too.
Finally, the code
<pre>	pa = &a[5];
	pa2 = pa + 10;	/* WRONG */
	*pa2 = 0;	/* WRONG */
</pre>would be very wrong,
because it not only computes a pointer to the nonexistent
15th cell of a 10-element array,
but it also tries to store something there.
</p>
<a href="sx8b.html" rev=precedes>prev</a>
<a href="sx8d.html" rel=precedes>next</a>
<a href="sx8.html" rev=subdocument>up</a>
<a href="top.html">top</a>
This page by <a href="http://www.eskimo.com/~scs/">Steve Summit</a>
// <a href="copyright.html">Copyright</a> 1995, 1996
// <a href="mailto:scs@eskimo.com">mail feedback</a>