<!-- subject: 0 is ambiguous -->
<!-- date: 2010-10-24 12:28:06 -->
<!-- tags: c, c++, null, nullptr, null pointer -->
<!-- categories: Articles, Techblog -->
<p>It has been a long time since my last entry, so inspired by
<a href=https://blogs.fsfe.org/adridg/?p=1014>Adriaan de Groot’s entry</a>,
I decided to write something about <code>0</code>,
<code>NULL</code> and the upcoming <code>nullptr</code>.
<p>I will try to be informative and explain what the whole buzz is about and
then give my opinion about <code>nullptr</code>. Let us first inspect how a
<a href=https://en.wikipedia.org/wiki/Pointer_%28computer_programming%29#Null_pointer>null
pointer</a> can be denoted in C and C++.
<!-- FULL -->
<h2>The confusing <code>0</code></h2>
<p>In C and C++, the literal <code>0</code> has two meanings: it’s either an
octal (yes, octal, here’s a fun fact of the day for you ;) ) literal
representing the number zero or a null pointer. Which meaning is used depends
on context, which the compiler can usually figure out. For instance:
<pre>
<b>void</b> takes_number(<b>int</b>);
<b>void</b> takes_pointer(<b>long</b> *);

<b>int</b> main(<b>void</b>) {
	<b>char</b> ch = 0; /* number zero */
	<b>char</b> *ptr = 0; /* null pointer */
	takes_number(0); /* number zero (argument is an int) */
	takes_pointer(0); /* null pointer (argument is a pointer) */
	<b>return</b> 0; /* number zero (main returns int) */
}</pre>
<p>However, if a function lacks a prototype or takes a variable number of
arguments, the available information may be insufficient to figure out what
the programmer meant. A good example is the <code>printf</code> function from
the standard library:
<pre>
#include &lt;stdio.h&gt;

<b>int</b> main(<b>void</b>) {
	printf("%p\n", 0);
	<b>return</b> 0;
}</pre>
<p>In such situations, the first meaning prevails (i.e. a number), which in turn
makes the above undefined behaviour (an <code>int</code> is passed where
a pointer is expected).
<p>Based on that, two things to keep in mind are to <em>always</em> provide
function prototypes and to prefer an explicit <code>(void *)0</code> to mean
a null pointer when calling variadic functions.
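<p>A minimal sketch of the second tip, assuming a hypothetical
sentinel-terminated helper <code>count_strings</code> (it is not part of any
library): the argument list ends with a null pointer, so the call site spells
the terminator as an explicitly typed null pointer rather than a bare
<code>0</code>.

```cpp
#include <cstdarg>

// Hypothetical helper: counts the C strings passed to it.  The argument list
// must be terminated with a null pointer of type const char *.
int count_strings(const char *first, ...) {
	int count = 0;
	va_list ap;
	va_start(ap, first);
	for (const char *str = first; str != 0; str = va_arg(ap, const char *)) {
		++count;
	}
	va_end(ap);
	return count;
}

// Example call.  The cast makes sure a pointer is passed; a bare 0 would be
// passed as an int, which va_arg(ap, const char *) must not read.
int count_two_strings() {
	return count_strings("foo", "bar", (const char *)0);
}
```

<p>Since the helper reads <code>const char *</code> arguments, the terminator
is cast to that very type rather than to <code>void *</code>.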
<h2>The confusing <code>NULL</code></h2>
<p>To disambiguate the <em>intended</em> context, the <code>NULL</code> macro
can be used. The standard requires that it be defined such that it can be used
in pointer context to mean a null pointer. In C the macro is often defined as:
<pre>#define NULL ((void *)0)</pre>
<p>This fulfils the aforementioned requirement and in addition guarantees
a warning when it is used as a number, i.e.:
<pre>
$ <i>cat test.c</i>
#include &lt;stddef.h&gt;

<b>int</b> main(<b>void</b>) {
	<b>return</b> NULL;
}
$ <i>gcc -ansi -pedantic test.c</i>
test.c: In function ‘main’:
test.c:4: warning: return makes integer from pointer without a cast
$</pre>
<p>And all was good until C++ came along with its stricter typing rules. In
C++, an implicit conversion from <code>void*</code> to another pointer type is
illegal, which makes the aforementioned definition invalid:
<pre>
#define OLD_NULL ((<b>void</b> *)0)

<b>int</b> main(<b>void</b>) {
	<b>char</b> *ptr = OLD_NULL; /* compile time error */
	<b>return</b> 0;
}</pre>
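<p>For contrast, a small sketch of what C++ does accept (the helper names
<code>as_chars</code> and <code>round_trip</code> are made up for
illustration): converting <em>to</em> <code>void *</code> stays implicit, while
going back requires an explicit cast.

```cpp
// Hypothetical helper: treats an untyped buffer as an array of chars.
char *as_chars(void *raw) {
	// char *ptr = raw;  would compile as C but not as C++
	return static_cast<char *>(raw);
}

// Stores a character through the untyped pointer and reads it back.
char round_trip(char ch) {
	char buffer[1];
	void *raw = buffer; // char * to void *: implicit in both C and C++
	*as_chars(raw) = ch;
	return buffer[0];
}
```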
<p>The easiest solution is to define <code>NULL</code> as a plain <code>0</code>
(or <code>0L</code>). GCC is a bit smarter and uses the <code>__null</code>
extension, but confusingly even that is treated like <code>0</code> in some
contexts:
<pre>
$ <i>cat test.c</i>
#include &lt;stddef.h&gt;

<b>int</b> main(<b>void</b>) {
	<b>int</b> ret = NULL; /* no complaints */
	ret += <b>__null</b>;
	ret += NULL;
	<b>return</b> ret;
}
$ <i>g++ -ansi -pedantic test.c</i>
test.c: In function ‘int main()’:
test.c:5: warning: NULL used in arithmetic
test.c:6: warning: NULL used in arithmetic
$</pre>
<p>Whenever you use <code>NULL</code>, you have to keep in mind that you never
know what it really is. In particular, this means that the following code may
or may not be valid:
<pre>
#include &lt;stdio.h&gt;

<b>int</b> main(<b>void</b>) {
	printf("%p\n", NULL); /* ((void *)0)? 0? 0L? __null? */
	<b>return</b> 0;
}</pre>
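<p>One way to sidestep the question, sketched below with made-up helpers
<code>pointer_text</code> and <code>null_text</code>: cast the macro to
<code>void *</code> at the call site, which is valid whichever definition the
implementation picked. The text produced by <code>%p</code> is
implementation-defined, so the sketch does not assume any particular output.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a pointer the way printf("%p") would.
std::string pointer_text(void *ptr) {
	char buffer[64];
	// The parameter type guarantees that %p really receives a void *.
	std::snprintf(buffer, sizeof buffer, "%p", ptr);
	return buffer;
}

// With the explicit cast a null pointer is passed no matter whether NULL
// expands to 0, 0L or __null.
std::string null_text() {
	return pointer_text((void *)NULL);
}
```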
<h2>Function overloading</h2>
<p>Fortunately (at least in the context of null pointers), variadic functions
aren’t that common. Function overloading poses more of a problem, since even
with full prototypes it’s not always possible to determine argument types
from the function name and its arity alone. For example:
<pre>
<b>void</b> print(<b>int</b> num);
<b>void</b> print(<b>long</b> *ptr);

<b>int</b> main(<b>void</b>) {
	print(0); /* first function */
	print((long *)0); /* second function */
	print(NULL); /* ??? */
	<b>return</b> 0;
}</pre>
<p>The lesson here is that (especially in C++) the <code>NULL</code> macro is
ambiguous as well, and when dealing with overloaded functions an explicit cast
might be necessary.
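<p>A sketch of that lesson with hypothetical overloads named <code>which</code>,
which report the overload that got chosen instead of printing anything: a bare
<code>0</code> selects the <code>int</code> version and only an explicit cast
reaches the pointer version.

```cpp
#include <string>

// Hypothetical overloads mirroring print(int) and print(long *).
std::string which(int) { return "int"; }
std::string which(long *) { return "long *"; }

// A bare 0 is an exact match for the int overload.
std::string with_literal() { return which(0); }

// The cast leaves only the pointer overload as an exact match.
std::string with_cast() { return which((long *)0); }

// which(NULL) is left out on purpose: depending on how NULL is defined, it
// picks the int overload (possibly with a warning) or fails to compile as
// an ambiguous call.
```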
<h2>So what about <code>nullptr</code>?</h2>
<p>To help address those issues, C++11 introduced the <code>nullptr</code>
keyword. It evaluates to a <code>std::nullptr_t</code> object which can be
implicitly converted to a null pointer of any pointer type (but not to
a number).
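<p>Both halves of that statement can be checked at compile time. The sketch
below assumes a C++11 compiler and uses hypothetical overloads named
<code>pick</code>; with <code>nullptr</code> as the argument, the
<code>int</code> overload is not even considered.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// nullptr converts implicitly to any pointer type ...
static_assert(std::is_convertible<std::nullptr_t, char *>::value,
              "nullptr converts to char *");
static_assert(std::is_convertible<std::nullptr_t, long *>::value,
              "nullptr converts to long *");
// ... but not to an integer.
static_assert(!std::is_convertible<std::nullptr_t, int>::value,
              "nullptr does not convert to int");

// Hypothetical overloads: for nullptr only the pointer version is viable.
std::string pick(int) { return "int"; }
std::string pick(long *) { return "long *"; }

std::string with_nullptr() { return pick(nullptr); }
```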
<p>Unfortunately, one problem remains. If multiple pointer types are acceptable
in a given context, the compiler cannot determine which one to use. This is
easiest to see with an overloaded function taking different types of pointers,
as in the following example:
<pre>
<b>void</b> print(<b>char</b> *);
<b>void</b> print(wchar_t *);

<b>int</b> main(<b>void</b>) {
	print(<b>nullptr</b>);
}</pre>
<p>But at least any ambiguity results in a failure to build rather than the
compiler silently choosing one of the options (which may or may not be what we
want).
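<p>Two ways out of that ambiguity, sketched with hypothetical overloads named
<code>describe</code> (C++11 assumed): name the pointer type at the call site,
or add an overload taking <code>std::nullptr_t</code>, which is an exact match
for <code>nullptr</code> and therefore wins.

```cpp
#include <cstddef>
#include <string>

// Hypothetical overloads mirroring print(char *) and print(wchar_t *).
std::string describe(char *) { return "char *"; }
std::string describe(wchar_t *) { return "wchar_t *"; }
// Optional extra overload; without it describe(nullptr) does not compile.
std::string describe(std::nullptr_t) { return "nullptr"; }

// Naming the pointer type selects the matching overload.
std::string narrow() { return describe(static_cast<char *>(nullptr)); }
std::string wide() { return describe(static_cast<wchar_t *>(nullptr)); }

// The std::nullptr_t overload needs no conversion, so it is preferred.
std::string untyped() { return describe(nullptr); }
```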
<p>My other criticism is that <code>NULL</code> was (and still is) a perfectly
fine identifier which served us for years, yet the committee decided to throw it
away like yesterday’s jam and instead pollute the keyword namespace and people’s
minds with yet another name that means ‘a null pointer’. The standard could
have defined a <code>_Null</code> keyword and mandated that <code>NULL</code>
expand to <code>_Null</code>, which would have been much more straightforward.
<p>I used to be a wee bit critical of this new keyword. I’m still not in love
with it, but I also recognise I’m in the minority (perhaps even a minority of
one person). As such, <code>nullptr</code> is the best we’re gonna get going
forward, though spelling out the pointer type explicitly is also a perfectly
valid solution and don’t let others tell you otherwise. ;)
<h2>Null pointer representation</h2>
<p>The last thing I want to talk about is a null pointer’s representation.
There are two misconceptions that come from the fact that <code>0</code> is used
to mean a null pointer. The first one is that a null pointer is in fact
represented by a ‘zero value’ (i.e. all bits clear). The second one is that
when an implementation uses a different representation, assigning
<code>0</code> to a pointer does not yield a null pointer.
<p>Both of those are incorrect. In various implementations zero is a perfectly
cromulent address and <a href="http://c-faq.com/null/machexamp.html">other
representations for a null pointer</a> (e.g. all bits set) may make more sense.
Regardless, <code>0</code> always means a null pointer in pointer context and
it’s the compiler’s responsibility to translate it into the correct
representation.
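<p>In code this means a null pointer should be produced and tested through the
language (initialisation, assignment, comparison) rather than through its
bytes. A minimal sketch with a made-up <code>Node</code> type:

```cpp
// Hypothetical list node used only for this illustration.
struct Node {
	int value;
	Node *next;
};

// Portable: the compiler turns 0 into whatever bit pattern the platform
// uses for a null pointer.
Node make_leaf(int value) {
	Node node;
	node.value = value;
	node.next = 0;
	return node;
}

// Portable: comparing against 0 tests for a null pointer, not for
// all-bits-zero.
bool is_last(const Node &node) {
	return node.next == 0;
}

// Left out on purpose: memset(&node, 0, sizeof node) only clears the bytes,
// which gives a null pointer solely where null is represented by all bits zero.
```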
<h2>Summary</h2>
<p>To sum things up, here are all the tips that you should keep in
mind while programming in C or C++:
<ol>
  <li>Always provide function prototypes.
  <li>In C++ use <code>nullptr</code> or <code>(<var>T</var>*)0</code> to mean
    a null pointer.
  <li>In C, if you’re using <code>NULL</code>, beware of variadic functions and
    the false sense of security the macro might give you.
</ol>
<p>I used to recommend ignoring the <code>nullptr</code> keyword, but nowadays
people will look at you funny if you try using <code>NULL</code>.