<!-- subject: Will the real <code>ARG_MAX</code> please stand up? Part 1 -->
<!-- date: 2021-03-14 23:42:44 -->
<!-- tags: arg_max, unix, linux -->
<!-- categories: Articles, Techblog -->
<p>arg max is the set of points in a function’s domain at which the function
reaches its maximum. That’s certainly <em>an</em> arg max, but not the one
we’re after. No, this article is about the <code>ARG_MAX</code> parameter that
limits the length of arguments passed to an executable. Or, in other words, why
you sometimes get the following error:

<pre>bash: <var>command</var>: Argument list too long</pre>
<p>The <code>ARG_MAX</code> parameter is common among UNIX-like platforms, but
since most such systems have fallen into obscurity, are BSD or are rubbish,
herein I will focus on a GNU/Linux environment running on an x86_64 platform.
While this limits the applicability of the information, many concepts will
apply to other systems and architectures as well.
<h2>Experimentation</h2>
<div style="display:flex;flex-wrap:wrap;justify-content:space-between">
<div style="flex:.8 20rem">
<p style="margin-top:0">The value of <code>ARG_MAX</code> is no secret and
can be retrieved with the <code>getconf</code> utility. On the x86_64 platform
it’s 2097152 (or 2 MiB) by default. To test this limit we can use pretty
much any command; <code>echo</code> should do. A simple experiment (as
shown in the listing on the side) confirms that the tool can be called
with a one-million-character-long argument. On the other hand, further
investigation demonstrates that an argument three times as long works just
as well, which shouldn’t be the case.
<p style="margin-bottom:0">It turns out that even in minimal shells, such as
dash and posh, the <code>echo</code> command is a built-in. Its execution
is performed entirely within the shell and as such isn’t subject to
kernel-imposed limitations. For the <code>ARG_MAX</code> limit to take
effect, the <code>execve</code> system call has to be used. This can be
done by executing the <code>/bin/echo</code> binary instead.
</div>
<pre style="height:max-content;margin:0 0 0 auto">
head -c "${1?}" /dev/zero |tr "\0" A
$ s=$(./gen-str 1000000)
$ echo "$s$s$s" |wc -c
echo is a shell builtin
sh: /bin/echo: Argument list too long
</pre>
</div>
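<p>For readers who want to repeat the experiment, here is a minimal POSIX shell sketch of it, assuming Linux and an executable <code>/bin/echo</code>. The helper names (<code>gen_str</code>, <code>builtin_echo_bytes</code>, <code>external_echo_ok</code>) are hypothetical; <code>gen_str</code> merely mimics the <code>gen-str</code> script from the listing.

```shell
# Hypothetical helper mimicking ./gen-str: print N 'A' characters, no newline.
gen_str() { head -c "${1?}" /dev/zero | tr '\0' A; }

command -V echo        # e.g. "echo is a shell builtin"

# Bytes printed by the builtin echo for an N-character argument (plus newline).
# No execve happens here, so ARG_MAX does not apply.
builtin_echo_bytes() {
    s=$(gen_str "$1")
    echo "$s" | wc -c
}

# Succeeds only if execve accepts an N-character argument for /bin/echo.
external_echo_ok() {
    /bin/echo "$(gen_str "$1")" >/dev/null 2>&1
}

builtin_echo_bytes 3000000
external_echo_ok 1000000 || echo "/bin/echo: Argument list too long"
```

<p>Because the builtin never reaches the kernel, the first helper copes with three million characters, while the second one fails for one million.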
<p>This time, after redoing the experiment with that executable, the command
fails even for a mere one million characters, which should be within
the <code>ARG_MAX</code> limit. With some trial and error we can determine
that the longest argument the kernel will accept is 131072 bytes long
(including the terminating NUL byte). However, that’s clearly not the end of
the story. It’s easy to see that while <em>a single argument</em> is limited
to 128 KiB, <em>the whole command line</em> is not.
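<p>The trial and error can be automated with a bisection. Below is a sketch in POSIX shell, assuming Linux, an executable <code>/bin/true</code>, and that a one-character argument is accepted while a one-million-character one is rejected; <code>max_single_arg</code> and <code>gen_str</code> are hypothetical helper names.

```shell
# Hypothetical helper mimicking ./gen-str: print N 'A' characters, no newline.
gen_str() { head -c "${1?}" /dev/zero | tr '\0' A; }

# Bisect for the longest single argument execve accepts.
# Invariant: an argument of LO characters works, one of HI characters fails.
max_single_arg() {
    lo=1 hi=1000000
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        if /bin/true "$(gen_str "$mid")" 2>/dev/null; then
            lo=$mid
        else
            hi=$mid
        fi
    done
    echo "$lo"
}

max_single_arg    # length in characters, i.e. the byte limit minus the NUL
```

<p>On x86_64 this should print 131071, one less than the 131072-byte limit because of the terminating NUL.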
<pre>$ /bin/echo "$(./gen-str 131072)"
sh: /bin/echo: Argument list too long
$ s=$(./gen-str 131071)
$ /bin/echo "$s" |wc -c
$ /bin/echo "$s" "$s" "$s" |wc -c
$ /bin/echo "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s"
sh: /bin/echo: Argument list too long
$ t=$(./gen-str $((131071 - 3422)))
$ /bin/echo "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$t" |wc -c
</pre>
<p>To determine the apparent limit, we can try passing more and more arguments
until we get the ‘Argument list too long’ error. As we’ll see, the value one
gets depends on the environment. In the shell session shown in the listing
above, I got a limit of 2093730 characters, which is only 3422 bytes shy of the
2 MiB we got as the value of <code>ARG_MAX</code>. What could account for this
difference?
<p>Since <code>echo</code> prints a white-space character after each argument,
the number reported by <code>wc -c</code> does effectively count NUL bytes, so
that cannot be where the missing bytes went. There’s one other aspect of
command line arguments that has been overlooked so far. By UNIX convention,
the first argument of a program is its name. In the above case
that’s <code>/bin/echo</code>, which accounts for ten characters (again, the NUL
byte is counted). While it’s something, it still leaves 3412 bytes
unaccounted for.
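<p>The claim that <code>wc -c</code> effectively counts NUL bytes can be checked directly. A hypothetical <code>arg_bytes</code> helper that adds one terminator byte per argument gives the same figure as counting the output of <code>/bin/echo</code>, which prints a space or the final newline where each NUL would be. A POSIX shell sketch:

```shell
# Bytes the arguments occupy as C strings: each one plus its NUL terminator.
# Note: ${#a} counts characters, which equals bytes only in a single-byte
# locale such as LC_ALL=C.
arg_bytes() {
    n=0
    for a in "$@"; do
        n=$(( n + ${#a} + 1 ))
    done
    echo "$n"
}

arg_bytes foo barbaz qux              # 15
/bin/echo foo barbaz qux | wc -c      # also 15: "foo barbaz qux\n"
```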
<p>The next breakthrough comes once we realise that command line arguments are
not the only way information is passed to an application. The other (commonly
overlooked) method is environment variables, such
as <code>PATH</code>, <code>TERM</code> or <code>USER</code>. All of them
contribute to the limit in the same way command line arguments do. To measure
how much, we can invoke <code>env |wc</code>, which will produce their number as
well as the total length of all the variables. This isn’t robust against
variables containing newline characters, but other than that it correctly
measures the used space, including the NUL bytes terminating each value.
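<p>If newlines inside variables are a concern, a NUL-separated listing avoids the problem. The sketch below assumes GNU coreutils <code>env</code>, whose <code>-0</code> option ends each entry with a NUL instead of a newline; <code>env_stats</code> is a hypothetical helper name. Extra arguments are passed on to <code>env</code>, so <code>env_stats -i FOO=bar</code> measures a controlled environment.

```shell
# Print "<count> <bytes>" for the environment env would produce.
# Assumes GNU env: -0 terminates every NAME=VALUE entry with a NUL.
env_stats() {
    count=$(env -0 "$@" | tr -cd '\0' | wc -c)   # one NUL per variable
    bytes=$(env -0 "$@" | wc -c)                 # strings including their NULs
    echo $((count)) $((bytes))
}

env_stats               # current environment
env | wc -l -c          # the newline-based measurement used in the article
```

<p>When run from bash, both measurements also include a <code>_</code> variable, which bash places in the environment of each program it starts; this is one of the automatically created variables that can skew the numbers.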
<p>Environment variables explain a further 2906 bytes, which gets the discrepancy
down to 506 bytes. Close, but no cigar just yet.
<p>The third thing to consider is how arguments and environment variables are
passed to an application. They end up in the <code>argv</code>
and <code>environ</code> arrays respectively, which take up space. In the
example above there are 17 arguments and 45 environment variables (as seen
from the output of <code>env |wc</code>), so a total of 62 strings. Each
requires an eight-byte pointer in the corresponding array, which in total
amounts to 496 bytes. This <em>still</em> leaves 10 bytes.
The <code>argv</code> and <code>environ</code> arrays are NULL-terminated, but
that is <em>not</em> counted against the limit; if we were to include those
NULL pointers, we would overshoot the tally by 6 bytes. Something else is
afoot.
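<p>The bookkeeping so far can be written out as shell arithmetic. The figures are the ones measured in the <code>/bin/echo</code> session above (x86_64, hence 8-byte pointers), not general constants.

```shell
# Worked tally for the /bin/echo session (x86_64: 8-byte pointers).
gap=$(( 2097152 - 2093730 ))      # 3422: distance from ARG_MAX
gap=$(( gap - 10 ))               # argv[0], "/bin/echo" plus its NUL
gap=$(( gap - 2906 ))             # environment strings, per env |wc
gap=$(( gap - (17 + 45) * 8 ))    # one pointer per argv/environ entry
echo "$gap"                       # 10 bytes still unexplained
echo $(( gap - 2 * 8 ))           # -6: the two NULL terminators would overshoot
```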
<p>The final bit of the puzzle is the auxiliary vector, a mechanism which the
kernel uses to pass additional information to user space. One of the pieces
of data this vector includes is the path used to launch the executable. In
the example above this path is once again <code>/bin/echo</code> and thus
perfectly accounts for the remaining ten bytes.
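<p>On glibc-based systems the auxiliary vector can be inspected without writing any C: setting <code>LD_SHOW_AUXV=1</code> makes the dynamic loader print it before running the program. A small sketch, assuming glibc and a dynamically linked <code>/bin/true</code>; <code>show_execfn</code> is a hypothetical helper.

```shell
# Assumes glibc: with LD_SHOW_AUXV=1 the dynamic loader dumps the auxiliary
# vector to stdout. Only works for dynamically linked executables.
show_execfn() {
    LD_SHOW_AUXV=1 "$@" </dev/null | grep '^AT_EXECFN:'
}

show_execfn /bin/true            # the path exactly as passed to execve
show_execfn /bin/../bin/true     # same file, longer path, more bytes used
```

<p>The second call shows that <code>AT_EXECFN</code> holds the path as it was passed to <code>execve</code> rather than a canonical one, so a longer spelling of the same path consumes more of the limit.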
<h3>Verification</h3>
<p>To verify all the findings, we can try calling different binaries with and
without environment variables present. When doing that from a shell, it’s
important to take note of any variables that the shell might automatically
create when calling programs. I’ve found that posh is particularly well
behaved in this regard. While it makes sure a <code>PATH</code> variable is
present when it starts, it lets the user remove it and later doesn’t try to
create any more variables when invoking commands.
<pre>$ s=$(./gen-str 131071)
    "$1" "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
         "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
<i># 17 arguments, 8 bytes per pointer</i>
<i># 10-byte-long path, once in argv and once in auxv</i>
$ check /bin/echo $((131071 - 17*8 - 2*10)) |wc -c
$ check /bin/false $((131071 - 17*8 - 2*10))
posh: /bin/false: Argument list too long
<i># this time an 11-byte-long path</i>
$ check /bin/false $((131071 - 17*8 - 2*11))
$ foo=bar; export foo
$ check /bin/echo $((131071 - 17*8 - 2*10))
posh: /bin/echo: Argument list too long
<i># Additional 8-byte pointer in environ array plus</i>
<i># ‘foo=bar’ (incl. NUL byte) takes another 8 bytes</i>
$ check /bin/echo $((131071 - 17*8 - 1*8 - 2*10 - 8)) |wc -c
</pre>
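<p>The second argument passed to <code>check</code> in the session above follows a single formula. Below is a hypothetical <code>last_arg_len</code> helper, assuming the default 2 MiB <code>ARG_MAX</code>, 8-byte pointers, and a <code>check</code> that passes 17 arguments, 15 of them 131071-character strings.

```shell
# Hypothetical helper: length of the last argument that exactly fills
# a 2 MiB ARG_MAX, given 15 arguments of 131071 characters (131072 bytes).
# usage: last_arg_len PATH [ENV_COUNT ENV_BYTES]
last_arg_len() {
    path=$1 envc=${2:-0} envb=${3:-0}
    argc=17
    # 2097152 - 15*131072 leaves 131072 bytes; the last argument's own NUL
    # takes one of them, hence 131071. The path is counted twice: as argv[0]
    # and as the AT_EXECFN copy.
    echo $(( 131071 - (argc + envc) * 8 - 2 * (${#path} + 1) - envb ))
}

last_arg_len /bin/echo          # 130915 = 131071 - 17*8 - 2*10
last_arg_len /bin/false         # 130913
last_arg_len /bin/echo 1 8      # 130899, with foo=bar exported
```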
<p>Calling programs whose paths have different lengths, with and without
environment variables present, is in fact consistent with all the rules
established above.
<h2>Pushing the limits</h2>
<p>Knowing the limit, the next step is to understand how, if at all, it can be
changed. There’s no <code>setconf</code> counterpart to
the <code>getconf</code> tool which would allow setting the parameter.
However, there is a way to influence the value on Linux.
<p>To see how, consider what all of the objects counted against the limit
(command line arguments, environment variables and the auxiliary vector) have
in common; more specifically, where they are located. With address space
randomisation the exact addresses will vary even between runs of the same
application, but looking at example addresses proves helpful:
<table>
<tr><th>Expression<th>Result
<tr><td><code>getauxval(AT_EXECFN)</code><td><code>0x7fffac2fcfed</code>
<tr><td><code>environ[0]</code><td><code>0x7fffac2fc473</code>
<tr><td><code>argv[0]</code><td><code>0x7fffac2fc468</code>
<tr><td><code>sbrk(0)</code><td><code>0x555bd584a000</code>
<tr><td><code>&amp;global</code><td><code>0x555bd4c6a040</code>
</table>
<p>Yes, that’s it. All objects of interest are stored on the stack. It stands
to reason, then, that changing the maximum stack size might influence
the <code>ARG_MAX</code> value. This hypothesis can be tested with the help of
the <code>ulimit</code> built-in, which allows reading and modifying resource
limits such as the maximum stack size.
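<p>The same observation can be made from the shell on Linux without a C program: <code>/proc/<var>pid</var>/maps</code> labels the stack mapping <code>[stack]</code>, and fields 48 to 51 of <code>/proc/<var>pid</var>/stat</code> (available since Linux 3.5) hold the addresses where the argument and environment strings start and end. A sketch checking the shell’s own strings; <code>strings_on_stack</code> is a hypothetical name.

```shell
# Check that this shell's argv and environment strings lie inside [stack].
strings_on_stack() {
    # e.g. "7ffc1b2d3000-7ffc1b2f4000 rw-p 00000000 00:00 0   [stack]"
    range=$(grep '\[stack\]' "/proc/$$/maps" | cut -d' ' -f1)
    stack_lo=$((0x${range%-*}))
    stack_hi=$((0x${range#*-}))
    # Drop "pid (comm) " so that $1 is field 3 (state), then skip to field 48.
    set -- $(sed 's/^.*) //' "/proc/$$/stat")
    shift 45
    arg_start=$1 env_end=$4          # fields 48 (arg_start) and 51 (env_end)
    printf 'stack 0x%x-0x%x, arguments and environment 0x%x-0x%x\n' \
        "$stack_lo" "$stack_hi" "$arg_start" "$env_end"
    [ "$arg_start" -ge "$stack_lo" ] && [ "$env_end" -le "$stack_hi" ]
}

strings_on_stack && echo "all inside [stack]"
```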
<pre>$ ulimit -Ss 1024; getconf ARG_MAX
262144 <i># 256 KiB</i>
$ ulimit -Ss 512; getconf ARG_MAX
131072 <i># 128 KiB</i>
$ ulimit -Ss 256; getconf ARG_MAX
131072 <i># 128 KiB</i>
$ ulimit -Ss $((1024 * 1024)); getconf ARG_MAX
268435456 <i># 256 MiB</i>
</pre>
<p>Correlating the maximum stack size with the value of the <code>ARG_MAX</code>
limit, we can easily see that the latter is set to one fourth of the former,
with the additional restriction that <code>ARG_MAX</code> is no lower than
128 KiB. An upper limit, on the other hand, doesn’t appear to exist. Or does
it? Let’s try a stack size of 42 MiB:
<pre>$ ulimit -Ss $((42 * 1024))
$ s=$(./gen-str 131071)
$ /bin/true "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s" \
            "$s" "$s" "$s" "$s" "$s" "$s" "$s" "$s"
sh: /bin/true: Argument list too long
</pre>
<p>It turns out that even if <code>getconf</code> reports a large value
of <code>ARG_MAX</code>, the limit is always capped at 6 MiB.
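<p>The observations so far fit two small functions. This is a model of the experiments above, not the kernel’s code. The names are hypothetical, and the stack size is given in KiB, as printed by <code>ulimit -s</code>.

```shell
# What getconf ARG_MAX printed: a quarter of the stack limit, at least 128 KiB.
reported_arg_max() {
    v=$(( $1 * 1024 / 4 ))
    if [ "$v" -lt 131072 ]; then v=131072; fi
    echo "$v"
}

# What execve enforced in the experiments: the same value, capped at 6 MiB.
# (The stack-size ceiling discussed next is not modelled.)
effective_arg_max() {
    v=$(reported_arg_max "$1")
    cap=$(( 6 * 1024 * 1024 ))
    if [ "$v" -gt "$cap" ]; then v=$cap; fi
    echo "$v"
}

reported_arg_max 8192      # 2097152, the x86_64 default
reported_arg_max 43008     # 11010048 for a 42 MiB stack...
effective_arg_max 43008    # ...but only 6291456 usable
```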
<p>On the other end, even if <code>getconf</code> reports <code>ARG_MAX</code>
to be 128 KiB, the effective limit will never exceed the maximum stack size.
For example, with a 100 KiB stack, a 102400-character argument is rejected:
<pre>$ ulimit -Ss 100
$ /bin/true "$(../gen-str 102400)"
posh: /bin/true: Argument list too long
</pre>
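<p>Experiments like this one are easier to repeat inside a subshell, so the lowered limit disappears when the subshell exits. In bash and dash, <code>ulimit</code> without <code>-S</code> or <code>-H</code> sets both the soft and the hard limit, and an unprivileged process cannot raise a hard limit back up. A sketch assuming Linux and <code>/bin/true</code>; <code>small_stack_try</code> and <code>gen_str</code> are hypothetical helpers.

```shell
# Hypothetical helper mimicking ./gen-str: print N 'A' characters, no newline.
gen_str() { head -c "${1?}" /dev/zero | tr '\0' A; }

# Try one exec with a lowered stack limit (in KiB). The subshell confines the
# new limit, so the calling shell keeps its own soft and hard limits.
small_stack_try() {    # usage: small_stack_try STACK_KIB ARG_LEN
    ( ulimit -s "$1" && /bin/true "$(gen_str "$2")" ) 2>/dev/null
}

small_stack_try 100 102400 || echo "100 KiB stack: rejected"
small_stack_try 1024 1000 && echo "1 MiB stack: accepted"
```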
<h2>The real argument limit</h2>
<p>To quickly recap all the information:

<ul>
<li id=b1>The maximum length of the arguments to the <code>execve</code>
syscall is one fourth of the maximum stack size, but no less than 128 KiB and
no more than 6 MiB. Even with the 128 KiB floor, it cannot exceed the maximum
stack size itself.<a href=#f1>1</a>
<li>The limit covers: i) all command line arguments, ii) all environment
variables, iii) pointers to the former two present in the <code>argv</code>
and <code>environ</code> arrays and iv) the path to the executable used to
launch it.
<li>In addition, regardless of how high the limit is, a single command line
argument or environment variable cannot exceed 128 KiB. The size of an
environment variable is calculated as the size of
the <code><var>name</var>=<var>value</var></code> string.
<li>String size includes the terminating NUL byte, i.e. it’s the string length
plus one.
</ul>
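<p>Put together, the rules can be expressed as a predicate. The sketch below is hypothetical and simplified: it assumes 8-byte pointers, takes the environment only as totals (so the per-variable 128 KiB rule is not checked), ignores the stack-size ceiling from the footnote, and measures lengths with <code>${#name}</code>, which counts characters and therefore matches bytes only in a single-byte locale such as <code>LC_ALL=C</code>.

```shell
# usage: would_fit STACK_KIB ENV_COUNT ENV_BYTES PATH [ARG...]
# PATH doubles as argv[0]; ENV_BYTES includes the NUL of each variable.
would_fit() {
    kib=$1 envc=$2 envb=$3
    shift 3
    limit=$(( kib * 1024 / 4 ))                          # a quarter of the stack,
    if [ "$limit" -lt 131072 ]; then limit=131072; fi    # at least 128 KiB,
    if [ "$limit" -gt 6291456 ]; then limit=6291456; fi  # at most 6 MiB
    # AT_EXECFN copy of the path, environment strings, and one pointer per
    # argv/environ entry (the NULL terminators are not counted).
    total=$(( ${#1} + 1 + envb + (envc + $#) * 8 ))
    for a in "$@"; do
        if [ $(( ${#a} + 1 )) -gt 131072 ]; then
            return 1                                     # single string too long
        fi
        total=$(( total + ${#a} + 1 ))
    done
    [ "$total" -le "$limit" ]
}

s=$(head -c 131071 /dev/zero | tr '\0' A)
if would_fit 8192 0 0 /bin/true "$s"; then echo "fits"; fi
if ! would_fit 8192 0 0 /bin/true "${s}A"; then echo "too long"; fi
```

<p>Fed the numbers from the <code>/bin/echo</code> session (an 8 MiB stack, 45 variables taking 2906 bytes, and 16 arguments occupying 2093730 bytes), the predicate lands exactly on 2 MiB and accepts the call; making the last argument one character longer tips it over.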
<p>And that’s it for now. In <a href="/2021/the-real-arg-max-part-2/">the next
part</a> we’re going to look at the kernel code responsible for implementing
the limit, so stay tuned (or should I say smash that <a href="/atom">Atom</a>
button).
<p id=f1><a href=#b1>1</a> Roughly speaking. It’s of course impossible to use
the entire stack for arguments and environment variables, since there would be
no space left for any other information or for setting up a stack frame. As
such, the actual argument limit is somewhat lower than the maximum stack size.
<p style="margin-top: 2em">PS. By the way, despite all the BSD influences in
XNU, I don’t consider macOS to be a BSD.