<!-- subject: Will the real <code>ARG_MAX</code> please stand up? Part 2 -->
<!-- date: 2021-04-18 01:49:29 -->
<!-- tags: arg_max, unix, linux -->
<!-- categories: Articles, Techblog -->
<p>In <a href="/2021/the-real-arg-max-part-1/">part one</a> we looked at the <code>ARG_MAX</code> parameter on Linux-based systems. We established experimentally how it limits the arguments passed to programs and what influences its value. This time, we’ll look directly at the source code to verify those findings and see how <code>ARG_MAX</code> looks from the point of view of system libraries and the kernel itself.
<h2>C system library</h2>
<p>Applications get the value of the <code>ARG_MAX</code> parameter from the <code>sysconf</code> function. It’s also what the <code>getconf</code> utility uses to report the limit. But even though the result of the function is closely related to the kernel, looking for its definition in the Linux source code is an exercise in futility. Rather, the function is defined in the C system library which, in GNU/Linux distributions, is commonly provided by the GNU C Library (glibc).
<p>glibc is a cross-platform library which supports many kernels and architectures. It often includes multiple definitions of the same function, each tailored for a particular platform. Such is the case with <code>sysconf</code>. Thankfully, our analysis is limited to Linux, and in glibc 2.33 the implementation we’re interested in is located in the <code>sysdeps/unix/sysv/linux/sysconf.c</code> file and looks as follows:
<pre>
#define legacy_ARG_MAX 131072

<i>/* […] */</i>
  const char *procfname = NULL;
<i>/* […] */</i>
    case _SC_ARG_MAX:
      /* Use getrlimit to get the stack limit. */
      if (__getrlimit (RLIMIT_STACK, &rlimit) == 0)
        return MAX (legacy_ARG_MAX, rlimit.rlim_cur / 4);

      return legacy_ARG_MAX;
<i>/* […] */</i>
  return posix_sysconf (name);
<i>/* […] */</i>
</pre>
<p>This code explains the discrepancies we observed when <a href="/2021/the-real-arg-max-part-1/#bigstack">testing a large stack size limit</a>. While glibc implements the 128 KiB lower bound, it’s unaware of the 6 MiB upper bound. Since the <code>getconf</code> utility relies on the <code>sysconf</code> library function, the above implementation means that for large stacks the tool wrongly reports <code>ARG_MAX</code> as a quarter of the maximum stack size.
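<p>The difference is easy to check on a given machine. The sketch below, which assumes a Linux system, re-implements the glibc 2.33 formula from the listing above as a plain function and prints it next to whatever the running C library’s <code>sysconf</code> returns. <code>glibc233_arg_max</code> and <code>print_arg_max_report</code> are hypothetical names, not part of any library:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/* Same constant as in glibc 2.33's Linux sysconf.c. */
#define LEGACY_ARG_MAX 131072UL

/* The _SC_ARG_MAX branch of glibc 2.33 as a pure function:
   a quarter of the stack limit, but never less than 128 KiB.
   Note that there is no upper bound. */
static unsigned long glibc233_arg_max(unsigned long stack_rlim_cur)
{
    unsigned long quarter = stack_rlim_cur / 4;

    return quarter > LEGACY_ARG_MAX ? quarter : LEGACY_ARG_MAX;
}

/* Prints the current soft stack limit, the value reported by the running
   C library and the glibc 2.33 formula.  The last two may differ on
   glibc 2.34+ or on other C libraries. */
static void print_arg_max_report(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0) {
        perror("getrlimit");
        return;
    }
    printf("RLIMIT_STACK (soft):  %llu\n", (unsigned long long)rl.rlim_cur);
    printf("sysconf(_SC_ARG_MAX): %ld\n", sysconf(_SC_ARG_MAX));
    printf("glibc 2.33 formula:   %lu\n", glibc233_arg_max(rl.rlim_cur));
}
```

<p>With a 64 MiB stack limit the formula yields 16 MiB, well above the 6 MiB the kernel actually accepts.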
<p>glibc isn’t the only C library used on Linux systems. Others have their own <code>sysconf</code> implementations, which may return different values. uClibc-ng 1.0.38 behaves the same way glibc does, while bionic 10.0, dietlibc 0.34 and musl 1.2 return 128 KiB as <code>ARG_MAX</code>.
<p>The good news is that the situation with glibc has since improved. glibc 2.34 ships with <a href="https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=a9880586eedb3ba89ca6a7c5e3f0664c279cf636">my commit</a> which makes <code>sysconf</code> aware of the 6 MiB upper bound. Recent GNU/Linux systems will report <code>ARG_MAX</code> correctly even for large stacks.

<h2>Linux kernel</h2>
<p>On the kernel side, we want to look at the <code>execve</code> system call. It is defined using a <code>SYSCALL_DEFINE<var>n</var></code> macro, and it doesn’t take long to find its implementation in the <code>fs/exec.c</code> file. In Linux 5.11.11 it looks as follows:
<pre>
SYSCALL_DEFINE3(execve,
        const char __user *, filename,
        const char __user *const __user *, argv,
        const char __user *const __user *, envp)
{
    return do_execve(getname(filename), argv, envp);
}
</pre>
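<p>From user space, this system call is reached through the C library’s <code>execve</code> wrapper. As a rough experiment, assuming a Linux system where <code>/bin/sh</code> exists, the following sketch runs a program in a child process with one very long argument and reports the <code>errno</code> of a failed <code>execve</code>. The helper name <code>exec_errno</code> is made up for illustration:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs path in a child process with a single argument made of arg_len
   'x' characters and an empty environment.  Returns the errno of a failed
   execve(), the program's exit status if the exec succeeded (so in this
   simple sketch the two cases are not distinguishable), or -1 if the
   experiment itself could not be set up. */
static int exec_errno(const char *path, size_t arg_len)
{
    char *arg = malloc(arg_len + 1);
    if (arg == NULL)
        return -1;
    memset(arg, 'x', arg_len);
    arg[arg_len] = '\0';

    char *const argv[] = { (char *)path, arg, NULL };
    char *const envp[] = { NULL };

    pid_t pid = fork();
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(errno);               /* reached only if execve() failed */
    }
    free(arg);                      /* the child has its own copy */
    if (pid < 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

<p>Calling <code>exec_errno("/bin/sh", 8u << 20)</code> should yield <code>E2BIG</code>, since an 8 MiB argument exceeds every limit discussed in this article. An existing executable is used so that a missing file cannot be the reason for the failure.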
<p>The definition of <code>do_execve</code> can be found a few lines earlier in the same file. All it does is call the <code>do_execveat_common</code> function, so that’s what we’re going to take a closer look at. It is where most of the checks and calculations happen:
<pre>
static int do_execveat_common(int fd, struct filename *filename,
                              struct user_arg_ptr argv,
                              struct user_arg_ptr envp,
                              int flags)
{
    struct linux_binprm *bprm;
    <i>/* […] */</i>

    retval = count(argv, MAX_ARG_STRINGS);
    <i>/* […] */</i>

    retval = count(envp, MAX_ARG_STRINGS);
    <i>/* […] */</i>

    retval = bprm_stack_limits(bprm);
    <i>/* […] */</i>

    retval = copy_string_kernel(bprm->filename, bprm);
    <i>/* […] */</i>
    bprm->exec = bprm->p;

    retval = copy_strings(bprm->envc, envp, bprm);
    <i>/* […] */</i>

    retval = copy_strings(bprm->argc, argv, bprm);
    <i>/* […] */</i>

    retval = bprm_execve(bprm, fd, filename, flags);
    <i>/* […] */</i>
}
</pre>
<p>The two calls to the <code>count</code> function calculate the number of command-line arguments and environment variables. Each call may fail if the number exceeds <code>MAX_ARG_STRINGS</code>. Technically speaking, this is another limit, but in practice the constant is over two billion and, as we’ll see later, there is no way to reach this number without reaching other limits first. The only other situation in which <code>count</code> may return an error is a memory fault, but that’s not interesting for our purposes.
<h3>Limit calculation</h3>
<p><code>bprm_stack_limits</code> is where the actual calculation happens. The function determines the limit and stores it in the <code>bprm</code> structure. It’s defined as follows:
<pre>
static int bprm_stack_limits(struct linux_binprm *bprm)
{
    unsigned long limit, ptr_size;

    limit = _STK_LIM / 4 * 3;
    limit = min(limit, bprm->rlim_stack.rlim_cur / 4);
    limit = max_t(unsigned long, limit, ARG_MAX);

    ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
    if (limit <= ptr_size)
        return -E2BIG;
    limit -= ptr_size;

    bprm->argmin = bprm->p - limit;
    return 0;
}
</pre>
<p><code>_STK_LIM</code> is the default stack size limit and equals 8 MiB. The first expression in the function is what introduces the 6 MiB upper bound (three quarters of 8 MiB) for arguments. It’s worth noting that this is a relatively new restriction, introduced in Linux 4.13 (and later back-ported to previous releases). Why it’s there might be a story for another time.
<p>The second expression in the function is what implements the ‘quarter of the stack size’ rule. This is what could be called the ‘normal’ case, and it is definitely the most typical one in common desktop and server configurations. With the default maximum stack size limit of 8 MiB, the default limit for executable arguments ends up being 2 MiB.
<p>The third expression sets the limit to be no less than <code>ARG_MAX</code>. This gets a bit confusing. <code>ARG_MAX</code> is supposed to be a dynamic value, and here we see a constant of the same name (defined as 131072, i.e. 128 KiB). As is often the case, the explanation lies in the past. Historically, the value was constant and defined as a macro in kernel headers. Eventually, a more dynamic approach was introduced, but the definition of the macro stuck. To maintain backwards compatibility, the dynamic calculation kept the old static value as a lower bound.
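<p>Taken together, the three expressions clamp a quarter of the stack limit between 128 KiB and 6 MiB. A user-space model of them might look as follows. It assumes the default values of <code>_STK_LIM</code> (8 MiB) and of the kernel’s <code>ARG_MAX</code> macro (131072 bytes), and the <code>MODEL_*</code> and <code>model_*</code> names are hypothetical:

```c
#include <sys/resource.h>

/* Assumed kernel defaults: _STK_LIM is 8 MiB and ARG_MAX is 131072. */
#define MODEL_STK_LIM (8UL * 1024 * 1024)
#define MODEL_ARG_MAX 131072UL

/* The first three statements of bprm_stack_limits() in Linux 5.11. */
static unsigned long model_arg_limit(unsigned long stack_rlim_cur)
{
    unsigned long limit = MODEL_STK_LIM / 4 * 3;    /* 6 MiB upper bound */

    if (stack_rlim_cur / 4 < limit)
        limit = stack_rlim_cur / 4;                 /* quarter of the stack */
    if (limit < MODEL_ARG_MAX)
        limit = MODEL_ARG_MAX;                      /* 128 KiB lower bound */
    return limit;
}

/* Applies the model to the calling process's soft stack limit;
   returns -1 if getrlimit() fails. */
static long model_arg_limit_for_self(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    return (long)model_arg_limit(rl.rlim_cur);
}
```

<p>With the default 8 MiB stack this yields 2 MiB; a 64 MiB stack is capped at 6 MiB, and a 256 KiB stack is raised to 128 KiB.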
<p>The last adjustment in the function is to reserve space for the <code>argv</code> and <code>envp</code> arrays. If the limit cannot accommodate them, the function returns an error; otherwise the limit is reduced by the necessary space. This is where we can see that the limit of two billion arguments and environment variables (imposed by the <code>count</code> function called in <code>do_execveat_common</code>) can never be reached. With a 6 MiB upper bound for the limit, the most one could hope for is 1.25 million arguments, and that’s only on a 32-bit system with all strings empty (each argument then costs a 4-byte pointer plus a 1-byte terminating NUL).
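<p>The pointer reservation and the 1.25 million figure can be reproduced with a little arithmetic. The sketch below is a simplified model with hypothetical function names; it ignores the executable path and the environment strings, which are also charged against the limit:

```c
#include <errno.h>
#include <stddef.h>

/* Models the end of bprm_stack_limits(): the argv and envp pointer arrays
   are charged against the limit before any string is copied.  Returns the
   byte budget left for the strings themselves, or -E2BIG. */
static long reserve_pointer_arrays(unsigned long limit, unsigned long argc,
                                   unsigned long envc, size_t ptr_size)
{
    unsigned long ptr_bytes = (argc + envc) * ptr_size;

    if (limit <= ptr_bytes)
        return -E2BIG;
    return (long)(limit - ptr_bytes);
}

/* Rough upper bound on the number of empty-string arguments: each one
   costs a pointer plus a single NUL byte. */
static unsigned long max_empty_args(unsigned long limit, size_t ptr_size)
{
    return limit / (ptr_size + 1);
}
```

<p>On a 64-bit system, with 8-byte pointers, the same bound drops to roughly 699 thousand empty arguments.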
<p>The calculated limit is finally stored in the <code>argmin</code> field of the <code>bprm</code> structure. It specifies the lowest address at which arguments can still be stored, and the value will be checked later on when the program’s executable path, environment variables and command-line arguments are copied. Recall that the stack grows downward, which is why the field specifies the minimum and why it’s calculated by subtracting the argument size limit from the current top of the stack (specified by <code>bprm->p</code>).
<h3>Copying strings</h3>
<p>Eventually, <code>do_execveat_common</code> checks the lengths of the strings while copying them to the new program’s memory. First, the path to the program’s executable is transferred with the help of the <code>copy_string_kernel</code> function, which is defined as follows:
<pre>
int copy_string_kernel(const char *arg, struct linux_binprm *bprm)
{
    int len = strnlen(arg, MAX_ARG_STRLEN) + 1 <i>/* terminating NUL */</i>;
    unsigned long pos = bprm->p;

    <i>/* […] */</i>
    if (!valid_arg_len(bprm, len))
        return -E2BIG;

    <i>/* […] */</i>
    bprm->p -= len;
    if (IS_ENABLED(CONFIG_MMU) && bprm->p < bprm->argmin)
        return -E2BIG;

    <i>/* [… copy the string …] */</i>
    <i>/* [… analogous to memcpy(bprm->p, arg, len); …] */</i>
    <i>/* […] */</i>
}
</pre>
<p>First, <code>strnlen</code> paired with a call to <code>valid_arg_len</code> checks whether the string, including its terminating NUL, exceeds <code>MAX_ARG_STRLEN</code> bytes (32 pages, or 128 KiB on systems with 4 KiB pages). <code>valid_arg_len</code> is a trivial inline function whose body simply states <code>return len <= MAX_ARG_STRLEN;</code>. If the size of the string exceeds the limit, the argument list is deemed too long and the function returns an error.
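<p>Because the terminating NUL is counted, the longest accepted string is one byte shorter than <code>MAX_ARG_STRLEN</code>. A sketch of the check follows, assuming 4 KiB pages. The names are hypothetical, and <code>bounded_len</code> stands in for <code>strnlen</code> so the code needs only standard C:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumes 4 KiB pages; the kernel defines MAX_ARG_STRLEN as PAGE_SIZE * 32. */
#define MODEL_PAGE_SIZE      4096UL
#define MODEL_MAX_ARG_STRLEN (MODEL_PAGE_SIZE * 32)

/* Same result as POSIX strnlen(): the length of s, but at most max. */
static size_t bounded_len(const char *s, size_t max)
{
    size_t n = 0;

    while (n < max && s[n] != '\0')
        n++;
    return n;
}

/* Mirrors the first check in copy_string_kernel(): measure at most
   MAX_ARG_STRLEN characters, add the NUL, compare with the limit. */
static bool model_valid_arg(const char *arg)
{
    size_t len = bounded_len(arg, MODEL_MAX_ARG_STRLEN) + 1;

    return len <= MODEL_MAX_ARG_STRLEN;
}
```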
<p>Then, the function checks whether there’s enough space on the stack to fit the string. This is done by moving the stack pointer downwards (i.e. subtracting <code>len</code> from the <code>bprm->p</code> field) to reserve memory for the argument, and checking whether the new position of the edge of the stack crossed the limit (by checking if <code>bprm->p < bprm->argmin</code>). If so, the argument list is too long. Otherwise, the argument is copied onto the stack.
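<p>The order of operations (move the pointer first, compare afterwards) can be modelled with plain integers standing in for addresses. The structure and function names below are hypothetical; only the <code>p</code> and <code>argmin</code> fields mirror <code>struct linux_binprm</code>:

```c
#include <errno.h>

/* Hypothetical stand-in for the two struct linux_binprm fields involved.
   Addresses are plain integers and the argument area grows towards zero. */
struct model_bprm {
    unsigned long p;       /* current top of the argument area */
    unsigned long argmin;  /* lowest address arguments may reach */
};

/* As in bprm_stack_limits(): argmin is the top minus the byte limit.
   Assumes top >= limit so the subtraction cannot wrap around. */
static void model_init(struct model_bprm *b, unsigned long top,
                       unsigned long limit)
{
    b->p = top;
    b->argmin = top - limit;
}

/* As in copy_string_kernel(): reserve len bytes (string plus NUL) by
   moving p down, then check whether it crossed argmin.  On failure p is
   not restored, since the whole exec is abandoned anyway.
   Assumes len <= p so the subtraction cannot wrap around. */
static int model_push(struct model_bprm *b, unsigned long len)
{
    b->p -= len;
    if (b->p < b->argmin)
        return -E2BIG;
    return 0;
}
```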
<p>The <code>copy_strings</code> function, which <code>do_execveat_common</code> calls to transfer environment variables and command-line arguments, is entirely analogous. The two differences are that i) the source data lives in user space and ii) the function operates in a loop, copying a sequence of strings:
<pre>
static int copy_strings(int argc, struct user_arg_ptr argv,
                        struct linux_binprm *bprm)
{
    <i>/* […] */</i>
    while (argc-- > 0) {
        const char __user *str;
        <i>/* […] */</i>
        str = get_user_arg_ptr(argv, argc);
        <i>/* […] */</i>
        len = strnlen_user(str, MAX_ARG_STRLEN);
        <i>/* […] */</i>
        if (!valid_arg_len(bprm, len))
            goto out;
        <i>/* […] */</i>
        if (bprm->p < bprm->argmin)
            goto out;

        <i>/* [… copy the string …] */</i>
        <i>/* [… analogous to memcpy(bprm->p, str, len); …] */</i>
    }
    <i>/* […] */</i>
}
</pre>
<p>Having to read from user space complicates the function, though much of that complexity has been hidden in the elided parts of the listing above. The visible differences are the call to <code>get_user_arg_ptr</code> and the use of <code>strnlen_user</code> instead of <code>strnlen</code> (the former already counts the terminating NUL, so no <code>+ 1</code> is needed).
<p>The parts that interest us remain the same: the <code>valid_arg_len</code> call and the <code>bprm->p < bprm->argmin</code> comparison.
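<p>To see how the loop lays the strings out, here is a user-space sketch that copies into an ordinary buffer. It assumes, as in Linux 5.11, that <code>copy_strings</code> walks the array from the last element to the first, so the strings end up contiguous and in their original order. Offsets replace addresses, the per-string <code>valid_arg_len</code> check is left out, and all names are hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copies argv[0..argc) into buf the way copy_strings() fills the stack:
   starting from the last string and moving *p (an offset into buf)
   downwards.  argmin is the lowest offset that may be used.  Unlike the
   kernel, the bounds check happens before *p moves, so that the unsigned
   offset cannot wrap around. */
static int model_copy_strings(int argc, const char *const *argv,
                              char *buf, size_t *p, size_t argmin)
{
    while (argc-- > 0) {
        size_t len = strlen(argv[argc]) + 1;    /* include the NUL */

        if (*p < argmin + len)                   /* i.e. *p - len < argmin */
            return -E2BIG;
        *p -= len;
        memcpy(buf + *p, argv[argc], len);
    }
    return 0;
}
```

<p>Copying <code>ls</code>, <code>-l</code> and <code>/tmp</code> into a 16-byte buffer leaves <code>*p</code> at offset 5 with the bytes <code>ls\0-l\0/tmp\0</code> above it.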
<p>This concludes the investigation. In the previous article we saw how the argument length limit affects user space; here we looked at the source code of the C library and the kernel to confirm those findings. There are still a few minor mysteries, such as why the 6 MiB bound exists or what happens if the maximum stack size is less than 128 KiB, which I may tackle at another time.
<p>It remains important to remember that our findings hold for Linux only. Other kernels set the limit differently and count different things towards it, and POSIX purposefully leaves the details vague. As a result, a portable application may struggle to interpret the limit; it should not only take the value of <code>ARG_MAX</code> with a grain of salt but ideally also recover from an <code>E2BIG</code> error by reducing the number of arguments.
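<p>One way to implement such recovery is to halve the batch of arguments whenever the exec fails with <code>E2BIG</code>. The sketch below is not taken from any particular tool. The callback type, the function names and the <code>fake_run</code> stand-in (which pretends the limit is a fixed number of bytes, so the strategy can be tried without executing anything) are all hypothetical:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: runs the command on args[0..n) and returns 0,
   or an errno value such as E2BIG if the exec failed. */
typedef int (*run_batch_fn)(const char *const *args, size_t n, void *ctx);

/* Runs all n arguments in batches: on E2BIG the batch size is halved and
   the same arguments are retried; any other error aborts.  A single
   argument that is still too long makes the function return E2BIG. */
static int run_with_backoff(const char *const *args, size_t n,
                            run_batch_fn run, void *ctx)
{
    size_t batch = n;

    while (n > 0) {
        if (batch > n)
            batch = n;
        int err = run(args, batch, ctx);
        if (err == E2BIG && batch > 1) {
            batch /= 2;                 /* too long: try a smaller batch */
            continue;
        }
        if (err != 0)
            return err;
        args += batch;
        n -= batch;
    }
    return 0;
}

/* Stand-in callback for experimenting without exec(): it "fails" with
   E2BIG whenever the strings plus their NULs exceed limit bytes. */
struct fake_limit { size_t limit; size_t calls; size_t runs; };

static int fake_run(const char *const *args, size_t n, void *ctx)
{
    struct fake_limit *f = ctx;
    size_t bytes = 0;

    f->calls++;
    for (size_t i = 0; i < n; i++)
        bytes += strlen(args[i]) + 1;
    if (bytes > f->limit)
        return E2BIG;
    f->runs++;
    return 0;
}
```

<p>A real callback would typically <code>fork</code> and <code>execve</code> the command with the batch appended and report the <code>errno</code> of a failed exec.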
<p>Fortunately, UNIX-like systems provide a simple solution in the form of the <code>xargs</code> and <code>find … -exec … +</code> commands. Those are much easier to use and sufficient for most cases, since they typically know how to deal with the command’s argument size limit.
<p>Whatever the case may be, I hope this article has been informative and provided further understanding of the kernel and its interaction with user space.