2 .\" Copyright 1989 AT&T Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved
3 .\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
4 .\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
5 .\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
6 .TH GPROF 1 "Feb 8, 2007"
.SH NAME
8 gprof \- display call-graph profile data
.SH SYNOPSIS
12 \fBgprof\fR [\fB-abcCDlsz\fR] [\fB-e\fR \fIfunction-name\fR] [\fB-E\fR \fIfunction-name\fR]
13 [\fB-f\fR \fIfunction-name\fR] [\fB-F\fR \fIfunction-name\fR]
14 [\fIimage-file\fR [\fIprofile-file\fR...]]
15 [\fB-n\fR \fInumber of functions\fR]
.SH DESCRIPTION
21 The \fBgprof\fR utility produces an execution profile of a program. The effect
22 of called routines is incorporated in the profile of each caller. The profile
23 data is taken from the call graph profile file that is created by programs
24 compiled with the \fB-xpg\fR option of \fBcc\fR(1), or by the \fB-pg\fR option
25 with other compilers, or by setting the \fBLD_PROFILE\fR environment variable
26 for shared objects. See \fBld.so.1\fR(1). These compiler options also link in
27 versions of the library routines which are compiled for profiling. The symbol
28 table in the executable image file \fIimage-file\fR (\fBa.out\fR by default) is
29 read and correlated with the call graph profile file \fIprofile-file\fR
30 (\fBgmon.out\fR by default).
33 First, execution times for each routine are propagated along the edges of the
34 call graph. Cycles are discovered, and calls into a cycle are made to share the
35 time of the cycle. The first listing shows the functions sorted according to
36 the time they represent, including the time of their call graph descendants.
37 Below each function entry is shown its (direct) call-graph children and how
38 their times are propagated to this function. A similar display above the
39 function shows how this function's time and the time of its descendants are
40 propagated to its (direct) call-graph parents.
Cycles are also shown, with an entry for the cycle as a whole and a listing of
the members of the cycle and their contributions to the time and call counts of
the cycle.
48 Next, a flat profile is given, similar to that provided by \fBprof\fR(1). This
49 listing gives the total execution times and call counts for each of the
50 functions in the program, sorted by decreasing time. Finally, an index is
51 given, which shows the correspondence between function names and call-graph
52 profile index numbers.
55 A single function may be split into subfunctions for profiling by means of the
56 \fBMARK\fR macro. See \fBprof\fR(5).
59 Beware of quantization errors. The granularity of the sampling is shown, but
60 remains statistical at best. It is assumed that the time for each execution of
61 a function can be expressed by the total time for the function divided by the
62 number of times the function is called. Thus the time propagated along the
63 call-graph arcs to parents of that function is directly proportional to the
64 number of times that arc is traversed.
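The proportional propagation described above can be illustrated with a minimal sketch. It is not the \fBgprof\fR implementation; the function name, the dictionary layout, and the sample numbers are assumptions made for illustration only.

```python
def propagate_to_parents(child_total_time, arc_counts):
    """Split a child's total time among its parents in proportion to
    the number of times each parent-to-child arc was traversed.

    arc_counts maps a (hypothetical) parent name to its call count.
    """
    total_calls = sum(arc_counts.values())
    if total_calls == 0:
        # No recorded calls: nothing can be attributed to any parent.
        return {parent: 0.0 for parent in arc_counts}
    # Assumption stated in the text: every call costs the same amount.
    per_call = child_total_time / total_calls
    return {parent: per_call * calls for parent, calls in arc_counts.items()}


# A child that used 12.0 time units, called 3 times from A and once from B.
shares = propagate_to_parents(12.0, {"A": 3, "B": 1})
print(shares)
```

Because the per-call cost is assumed uniform, a parent that makes a few expensive calls is charged the same per call as one that makes cheap calls.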
67 The profiled program must call \fBexit\fR(2) or return normally for the
68 profiling information to be saved in the \fBgmon.out\fR file.
.SH OPTIONS
72 The following options are supported:
\fB\fB-a\fR\fR
Suppress printing statically declared functions. If this option is given, all
80 relevant information about the static function (for instance, time samples,
81 calls to other functions, calls from other functions) belongs to the function
82 loaded just before the static function in the \fBa.out\fR file.
\fB\fB-b\fR\fR
Brief. Suppress descriptions of each field in the profile.
\fB\fB-c\fR\fR
Discover the static call-graph of the program by a heuristic which examines the
101 text space of the object file. Static-only parents or children are indicated
102 with call counts of 0. Note that for dynamically linked executables, the linked
103 shared objects' text segments are not examined.
\fB\fB-C\fR\fR
Demangle C++ symbol names before printing them out.
\fB\fB-D\fR\fR
Produce a profile file \fBgmon.sum\fR that represents the difference of the
122 profile information in all specified profile files. This summary profile file
123 may be given to subsequent executions of \fBgprof\fR (also with \fB-D\fR) to
summarize profile data across several runs of an \fBa.out\fR file. See also
the \fB-s\fR option.
127 As an example, suppose function A calls function B \fBn\fR times in profile
128 file \fBgmon.sum\fR, and \fBm\fR times in profile file \fBgmon.out\fR. With
129 \fB-D\fR, a new \fBgmon.sum\fR file will be created showing the number of calls
130 from A to B as \fBn-m\fR.
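The arithmetic in this example can be sketched as follows. This is only an illustration: the dictionaries stand in for the arc counts held in the profile files and do not reflect the actual \fBgmon.out\fR file format, and the function name and sample counts are hypothetical. The sum variant corresponds to what the \fB-s\fR option is described as producing.

```python
def combine_arc_counts(summary, latest, mode):
    """Combine per-arc call counts from two profiles.

    summary and latest map a (caller, callee) pair to a call count.
    mode is "difference" (as described for -D) or "sum" (as for -s).
    """
    arcs = set(summary) | set(latest)
    if mode == "difference":
        return {arc: summary.get(arc, 0) - latest.get(arc, 0) for arc in arcs}
    if mode == "sum":
        return {arc: summary.get(arc, 0) + latest.get(arc, 0) for arc in arcs}
    raise ValueError("mode must be 'difference' or 'sum'")


# A calls B n = 7 times in gmon.sum and m = 3 times in gmon.out.
gmon_sum = {("A", "B"): 7}
gmon_out = {("A", "B"): 3}

diff = combine_arc_counts(gmon_sum, gmon_out, "difference")
total = combine_arc_counts(gmon_sum, gmon_out, "sum")
print(diff[("A", "B")], total[("A", "B")])
```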
\fB\fB-e\fR \fIfunction-name\fR\fR
139 Suppress printing the graph profile entry for routine \fIfunction-name\fR and
140 all its descendants (unless they have other ancestors that are not suppressed).
141 More than one \fB-e\fR option may be given. Only one \fIfunction-name\fR may
142 be given with each \fB-e\fR option.
\fB\fB-E\fR \fIfunction-name\fR\fR
151 Suppress printing the graph profile entry for routine \fIfunction-name\fR (and
its descendants) as \fB-e\fR, above, and also exclude the time spent in
153 \fIfunction-name\fR (and its descendants) from the total and percentage time
154 computations. More than one \fB-E\fR option may be given. For example:
156 \fB-E\fR \fImcount\fR \fB-E\fR \fImcleanup\fR
\fB\fB-f\fR \fIfunction-name\fR\fR
167 Print the graph profile entry only for routine \fIfunction-name\fR and its
168 descendants. More than one \fB-f\fR option may be given. Only one
169 \fIfunction-name\fR may be given with each \fB-f\fR option.
\fB\fB-F\fR \fIfunction-name\fR\fR
178 Print the graph profile entry only for routine \fIfunction-name\fR and its
descendants (as \fB-f\fR, above) and also use only the times of the printed
180 routines in total time and percentage computations. More than one \fB-F\fR
181 option may be given. Only one \fIfunction-name\fR may be given with each
182 \fB-F\fR option. The \fB-F\fR option overrides the \fB-E\fR option.
\fB\fB-l\fR\fR
Suppress the reporting of graph profile entries for all local symbols. This
192 option would be the equivalent of placing all of the local symbols for the
193 specified executable image on the \fB-E\fR exclusion list.
\fB\fB-n\fR\fR
Limit the size of flat and graph profile listings to the top \fBn\fR offending
functions.
\fB\fB-s\fR\fR
Produce a profile file \fBgmon.sum\fR which represents the sum of the profile
213 information in all of the specified profile files. This summary profile file
214 may be given to subsequent executions of \fBgprof\fR (also with \fB-s\fR) to
accumulate profile data across several runs of an \fBa.out\fR file. See also
the \fB-D\fR option.
\fB\fB-z\fR\fR
Display routines which have zero usage (as indicated by call counts and
226 accumulated time). This is useful in conjunction with the \fB-c\fR option for
227 discovering which routines were never called. Note that this has restricted use
228 for dynamically linked executables, since shared object text space will not be
229 examined by the \fB-c\fR option.
232 .SH ENVIRONMENT VARIABLES
\fB\fBPROFDIR\fR\fR
If this environment variable contains a value, place profiling output within
240 that directory, in a file named \fIpid\fR\fB\&.\fR\fIprogramname\fR. \fIpid\fR
241 is the process \fBID\fR and \fIprogramname\fR is the name of the program being
242 profiled, as determined by removing any path prefix from the \fBargv[0]\fR with
243 which the program was called. If the variable contains a null value, no
profiling output is produced. Otherwise, profiling output is placed in the
file \fBgmon.out\fR.
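The naming rule can be sketched as a small function. This is a hedged illustration, not the profiling runtime itself: the function name and arguments are hypothetical, and the unset case is assumed to fall back to \fBgmon.out\fR, the default profile file named earlier in this page.

```python
import posixpath


def profile_output_path(environ, argv0, pid):
    """Return the path profiling output would be written to, or None
    when no output is produced.

    environ is a mapping of environment variables, argv0 is the name
    the program was called with, pid is the process ID.
    """
    profdir = environ.get("PROFDIR")
    if profdir is None:
        # Assumption: with PROFDIR unset, the default file is used.
        return "gmon.out"
    if profdir == "":
        # Variable set to a null value: no profiling output.
        return None
    # Strip any path prefix from argv[0] to get the program name.
    program = posixpath.basename(argv0)
    return posixpath.join(profdir, "%d.%s" % (pid, program))


print(profile_output_path({"PROFDIR": "/tmp/prof"}, "/usr/bin/myprog", 1234))
```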
.SH FILES
\fB\fBa.out\fR\fR
executable file containing namelist
\fB\fBgmon.out\fR\fR
dynamic call-graph and profile
\fB\fBgmon.sum\fR\fR
summarized dynamic call-graph and profile
279 \fB\fB$PROFDIR/\fR\fIpid\fR\fB\&.\fR\fIprogramname\fR\fR
.SH SEE ALSO
288 \fBcc\fR(1), \fBld.so.1\fR(1), \fBprof\fR(1), \fBexit\fR(2), \fBpcsample\fR(2),
289 \fBprofil\fR(2), \fBmalloc\fR(3C), \fBmalloc\fR(3MALLOC), \fBmonitor\fR(3C),
290 \fBattributes\fR(5), \fBprof\fR(5)
Graham, S.L., Kessler, P.B., McKusick, M.K., \fIgprof: A Call Graph Execution
Profiler\fR, Proceedings of the SIGPLAN '82 Symposium on Compiler Construction,
\fBSIGPLAN\fR Notices, Vol. 17, No. 6, pp. 120-126, June 1982.
298 \fILinker and Libraries Guide\fR
.SH NOTES
302 If the executable image has been stripped and does not have the \fB\&.symtab\fR
303 symbol table, \fBgprof\fR reads the global dynamic symbol tables
304 \fB\&.dynsym\fR and \fB\&.SUNW_ldynsym\fR, if present. The symbols in the
305 dynamic symbol tables are a subset of the symbols that are found in
306 \fB\&.symtab\fR. The \fB\&.dynsym\fR symbol table contains the global symbols
307 used by the runtime linker. \fB\&.SUNW_ldynsym\fR augments the information in
308 \fB\&.dynsym\fR with local function symbols. In the case where \fB\&.dynsym\fR
309 is found and \fB\&.SUNW_ldynsym\fR is not, only the information for the global
symbols is available. Without local symbols, the behavior is as described for
the \fB-a\fR option.
314 \fBLD_LIBRARY_PATH\fR must not contain \fB/usr/lib\fR as a component when
315 compiling a program for profiling. If \fBLD_LIBRARY_PATH\fR contains
316 \fB/usr/lib\fR, the program will not be linked correctly with the profiling
317 versions of the system libraries in \fB/usr/lib/libp\fR.
320 The times reported in successive identical runs may show variances because of
321 varying cache-hit ratios that result from sharing the cache with other
322 processes. Even if a program seems to be the only one using the machine, hidden
323 background or asynchronous processes may blur the data. In rare cases, the
324 clock ticks initiating recording of the program counter may \fBbeat\fR with
325 loops in a program, grossly distorting measurements. Call counts are always
326 recorded precisely, however.
329 Only programs that call \fBexit\fR or return from \fBmain\fR are guaranteed to
produce a profile file, unless a final call to \fBmonitor\fR is explicitly
coded.
334 Functions such as \fBmcount()\fR, \fB_mcount()\fR, \fBmoncontrol()\fR,
335 \fB_moncontrol()\fR, \fBmonitor()\fR, and \fB_monitor()\fR may appear in the
336 \fBgprof\fR report. These functions are part of the profiling implementation
337 and thus account for some amount of the runtime overhead. Since these
338 functions are not present in an unprofiled application, time accumulated and
call counts for these functions may be ignored when evaluating the performance
of an application.
341 .SS "64-bit profiling"
344 64-bit profiling may be used freely with dynamically linked executables, and
345 profiling information is collected for the shared objects if the objects are
compiled for profiling. Care must be taken when interpreting the profile output,
347 since it is possible for symbols from different shared objects to have the same
348 name. If name duplication occurs in the profile output, the module id prefix
349 before the symbol name in the symbol index listing can be used to identify the
350 appropriate module for the symbol.
When using the \fB-s\fR or \fB-D\fR option to sum multiple profile files, care
354 must be taken not to mix 32-bit profile files with 64-bit profile files.
355 .SS "32-bit profiling"
32-bit profiling may be used with dynamically linked executables, but care must
be taken. In 32-bit profiling, shared objects cannot be profiled with
360 \fBgprof\fR. Thus, when a profiled, dynamically linked program is executed,
361 only the \fBmain\fR portion of the image is sampled. This means that all time
362 spent outside of the \fBmain\fR object, that is, time spent in a shared object,
363 will not be included in the profile summary; the total time reported for the
364 program may be less than the total time used by the program.
367 Because the time spent in a shared object cannot be accounted for, the use of
368 shared objects should be minimized whenever a program is profiled with
\fBgprof\fR. To get profile information on the functions of a library, link the
program to the profiled version of the library (or to the standard archive
version if no profiling version is available) instead of the shared object.
Versions of profiled libraries may be supplied with the
373 system in the \fB/usr/lib/libp\fR directory. Refer to compiler driver
374 documentation on profiling.
377 Consider an extreme case. A profiled program dynamically linked with the shared
378 C library spends 100 units of time in some \fBlibc\fR routine, say,
379 \fBmalloc()\fR. Suppose \fBmalloc()\fR is called only from routine \fBB\fR and
380 \fBB\fR consumes only 1 unit of time. Suppose further that routine \fBA\fR
381 consumes 10 units of time, more than any other routine in the \fBmain\fR
382 (profiled) portion of the image. In this case, \fBgprof\fR will conclude that
383 most of the time is being spent in \fBA\fR and almost no time is being spent in
384 \fBB\fR. From this it will be almost impossible to tell that the greatest
385 improvement can be made by looking at routine \fBB\fR and not routine \fBA\fR.
386 The value of the profiler in this case is severely degraded; the solution is to
387 use archives as much as possible for profiling.
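The distortion in this extreme case can be made concrete with a little arithmetic. The sketch below uses only the numbers from the example above; the helper name is hypothetical, and treating the time in \fBmalloc()\fR as part of \fBB\fR is a simplification for illustration.

```python
def percentages(times):
    """Express each entry as a percentage of the total of all entries."""
    total = sum(times.values())
    return {name: 100.0 * t / total for name, t in times.items()}


# Time actually consumed: malloc() lives in the shared libc.
actual = {"A": 10, "B": 1, "malloc": 100}

# What 32-bit gprof can sample: only the main (profiled) portion.
sampled = {name: t for name, t in actual.items() if name != "malloc"}
reported = percentages(sampled)

# What the picture would be if malloc() time were charged to its caller B.
inclusive = percentages({"A": actual["A"], "B": actual["B"] + actual["malloc"]})

for name in ("A", "B"):
    print(name, round(reported[name], 1), round(inclusive[name], 1))
```

The reported figures put roughly 91 percent of the time in \fBA\fR, while \fBB\fR together with \fBmalloc()\fR accounts for roughly 91 percent of the time actually used.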
.SH BUGS
Parents which are not themselves profiled will have the time of their profiled
392 children propagated to them, but they will appear to be spontaneously invoked
393 in the call-graph listing, and will not have their time propagated further.
394 Similarly, signal catchers, even though profiled, will appear to be spontaneous
395 (although for more obscure reasons). Any profiled children of signal catchers
396 should have their times propagated properly, unless the signal catcher was
invoked during the execution of the profiling routine, in which case all is
lost.