IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems. See 64-BIT ISSUES, below.

Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new. See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details. See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

   -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling. gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc. In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).

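As a rough illustration (not part of the perftools distribution), the
program fragment below uses only the standard malloc/realloc/free calls,
so it needs no source changes to run on tcmalloc; only the build line
differs. The file name myapp.c, the function name churn_allocations,
and the build line in the comment are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical build line (myapp.c is an assumed file name):
 *
 *   gcc -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc \
 *       -fno-builtin-free -o myapp myapp.c -ltcmalloc
 *
 * Without -ltcmalloc the same code simply uses the libc allocator.
 */

/* Runs `rounds` malloc/realloc/free cycles with sizes cycling through
 * 1..max_size bytes.  Returns the sum of the initial request sizes of
 * the cycles that completed. */
size_t churn_allocations(size_t rounds, size_t max_size)
{
    size_t total = 0;

    if (max_size == 0) {
        return 0;               /* nothing sensible to allocate */
    }
    for (size_t i = 0; i < rounds; i++) {
        size_t size = (i % max_size) + 1;

        char *p = malloc(size);
        if (p == NULL) {
            break;              /* out of memory: stop early */
        }
        memset(p, 0xAB, size);  /* touch the block so it is really used */

        char *q = realloc(p, size * 2);
        if (q == NULL) {
            free(p);            /* realloc failed; original block still valid */
            break;
        }
        q[size * 2 - 1] = 0;    /* touch the grown tail as well */
        free(q);
        total += size;
    }
    return total;
}
```

Calling churn_allocations(4, 2) from main() performs four such cycles;
when the program is linked with -ltcmalloc, those calls are served by
tcmalloc rather than by the libc allocator.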
See doc/heap-profiler.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; cf. "ENVIRONMENT VARIABLES"

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.

See doc/heap-checker.html for information about how to use tcmalloc's

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you

The heap checker is only available on Linux at this time; see INSTALL

See doc/cpu-profiler.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof <path/to/binary> /tmp/prof.out       # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out  # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; cf. "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards). Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data. You can have the child call _exit() instead to work
      around this. We hope to have a fix for both problems in the
      next release of perftools (hopefully perftools 1.2).

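The _exit() workaround can be sketched as follows. This is a minimal
illustration using only POSIX calls; the helper name run_child is
hypothetical, and the sketch does not itself start the profiler.
_exit() ends the process without running atexit() handlers or flushing
stdio buffers, which is presumably why it sidesteps the problem
described above.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that leaves via _exit() rather
 * than exit(), then wait for it in the parent.
 * Returns the child's exit code, or -1 on error. */
int run_child(int child_status)
{
    /* Flush stdio first so buffered output is not duplicated in the child. */
    fflush(NULL);

    pid_t pid = fork();
    if (pid < 0) {
        return -1;                  /* fork failed */
    }
    if (pid == 0) {
        /* Child: do child-only work here, then terminate with _exit()
         * so that no atexit() handlers inherited from the parent run. */
        _exit(child_status);
    }

    /* Parent: collect the child's exit status. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the child is going to run a different program anyway, calling an
exec()-family function immediately after fork() avoids the issue
without this pattern.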
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance. (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.) See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on. (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables. We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre>  -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>   -- turns on heap checking with strictness 'type'
CPUPROFILE=<file>  -- turns on cpu profiling and dumps data to this file.
PROFILESELECTED=1  -- if set, cpu-profiler will only profile regions of code
                      surrounded with ProfilerEnable()/ProfilerDisable().
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples.

TCMALLOC_DEBUG=<level> -- the higher the level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:

   doc/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems. However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
See README_windows.txt for details on the Windows port.

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention. The first discusses the effect of using
various malloc libraries on OpenLDAP. The second compares tcmalloc to

   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system). See the
INSTALL file for details.

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
   ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out (or delete) the line

in your config.h file before building.

There are two issues that can cause program hangs or crashes on x86_64
systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and

1) Some libcs -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order. This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The deadlock becomes more likely the more dlopen() calls an executable
makes. Most executables don't make any, though several library
routines like getgrgid() call dlopen() behind the scenes.

2) On x86_64 systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault. I'll explain the problem first, and then some

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame. Some OS
distributions, such as Fedora and gentoo 2007.0, already have added
cfi annotations to their libc. A future version of libunwind should
recognize these annotations; these systems should not see any

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE. This will profile only
those sections of the codebase. Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase. Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to