This commit was manufactured by cvs2svn to create tag 'cnrisync'.
[python/dscho.git] / Doc / libprofile.tex
blob4ba4cf979da9af63a288e91854fc0eef0be4452f
1 \chapter{The Python Profiler}
2 \stmodindex{profile}
3 \stmodindex{pstats}
5 Copyright \copyright\ 1994, by InfoSeek Corporation, all rights reserved.
7 Written by James Roskind%
8 \footnote{
9 Updated and converted to \LaTeX\ by Guido van Rossum. The references to
10 the old profiler are left in the text, although it no longer exists.
13 Permission to use, copy, modify, and distribute this Python software
14 and its associated documentation for any purpose (subject to the
15 restriction in the following sentence) without fee is hereby granted,
16 provided that the above copyright notice appears in all copies, and
17 that both that copyright notice and this permission notice appear in
18 supporting documentation, and that the name of InfoSeek not be used in
19 advertising or publicity pertaining to distribution of the software
20 without specific, written prior permission. This permission is
21 explicitly restricted to the copying and modification of the software
22 to remain in Python, compiled Python, or other languages (such as C)
23 wherein the modified or derived code is exclusively imported into a
24 Python module.
26 INFOSEEK CORPORATION DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
27 SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
28 FITNESS. IN NO EVENT SHALL INFOSEEK CORPORATION BE LIABLE FOR ANY
29 SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
30 RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
31 CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
32 CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
35 The profiler was written after only programming in Python for 3 weeks.
36 As a result, it is probably clumsy code, but I don't know for sure yet
37 'cause I'm a beginner :-). I did work hard to make the code run fast,
38 so that profiling would be a reasonable thing to do. I tried not to
39 repeat code fragments, but I'm sure I did some stuff in really awkward
40 ways at times. Please send suggestions for improvements to:
41 \code{jar@netscape.com}. I won't promise \emph{any} support. ...but
42 I'd appreciate the feedback.
45 \section{Introduction to the profiler}
46 \nodename{Profiler Introduction}
48 A \dfn{profiler} is a program that describes the run time performance
49 of a program, providing a variety of statistics. This documentation
50 describes the profiler functionality provided in the modules
51 \code{profile} and \code{pstats.} This profiler provides
52 \dfn{deterministic profiling} of any Python programs. It also
53 provides a series of report generation tools to allow users to rapidly
54 examine the results of a profile operation.
57 \section{How Is This Profiler Different From The Old Profiler?}
58 \nodename{Profiler Changes}
60 The big changes from old profiling module are that you get more
61 information, and you pay less CPU time. It's not a trade-off, it's a
62 trade-up.
64 To be specific:
66 \begin{description}
68 \item[Bugs removed:]
69 Local stack frame is no longer molested, execution time is now charged
70 to correct functions.
72 \item[Accuracy increased:]
73 Profiler execution time is no longer charged to user's code,
74 calibration for platform is supported, file reads are not done \emph{by}
75 profiler \emph{during} profiling (and charged to user's code!).
77 \item[Speed increased:]
78 Overhead CPU cost was reduced by more than a factor of two (perhaps a
79 factor of five), lightweight profiler module is all that must be
80 loaded, and the report generating module (\code{pstats}) is not needed
81 during profiling.
83 \item[Recursive functions support:]
84 Cumulative times in recursive functions are correctly calculated;
85 recursive entries are counted.
87 \item[Large growth in report generating UI:]
88 Distinct profiles runs can be added together forming a comprehensive
89 report; functions that import statistics take arbitrary lists of
90 files; sorting criteria is now based on keywords (instead of 4 integer
91 options); reports shows what functions were profiled as well as what
92 profile file was referenced; output format has been improved.
94 \end{description}
97 \section{Instant Users Manual}
99 This section is provided for users that ``don't want to read the
100 manual.'' It provides a very brief overview, and allows a user to
101 rapidly perform profiling on an existing application.
103 To profile an application with a main entry point of \samp{foo()}, you
104 would add the following to your module:
106 \begin{verbatim}
107 import profile
108 profile.run("foo()")
109 \end{verbatim}
111 The above action would cause \samp{foo()} to be run, and a series of
112 informative lines (the profile) to be printed. The above approach is
113 most useful when working with the interpreter. If you would like to
114 save the results of a profile into a file for later examination, you
115 can supply a file name as the second argument to the \code{run()}
116 function:
118 \begin{verbatim}
119 import profile
120 profile.run("foo()", 'fooprof')
121 \end{verbatim}
123 When you wish to review the profile, you should use the methods in the
124 \code{pstats} module. Typically you would load the statistics data as
125 follows:
127 \begin{verbatim}
128 import pstats
129 p = pstats.Stats('fooprof')
130 \end{verbatim}
132 The class \code{Stats} (the above code just created an instance of
133 this class) has a variety of methods for manipulating and printing the
134 data that was just read into \samp{p}. When you ran
135 \code{profile.run()} above, what was printed was the result of three
136 method calls:
138 \begin{verbatim}
139 p.strip_dirs().sort_stats(-1).print_stats()
140 \end{verbatim}
142 The first method removed the extraneous path from all the module
143 names. The second method sorted all the entries according to the
144 standard module/line/name string that is printed (this is to comply
145 with the semantics of the old profiler). The third method printed out
146 all the statistics. You might try the following sort calls:
148 \begin{verbatim}
149 p.sort_stats('name')
150 p.print_stats()
151 \end{verbatim}
153 The first call will actually sort the list by function name, and the
154 second call will print out the statistics. The following are some
155 interesting calls to experiment with:
157 \begin{verbatim}
158 p.sort_stats('cumulative').print_stats(10)
159 \end{verbatim}
161 This sorts the profile by cumulative time in a function, and then only
162 prints the ten most significant lines. If you want to understand what
163 algorithms are taking time, the above line is what you would use.
165 If you were looking to see what functions were looping a lot, and
166 taking a lot of time, you would do:
168 \begin{verbatim}
169 p.sort_stats('time').print_stats(10)
170 \end{verbatim}
172 to sort according to time spent within each function, and then print
173 the statistics for the top ten functions.
175 You might also try:
177 \begin{verbatim}
178 p.sort_stats('file').print_stats('__init__')
179 \end{verbatim}
181 This will sort all the statistics by file name, and then print out
182 statistics for only the class init methods ('cause they are spelled
183 with \code{__init__} in them). As one final example, you could try:
185 \begin{verbatim}
186 p.sort_stats('time', 'cum').print_stats(.5, 'init')
187 \end{verbatim}
189 This line sorts statistics with a primary key of time, and a secondary
190 key of cumulative time, and then prints out some of the statistics.
191 To be specific, the list is first culled down to 50\% (re: \samp{.5})
192 of its original size, then only lines containing \code{init} are
193 maintained, and that sub-sub-list is printed.
195 If you wondered what functions called the above functions, you could
196 now (\samp{p} is still sorted according to the last criteria) do:
198 \begin{verbatim}
199 p.print_callers(.5, 'init')
200 \end{verbatim}
202 and you would get a list of callers for each of the listed functions.
204 If you want more functionality, you're going to have to read the
205 manual, or guess what the following functions do:
207 \begin{verbatim}
208 p.print_callees()
209 p.add('fooprof')
210 \end{verbatim}
213 \section{What Is Deterministic Profiling?}
214 \nodename{Deterministic Profiling}
216 \dfn{Deterministic profiling} is meant to reflect the fact that all
217 \dfn{function call}, \dfn{function return}, and \dfn{exception} events
218 are monitored, and precise timings are made for the intervals between
219 these events (during which time the user's code is executing). In
220 contrast, \dfn{statistical profiling} (which is not done by this
221 module) randomly samples the effective instruction pointer, and
222 deduces where time is being spent. The latter technique traditionally
223 involves less overhead (as the code does not need to be instrumented),
224 but provides only relative indications of where time is being spent.
226 In Python, since there is an interpreter active during execution, the
227 presence of instrumented code is not required to do deterministic
228 profiling. Python automatically provides a \dfn{hook} (optional
229 callback) for each event. In addition, the interpreted nature of
230 Python tends to add so much overhead to execution, that deterministic
231 profiling tends to only add small processing overhead in typical
232 applications. The result is that deterministic profiling is not that
233 expensive, yet provides extensive run time statistics about the
234 execution of a Python program.
236 Call count statistics can be used to identify bugs in code (surprising
237 counts), and to identify possible inline-expansion points (high call
238 counts). Internal time statistics can be used to identify ``hot
239 loops'' that should be carefully optimized. Cumulative time
240 statistics should be used to identify high level errors in the
241 selection of algorithms. Note that the unusual handling of cumulative
242 times in this profiler allows statistics for recursive implementations
243 of algorithms to be directly compared to iterative implementations.
246 \section{Reference Manual}
248 \renewcommand{\indexsubitem}{(profiler function)}
250 The primary entry point for the profiler is the global function
251 \code{profile.run()}. It is typically used to create any profile
252 information. The reports are formatted and printed using methods of
253 the class \code{pstats.Stats}. The following is a description of all
254 of these standard entry points and functions. For a more in-depth
255 view of some of the code, consider reading the later section on
256 Profiler Extensions, which includes discussion of how to derive
257 ``better'' profilers from the classes presented, or reading the source
258 code for these modules.
260 \begin{funcdesc}{profile.run}{string\optional{\, filename\optional{\, ...}}}
262 This function takes a single argument that has can be passed to the
263 \code{exec} statement, and an optional file name. In all cases this
264 routine attempts to \code{exec} its first argument, and gather profiling
265 statistics from the execution. If no file name is present, then this
266 function automatically prints a simple profiling report, sorted by the
267 standard name string (file/line/function-name) that is presented in
268 each line. The following is a typical output from such a call:
270 \small{
271 \begin{verbatim}
272 main()
273 2706 function calls (2004 primitive calls) in 4.504 CPU seconds
275 Ordered by: standard name
277 ncalls tottime percall cumtime percall filename:lineno(function)
278 2 0.006 0.003 0.953 0.477 pobject.py:75(save_objects)
279 43/3 0.533 0.012 0.749 0.250 pobject.py:99(evaluate)
281 \end{verbatim}
284 The first line indicates that this profile was generated by the call:\\
285 \code{profile.run('main()')}, and hence the exec'ed string is
286 \code{'main()'}. The second line indicates that 2706 calls were
287 monitored. Of those calls, 2004 were \dfn{primitive}. We define
288 \dfn{primitive} to mean that the call was not induced via recursion.
289 The next line: \code{Ordered by:\ standard name}, indicates that
290 the text string in the far right column was used to sort the output.
291 The column headings include:
293 \begin{description}
295 \item[ncalls ]
296 for the number of calls,
298 \item[tottime ]
299 for the total time spent in the given function (and excluding time
300 made in calls to sub-functions),
302 \item[percall ]
303 is the quotient of \code{tottime} divided by \code{ncalls}
305 \item[cumtime ]
306 is the total time spent in this and all subfunctions (i.e., from
307 invocation till exit). This figure is accurate \emph{even} for recursive
308 functions.
310 \item[percall ]
311 is the quotient of \code{cumtime} divided by primitive calls
313 \item[filename:lineno(function) ]
314 provides the respective data of each function
316 \end{description}
318 When there are two numbers in the first column (e.g.: \samp{43/3}),
319 then the latter is the number of primitive calls, and the former is
320 the actual number of calls. Note that when the function does not
321 recurse, these two values are the same, and only the single figure is
322 printed.
324 \end{funcdesc}
326 \begin{funcdesc}{pstats.Stats}{filename\optional{\, ...}}
327 This class constructor creates an instance of a ``statistics object''
328 from a \var{filename} (or set of filenames). \code{Stats} objects are
329 manipulated by methods, in order to print useful reports.
331 The file selected by the above constructor must have been created by
332 the corresponding version of \code{profile}. To be specific, there is
333 \emph{NO} file compatibility guaranteed with future versions of this
334 profiler, and there is no compatibility with files produced by other
335 profilers (e.g., the old system profiler).
337 If several files are provided, all the statistics for identical
338 functions will be coalesced, so that an overall view of several
339 processes can be considered in a single report. If additional files
340 need to be combined with data in an existing \code{Stats} object, the
341 \code{add()} method can be used.
342 \end{funcdesc}
345 \subsection{The \sectcode{Stats} Class}
347 \renewcommand{\indexsubitem}{(Stats method)}
349 \begin{funcdesc}{strip_dirs}{}
350 This method for the \code{Stats} class removes all leading path information
351 from file names. It is very useful in reducing the size of the
352 printout to fit within (close to) 80 columns. This method modifies
353 the object, and the stripped information is lost. After performing a
354 strip operation, the object is considered to have its entries in a
355 ``random'' order, as it was just after object initialization and
356 loading. If \code{strip_dirs()} causes two function names to be
357 indistinguishable (i.e., they are on the same line of the same
358 filename, and have the same function name), then the statistics for
359 these two entries are accumulated into a single entry.
360 \end{funcdesc}
363 \begin{funcdesc}{add}{filename\optional{\, ...}}
364 This method of the \code{Stats} class accumulates additional profiling
365 information into the current profiling object. Its arguments should
366 refer to filenames created by the corresponding version of
367 \code{profile.run()}. Statistics for identically named (re: file,
368 line, name) functions are automatically accumulated into single
369 function statistics.
370 \end{funcdesc}
372 \begin{funcdesc}{sort_stats}{key\optional{\, ...}}
373 This method modifies the \code{Stats} object by sorting it according to the
374 supplied criteria. The argument is typically a string identifying the
375 basis of a sort (example: \code{"time"} or \code{"name"}).
377 When more than one key is provided, then additional keys are used as
378 secondary criteria when the there is equality in all keys selected
379 before them. For example, sort_stats('name', 'file') will sort all
380 the entries according to their function name, and resolve all ties
381 (identical function names) by sorting by file name.
383 Abbreviations can be used for any key names, as long as the
384 abbreviation is unambiguous. The following are the keys currently
385 defined:
387 \begin{tableii}{|l|l|}{code}{Valid Arg}{Meaning}
388 \lineii{"calls"}{call count}
389 \lineii{"cumulative"}{cumulative time}
390 \lineii{"file"}{file name}
391 \lineii{"module"}{file name}
392 \lineii{"pcalls"}{primitive call count}
393 \lineii{"line"}{line number}
394 \lineii{"name"}{function name}
395 \lineii{"nfl"}{name/file/line}
396 \lineii{"stdname"}{standard name}
397 \lineii{"time"}{internal time}
398 \end{tableii}
400 Note that all sorts on statistics are in descending order (placing
401 most time consuming items first), where as name, file, and line number
402 searches are in ascending order (i.e., alphabetical). The subtle
403 distinction between \code{"nfl"} and \code{"stdname"} is that the
404 standard name is a sort of the name as printed, which means that the
405 embedded line numbers get compared in an odd way. For example, lines
406 3, 20, and 40 would (if the file names were the same) appear in the
407 string order 20, 3 and 40. In contrast, \code{"nfl"} does a numeric
408 compare of the line numbers. In fact, \code{sort_stats("nfl")} is the
409 same as \code{sort_stats("name", "file", "line")}.
411 For compatibility with the old profiler, the numeric arguments
412 \samp{-1}, \samp{0}, \samp{1}, and \samp{2} are permitted. They are
413 interpreted as \code{"stdname"}, \code{"calls"}, \code{"time"}, and
414 \code{"cumulative"} respectively. If this old style format (numeric)
415 is used, only one sort key (the numeric key) will be used, and
416 additional arguments will be silently ignored.
417 \end{funcdesc}
420 \begin{funcdesc}{reverse_order}{}
421 This method for the \code{Stats} class reverses the ordering of the basic
422 list within the object. This method is provided primarily for
423 compatibility with the old profiler. Its utility is questionable
424 now that ascending vs descending order is properly selected based on
425 the sort key of choice.
426 \end{funcdesc}
428 \begin{funcdesc}{print_stats}{restriction\optional{\, ...}}
429 This method for the \code{Stats} class prints out a report as described
430 in the \code{profile.run()} definition.
432 The order of the printing is based on the last \code{sort_stats()}
433 operation done on the object (subject to caveats in \code{add()} and
434 \code{strip_dirs())}.
436 The arguments provided (if any) can be used to limit the list down to
437 the significant entries. Initially, the list is taken to be the
438 complete set of profiled functions. Each restriction is either an
439 integer (to select a count of lines), or a decimal fraction between
440 0.0 and 1.0 inclusive (to select a percentage of lines), or a regular
441 expression (to pattern match the standard name that is printed). If
442 several restrictions are provided, then they are applied sequentially.
443 For example:
445 \begin{verbatim}
446 print_stats(.1, "foo:")
447 \end{verbatim}
449 would first limit the printing to first 10\% of list, and then only
450 print functions that were part of filename \samp{.*foo:}. In
451 contrast, the command:
453 \begin{verbatim}
454 print_stats("foo:", .1)
455 \end{verbatim}
457 would limit the list to all functions having file names \samp{.*foo:},
458 and then proceed to only print the first 10\% of them.
459 \end{funcdesc}
462 \begin{funcdesc}{print_callers}{restrictions\optional{\, ...}}
463 This method for the \code{Stats} class prints a list of all functions
464 that called each function in the profiled database. The ordering is
465 identical to that provided by \code{print_stats()}, and the definition
466 of the restricting argument is also identical. For convenience, a
467 number is shown in parentheses after each caller to show how many
468 times this specific call was made. A second non-parenthesized number
469 is the cumulative time spent in the function at the right.
470 \end{funcdesc}
472 \begin{funcdesc}{print_callees}{restrictions\optional{\, ...}}
473 This method for the \code{Stats} class prints a list of all function
474 that were called by the indicated function. Aside from this reversal
475 of direction of calls (re: called vs was called by), the arguments and
476 ordering are identical to the \code{print_callers()} method.
477 \end{funcdesc}
479 \begin{funcdesc}{ignore}{}
480 This method of the \code{Stats} class is used to dispose of the value
481 returned by earlier methods. All standard methods in this class
482 return the instance that is being processed, so that the commands can
483 be strung together. For example:
485 \begin{verbatim}
486 pstats.Stats('foofile').strip_dirs().sort_stats('cum') \
487 .print_stats().ignore()
488 \end{verbatim}
490 would perform all the indicated functions, but it would not return
491 the final reference to the \code{Stats} instance.%
492 \footnote{
493 This was once necessary, when Python would print any unused expression
494 result that was not \code{None}. The method is still defined for
495 backward compatibility.
497 \end{funcdesc}
500 \section{Limitations}
502 There are two fundamental limitations on this profiler. The first is
503 that it relies on the Python interpreter to dispatch \dfn{call},
504 \dfn{return}, and \dfn{exception} events. Compiled C code does not
505 get interpreted, and hence is ``invisible'' to the profiler. All time
506 spent in C code (including builtin functions) will be charged to the
507 Python function that invoked the C code. If the C code calls out
508 to some native Python code, then those calls will be profiled
509 properly.
511 The second limitation has to do with accuracy of timing information.
512 There is a fundamental problem with deterministic profilers involving
513 accuracy. The most obvious restriction is that the underlying ``clock''
514 is only ticking at a rate (typically) of about .001 seconds. Hence no
515 measurements will be more accurate that that underlying clock. If
516 enough measurements are taken, then the ``error'' will tend to average
517 out. Unfortunately, removing this first error induces a second source
518 of error...
520 The second problem is that it ``takes a while'' from when an event is
521 dispatched until the profiler's call to get the time actually
522 \emph{gets} the state of the clock. Similarly, there is a certain lag
523 when exiting the profiler event handler from the time that the clock's
524 value was obtained (and then squirreled away), until the user's code
525 is once again executing. As a result, functions that are called many
526 times, or call many functions, will typically accumulate this error.
527 The error that accumulates in this fashion is typically less than the
528 accuracy of the clock (i.e., less than one clock tick), but it
529 \emph{can} accumulate and become very significant. This profiler
530 provides a means of calibrating itself for a given platform so that
531 this error can be probabilistically (i.e., on the average) removed.
532 After the profiler is calibrated, it will be more accurate (in a least
533 square sense), but it will sometimes produce negative numbers (when
534 call counts are exceptionally low, and the gods of probability work
535 against you :-). ) Do \emph{NOT} be alarmed by negative numbers in
536 the profile. They should \emph{only} appear if you have calibrated
537 your profiler, and the results are actually better than without
538 calibration.
541 \section{Calibration}
543 The profiler class has a hard coded constant that is added to each
544 event handling time to compensate for the overhead of calling the time
545 function, and socking away the results. The following procedure can
546 be used to obtain this constant for a given platform (see discussion
547 in section Limitations above).
549 \begin{verbatim}
550 import profile
551 pr = profile.Profile()
552 pr.calibrate(100)
553 pr.calibrate(100)
554 pr.calibrate(100)
555 \end{verbatim}
557 The argument to calibrate() is the number of times to try to do the
558 sample calls to get the CPU times. If your computer is \emph{very}
559 fast, you might have to do:
561 \begin{verbatim}
562 pr.calibrate(1000)
563 \end{verbatim}
565 or even:
567 \begin{verbatim}
568 pr.calibrate(10000)
569 \end{verbatim}
571 The object of this exercise is to get a fairly consistent result.
572 When you have a consistent answer, you are ready to use that number in
573 the source code. For a Sun Sparcstation 1000 running Solaris 2.3, the
574 magical number is about .00053. If you have a choice, you are better
575 off with a smaller constant, and your results will ``less often'' show
576 up as negative in profile statistics.
578 The following shows how the trace_dispatch() method in the Profile
579 class should be modified to install the calibration constant on a Sun
580 Sparcstation 1000:
582 \begin{verbatim}
583 def trace_dispatch(self, frame, event, arg):
584 t = self.timer()
585 t = t[0] + t[1] - self.t - .00053 # Calibration constant
587 if self.dispatch[event](frame,t):
588 t = self.timer()
589 self.t = t[0] + t[1]
590 else:
591 r = self.timer()
592 self.t = r[0] + r[1] - t # put back unrecorded delta
593 return
594 \end{verbatim}
596 Note that if there is no calibration constant, then the line
597 containing the callibration constant should simply say:
599 \begin{verbatim}
600 t = t[0] + t[1] - self.t # no calibration constant
601 \end{verbatim}
603 You can also achieve the same results using a derived class (and the
604 profiler will actually run equally fast!!), but the above method is
605 the simplest to use. I could have made the profiler ``self
606 calibrating'', but it would have made the initialization of the
607 profiler class slower, and would have required some \emph{very} fancy
608 coding, or else the use of a variable where the constant \samp{.00053}
609 was placed in the code shown. This is a \strong{VERY} critical
610 performance section, and there is no reason to use a variable lookup
611 at this point, when a constant can be used.
614 \section{Extensions --- Deriving Better Profilers}
615 \nodename{Profiler Extensions}
617 The \code{Profile} class of module \code{profile} was written so that
618 derived classes could be developed to extend the profiler. Rather
619 than describing all the details of such an effort, I'll just present
620 the following two examples of derived classes that can be used to do
621 profiling. If the reader is an avid Python programmer, then it should
622 be possible to use these as a model and create similar (and perchance
623 better) profile classes.
625 If all you want to do is change how the timer is called, or which
626 timer function is used, then the basic class has an option for that in
627 the constructor for the class. Consider passing the name of a
628 function to call into the constructor:
630 \begin{verbatim}
631 pr = profile.Profile(your_time_func)
632 \end{verbatim}
634 The resulting profiler will call \code{your_time_func()} instead of
635 \code{os.times()}. The function should return either a single number
636 or a list of numbers (like what \code{os.times()} returns). If the
637 function returns a single time number, or the list of returned numbers
638 has length 2, then you will get an especially fast version of the
639 dispatch routine.
641 Be warned that you \emph{should} calibrate the profiler class for the
642 timer function that you choose. For most machines, a timer that
643 returns a lone integer value will provide the best results in terms of
644 low overhead during profiling. (os.times is \emph{pretty} bad, 'cause
645 it returns a tuple of floating point values, so all arithmetic is
646 floating point in the profiler!). If you want to substitute a
647 better timer in the cleanest fashion, you should derive a class, and
648 simply put in the replacement dispatch method that better handles your
649 timer call, along with the appropriate calibration constant :-).
652 \subsection{OldProfile Class}
654 The following derived profiler simulates the old style profiler,
655 providing errant results on recursive functions. The reason for the
656 usefulness of this profiler is that it runs faster (i.e., less
657 overhead) than the old profiler. It still creates all the caller
658 stats, and is quite useful when there is \emph{no} recursion in the
659 user's code. It is also a lot more accurate than the old profiler, as
660 it does not charge all its overhead time to the user's code.
662 \begin{verbatim}
663 class OldProfile(Profile):
665 def trace_dispatch_exception(self, frame, t):
666 rt, rtt, rct, rfn, rframe, rcur = self.cur
667 if rcur and not rframe is frame:
668 return self.trace_dispatch_return(rframe, t)
669 return 0
671 def trace_dispatch_call(self, frame, t):
672 fn = `frame.f_code`
674 self.cur = (t, 0, 0, fn, frame, self.cur)
675 if self.timings.has_key(fn):
676 tt, ct, callers = self.timings[fn]
677 self.timings[fn] = tt, ct, callers
678 else:
679 self.timings[fn] = 0, 0, {}
680 return 1
682 def trace_dispatch_return(self, frame, t):
683 rt, rtt, rct, rfn, frame, rcur = self.cur
684 rtt = rtt + t
685 sft = rtt + rct
687 pt, ptt, pct, pfn, pframe, pcur = rcur
688 self.cur = pt, ptt+rt, pct+sft, pfn, pframe, pcur
690 tt, ct, callers = self.timings[rfn]
691 if callers.has_key(pfn):
692 callers[pfn] = callers[pfn] + 1
693 else:
694 callers[pfn] = 1
695 self.timings[rfn] = tt+rtt, ct + sft, callers
697 return 1
700 def snapshot_stats(self):
701 self.stats = {}
702 for func in self.timings.keys():
703 tt, ct, callers = self.timings[func]
704 nor_func = self.func_normalize(func)
705 nor_callers = {}
706 nc = 0
707 for func_caller in callers.keys():
708 nor_callers[self.func_normalize(func_caller)]=\
709 callers[func_caller]
710 nc = nc + callers[func_caller]
711 self.stats[nor_func] = nc, nc, tt, ct, nor_callers
712 \end{verbatim}
715 \subsection{HotProfile Class}
717 This profiler is the fastest derived profile example. It does not
718 calculate caller-callee relationships, and does not calculate
719 cumulative time under a function. It only calculates time spent in a
720 function, so it runs very quickly (re: very low overhead). In truth,
721 the basic profiler is so fast, that is probably not worth the savings
722 to give up the data, but this class still provides a nice example.
724 \begin{verbatim}
725 class HotProfile(Profile):
727 def trace_dispatch_exception(self, frame, t):
728 rt, rtt, rfn, rframe, rcur = self.cur
729 if rcur and not rframe is frame:
730 return self.trace_dispatch_return(rframe, t)
731 return 0
733 def trace_dispatch_call(self, frame, t):
734 self.cur = (t, 0, frame, self.cur)
735 return 1
737 def trace_dispatch_return(self, frame, t):
738 rt, rtt, frame, rcur = self.cur
740 rfn = `frame.f_code`
742 pt, ptt, pframe, pcur = rcur
743 self.cur = pt, ptt+rt, pframe, pcur
745 if self.timings.has_key(rfn):
746 nc, tt = self.timings[rfn]
747 self.timings[rfn] = nc + 1, rt + rtt + tt
748 else:
749 self.timings[rfn] = 1, rt + rtt
751 return 1
754 def snapshot_stats(self):
755 self.stats = {}
756 for func in self.timings.keys():
757 nc, tt = self.timings[func]
758 nor_func = self.func_normalize(func)
759 self.stats[nor_func] = nc, nc, tt, 0, {}
760 \end{verbatim}