.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" The contents of this file are subject to the terms of the Common Development and Distribution License (the "License"). You may not use this file except in compliance with the License.
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing. See the License for the specific language governing permissions and limitations under the License.
.\" When distributing Covered Code, include this CDDL HEADER in each file and include the License file at usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your own identifying information: Portions Copyright [yyyy] [name of copyright owner]
.TH CPC_BIND_CURLWP 3CPC "Mar 05, 2007"
.SH NAME
cpc_bind_curlwp, cpc_bind_pctx, cpc_bind_cpu, cpc_unbind, cpc_request_preset,
cpc_set_restart \- bind request sets to hardware counters
.SH SYNOPSIS
.LP
.nf
cc [ \fIflag\fR\&.\|.\|. ] \fIfile\fR\&.\|.\|. \fB-lcpc\fR [ \fIlibrary\fR\&.\|.\|. ]
#include <libcpc.h>

\fBint\fR \fBcpc_bind_curlwp\fR(\fBcpc_t *\fR\fIcpc\fR, \fBcpc_set_t *\fR\fIset\fR, \fBuint_t\fR \fIflags\fR);

\fBint\fR \fBcpc_bind_pctx\fR(\fBcpc_t *\fR\fIcpc\fR, \fBpctx_t *\fR\fIpctx\fR, \fBid_t\fR \fIid\fR, \fBcpc_set_t *\fR\fIset\fR,
     \fBuint_t\fR \fIflags\fR);

\fBint\fR \fBcpc_bind_cpu\fR(\fBcpc_t *\fR\fIcpc\fR, \fBprocessorid_t\fR \fIid\fR, \fBcpc_set_t *\fR\fIset\fR,
     \fBuint_t\fR \fIflags\fR);

\fBint\fR \fBcpc_unbind\fR(\fBcpc_t *\fR\fIcpc\fR, \fBcpc_set_t *\fR\fIset\fR);

\fBint\fR \fBcpc_request_preset\fR(\fBcpc_t *\fR\fIcpc\fR, \fBint\fR \fIindex\fR, \fBuint64_t\fR \fIpreset\fR);

\fBint\fR \fBcpc_set_restart\fR(\fBcpc_t *\fR\fIcpc\fR, \fBcpc_set_t *\fR\fIset\fR);
.fi
.SH DESCRIPTION
.LP
These functions program the processor's hardware counters according to the
requests contained in the \fIset\fR argument. If these functions are
successful, then upon return the physical counters will have been assigned to
count events on behalf of each request in the set, and each counter will be
enabled as configured.
.LP
The \fBcpc_bind_curlwp()\fR function binds the set to the calling \fBLWP\fR. If
successful, a performance counter context is associated with the \fBLWP\fR that
allows the system to virtualize the hardware counters to that specific
\fBLWP\fR.
.LP
By default, the system binds the set to the current \fBLWP\fR only. If the
\fBCPC_BIND_LWP_INHERIT\fR flag is present in the \fIflags\fR argument,
however, any subsequent \fBLWP\fRs created by the current \fBLWP\fR will
inherit a copy of the request set. The newly created \fBLWP\fR will have its
virtualized 64-bit counters initialized to the preset values specified in
\fIset\fR, and the counters will be enabled and begin counting events on behalf
of the new \fBLWP\fR. This automatic inheritance behavior can be useful when
dealing with multithreaded programs to determine aggregate statistics for the
program as a whole.
.LP
If the \fBCPC_BIND_LWP_INHERIT\fR flag is specified and any of the requests in
the set have the \fBCPC_OVF_NOTIFY_EMT\fR flag set, a \fBSIGEMT\fR signal is
dispatched immediately to the freshly created \fBLWP\fR so that it can preset
its counters appropriately. This
initialization condition can be detected using \fBcpc_set_sample\fR(3CPC) and
looking at the counter value for any requests with \fBCPC_OVF_NOTIFY_EMT\fR
set. The value of any such counters will be \fBUINT64_MAX\fR.
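.LP
The check described above can be sketched as follows. This is only an
illustration: the helper name \fBis_inherit_init_signal()\fR and its array
arguments are hypothetical and not part of \fBlibcpc\fR(3LIB). It assumes the
caller has already copied the sampled counter values into \fIvals\fR and
recorded in \fInotify\fR which requests carry \fBCPC_OVF_NOTIFY_EMT\fR.
.in +2
.nf
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a libcpc function. Returns 1 if the sampled
 * values look like the initialization condition on an inheriting LWP:
 * a request with the overflow-notify flag whose counter reads UINT64_MAX.
 * vals[i] is the sampled value of request i; notify[i] is nonzero if
 * request i was added with CPC_OVF_NOTIFY_EMT.
 */
static int
is_inherit_init_signal(const uint64_t *vals, const int *notify, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (notify[i] && vals[i] == UINT64_MAX)
            return (1);
    }
    return (0);
}
```
.fi
.in -2
.LP
A \fBSIGEMT\fR handler could call such a helper right after sampling and
treat a positive result as the initial signal, not a genuine overflow.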
.LP
The \fBcpc_bind_pctx()\fR function binds the set to the \fBLWP\fR specified by
the \fIpctx\fR-\fIid\fR pair, where \fIpctx\fR refers to a handle returned from
\fBlibpctx\fR and \fIid\fR is the ID of the desired \fBLWP\fR in the target
process. If successful, a performance counter context is associated with the
specified \fBLWP\fR and the system virtualizes the hardware counters to that
specific \fBLWP\fR. The \fIflags\fR argument is reserved for future use and
must always be \fB0\fR.
.LP
The \fBcpc_bind_cpu()\fR function binds the set to the specified CPU and
measures events occurring on that CPU regardless of which \fBLWP\fR is running.
Only one such binding can be active on the specified CPU at a time. As long as
any application has bound a set to a CPU, per-\fBLWP\fR counters are
unavailable and any attempt to use either \fBcpc_bind_curlwp()\fR or
\fBcpc_bind_pctx()\fR returns \fBEAGAIN\fR. The first invocation of
\fBcpc_bind_cpu()\fR invalidates all currently bound per-\fBLWP\fR counter
sets, and any attempt to sample an invalidated set returns \fBEAGAIN\fR. To
bind to a CPU, the library binds the calling \fBLWP\fR to the measured CPU with
\fBprocessor_bind\fR(2). The application must not change its processor binding
until after it has unbound the set with \fBcpc_unbind()\fR. The \fIflags\fR
argument is reserved for future use and must always be \fB0\fR.
.LP
The \fBcpc_request_preset()\fR function updates the preset and current value
stored in the indexed request within the currently bound set, thereby changing
the starting value for the specified request for the calling \fBLWP\fR only.
The new value takes effect at the next call to \fBcpc_set_restart()\fR.
.LP
When a performance counter counting on behalf of a request with the
\fBCPC_OVF_NOTIFY_EMT\fR flag set overflows, the performance counters are
frozen and the \fBLWP\fR to which the set is bound receives a \fBSIGEMT\fR
signal. The \fBcpc_set_restart()\fR function can be called from a \fBSIGEMT\fR
signal handler function to quickly restart the hardware counters. Counting
begins from each request's original preset (see
\fBcpc_set_add_request\fR(3CPC)), or from the preset specified in a prior call
to \fBcpc_request_preset()\fR. Applications performing performance counter
overflow profiling should use the \fBcpc_set_restart()\fR function to quickly
restart counting after receiving a \fBSIGEMT\fR overflow signal and recording
any relevant program state.
.LP
The \fBcpc_unbind()\fR function unbinds the set from the resource to which it
is bound. All hardware resources associated with the bound set are freed and if
the set was bound to a CPU, the calling \fBLWP\fR is unbound from the
corresponding CPU. See \fBprocessor_bind\fR(2).
.SH RETURN VALUES
.LP
Upon successful completion, these functions return 0. Otherwise, -1 is returned
and \fBerrno\fR is set to indicate the error.
.SH ERRORS
.LP
Applications wanting to get detailed error values should register an error
handler with \fBcpc_seterrhndlr\fR(3CPC). Otherwise, the library will output a
specific error description to \fBstderr\fR.
.LP
These functions will fail if:
.sp
For \fBcpc_bind_curlwp()\fR, the system has Pentium 4 processors with
HyperThreading and at least one physical processor has more than one hardware
thread online. See NOTES.
.sp
For \fBcpc_bind_cpu()\fR, the process does not have the \fIcpc_cpu\fR privilege
to access the CPU's counters.
.sp
For \fBcpc_bind_curlwp()\fR, \fBcpc_bind_cpu()\fR, and \fBcpc_bind_pctx()\fR,
access to the requested hypervisor event was denied.
.sp
For \fBcpc_bind_curlwp()\fR and \fBcpc_bind_pctx()\fR, the performance counters
are not available for use by the application.
.sp
For \fBcpc_bind_cpu()\fR, another process has already bound to this CPU. Only
one process is allowed to bind to a CPU at a time and only one set can be bound
to a CPU at a time.
.sp
The set does not contain any requests or \fBcpc_set_add_request()\fR was not
called.
.sp
The value given for an attribute of a request is out of range.
.sp
The system could not assign a physical counter to each request in the set.
.sp
One or more requests in the set conflict and might not be programmed
simultaneously.
.sp
The \fIset\fR was not created with the same \fIcpc\fR handle.
.sp
For \fBcpc_bind_cpu()\fR, the specified processor does not exist.
.sp
For \fBcpc_unbind()\fR, the set is not bound.
.sp
For \fBcpc_request_preset()\fR and \fBcpc_set_restart()\fR, the calling
\fBLWP\fR does not have a bound set.
.sp
For \fBcpc_bind_cpu()\fR, the specified processor is not online.
.sp
The \fBcpc_bind_curlwp()\fR function was called with the
\fBCPC_OVF_NOTIFY_EMT\fR flag, but the underlying processor is not capable of
detecting counter overflow.
.sp
For \fBcpc_bind_pctx()\fR, the specified \fBLWP\fR in the target process does
not exist.
.SH EXAMPLES
.LP
\fBExample 1 \fRUse hardware performance counters to measure events in a
process.
.LP
The following example demonstrates how a standalone application can be
instrumented with the \fBlibcpc\fR(3LIB) functions to use hardware performance
counters to measure events in a process. The application performs 20 iterations
of a computation, measuring the counter values for each iteration. By default,
the example makes use of two counters to measure external cache references and
external cache hits. These options are only appropriate for UltraSPARC
processors. By setting the EVENT0 and EVENT1 environment variables to other
strings (a list of which can be obtained from the \fB-h\fR option of the
\fBcpustat\fR(1M) or \fBcputrack\fR(1) utilities), other events can be counted.
The \fBerror()\fR routine is assumed to be a user-provided routine analogous to
the familiar \fBprintf\fR(3C) function from the C library that also performs an
\fBexit\fR(2) after printing the message.
.in +2
.nf
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/lwp.h>
#include <libcpc.h>

int
main(int argc, char *argv[])
{
    int iter;
    char *event0 = NULL, *event1 = NULL;
    cpc_t *cpc;
    cpc_set_t *set;
    cpc_buf_t *diff, *after, *before;
    int ind0, ind1;
    uint64_t val0, val1;

    if ((cpc = cpc_open(CPC_VER_CURRENT)) == NULL)
        error("perf counters unavailable: %s", strerror(errno));
    /* Assumed UltraSPARC defaults: external cache references and hits. */
    if ((event0 = getenv("EVENT0")) == NULL)
        event0 = "EC_ref";
    if ((event1 = getenv("EVENT1")) == NULL)
        event1 = "EC_hit";
    if ((set = cpc_set_create(cpc)) == NULL)
        error("could not create set: %s", strerror(errno));
    if ((ind0 = cpc_set_add_request(cpc, set, event0, 0, CPC_COUNT_USER, 0,
        NULL)) == -1)
        error("could not add first request: %s", strerror(errno));
    if ((ind1 = cpc_set_add_request(cpc, set, event1, 0, CPC_COUNT_USER, 0,
        NULL)) == -1)
        error("could not add second request: %s", strerror(errno));
    if ((diff = cpc_buf_create(cpc, set)) == NULL)
        error("could not create buffer: %s", strerror(errno));
    if ((after = cpc_buf_create(cpc, set)) == NULL)
        error("could not create buffer: %s", strerror(errno));
    if ((before = cpc_buf_create(cpc, set)) == NULL)
        error("could not create buffer: %s", strerror(errno));
    if (cpc_bind_curlwp(cpc, set, 0) == -1)
        error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

    for (iter = 1; iter <= 20; iter++) {
        if (cpc_set_sample(cpc, set, before) == -1)
            break;
        /* ==> Computation to be measured goes here <== */
        if (cpc_set_sample(cpc, set, after) == -1)
            break;
        /* diff = after - before, then read each request's delta */
        cpc_buf_sub(cpc, diff, after, before);
        cpc_buf_get(cpc, diff, ind0, &val0);
        cpc_buf_get(cpc, diff, ind1, &val1);
        (void) printf("%3d: %" PRId64 " %" PRId64 "\en", iter,
            val0, val1);
    }
    /* Leaving the loop early means a sample failed. */
    if (iter <= 20)
        error("cannot sample set: %s", strerror(errno));
    cpc_close(cpc);
    return (0);
}
.fi
.in -2
.LP
\fBExample 2 \fRWrite a signal handler to catch overflow signals.
.LP
The following example builds on Example 1 and demonstrates how to write the
signal handler to catch overflow signals. A counter is preset so that it is
1000 counts short of overflowing. After 1000 counts the signal handler is
invoked.
.LP
The signal handler:
.in +2
.nf
/*
 * cpc, set, buf, and index are assumed to be file-scope variables
 * initialized by the setup code, which also defines PRESET.
 */
void
emt_handler(int sig, siginfo_t *sip, void *arg)
{
    ucontext_t *uap = arg;
    uint64_t val;

    if (sig != SIGEMT || sip->si_code != EMT_CPCOVF) {
        psignal(sig, "example");
        psiginfo(sip, "example");
        return;
    }

    (void) printf("lwp%d - si_addr %p ucontext: %%pc %p %%sp %p\en",
        _lwp_self(), (void *)sip->si_addr,
        (void *)uap->uc_mcontext.gregs[PC],
        (void *)uap->uc_mcontext.gregs[SP]);

    if (cpc_set_sample(cpc, set, buf) != 0)
        error("cannot sample: %s", strerror(errno));

    cpc_buf_get(cpc, buf, index, &val);

    (void) printf("0x%" PRIx64"\en", val);
    (void) fflush(stdout);

    /*
     * Update a request's preset and restart the counters. Counters which
     * have not been preset with cpc_request_preset() will resume counting
     * from their current value.
     */
    if (cpc_request_preset(cpc, index, PRESET) != 0)
        error("cannot set preset for request %d: %s", index,
            strerror(errno));
    if (cpc_set_restart(cpc, set) != 0)
        error("cannot restart lwp%d: %s", _lwp_self(), strerror(errno));
}
.fi
.in -2
.LP
The setup code, which can be positioned after the code that opens the CPC
library and creates a set:
.in +2
.nf
#define PRESET (UINT64_MAX - 999ull)

struct sigaction act;

act.sa_sigaction = emt_handler;
bzero(&act.sa_mask, sizeof (act.sa_mask));
act.sa_flags = SA_RESTART|SA_SIGINFO;
if (sigaction(SIGEMT, &act, NULL) == -1)
    error("sigaction: %s", strerror(errno));

/* cpc_set_add_request() returns the request index, or -1 on failure. */
if ((index = cpc_set_add_request(cpc, set, event, PRESET,
    CPC_COUNT_USER | CPC_OVF_NOTIFY_EMT, 0, NULL)) == -1)
    error("cannot add request to set: %s", strerror(errno));

if ((buf = cpc_buf_create(cpc, set)) == NULL)
    error("cannot create buffer: %s", strerror(errno));

if (cpc_bind_curlwp(cpc, set, 0) == -1)
    error("cannot bind lwp%d: %s", _lwp_self(), strerror(errno));

for (iter = 1; iter <= 20; iter++) {
    /* ==> Computation to be measured goes here <== */
}

cpc_unbind(cpc, set);      /* done */
.fi
.in -2
.SH ATTRIBUTES
.LP
See \fBattributes\fR(5) for descriptions of the following attributes:
.sp
.TS
box;
c | c
l | l .
ATTRIBUTE TYPE	ATTRIBUTE VALUE
_
Interface Stability	Evolving
.TE
.SH SEE ALSO
.LP
\fBcpustat\fR(1M), \fBcputrack\fR(1), \fBpsrinfo\fR(1M),
\fBprocessor_bind\fR(2), \fBcpc_seterrhndlr\fR(3CPC),
\fBcpc_set_sample\fR(3CPC), \fBlibcpc\fR(3LIB), \fBattributes\fR(5)
.SH NOTES
.LP
When a set is bound, the system assigns a physical hardware counter to count on
behalf of each request in the set. If such an assignment is not possible for
all requests in the set, the bind function returns -1 and sets \fBerrno\fR to
\fBEINVAL\fR. The assignment of requests to counters depends on the
capabilities of the available counters. Some processors (such as Pentium 4)
have a complicated counter control mechanism that requires the reservation of
limited hardware resources beyond the actual counters. It could occur that two
requests for different events might be impossible to count at the same time due
to these limited hardware resources. See the processor manual as referenced by
\fBcpc_cpuref\fR(3CPC) for details about the underlying processor's
capabilities and limitations.
.LP
Some processors can be configured to dispatch an interrupt when a physical
counter overflows. The most obvious use for this facility is to ensure that the
full 64-bit counter values are maintained without repeated sampling. Certain
hardware, such as the UltraSPARC processor, does not record which counter
overflowed. A more subtle use for this facility is to preset the counter to a
value slightly less than the maximum value, then use the resulting interrupt to
catch the counter overflow associated with that event. The overflow can then be
used as an indication of the frequency of the occurrence of that event.
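.LP
As a rough illustration of using overflows as a frequency indicator, the
following sketch converts a count of observed overflow signals into an
approximate event total. The helper name \fBestimate_events()\fR is
hypothetical and not part of \fBlibcpc\fR(3LIB). It assumes the counter is
re-armed with the same interval after every overflow, and it ignores any
events counted since the last overflow.
.in +2
.nf
```c
#include <stdint.h>

/*
 * Hypothetical helper, not a libcpc function. If a counter is preset so
 * that it overflows every `interval` events, then `overflows` signals
 * correspond to roughly overflows * interval events. The result
 * saturates at UINT64_MAX instead of wrapping around.
 */
static uint64_t
estimate_events(uint64_t overflows, uint64_t interval)
{
    if (interval != 0 && overflows > UINT64_MAX / interval)
        return (UINT64_MAX);
    return (overflows * interval);
}
```
.fi
.in -2
.LP
With the interval of 1000 used in Example 2, five overflow signals would
suggest about 5000 events.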
.LP
The interrupt generated by the processor might not be particularly precise.
That is, the particular instruction that caused the counter overflow might be
earlier in the instruction stream than is indicated by the program counter
value in the ucontext.
.LP
When a request is added to a set with the \fBCPC_OVF_NOTIFY_EMT\fR flag set,
then as before, the control registers and counter are preset from the 64-bit
preset value given. When the flag is set, however, the kernel arranges to send
the calling process a \fBSIGEMT\fR signal when the overflow occurs. The
\fBsi_code\fR member of the corresponding \fBsiginfo\fR structure is set to
\fBEMT_CPCOVF\fR and the \fBsi_addr\fR member takes the program counter value
at the time the overflow interrupt was delivered. Counting is disabled until
the set is bound again.
.LP
If the \fBCPC_CAP_OVERFLOW_PRECISE\fR bit is set in the value returned by
\fBcpc_caps\fR(3CPC), the processor is able to determine precisely which
counter has overflowed after receiving the overflow interrupt. On such
processors, the \fBSIGEMT\fR signal is sent only if a counter overflows and the
request that the counter is counting has the \fBCPC_OVF_NOTIFY_EMT\fR flag set.
If the capability is not present on the processor, the system sends a
\fBSIGEMT\fR signal to the process if any of its requests have the
\fBCPC_OVF_NOTIFY_EMT\fR flag set and any counter in its set overflows.
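.LP
The two delivery rules above can be modeled with a small predicate. This is
a sketch of the documented behavior only, not the kernel's implementation.
The function name \fBsigemt_expected()\fR and its arguments are hypothetical:
\fIprecise\fR stands for the \fBCPC_CAP_OVERFLOW_PRECISE\fR capability,
\fIoverflowed\fR marks which counters overflowed, and \fInotify\fR marks
which requests carry \fBCPC_OVF_NOTIFY_EMT\fR.
.in +2
.nf
```c
#include <stddef.h>

/*
 * Hypothetical model, not a libcpc function. Returns 1 if a SIGEMT
 * would be expected under the rules described in the text.
 */
static int
sigemt_expected(int precise, const int *overflowed, const int *notify,
    size_t n)
{
    size_t i;
    int any_ovf = 0, any_notify = 0;

    for (i = 0; i < n; i++) {
        /* Precise hardware: the overflowing request itself must notify. */
        if (precise && overflowed[i] && notify[i])
            return (1);
        any_ovf |= (overflowed[i] != 0);
        any_notify |= (notify[i] != 0);
    }
    /* Imprecise hardware: any overflow plus any notifying request. */
    return (!precise && any_ovf && any_notify);
}
```
.fi
.in -2
.LP
The practical consequence is that, without the capability, a handler can be
invoked for an overflow on a counter whose request did not ask for
notification, so the handler should be prepared for that case.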
.LP
Different processors have different counter ranges available, though all
processors supported by Solaris allow at least 31 bits to be specified as a
counter preset value. Portable preset values lie in the range \fBUINT64_MAX\fR
to \fBUINT64_MAX\fR-\fBINT32_MAX\fR.
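.LP
A preset within the portable range can be derived from the desired number of
events between overflows, as in the sketch below. The helper name
\fBoverflow_preset()\fR is hypothetical and not part of \fBlibcpc\fR(3LIB).
It assumes that a counter overflows on the event that takes it past
\fBUINT64_MAX\fR, which matches the preset of \fBUINT64_MAX\fR - 999 used for
1000 counts in Example 2.
.in +2
.nf
```c
#include <stdint.h>

/*
 * Hypothetical helper, not a libcpc function. Computes a preset that
 * makes a counter overflow after `interval` events. Returns 0 and
 * stores the preset on success, or -1 if the interval is zero or the
 * preset would fall outside the portable range
 * [UINT64_MAX - INT32_MAX, UINT64_MAX].
 */
static int
overflow_preset(uint64_t interval, uint64_t *preset)
{
    if (interval == 0 || interval - 1 > (uint64_t)INT32_MAX)
        return (-1);
    *preset = UINT64_MAX - (interval - 1);
    return (0);
}
```
.fi
.in -2
.LP
The resulting value could be passed as the preset argument when the request
is added to the set, or later to \fBcpc_request_preset()\fR.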
.LP
The appropriate preset value will often need to be determined experimentally.
Typically, this value will depend on the event being measured as well as the
desire to minimize the impact of the act of measurement on the event being
measured. Less frequent interrupts and samples lead to less perturbation of the
system.
.LP
If the processor cannot detect counter overflow, bind will fail and return
\fBENOTSUP\fR. Only user events can be measured using this technique. See
.LP
Most Pentium 4 events require the specification of an event mask for counting.
The event mask is specified with the \fIemask\fR attribute.
.LP
Pentium 4 processors with HyperThreading Technology have only one set of
hardware counters per physical processor. To use \fBcpc_bind_curlwp()\fR or
\fBcpc_bind_pctx()\fR to measure per-\fBLWP\fR events on a system with Pentium
4 HT processors, a system administrator must first take processors in the
system offline until each physical processor has only one hardware thread
online (see the \fB-p\fR option to \fBpsrinfo\fR(1M)). If a second hardware
thread is brought online, all per-\fBLWP\fR bound contexts will be invalidated
and any attempt to sample or bind a CPC set will return \fBEAGAIN\fR.
.LP
Only one CPC set at a time can be bound to a physical processor with
\fBcpc_bind_cpu()\fR. Any call to \fBcpc_bind_cpu()\fR that attempts to bind a
set to a processor that shares a physical processor with a processor that
already has a CPU-bound set returns an error.
.LP
To measure the shared state on a Pentium 4 processor with HyperThreading, the
\fIcount_sibling_usr\fR and \fIcount_sibling_sys\fR attributes are provided for
use with \fBcpc_bind_cpu()\fR. These attributes behave exactly as the
\fBCPC_COUNT_USER\fR and \fBCPC_COUNT_SYSTEM\fR request flags, except that they
act on the sibling hardware thread sharing the physical processor with the CPU
measured by \fBcpc_bind_cpu()\fR. Some CPC sets will fail to bind due to
resource constraints. The most common type of resource constraint is an ESCR
conflict among one or more requests in the set. For example, the
\fIbranch_retired\fR event cannot be measured on counters 12 and 13
simultaneously because both
counters require the \fBCRU_ESCR2\fR ESCR to measure this event. To measure
\fIbranch_retired\fR events simultaneously on more than one counter, use
counters such that one counter uses \fBCRU_ESCR2\fR and the other counter uses
\fBCRU_ESCR3\fR. See the processor documentation for details.