STAPPROBES(5) STAPPROBES(5)
NAME
stapprobes - systemtap probe points
DESCRIPTION
The following sections enumerate the variety of probe points supported
by the systemtap translator, and additional aliases defined by standard
tapset scripts.
The general probe point syntax is a dotted-symbol sequence. This
allows a breakdown of the event namespace into parts, somewhat like the
Domain Name System does on the Internet. Each component identifier may
be parametrized by a string or number literal, with a syntax like a
function call. A component may include a "*" character, to expand to a
set of matching probe points. Probe aliases likewise expand to other
probe points. Each resulting probe point is normally resolved to some
low-level system instrumentation facility (e.g., a kprobe address,
marker, or a timer configuration); otherwise the elaboration phase
will fail.
However, a probe point may be followed by a "?" character, to indicate
that it is optional, and that no error should result if it fails to
resolve. Optionalness passes down through all levels of alias/wildcard
expansion. Alternately, a probe point may be followed by a "!"
character, to indicate that it is both optional and sufficient. (Think
vaguely of the Prolog cut operator.) If it does resolve, then no
further probe points in the same comma-separated list will be resolved.
Therefore, the "!" sufficiency mark only makes sense in a list of
probe point alternatives.
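As a hedged sketch of how the two suffixes might be combined: the
function names do_sys_open and sys_open are assumptions about the
target kernel, and execname() is assumed to come from the standard
tapsets.

```systemtap
# Alternatives are tried left to right. If do_sys_open resolves, the "!"
# makes it sufficient and sys_open is not resolved at all.
probe kernel.function("do_sys_open") !,
      kernel.function("sys_open")
{
  printf("open path entered by %s\n", execname())
}

# Optional: no error is raised if this function does not exist.
probe kernel.function("no_such_function") ?
{
  printf("unexpected hit\n")
}
```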
Additionally, a probe point may be followed by an "if (expr)" statement,
in order to enable/disable the probe point on-the-fly. With the "if"
statement, if the "expr" is false when the probe point is hit, the
whole probe body, including any alias bodies, is skipped. The condition
is stacked up through all levels of alias/wildcard expansion, so the
final condition becomes the logical-and of the conditions of all
expanded aliases/wildcards.
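A minimal sketch of a conditional probe, assuming a script-level global
named "enabled" (a hypothetical name) and the syscall.open alias and
execname() function from the standard tapsets:

```systemtap
global enabled = 0

# Toggle the condition every five seconds.
probe timer.s(5) { enabled = !enabled }

# The handler body is skipped whenever "enabled" is zero.
probe syscall.open if (enabled)
{
  printf("%s opened %s\n", execname(), filename)
}
```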
These are all syntactically valid probe points:
kernel.function("foo").return
syscall(22)
user.inode("/bin/vi").statement(0x2222)
end
syscall.*
kernel.function("no_such_function") ?
module("awol").function("no_such_function") !
signal.*? if (switch)
Probes may be broadly classified into "synchronous" and "asynchronous".
A "synchronous" event is deemed to occur when any processor executes an
instruction matched by the specification. This gives these probes a
reference point (instruction address) from which more contextual data
may be available. Other families of probe points refer to
"asynchronous" events such as timers/counters rolling over, where there
is no fixed reference point that is related. Each probe point
specification may match multiple locations (for example, using
wildcards or aliases), and all of them are then probed. A probe
declaration may also contain several comma-separated specifications,
all of which are probed.
BEGIN/END/ERROR
The probe points begin and end are defined by the translator to refer
to the time of session startup and shutdown. All "begin" probe
handlers are run, in some sequence, during the startup of the session.
All global variables will have been initialized prior to this point.
All "end" probes are run, in some sequence, during the normal shutdown
of a session, such as in the aftermath of an exit() function call, or
an interruption from the user. In the case of an error-triggered
shutdown, "end" probes are not run. There are no target variables
available in either context.
If the order of execution among "begin" or "end" probes is significant,
then an optional sequence number may be provided:
begin(N)
end(N)
The number N may be positive or negative. The probe handlers are run
in increasing order, and the order between handlers with the same
sequence number is unspecified. When "begin" or "end" are given
without a sequence, they are effectively sequence zero.
The error probe point is similar to the end probe, except that each
such probe handler runs when the session ends after errors have
occurred. In such cases, "end" probes are skipped, but each "error"
probe is still attempted. This kind of probe can be used to clean up
or emit a "final gasp". It may also be numerically parametrized to set
a sequence.
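The ordering rules above might be exercised with a sketch like the
following. The sequence numbers and messages are arbitrary
illustrations, not values taken from any tapset.

```systemtap
probe begin(-10) { printf("runs first among begin probes\n") }
probe begin      { printf("sequence zero\n") }
probe begin(10)  { printf("runs last among begin probes\n"); exit() }

probe end(-1)    { printf("normal shutdown, early end handler\n") }
probe end        { printf("normal shutdown, sequence zero\n") }

# Runs in place of the end probes if the session stops because of errors.
probe error      { printf("session ended after errors\n") }
```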
NEVER
The probe point never is specially defined by the translator to mean
"never". Its probe handler is never run, though its statements are
analyzed for symbol / type correctness as usual. This probe point may
be useful in conjunction with optional probes.
SYSCALL
The syscall.* aliases define several hundred probes, too many to
summarize here. They are:
syscall.NAME
syscall.NAME.return
Generally, two probes are defined for each normal system call as listed
in the syscalls(2) manual page, one for entry and one for return.
Those system calls that never return do not have a corresponding
.return probe.
Each probe alias defines a variety of variables. Looking at the tapset
source code is the most reliable way to find out which ones. Generally,
each variable listed in the standard manual page is made available as a
script-level variable, so syscall.open exposes filename, flags, and
mode. In addition, a standard suite of variables is available at most
aliases:
argstr A pretty-printed form of the entire argument list, without
parentheses.
name The name of the system call.
retstr For return probes, a pretty-printed form of the system-call
result.
Not all probe aliases obey all of these general guidelines. Please
report any bothersome ones you encounter as a bug.
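For illustration, a short sketch that prints the generic variables
alongside one call-specific variable. It assumes the syscall.open
aliases behave as described above and that execname() is provided by
the standard tapsets.

```systemtap
probe syscall.open
{
  # name and argstr are the generic variables; filename is specific to open.
  printf("%s: %s(%s) file=%s\n", execname(), name, argstr, filename)
}

probe syscall.open.return
{
  # retstr is the pretty-printed result, available in return probes.
  printf("%s: %s returned %s\n", execname(), name, retstr)
}
```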
TIMERS
Intervals defined by the standard kernel "jiffies" timer may be used to
trigger probe handlers asynchronously. Two probe point variants are
supported by the translator:
timer.jiffies(N)
timer.jiffies(N).randomize(M)
The probe handler is run every N jiffies (a kernel-defined unit of
time, typically between 1 and 60 ms). If the "randomize" component is
given, a linearly distributed random value in the range [-M..+M] is
added to N every time the handler is run. N is restricted to a
reasonable range (1 to around a million), and M is restricted to be
smaller than N. There are no target variables provided in either
context. It is possible for such probes to be run concurrently on a
multi-processor computer.
Alternatively, intervals may be specified in units of time. There are
two probe point variants similar to the jiffies timer:
timer.ms(N)
timer.ms(N).randomize(M)
Here, N and M are specified in milliseconds, but the full options for
units are seconds (s/sec), milliseconds (ms/msec), microseconds
(us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not
supported for hertz timers.
The actual resolution of the timers depends on the target kernel. For
kernels prior to 2.6.17, timers are limited to jiffies resolution, so
intervals are rounded up to the nearest jiffies interval. After
2.6.17, the implementation uses hrtimers for tighter precision, though
the actual resolution will be arch-dependent. In either case, if the
"randomize" component is given, then the random value will be added to
the interval before any rounding occurs.
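A small sketch of the time-based variants, with arbitrary interval
values and a hypothetical global named "ticks":

```systemtap
global ticks

# Fires about once a second; each interval is varied by up to 100 ms
# in either direction.
probe timer.ms(1000).randomize(100) { ticks++ }

# Fires every 10 seconds and reports the running total.
probe timer.s(10)
{
  printf("%d randomized ticks so far\n", ticks)
}
```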
Profiling timers are also available to provide probes that execute on
all CPUs at the rate of the system tick. This probe takes no
parameters.
timer.profile
Full context information of the interrupted process is available,
making this probe suitable for a time-based sampling profiler.
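As a rough sketch of such a sampling profiler (the ten-second duration
and the "samples" array are illustrative choices, and execname() is
assumed to come from the standard tapsets):

```systemtap
global samples

probe timer.profile
{
  # Count one sample for whichever program was interrupted.
  samples[execname()]++
}

# Stop after ten seconds of sampling.
probe timer.s(10) { exit() }

probe end
{
  # Iterate in order of decreasing sample count.
  foreach (name in samples-)
    printf("%8d %s\n", samples[name], name)
}
```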
DWARF
This family of probe points uses symbolic debugging information for the
target kernel/module/program, as may be found in unstripped
executables, or the separate debuginfo packages. They allow placement
of probes logically into the execution path of the target program, by
specifying a set of points in the source or object code. When a
matching statement executes on any processor, the probe handler is run
in that context.
Points in a kernel are identified by module, source file, line number,
function name, or some combination of these.
Here is a list of probe point families currently supported. The
.function variant places a probe near the beginning of the named
function, so that parameters are available as context variables. The
.return variant places a probe at the moment after the return from the
named function, so the return value is available as the "$return"
context variable. The .inline modifier for .function filters the
results to include only instances of inlined functions. The .call
modifier selects the opposite subset. Inline functions do not have an
identifiable return point, so .return is not supported on .inline
probes. The .statement variant places a probe at the exact spot,
exposing those local variables that are visible there.
kernel.function(PATTERN)
kernel.function(PATTERN).call
kernel.function(PATTERN).return
kernel.function(PATTERN).inline
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).inline
kernel.statement(PATTERN)
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
In the above list, MPATTERN stands for a string literal that aims to
identify the loaded kernel module of interest. It may include "*",
"[]", and "?" wildcards. PATTERN stands for a string literal that aims
to identify a point in the program. It is made up of three parts:
o The first part is the name of a function, as would appear in the nm
program's output. This part may use the "*" and "?" wildcarding
operators to match multiple names.
o The second part is optional and begins with the "@" character. It
is followed by the path to the source file containing the function,
which may include a wildcard pattern, such as mm/slab*. If it does
not match as is, an implicit "*/" is optionally added before the
pattern, so that a script need only name the last few components of
a possibly long source directory path.
o Finally, the third part is optional if the file name part was
given, and identifies the line number in the source file, preceded
by a ":" or a "+". The line number is assumed to be an absolute
line number if preceded by a ":", or relative to the entry of the
function if preceded by a "+". All the lines in the function can be
matched with ":*". A range of lines x through y can be matched
with ":x-y".
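The three parts of PATTERN might be combined as in the sketch below.
The function vfs_read and its source path are assumptions about the
target kernel, the line numbers are made up and may not resolve on any
given build, and probefunc() is assumed to come from the standard
tapsets.

```systemtap
# Function name only, with a wildcard.
probe kernel.function("vfs_*")
{ printf("%s\n", probefunc()) }

# Function name plus source file; "*/" is implied before the path if needed.
probe kernel.function("vfs_read@fs/read_write.c")
{ printf("vfs_read entered\n") }

# Absolute line number within the file (300 is a made-up value).
probe kernel.statement("vfs_read@fs/read_write.c:300")
{ printf("reached line 300\n") }

# Line number relative to the entry of the function.
probe kernel.statement("vfs_read@fs/read_write.c+2")
{ printf("reached vfs_read+2\n") }
```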
As an alternative, PATTERN may be a numeric constant, indicating an
address. Such an address may be found from symbol tables of the
appropriate kernel / module object file. It is verified against known
statement code boundaries, and will be relocated for use at run time.
In guru mode only, absolute kernel-space addresses may be specified
with the ".absolute" suffix. Such an address is considered already
relocated, as if it came from /proc/kallsyms, so it cannot be checked
against statement/instruction boundaries.
Some of the source-level context variables, such as function
parameters, locals, and globals visible in the compilation unit, may be
visible to probe handlers. Handlers may refer to these variables by
prefixing their names with "$" within the scripts. In addition, a
special syntax allows limited traversal of structures, pointers, and
arrays.
$var refers to an in-scope variable "var". If it's an integer-like
type, it will be cast to a 64-bit int for systemtap script use.
String-like pointers (char *) may be copied to systemtap string
values using the kernel_string or user_string functions.
$var->field
traversal to a structure's field. The indirection operator may
be repeated to follow more levels of pointers.
$return
is available in return probes only for functions that are
declared with a return value.
$var[N]
indexes into an array. The index is given with a literal
number.
$$vars expands to a character string that is equivalent to
sprintf("parm1=%x ... parmN=%x var1=%x ... varN=%x",
parm1, ..., parmN, var1, ..., varN)
$$locals
expands to a subset of $$vars for only local variables.
$$parms
expands to a subset of $$vars for only function
parameters.
$$return
is available in return probes only. It expands to a
string that is equivalent to sprintf("return=%x", $return)
if the probed function has a return value, or else
an empty string.
For ".return" probes, context variables other than the "$return"
value itself are only available for the function call
parameters. The expressions evaluate to the entry-time values of
those variables, since that is when a snapshot is taken. Other
local variables are not generally accessible, since by the time
a ".return" probe hits, the probed function will have already
returned.
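A hedged sketch of these context variables in use. It assumes the
target kernel has a vfs_read function whose parameters include "file"
(a struct pointer with an f_flags field) and "count"; these names
depend on the kernel source and debuginfo actually installed.

```systemtap
probe kernel.function("vfs_read")
{
  # $count is an integer-like parameter; $file->f_flags follows a pointer.
  printf("count=%d flags=0x%x\n", $count, $file->f_flags)
  # $$parms pretty-prints every parameter in one string.
  printf("%s\n", $$parms)
}

probe kernel.function("vfs_read").return
{
  # $count here is the entry-time snapshot; $$return formats $return.
  printf("requested=%d %s\n", $count, $$return)
}
```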
USER-SPACE
Early prototype support for user-space probing is available in
the form of a non-symbolic probe point:
process(PID).statement(ADDRESS).absolute
This is analogous to kernel.statement(ADDRESS).absolute in that
both use raw (unverified) virtual addresses and provide no
$variables. The target PID parameter must identify a running
process, and ADDRESS should identify a valid instruction address.
All threads of that process will be probed.
Additional user-space probing is available in the following
forms:
process(PID).begin
process("PATH").begin
process.begin
process(PID).thread.begin
process("PATH").thread.begin
process.thread.begin
process(PID).end
process("PATH").end
process.end
process(PID).thread.end
process("PATH").thread.end
process.thread.end
process(PID).syscall
process("PATH").syscall
process.syscall
process(PID).syscall.return
process("PATH").syscall.return
process.syscall.return
process(PID).itrace
process("PATH").itrace
A .begin probe gets called when a new process described by PID
or PATH gets created. A .thread.begin probe gets called when a
new thread described by PID or PATH gets created. A .end probe
gets called when a process described by PID or PATH dies. A
.thread.end probe gets called when a thread described by PID or
PATH dies. A .syscall probe gets called when a thread described
by PID or PATH makes a system call. The system call number is
available in the $syscall context variable, and the first 6
arguments of the system call are available in the $argN (e.g.,
$arg1, $arg2, ...) context variables. A .syscall.return probe
gets called when a thread described by PID or PATH returns from
a system call. The system call number is available in the
$syscall context variable, and the return value of the system
call is available in the $return context variable. A .itrace
probe gets called for every single step of the process described
by PID or PATH.
Note that PATH names refer to executables that are searched the
same way shells do: relative to the working directory if they
contain a "/" character, otherwise in $PATH. If a process probe
is specified without a PID or PATH, all user threads are probed.
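The following sketch shows how several of these forms might be used
together. The path /bin/ls is only an example target, and pid() is
assumed to come from the standard tapsets.

```systemtap
probe process("/bin/ls").begin
{
  printf("ls started, pid %d\n", pid())
}

probe process("/bin/ls").syscall
{
  printf("syscall %d, first argument 0x%x\n", $syscall, $arg1)
}

probe process("/bin/ls").syscall.return
{
  printf("syscall %d returned %d\n", $syscall, $return)
}

probe process("/bin/ls").end
{
  printf("ls exited\n")
}
```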
PROCFS
These probe points allow procfs "files" in
/proc/systemtap/MODNAME to be created, read and written (MODNAME
is the name of the systemtap module). The proc filesystem is a
pseudo-filesystem which is used as an interface to kernel data
structures. There are four probe point variants supported by
the translator:
procfs("PATH").read
procfs("PATH").write
procfs.read
procfs.write
PATH is the file name (relative to /proc/systemtap/MODNAME) to
be created. If no PATH is specified (as in the last two
variants above), PATH defaults to "command".
When a user reads /proc/systemtap/MODNAME/PATH, the
corresponding procfs read probe is triggered. The string data
to be read should be assigned to a variable named $value, like
this:
procfs("PATH").read { $value = "100\n" }
When a user writes into /proc/systemtap/MODNAME/PATH, the
corresponding procfs write probe is triggered. The data the
user wrote is available in the string variable named $value,
like this:
procfs("PATH").write { printf("user wrote: %s", $value) }
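Putting the read and write variants together, a script could expose a
tunable value through procfs. This is only a sketch: the file name
"threshold" and the global of the same name are hypothetical, and
strtol() is assumed to be available from the standard tapsets.

```systemtap
global threshold = 100

# "cat /proc/systemtap/MODNAME/threshold" shows the current value.
probe procfs("threshold").read
{
  $value = sprintf("%d\n", threshold)
}

# "echo 250 > /proc/systemtap/MODNAME/threshold" changes it.
probe procfs("threshold").write
{
  threshold = strtol($value, 10)
}
```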
MARKERS
This family of probe points hooks up to static probing markers
inserted into the kernel or modules. These markers are special
macro calls inserted by kernel developers to make probing faster
and more reliable than with DWARF-based probes. Further, DWARF
debugging information is not required to probe markers.
Marker probe points begin with kernel. The next part names the
marker itself: mark("name"). The marker name string, which may
contain the usual wildcard characters, is matched against the
names given to the marker macros when the kernel and/or module
was compiled. Optionally, you can specify format("format").
Specifying the marker format string allows differentiation
between two markers with the same name but different marker
format strings.
The handler associated with a marker-based probe may read the
optional parameters specified at the macro call site. These are
named $arg1 through $argNN, where NN is the number of parameters
supplied by the macro. Number and string parameters are passed
in a type-safe manner.
The marker format string associated with a marker is available
in $format, and the marker name string is available in $name.
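A minimal sketch of a marker probe, assuming the target kernel was
built with a marker named "getuid" (the name is borrowed from the
examples in this page and may not exist on a given kernel):

```systemtap
probe kernel.mark("getuid")
{
  # $name and $format describe the marker that was hit.
  printf("marker %s with format \"%s\"\n", $name, $format)
}
```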
PERFORMANCE MONITORING HARDWARE
The perfmon family of probe points is used to access the
performance monitoring hardware available in modern processors.
This family of probe points needs perfmon2 support in the kernel
to access the performance monitoring hardware.
Performance monitor hardware points begin with perfmon. The
next part names the event being counted: counter("event").
The event names are processor implementation specific, with the
exception of the generic cycles and instructions events, which
are available on all processors. This sets up a counter on the
processor to count the number of events occurring on the
processor. For more details on the performance monitoring
events available on a specific processor, use the perfmon2
command:
pfmon -l
$counter
is a handle used in the body of the probe for operations
involving the counter associated with the probe.
read_counter
is a function that is passed the handle for the perfmon
probe and returns the current count for the event.
EXAMPLES
Here are some example probe points, defining the associated
events.
begin, end, end
refers to the startup and normal shutdown of the session.
In this case, the handler would run once during startup
and twice during shutdown.
timer.jiffies(1000).randomize(200)
refers to a periodic interrupt, every 1000 +/- 200
jiffies.
kernel.function("*init*"), kernel.function("*exit*")
refers to all kernel functions with "init" or "exit" in
the name.
kernel.function("*@kernel/sched.c:240")
refers to any functions within the "kernel/sched.c" file
that span line 240.
kernel.mark("getuid")
refers to an STAP_MARK(getuid, ...) macro call in the
kernel.
module("usb*").function("*sync*").return
refers to the moment of return from all functions with
"sync" in the name in any of the USB drivers.
kernel.statement(0xc0044852)
refers to the first byte of the statement whose compiled
instructions include the given address in the kernel.
kernel.statement("*@kernel/sched.c:2917")
refers to the statement of line 2917 within
"kernel/sched.c".
kernel.statement("bio_init@fs/bio.c+3")
refers to the statement at line bio_init+3 within
"fs/bio.c".
syscall.*.return
refers to the group of probe aliases with any name in the
second position.
SEE ALSO
stap(1), stapprobes.iosched(5), stapprobes.netdev(5),
stapprobes.nfs(5), stapprobes.nfsd(5), stapprobes.pagefault(5),
stapprobes.process(5), stapprobes.rpc(5), stapprobes.scsi(5),
stapprobes.signal(5), stapprobes.socket(5), stapprobes.tcp(5),
stapprobes.udp(5), proc(5)
Red Hat 2008-11-13 STAPPROBES(5)