Snapshot #5, against -CURRENT of 20 Feb 2005
Summary
| Snapshot date | Against -CURRENT of date | Format | Download |
|---|---|---|---|
| 20-Feb-2005 | 20-Feb-2005 | Patch (-p1), gzip'ed | Download (~100KB) |
Announcement
## New snapshot of hardware PMC support code
I am pleased to announce a new snapshot of the hardware performance
counter support code.
Warning: This is pre-alpha code. It may panic and behave nastily.
Please test on a scratch box.
## What's available
You can now answer the question "what are the hardware events
happening on this system?" on the following CPUs:
- AMD Athlon64/Opteron
- AMD Athlon
- Intel P4 and P4/HTT processors
(Support for answering the next question, namely "which spots in the
code are related to these events?", is being worked on.)
## Code components
- A kernel driver pmc(4).
- A userland library ("libpmc", see pmc(3)) to access the driver.
- Userland utilities to use the driver (pmcstat(8) and
pmccontrol(8)).
- Documentation in the form of manual pages.
## What can it do today?
- Measure a whole bunch of hardware events. See the documentation
for pmc(3).
- Supported PMC kinds:
(a) Process-virtual PMCs: these PMCs count hardware events only
when their target process is scheduled on a CPU,
(b) System-wide PMCs: these PMCs count hardware events for
the system as a whole.
- Supported PMC modes:
(a) "Counting mode" PMCs: these PMCs only count events, and do not
sample the instruction pointer.
"Sampling mode" PMCs are being worked on.
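The difference between the two PMC kinds can be illustrated with a toy model. The sketch below is not libpmc or pmc(4) code; the schedule, PIDs and event counts are invented for illustration. It assumes a single CPU whose time is divided into slices, each run by one process and generating some number of hardware events.

```python
# Toy model of process-virtual vs. system-wide counting on one CPU.
# All names and numbers here are hypothetical; this is not the pmc(4) API.

# Each entry is (pid running during the slice, hardware events in the slice).
schedule = [
    (100, 40),
    (200, 15),
    (100, 25),
    (300, 60),
    (100, 5),
]

def system_wide_count(slices):
    """Count every event, regardless of which process was running."""
    return sum(events for _pid, events in slices)

def process_virtual_count(slices, target_pid):
    """Count events only while the target process is scheduled."""
    return sum(events for pid, events in slices if pid == target_pid)

print("system-wide:", system_wide_count(schedule))
print("process-virtual (pid 100):", process_virtual_count(schedule, 100))
```

In this model a process-virtual PMC attached to pid 100 never sees the events generated while pids 200 and 300 are running, whereas the system-wide PMC sees all of them.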
## Using the code
- Download the patch.
- Apply it to a freshly checked out -CURRENT source.
# cd /usr/src
# patch -p1 < PATCH-FILE
- Update 'world'.
- Add "options PMC_HOOKS" to your kernel config file, recompile
and reboot the new kernel.
- Load the new kernel module and start using it.
# kldload pmc
## Examples
- Example 1: Measure the TLB miss behaviour of 'firefox' on an
AMD Athlon. Print counts every 1 second.
% ps -ax | grep firefox
... [snip]
1884 v0 S 0:04.59 /usr/X11R6/lib/firefox/lib/firefox-0.9.3/firefox-bin
... [snip]
'firefox' is already running so we attach to it using the '-t
TARGET' option. The '-w 1' option specifies the desired interval.
% pmcstat -p k7-l1-dtlb-miss-and-l2-dtlb-hits -p k7-l1-and-l2-dtlb-misses \
-w 1 -t 1884
# p/k7-l1-dtlb-miss-and-l2-dtlb-hits p/k7-l1-and-l2-dtlb-misses
... [snip]
63415 13455
124529 35816
113868 28945
152241 39426
306551 78661
290212 61392
40013 11361
38530 11169
183136 47750
45264 12981
169459 37038
81363 19049
1306901 451348
1504465 557414
482502 100939
498394 76948
962112 110082
2677131 245249
1258533 178191
812905 166234
144888 34476
89319 21937
330546 46530
282000 39137
85583 19415
330585 53437
37653 10805
37263 10892
48671 14793
1952 1105
0 0
... [snip]
Clearly this program can stress the TLB!
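The two columns can be combined into a rough L2 DTLB miss ratio. The sketch below assumes, going by the event names, that the first column counts L1 DTLB misses that hit in the L2 DTLB and the second counts accesses that missed both, so that their sum is the number of L1 DTLB misses. The helper names are made up for illustration, and the sample rows are copied from the output above.

```python
# Post-process two-column pmcstat output (a sketch; helper names are hypothetical).
SAMPLE = """\
# p/k7-l1-dtlb-miss-and-l2-dtlb-hits p/k7-l1-and-l2-dtlb-misses
63415 13455
124529 35816
1306901 451348
0 0
"""

def parse_rows(text):
    """Return a list of integer tuples, skipping the '#' header and blank lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append(tuple(int(field) for field in line.split()))
    return rows

def l2_miss_ratio(l2_hits, l2_misses):
    """Fraction of L1 DTLB misses that also missed the L2 DTLB (None if idle)."""
    l1_misses = l2_hits + l2_misses
    if l1_misses == 0:
        return None
    return l2_misses / l1_misses

for hits, misses in parse_rows(SAMPLE):
    ratio = l2_miss_ratio(hits, misses)
    shown = "idle" if ratio is None else f"{ratio:.1%}"
    print(f"{hits:>10} {misses:>10}  {shown}")
```

Intervals in which both counters read zero are reported as idle rather than as a ratio, since no L1 DTLB misses were observed in them.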
- Example 2: Count the cycles for which interrupts were masked while
the ATA driver's interrupt handling thread was executing, during a
run of the 'diskinfo' command.
We need to be root to do this:
amd64# ps -ax | grep ata
25 ?? WL 0:00.25 [irq14: ata0]
26 ?? WL 0:00.00 [irq15: ata1]
31 ?? WL 0:00.00 [irq20: atapci0]
We set up pmcstat(8) to count cycles spent with the processor's IF
bit cleared while the ata0 thread (pid 25) is executing.
amd64# diskinfo -c ad0 > /dev/null & \
pmcstat -p k8-fr-interrupts-masked-while-pending-cycles -t 25 -w 1
# p/k8-fr-interrupts-masked-while-pending-cycles
... [snip]
0
644
0
27031
31876
41459
2378
0
0
... [snip]
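To put these counts in perspective, cycle counts can be converted to elapsed time. The sketch below assumes a 2.0 GHz clock, which is a made-up figure since the announcement does not state the test machine's frequency; substitute the real value for your CPU. The counts are the per-second samples shown above.

```python
# Convert per-interval cycle counts to time (a sketch under a stated assumption).
ASSUMED_HZ = 2_000_000_000  # hypothetical clock rate; not given in the announcement

# Cycles with interrupts masked while pending, one sample per second (from above).
samples = [0, 644, 0, 27031, 31876, 41459, 2378, 0, 0]

def cycles_to_microseconds(cycles, hz=ASSUMED_HZ):
    """Time represented by a cycle count at the given clock frequency."""
    return cycles * 1_000_000 / hz

for second, cycles in enumerate(samples):
    usec = cycles_to_microseconds(cycles)
    print(f"t={second}s  {cycles:>6} cycles  ~{usec:.2f} us")

print("total cycles:", sum(samples))
```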
- Example 3: Measure the total number of interrupts seen by the
system while a particular command was executing. Also count the
number of cycles the CPU's IF bit was zero when the command was
scheduled on a CPU.
amd64# pmcstat -p k8-fr-interrupts-masked-while-pending-cycles \
-s k8-fr-taken-hardware-interrupts -w 1 diskinfo -c ad0 > /dev/null
# p/k8-fr-interrupts-masked-while-pending-cycles s/k8-fr-taken-hardware-interrupts
22887 1149
88001 1308
48058 6406
34986 7910
47714 7893
22399 1961
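Output that mixes counter scopes can be totalled per column. The sketch below assumes that the 'p/' and 's/' prefixes in the header line correspond to the '-p' (process-scope) and '-s' (system-wide) options used on the command line; that reading is inferred from the example, not taken from the pmcstat(8) manual page. The function and variable names are hypothetical.

```python
# Sum each pmcstat column and label it by scope (a sketch; names are hypothetical).
OUTPUT = """\
# p/k8-fr-interrupts-masked-while-pending-cycles s/k8-fr-taken-hardware-interrupts
22887 1149
88001 1308
48058 6406
34986 7910
47714 7893
22399 1961
"""

# Assumed meaning of the header prefixes, inferred from the -p and -s options.
SCOPES = {"p": "process", "s": "system"}

def column_totals(text):
    """Return a list of (scope, event name, total) for each column."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    totals = [0] * len(header)
    for line in lines[1:]:
        for i, field in enumerate(line.split()):
            totals[i] += int(field)
    result = []
    for spec, total in zip(header, totals):
        prefix, _, event = spec.partition("/")
        result.append((SCOPES.get(prefix, "unknown"), event, total))
    return result

for scope, event, total in column_totals(OUTPUT):
    print(f"{scope:<8} {event:<50} {total}")
```

Keeping the scope next to each total is a reminder that the two columns are not directly comparable: one covers only the time 'diskinfo' was scheduled, the other covers the whole system.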
## Known Bugs
- The P4 HTT code is prone to freezing or panicking. If you turn
off HTT, the P4 code works fine.
- Sampling mode support is incomplete. If you allocate and start a
sampling mode PMC, you'll get an NMI (if you are lucky).
## Next Steps (in no particular order)
Please contact me if you would like to take up any of these.
- Implement sampling modes.
- Support Intel P-Pro and Pentium MMX PMC implementations.
- Test suites.
- A number of Intel P4 specific features (precise sampling,
PMC cascading, etc.) remain to be implemented.
- A port of PAPI.
- Userland tools:
  - use PMC based instruction pointer sampling with
    gcc -pg.
  - enhance our profiling support code to use the ability to read
    process-mode PMC counts with the RDPMC instruction.
  - convert sampling mode output to gprof format.
  - create a tool that can correlate measured cache/TLB/etc.
    behaviour with data structure layout and code layout.
- Write documentation suitable for /usr/share/doc/papers/.
Contact: jkoshy@FreeBSD.org
Last Modified: Sat Apr 21 22:53:24 2007