Snapshot #5

Snapshot #5, against -CURRENT of 20 Feb 2005

Summary

Snapshot date   Against -CURRENT of   Format                 Download
20-Feb-2005     20-Feb-2005           Patch (-p1), gzip'ed   ~100KB

Announcement

## New snapshot of hardware PMC support code

I am pleased to announce a new snapshot of the hardware performance
counter support code.

Warning: This is pre-alpha code.  It may panic and behave nastily.
Please test on a scratch box.

## What's available

You can now answer the question "what are the hardware events
happening on this system?" on the following CPUs:

  - AMD Athlon64/Opteron
  - AMD Athlon
  - Intel P4 and P4/HTT processors

(Support for answering the next question, namely "which spots in the
code are related to these events?", is being worked on.)

## Code components

  - A kernel driver pmc(4).

  - A userland library ("libpmc", see pmc(3)) to access the driver.

  - Userland utilities to use the driver (pmcstat(8) and
    pmccontrol(8)).

  - Documentation in the form of manual pages.

## What can it do today?

  - Measure a whole bunch of hardware events.  See the documentation
    for pmc(3).

  - Supported PMC kinds:

    (a) Process-virtual PMCs: these PMCs count hardware events only
        when their target process is scheduled on a CPU.

    (b) System-wide PMCs: these PMCs count hardware events for
        the system as a whole.

  - Supported PMC modes:

    (a) "Counting mode" PMCs: these PMCs only count events, and do not
        sample the instruction pointer.

    "Sampling mode" PMCs are being worked on.


## Using the code

  - Download the patch.

  - Apply it to a freshly checked out -CURRENT source.

    # cd /usr/src
    # patch -p1 < PATCH-FILE

  - Update 'world'.

  - Add "options PMC_HOOKS" to your kernel config file, recompile
    the kernel, and reboot into it.

  - Load the new kernel module and start using it.

    # kldload pmc
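
    The kernel config step above is easy to get wrong (for example,
    by leaving the option commented out).  A minimal sketch of a
    check that can be run before rebuilding; the helper name
    has_pmc_hooks and the config path in the comment are hypothetical
    and not part of this snapshot:

```sh
# Return success if the given kernel config file enables PMC_HOOKS.
# Commented-out lines ("#options PMC_HOOKS") do not count.
has_pmc_hooks() {
    grep -Eq '^[[:space:]]*options[[:space:]]+PMC_HOOKS([[:space:]]|$)' "$1"
}

# Example (hypothetical path):
#   has_pmc_hooks /usr/src/sys/i386/conf/MYKERNEL && echo "PMC_HOOKS enabled"
```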


## Examples

  - Example 1:  Measure the TLB miss behaviour of 'firefox' on an
    AMD Athlon.  Print counts every 1 second.

% ps -ax | grep firefox
... [snip]
 1884  v0  S      0:04.59  /usr/X11R6/lib/firefox/lib/firefox-0.9.3/firefox-bin
... [snip]

    'firefox' is already running, so we attach to it using the '-t
    TARGET' option.  The '-w 1' option sets the printing interval to
    1 second.

% pmcstat -p k7-l1-dtlb-miss-and-l2-dtlb-hits -p k7-l1-and-l2-dtlb-misses \
          -w 1 -t 1884
# p/k7-l1-dtlb-miss-and-l2-dtlb-hits p/k7-l1-and-l2-dtlb-misses
  ... [snip]
                               63415                      13455
                              124529                      35816
                              113868                      28945
                              152241                      39426
                              306551                      78661
                              290212                      61392
                               40013                      11361
                               38530                      11169
                              183136                      47750
                               45264                      12981
                              169459                      37038
                               81363                      19049
                             1306901                     451348
                             1504465                     557414
                              482502                     100939
                              498394                      76948
                              962112                     110082
                             2677131                     245249
                             1258533                     178191
                              812905                     166234
                              144888                      34476
                               89319                      21937
                              330546                      46530
                              282000                      39137
                               85583                      19415
                              330585                      53437
                               37653                      10805
                               37263                      10892
                               48671                      14793
                                1952                       1105
                                   0                          0
  ... [snip]

    Clearly this program can stress the TLB!
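
    To put a number on that, the two columns can be combined: the
    first counts L1 DTLB misses that hit in the L2 DTLB, the second
    counts misses in both, so their sum is the total L1 DTLB miss
    count.  A minimal sketch, assuming the counts above were saved to
    a file; the function name tlb_summary and the file name
    pmcstat.out are hypothetical:

```sh
# Summarize two-column pmcstat output read from stdin.
#   column 1: L1 DTLB misses that hit in the L2 DTLB
#   column 2: misses in both the L1 and L2 DTLBs
tlb_summary() {
    awk '
        # Skip the "# p/..." header and any other non-numeric lines.
        $1 !~ /^[0-9]+$/ || $2 !~ /^[0-9]+$/ { next }
        { l2_hits += $1; l2_misses += $2 }
        END {
            total = l2_hits + l2_misses
            pct = (total > 0) ? 100 * l2_misses / total : 0
            printf "l1_misses=%d l2_misses=%d l2_miss_pct=%.1f\n",
                total, l2_misses, pct
        }'
}

# Example (hypothetical file name):
#   tlb_summary < pmcstat.out
```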



  - Example 2: Measure the cycles during which interrupts were
    masked while the ATA driver's interrupt handling thread was
    executing, with the 'diskinfo' command running at the same time.

    We need to be root to do this:

amd64# ps -ax | grep ata
  25  ??  WL     0:00.25 [irq14: ata0]
  26  ??  WL     0:00.00 [irq15: ata1]
  31  ??  WL     0:00.00 [irq20: atapci0]

    We set up pmcstat(8) to count cycles spent with the processor's
    IF bit cleared while the ata0 thread (pid 25) is executing.

amd64# diskinfo -c ad0 > /dev/null & \
  pmcstat -p k8-fr-interrupts-masked-while-pending-cycles -t 25 -w 1
# p/k8-fr-interrupts-masked-while-pending-cycles
  ... [snip]
                                               0
                                             644
                                               0
                                           27031
                                           31876
                                           41459
                                            2378
                                               0
                                               0
   ... [snip]
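
    Many of the one-second samples are zero because the thread only
    runs in bursts, so a total and a peak are often easier to read
    than the raw column.  A minimal sketch, assuming the counts were
    saved to a file; the function name count_stats and the file name
    pmcstat.out are hypothetical:

```sh
# Report sample count, total and peak for single-column pmcstat
# output read from stdin.
count_stats() {
    awk '
        # Skip the "# p/..." header and any other non-numeric lines.
        $1 !~ /^[0-9]+$/ { next }
        { n++; total += $1; if ($1 > peak) peak = $1 }
        END { printf "samples=%d total=%d peak=%d\n", n, total, peak }'
}

# Example (hypothetical file name):
#   count_stats < pmcstat.out
```

    For the nine samples shown above this prints
    "samples=9 total=103388 peak=41459".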



  - Example 3: Measure the total number of interrupts seen by the
    system while a particular command was executing.  Also count the
    number of cycles the CPU's IF bit was zero when the command was
    scheduled on a CPU.

amd64# pmcstat -p k8-fr-interrupts-masked-while-pending-cycles \
       -s k8-fr-taken-hardware-interrupts -w 1 diskinfo -c ad0 > /dev/null
# p/k8-fr-interrupts-masked-while-pending-cycles s/k8-fr-taken-hardware-interrupts
                                           22887                              1149
                                           88001                              1308
                                           48058                              6406
                                           34986                              7910
                                           47714                              7893
                                           22399                              1961
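
    When process-virtual (p/) and system-wide (s/) counters are mixed
    like this, per-counter totals labelled with the names from the
    header line help keep the columns apart.  A minimal sketch,
    assuming the output was saved to a file; the function name
    column_totals and the file name pmcstat.out are hypothetical:

```sh
# Print "counter-name total" for every column of pmcstat output
# read from stdin, taking the names from the "# ..." header line.
column_totals() {
    awk '
        # Header line: remember the counter names.
        $1 == "#" { for (i = 2; i <= NF; i++) name[i - 1] = $i; next }
        # Skip anything that is not a row of integers.
        $0 !~ /^[ \t]*[0-9]+([ \t]+[0-9]+)*[ \t]*$/ { next }
        {
            for (i = 1; i <= NF; i++) sum[i] += $i
            if (NF > cols) cols = NF
        }
        END {
            for (i = 1; i <= cols; i++) {
                # Fall back to a positional label if no header was seen.
                label = (i in name) ? name[i] : ("col" i)
                printf "%s %d\n", label, sum[i]
            }
        }'
}

# Example (hypothetical file name):
#   column_totals < pmcstat.out
```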

## Known Bugs

  - The P4 HTT code is prone to freezing or panicking.  If you turn
    off HTT, the P4 code works fine.

  - Sampling mode support is incomplete.  If you allocate and start a
    sampling mode PMC, you'll get an NMI (if you are lucky).

## Next Steps (in no particular order)

  Please contact me if you would like to take up any of these.

  - Implement sampling modes.

  - Support Intel P-Pro and Pentium MMX PMC implementations.

  - Test suites.

  - A number of Intel P4 specific features (precise sampling,
    PMC cascading, etc.) remain to be implemented.

  - A port of PAPI.

  - Userland tools:
    - Use PMC based instruction pointer sampling with
      gcc -pg.
    - Enhance our profiling support code to use the ability to read
      process-mode PMC counts with the RDPMC instruction.
    - Convert sampling mode output to gprof format.
    - Create a tool that can correlate measured cache/TLB/etc.
      behaviour with data structure layout and code layout.

  - Write documentation suitable for /usr/share/doc/papers/.

Contact: jkoshy@FreeBSD.org
Last Modified: Sat Apr 21 22:53:24 2007