Perf tool examples

I am a big fan of the linux perf (perf events, perf tools) tool. It has already helped me several times debugging problems or solving performance bottlenecks.

Unfortunately, documentation is rather sparse about it. As such, I created a perf-examples repository that contains small example programs that show how certain perf features can help to find problems.

It’s currently work in progress and supposed to grow over time. Contributions are very welcome. All examples are structured into sub folders that correspond to the different perf subcommands (record, trace, stat …).

For example, the trace folder currently contains an example program that shows how mutex locking times can be traced.

#include <chrono>
#include <mutex>
#include <thread>

int main() {
    std::mutex mut;

    std::thread t([&] {
        std::lock_guard<std::mutex> lock(mut);
        std::this_thread::sleep_for(std::chrono::seconds(5));
    });

    std::this_thread::sleep_for(std::chrono::seconds(1));

    std::lock_guard<std::mutex> lock(mut);

    t.join();
}

Running:

make mutex_lock_trace

which will execute the perf command:

sudo perf trace ./mutex_lock

and give the following (truncated) output:

...
1001.986 (1000.208 ms): mutex_lock/15340  ... [continued]: nanosleep()) = 0
1002.008 (1000.217 ms): mutex_lock/15341 nanosleep(rqtp: 0x7f33de1f1ec0, rmtp: 0x7f33de1f1ec0                  ) ...
5002.419 (4000.411 ms): mutex_lock/15340 futex(uaddr: 0x7ffc609dde10, op: WAIT|PRIV, val: 2                    ) = 0
5002.496 ( 0.050 ms): mutex_lock/15340 futex(uaddr: 0x7f33de1f29d0, val: 15341, uaddr2: 0xca, val3: 139860746643920) = 0
5002.516 ( 0.010 ms): mutex_lock/15340 futex(uaddr: 0x7ffc609dde10, op: WAKE|PRIV, val: 1                    ) = 0
5002.593 ( 0.000 ms): mutex_lock/15340 exit_group(
5002.352 (5000.561 ms): mutex_lock/15341  ... [continued]: nanosleep()) = 0
5002.390 ( 0.010 ms): mutex_lock/15341 futex(uaddr: 0x7ffc609dde10, op: WAKE|PRIV, val: 1                    ) = 1
5002.449 ( 0.015 ms): mutex_lock/15341 madvise(start: 0x7f33dd9f2000, len_in: 8368128, behavior: DONTNEED    ) = 0
5002.453 ( 0.000 ms): mutex_lock/15341 exit(

The first column shows the time that has passed since the start of the program (milliseconds). The second column depicts the time that we spent in a certain syscall. We see that we spent four seconds waiting on the mutex call (third line, futex syscall). Running the example yourself you would see that the output is even nicely colored, highlighting long wait times. There are more examples in the repository that can be used and played around with to try different options.

Every perf related article has to mention Brendan Gregg’s perf page that contains a huge collection of perf examples. Those are mostly based on existing tools, though.

If you have similar stories of how perf helped you solve a particular problem which can be packed into a simple example, please feel free to contribute and let me know.