Benchmarking and results

IPv6 performance analysis

As part of the FreeBSD Foundation IPv6 validation and improvement work, a framework of scripts was created to run benchmarks reproducibly and without user intervention in single-user mode. The framework can be found in Perforce (p4) at //depot/user/bz/bench/framework/... [1].

Test setup

Unless otherwise specified, tests were run in the netperf cluster on the hydra1 and hydra2 machines. Each has 2x quad-core Intel(R) Xeon(R) CPU E5320 at 1.86 GHz, 8 GB of memory, and a Chelsio T310 10 Gbit/s card; the cards are connected directly to each other (crossover), and hw.cxgb.nfilters=0 is set to work around a card-specific problem. The BENCH kernel configuration used can be found here [2]; it also references the source tree that the p4 change numbers refer to. The only other tuning applied is kern.ipc.nmbclusters=2560000.
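For a sense of scale, and assuming standard 2 KB mbuf clusters (MCLBYTES = 2048 on FreeBSD), the kern.ipc.nmbclusters value above allows standard clusters to use at most:

```latex
\text{max. cluster memory} = 2\,560\,000 \times 2048\ \text{B} = 5\,242\,880\,000\ \text{B} \approx 4.88\ \text{GiB}
```

This is a ceiling on the standard cluster zone, not memory reserved at boot; jumbo clusters are governed by separate limits.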

Depending on the test and configuration (see the configuration files in the framework), every test was run in the following sequence: one warm-up run followed by n measured runs (n >= 1), with this whole sequence repeated m times (m >= 1) and a reboot between repetitions.
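The run order described above can be sketched as a simple schedule generator. This is a hypothetical Python illustration, not code from the framework: the function name run_schedule is invented here, and the sketch assumes the warm-up run is repeated after every reboot and that no reboot precedes the first round.

```python
from typing import List, Tuple


def run_schedule(n_runs: int, m_rounds: int) -> List[Tuple[int, str]]:
    """Return (round, action) steps: per round one warm-up, then n measured runs.

    Hypothetical sketch; rounds after the first are preceded by a reboot.
    """
    if n_runs < 1 or m_rounds < 1:
        raise ValueError("n_runs and m_rounds must both be >= 1")
    steps: List[Tuple[int, str]] = []
    for rnd in range(1, m_rounds + 1):
        if rnd > 1:
            steps.append((rnd, "reboot"))   # reboot between repetitions
        steps.append((rnd, "warmup"))       # warm-up result is not used
        for i in range(1, n_runs + 1):
            steps.append((rnd, f"run {i}"))  # measured runs
    return steps


if __name__ == "__main__":
    for step in run_schedule(n_runs=2, m_rounds=2):
        print(step)
```

With n = 2 and m = 2 this yields seven steps: warm-up, two runs, reboot, warm-up, two runs. In general the schedule has m * (n + 1) + (m - 1) steps.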

Note well!

The tests were not carried out to squeeze out maximum performance. They were run to compare IPv6 to IPv4 performance in a close-to-default configuration, so note well that a tuned machine in this setup can do better. The goal is to bring IPv6 to parity with IPv4.
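To make "parity" concrete, the hypothetical Python helpers below divide the mean IPv6 result by the mean IPv4 result for a higher-is-better metric such as throughput. The function names and the 2% tolerance are assumptions made for this sketch; this page does not define a parity threshold.

```python
from statistics import mean
from typing import Sequence


def parity_ratio(ipv6_runs: Sequence[float], ipv4_runs: Sequence[float]) -> float:
    """Mean IPv6 result divided by mean IPv4 result (higher-is-better metrics)."""
    if not ipv6_runs or not ipv4_runs:
        raise ValueError("need at least one measured run per protocol")
    return mean(ipv6_runs) / mean(ipv4_runs)


def at_parity(ratio: float, tolerance: float = 0.02) -> bool:
    """True if IPv6 is within `tolerance` (a fraction) of IPv4, or better."""
    return ratio >= 1.0 - tolerance


if __name__ == "__main__":
    # Made-up numbers purely to show the calculation.
    r = parity_ratio([8.0, 10.0], [10.0, 10.0])
    print(r, at_parity(r))
```

For lower-is-better metrics such as latency or CPU cost, the ratio would have to be inverted before applying the same check.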

2012-04-08 comparisons to 2011-12-15 kernel

The following detail pages show the progress made so far and detail the results that need further investigation.

2012-04-08 based kernel initial results @203041 & modifications

This further modified kernel includes correct IPv4 and IPv6 checksum updates in the LRO case, fixes a previously missed case of delayed checksum calculation, adds TSO6 and IPv6 LRO support for cxgb(4), cxgbe(4) and ixgbe(4), and contains some further refinements. In addition, some excessive IPv6 statistics that have no IPv4 counterpart are disabled, to see whether this makes a noticeable difference.
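This page does not say how the LRO code updates checksums. One well-known technique for patching a checksum after a single 16-bit field changes (for example a length field growing when segments are merged) is the incremental update of RFC 1624, HC' = ~(~HC + ~m + m'), which avoids re-summing the whole header. IPv6 has no header checksum, so there the affected checksum is the upper-layer (e.g. TCP) one, whose pseudo-header includes the payload length. The Python sketch below is illustrative only and is not the FreeBSD implementation; the helper names are invented here.

```python
from typing import Iterable


def oc_add(a: int, b: int) -> int:
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)


def full_checksum(words: Iterable[int]) -> int:
    """Internet checksum (RFC 1071) over 16-bit words, computed from scratch."""
    acc = 0
    for w in words:
        acc = oc_add(acc, w)
    return ~acc & 0xFFFF


def incremental_update(hc: int, old: int, new: int) -> int:
    """RFC 1624, Eqn. 3: HC' = ~(~HC + ~m + m') for one changed 16-bit word."""
    acc = oc_add(~hc & 0xFFFF, ~old & 0xFFFF)
    acc = oc_add(acc, new)
    return ~acc & 0xFFFF


# Example IPv4 header with the checksum field zeroed, as 16-bit words.
hdr = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011, 0x0000,
       0xC0A8, 0x0001, 0xC0A8, 0x00C7]
old_sum = full_checksum(hdr)

# Pretend a merge grew the total length field from 0x0073 to 0x0100.
new_hdr = hdr.copy()
new_hdr[1] = 0x0100

if __name__ == "__main__":
    print(hex(old_sum), hex(incremental_update(old_sum, 0x0073, 0x0100)))
```

For this header the checksum goes from 0xb861 to 0xb7d4, the same value a full recomputation over new_hdr gives, while touching only the one changed word.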

2012-03-22 based kernel results @203041 & local changes

This kernel is slightly newer than the 2012-03-03 one and additionally supports IPv6 checksum offloading on loopback.

2012-03-21 based kernel initial results @203041 (2011-12-15 kernel)

These are basically the "where are we on loopback?" results: a baseline against which the same set of benchmarks can be re-run to evaluate how performance changes with the code.

2012-03-03 based kernel initial results @206131 & local changes

This is a kernel with various improvements: cxgb(4) was updated to r231317 to support IPv6 offloading. The UDPv6/IPv6 output path was improved, in particular by reducing route lookups and improving cache behaviour. TCPv6 was mostly adjusted for offloading and, in addition to TSO6 and checksum offloading, now also supports LRO for IPv6. Initial results show that IPv6 TCP performance with offloading is consistently at the level of IPv4. UDP send performance in packets per second appears to be better than on IPv4 (or, depending on the machine, at least as high).

2011-12-15 based kernel initial results @203041

These are basically the "where are we?" results: a baseline against which the same set of benchmarks can be re-run to evaluate how performance changes with the code.