200805DevSummit - BSDCan 2008 FreeBSD Developer summit summary

About this document

These are my personal notes. They do not reflect any official statement of the FreeBSD project, etc. This document exists to inform others of what was going on and what was discussed during the DevSummit at BSDCan08. Initially targeted to FreeBSD developers only, I was asked and allowed to publish this summary.
(Ignore 'errors' due to my different states of consciousness while writing on the plane or train back home).

What is a Developer summit?

A developer summit is a get-together of some FreeBSD developers to present work which was done, talk about ongoing work or future plans. It is not a place to make decisions, as usually only a smaller quorum would be there, which would mean excluding all the others.

Such an event is often, but not necessarily, aligned with a BSD conference, as developers are on-site then anyway.

In addition to developers, often people from companies or other FreeBSD derived projects, interested in using FreeBSD/interacting with the FreeBSD community/talking about their FreeBSD use cases, are invited. This is a great opportunity to bring together both sides and make life easier for all of us.

What happened this time?

Let me start introducing you to the DevSummit Wiki page. You might be able to find slides of talks there. In addition I'd like to say 'thanks' to the people who spent a lot of time organising this as well as sponsors supporting the event.

Following the schedule given on the Wiki, find my notes below:

Day 1

In the morning there were talks/presentations, while in the afternoon different groups met to discuss their favourite topics.

"I am amused by the text appearing immediately to my left (your right)."

On Wednesday, after a welcome message from Robert Watson (left), Poul-Henning Kamp (right) started with a general discussion about "GreenBSD". This will not be yet another fork of FreeBSD, but might become a new large-scale project now that SMPng is done.
This is about "green computing": turning Gigabit interfaces down to 100 Mbit/s if there is no need for more, entirely shutting down unused NICs, running powerd on servers, suspend/resume on FreeBSD notebooks and so on. It was mentioned that we want to make sure to have a properly layered implementation so that the same code is not duplicated all over the place, for example in device drivers. There was a BOF on this at a later time, which I did not attend.

Next was Ivan Voras (left) on his GSoC 2007 project, finstall. finstall is a graphical installer which supports more features: geom, networking, ZFS, gjournal, ... It is split into a frontend and a backend, written in gtk/python. It implements a remote installation console. The backend can be used for non-base-system installs as well. At the moment it is still lacking a graphical partition editor.
During the questions a few things came up, like replacing the python backend with a C implementation and, as the frontend uses gtk, which is LGPL, whether there could be a text-mode-only implementation in C. Also unionfs, dpart, non-BSD support, the PC-BSD installer, talking to secteam about the security model, mdns, and the use as a general configuration tool for installing configuration files (you can have 'install scripts'/'RPC calls'/pxeboot) came up.

After a short break it was Alexey Tarasov (right) on kernel netbooting via HTTP, another GSoC 2007 project. The work consists of a PXE API, a tiny TCP/IP stack, PXE sockets, ... Unfortunately there is no IPv6 support with PXE. There are basic filters to restrict IPs/ports. The most important but also most complicated task had been the user mode implementation of the tiny TCP/IP stack under memory constraints (buffering issues). It uses httpfs provided by the boot loader, including HTTP 1.1 support. DHCP, a DNS client, ICMP echo, ARP and boot console commands are already implemented. A telnet client, socketfs, IPv6 support, as well as possibly parsing of HTTP server index pages, are work in progress.

Next, it was Rafal Jaworowski (left) with an embedded (architecture) status report: arm, mips, powerpc. arm is "almost" tier-1, from 7.0R in CVS, more to be found in perforce and even more coming soon. There is the LINT problem because of all the different variations.
For mips, a lot of code was coming from Juniper. CVS current: mips32/64, multi user, builds world (mips32). More to come to CVS, but can be found in P4.
Powerpc in CVS (P4), find more info on the wiki. More code from Juniper. There is highend chip support as well as SMP. Bridge mode is going towards 64bit powerpc support.
There are 2 GSoC 2008 projects: optimising the build system for embedded and a port to 'Efika', a cheap platform. TODO 'highlights' include: NOR/NAND fs, improving the build system to support cross-building from Linux/Windows (should there be a vmware image?), better system/kernel configuration for a smaller footprint, ..

Ed Schouten (right) continued on reimplementing FreeBSD's tty layer. He started with a design overview. The system is currently still under Giant, as input drivers are still under Giant. He removed clists (a fragile buffer mechanism) from tty. Among other things he is now destroying ptys when unused, he folded the TX path buffering and removed cfreelist. He managed to keep everything but sgtty ABI compatible. Quite a list of foo(4) devices still need to be converted. For the current integration plan he was talking about -9000+8000 LOC.

After pizza individual 'group meetings' were scheduled.

The Network Cabal started with Jeff Roberson, Julian Elischer and Kip Macy on mbufs. Jeff has that new concept with ref-count, less forks, cluster + 1 mbuf header, add length.
The ongoing discussion was mostly with Sam Leffler and Robert Watson and included mbuf layout, mbuf tags, MRT index bits (16 bit short or more?), mbuf back traces and tracking mbuf leaks.
(I have notes about most of the discussion in detail for active participants of the discussion.)

Next Lawrence Stewart (left *) was talking about TCP bug forensics. After introducing himself and a bit of TCP jargon and history, he moved on to congestion control algorithms. He explained which OS is using which algorithm and talked about which parameters to look at when comparing the various high speed algorithms. The next section was tools. Dummynet has problems. SIFTR is their tool to generate CSV-like information for later analysis. Last he showed three interesting case studies of problems they had found.
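
To illustrate the kind of later analysis that CSV-like logs make easy, here is a minimal sketch. The column names (time, flow, cwnd) and the sample rows are invented for illustration; they are not SIFTR's actual log format.

```python
import csv
import io

# Hypothetical CSV-like log; these columns are NOT SIFTR's real format.
SAMPLE_LOG = """time,flow,cwnd
0.00,10.0.0.1:5001,4380
0.10,10.0.0.1:5001,8760
0.20,10.0.0.1:5001,4380
0.10,10.0.0.2:5001,2920
"""

def max_cwnd_per_flow(log_text):
    """Return the largest congestion window seen for each flow."""
    result = {}
    for row in csv.DictReader(io.StringIO(log_text)):
        flow = row["flow"]
        result[flow] = max(result.get(flow, 0), int(row["cwnd"]))
    return result

peaks = max_cwnd_per_flow(SAMPLE_LOG)
```

Once the data is in this shape, comparing algorithms comes down to running the same small script over logs from different test runs.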

Next slot was debugging and profiling tools. John Birrell (right *) did a demo on dtrace. One of the important things to remember: 'dtrace is not a debugger'. During the discussion various questions came up:

with dtrace, can other tools (ktrace, ..) go away? Peter says 'no'.
gnn had asked if there could be dtrace + hw perf stuff: Sun has done it.
the thing to understand is that if the hooks are compiled into the kernel, the (only) overhead is the NULL pointer checks for the modules. If the modules are loaded, there will be an additional mask check.
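
The two levels of overhead described in the last point can be sketched roughly as follows. This is only a model of the idea; the names (probe_hook, enabled_mask, load_module) are hypothetical and do not come from the dtrace sources.

```python
# Hypothetical probe hook; None models "module not loaded".
probe_hook = None
enabled_mask = 0

calls = []  # records which probes actually fired

def traced_function(probe_id):
    """A hot-path function with a compiled-in probe site."""
    hook = probe_hook
    if hook is not None:                      # check 1: is a module loaded?
        if enabled_mask & (1 << probe_id):    # check 2: is this probe enabled?
            hook(probe_id)
    return "work done"

def load_module(mask):
    """Model loading the tracing module with a set of enabled probes."""
    global probe_hook, enabled_mask
    probe_hook = calls.append
    enabled_mask = mask

traced_function(0)    # no module loaded: only the None check is paid
load_module(0b10)     # enable probe 1 only
traced_function(0)    # module loaded, probe 0 masked out
traced_function(1)    # module loaded, probe 1 enabled: the hook runs
```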

Julian Elischer pointed out that he had recently committed the mbuf chain length tracing framework, which is currently not compiled in with GENERIC.

The last big section that day was VImage/Virtualization. Marko Zec (left *) and Julian Elischer (right *) gave an overview of what was virtualized. It was explained how they had implemented kld support. They pointed out that there was no SMP damage, as there are no new locks in the data path.
On the TODO is a memory interface rewrite [I cannot really remember what that was about], documentation of interfaces along with implementing support for subsystems like PF, ALTQ, MC, SCTP, ... and various kldsym consumers like netstat.
There is no size overhead. Looking at a sample we started walking ip_input.c.
Things that came up during that session were:

management API
resources vs. capabilities
bitmasks for permissions
what about raw sockets?
..

Pre-step: consolidate variables, API global variables (offsets). For currentvnet, instead of touching all functions to add an argument, they use a thread-local variable.
As the University rooms were about to be locked, we had to leave. The session would be continued on the 2nd day.
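
The thread-local approach can be sketched like this. It is a user-space analogy only; Vnet, set_current_vnet and add_route are hypothetical names, and the real kernel mechanism is not shown in these notes.

```python
import threading

_tls = threading.local()  # per-thread storage for the "current vnet"

class Vnet:
    """One virtual network stack instance with its own former globals."""
    def __init__(self, name):
        self.name = name
        self.rt_tables = []   # formerly a single global variable

def set_current_vnet(vnet):
    _tls.vnet = vnet

def add_route(route):
    # No vnet argument: the function finds its context via thread-local state.
    _tls.vnet.rt_tables.append(route)

def worker(vnet, route):
    set_current_vnet(vnet)
    add_route(route)

a, b = Vnet("a"), Vnet("b")
threads = [threading.Thread(target=worker, args=(a, "10.0.0.0/8")),
           threading.Thread(target=worker, args=(b, "192.168.0.0/16"))]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The function signatures stay as they were, which is the point: only the entry points that select a stack instance need to change.
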

Day 2

In the morning there were talks/presentations again, while in the afternoon there were more BOFs/Cabal meetings.

Adrian Chadd (left) started on TCP and talked about Content-/Service-Provider hijacking. He went on to describe the different methods and technologies and the problems that arise depending on the setup: TCP options, PMTU, ... He talked about hacking FreeBSD TCP options: tcp mss and Julian's old patches as well as TPROXY.

Doug Rabson (right) continued on the NFS Lock Manager and there was a lot of cheering on the subject. He started with a basic overview of the different NFS versions (2/3) and undocumented newer stuff. For FreeBSD, locking used to be in userland (old rpc.lockd) and there was no proper client side locking. The new rpc.lockd supports everything but DOS shares. It is kernel mode RPC now for both client and server. NLM server support is there, the client support is to come. Local locking now supports async operation and there is a graph based deadlock detection. He implemented fairness for contended locks. Regression tests were added as well. The options are in GENERIC but you can opt out and fall back to the "old rpc.lockd" by removing them from your kernel configuration. There may be a problem (short dead time) after bootup as, at the moment, port numbers change.
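
As a rough illustration of what graph based deadlock detection means, the following sketch checks a wait-for graph for a cycle before a lock owner is allowed to block. It is a generic textbook approach under assumed names (would_deadlock, waits_for), not the code from the new lock manager.

```python
def would_deadlock(waits_for, waiter, holder):
    """Return True if making `waiter` wait for `holder` closes a cycle.

    `waits_for` maps each blocked lock owner to the set of owners it waits on.
    """
    stack, seen = [holder], set()
    while stack:
        node = stack.pop()
        if node == waiter:
            return True          # holder (transitively) waits for waiter
        if node in seen:
            continue
        seen.add(node)
        stack.extend(waits_for.get(node, ()))
    return False

# A waits for B, B waits for C.
waits_for = {"A": {"B"}, "B": {"C"}}
```

With this graph, letting C wait for A would close the cycle A -> B -> C -> A, while an unrelated owner D waiting for A would not.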

Justin Gibbs (left) gave a FreeBSD Foundation update. In case you have seen this before, it includes the 'why create the foundation' part: as a core team member there was no way to fund developers, go to vendors/build long standing relationships, or negotiate. A corporate sponsor was not an option, as their opinion might not always align with the FreeBSD community. The FreeBSD Foundation (FF) is an independent corporation, the management is internally elected, activities are set by charter, .. The "Tie that binds" FreeBSD. Improve, nurture, protect, evangelise.
The FF is doing Travel Grants, event sponsorship, funds development, is there for IP protection, Legal, Contract negotiation, provides hardware, cluster and runs the fund raising.
He talked about the current budget - you can find that online.
Challenges are: knowing the user base, critical mass, funding for the FreeBSD _platform_, growing our capabilities. Development proposals: more funding. formal proposal. a good proposal? Detailed description, measurable goals, schedule, costs, milestones, technical details, ...

After a short break Erwin Lansing (right) continued on FreeBSD portmgr. He gave some numbers and statistics, went into the details of portmon for tracking PRs and maintainer-timeouts, packages, visualising dependencies, ... There are ~18k ports. 3 GSoC students will work on the ports infrastructure. Further discussion was supposed to happen at a BOF the following day. Questions that came up included:

Is software getting more portable these days?
build times?
"embedded" packages / cross-builds?

Robert Watson (left) continued on TCP SMP Scalability. He started with 'the big picture': MPSAFEness, Giant free, improving for workloads. UDP problems: remove wlocks, excessive overhead from socket buffer code. The routing code has no parallelism. TX queues seem to be a source of contention, and we want to preserve ordering. TCP problems: there is one and the same lock for all incoming packets, thus the lock should be decomposed, as suggested by alc's students. inpcb, socket buffer send/receive, routing and ifnet transmit queue locks are still significant. He talked more about stack parallelism, direct dispatch vs. netisr, multi-receive queues and multi-send queues (ordering issues).
TODO: the plan: mutex -> rwlocks. Parallel netisr for loopback, IPsec, .. Questions:

Affinity? yes, Userland - setsockopt
What about latency?
Single receive queues and direct dispatch (hurts vs. better with that).
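
The idea of decomposing the single lock taken for all incoming packets can be sketched generically as one lock per hash bucket. This is only an illustration of lock decomposition in general; the class, the bucket count and the hashing are assumptions and do not reflect the actual proposal or the FreeBSD code.

```python
import threading
import zlib

NUM_BUCKETS = 8  # arbitrary value for illustration

class ConnectionTable:
    """Connection table with one lock per hash bucket instead of one global lock."""

    def __init__(self):
        self.locks = [threading.Lock() for _ in range(NUM_BUCKETS)]
        self.buckets = [{} for _ in range(NUM_BUCKETS)]

    def _index(self, four_tuple):
        # Deterministic hash so a given connection always maps to the same bucket.
        return zlib.crc32(repr(four_tuple).encode()) % NUM_BUCKETS

    def deliver(self, four_tuple, segment):
        i = self._index(four_tuple)
        with self.locks[i]:   # only this bucket is serialised
            self.buckets[i].setdefault(four_tuple, []).append(segment)

    def segments(self, four_tuple):
        i = self._index(four_tuple)
        with self.locks[i]:
            return list(self.buckets[i].get(four_tuple, []))

table = ConnectionTable()
conn = ("10.0.0.1", 1234, "10.0.0.2", 80)
table.deliver(conn, b"seg1")
table.deliver(conn, b"seg2")
```

Because a connection always hashes to the same bucket, per-connection ordering is preserved while unrelated connections no longer contend for one lock.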

The last talk this morning was Peter Wemm (right) on Version Control. He didn't have slides, but a case study of moving the FreeBSD CVS tree to SVN. He pointed out that our 16/17 year old CVS tree has quite a few historic changes that we cannot recover from: history is no longer there because in the early days everyone had access to the repository and manually touched things. Repo-copies are a pain, there are no change sets (we emulate those with the commit log), 'merge' does not keep track (multiple files in different branches), vendor branches, ... no branch dates, .... Several thousand (interrupted) commits w/o log messages ended up in /tmp on repoman: half in, half out, no mail, no log message.
Everyone has their own favourite for which VCS should be next, and it cannot be made right for everyone. So he went with SVN after evaluating others as well. We need revision numbers; hashes are bad for "this revision or newer" in Security Advisories or Errata Notices, or when pointing people at good code.
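
The argument for revision numbers can be shown with a tiny sketch. The revision numbers used here are made up for illustration.

```python
import hashlib

def is_fixed(installed_rev, fixed_in_rev):
    """With increasing revision numbers, 'this revision or newer' is one comparison."""
    return installed_rev >= fixed_in_rev

# Hypothetical revision number named in an advisory, for illustration only.
FIXED_IN = 178500

# Hash-style identifiers carry no ordering: comparing them says nothing about age.
older = hashlib.sha1(b"change 1").hexdigest()
newer = hashlib.sha1(b"change 2").hexdigest()
```

With hashes, answering the same question requires access to the repository history, not just the two identifiers.
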
There would be the requirement that everyone would use it for checking in changes. The client is plain C but uses apr. The good news is that svn has direct language bindings to perl, python, ruby, java,.. Most likely svn would not be in base.
As there are conversion scripts from cvs to everything, there would be a live export to the current cvs tree. That means cvsup still works and there would be almost no visible changes for the world apart from minor things like slightly different commit messages and such. That also means that cvsweb would still work, and that there would be a backup plan in case svn would not work out for src. We could just switch back to CVS.
Some samples: amd64 branch: 650 merges between p4 and cvs, 2250 commits, 18GB of metadata + 16G v-files for 150k changes in p4. 3.5G of backend data for src in svn, which thus fits into 8G of RAM. It would be expected that, in the long term, development for new projects that is currently done in p4 would be done in svn.
repo/backouts/ACL/...: repo-layout as suggested on svn.f.o, basic ACLs are there, we don't have branch dates, cvs guessing, ... The backing store for SVN is a fs "journal" rather than tree, so backout from n to now works, else it'd need a dump/restore which is a pain but doable. The CLI is basically the same as with CVS only that it uses URLs instead of paths in some places. There is builtin mirroring.
For the changeover there would be a (partial) lockdown of the tree. Conversion would then take about 4 hours. It would be only src. It would be a one time shot.
Ports hurt because of all those many small files. 8GB of backing store.
The SVN web interface is basically functional and we may have search options where you can do 'what has name done over the last n many years' for example.
Inter-version operability is something we would have to check. For commit mail, acl, log generator, exporter to cvs: Peter says he'd have it. If a move ("repo-copy") is done in SVN, on the CVS side files will simply come and go.

Over lunch I attended the secteam meeting - no details, sorry.

After that I joined the TrustedBSD/MAC/Audit BOF. The topics were:

MAC framework in GENERIC
performance issue? change to rmlocks+sx and only acquire wlock on policy change. mbuf performance: mallocs, zero on alloc and free. pre call mask checks on subsystem.
Audit and Audit in jails
no audit in jails; host environment audit configuration. per jail auditing configuration + trail and pipe + host environment as well. jail host tokens in host but not jail. virtualize jail. not attributable at the moment. what about sshd in userspace? nothing unique except the jail ID.
Privileges
euid 0 still the policy. restrict as root; allow !0 to change privileges? WARNING! sendmail sample. offer more privileges to root (dtrace stuff). general privilege facility done very carefully. ability for a binary specify what it actually needs and downgrade for the time running. what about fork? child not more privileges than parents? what about ping only needs the privileges to bind raw socket but a user does not have it? upgrading by policy? ... POSIX1e. su/setuid/.../ system call to set the privilege.

There was a bufcache BOF in parallel. You can find info on the Wiki.

David Maxwell, Coverity Extend Training: Simple Checkers. [ sorry no details here ].

Last it was Virtualization/VImage II.
For 7 it should not be committed. For 8 we shall have the global structure.
Here is the plan Marko Zec/Julian Elischer/... came up with:

applying the Macros ... rt_tables will then be V_rt_tables
apply macros at the top of functions, sysctls
use a sed script for those changes
do MD5 verification that nothing actually changed.

Changes/3rd party patches will be invalidated!
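
The rename-and-verify step can be sketched as follows. The notes do not say what exactly was checksummed; this sketch assumes that each V_ macro initially expands back to the original name, so the macro-expanded text must hash to the same MD5 as the original. Only rt_tables is taken from the notes; the function names and the sample source line are hypothetical.

```python
import hashlib
import re

GLOBALS = ["rt_tables"]  # the only name taken from the notes

def apply_macros(source):
    """Prefix each listed global with V_, like a sed script would."""
    for name in GLOBALS:
        source = re.sub(r"\b%s\b" % re.escape(name), "V_" + name, source)
    return source

def expand_macros(source):
    """Model '#define V_name name': expand every V_ macro back."""
    for name in GLOBALS:
        source = re.sub(r"\bV_%s\b" % re.escape(name), name, source)
    return source

def md5(text):
    return hashlib.md5(text.encode()).hexdigest()

original = "rnh = rt_tables[af]; /* my_rt_tables_copy is untouched */\n"
renamed = apply_macros(original)
unchanged = md5(expand_macros(renamed)) == md5(original)
```

The word-boundary match matters: identifiers that merely contain the name, like my_rt_tables_copy above, must not be rewritten.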

structify + init routines (INIT()) g.v

At this point I think we can do performance testing for the patch of step 3.

add framework for virtualization; per frame pointers to the current. g->v. INIT_VNET_INET would start doing something.
one module at a time || "more than one of them"

There were some concerns about whether the naming/interfaces/.. are ok, as we only want to do all the painful renaming once, given that it breaks everyone else's tree.
Another non-networking example was suggested: Robert Watson thinks of POSIX IPC.
Developers documentation is needed on how to use it, how to program for it, ...
Schedule (may depend/slip on other major infrastructure changes):

date      work
June 7    heads-up
June 15   step 1
July 1    step 2
          let it sit for a while; assume it goes well "noise"
Aug 1     step 3
          from here on: incremental to
Aug 15    step 4

Note that while this is ongoing, before Aug 1, you may not want to integrate your working trees.
We will still need a management interface, ...

Something wrong? Questions? Contact? ...?

In case you feel I got something entirely wrong, want me to change something to make it more clear, or just want to get in contact with me, email: "Bjoern A. Zeeb" <bz@FreeBSD.org> .

(*) The original photos of Lawrence Stewart, John Birrell, Marko Zec and Julian Elischer were taken by Diane Bruce (db@FreeBSD.org). Thank you for providing them.
All other photos were taken by myself.

$Id: 200805DevSummit.html,v 1.2 2008/05/27 17:09:33 bz Exp bz $, Bjoern A. Zeeb
