Saturday June 30 2001
FreeBSD Meeting @ USENIX

Matthew Dillon    Transcribing in real time
Jordan/Greg       Filming

10:30 - 11:00    meeting
11:00 - 12:00    devd (imp)
12:00 -  1:00    lunch (jordan)

                                 +--------+
                                 | COFFEE |
                                 +--------+

          (10A)(11) (12) (13) (14) (15) (16) (17) (18) (19) (20)
        +-------------------------------------------------------+
+------+|                                                       | +-------+
|CAMERA||                         TABLE                         | | WHITE |
+------+|                                                       | | BOARD |
        +-------------------------------------------------------+ +-------+
          (10) (9)  (8)  (7)  (6)  (5)  (4)  (3)  (2)  (1)

Roll: (duplicate numbers: chair insertions)

(1)   David O'Brien
(2)   Nik Clayton
(3)   Matt Dillon
(4)   Julian Elischer
(5)   D P Richards
(6)   Andrew Gallatin
(6)   Mike Smith
(7)   David Greenman
(8)   Jordan Hubbard
(9)   Jake Burkholder
(10)  Bill Fenner
(10)  Evan Sarmiento
(10A) Garance Drosehn
(11)  Sam Leffler
(12)  Dan Eishlan
(13)  Michael Wu
(14)  Doug Rabson
(15)  John Baldwin
(16)  Brian Somers
(17)  Warner Losh
(18)  Peter Wemm
(19)  Robert Watson
(20)  Daniel Eischen
(??)  Greg Lehey (floating / in and out)

WARNING WARNING WARNING! This is not an exact transcription, since it was in real time. I only got about half of it down and couldn't always type quickly enough to stay caught up. There are almost certainly typos and mis-attributions (since editing in real time is rather, well, impossible). See the video for the real deal. This should give you a flavor of the meeting, though.

                                        -Matt

-- Power PC (Mike & Benno. Benno not here)

DEVD / John. Talking about other processors.
Mike: Benno said he got Open Firmware and nexus working and we're having trouble in that we don't really have a serial port... may try to implement ddb over Ethernet (NetBSD already has one). It would be a simple pipe device that pipes all the crap out the interface into the IP network. Once we get past the boot loader I think we are home free. pmap is almost done. locore is half done. Might make it to single user by 5.0. The goal is to support as many platforms as possible.
iMac, PowerBook, G4, Titaniums, perhaps some of the embedded platforms (e.g. 7410?sp).
Robert Watson: what about ABI selection for PowerPCs? There are several.
Mike: We are using the embedded ABI at the moment. Might switch later on... depends on the bike shed.
Robert: when does the ABI decision have to be made?
Mike: When we boot into single user. Every time I talk to Benno he's made progress.
Some talk about mounting filesystems between IA32 and PPC and NetBSD's work.
Peter: no point in running a local disk on its native platform in anything other than its native byte order if it's never going to move. (Peter's concern is in regards to overhead). (bikeshed?)
Mike: Knows people running UFS on DVD-RAM.
Mike/Peter: Maybe a mount option? A compile-time option?
Bill Fenner: NextStep used big-endian for all their filesystems, didn't bother with 80386 (no byte swap instruction), but everything else...
Bill Fenner: choose at newfs time.
Mike: agrees.
... lots more people start chiming in, talking about other architectures (Solaris on Sparc, ROM loaders, removable media, and so forth).
Warner: we should table this for now.
Back to Mike. Jordan asks: What are your rollout plans?
Mike: PowerBook, G4 to begin with, then perhaps get into the embedded platforms.
Jordan asks how far they'll get. Mike/John: single user by 5.0. G4 support for SMP.
John: The way the kernel is now, support for SMP is fairly easy.
Jordan: Apple isn't charging much more for dual CPU boxes. (Consensus: should be fairly easy).

-- IA64 (Doug Rabson)

Doug: IA64. Kernel to single user mode. Signal delivery appears to work. There are some things we need to address in the kernel in regards to the fact that we have two stacks. IA64 has an upward-growing register stack in addition to the normal stack. I have a hack in exec that maps a few pages, sets the register, etc... The register stack tends to be smaller than the regular stack (arguments, registerized variables, etc..).
128 register set with a lazy save to memory, rolling window.
DG: questions about register allocation.
Doug: register growth is open-ended. So, in any case, for IA64 we need a second stack that grows upwards.
Matt chimes in (oh no!): for user mode register stack, just mmap() a large virtual area.
DG agrees, that would work... have to decide how much.
Doug: userstack - 2 * maxstack.
Doug: 64 bit address space broken up into 61 bit regions, no problem in regards to available VM.
(timeout for Greg Lehey to set up tripod for Jordan's VC).
Doug: Next goal is to run on real hardware rather than simulated hardware. The first step of that is to get the loader working. I have real code compiled that actually runs on real hardware.... Got to the stage where I have to debug setjmp and longjmp, and after that it should be fairly easy to bring up a real kernel. It's got serial ports which should be fairly easy to use.
Jordan: Do you have access to a real machine?
Doug: Yes. It even has firmware support for loading stuff over TFTP. That's just about it.. oh, there are tool chain issues, I'm using an old tool chain at the moment. Once we have enough toolchain in our action tree then we will consider cross building a real userland after bringing the kernel up single user. I expect it to get to single user mode on real hardware by the 5.0 release date.
Mike: question. In this SMP for ISD4. When you have two processors it really becomes an ACPU machine due to the CPU threading.
Mike: thinks it might be a good idea to plan ahead, there are going to be issues with many more CPUs.
Robert Watson: too soon.
Greg Lehey: There are some IBM systems with 64 CPUs and I've been asked to look into scalability issues in this area.
Warner: how serious are they? Are they expecting an "oh this sucks" short report?
Greg: They are a little more serious than that. They would like to bring it up on at least 32 CPUs.
(Additional talk).
Robert: Regardless of that, I think fairly fine-grained locking is the strategy for that.
Greg: general agreement. It will be suboptimal, but more optimal than a badly thought out mid-level locking.
Peter: break break.
Greg: Benno is still working his way to single user mode. He's got an initial print out of the probe messages and is working forwards (PPC32). Question as to whether to port NetBSD or FreeBSD first, neither runs on PPC64. It might be better to get NetBSD to boot first, then carry that forwards into FreeBSD. This is not going to be my (Greg) prime occupation. If anyone wants access to the machine after we've gotten it basically working on the net, I can arrange limited access (due to IBM security issues). Contact me. This is the 2-CPU PPC machine, not the 32-CPU PPC machine.
Greg has to leave. Banter about where Greg is located now (Australia).

-- SPARC 64

Jake: Sparc64. Running into filesystem issues trying to mount a little endian filesystem on a big endian machine. Will definitely have it running in single user by 5.0. Architecture is very similar to IA32. The stack works the same. The window registers do not change things that much. Will mainly support PCI devices, will probably not try to get SBUS working in the near term. Low end Ultras are all PCI, no big need for SBUS at the moment. I also have access to an 8 processor box. Video cards may be an issue... on the larger boxes they are SBUS devices. Console may be an issue (serial ports brought up as an alternate).
Doug: NetBSD found issues with big-endian boxes.
Jake/Doug: Yes, there were alignment issues and stuff. We just have structures for system call arguments, with appropriate padding. NetBSD has a special syscall arg macro. Our structures should be easier to port.
Jake: The firmware has various nice things. It will load an ELF binary and it has TFTP support.
Jake: Right now I am booting over the network. I have a little loader, not a real loader.
(How far do you boot?)
It gets to the point where it mounts the root filesystem. Someone at NetBSD put together a big-endian filesystem and it got past the magic number issue.
Jake/Jordan/Warner.
Warner: I can mount a NetBSD filesystem on FreeBSD if the endian is the same.
(David O'Brien enters)
John: What is the current status of (Sparc, PowerPC?).
O'Brien: It will happen by November. Once we get single user I don't see multi-user as being a big deal. It is predicated on getting GCC 3.0 in the tree and may require a newer version than the GCC going into our tree. We might have to wait until 5.1... at the moment it requires all sorts of external packages. (...). Note that the hardware will not be out by 5.0, and a simulator would cost around $15K.
John: What about StrongARM?
Mike: I do have the boards, SA1110.
David: I'm trying to get a mailing list going.
Mike: I'm booting Linux on it but I am splitting my time between PPC and StrongARM.
(Jordan: Gary Palmer was working on StrongARM. O'Brien: I think that was pre-StrongARM).
(More talk about which particular StrongARM platform we should go for first?).
Jordan: suggestion to target first... the reference board is fine, but FreeBSD's mandate essentially says that we should be running on commodity hardware. G4, iPAQ, HPCMIPS?sp, etc etc... we need to use commodity hardware as our reference platform.
Warner: NetBSD does a good job running on all available boards as well as commodity hardware. It creates an issue where they have a lot of essentially similar ports. The commercially available ARM machines are the iPAQ, HP Jornadas.
(Suggestion from O'Brien: go to Slashdot and look at those new StrongARM cube things).
John: Any more talk about toolchain issues?
O'Brien: What is there to say?
Jordan: How about talk about GCC 3.0.
O'Brien: GCC 3.0 must be in 5.0 in my opinion, otherwise we will have to rip C++ ABIs out from under everyone. I thought this might be the last one but 3.1 might be the last one.
I expect to seriously start work on the integration on the first of August, due to Wind River activities slowing me down. binutils 2.11.2 was imported this morning, though there may be some issues with it (I got a bug report on it).
Jordan: What about the Java support for GCC 3.0?
O'Brien: are you just curious or do you want to actually import that?
Jordan: A lot of people are interested in it.
O'Brien: It is a native compiler so you do not have to bring over the byte code or the AWT. Because of that it is rather limited right now in terms of what you can do. They are still trying to get libjava to present a virtual machine to the Java code. The fact that you can't just take .class files around... I don't know the usefulness of it by itself. libjava does not compile on all platforms yet, so...
Jordan: Should we hold off? Import it later?
O'Brien: I don't know if we ever should. People were talking about porting Sun Java 2.x...
John: lunch break now or at 1:00 p.m. (it is now 12:00)

-- DEVD

John moderating. devd.
Warner: Summary of what we talked about yesterday. What is devd? We talked about whether we want it to be our security policy enforcement manager. Do we want to just have it handle events? What? The scope we came up with is that it is a replacement for pccardd and usbd. It listens for a particular device in the kernel (device arrived, device left, etc...) (PHK wants it to listen to dev filesystem requests too).
Long discussion about whether it should be used for permissions persistence or policy agent. Consensus: Sounds cool, but when it comes down to it there are races, consistency issues (what if devd dies?). Worried about the concept of the kernel upcalling a userlevel daemon.
Additional talk about the kernel relying on userland daemons, aka vinum. Kernel events relying on user operations sound very dangerous (from various people).
Julian: Poul was seeing the same things I was seeing.
There are a number of times where the user community is demanding that they want to be able to tailor a filesystem in various ways. XXX.
Robert: Might want to feed things into devfs before devfs exposes things to userland. Looking at devd more as an event manager rather than in devfs. Appearance of the device in the filesystem has more to do with the presentation of the device to userland and less to do with how it is perceived by newbus.
(more talk back and forth).
DG: exactly what role is devd to play when some other process does a stat or something like that? (answer: no, no).
devd purely as an event manager to take other actions, such as load the kernel module associated with a new device found by the kernel. Example in the tree: you plug a card into newcard but newcard has no way to configure it (e.g. no pccardd around)... devd would be responsible for that.
(Now everyone is chiming in with examples).
Andrew: Stateless or stateful?
Answer: Robert: Stateless.
Doug: What about automounting filesystems?
Warner/Robert: Yes, devd would do that.
Robert: devd would be able to act on the class of the device as well as the particular brand of the device.
Peter: e.g. this is an ethernet device (generic).
DG: do we have to get this thing all singing and dancing before it would be useable?
Julian: I think it can be done piecemeal, on a class by class basis.
Warner: My view is that putting a few event types into a devdev/kernel-queue for devd would be very easy. We put that in, write the program, and maybe initially all it does is string lookups and such.
Warner: That is what usbd does now, it just looks things up based on regex matches.
Doug: A medium term goal would be to have devd recognize soft eject pccard switches.
Robert: A lot of PCs have soft-eject mechanisms now.
(lots more talk about soft notification and soft eject mechanisms).
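The "string lookups" and "regex matches" Warner mentions, combined with Robert's point about acting on a device's class as well as its particular brand, might look roughly like the sketch below. Everything in it is hypothetical: the event fields, the rule table, and the action strings are made up for illustration and are not devd's or usbd's actual formats. The lookup is stateless, matching Robert's answer to Andrew.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceEvent:
    # Hypothetical event shape; the meeting only names "device arrived"
    # and "device left" as event kinds.
    kind: str         # "attach" or "detach"
    device: str       # device name as reported by the kernel
    description: str  # free-form identification string


# Hypothetical rule table: (event kind, regex on the description, action).
# Brand-specific rules are listed before class-wide ones; first match wins.
RULES = [
    ("attach", re.compile(r"3Com.*Ethernet"), "load-module example_nic"),
    ("attach", re.compile(r"Ethernet"), "configure-network {device}"),
    ("detach", re.compile(r".*"), "log-removal {device}"),
]


def action_for(event):
    """Return the action string of the first matching rule, or None."""
    for kind, pattern, action in RULES:
        if kind == event.kind and pattern.search(event.description):
            return action.format(device=event.device)
    return None
```

A brand-specific card hits the first rule, any other Ethernet device falls through to the class rule, and an unrecognized device yields no action at all.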
Peter: example, CD ejects, devd could detect that the user is hitting the eject button on a locked drive and could then unmount the CD and allow the ejection.
Mike: hot swap PCI cards too.
Warner: the hotswap Compaq PCI spec is a user-initiated power down, the LED goes from green to red and the user then pulls the card out. The O.S. is cooperative in this regard. There is no standard way to do it yet, it is chipset specific. It is the PCCard bridge situation all over again.
DG: I've only seen one Intel platform with hot swappable PCI. (others: The DELL 4400 (?sp)).
Peter chimes in. Warner chimes in.
Warner: The strawman proposal we came up with last night for devd was basically just getting events from the kernel and doing stuff. It would not be the gatekeeper for device permissions. Decided for a strawman that devices always appear, and you could see the device, but you may not be able to open it.
DG: What about the chroot situation?
Robert: If you mount devfs in a chrooted filesystem it will have all the devices as in the root filesystem.
(More talk, can you change permissions on separately mounted devfs filesystems? Nobody is entirely sure).
DG: How will devd know about all these other devfs mounts?
Robert: at the moment we think the best way to do it is in the mount options for devfs.
Robert: note that jails in 5.x are different than 4.x. The jail info is stored in the ucred, not the process structure, and devd can work with it.
(general consensus is that jail/chroot/multiple-mounts should be a property of devfs, not devd).
Julian: I've actually solved this problem for my device filesystem. It is generally applicable. You implement a version of a symlink that, instead of being a normal symlink, is a symlink into the device space. You basically give it the name into the prototype device filesystem space that you want it to be a link to. The code that follows this does it above the.. before you get down into the VFS, so you are not following a major/minor number. Access is by name.
Warner: Another way of saying that is that jailed devfs's come up empty?
Julian: No, the jails would not have devfs mounts at all, they would contain these special symlinks.
Warner: you can create the symlinks before the devices exist?
Answer: yes, it is by name.
DG: what about new devices?
Julian: devd could add them.
Matt: devd shouldn't have to mess around with a random number of devfs mounts.
Warner: the symlink suggestion by Julian seems to solve the problem (paraphrased).
(More talk).
Warner: it sounds like we have a rough consensus around the variant symlink idea.
Julian (some stuff).
Warner: what about setting permissions?
DG: symlink permissions do not mean anything at the moment.
Julian: It would be fixable.
Robert: I just want to make sure we can adjust device permissions within jails. (Robert generally agrees).
DG: The obvious thing to do would be to implement something even more grand, such as implement real variant symlinks.
(More talk about devd and these symlinks and who creates and maintains them).
DG: what I was suggesting here is that I understand the goal is different from general variant symlinks. The idea being that you have a target symlink and want to do some sort of translation on the path, such as an environment translation.
Robert: AFS does something like this.
Matt: Didn't VMS do something like that?
DG: answers (lost).
DG: the idea is to have some sort of general translation mechanism. They are different things, I agree, but I think we could use this mechanism to do both.
Robert: We need a protection step here.
DG: I think the permissions thing is a separate issue. I don't see why we can't enforce symlink permissions inside jails and not outside jails (at the moment they are not imposed anywhere).
Warner: I like that idea.
Bill Fenner chimes in.
DG: Kirk is going to hate this.
Apparently there is some draft specification that says symlinks do not have permissions associated with them.
Julian: I ran this past Kirk a year and a half ago and he seemed to think it was a good idea.
DG: I'm not saying we have to follow some arbitrary standard that someone else had designed. Historically symlinks had permissions, then Kirk took them away, then we and NetBSD added them back. Now Kirk says it seemed like a good idea at the time but was a mistake.
(Confusion over what permissions are being enforced... just the owner/group? Or modes too?).
(more talk).
Peter: 4.4-lite went back to the inode to store symlink information.
Warner: you set up the jail once to make the nodes.
Robert: This breaks us away from the major/minor stuff. (Julian agrees and gives an example).
DG: This brings up an interesting security issue in regards to what we allow people to create inside jails (General agreement).
DG: perhaps we should separate the variant symlink from the device node symlink types.... (too many people talking now).
Talk moves to which physical inode device type should be used... character device? Some other device type? What?
(pause to talk about exodus routing problems)
(more talk about how to implement the special symlink).
Robert Watson: use the major/minor -1,-1 to indicate additional content.
David O'Brien: PowerPC development depends on traditional NFS device exports (paraphrased).
Robert Watson: If we were to overload the block device node and add indirect blocks to it, would the filesystem BOF on it?
DG: well, we could fix that.
(Matt comment: the easiest tie-in to the kernel is to subtype softlinks).
John: Let's go to IRC for a quick PPC update, and then break for lunch.
(More banter, getting ready for lunch, talking about the NASD failure)
(Waiting for Jordan)

LUNCH LUNCH LUNCH LUNCH LUNCH LUNCH

(return from lunch 2:50 p.m.) (times off from original sched)

?:?? - ?:??    ports (obrien, jake?)
?:?? - ?:??    5.0 (jordan)
?:?? - ?:??    KSE (peter, julian)
?:?? - ?:??    SMP (jhb, jake?)

-- 5.0

The reason for setting the deadline as it is (in November) is because December already has a 4.x release in it. We loosely said we'd do a -current release this year, so that leaves November pretty much. For 5.0 there isn't much point adding slippage since 5.0 is not expected to be production (paraphrased). We need to get the 5.0 release out to a wider group of testers. Whether we like it or not, that should be helpful.
Warner: with the pccard changes I've made, I've gotten more comments from people running on stable than on current.
Jordan: Perception-wise a lot of dirt has been flying on -current.
Peter: we scared them away on purpose to a degree (paraphrased)
Jordan: 5.0 is a symbolic declaration. Time to get the user community crunching on it.
(more discussion).
Jordan: No branch will occur at 5.0. Well, like I say, this is the more important question. .0 is more like a symbolic act. Once people orient themselves around it we need to decide how much time we need to really really make it work so we are not spending all kinds of time doing work in two branches, because that kind of sucks; we've done that before. So my feeling is maybe as late as 5.2.
Peter: I think the real thing is I should make sure it is in a fit state to be branched, or in a fit state to be able to be branched without... just basically so we do not do it too early, so we do not spend too long catching up. Once 5.0 comes out there's going to be a massive amount of work.
Jordan: what is fit? I would define that from an engineering perspective as: we fixed all the significant problems we know about, ... and things are fixed to the best of our ability. And to me that means somewhere after 5.1... that is when people are going to really start arriving (using -current).
Peter: my gut feeling is that 5.0 is going to break a lot of stuff.
5.1 is going to be our chance to get the really big stuff fixed, which will get a lot more people on, and then we can tune stuff and that will get us into good shape to branch around 5.2.
Warner: that sounds good. The big thing we learned about 3.0 is that we branched a little early, because Matt (and others) wanted to push things (VM) into 3.0 that really belonged in 4.0.
Robert: does that mean we are going to freeze active development for a few months?
Jordan: Possibly for as long as a year. If we get something that requires really significant architecting, then perhaps we can create a sub-branch for it. BSD/OS put all their merge stuff on a branch and it was really stable.
D P Richards: what about pushing down the locks, does that count in the freeze?
Jordan: That is 'finishing it' (i.e. does not apply to the freeze).
Peter: We will have to be doing that continuously.
(additional talking back and forth).
Jordan: you are wondering if people should track the branch or whether they should take the releases (talking about current here).
Robert: there is a point in time after a release where we will be doing a huge amount of work (paraphrased).
Peter: (agrees). Jordan (agrees).
Jordan: cvsup will be dangerous, but we will be able to do point releases.
Peter: I personally think we should get the bulk of anticipated API changes done by 5.0, so we do not have a lot of turmoil.
Matt: are your point releases going to be tagged, or ad-hoc?
Jordan: yes, they will be tagged.
Warner: when will we know we will be ready by 5.1? (Jordan's answer is substantially the same as in prior discussions).
(?): What about GNATS?
Mike Smith: GNATS will not be very useful considering how quickly -current is going (paraphrased). The gut-feeling reaction is that we will do 5.0, there will be some really big fires, and by the time those fires are out people will probably generate lots of PRs.
Jordan: We should maintain a web page, I don't think the PRs will solve the problem.
Robert: GNATS has a number of problems.
Jordan: (lists a number of problems with GNATS).
Warner: Bugzilla makes it easier to manage... I have Bugzilla at work and going through the bug reports is trivial.
Jordan: (more comments about the PR system).
Matt: we need an auto-assignment thing for PRs.
Warner: For Core and the security officer we need incident tracking.
Jordan: Let's take this off to a corner (offline). What we should do is migrate things from GNATS into the (undescribed future) system, and so forth...
Mike: For the Japanese people who do not speak English, can we have a separate bug reporting system for them (that's coordinated) (sorry, too much talking, missed the rest).
Warner: Mechanical translation is not really an option. (...)
Robert: there was a suggestion of projects.freebsd.org, a PHP-based web system to track projects (e.g. like devd), but nobody appears to have time at the moment (paraphrased). There are a lot of issues and nobody has picked up the ball. There are a lot of logistical issues like bringing the service online.
Mike: The issue is that we need a dedicated maintainer or it will rot.
(Jake enters the room)
Jordan: any other questions?
Nik: On 5.0 will we have a release engineering team?
Jordan: The release engineering will pretty much work the same way it has been revolving. David O'Brien has been doing stuff, Paul has been doing stuff, Murray has been doing stuff, etc.
(More talk re: project releases).
What the project should commit to doing is put out 5 ISO images: a small release-only set of bits (200MB or so) for making miniCDs, putting it in flash, etc. We have the four ISO images (picking 4 out of my ass) which is the release, some packages, (interruptions for discussion, list cutoff).
Jordan: In addition to those 5 ISO images and the FTP dist, I think we should also make a pointer (disk space constrained) for all the disk files, all the packages, etc. etc... all on one system, so when you have a release you have a whole chunk of data.
Good reference bits so you have a starting point. A definitive packet set, etc.. and what the ISVs want to do after that is up to them. We expect them to add additional value (such as package it all up on a DVD, for example).
Nik: should sysinstall be able to handle extra components? So if a third party etc... want to put together a value-add distribution... What should we define as the 'bare minimum' for an ISV to be able to call the distribution FreeBSD?
Julian interrupts: what does this have to do with kernels?
(talk fest slows down, Jordan gets in a last word.. oops, no, Warner gets in a last word).
In the past WCCDROM has had an exclusive for the 'official' FreeBSD dist, what do we do about that?
Jordan: Once we get the official bits out there we can evaluate multiple vendors for an 'official' release and give it the stamp (paraphrased). It may come to pass that we might want someone else to do the official distribution (implied: or several people).
(Additional talk).
Jordan: The point I want to make as a release engineer is that this is about as clean as it is going to get, since I do not work for anyone with a vested interest in the official FreeBSD release, ... so we should do it (fair share, paraphrased).
(yet more talk about who should run the official FreeBSD sites)... the FreeBSD Foundation is at the point now where the sites can officially be shifted to them (my comment: though obviously hosting will still probably be done by Yahoo???)

-- KSEs (Julian)

Julian: The basic ... the basic... we will start from the really first thing, which is system calls become asynchronous and the threads library uses that to generate new threads. The basic idea is that you go down into the system and when the process would block it instead saves basically the uarea. We want to save the kernel stack and state for that process so it can be restarted. Now the process can keep running.
It then allocates from a cache of uareas (not the best definition) and loads the very top of that stack with a trampoline (my term) that returns to the same system call in user mode that originally blocked, and that place does some kernel housekeeping and jumps back into the userland/kernelland scheduler. Presently we have one process structure but we end up with four structures under this picture. We have the process structure, P. This owns all the resources, including address space, descriptors, any credentials, all that stuff. Then we have another small entity KSEG that owns scheduling priority and quanta.
Mike: We already have a KSEG (kernel segment).
Julian: well, whatever you want to call it.
Julian: then we have the KSE. It's sort of an empty shell in which contexts are loaded to run. This is the scheduling vehicle.. a kernel schedulable entity of some sort. This is what provides your parallelization within a process. The last thing is called a KSEC. This contains the kernel stack, all state of blocked and running code. So, umm, the way it works is a normal process, before you do anything, has one of all of these, and one other possibility is that the process structure always contains (one of) all of these.
Robert: note that in the Linux model the credential is bound to the thread, which is contrary to the POSIX model.
Julian: you could bind the credential to another structure (this is not being seriously entertained).
Julian: So how do you glue all this stuff together? It's simple. You are running in usermode and do a system call into the kernel. All the status where you came from is saved, and is copied into the kernel. The kernel then returns to where it came from this first time only. At another time in the future you run a system call into the kernel and block. At this point you allocate a new KSEC and the old one is saved on a sleep queue somewhere. You then take the original state copied into the kernel and return to it.
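As a reading aid, the four structures and the blocking step Julian walks through can be modelled as a toy. This is only a sketch under loud assumptions, not the proposed kernel code: the field names, the placement of the sleep queue on the process, and the `upcalls` counter are all invented here, and only the names P (written `Proc`), KSEG, KSE, and KSEC and their stated responsibilities come from the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class KSEC:
    """Kernel stack plus all state of blocked or running code."""
    name: str
    blocked: bool = False


@dataclass
class KSE:
    """Empty shell into which a context (KSEC) is loaded to run."""
    context: KSEC


@dataclass
class KSEG:
    """Owns scheduling priority and quanta (fields are assumptions)."""
    priority: int
    kses: list = field(default_factory=list)


@dataclass
class Proc:
    """P: owns resources (address space, descriptors, credentials)."""
    pid: int
    groups: list = field(default_factory=list)
    sleep_queue: list = field(default_factory=list)  # assumed location
    upcalls: int = 0  # returns to the saved userland scheduler state


def new_proc(pid):
    """A normal process starts with exactly one of each structure."""
    kse = KSE(context=KSEC(name="ksec0"))
    return Proc(pid=pid, groups=[KSEG(priority=0, kses=[kse])])


def block_in_syscall(proc, kse):
    """Park the blocked KSEC, load a fresh one into the KSE, and count
    one 'return' to the state saved by the first special syscall."""
    old = kse.context
    old.blocked = True
    proc.sleep_queue.append(old)
    kse.context = KSEC(name=f"ksec{len(proc.sleep_queue)}")
    proc.upcalls += 1
    return kse.context
```

The point the model tries to capture is that the KSE keeps running with a fresh context while the blocked context waits on its own, so a blocking system call no longer stalls the whole process.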
(The userland scheduler is what originally calls into the kernel whose state is saved. Later on this state is used by the kernel to 'return' to the userland scheduler many times, as necessary when a userland thread blocks in the kernel, allowing the userland scheduler to then handle the userland side of thread scheduling).
(additional discussion, a suggestion to keep the conversation at a higher level).
DG: (paraphrased) does this fundamentally change the way syscalls work?
Julian: No, only if that very first system call is made (to place the kernel in a threaded mode of operation).
DG: so if you don't do that the behavior is exactly the same as it works today.
Julian: Yes.
DG: So this is similar to what pthreads does, but we move it into the kernel.
Julian/others: Yes.
Julian: Except you don't have to wrap your system calls (referring to pthreads).
Robert: So locks would result in a switch of this sort, but mutexes would not?
Julian: right (because mutexes cannot be held across a schedule blocking condition).
John: blocking mutexes and SX locks already do switches, so it would be relatively easy to do this. (paraphrased)...
Discussion about the potential for deadlocks between Mike Smith and Julian. When the kernel stalls on a context, wakeup and continuance of that kernel context is independent of the userland thread it came from. Lots more stuff.
Jordan breaks in: How soon will we see this code in current?
Julian: The first sections of this... it depends how far I am allowed to go. I can make a lot of the changes now and make it so it works the same as it does now (as long as the new system call is not called).
Jordan: How would you do that?
Julian: Breaking the proc structure into four parts and having all four present in the proc structure.
Matt: would those be embedded structures or allocated?
Julian: For the moment those would be embedded structures.
Jordan: what else?
Julian: You would have to go through and change all the places where the system runs through the process to run through the KSE.
Jordan: ok, what else?
Jordan: Are there any functional changes assuming you clean up all the naming issues?
Peter: No, there is not much that has to change functionally. The biggest impact is renaming existing things under slightly different names that need to be broken up. But the bigger picture is not much different.
DG: the current scheme for initializing a proc structure is already broken up into three sections.
Julian: I will have a call for each of these broken up pieces for each of the three sections.
Jordan: Assuming all of this (paraphrased), how long would it take?
Julian: Macros, around 4 hours.
Jordan: the rest?
Julian: Adding the syscalls, probably around a week.
Jordan: And how long to make the resulting...
Julian: The changes to all the rest of the kernel is the big one.. about a month of time. (Robert describes the issues involved in the big one).
DG: one man month or one month period?
(Lots more questions)..
Julian: Yes, it can be done piecemeal without breaking the kernel.
Robert: How is this going to affect the lock pushdown?
Peter and Julian: We just continue to lock the whole process for now, and say that it 'locks' all the things under the process (related to the KSE stuff).
Jordan: But are there any objections?
Answer from everyone: No objections.
Robert: what are the differences between what NetBSD and we are doing?
Julian: NetBSD is basically implementing scheduler activations directly from the paper. We looked at the paper and said "Hmm, but what about...".
Peter: There isn't a lot of fundamental change between the two, it should be possible to run the NetBSD binaries with a simple layer.
Mike Smith: there haven't been any other suggestions that appeal to people other than this way.
Peter: (Peter tries to scare people into believing that KSEs are really the best way to go by implying that other systems are much worse).
Peter: do you have time?
Julian: A quarter of my time is paid for FreeBSD work, but the work hours only apply to a 40 hour week. I do not think it will take 4 times as long (aka will not take 4 real months for 1 man month).
Nik: The mechanical stuff, how easy would it be to document rules for a junior kernel hacker?
Julian: Very easy.
Jordan: Julian has been given clearance.
(more jokes follow, sorry you won't see it on the video, Jordan is changing the tape!).

-- SMP

John: network layer will be locked down for the moment using three or four larger locks.
Matt: question in regards to the proc lock.
Jordan: so how are you going to break the milestones down?
John: I would be happy to get the network and proc locking done by 5.0.
Greg: are we going to have performance goals for 5.0?
Answer (from several people): no, none.
Peter: (paraphrased) we will work on performance goals for 5.1.
Greg: if we had a 5.0 release that is half the speed of 4.x... I would be unhappy.
Mike Smith: Realistically it probably would not be that bad, and if it were it would set off alarm bells.
Mike Smith: The times for buildworlds are still in the ballpark.
Greg: can someone give me something more specific?
Mike Smith: it is possible to do it in about 40 minutes, roughly similar to 4.x.
Peter: On an SMP dual 1.2GHz AMD, it was 27 minutes (stable) versus 32 or 33 (current). That was both being built .. actually building -current on 4.3, then install 5.0 and building again. Local filesystems. No ram disks, etc etc.
Mike Smith: Bottom line we care about performance but it is not something we will be pursuing in 5.0.
Greg: bottom line is stability.
(Mike says functionality and stability are similar, Greg says they are very different).
Robert: Maybe we should go on talking about the more general SMP functionalities.
Jordan: so you are saying proc and networking stuff.
John: yes. So the idea we came up with is having three or four general locks.
Matt: what about moving the rest of the kernel out of giant?
John: we thought about trying to do subsystem locks when we started all of this, but the problem is that code flow goes both ways and you really need to attach locks to data rather than code. (Paraphrased, my typing is incomplete).
Jordan: so is that not going to happen until 5.1? 5.2?
John: (noncommittal)
Jordan: so when can we call SMP 'finished'?
(now too many people chime in)
Jordan: will giant be involved to hold all the misc stuff so there is no need to do a lock pushdown everywhere?
John: what I would like to do is get some syscalls outside of Giant.
Mike S: so you would like the infrastructure in place so the lock pushdown can go on in the background without necessarily impinging on performance. You would like the hardest work out of the way?
John: yes. There will not be a lot of instant gratification.
Mike S: we understand that.
Jordan: how much dust will there be in the tree when it branches? When does it stop?
John: I expect it to take quite a lot of time to get the kernel completely locked.
(additional discussion).
Mike: so there is no clear line.
John: ... yes. I don't have a line in my head.
Mike: it sounds like a time consuming process. At some point we need to make a decision as to what level of locking is appropriate for the 5.x branch. ... We want 5.0 to be reasonably concurrent in the kernel. That is one of the design goals for the 5.x branch.
John: it is going to take more resources than I have now. I don't know how long it is going to take. (goes on to explain a number of subsystems showing the complexity of the issue).
Jordan: these are questions that you probably can't answer in this meeting... (and goes on to expound on that a bit). People are going to want to know when this (5.x) is going to be a win versus 4.x.
Nik: so by 5.2....
Jordan: so by then we would want there to be enough of a win that people can be comfortable with it (paraphrased).
John: (talks about profiling tools to locate bottlenecks).
(additional discussion about how to break the network up in regards to lock pushdowns)
Greg: so again we will not have a parallel network stack in 5.0.
Peter: yes, but that will not affect device driver writers per se.
(too many people talking back and forth now, sorry)
Matt: I will be working on -current on weekends (VM and giant related stuff).
Robert: so what subsystems do we have to worry about?
John: VM, VFS, buffer cache, TTY, proc/sessions/group.
(long discussion (sorry) on VFS locking. Discussions in regards to where BSDOS is on this vs us, etc...)

-- MISC

Misc discussions about the state of firewire, USB, NetBSD/FreeBSD code fork in regards to (firewire?). Misc discussions on what various people are doing.
Jordan: so is firewire basically dead then?
Mike S: well, it isn't dead but nobody is driving it now.
Robert: it's basically dead.
(more talk about firewire) (my fingers are getting tired)
Jordan: we get beaten up the most about 3D support, on USB support, on firewire.
Warner: nobody has beaten you up about cardbus.
Jordan: no, nobody has beaten me up...
(others:) they just assume it won't work. (laugh)
(talk on compat/linux stuff as it relates to games. Basic consensus is that it's possible but a lot of messing around. Jordan never could get it to work).
What about IRDA?
Julian: where there's the S?? chipset, and there's the Toshiba chipset. It's in the state now where IR data is being piped up to userland and they are playing with it.
Mike Smith: there's no killer driver to drive IRDA.
Nik: In Europe there are phones with IRDA ports where you point your laptop at the phone and get 9600.
Mike Smith: but, well, is there protocol documentation available?
(more talk on that)
Mike Smith: what I am getting at is that by ourselves we will need a lot of extra attention to get that kind of functionality.
Jordan: can we summarize this as not having anyone particularly working on this?
Julian: Well, we have some people working on it (working on Benno's code base)
Jordan: can you get us in contact so they aren't unknowns (paraphrased)
Julian: yes, I've already started that (paraphrased).
Jordan: so Julian is going to drive our IRDA.
Julian: I will be the liaison.
Jordan: does anyone have any objections to having the ports stuff split out of the cvs repository, i.e. be a separate cvs repository? (probably on the same machine).
(general response): No, that's a good idea.
(Matt's internal note: this was brought up a number of times outside this meeting. The concern is that from a security standpoint the number of ports committers is growing to a point where we have issues with the kernel tree, but nobody wants to alienate or categorize a 'ports committer' as being somehow a 'lesser' committer. This is basically being driven by security concerns. Similarly, most new committers are not given shells on freefall any more, just cvs server access).

END