Building an Integrated Server Product using FreeBSD


Mark J. Taylor
mtaylor@cybernet.com
Cybernet Systems Corporation
NetMAX Division
Ann Arbor, Michigan
http://www.cybernet.com/
http://www.netmax.com/

Abstract

Servers have grown into a complex collection of applications from multiple vendors, which makes administering their services non-trivial, confusing, and poorly integrated. The publicly available tools for configuring these services range from "vi" to "swat". Our goal with the NetMAX product (Network Media Access Server), initially created for FreeBSD, is to choose interesting services, such as web serving, file serving, and account management, and create an integrated, multi-platform environment for installing and configuring them. This paper presents some of the issues that we encountered in building such a product, and our solutions, in a not-too-technical format. It closes with a brief look toward the future of the project.

1 Introduction

1.1 What is NetMAX?

The NetMAX product is Network Media Access server software for i386 machines. It provides multiple paths of access to data, and is configured through a single, simple, and consistent interface. The user interface for configuration is a graphical web browser that supports JavaScript, such as Netscape's Communicator or Microsoft's Internet Explorer. Some of the current services are:

o DNS server
o Web servers (per-user and per-IP address)
o Email server (SMTP, POP, and IMAP)
o Internet News server
o FTP server
o Tape backup of virtually any networked machine
o Network routing, firewalling, NAT, and dial-in/out modems

Simply stated, the NetMAX project integrates publicly available software, and provides a simple, common configuration interface for it.

1.2 Goals of the Product

Free software created by the masses of programmers in the world tends to be written for other programmers. For this reason, System Administrators used to be programmers as well, with sufficient knowledge of how to compile, configure, and upgrade pieces of server code, a.k.a. "subsystems". This is no longer as true: with the explosion of the Information Age, these knowledgeable administrators have been supplemented by a horde of junior admins, and many more users. The need then becomes the ability to quickly reconfigure subsystems, rather than to upgrade and repair existing subsystems; more time is spent using the software than maintaining it, simply because there are more users.

The initial focus of the NetMAX project was to create an easy-to-install software package for FreeBSD that allowed a junior administrator to quickly reconfigure a server's file sharing subsystem using a web-based interface. This was estimated to be a two-person, four-month project. In the end, the initial version of the product was released after more than ten man-years of work. Many lessons were learned along the way.

1.3 Feature Creep

Feature creep is a deadly disease that can corrupt even the simplest project. The NetMAX project is no exception. For our initial web interface design, we started with static HTML forms for data input, and a CGI script to store the data (more on data storage later). After only a few days, it was obvious that immediate data validation and feedback were necessary: it was too easy for the admin to enter invalid data. To provide active data validation, we started using JavaScript, but most of us knew only a little JavaScript, and those who knew more found it cumbersome and limited. Java applets were not acceptable, because they crashed more than half of the browsers available at the time. A momentous decision was made then (early decisions tend to have huge impacts): use Perl, and let CGI scripts, not static forms, do the data validation. This meant that the developers had to learn Perl (and, by the by, sed and awk), and convert existing web pages from JavaScript to Perl. It was now simpler to create shared libraries of code, but we already knew that we did not want to release our product's source code, so we would eventually have to come up with a Perl source-code protection mechanism. Malcolm Beattie's excellent new Perl-CC / "B" backend (version alpha 2) didn't work well for us at the time, so another method was needed (more on this later).

Who is responsible for feature creep? Everyone. It is part of the marketing department's job to evaluate the features, and find the weaknesses. On the other hand, the project manager must be able to say "no" to these scoundrels. This is the biggest problem the NetMAX project had in the beginning: we started with a small vision, and let it get way out of hand. Looking back, there was absolutely no reason not to release a web-configured file server. What we ended up with, after much more effort, is more salable, but it took years longer to get to market.

2 The Development Process

2.1 History

2.1.1 Why?

The root reason for the development of the NetMAX product was a Windows/NT server and a Quantum Atlas I drive. As many unfortunates have found out, these Quantum drives do not correctly support Tagged Command Queuing, but they report that they do, so Windows/NT will use the feature. Unfortunately, this causes filesystem corruption, and the server crashes. Every time Windows/NT checked its filesystems on startup, it deleted several random files. We were unaware that these disks were the real problem, but we knew from our years of Unix experience that we could do better with a Unix (FreeBSD) and Samba/NetATalk solution. Within a day or two, we had a FreeBSD 2.2-SNAP system running with Samba, NetATalk, and NFS. We ran this server, with no OS upgrades, for several years, until it was finally replaced by a NetMAX/Linux system in August of 1999. It was very stable, and had grown to around twelve 4 Gbyte disks on three SCSI adapters by the time it was converted.

Since we knew how to make our own Unix file server, and we saw that there was a need for an alternative to a Windows/NT solution, we decided to create a product out of our FreeBSD-based system: a web-configured File Server. One other programmer, Mike Suarez, and I were to create a system that could install onto a FreeBSD machine, and configure the file sharing subsystems via a web browser. We were also to keep in mind that the product might eventually be embedded into custom hardware. Our estimate of the project length was four months for the two of us.

2.1.2 In the Beginning

After some initial design work, we both started working on a single FreeBSD 2.2 system, on which we installed Samba, NetATalk, and Apache. This meant that we were both editing the same files at times, and modifying code that the web server was currently using. This worked for a time, because Mike was doing more design while I started coding what we had. After a few months, it became clear that we needed a single source repository, so we hacked one up: all files were to be edited on a single NFS server, which could be mounted read-only on a client machine, and a script was created to install the new files onto the client.

This was simply a stop-gap along the way to a real source management system. The NFS mounting worked, but we got tired of editing the files on the server only, so we started mounting the NFS server read/write, and using RCS. This process was much more appealing than the previous process, and worked for a short while.

Somewhere around six months into the four-month project, we added three more programmers, to help with coding the CGI scripts for the web pages that Mike was coming up with (feature creep had added User management and Machine/DNS management by this time). We had one main server (which had the RCS directories), running one web server on port 80, and everyone used it for development. This was fine, unless someone broke the common libraries (which happened a few times a day). Then it occurred to us that having another web server, running on another port, could stop the programmers from stepping on each other's toes. For this, though, each programmer had to have their own copy of the development environment, so we bit the bullet and finally stepped into CVS. This was the best thing that ever happened to the project.

2.1.3 Introducing Code Versioning: CVS

CVS allowed each developer to have their own copy of the source code, and to choose when to update it from the master repository. Several people could now work on the same source code, and later merge their changes into the master. Moreover, it allowed the developers to use their own FreeBSD machines as development environments, because CVS can work with remote repositories.

For not-quite-ready-for-prime-time code development, a CVS "branch" can be created. Developers can update their sources from the main branch, the "head", or from private branches. One example of branching was when we needed some re-work of the low-level filesystem creation code, so we created a branch. For a few weeks, development occurred on this branch, until it was ready to be brought in to the main line of development. A simple merge later, and the branch was mainlined.

Early adoption of the CVS philosophy "all the source ever written" would have been beneficial for the subsystem modules. We imported the FreeBSD source code as-is into our local CVS repository. The subsystems were different: since they were originally built as FreeBSD "ports", the FreeBSD ports and distfiles were imported, instead of the expanded source code. This meant that the code for the subsystems was in the CVS repository as compressed tarballs, making patching difficult. Eventually, as we needed to update the versions, all "ports"-type subsystems were converted to expanded source code in the repository. See the similar notes in Section 2.4.1 regarding the horrors of keeping Linux as binary RPMs in the source repository.

Working with multiple programmers made it harder to maintain code that others could pick up and read. Every programmer has their own preferred code style. We had to create rules on coding style, and enforce them by doing "code reviews" with all of the developers. These rules were not designed to limit creativity; their intent was to create a standard, so that developers could more easily understand each other's code. The code reviews helped individual programmers in several important ways: 1) learning new ways of programming (Perl's TIMTOWTDI); 2) becoming aware of security and portability issues; 3) working together. Once an individual had gone through a few review sessions, they were held responsible for maintaining the standard. Another benefit of reviews is that the program manager gets a feel for what is missing from the project's standard libraries: don't re-invent the wheel.

2.1.4 System Installer Developed

Since not all of the developers had their own FreeBSD workstation to use as a test-bed, we needed a method to install FreeBSD onto a machine from a bootable floppy. So we imported FreeBSD 2.2.5, the latest FreeBSD at the time, into our own CVS repository, and I learned how to "make release", and came up with a daily build of FreeBSD with the NetMAX pre-installed into the tree, and with all of the subsystems, like Samba, NetATalk, and Apache, built in. The installer was a bootable floppy disk with a FreeBSD kernel and a shell script that simply NFS mounted the build server, partitioned a disk, created a BSD filesystem on it, and then extracted a compressed tar file onto the new filesystem.

Now that we had nightly builds, and a complete install procedure that took less than half an hour, every developer was given a machine and a boot floppy to do installations as often as they wished. At the time, we also had a method of updating these machines to the latest version of the NetMAX software, using either "cvs export" or "rcp -r" of the NetMAX code tree from another machine. Much later, we gave that up, because full install times went down to less than fifteen minutes with an NFS-mounted server.

Since we could not expect our customers to have an NFS server handy, to export a directory with our tar-compressed NetMAX software on it, we added in an FTP method of installation. This was never popular during development, because the NFS installation method was slightly simpler. Installation was getting more acceptable, but most customers don't have FTP servers, or high-speed Internet access to Cybernet's FTP server.

We then learned how to make bootable ("El Torito") CD-ROMs. Technically, the CD-ROM installation method ended up being exactly like the floppy-disk NFS-mounting installer: both install from read-only filesystems, so only the "where" is different. We actually had one customer who didn't have a CD-ROM drive on their system, but did have an NFS server available, so they ended up using the NFS-based installation method! This is great for OEMs who want to install the NetMAX onto a hard drive without having to install a CD-ROM drive in the machine first (a floppy drive will still be required). Direct duplication of hard disks also works, using tools like "dd", if both the source and destination hard disks are exactly the same. There has also been some work recently on a BOOTP installer, so OEMs and developers will not need a CD-ROM drive or floppy drive.

2.1.5 Repair Mode

The system has to have a Repair mode, so it can correct problems with broken filesystems and accidentally removed system software. The bootable CD-ROM (optionally) takes on this function, if a NetMAX installation is detected on any of the installed hard drives. It is a web-based repair method, which can repair damaged filesystems, replace bad or missing programs, and restore backed-up server configurations from either floppy disks or the "backup" slice on non-system hard drives. Also, it can be used as a security audit tool, to check for altered programs.

We initially chose to make it web-based because the user can set up the local X server after the CD-ROM (or boot floppy) is booted. From there, a supplied web browser pointed at the localhost is brought up, and Installation or Repair can commence. This web-based Repair mode may be augmented by a text-mode Repair in the future.

2.1.6 The Web Server and mod_perl

All of the configuration of a NetMAX is done through a web browser-- there should not be a need (for most installations) to edit subsystem configuration files by hand. Since the web server is the primary interface to the administrator, we had to make it simple, consistent, and fast. We chose the Apache web server, version 1.2.5 at the time, because of its wide acceptance and large feature set.

The Apache web server worked very well, except that since each of our web pages was generated by a CGI script, there was quite a bit of Perl process creation, parsing, and destruction going on. Loading a simple page took several seconds, because all of the Perl libraries we had created had to be re-parsed. It behooved us to use features of the web server to make our web pages faster. We tried two methods: mod_fastcgi and mod_perl. After some unsuccessful attempts with mod_fastcgi, we saw that a great improvement in speed could be gained from using mod_perl.

Over the course of about two weeks, every CGI script was converted to work under mod_perl, as well as without it. The changes were fairly minimal, and many of them could have been avoided if strict Perl programming practices had been enforced, via Perl's "use strict" pragma. One of the main corrections, however, had nothing to do with mod_perl: if a subroutine is called in Perl with the "&" form and no parentheses at the caller (as in "&foo;"), nominally meaning that no parameters are being passed, Perl does not empty "@_"; the called subroutine sees the caller's current "@_" instead. Under mod_perl, we were seeing "bogus" parameters being passed into the first subroutine called by some CGI scripts, because of these "missing" parentheses. Once again, this could have been avoided if good coding style had been better enforced.

Mod_perl does have a few peculiarities with respect to Perl, due to its persistent nature. "BEGIN" blocks run only once, and global variables are persistent, so they need to be re-initialized on each request. Also, changed library files were not being detected and reloaded. We ended up creating a single library to take care of these problems. The only problem left was the large amount of RAM consumed by the web server processes.

Because mod_perl parses a CGI script and keeps the result in memory, the web server process takes up more RAM than a non-caching web server. There is no easy solution to this RAM consumption problem with the existing version of mod_perl. It could be re-written to cache only a certain number of CGI scripts at a time using an LRU cache, and/or to serialize all CGI requests through one mod_perl web child, but we felt that the alternatives were acceptable: 1) decrease the number of requests handled per child web server process, so the processes don't stay in memory as long, and 2) after a half hour of inactivity, have a cron job restart the administrative web server, thereby reaping the RAM-consuming web processes. Now the NetMAX runs at acceptable performance levels with only 32 Mbytes of RAM.
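The bounded LRU cache mentioned above as a possible rewrite (which we did not pursue) can be sketched as follows. This is a Python illustration of the idea only, not mod_perl code; the class name, the script paths, and the use of Python's compile() as a stand-in for parsing a CGI script are all assumptions made for the example.

```python
from collections import OrderedDict


class ScriptCache:
    """Keep at most max_entries parsed scripts, evicting the least recently used."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, path, source):
        if path in self._entries:
            self._entries.move_to_end(path)    # mark as most recently used
            return self._entries[path]
        code = compile(source, path, "exec")   # stand-in for parsing a CGI script
        self._entries[path] = code
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
        return code

    def cached_paths(self):
        return list(self._entries)


# Hypothetical script names; the cache holds two entries at most.
cache = ScriptCache(max_entries=2)
cache.get("users.cgi", "x = 1")
cache.get("printers.cgi", "x = 2")
cache.get("users.cgi", "x = 1")      # hit: users.cgi becomes most recent
cache.get("dns.cgi", "x = 3")        # evicts printers.cgi
print(cache.cached_paths())
```

The memory ceiling is then set by max_entries rather than by how many distinct pages a child happens to have served.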

2.2 Data Representation

One of the more important decisions up front was to determine the source of the data that we were going to use for configuration: do we try to parse the text and binary data files used by each subsystem, or do we "mirror" the data in our own format? The choice was fairly simple: we took a look at a reasonably complicated configuration file, Samba's smb.conf, and decided that creating a useful parser for it would be non-trivial, so we decided to use our own data storage mechanism. To avoid the overhead of a persistent database server process, we chose to use text files on disk rather than a database back-end such as an SQL or LDAP server (remember, this was initially an eight man-month project). The initial target hardware was an i486 DX2/66 type machine, so speed and memory use were major concerns.

During development, it is quite common to want to read and write the contents of the back-end data files by hand, so we used text files, and created two simple tools to read and write these file formats. The data was to be used primarily by Perl scripts, so we used Perl's hashes, which allowed us to access data by name. Some data required a single-level hash, some required a two-level hash (a hash of hashes), and only one or two required a three-level hash ($r->{$i}->{$j}->{$k}->[0]). All of the data values are kept in arrays, even if there is only ever going to be one value in the array: having all configuration data references, as opposed to some, end in "->[#]" makes the convention easier for programmers to remember. We still use this internal representation today, but the on-disk files are now in a binary format determined by a Perl module called Storable, because Storable's file I/O is faster than parsing text files.

One example of the link between the configuration data and the web interface is the printer data. The data is stored in a file called "PrinterInfo", in a two-level hash. The first hash level is the name of the printer, and the second level holds the configuration data items, such as the type of the printer, the connection method to the printer, and the printer's description. To read the data, we use our library call '%pd = &Filer::SDBRead("PrinterInfo")', thus loading the entire printer configuration data into the "pd" hash. To access the printer description string, a programmer would then write '$descr = $pd{"local"}->{"Description"}->[0]'.
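The same layout can be sketched in Python, as an analogue of the Perl structure rather than the NetMAX code itself. The printer name "local" and the "Description" item come from the example above; the other item names, all of the values, and the helper function are made up for illustration.

```python
# Two-level mapping: printer name -> item name -> list of values.
# Every leaf is a list, even when it holds a single value, so every
# lookup ends with an index, mirroring the "->[#]" convention.
printer_info = {
    "local": {
        "Description": ["Front office laser printer"],  # hypothetical value
        "Type": ["postscript"],                         # hypothetical item
        "Connection": ["parallel"],                     # hypothetical item
    },
}


def first_value(data, printer, item, default=None):
    """Return the first value stored for printer/item, or default."""
    values = data.get(printer, {}).get(item, [])
    return values[0] if values else default


descr = first_value(printer_info, "local", "Description")
print(descr)
```

Keeping every leaf as a list means a caller never has to know in advance whether an item is single-valued or multi-valued.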

One drawback to not using a database server is that a record, file, and database locking mechanism must be set up from scratch. For the first several months of development, we were able to get away with not locking the databases. Eventually, of course, this led to data corruption. The locking mechanism that we now have in place takes a "snapshot" of the requested data items when the database is read, which happens only once, when the administrator enters a configuration page. Further reads from the original database are not necessary, because the data is kept in a server-side "cookie" file. The requested data, along with the snapshot value, is stored in this "cookie". The snapshot is checked when the data file is written, which is when the administrator records changes to the system configuration. If the snapshot taken at read time does not match the snapshot at write time, an error is returned.
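A minimal sketch of this check-at-write-time scheme follows, in Python. It assumes that the snapshot is a digest of the record's contents, which the text above does not specify; the function names and the in-memory "store" dictionary are likewise hypothetical stand-ins for the on-disk data files.

```python
import hashlib
import json


class StaleDataError(Exception):
    """Raised when the stored data changed after it was read."""


def snapshot(record):
    """Return a digest identifying the current contents of a record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def read_with_snapshot(store, key):
    """Read a record once, returning a private copy plus its snapshot."""
    record = store[key]
    return json.loads(json.dumps(record)), snapshot(record)


def write_if_unchanged(store, key, new_record, snap_at_read):
    """Write new_record only if the record is unchanged since the read."""
    if snapshot(store[key]) != snap_at_read:
        raise StaleDataError(key)
    store[key] = new_record


store = {"PrinterInfo": {"local": {"Description": ["old"]}}}

# Two administrators open the same configuration page.
data_a, snap_a = read_with_snapshot(store, "PrinterInfo")
data_b, snap_b = read_with_snapshot(store, "PrinterInfo")

data_a["local"]["Description"] = ["changed by A"]
write_if_unchanged(store, "PrinterInfo", data_a, snap_a)  # succeeds

data_b["local"]["Description"] = ["changed by B"]
try:
    write_if_unchanged(store, "PrinterInfo", data_b, snap_b)
    conflict = False
except StaleDataError:
    conflict = True  # B's snapshot no longer matches the stored data
```

No lock is held while the administrator has the page open; the cost is that the second writer has to reload the page and redo the change.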

Web browsers have a limit on the amount of data passed in the URL via the "GET" method, and the web server also limits the length of this data, so we had to devise a method to get large amounts of data from the web browser to the server in order to efficiently maintain some of the larger, paged, tables that appear on many of the configuration pages. This means that we had to use the "POST" method to transfer data from the browser to the server, and use locally stored "cookie" files to hold the passive data (required data that is not being displayed on the current web page). Browsers get references to these server-side cookies, instead of the raw data that they contain, thus making it simple to associate unlimited amounts of data with a web page.
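The server-side cookie idea can be sketched as follows, in Python. The storage directory, the token format, and the function names are assumptions made for this example; only the principle (the browser carries a short reference, the server keeps the data) comes from the text above.

```python
import json
import os
import secrets
import tempfile

COOKIE_DIR = tempfile.mkdtemp(prefix="cookies-")  # hypothetical location


def save_cookie(data):
    """Store page state on the server and return an opaque reference."""
    token = secrets.token_hex(16)
    with open(os.path.join(COOKIE_DIR, token), "w") as handle:
        json.dump(data, handle)
    return token


def load_cookie(token):
    """Load the page state for a reference sent back by the browser."""
    # Accept only hex tokens, so a forged reference cannot name another path.
    if not token or any(c not in "0123456789abcdef" for c in token):
        raise ValueError("malformed cookie reference")
    with open(os.path.join(COOKIE_DIR, token)) as handle:
        return json.load(handle)


# A large table that would not fit in a GET query string.
rows = [{"user": "user%d" % i, "quota_mb": 100} for i in range(5000)]
token = save_cookie({"rows": rows, "page": 3})

# Only the short token needs to travel in the POST body.
restored = load_cookie(token)
```

The size of the page state no longer matters to the browser, because the reference has a fixed length.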

2.3 The Commit Process

There was another important reason to choose multiple data files and no back-end database server: we had decided that when an administrator requests changes to the system, the changes are not implemented immediately. Instead, they are batched, and happen when the administrator does a "commit". Propagating system changes immediately would take too long if we were to reconfigure several subsystems when, for example, a single user gets their public web pages enabled. Breaking up each subsystem's data into its own files let us create a "make" type dependency for each subsystem, so when a commit is requested, only the affected subsystems get reconfigured. The "currently running" subsystem configuration data is kept separate from the "next desired" configuration data.
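The "make" type dependency can be illustrated with a small Python sketch. Apart from "PrinterInfo", the data file names and the subsystem lists are hypothetical, chosen only to show how a set of changed files maps to the subsystems that must be reconfigured.

```python
# Hypothetical mapping from data file to the subsystems that consume it.
DEPENDS_ON = {
    "UserInfo": ["samba", "apache", "mail"],
    "PrinterInfo": ["samba", "lpd"],
    "DnsInfo": ["named"],
}


def subsystems_to_reconfigure(changed_files):
    """Return the subsystems affected by a set of changed data files."""
    affected = set()
    for name in changed_files:
        affected.update(DEPENDS_ON.get(name, []))
    return sorted(affected)


pending = subsystems_to_reconfigure(["PrinterInfo"])
print(pending)
```

A change to the printer data leaves the DNS and mail subsystems alone, which is what keeps a commit fast.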

The goal of the batch configuration method was speedy reconfiguration, as well as decreased downtime: the system never has to be rebooted, no matter what configuration the administrator requests. Unfortunately, we had an administrator who would reboot the machine after a commit anyway, because they were taught by Windows/NT that that is what you do. This behavior was strongly discouraged: this is Unix, not Windows/NT!

To determine if configuration data had changed, we implemented a "diff"-like program for our commit system. It currently has two modes of comparison: it defaults to comparing the time on the data files, or it can perform a checksum comparison of the contents of the data file. Normally, the last-modified-time of the data files is sufficient, but in a few cases, as with data files that are created automatically at boot time, the checksum difference method is used.
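The two comparison modes can be sketched in Python as follows. The file names, the choice of SHA-256 as the checksum, and the function names are assumptions for the example; the text above does not say which checksum the commit system uses.

```python
import hashlib
import os
import shutil
import tempfile


def file_checksum(path):
    """Return a digest of the file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def has_changed(running, desired, use_checksum=False):
    """Compare the running and the desired copy of one data file."""
    if not os.path.exists(running):
        return True
    if use_checksum:
        return file_checksum(running) != file_checksum(desired)
    return os.path.getmtime(desired) > os.path.getmtime(running)


workdir = tempfile.mkdtemp()
running = os.path.join(workdir, "running")
desired = os.path.join(workdir, "desired")
with open(running, "w") as handle:
    handle.write("setting=value\n")
shutil.copyfile(running, desired)

# Simulate a file regenerated at boot: same contents, newer timestamp.
os.utime(running, (1000, 1000))
os.utime(desired, (2000, 2000))

by_time = has_changed(running, desired)                     # True
by_sum = has_changed(running, desired, use_checksum=True)   # False
```

The timestamp test is cheap but reports a change for a regenerated file with identical contents; the checksum test reads both files but avoids the needless reconfiguration.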

2.4 Porting to Linux

After the initial release of the FreeBSD-based NetMAX, it was time to port it to other popular operating systems. After all, most of the work was in the web-based configuration interface and system build tools, so as long as we used the same CGI scripts and subsystems, most of what we would have to do would be to recreate the build tools (and find the bugs). We chose Red Hat 5.2 Linux/i386 as the initial basis of the port, because of its popularity, its wide acceptance, and its inclusion of the stable Linux 2.0.x kernel.

2.4.1 What about the Source Repository?

We were once again very happy that we had chosen CVS for handling our source repository, because of an add-on package to CVS from Network Appliance called "cvslines". With this tool, we could very easily keep track of OS branches in our source tree. Now, when changes are committed to cvslines-enabled modules, the developer is offered the option of automatically merging the changes into the other OS lines. Internally, cvslines uses CVS's branch tags, and manages them intelligently.

In order to decrease development time for the Linux porting effort, we decided not to build the entire operating system every night, but instead to mostly extract binary RPMs. This turned out to be a blessing and a curse: the build time decreased from about eight hours (FreeBSD's entire system build time) to about three, but we don't have an expanded source code repository for Red Hat, so finding source code "somewhere in a SRPM" is more difficult. Working with binary and source RPMs in the CVS tree is difficult, and (personally) not recommended. Having all of the source code readily available, as for the FreeBSD effort, would have made some of the tasks much simpler.

Consider, for example, the problem with the SIOCGIFCONF ioctl that we corrected in our FreeBSD 2.2.7 sources: there are many user-level programs that don't provide enough buffer space for this ioctl, some allowing for as few as eight interfaces in total. The NetMAX has 16 PPP, 16 tunnel, and 16 SL/IP interfaces statically configured into the kernel, so it can act as a dial-in server. Therefore a minimum of 16+16+16+1 (Ethernet) = 49 interface buffers is necessary; otherwise programs such as "portmap" and "pppd" will not find all of the interfaces. Fixing FreeBSD's code was simply a matter of performing a recursive "grep" in a checked-out source tree, correcting the code, and committing the changes. Fixing Red Hat's was a matter of extracting all of the source RPMs, getting FreeBSD's grep compiled on a Red Hat 5.2 system (the supplied grep does not support recursion), performing a recursive grep, fixing the files, rebuilding the SRPMs and RPMs, then committing the changes.
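The arithmetic, and the effect of an undersized buffer, can be modelled in a few lines of Python. This is a simulation only and does not call the real ioctl; the interface names follow common FreeBSD naming but are illustrative, and the ordering of the list is arbitrary.

```python
# Interface counts from the text: 16 PPP, 16 tunnel, 16 SL/IP, 1 Ethernet.
ppp = ["ppp%d" % i for i in range(16)]
tun = ["tun%d" % i for i in range(16)]
slip = ["sl%d" % i for i in range(16)]
ethernet = ["ed0"]  # illustrative Ethernet interface name
interfaces = ppp + tun + slip + ethernet


def list_interfaces(all_interfaces, buffer_slots):
    """Model a caller whose buffer has room for only buffer_slots entries."""
    return all_interfaces[:buffer_slots]


required = len(interfaces)            # minimum number of buffer slots
seen = list_interfaces(interfaces, 8)
missed = required - len(seen)         # interfaces the program never sees
print(required, missed)
```

A program that sizes its buffer for eight interfaces silently misses the rest, which is why the symptom shows up as "interface not found" rather than as an error from the ioctl.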

Another set of problems that we ran into: there are lies, damn lies, and Linux header files + man pages. Many functions that were documented (using BSD man pages, no less) didn't exist in any library. Several functions that had prototypes in header files didn't exist either. And there were also the functions that didn't act quite the same: select() for example. The Red Hat/Linux version of select() modifies the returned time value to indicate the time remaining on the call. We discovered this after months of head-scratching, looking for a specific problem that had been happening to our backup subsystem. The Linux man page even documents this behavior as non-standard.

Most of the Red Hat/Linux user-level code comes from GNU, so some functions, like "getopt()", behave slightly differently from the FreeBSD versions. By default, the GNU getopt() rearranges the command-line arguments to put all arguments that start with "-" before any that don't. This caused a problem in our Universal Translator (see Section 2.4.2). Setting the "POSIXLY_CORRECT" environment variable corrected this behavior.
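Python's standard getopt module happens to offer both scanning modes, which makes the difference easy to demonstrate. The wrapper command line and its option letters below are hypothetical and only stand in for a program that runs another command with its own options; the Python functions mimic the C library behavior, they are not the C getopt() itself.

```python
import getopt
import os

# A wrapper invoked as: wrapper -u operator ifconfig -a
# "-u" belongs to the wrapper; "-a" belongs to the wrapped command.
argv = ["-u", "operator", "ifconfig", "-a"]
SHORT_OPTS = "u:a"  # hypothetical option set, chosen to show the collision

# GNU-style scanning: options may follow non-option arguments.
os.environ.pop("POSIXLY_CORRECT", None)
gnu_opts, gnu_rest = getopt.gnu_getopt(argv, SHORT_OPTS)

# Same call with POSIXLY_CORRECT set: scanning stops at "ifconfig".
os.environ["POSIXLY_CORRECT"] = "1"
posix_opts, posix_rest = getopt.gnu_getopt(argv, SHORT_OPTS)
del os.environ["POSIXLY_CORRECT"]

# Traditional scanning, as on BSD: stops at the first non-option.
bsd_opts, bsd_rest = getopt.getopt(argv, SHORT_OPTS)

print(gnu_opts, gnu_rest)
print(posix_opts, posix_rest)
```

In the GNU mode the wrapper swallows the "-a" that was meant for the wrapped command; in the other two modes "ifconfig -a" is passed through intact.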

Cybernet Systems intends to write a white paper detailing the differences we found between development under Linux vs. FreeBSD.

2.4.2 Compat Scripts and the Universal Translator

Another shortcut that we took was to emulate the output of certain FreeBSD commands using "compat" scripts. Programs like "netstat" and "ifconfig" are different enough under Linux to warrant this: Linux has Ethernet interfaces labeled "eth#[:#]", such as "eth0" for the first Ethernet interface, and "eth0:#" for the aliased interfaces off of it. With FreeBSD, the aliases are associated with the primary interface itself; there are no "sub interfaces". The purpose of the compat scripts is to take care of these differences, so we can use the FreeBSD calling conventions without modifying our NetMAX code. Currently, there are about forty compat scripts, of which half are "ifconfig" and "netstat" variations.

Perhaps the greatest single contribution to the success of the porting project was the Universal Translator (UT). The NetMAX software has system() calls sprinkled throughout the code, and sometimes these programs are required to run as different users. To do this, we had already written a secure program that would or would not allow the uid/gid change, based on certain criteria. We simply modified ALL of the system() calls to go through this program, and modified this program to perform command translation.

For example, given the command "/sbin/ifconfig -a -u", the UT would leave it alone for FreeBSD, but convert it to "/usr/netmax/compat/ifconfig.iface.all.up" for Linux. Programs like "grep" and "touch" are installed in different locations on FreeBSD and Linux, but going through the UT makes them easily translatable. The translation keys and values are compiled in for fast lookup, with an optional external keys file for non-standard translations and testing.
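The lookup can be sketched in Python as follows. Only the ifconfig translation comes from the text above; the path table, the function name, and the "extra" argument (standing in for the external keys file) are assumptions made for the example, and the real UT also performs the uid/gid checks described earlier, which are omitted here.

```python
import shlex

# Built-in translation table keyed by the FreeBSD-style command line.
LINUX_TRANSLATIONS = {
    "/sbin/ifconfig -a -u": "/usr/netmax/compat/ifconfig.iface.all.up",
}

# Illustrative program locations that differ between the two systems.
LINUX_PATHS = {
    "/usr/bin/grep": "/bin/grep",
    "/usr/bin/touch": "/bin/touch",
}


def translate(command, target_os, extra=None):
    """Return the command to run on target_os for a FreeBSD-style command."""
    if target_os == "freebsd":
        return command
    table = dict(LINUX_TRANSLATIONS)
    table.update(extra or {})  # external keys override the built-in ones
    if command in table:
        return table[command]
    argv = shlex.split(command)
    if argv and argv[0] in LINUX_PATHS:
        argv[0] = LINUX_PATHS[argv[0]]
    return " ".join(shlex.quote(arg) for arg in argv)


print(translate("/sbin/ifconfig -a -u", "freebsd"))
print(translate("/sbin/ifconfig -a -u", "linux"))
print(translate("/usr/bin/touch /tmp/stamp", "linux"))
```

Because every system() call already passes through one program, the calling code keeps using FreeBSD conventions and never needs to know which operating system it is running on.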

3 Protecting our Investment: Licenses and Source Code Protection

Many people will argue that for acceptance into today's "Open Source" software marketplace, source code availability is a major advantage. Cybernet Systems certainly believes in Open Source; it is what makes the NetMAX project possible. The reality, however, is that companies must make money in order to survive. Cybernet Systems invested over ten man-years of work into the initial NetMAX product. If it gave away all of that effort for free, it would lose much of its competitive advantage, unless a clause in the license agreement lets it retain this advantage somehow, perhaps by limiting use to strictly non-commercial applications. Cybernet is currently investigating just such a licensing agreement on parts of the NetMAX system, as well as a GNU-style license agreement on others. Some portions may remain closed-source.

That said, the following sections document the steps we have taken to protect our competitive advantage.

3.1 License Manager

Every NetMAX has a license manager daemon, which can detect other NetMAXes on your LAN. If another NetMAX with the same license number is detected, or if the daemon cannot send its license number out, it disables many features of the NetMAX, and allows the administrator write access to only the Serial Number Settings page, which is brought up automatically when there are license problems. There are also multi-machine license numbers, which allow up to N machines to share one number. This licensing scheme is implemented to help prevent theft of license numbers. License numbers are recorded when a NetMAX customer registers their product with Cybernet.

The NetMAX can also hold several license numbers, because the original product has been "broken up" into several sub-products, and all license numbers are checked to determine the product's capabilities. Each license number has the sub-product code embedded within it. There are also special "demo" license numbers, with an embedded expiration date. Originally, a demo license number implied a fixed expiration date, but we found that embedding the expiration date in the license number is more valuable. With a demo license number, any release version of the product may be "tested out" by customers, and a new, non-demo license number substituted when the product is purchased, without re-installing the product.

3.2 Kernel-Based Source Code Protection

Since most of the NetMAX software is written in Perl, and Perl is an interpreted/tokenized language, the source code is necessary in order to run it. This makes it difficult, if not impossible, to "hide" proprietary Perl code. Some of the alternatives include pre-compiling the Perl text into Perl byte code, or even C code, using Malcolm Beattie's excellent "B" back-end. This works for smaller Perl programs, but in practice, with the thousands of lines of Perl code in the NetMAX system, and the dozen or so Perl libraries we created, it didn't work well. Some operating systems support an "undump" feature, which can run core-dump files (which contain the tokenized Perl code, not the original text). Perl supports this feature, to some extent. Testing this feature with FreeBSD and the NetMAX code yielded unpromising results.

For the initial revision of the NetMAX software, we settled on providing a custom kernel, and created an interface between the custom kernel and our version of Perl, where the perl executable would feed parts of the encrypted source code to the kernel, which would then modify and send the data back to the perl executable. If the perl executable had been modified, or was being debugged, bogus data would be returned, rendering the decryption "incomplete". The data path is secure, because the kernel is talking directly to the application.

With this kernel-based scheme, we unfortunately could not allow users to modify their kernels, because we could not give away the source code to the kernel part of the decryption; otherwise someone could possibly obtain our Perl source code (which would be a license violation anyway). This was not a serious drawback, because the kernel included most of the available drivers.

3.3 User-Level Based Protection Server

The most important feature involved in the FreeBSD to Linux port was the source code protection mechanism. Since the licensing of the GNU and Linux software is different from BSD licensing, we had to change the Perl source code protection mechanism for Linux. We could no longer use a kernel module, so we had to create a user-level protection daemon. It had to be as secure as the kernel-level decryption mechanism.

With the protection daemon, only certain executables will get "valid" results. The Perl "client", for example, must have a certain signature, otherwise bogus data will be returned by the daemon. Also, certain other requirements must be met prior to successful decryption.
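
The general idea of answering correctly only for a recognized client can be sketched as below. This is a simplified illustration and not the NetMAX daemon: the use of a SHA-256 digest of the client executable as its "signature", the HMAC-based reply, and all names are assumptions chosen for the example.

```python
import hashlib
import hmac
import os


def respond(client_image: bytes, trusted_digest: bytes,
            challenge: bytes, secret: bytes) -> bytes:
    """Answer one request from a client (hypothetical protocol).

    client_image   -- bytes of the executable making the request
    trusted_digest -- SHA-256 digest of the one executable we accept
    challenge      -- data the client wants transformed
    secret         -- key known only to the daemon
    """
    digest = hashlib.sha256(client_image).digest()
    if hmac.compare_digest(digest, trusted_digest):
        # Recognized client: return the real transformation.
        return hmac.new(secret, challenge, hashlib.sha256).digest()
    # Unrecognized client: return plausible-looking but useless data
    # instead of an error, so the failure is not obvious.
    return os.urandom(32)
```

Returning bogus data rather than refusing outright means that a modified client cannot easily tell at which step it was rejected.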

The conversion of the kernel-based protection mechanism to a user-level daemon was so successful that we decided to use the daemon for the FreeBSD 3.x versions as well. One of the other major benefits of this, besides resolving some licensing issues, is that NetMAX administrators can now create custom kernels.

3.4 Hardware Dongle

When Cybernet initially started the NetMAX project, there was an option to create it as an embedded system, where it would simply be a piece of hardware. For this reason, we developed a one-time programmable dongle that is placed on the parallel port, and can act as a pass-through. This dongle was to contain the license number of the product, which had to match the license number of the server that it was attached to.

For many reasons, one being that the market already had several hardware-based NetMAX-like solutions, this path has been discontinued, at least for the time being. Cybernet also felt that it should concentrate on getting the software completed, and let OEMs worry about building custom hardware- we could license them the NetMAX software.

3.5 Benefits and Drawbacks

Running a protection mechanism in the kernel limits the ability to generate new kernels. Providing the ".o" files necessary to link the protection into the kernel is not acceptable: it severely limits the hacker's search space for the protection code, allowing them easier access to the protection algorithm. The algorithm, however, is intentionally complicated, and is non-trivial to reverse engineer.

If the protection mechanism is in the kernel, then there can be a guarantee of atomic operation. For a user-level protection, this was one of the major hurdles to overcome. The initial version of the new user-level client/server protection did guarantee atomic operation, but at a cost: multiple simultaneous processes performing decryption could (and did, most often on the faster machines) get into decryption lock-step with the daemon, resulting in unacceptable decryption performance. This problem was eliminated by using smaller resource-locked timeouts, at the cost of only a few more processor cycles.
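
A generic version of this pattern, several clients competing for one exclusive resource using short lock timeouts, might look like the sketch below. It is not the NetMAX code; the lock, the timeout values, and the small random pause between attempts are assumptions for the example.

```python
import random
import threading
import time

# Stands in for the daemon's single decryption channel (hypothetical).
daemon_lock = threading.Lock()


def decrypt_atomically(work, timeout=0.005, max_wait=5.0):
    """Run work() while holding the channel exclusively.

    Each attempt waits only briefly for the lock. A short random pause
    between attempts keeps competing clients from retrying in step
    with one another.
    """
    deadline = time.monotonic() + max_wait
    while True:
        if daemon_lock.acquire(timeout=timeout):
            try:
                return work()          # atomic with respect to other clients
            finally:
                daemon_lock.release()
        if time.monotonic() >= deadline:
            raise TimeoutError("decryption channel stayed busy")
        time.sleep(random.uniform(0, timeout))
```

The short timeout bounds how long any one client sits blocked, so a busy channel costs a few extra wake-ups rather than a long stall.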

For security reasons, Perl must be statically linked into the Apache web server. This makes Apache "bloat": it takes much more space in RAM. We have taken steps to reduce this problem, but there is no simple solution to eliminate it.

With protected source code, users, administrators, and developers cannot modify the code. This makes it impossible to add features to installed code, or to correct problems. This prohibition is inherently incompatible with the *BSD and Linux open-source policy.

Future Efforts

When the NetMAX/FreeBSD and NetMAX/Linux features were frozen, it gave Cybernet a short time to plan where it wants to take the product family. As always, the list of desired additional features continually grows, and the versions of the subsystems keep going up. Porting the NetMAX to other operating systems would not be as complicated as the first time, especially across to other BSDs and Linux distributions. Perhaps a Windows version?

Since the beginning of this project, we have been attracted by the idea of having NetMAX in embedded systems, because that is one of the areas that Cybernet is very strong in. We are pursuing OEMs and embedded systems manufacturers.

We are not going to abandon either FreeBSD or Linux; we will take them further, with an "enterprise" version of the software, designed to handle large-scale computing environments, and even perhaps computing clusters. With the integration of security technologies like Kerberos, and file-serving systems like Coda and the Andrew File System (AFS), we will be able to handle larger installations.

NetMAX is currently a server-based product, but we are working on a client version as well. It will come with pre-installed and configured office applications, and a choice of desktop environments like KDE and GNOME. The existing Package Manager will play an important role in the new NetMAX client systems, allowing users to easily add features to their own workstations through the same easy-to-use, web-based interface.

There are other places to go with the NetMAX software, like the install-over that we currently have for Red Hat/Linux 5.2, where the NetMAX system installs as an RPM over an existing Red Hat installation. Now that we've done it for Red Hat/Linux, we can do it for FreeBSD as well.

Since the OS vendors, such as FreeBSD and Red Hat, generate major releases a few times a year, we are busy keeping up with the changes in these systems. Cybernet has recently been granted commit privileges to the FreeBSD source code repository, by Jordan Hubbard, leader of the FreeBSD project. Not only does this enable us to better track changes in the BSD world, it enhances our ability to give value back to the open software community, in a faster way than was previously possible.

Finishing the NetMAX Perl API documentation, to enable others to develop software modules for the NetMAX, is a high priority. Along with this, Web and file space is being created on our high-bandwidth site for developers and users of open source software, including the NetMAX. Procedures and other documentation are being written to allow others to deploy interesting open-source type content on this site.

Summary

Cybernet has spent well over ten man-years in the development of NetMAX, a web-based configuration environment for many open-source software programs. This project started with a FreeBSD version, and was ported to Red Hat/Linux. Along the way, we learned things like:

o  Code Management
      use CVS and cvslines
      the repo is for the source, not tarballs of the source (disks are cheap)

o  Project Management
      only one manager, with the ability to say "no"
      early decisions have huge impacts, so think problems through

o  Managing Feature Creep
      listen carefully, but put it on the "for future releases" list

o  Source Code Protection and Licensing
      for protection, use a user-level daemon, as opposed to a kernel-level hack
      license numbers can encode data like product code and expiration date

o  System Installation
      create bootable CD-ROMs and floppies, plus quick installation of daily builds for developers

o  Database Management
      pseudo-record locking can be acceptable

o  Implementing Subsystem Changes
      batch the changes for speed

o  Writing Portable Software
      brucify your code! (generate a code style, and strictly enforce it)
      peer review is invaluable!

One of the biggest lessons: if you are going to sell your software to others, you should try to use it in-house. We started early (way before the first sale) in converting all of our internal servers to NetMAX servers (except for one lonely, legacy Windows/NT server). This way, we were able to find problems that showed up only after longer-term usage. Also, we spotted features that were missing.

If it weren't for FreeBSD, this project would not have been possible. The stability (rock-solid), availability (CVS repo), support (public mailing lists), and broad range of tools (ports) for FreeBSD were the things that really let Cybernet put this project together.