Recommendations for web site development

What follows are some recommendations for web site development. These are based on my experiences designing the infrastructure[1] for a variety of different web sites, including the Design Council, Cable and Wireless, Ted Baker, and Laura Ashley.

The traditional way to build a dynamic web site is to store the majority of the content in a database. Pages are then generated on the fly, either using CGI scripts, or by embedding the programming within the web page, using technologies like PHP or ASP.

+----------+       +---------+       +--------+       +---------+
|          |       | PHP/ASP |       |  Web   |       |         |
| Database | ----> |  page   | ----> | Server | ----> | Browser |
|          |       |         |       |        |       |         |
+----------+       +---------+       +--------+       +---------+

Sometimes you might throw some templates into the mix to make it easier to maintain the look and feel of many pages with similar content:

+----------+       +---------+       +--------+       +---------+
|          |       | PHP/ASP |       |  Web   |       |         |
| Database | ----> |  page   | ----> | Server | ----> | Browser |
|          |       |         |       |        |       |         |
+----------+       +---------+       +--------+       +---------+
                        ^
                        |
                  +-----------+
                  | Templates |+
                  +-----------+|+
                   +-----------+|
                    +-----------+

In theory this lets the web site designers work on the templates without needing to worry about the structure of the content in the database, and lets the programmers get on with extracting the data from the database without worrying about how that content will appear on the web site.

There are a number of problems with this approach:

1. It tends to tie you to one scripting language for your site. Whether that scripting language is PHP, ASP, CGI in Perl, or some other language, you tend to get tied to it. This makes it difficult to migrate to other web servers or hosting technologies. It also raises the bar for people who want to contribute, because they have to know your scripting system.

2. It tends to tie you to one templating mechanism, with the same problems as point 1.

3.
   It ties your scripting code to the structure of the database. Changing the database can require changing the scripting code in many different places.

4. It puts your application logic and presentation logic in the same place. If your programmers need to update the way the web site behaves, they have to edit the same files that the designers do, and vice versa. A screw-up by one team can introduce bugs that are difficult to find and hard to fix (imagine one of the designers inadvertently deleting a line of PHP).

5. It makes it hard for the designers to design the site without access to a test database, which increases the burden on the designers.

It is possible to lessen some of these problems. For example, point 4 can be addressed by being very strict about the separation between the script and the templates. This takes a lot of discipline, and is something else that often needs to be taught to new volunteers before they can make a useful contribution.

XML and XSLT make it easy to avoid these problems in a cross-platform, standards-compliant way. An XML implementation would look something like this:

+----------+      +---------+      +-------+      +--------+      +---------+
|          |      | db2xml  |      |       |      |  Web   |      |         |
| Database | ---> | scripts | ---> |  XML  | ---> | Server | ---> | Browser |
|          |      |         |      |       |      |        |      |         |
+----------+      +---------+      +-------+      +--------+      +---------+
                                                      ^
                                                      |
                                                +-----------+
                                                |XSLT sheets|+
                                                +-----------+|+
                                                 +-----------+|
                                                  +-----------+

At first glance this looks similar to the previous example, but the XML step introduces a valuable separation between the code that generates the content of the site and the mechanism that turns that content into HTML (or other formats, such as RDF).

As before, content is extracted from the database. However, instead of being immediately converted into HTML using PHP or ASP, this content is converted into XML that follows one or more agreed schemas. This is the "db2xml" step.

Note: db2xml is not a command that is run. It's just a term for this step of the process.
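
To make the db2xml step concrete, here is a minimal sketch of one such script in Python, using only the standard library. The `articles` table, its columns, and the element names are all hypothetical, chosen purely for illustration; a real site would use whatever schema the programmers and designers have agreed on.

```python
import sqlite3
import xml.etree.ElementTree as ET


def articles_to_xml(conn):
    """Return an <articles> document built from the hypothetical articles table."""
    root = ET.Element("articles")
    rows = conn.execute("SELECT id, title, body FROM articles ORDER BY id")
    for article_id, title, body in rows:
        article = ET.SubElement(root, "article", id=str(article_id))
        # ElementTree escapes characters such as & and < when serialising.
        ET.SubElement(article, "title").text = title
        ET.SubElement(article, "body").text = body
    return ET.tostring(root, encoding="unicode")


def demo_database():
    """Build a throwaway in-memory database so the sketch runs on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO articles (title, body) VALUES (?, ?)",
        [("Hello", "First post."), ("Fish & chips", "Escaping is handled for us.")],
    )
    return conn


if __name__ == "__main__":
    print(articles_to_xml(demo_database()))
```

Nothing in the script knows anything about HTML or templates; it only has to produce well-formed XML in the agreed shape.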

The scripts that implement this step can be written in a variety of different languages. In fact, different scripts can be written in different languages, depending on what needs to be done. The scripts are also often much simpler, because they don't need to bother with any extraneous code relating to generating HTML or handling templates.

The XML is then converted into its final output format (which might be HTML, RDF for syndication to other sites, or some other format) using the XSLT stylesheets. These stylesheets are written by the designers, and implement the site's look and feel.

This conversion process might be a batch job (if the site's content changes infrequently), or it might be done each time the page is requested (if the content changes constantly). It is also possible to mix and match schemes on different pages, and the resulting output can be cached so that the pages are only regenerated on an as-needed basis.

The XML files that are written by the scripts represent a contract between the site programmers and the site designers. The site designers undertake to style the XML to generate the site, and the site's programmers undertake to generate XML according to the agreed schemas. The designers don't need to care how the XML is generated from the DB (or about the structure of the DB), and the programmers don't need to care how the content will be used on the site.

The example above explicitly spells out all the steps that are taken and, as a consequence, looks like it has additional complexity. In reality, only one additional step is introduced (styling the XML output to HTML), and with caching that step has minimal overhead.
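
The "regenerate only as needed" idea can be sketched as follows. This is an illustration under stated assumptions, not how any particular web server does it: the `StyledPageCache` class and `placeholder_style` function are hypothetical names, and `placeholder_style` merely stands in for a real XSLT processor (Python's standard library does not include one).

```python
import hashlib
import html


class StyledPageCache:
    """Cache styled pages, keyed on the XML and the stylesheet that produced them."""

    def __init__(self, style):
        self._style = style   # callable(xml_text, stylesheet_text) -> output text
        self._pages = {}
        self.misses = 0       # how many times the styling step actually ran

    def render(self, xml_text, stylesheet_text):
        key = hashlib.sha256(
            stylesheet_text.encode("utf-8") + b"\0" + xml_text.encode("utf-8")
        ).hexdigest()
        if key not in self._pages:
            # Either the content or the stylesheet changed: restyle once.
            self.misses += 1
            self._pages[key] = self._style(xml_text, stylesheet_text)
        return self._pages[key]


def placeholder_style(xml_text, stylesheet_text):
    # Stand-in for a real XSLT transformation; it ignores the stylesheet
    # and simply wraps the escaped XML in an HTML page.
    return "<html><body><pre>" + html.escape(xml_text) + "</pre></body></html>"


if __name__ == "__main__":
    cache = StyledPageCache(placeholder_style)
    cache.render("<articles/>", "stylesheet v1")
    cache.render("<articles/>", "stylesheet v1")   # served from the cache
    print("styling step ran", cache.misses, "time(s)")
```

Because the key covers both inputs, a page is restyled when either the programmers' XML or the designers' stylesheet changes, and not otherwise.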

A typical implementation of this setup would see the scripts (CGI, ASP, PHP, whatever) producing output that starts like this:

    Content-Type: text/xml

The web server notices that the content type is XML, searches the output for an xml-stylesheet processing instruction, applies the stylesheet, caches the result, and sends the output from the stylesheet to the client's browser.

This can be done, today, using Apache and a variety of different Apache modules (AxKit is a good example).

Some web browsers (Mozilla, IE6, ...) even implement this functionality natively. That is, given an XML file with an xml-stylesheet processing instruction, they will download the stylesheet and apply it on the *client's* side. This can reduce download times and improve the use of the end user's cache.

Your web designers then do not need any scripting functionality or database on their own machines. The programmers simply provide them with examples of the XML that will be output from the database.

--
[1] I take no blame for the content or layout of these sites :-)