Once we have a pkg-data file and the program(s) to take advantage of it, there are a wide variety of ways in which we might want to extend the format. What follows are some blue-sky ideas for additional sections that we might want to add. I am not claiming that these are all good ideas; I just wanted to list a few possible extensions. This is mainly meant as a springboard for discussion.
Right now we (Darren and I) have no plans to include any of the following in the initial implementation of the pkg-data support. However, if any of the following generate a lot of interest, we will see if we can implement those ideas in the time we have available.
Here are some possible extensions:
We are probably going to have some ports where collapsing all the information into a single pkg-data file will result in a much larger file than we want. With this option, we could break some sections out into a separate file (or files), and then have the port's pkg-data file "include" those other file(s). The most likely example would be to put all patches in a file called pkg-patches, and then have the original pkg-data file contain the line:
==<include:pkg-patches>
It may seem rather silly to go to all this work to collapse everything into a single file, only to turn around and break it up into multiple files again. However, it would be useful for some ports, and will make even more sense if combined with some of the following extensions to the format.
Note that this is a single statement. There is no need for a "closing tag", the way I have described it above. Also note that a single pkg-data file might include multiple files, so it is not just setting some "single value". However, it may be that some other syntax should be used to specify the filename.
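As a rough illustration of the include idea, here is a minimal Python sketch of how a program might expand include lines. Everything in it is an assumption made for the example: the function name expand_includes, the load callback that maps a file name to its text, and the choice to let included files contain include lines of their own (with a guard against loops). It is not how the PdHandlingProgram works, since that interaction has not been designed.

```python
import re

# Assumed form of an include line, copied from the example: ==<include:NAME>
INCLUDE_RE = re.compile(r"^==<include:([^>]+)>\s*$")


def expand_includes(text, load, _seen=()):
    """Return text with every include line replaced by the named file's text.

    load is a hypothetical callback that maps a file name to its contents.
    Nested includes are expanded too, which is an assumption of this sketch.
    """
    out = []
    for line in text.splitlines():
        m = INCLUDE_RE.match(line)
        if m is None:
            out.append(line)  # ordinary line, copied through unchanged
            continue
        name = m.group(1)
        if name in _seen:
            raise ValueError("recursive include of " + name)
        included = expand_includes(load(name), load, _seen + (name,))
        out.extend(included.splitlines())
    return "\n".join(out)
```

Because the include is a single statement with no closing tag, one pass over the lines is enough, and a file may contain any number of include lines.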
This is something that I would like to implement, but I have not thought through how it should interact with the --expand and --archive options of the PdHandlingProgram.
The ports collection has recently added the file /usr/ports/UPDATING to tell users about any particularly tricky steps they must take to update their ports. Given 10,000 ports, with plenty of interactions between ports, I wonder if putting that information in a single file will really scale well.
Perhaps we could add some ==<updating> section to the pkg-data file of ports that need it, and then the user could run some program to see what special situations they need to care about (based on what ports they have installed). This section would just be multiple lines of plain-text that would be shown to the user.
One obvious shortcoming of this idea is that the UPDATING information is most-likely based on what ports they presently have installed -- and that may be very different from the ports that presently exist in the ports tree. In some cases, the port they have installed may not even exist any more.
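To make the ==<updating> idea a little more concrete, here is a small Python sketch of the "some program" mentioned above. It assumes that the section is closed by a ==</updating> line (nothing above settles that), and that the caller has already built a mapping from installed port names to the text of their pkg-data files. The function name updating_notes is made up for the example.

```python
def updating_notes(installed):
    """Collect the ==<updating> text for each installed port that has one.

    installed maps a port name to the text of that port's pkg-data file.
    The closing line ==</updating> is an assumption of this sketch.
    """
    notes = {}
    for port, text in installed.items():
        collected = []
        inside = False
        for line in text.splitlines():
            if line == "==<updating>":
                inside = True
            elif line == "==</updating>":
                inside = False
            elif inside:
                collected.append(line)  # plain text to show to the user
        if collected:
            notes[port] = "\n".join(collected)
    return notes
```

Note that this sketch reads whatever pkg-data files it is given, so it does nothing to address the shortcoming described above.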
This could be an md5 digest of the information in this pkg-data file. Note that it would be specific to any given file, and thus files which are included would each have their own versions of this section.
If this were done, then the way it would work is to calculate the MD5 for the entire file while assuming this digest value is all zeros. Then when you have the real value, put that into this line. That way, the digest is never computed over its own value, but you will still have some way to tell if any byte in the file is corrupted.
This would get messy when it comes to handling the $FreeBSD$ or $Id$ value in a file. The program would probably also skip over that string, since its exact value does not affect how anything in the port is actually processed.
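The zeroed-digest scheme described above can be sketched in a few lines of Python. The line syntax ==<md5:HEX> is purely hypothetical, since no syntax for this section has been chosen, and the helper names stamp and verify are made up as well. The sketch also assumes that "skipping" the $FreeBSD$ or $Id$ string means collapsing an expanded keyword back to its bare form before hashing.

```python
import hashlib
import re

# Hypothetical digest line; the proposal does not define a syntax for it.
DIGEST_RE = re.compile(rb"^==<md5:([0-9a-f]{32})>$", re.MULTILINE)
ZEROED_LINE = b"==<md5:" + b"0" * 32 + b">"
# Expanded keywords such as "$FreeBSD: ... $" or "$Id: ... $".
KEYWORD_RE = re.compile(rb"\$(FreeBSD|Id)(:[^$\n]*)?\$")


def _digest(data):
    """MD5 of data with the digest field zeroed and keywords collapsed."""
    zeroed = DIGEST_RE.sub(ZEROED_LINE, data)
    normalized = KEYWORD_RE.sub(rb"$\1$", zeroed)
    return hashlib.md5(normalized).hexdigest().encode("ascii")


def stamp(data):
    """Return data with the real digest written into the digest line."""
    if DIGEST_RE.search(data) is None:
        raise ValueError("no digest line found")
    return DIGEST_RE.sub(b"==<md5:" + _digest(data) + b">", data)


def verify(data):
    """True if the stored digest matches the recomputed one."""
    m = DIGEST_RE.search(data)
    return m is not None and m.group(1) == _digest(data)
```

A file would start out with an all-zero digest line, be passed through stamp once, and could then be checked with verify at any later time.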
This obviously will only detect simple data corruption of some sort. It is not going to provide any real protection against someone who is deliberately trying to trojan the ports collection on your disk. Although I guess it might be useful to see if the value on your disk matches the value in some trusted repository.
It seems to me that some of the information which is saved in the ports-Makefile is put there simply because it is the convenient place to put it. For any cases where that is true, we could copy it into the pkg-data file, and then remove it from the Makefile. Or it could be that I am simply wrong about that, and none of these values should be moved...
Up to this point, I have only talked about tags which set some value (plus the "include" line, which is more of an operation-request than a tag). In this case I am thinking about having some kind of "selectors" in a pkg-data file. The idea is that a pkg-data file could do something like:
==<lingua==chinese>
==<include:pkginfo-chinese>
==</lingua>
==<lingua==german>
==<include:pkginfo-german>
==</lingua>
...
I picked "lingua" as a name because the word "language" can bring to mind other aspects of compiling a program. Some better tagnames for this might be "dialect" or "tongue". Just something so people know it refers to "human language"...
There are obviously a number of details which would need to be filled in there, and I suspect that XML experts will tell me that I have chosen the wrong syntax anyway. Since we have no immediate plans to implement this, I will not attempt to solve those issues. But whoever does try to implement this will have to fill in the missing details and make sure this is using a reasonable syntax...
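For the sake of discussion, here is one way a program might apply a "lingua" selector, written as a short Python sketch. It takes the example syntax at face value (one tag per line, ==<lingua==VALUE> to open a block and ==</lingua> to close it), keeps lines outside any block, and does not handle nested selectors. The function name select_lingua is made up, and none of this is meant to settle the open syntax questions.

```python
import re

# Opening line of a selector block, as written in the example.
OPEN_RE = re.compile(r"^==<lingua==([^>]+)>$")
CLOSE_LINE = "==</lingua>"


def select_lingua(text, wanted):
    """Return the lines that apply when the user's language is wanted.

    Lines outside any selector block are always kept; lines inside a block
    are kept only if the block's value equals wanted.
    """
    out = []
    current = None  # value of the enclosing block, or None when outside
    for line in text.splitlines():
        m = OPEN_RE.match(line)
        if m:
            current = m.group(1)
        elif line == CLOSE_LINE:
            current = None
        elif current is None or current == wanted:
            out.append(line)
    return out
```

The lines that survive selection could then be processed like any other pkg-data lines, so an include line inside the chosen block would be handled by whatever code normally handles includes.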
Once we have the idea of selectors, we could also use it to select pkg-data information based on the branch of FreeBSD that the user is running.
After we have ports which are using pkg-data files, we could also (perhaps) recognize different pkg-data files for "stable" vs "test" versions of a port. In general there would only be the "stable" version of any port, but sometimes you might want to distribute an alternate version of a port for people to test, even though it has not been tested enough to make it the "stable production version" of that port. My feeling is that this would be handled by a separate pkg-data file in the directory for that port, perhaps named "pkg-testdata". By keeping it in a separate file, you know that you cannot disrupt the stable-production version of the port. Users who want to use the alternate version of the port would have to take some special action to get the right pkg-data file.