The pkg-data files are obviously of no use unless we have a way to do all the same work that is done in the present ports setup. I believe the best way to handle this is with a program that understands pkg-data files, and that does all the necessary work based on parameters to the program. I personally expect that we will avoid a lot of headaches if we make this a standalone C program, instead of trying to implement everything via clever contortions with sed, make, awk, and shell commands.
What follows is a list of some operations that I expect this program to perform. I have not done all that much work on the ports collection, so I am sure there are many important operations that I have missed here. Please let me know what those are.
For each of the following operations, the program should probably also support an option to specify which ports directory to work on. If that option is not specified, the program will work on the files it finds in the current working directory.
The --checksum operation would take a list of filenames, compute their MD5 checksums, and compare those results to the lines in the distinfo section of the pkg-data file. For a port that has not yet been converted to use pkg-data, it would compare the results to the contents of the port's distinfo file.
The --makesum operation would take a list of filenames, compute their MD5 checksums, and install those results as the distinfo section of the pkg-data file. For a port that has not yet been converted to use pkg-data, it would save the results into the distinfo file.
Personally, I'd like to see the --makesum and --checksum operations support both `md5' outputs (which is what the ports collection presently uses), and `md5 -r' outputs (which is neater in appearance, IMO, and also uses fewer bytes...).
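As a rough illustration of what supporting both layouts could involve, here is a minimal C sketch that parses one checksum line in either form. The struct and function names (sum_entry, parse_sum_line) are made up for this example, and the buffer sizes are arbitrary.

```c
#include <stdio.h>
#include <string.h>

#define MD5_HEX_LEN 32

/* Hypothetical record type: one file name and its MD5 digest in hex. */
struct sum_entry {
    char name[256];
    char hex[MD5_HEX_LEN + 1];
};

/*
 * Parse one checksum line in either layout:
 *   "MD5 (name) = hex"   (output of `md5')
 *   "hex name"           (output of `md5 -r')
 * Returns 0 on success, -1 if the line matches neither layout.
 */
int parse_sum_line(const char *line, struct sum_entry *out)
{
    /* Layout 1: the digest follows the parenthesized file name. */
    if (sscanf(line, "MD5 (%255[^)]) = %32[0123456789abcdefABCDEF]",
               out->name, out->hex) == 2 &&
        strlen(out->hex) == MD5_HEX_LEN)
        return 0;

    /* Layout 2: the digest comes first, then blanks, then the name. */
    if (sscanf(line, "%32[0123456789abcdefABCDEF]%*[ \t]%255[^\n]",
               out->hex, out->name) == 2 &&
        strlen(out->hex) == MD5_HEX_LEN)
        return 0;

    return -1;
}
```

With a parser like this, --checksum could accept a distinfo section written in either style, and --makesum would only need a flag to pick which style to write.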
The program would apply all the patches in the pkg-data file to the distribution files for a port. The idea is to avoid the bother of creating temporary files for each patch just to apply the patches. The program could just loop through each patch and do the equivalent of:
cat <lines-from-the-patch> | patch -p0
This operation might also take optional tag-names (of some sort), which is how all the special EXTRA_PATCHES could be applied.
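The pipeline above might be done from C without any temporary file by writing the patch text straight into a pipe. The following is only a sketch: feed_patch is a hypothetical name, the patch text is assumed to be in memory already, and the caller is assumed to have changed into the directory that holds the extracted distribution files.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/*
 * Feed `len' bytes of in-memory patch text to `cmd' on its standard input.
 * Returns the wait status from pclose() (0 when the command exited
 * successfully), or -1 if the pipe could not be opened or fully written.
 */
int feed_patch(const char *cmd, const char *text, size_t len)
{
    FILE *p = popen(cmd, "w");
    if (p == NULL)
        return -1;

    size_t written = fwrite(text, 1, len, p);
    int status = pclose(p);     /* waits for the command to finish */

    if (written != len)
        return -1;
    return status;
}
```

The program would call something like feed_patch("patch -p0", text, len) once per patch section and stop at the first non-zero result. A real implementation would also want to think about SIGPIPE in case patch exits before reading all of its input.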
This would take some of the sections in the pkg-data file, and expand them into separate files in some temporary work directory, probably a directory called work/pkg-data or just work/data under the directory of the port. The intent of this is to make it easy for ports-developers to work on ports in the manner they are used to. This would not extract *all* sections from pkg-data, only the ones that developers might need when working on a port.
I personally suspect that this should also rename the pkg-data file, to make sure the developer remembers to "--contract" their changes back into the pkg-data file before they `cvs commit' whatever changes they made. Maybe rename the pkg-data file to pkg-data-wip, and then rename it back as part of doing the --contract operation.
programming NOTE: when the pkg-data file is expanded, then the program will have to be smart enough to use the "expanded versions" of any section which is expanded, and not the sections from the file named pkg-data-wip. And the program obviously needs to recognize that a port-directory with a file named pkg-data-wip is a pkg-data port, and not look for information from the standard (pre-pkg-data) files.
This operation might take an optional parameter, to indicate which section(s) should be extracted. For instance, some sections may need to be extracted into a work directory for every user who builds the port, while other sections would only be needed by the developer of a port.
This would be the reverse operation of --expand or --extract. The idea is that a developer would (a) expand the pkg-data file, (b) do whatever work they need to do, (c) test test test, (d) --contract all those work files back into the original pkg-data file, and then finally (e) `cvs commit' the result.
This verifies, as much as possible, the format of all the data inside a pkg-data file. By this I mean that it makes sure that all required-sections are in the file, and that they are in the right order (in the few cases where a specific order is required). This operation should also make sure that each opening-tag has a matching closing-tag, and that separate sections are indeed separate. It should also check, as much as is reasonable, that the data inside a section "looks right". For instance, check that all the lines in a distinfo-section are the format expected for lines in distinfo.
If the pkg-data file does not pass all of these tests, the program will print error messages describing the problems and exit with a non-zero status. This might be run by the `cvs commit' process, to avoid at least some common errors when committing pkg-data files.
This option is expected to only be used on ports which do have a pkg-data file. Ports which do not have a pkg-data file will continue to be verified by other means.
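To make the tag-matching part of this check concrete, here is a small C sketch. It ASSUMES a tag syntax purely for illustration: a section opens with a line "<name>" and closes with a line "</name>", and sections do not nest. The real pkg-data syntax may well differ, and check_sections is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Check that section tags are balanced and that sections are separate.
 * ASSUMED syntax (for illustration only): "<name>" opens a section and
 * "</name>" closes it, each alone on a line.
 * Returns 0 if well formed, otherwise the 1-based number of the first
 * offending line (last line + 1 if a section is never closed).
 * Lines longer than the buffer are not handled in this sketch.
 */
int check_sections(FILE *in)
{
    char line[1024], current[256] = "";
    int lineno = 0;

    while (fgets(line, sizeof(line), in) != NULL) {
        lineno++;
        line[strcspn(line, "\n")] = '\0';
        size_t len = strlen(line);
        if (len < 3 || line[0] != '<' || line[len - 1] != '>')
            continue;                   /* ordinary content line */
        line[len - 1] = '\0';           /* drop the trailing '>' */
        if (line[1] == '/') {
            if (current[0] == '\0' || strcmp(line + 2, current) != 0)
                return lineno;          /* close without matching open */
            current[0] = '\0';
        } else {
            if (current[0] != '\0')
                return lineno;          /* opened inside another section */
            snprintf(current, sizeof(current), "%s", line + 1);
        }
    }
    return current[0] == '\0' ? 0 : lineno + 1;
}
```

The caller would print a message naming the returned line and exit with a non-zero status, which is what a `cvs commit' hook would key on. Checks for required sections, ordering, and per-section content would be layered on top of this.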
This would just copy the value of "section-name" to stdout. I am not sure how much this will be needed. This might be used in `make describe' processing, for instance (or more accurately: it might be used by whatever process looks into /usr/ports/INDEX for the pointer to pkg-data information).
This might also be all that is needed for pkg-plist processing.
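Copying one section to stdout could be as small as the sketch below. As before, the tag syntax is an ASSUMPTION made only for the example ("<name>" and "</name>", each alone on a line), and print_section is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy the body of section `name' from `in' to `out'.
 * ASSUMED syntax (for illustration only): "<name>" opens the section
 * and "</name>" closes it, each alone on a line.
 * Returns 0 if the section was found and closed, -1 otherwise.
 */
int print_section(FILE *in, const char *name, FILE *out)
{
    char line[1024], open_tag[300], close_tag[300];
    int inside = 0;

    snprintf(open_tag, sizeof(open_tag), "<%s>\n", name);
    snprintf(close_tag, sizeof(close_tag), "</%s>\n", name);

    while (fgets(line, sizeof(line), in) != NULL) {
        if (!inside) {
            if (strcmp(line, open_tag) == 0)
                inside = 1;             /* start copying after this line */
        } else if (strcmp(line, close_tag) == 0) {
            return 0;                   /* section complete */
        } else {
            fputs(line, out);
        }
    }
    return -1;                          /* never opened, or never closed */
}
```

Called with stdout as the output stream, this would serve both the describe-style lookup and simple pkg-plist processing.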
I know that `make describe' presently works with the pkg-descr file, and if a port has that file then the full pathname is put in /usr/ports/INDEX. Right now I do not have a good idea for how to handle that in a pkg-data world. My initial guess is that we could put a pathname to the pkg-data file in /usr/ports/INDEX, and then change whatever processes reference that field to call the pkg-handling program with some option to print out the pkg-descr section of the file.