enhance kiconv - patches for kiconv

kiconv is an iconv(3)-style character conversion facility in the kernel. Since some file systems record filenames in Unicode, we cannot avoid converting characters from Unicode to the local encoding if we want to use the full capability of these file systems.
At present, any conversion from a 2-byte character to a 2-byte character is possible using the xlat16 converter. However, there are some limitations:
- Characters of 3 bytes or more cannot be handled at all.
- tolower/toupper conversion is only possible for single-byte characters.
For example, UTF-8 has 1- to 4-byte characters and GB18030 has 1-, 2-, and 4-byte characters. At present, we are unable to handle them.
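
To make the width problem concrete, here is a minimal sketch of how the byte length of a UTF-8 character follows from its lead byte. The helper name utf8_seq_len() is hypothetical and is not part of kiconv or of these patches; it only shows why a converter fixed at 2 bytes per character cannot cover the 3- and 4-byte cases.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of kiconv): return the length in bytes
 * of the UTF-8 sequence that starts with lead byte b, or 0 if b cannot
 * start a valid sequence.
 */
static size_t
utf8_seq_len(uint8_t b)
{
    if (b < 0x80)
        return 1;   /* 0xxxxxxx: ASCII */
    if (b >= 0xC2 && b <= 0xDF)
        return 2;   /* 110xxxxx; 0xC0 and 0xC1 would be overlong */
    if (b >= 0xE0 && b <= 0xEF)
        return 3;   /* 1110xxxx */
    if (b >= 0xF0 && b <= 0xF4)
        return 4;   /* 11110xxx, limited to U+10FFFF */
    return 0;       /* continuation byte or invalid lead byte */
}
```

Any lead byte for which this returns 3 or 4 starts a character that does not fit in a 2-byte local character.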

I made the following patches to remove these limitations on character conversion.

Change Logs

[2005/08/23] bp@ gave me good advice. Three changes.

[2005/08/23] I found a bug in smbfs UCS-2 support.

[2006/08/18] Ota-san gave me bug reports.

[2006/08/18] As I promised a year ago, I'll check in these patches soon. I apologize for the long inconvenience.


Patches

  1. Split the tolower/toupper code from the usual xlat16 kiconv table,
    so that tolower/toupper can be used independently of code conversion.
    - No API breakage.
    - tolower/toupper works on characters of up to 2 bytes.
    - Add towlower(9)/towupper(9).

    NOTE: in RELENG_5 or RELENG_6, the following modifications are required.

  2. (This feature depends on the above patches.)
    Add a new kiconv converter (the ucs converter).
    It is intended to make all xlat16-based conversions go through UCS-4 internally.
    By adding just a few charsets, many more charset pairs become available for conversion.
    This means there is no need to register an `A<->B' conversion table to convert `A<->B'
    if an `A<->Unicode' conversion table and a `B<->Unicode' conversion table are already registered.
    - No API breakage.
    - UTF-8 conversion becomes available as a side effect.
    - The new function iconv_add() was obtained from Darwin smbfs.

    NOTE: in RELENG_5, the following modification is required.

  3. Extend the local character width to 4 bytes for each file system.

  4. (This feature depends on the above patches.)
    Teach smbfs to speak to an SMB server in UCS-2.
    Recent MS Windows and Samba 3 speak UCS-2.
    This code was obtained from the Darwin project.

    NOTE: in RELENG_5, the following modifications are required.

    NOTE: in RELENG_6, the following modifications are required.

  5. Add a userland utility (kiconvctl) and an rc script.
    With this utility, it's possible to preload tables from rc scripts so that the vfs.usermount feature works with kiconv.

    Some additional work is required for smbfs (libsmb) in order to use vfs.usermount.


  6. [Work in progress (5%)]
    Add the ability to convert 2-byte characters to 4-byte characters (xlat32 converter).
    4-byte to 4-byte character conversion tables can be defined manually.

  7. [Not yet started] Write the iconv(9) manual page.

  8. [Not yet started] Full GB18030 support?

  9. [Not yet started] HFS+ issue?
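
The pivot idea behind item 2 (convert `A<->B' by chaining an `A<->Unicode' table and a `B<->Unicode' table) can be sketched roughly as follows. This is only an illustration and not the code in the patches: the struct, the function names, and the sentinel NOCHAR are hypothetical, the lookup is a plain linear search, and the two tables are tiny excerpts of CP437 and ISO-8859-1 chosen as example charsets.

```c
#include <stddef.h>
#include <stdint.h>

#define NOCHAR   0xFFFFFFFFu                    /* hypothetical "no mapping" marker */
#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

struct map_entry {
    uint32_t from;
    uint32_t to;
};

/* Excerpt of a CP437 -> UCS-4 table (illustration only). */
static const struct map_entry cp437_to_ucs[] = {
    { 0x80, 0x00C7 },   /* C with cedilla */
    { 0x81, 0x00FC },   /* u with diaeresis */
    { 0x82, 0x00E9 },   /* e with acute */
};

/* Excerpt of a UCS-4 -> ISO-8859-1 table (illustration only). */
static const struct map_entry ucs_to_latin1[] = {
    { 0x00C7, 0xC7 },
    { 0x00E9, 0xE9 },
    { 0x00FC, 0xFC },
};

/* Look up c in one table; ASCII maps to itself in these charsets. */
static uint32_t
map_lookup(const struct map_entry *tbl, size_t n, uint32_t c)
{
    size_t i;

    if (c < 0x80)
        return c;
    for (i = 0; i < n; i++)
        if (tbl[i].from == c)
            return tbl[i].to;
    return NOCHAR;
}

/* Convert A -> B by going through UCS-4; no direct A -> B table is needed. */
static uint32_t
convert_via_ucs(const struct map_entry *a2u, size_t na,
    const struct map_entry *u2b, size_t nb, uint32_t c)
{
    uint32_t ucs = map_lookup(a2u, na, c);

    if (ucs == NOCHAR)
        return NOCHAR;
    return map_lookup(u2b, nb, ucs);
}
```

With these excerpts, convert_via_ucs(cp437_to_ucs, NELEM(cp437_to_ucs), ucs_to_latin1, NELEM(ucs_to_latin1), 0x82) yields 0xE9, although no CP437-to-ISO-8859-1 table exists. With N charsets, N tables to and from Unicode replace up to N*(N-1) pairwise tables.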


Other Topics

2005/08/22:
I got a problem report about using kiconv mounts in fstab(5).
If /var/run/ld-elf.so.hints is missing or broken for some reason, the boot sequence falls into single-user mode because kiconv depends on libiconv from ports.
If you want to avoid falling into single-user mode, I recommend that you define the extra_netfs_types variable in your rc.conf(5) with the name of your kiconv file system.
(e.g. extra_netfs_types=msdosfs)