The pNFS service is now in FreeBSD-12 and newer. In FreeBSD-13, it also supports NFSv4.2. An MDS server that supports NFSv4.2 mixed with DSs that do not all support NFSv4.2 will not work correctly. When upgrading to FreeBSD-13 (or a pre-release from main/current), upgrade the MDS after upgrading all the DSs. The mounts from the MDS to the DSs must all use NFSv4.2 if the MDS supports NFSv4.2 ("minorversion=2" instead of "minorversion=1" on the mounts).

The remainder of this document assumes FreeBSD-13, which supports NFSv4.2. (For FreeBSD-12, replace all occurrences of "minorversion=2" with "minorversion=1".)

Overall Goal

A pNFS service separates the Read/Write operations from all the other NFSv4.n Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for storage capacity and/or I/O bandwidth.

It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service, and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two way mirroring would be the norm, it is possible to set a mirroring level of up to four. This could be increased by recompiling with NFSDEV_MAXMIRRORS set to a larger value. The Metadata server will always be a single point of failure, just as a single NFS server is.

Overview of Plan B

A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DSs), all of which are FreeBSD-12 or newer systems. Clients will mount the MDS as they would a single NFS server.

When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. As such, if you look at the exported tree directly on the MDS server (not via an NFS mount), the files will all be of size 0. Each of these files will also have two extended attributes in the system attribute name space:

pnfsd.dsfile - This extended attribute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation.

For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs, in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal, so that the name is unique.

The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created on all of the DSs, as sketched below.
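For example, here is a minimal sketch of raising vfs.nfsd.dsdirsize from the default of 20 to 100 (the target value of 100 is just an illustration; it reuses the jot(1) idiom shown in the DS setup below):

On each DS, in the exported storage directory, create the additional directories:
# jot -w ds 80 20 | xargs mkdir -m 700
(this creates "ds20" through "ds99")

Then, on the MDS with the nfsd daemon not running:
# sysctl vfs.nfsd.dsdirsize=100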
For pNFS aware NFSv4.1/4.2 clients, the FreeBSD server will return two pieces of information to the client that allow it to do I/O directly to the DS:

DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server are the IP address of the DS and, for the Flexible File layout, that NFSv4.1/4.2 is to be used to do I/O on the DS and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo.

Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, called File and Flexible File layout. Both allow the client to do I/O on the DS via NFSv4.1/4.2 I/O operations. The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS, so that the DS can be disabled if it is mirrored.

The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant, where the client uses the same credentials to do I/O on the DS as it would on the MDS. The FreeBSD DSs maintain the same ownership, mode and ACL on the data storage file as the corresponding file on the MDS, so that the DSs can apply the same permission checking as the MDS does.

The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Write and the DeviceID(s) that identify which DS(s) the data is stored on, along with the file handle(s) for the data storage file on the DS(s).

At this time, for the non-mirrored DS case, the MDS generates File Layout layouts to NFSv4.1/4.2 clients that know how to do pNFS, unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. A mirrored DS configuration always generates Flexible File layouts.

For NFS clients that do not support NFSv4.1/4.2 pNFS, all I/O operations are done against the MDS, which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS(s) as a proxy.

Multiple DSs can be on the same FreeBSD server, but the DSs must be on system(s) separate from the MDS. When the MDS is configured, DSs can be used either to store data files for all exported file systems on the MDS or for a specific exported file system on the MDS. The latter configuration allows a system administrator to limit allocation of data storage for an exported file system to specific file system(s) on specific DS(s).

For the case where the DS(s) are assigned to specific MDS exported file systems, a system administrator may find it useful to configure multiple DSs on the same system, using separate file systems on this system for each of the multiple DSs. Each of these DSs must be accessed via a separate IP address, since only one DS mount per IP address is allowed. This can be done via alias addresses assigned to the same network interface or via multiple network interfaces on the system, as sketched below. (This configuration can also be useful for testing.)
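For example, a minimal sketch of adding an alias address for a second DS on the same system (the interface name "em0" and the address are illustrations only; adjust them for your network):

# ifconfig em0 inet 192.168.1.45 netmask 255.255.255.255 alias

or persistently, in /etc/rc.conf:
ifconfig_em0_alias0="inet 192.168.1.45 netmask 255.255.255.255"

The second DS's export would then be mounted from the MDS via this alias address.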
To do testing, you will need to use a FreeBSD-12 or newer system.

Setting up a FreeBSD pNFS server using Plan B

- I have only tested AUTH_SYS, although I do not know a reason why Kerberized mounts won't work. For Kerberized mounts, nfsuserd(8) must be used to map between uid/gid and owner/owner_group names. This can also be done for AUTH_SYS mounts, but I find it easier to set:
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_stringtouid=1
in /etc/sysctl.conf, so that the uid/gid numbers go on the wire as strings. (When you do this, there is no reason to run the nfsuserd(8) daemon unless you are using it with the "-manage-gids" option.)
*** Note that you must use one of the two above methods for mapping user/group names. If this mapping is not working correctly, the server will be badly broken, because it will not be able to set attributes on the data files on the DS(s). The default of "nobody" will not work correctly.
- As with other NFS setups, all the servers and clients need to have common user/uid and group/gid databases.

On the DS systems:
(All commands need to be done by root/su.)
The DSs are configured like a normal NFS server, with the following:
- There needs to be an exported directory with empty directories in it with the names:
ds0, ds1, ds2, ds3, ds4, ds5, ds6, ds7, ds8, ds9, ds10, ds11, ds12, ds13, ds14, ds15, ds16, ds17, ds18, ds19
(More subdirectories, with names ds20,..., if vfs.nfsd.dsdirsize has been increased from the default of 20 on the MDS, as above.)
This command, done in each of the exported DS directories, will create them:
# jot -w ds 20 0 | xargs mkdir -m 700
(Replace 20 with the value of vfs.nfsd.dsdirsize if you have increased it.)
- The exported directory must be mountable by the MDS via NFSv4.1/4.2 with the "-maproot=root" option on the /etc/exports line. It must also be exported to the clients, but the "-maproot=root" export option is not required for them. Assuming the exported directory is called "/DSstore", AUTH_SYS is being used and all clients are on the 192.168.1 subnet, a typical /etc/exports on the DSs might be:
/DSstore -sec=sys -maproot=root nfsv4-mds
/DSstore -sec=sys -network 192.168.1.0 -mask 255.255.255.0
V4: /DSstore -network 192.168.1.0 -mask 255.255.255.0
(If multiple DSs are being configured on the system, there needs to be a separate exported directory with empty "dsN" directories in it for each DS. Each of these exported directories would normally be on a separate DS file system, with separate export lines in /etc/exports.)
- The two sysctls:
vfs.nfsd.enable_nobodycheck
vfs.nfsd.enable_nogroupcheck
should both normally be set to 0 on the DS (and maybe the MDS as well). This allows files to be correctly created by unknown users, where the user/group name gets mapped to "nobody"/"nogroup".
In other respects, the DSs are configured the same as a normal NFS server, with no "-p" or "-m" options on the nfsd daemon.
- Since I choose to use uid/gid numbers in the strings and not run the nfsuserd, I set the sysctls listed below.
In /etc/rc.conf I have:
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsv4_server_only="YES"
nfs_server_flags="-t -n 32"
- You also need to make sure that mountd_flags has the "-S" option, but that is normally the default.
In /etc/sysctl.conf I have:
vfs.nfsd.enable_stringtouid=1
vfs.nfs.enable_uidtostring=1
vfs.nfsd.enable_nobodycheck=0
vfs.nfsd.enable_nogroupcheck=0
Alternately, you can run the nfsuserd by adding this line to your /etc/rc.conf instead of the first two of the above lines in /etc/sysctl.conf:
nfsuserd_enable="YES"
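Before configuring the MDS, it can be worth sanity checking each DS export by hand. A minimal sketch, assuming a DS called nfsv4-data0 and a scratch mount point /mnt (both names are illustrations), done as root on the MDS:

# mount -t nfs -o nfsv4,minorversion=2 nfsv4-data0:/ /mnt
# ls /mnt
# umount /mnt

The ls should show the "dsNN" directories (ds0 through ds19 by default). If they are not visible, fix the DS export before going further.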
On the MDS system:
The MDS is set up like a normal NFS server, plus the following:
- The MDS must have the data storage directory (/DSstore for example) of all the DS(s) mounted somewhere on the MDS file system (not within the exported subtree) using NFSv4.1/4.2 mounts, but without the "pnfs" option. For example, if there are four DS servers called nfsv4-data0, nfsv4-data1, nfsv4-data2 and nfsv4-data3, where each has a /DSstore storage file directory exported as above, the /etc/fstab lines might look like:
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
Note that, unlike a normal NFSv4 mount, the "soft,retrans=2" options can be used, so that the MDS will detect failures of a DS. These two options are only useful for pNFS servers that implement mirroring.
- Once the above DS server mounts are done on the MDS, do:
# nfsstat -m
on the MDS and make sure the rsize and wsize are configured to the same size as:
# sysctl vfs.nfsd.srvmaxio
is set to on the MDS and all DSs. For the pNFS server to work correctly, these must all be the same. On a default FreeBSD-13 or FreeBSD-14 system, vfs.nfsd.srvmaxio will be 131072, but the rsize and wsize on the MDS will be 65536. To fix this on the MDS, add an entry like:
vfs.maxbcachebuf=131072
to the /boot/loader.conf file. Reboot and check that "nfsstat -m" shows the mounts of the DSs with rsize=131072,wsize=131072. If the MDS generates messages w.r.t. increasing the value of kern.ipc.maxsockbuf, you should add a line to /etc/sysctl.conf on the MDS to do so.
- Then, the "-p" and optionally the "-m" options are added to the command line options for the nfsd. The "-p" option indicates that it is a pNFS service and lists the DSs. The "-m" option enables mirroring and defines how many DSs will store the data file for a file on the MDS. Assuming there are four DSs mounted as above and they will store files for all MDS file systems with a mirroring level of 2:
In /etc/rc.conf I have:
rpcbind_enable="YES"
mountd_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfs_server_flags="-u -t -n 32 -m 2 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
- You also need to make sure that mountd_flags has the "-S" option.
In /etc/sysctl.conf I have:
vfs.nfsd.enable_stringtouid=1
vfs.nfs.enable_uidtostring=1
Alternately, you can run the nfsuserd by adding this line to your /etc/rc.conf instead of the above lines in /etc/sysctl.conf:
nfsuserd_enable="YES"
In /etc/fstab I have:
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
(Note that these mounts do not use the "pnfs" option and there is no need to have the nfscbd running on the MDS, since no callbacks from the DSs to the MDS are done. Also note that the paths in the "-p" argument are the "mounted-on" paths from the above mounts and not the directories on the DSs.)
If the MDS exports two file systems to clients, called /export1 and /export2, and the DSs are to be assigned to specific MDS file systems, the nfs_server_flags line might be:
nfs_server_flags="-u -t -n 32 -m 2 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
so that files on /export1 are stored on nfsv4-data0 and nfsv4-data1, whereas files on /export2 will be stored on nfsv4-data2 and nfsv4-data3.
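One hedged way to confirm that the per-file-system assignment is behaving as expected: create a file on each export via a client mount, then run pnfsdsfile(8) (described in more detail below) on the MDS copies. A sketch, where the file names are illustrations only:

# pnfsdsfile /export1/somefile
(should list nfsv4-data0 and/or nfsv4-data1)
# pnfsdsfile /export2/otherfile
(should list nfsv4-data2 and/or nfsv4-data3)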
A few notes:
- When shutting down the MDS, the nfsd threads must be killed before the data store mounts can be dismounted. For example:
# /etc/rc.d/nfsd stop
# umount /data0
# umount /data1
# umount /data2
# umount /data3
(This seems to work ok when you do a "reboot" from multiuser mode, but I'm not sure if the scripts guarantee this?) This should be done before the DS machines are shut down.
- If DSs have failed, it may not be possible to stop the kernel nfsd threads. (They will still be seen on a "ps axHl" command.) If this happens, it may be necessary for the system administrator to use the pnfsdskill(8) command with the "-f" option to disable the DSs. Once that happens, the nfsd threads should terminate, and that should allow the DSs to be umount(8)ed with the "-N" option.
- If you are making any use of NFSv4 ACLs, these must be enabled on all the exported file systems (both MDS and DS). For UFS, this means that all must be mounted with the "nfsv4acls" option.
- I believe there are some differences w.r.t. NFSv4 ACL semantics between UFS and ZFS, so I would avoid mixing the two file system types if NFSv4 ACLs are being used.
- For reasonable performance, this tunable should be increased on the MDS by putting this line in /boot/loader.conf:
vfs.nfsd.fhhashsize="1000"
For a production server with a reasonable amount of memory, you might want to increase this to 10000.

If the above has all worked, an NFSv4.1/4.2 mount with pNFS enabled should work, with the I/O being done directly to the DSs. ("nfsstat -E -s" on the MDS should show few Read or Write operations happening, since they are being done directly on the DSs.) For a non-mirrored configuration, a fairly recent FreeBSD system (FreeBSD-11 or FreeBSD-12) should be sufficient for the clients. For a mirrored configuration, the FreeBSD clients will need to be FreeBSD-12 or newer.

For the FreeBSD client:
In the /etc/rc.conf file:
nfscbd_enable="YES"
- Then you should be able to do the mount:
# mount -t nfs -o nfsv4,minorversion=2,pnfs nfsv4-mds:/export /mnt
(minorversion=1 for FreeBSD-12.)
(Assuming the MDS is called nfsv4-mds and "/export" is the exported file system on the MDS.)
- If this works, you can put an entry in your /etc/fstab like:
nfsv4-mds:/export /mnt nfs rw,nfsv4,minorversion=2,pnfs 0 0

If you are using a recent Linux client, the mount command looks about the same:
# mount -t nfs -o nfsvers=4 nfsv4-mds:/export /mnt
If you are using a mirrored pNFS configuration and you want pNFS to work, you will probably want a Linux 4.17-rc2 or later kernel. Kernels prior to 4.12 only handle Flexible File Layouts for NFSv3 DS servers; as such, they will fall back to doing all I/O through the MDS. A 4.12 kernel works, but I saw Linux client crashes. I don't see crashes with a 4.17-rc2 kernel, but I haven't tried Linux kernels in between these two versions.
Also, the Flexible File Layout driver in the Linux client in a 4.17-rc2 kernel does not handle a "tightly coupled" server correctly and uses the synthetic user/group in the AUTH_SYS credentials instead of the ones for the user doing the I/O. There are two ways to deal with this:
1 - Run a Linux system with a patched Flexible File Layout driver. I do not know the exact Linux kernel version that acquired the fix, but recent (maybe all) 5.n kernels are fixed.
OR
2 - Set the sysctl vfs.nfsd.flexlinuxhack=1, so that the layouts will be issued with a synthetic user/group of "0". This works around the problem, so long as you export the file systems on the DSs to the clients with "-maproot=root". (A sketch of this workaround follows.)
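A minimal sketch of option 2, reusing the /DSstore example from the DS setup above. On the MDS, in /etc/sysctl.conf:
vfs.nfsd.flexlinuxhack=1
and on each DS, the client export line in /etc/exports gains "-maproot=root":
/DSstore -sec=sys -maproot=root -network 192.168.1.0 -mask 255.255.255.0
(The subnet is just the example one used earlier; mountd must be reloaded after editing /etc/exports.)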
If this works, you should be ready for testing. You can monitor how it is working in a few ways.
- You can enable logging in the server by:
# sysctl vfs.nfsd.debuglevel=4
or on the client by:
# sysctl vfs.nfs.debuglevel=4
- You can capture packets and look at them in wireshark. To do this on the server:
# tcpdump -s 0 -w run.pcap host <NFS-client-host>
Then look at run.pcap in wireshark, which understands NFSv4.
- You can get some basic information from nfsstat. On the server:
# nfsstat -E -s
On the client:
# nfsstat -E -c

The command pnfsdsfile(8) can be used on files in the exported file system on the MDS to find out where the data storage for a file resides and to fix up the pnfsd.dsfile extended attribute, if needed.
- The MDS file should look normal, except for being empty. If this file happens to be abc.c and the pNFS file service is mirrored, the command:
# pnfsdsfile abc.c
abc.c:	nfsv4-data2	ds5/207508569ff983350c000000a9730200eec58e800000000000000000
	nfsv4-data3	ds5/207508569ff983350c000000a9730200eec58e800000000000000000
shows that the data files for abc.c are on nfsv4-data2 and nfsv4-data3 in subdirectory "ds5" with the file name "2075...".
- The DS file will have a 56-byte hexadecimal name and should have the size, ownership, mode and ACL of the MDS file. The contents of this file are the file's data.
- If the pnfsd.dsattr extended attribute somehow gets corrupted, you can remove it on the MDS file and the MDS server will recreate it once the file attributes are changed in any way. (It caches attributes for the data file.) For example, for a file called abc.c, done on the exported file system on the MDS:
# rmextattr system pnfsd.dsattr abc.c
Then it will be recreated when the file is accessed via a client NFS mount.

Backing up the pNFS store:
The easy way to back this up is to archive from an NFS client mount of it, since the files look "normal", with data as well as attributes. If, for example, you did this with "tar", you could recover the archive anywhere; it does not need to be recovered onto a pNFS store. If the MDS tree is being archived on the MDS itself, the system extended attributes must be saved/restored. (I'll admit I don't know how to do this at this time.)
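A minimal sketch of the client-side backup, assuming the pNFS service is mounted on a client at /mnt and the archive goes to /backup (both paths are illustrations only):

# tar -czf /backup/pnfs-store.tgz -C /mnt .

Because this reads through the NFS mount, the data comes from the DSs and the archive contains ordinary files that can be extracted anywhere.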
Handling of failed mirrored DSs:
When a mirrored DS fails, it can be disabled in one of three ways:
1 - The MDS detects a problem when trying to do proxy operations on the DS. This is why the DS servers are mounted on the MDS with the "soft,retrans=2" options. This can take a couple of minutes after the DS failure or network partitioning occurs.
2 - A pNFS client can report an I/O error on the DS to the MDS in the arguments for a LayoutReturn operation.
3 - The system administrator can perform the pnfsdskill(8) command on the MDS to disable it. If the system administrator does a pnfsdskill(8) and it fails with ENXIO (Device not configured), that normally means the DS was already disabled via #1 or #2. Since doing this is harmless, once a system administrator knows that there is a problem with a mirrored DS, doing the command is recommended.
As such, once a system administrator knows that a mirrored DS has malfunctioned or has been network partitioned, they should do the following as root/su on the MDS:
# pnfsdskill <mounted-on-path>
(If this fails with ENXIO (Device not configured), it normally isn't a problem and simply indicates that the DS has already been disabled.)
# umount -N <mounted-on-path>
Note that the <mounted-on-path> must be the exact mounted-on path string used when the DS was mounted on the MDS.
For example, if the DS was mounted on the MDS with:
# mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3
the above commands would be:
# pnfsdskill /data3
# umount -N /data3

Once the mirrored DS has been disabled, the pNFS service should continue to function, but file updates will only happen on the DS(s) that have not been disabled. Assuming two way mirroring, that implies the one DS of the pair stored in pnfsd.dsfile for the file on the MDS.

The next step is to clear the IP address in the pnfsd.dsfile extended attribute on all files on the MDS for the failed DS. This is done so that these stale data files won't be used when the repaired DS is brought back online. The command that clears the IP address is pnfsdsfile(8) with the "-r" option. For example:
# pnfsdsfile yyy.c
yyy.c:	nfsv4-data2.home.rick	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
	nfsv4-data3.home.rick	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
shows that this file has data files stored on nfsv4-data2 and nfsv4-data3. After nfsv4-data3 has been disabled as above, only nfsv4-data2 will be used. However, if nfsv4-data3 was brought back online without fixing this, the client(s) and MDS could access an out-of-date data file on nfsv4-data3. The command:
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c:	nfsv4-data2.home.rick	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
	0.0.0.0	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
replaces nfsv4-data3 with the IPv4 address 0.0.0.0, so that nfsv4-data3 will not get used.
Normally this will be done within a find(1) command for all regular files in the exported directory tree, and it must be done on the MDS. When used with find(1), you will probably also want the "-q" option so that it won't spit out the results for every file. If the disabled/recovered DS is nfsv4-data3, the commands done on the MDS would be:
# cd <top-level-exported-directory>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \;
There is a problem with the above command if a file found by find(1) is renamed or unlinked before the pnfsdsfile(8) command is done on it. This should normally generate an error message. A simple unlink is harmless, but a link/unlink or rename might result in the file not having been processed under its new name. To check that all files have their IP addresses set to 0.0.0.0, these commands can be used (assuming the "sh" shell):
# cd <top-level-exported-directory>
# find . -type f -exec pnfsdsfile {} \; | sed "/nfsv4-data3/!d"
Any line(s) printed require the pnfsdsfile(8) command with "-r" to be done again. (In theory, a file could be renamed during the first command such that it gets missed, and then renamed again during the second command such that it gets missed again; however, I think this is highly unlikely. You can run the second command repeatedly if you feel it necessary. The only way to absolutely guarantee success is to shut down the pNFS service during the recovery, but since the recovery may take minutes to hours, that probably isn't feasible.)
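A variant of the same check that simply counts how many files still refer to the failed DS (a sketch, again assuming the "sh" shell and the nfsv4-data3 example):

# find . -type f -exec pnfsdsfile {} \; | grep -c nfsv4-data3

A count of 0 means the tree is clean.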
Once this is done, the replaced/repaired DS can be brought back online. It should have empty dsNN directories under the exported storage directory, just like it did when first set up. Mount it on the MDS exactly as you did before disabling it. For the nfsv4-data3 example, the command would be:
# mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3
Then restart the nfsd to re-enable the DS:
# /etc/rc.d/nfsd restart
Now, new files can be stored on nfsv4-data3, but files with the IP address zeroed out will not yet use the repaired DS (nfsv4-data3). The next step is to go through the exported file tree and, for each of the files with an IPv4 address of 0.0.0.0, copy the file data to the repaired DS and re-enable use of this mirror for it.
The command for copying the file data for one MDS file is pnfsdscopymr(8), and it will also normally be used in a find(1). This will take a while, since the kernel function performing the copy has to do several steps, as follows:
- The MDS file's vnode is locked, blocking LayoutGet operations.
- Issuing of read/write layouts for the file is disabled via the nfsdontlist, so that they will remain disabled after the MDS file's vnode is unlocked.
- The nfsrv_recalllist is set up so that recall of read/write layouts can be done.
- The MDS file's vnode is unlocked, so that the client(s) can perform proxied writes, LayoutCommits and LayoutReturns for the file when completing the LayoutReturn requested by the LayoutRecall callback.
- A LayoutRecall callback is issued for all read/write layouts, and the kernel waits for them to be returned. (If the LayoutRecall callback replies NFSERR_NOMATCHLAYOUT, they are gone and no LayoutReturn is needed.)
- The MDS file's vnode is exclusively locked. This ensures that no proxied writes are in progress or can occur during the DS file copy. It also blocks Setattr operations.
- The file is created on the recovered mirror.
- The file is copied from the operational DS.
- Any ACL is copied from the MDS file to the new DS file.
- The modify time of the new DS file is set to that of the MDS file.
- The extended attribute for the MDS file is updated.
- Issuing of read/write layouts is re-enabled by deleting the nfsdontlist entry.
- The MDS file's vnode is unlocked, allowing operations to continue normally, since the file is now on the mirror again.
This is done for every regular file in the directory tree. For the example case, the commands on the MDS would be:
# cd <top-level-exported-directory>
# find . -type f -exec pnfsdscopymr -r /data3 {} \;
When this completes, the recovery should be complete, or at least nearly so. As noted above, if a link/unlink or rename occurs on a file name while the above find(1) is in progress, the file may not get copied. To check for any file(s) not yet copied, the commands are:
# cd <top-level-exported-directory>
# find . -type f -exec pnfsdsfile {} \; | sed "/0\.0\.0\.0/!d"
(The above command is looking for any file that still has 0.0.0.0 as a DS IP address.) If this command prints out any file name(s), these files must have the pnfsdscopymr(8) command done on them again to complete the recovery:
# pnfsdscopymr -r /data3 <file>
If there are any errors printed out, these files need the command redone on them. If repeated attempts fail with:
pnfsdscopymr: Copymr failed for file <file>: Device not configured
it may mean that there is a Read/Write layout for this file that has not been returned. All that can be done to fix this is to restart the nfsd, once you are convinced that the file is no longer being written by any client.
All of these commands are designed to be done while the pNFS service is running. The pnfsdscopymr(8) command can be safely re-run on files, as it recognizes cases where the file does not need to be copied.
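Since pnfsdscopymr(8) is safe to re-run, one hedged sketch for the retry pass is to redo the copy only for files that still carry a 0.0.0.0 entry (assuming the "sh" shell and the /data3 example):

# find . -type f -exec sh -c 'pnfsdsfile "$1" | grep -q "0\.0\.0\.0" && pnfsdscopymr -r /data3 "$1"' sh {} \;

This just combines the check and the copy from the steps above into one pass.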
Switching from a non-mirrored to a mirrored DS configuration:
Once the nfsd is restarted on the MDS with the "-m 2" option, mirroring of newly created files will be done. To create mirrors for old files, the following commands can be used on the MDS:
# cd <top-level-exported-directory>
# find . -type f -exec pnfsdscopymr {} \;
This will mirror each unmirrored file on one of the other DSs. If you wish to mirror file(s) to a specific DS, you can use pnfsdsfile(8) with the "-m" option to add "0.0.0.0" entries and then use pnfsdscopymr(8) with the "-r" option, as above for the recovery of a repaired DS.

Migrating a data file to a different DS:
Let's assume the system administrator wishes to move the data file for "xxx.c" from nfsv4-data2 to nfsv4-data3 in our example. The command on the MDS is:
# pnfsdscopymr -m /data2 /data3 xxx.c
pnfsdscopymr(8) just does some sanity checking and one system call to move the data file. A load or storage balancer could easily do the same system call. I don't plan on implementing such a tool, but hopefully others will someday.

Please let me know how any testing goes, rick