Random Musings

Using Podman hooks to mount persistent ZFS datasets into ephemeral Containers

Friday, 27 Jun 2025 Tags: containers, freebsd, hooks, oci, podman, zfs

Podman containers can mount jailed ZFS datasets, so you can use ZFS features such as snapshots and replication with your containerised toys.

I use this setup to have an ephemeral container for a database, where the data is stored on a separate persistent ZFS dataset. This allows me to upgrade the container image without losing the data, and to take snapshots as required for backups, managed via the host system.

For extra points, I will use a child dataset for the database’s materialised views, as they do not need to be backed up. The child is mounted into the container automatically when its parent dataset is mounted.

We will use Podman annotations to provide the name of the ZFS dataset to be used.

Build the base container

First up, we will need a base container image that has ZFS support.

# Containerfile
FROM ghcr.io/freebsd/freebsd-runtime:14.3@sha256:3a5ffe995405b5f6300797b38d87328a267bbeeb550d3707c9c5e0a76827a978
RUN /bin/pkg upgrade -yr FreeBSD-base
RUN /bin/pkg install -yr FreeBSD-base FreeBSD-utilities FreeBSD-zfs

As usual, we run podman as root to build the container image. We use the host system’s pkg tool to bootstrap the image quickly. As my host runs FreeBSD 15.0-CURRENT, I need to override the OS version and ABI to match the desired base image.

# podman build \
  --volume /usr/local/sbin/pkg:/bin/pkg \
  --env IGNORE_OSVERSION=yes \
  --env ABI=FreeBSD:14:$(sysctl -n hw.machine_arch) \
  --env OSVERSION=1403000 \
  --no-hosts \
  --tag freebsd-zfs:14.3 \
  --file Containerfile
...
[6526b2f7f6db] [8/8] Extracting FreeBSD-zfs-14.3: .......... done
COMMIT freebsd-zfs:14.3
--> 14ac6abadcab
Successfully tagged localhost/freebsd-zfs:14.3
14ac6abadcab1f84fe3b549faa31c1bb7778feb53854756a0dc71554f049c507

# podman image ls -a
REPOSITORY                       TAG         IMAGE ID      CREATED      SIZE
localhost/freebsd-zfs            14.3        0b937b31f872  2 hours ago  142 MB
ghcr.io/freebsd/freebsd-runtime  <none>      786f2592a8e1  3 weeks ago  33 MB
...
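The OSVERSION value used in the build follows the numbering FreeBSD uses for its releases: major × 100000 plus minor × 1000. Here is a small sketch for working it out for another release, assuming that pattern holds for the release you target. The helper name osversion_for is hypothetical, and -CURRENT or patched branches carry other values.

```shell
#!/bin/sh
# Hypothetical helper: derive the OSVERSION of a FreeBSD release
# from its "major.minor" string, e.g. 14.3 -> 1403000.
osversion_for() {
  major=${1%%.*}   # text before the first dot
  minor=${1#*.}    # text after the first dot
  echo $(( major * 100000 + minor * 1000 ))
}

osversion_for 14.3   # prints 1403000
```

The result can be passed to podman build as --env OSVERSION=... in place of the hard-coded number.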

Prepare Podman Hooks

Podman allows you to run hooks at various points in the container lifecycle. We will use a createRuntime hook to mount the ZFS dataset. This runs in the host system, after the container is created, but before it is started. This means there will be no processes or daemons running in the jail yet.

  • make sure you have the container hooks directory set up in the [engine] section of your Podman containers.conf:
# /usr/local/etc/containers/containers.conf
[engine]
hooks_dir = ["/usr/local/etc/containers/hooks.d"]
  • now add the hook metadata, as /usr/local/etc/containers/hooks.d/zfs.json:
{
  "version": "1.0.0",
  "hook": {
    "path": "/usr/local/etc/containers/hooks.d/zfs.sh"
  },
  "when": {
    "annotations": {
      "^zfs_dataset$": ".+"
    }
  },
  "stages": ["createRuntime", "poststop"]
}

The when section specifies that this hook should run only when a specific annotation is present, in this case zfs_dataset. This lets Podman skip the script for containers that do not require a ZFS dataset, and simplifies the script logic a bit.
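The keys and values under annotations are treated as regular expressions (POSIX extended, per oci-hooks(5)), so an unescaped dot matches any single character, not just a literal dot. A rough way to check which annotation keys a pattern would accept is to use grep -E as a stand-in for Podman's own matcher. That substitution is an assumption, and key_matches is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: does annotation key $2 match pattern $1?
key_matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

# An unescaped dot accepts more keys than you might expect.
for key in zfs_dataset zfs.dataset zfsXdataset my_zfs_dataset; do
  if key_matches '^zfs.dataset$' "$key"; then
    echo "$key: match"
  else
    echo "$key: no match"
  fi
done
```

The first three keys match and the last does not, because the anchors still apply. Spelling the key out exactly, or escaping the dot as \. when the key really contains one, keeps the hook from firing on look-alike annotations.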

  • finally, the createRuntime script itself, as /usr/local/etc/containers/hooks.d/zfs.sh:
#!/bin/sh -e
set -o pipefail
# the container state arrives as JSON on stdin; keep a copy for debugging
INPUT=$(  cat - | tee -a /var/log/oci/zfs.json)
ID=$(     echo "$INPUT" | jq -r .id || exit 1)
STATUS=$( echo "$INPUT" | jq -r .status || exit 1)
DATASET=$(echo "$INPUT" | jq -r '.annotations."zfs_dataset"')

# if the dataset doesn't actually exist, bail
/sbin/zfs list -Ho name "${DATASET}" || exit 1

# if we are in created state, we can proceed
if [ "$STATUS" = "created" ]; then
  /usr/sbin/jail -vm name="$ID" allow.mount=1 allow.mount.zfs=1 enforce_statfs=1
  /sbin/zfs jail "$ID" "${DATASET}"
  /usr/sbin/jexec "$ID" /sbin/zfs mount -a
fi
  • mark the script executable:
# chmod +x /usr/local/etc/containers/hooks.d/*.sh

The script is not overly complicated. It is provided with a JSON blob via stdin, which points us to the container ID:

{
  "annotations": {
    "io.container.manager": "libpod",
    "io.podman.annotations.autoremove": "TRUE",
    "org.freebsd.jail.vnet": "new",
    "org.opencontainers.image.stopSignal": "15",
    "org.systemd.property.KillSignal": "15",
    "org.systemd.property.TimeoutStopUSec": "uint64 10000000",
    "zfs_dataset": "zroot/jailed/couchdb3"
  },
  "bundle": "/var/db/containers/storage/zfs-containers/906f163312500d15597e8b6accd0cd8637bb792c750fbc15a8012f231d09a224/userdata",
  "id": "906f163312500d15597e8b6accd0cd8637bb792c750fbc15a8012f231d09a224",
  "ociVersion": "1.0.2",
  "pid": 29880,
  "status": "created"
}

The script reads the JSON blob from stdin, extracts the container ID and status from it, and proceeds only if the status is created.
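To experiment with the extraction step outside Podman, you can feed jq a hand-written state blob. The sketch below assumes jq is installed. The blob is a trimmed-down sample with a shortened placeholder ID, and state_field is a hypothetical helper, not part of the hook script. It uses jq's -e flag so that a missing field is reported through the exit status.

```shell
#!/bin/sh
# A trimmed-down sample state blob; the ID is a shortened placeholder.
STATE='{"id":"906f1633","status":"created","annotations":{"zfs_dataset":"zroot/jailed/couchdb3"}}'

# Hypothetical helper: print one field of the state.
# With -e, jq exits non-zero when the result is null or false,
# so a missing field can be detected by the caller.
state_field() {
  printf '%s' "$STATE" | jq -er "$1"
}

state_field .id
state_field .status
state_field .annotations.zfs_dataset

# A field that is absent still prints "null", but the call fails.
state_field .annotations.missing >/dev/null || echo "annotation not set" >&2
```

Without -e, jq -r prints the string null and exits successfully for a missing key, so a plain || exit 1 would not catch it.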

We then use the jail command to ensure the required jail permissions are set, then jail the dataset itself into the container, and finally within the jail, mount all ZFS datasets.

This last command is equivalent to running service zfs start in the container, but we do it manually here, as we are not running rc services at all in this container. This could equally be done in a startContainer hook, which would also run in the jail before the container started, but it feels simpler to do it here all in one zfs-y spot.

Finally, we return to Podman, so it can continue with the container creation process, and possibly run other hooks.
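While developing a hook like this, it can help to preview the jail and zfs commands before letting them touch a real container. Here is a minimal sketch, assuming a hypothetical run wrapper and DRY_RUN variable that are not part of the script above. The container ID is a shortened placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# The three steps described above, for container ID $1 and dataset $2.
attach_dataset() {
  run /usr/sbin/jail -vm name="$1" allow.mount=1 allow.mount.zfs=1 enforce_statfs=1
  run /sbin/zfs jail "$1" "$2"
  run /usr/sbin/jexec "$1" /sbin/zfs mount -a
}

# Preview only: nothing is executed while DRY_RUN is 1.
DRY_RUN=1
attach_dataset 906f1633 zroot/jailed/couchdb3
```

With DRY_RUN unset, the same function executes the commands for real, so the preview and the live path cannot drift apart.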

There is more information about Podman hooks in the oci-hooks(5) docs, as well as a gentler introduction to hooks from Red Hat. A list of hooks and when they run can be found in the OCI runtime config spec.

Create the ZFS dataset

Now we create the ZFS dataset that we will mount into the container. We will use the jailed=on property, so that the dataset is only available to jails, and we will set the mountpoint to a location inside the jail:

# zfs create -o jailed=on -o mountpoint=/var/db/couchdb3 zroot/jailed/couchdb3
# zfs create zroot/jailed/couchdb3/views
# zfs list -o canmount,mounted,jailed,name -r zroot/jailed/couchdb3
CANMOUNT  MOUNTED  JAILED  NAME
on        no       on      zroot/jailed/couchdb3
on        no       on      zroot/jailed/couchdb3/views

Run the container

Note how the annotation is used to pass in the ZFS dataset to our hook:

# podman --log-level=debug run -it --rm \
  --annotation='zfs_dataset=zroot/jailed/couchdb3' \
  freebsd-zfs:14.3
...
DEBU[0000] reading hooks from /usr/local/etc/containers/hooks.d
DEBU[0000] added hook /usr/local/etc/containers/hooks.d/zfs.json
DEBU[0000] hook zfs.json matched; adding to stages [createRuntime]
...
DEBU[0001] Starting container f903a1c776627070d8cc95ea89bac14df39781bfae172869ce887912341c1ccb with command [/bin/sh]
DEBU[0001] Started container f903a1c776627070d8cc95ea89bac14df39781bfae172869ce887912341c1ccb
DEBU[0001] Notify sent successfully
# mount
zroot/containers/2c4218bc205c6bbc395a4618a7d6ed9f39696dc155e68eee87c592be621a974a on / (zfs, local, nfsv4acls)
zroot/jailed/couchdb3 on /var/db/couchdb3 (zfs, local, nfsv4acls)
zroot/jailed/couchdb3/views on /var/db/couchdb3/views (zfs, local, nfsv4acls)
# zfs list -o canmount,mounted,jailed,mountpoint,name
CANMOUNT  MOUNTED  JAILED  MOUNTPOINT              NAME
on        no       off     none                    zroot
off       no       off     none                    zroot/jailed
on        yes      on      /var/db/couchdb3        zroot/jailed/couchdb3
on        yes      on      /var/db/couchdb3/views  zroot/jailed/couchdb3/views
#

This is just half the story, as we need to detach the ZFS dataset when the container is stopped. We can do this with a poststop hook.

Cleaning up on Exit

You will notice that our zfs.json hook metadata has a poststop stage as well, and that our zfs.sh script only covers the created status.

As before, there will be a JSON blob provided to the script, but this time its status will be stopped:


{
  "ociVersion": "1.2.0",
  "id": "906f163312500d15597e8b6accd0cd8637bb792c750fbc15a8012f231d09a224",
  "status": "stopped",
  "bundle": "/var/db/containers/storage/zfs-containers/906f163312500d15597e8b6accd0cd8637bb792c750fbc15a8012f231d09a224/userdata",
  "annotations": {
    "io.podman.annotations.autoremove": "TRUE",
    "zfs_dataset": "zroot/jailed/couchdb3"
  }
}

Let’s add a poststop section to the script, so that it can unmount the dataset when the container is stopped, ready for the next run:

# ... append to the end of /usr/local/etc/containers/hooks.d/zfs.sh
# if status = stopped, loop over mounted datasets
if [ "$STATUS" = "stopped" ]; then
  for ds in $(/sbin/zfs list -Ho name -r "${DATASET}"); do
    /sbin/umount -f "${ds}"
  done
fi
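zfs list -r prints the parent dataset before its children, whereas unmounting children before their parent is generally the safer order. Here is a small sketch of one way to reorder the list, assuming dataset names arrive one per line. The helper name children_first is hypothetical, and the printf stands in for the real zfs list call so the ordering can be tried anywhere.

```shell
#!/bin/sh
# Hypothetical helper: read dataset names on stdin, one per line, and print
# them so that every child comes before its parent. Reverse byte-order
# sorting is enough, because a parent's name is always a prefix of its
# children's names, and a prefix sorts first in ascending order.
children_first() {
  LC_ALL=C sort -r
}

# Stand-in for the output of: /sbin/zfs list -Ho name -r "$DATASET"
printf '%s\n' zroot/jailed/couchdb3 zroot/jailed/couchdb3/views | children_first
```

This prints zroot/jailed/couchdb3/views first and zroot/jailed/couchdb3 second. In the hook, the same filter could be piped after the zfs list command that feeds the for loop.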

That’s all for now! You should be able to run and re-launch the container, with the ZFS dataset mounted each time and unmounted when the container exits.

Thanks

Thanks again to dfr@FreeBSD.org, who did the podman porting work, told me about hooks, and provided several rounds of feedback that helped me get this right.