Random Musings

O for a muse of fire, that would ascend the brightest heaven of invention!


Using cloud-init on FreeBSD, in VMs, and Jails

Thursday, 25 Jul 2024 Tags: cloudinit freebsd

Canonical’s cloudinit project has spread far and wide, becoming the de facto way to customise a machine’s configuration on first boot.

It is python-based, which makes it awkward to use at first-run, when python itself may need to be updated, and it has a long list of open issues! There is also an active IRC community on libera.chat.

This doc is the missing 20% that helps you get started on FreeBSD with cloud-init, where things go, how to test, and how to debug.

We’ll start off with a real cloudinit system (bring your own cloud), and look through the various config files and directories, and then circle back to testing this in a local jail.

Daemons

The daemons are run at boot time, in this order:

order  name            actual command being run            phase
1      cloudinitlocal  cloud-init init --local             disks, net
2      cloudinit       cloud-init init                     core
3      cloudconfig     cloud-init modules --mode config    extensions
4      cloudfinal      cloud-init modules --mode final     packages

Files and Folders

Within a cloudinit provisioned system, there are a few important files and locations, linux flavoured:

  • /usr/local/etc/cloud/ is installed by the package and may have vendor-specific customisations in the cloud.cfg.d directory
  • /var/lib/cloud/ is created and has most of the ephemeral data
  • /run/ is created and has most of the logs and fetched configs

From the pkg

# fd -tf .  /usr/local/etc/cloud/
/usr/local/etc/cloud/cloud.cfg
/usr/local/etc/cloud/cloud.cfg.d/05_logging.cfg
/usr/local/etc/cloud/cloud.cfg.d/99_freebsd.cfg
/usr/local/etc/cloud/cloud.cfg.d/README
/usr/local/etc/cloud/cloud.cfg.sample
/usr/local/etc/cloud/templates/...

Of particular interest is /usr/local/etc/cloud/cloud.cfg, which specifies which cloudinit modules are installed and available, and which datasources are available to fetch data from.

# /usr/local/etc/cloud/cloud.cfg snippet
...
# The modules that run in the 'init' stage
cloud_init_modules:
  - seed_random
  - bootcmd
    ...
# The modules that run in the 'config' stage
cloud_config_modules:
  - ssh_import_id
    ...
  - runcmd

# The modules that run in the 'final' stage
cloud_final_modules:
  - package_update_upgrade_install
  - write_files_deferred
    ...
  - scripts_user
    ...

If the modules aren’t listed under one of the _modules sections, they won’t be run, even if the functionality may work!

If your datasources aren’t present, then the userdata won’t be fetched, even if it’s being provided by the vendor system!
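
A quick way to see whether a module is listed at all, and under which stage, is to scan cloud.cfg for it. This is only a sketch: module_stage is a made-up helper name, not part of cloudinit, and it assumes the simple "- name" list layout shown in the snippet above.

```shell
#!/bin/sh
# Sketch only: module_stage is a hypothetical helper, not a cloud-init command.
# Prints the *_modules section that lists the given module, and returns
# non-zero if the module is not listed in any of them.
# Usage: module_stage /usr/local/etc/cloud/cloud.cfg runcmd
module_stage() {
    cfg="$1"
    module="$2"
    awk -v want="$module" '
        /^cloud_(init|config|final)_modules:/ { stage = $1; sub(/:$/, "", stage); next }
        /^[^ #-]/ { stage = "" }    # any other top-level key ends the section
        stage != "" && $1 == "-" && $2 == want { print stage; found = 1 }
        END { exit !found }
    ' "$cfg"
}
```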

After first run

The following files are only available at runtime, after cloudinit has run. Most are self-explanatory, but the result.json and status.json are particularly useful for debugging.

Files under ...datasource-... are specific to the datasource used. In this example, they were sourced from a fake NoCloud datasource. This would typically be populated by the vendor’s metadata server, with any user data merged in already.

# fd -tf . /var/lib/cloud/
/var/lib/cloud/data/instance-id
/var/lib/cloud/data/previous-datasource
/var/lib/cloud/data/previous-instance-id
/var/lib/cloud/data/python-version
/var/lib/cloud/data/result.json
/var/lib/cloud/data/set-hostname
/var/lib/cloud/data/status.json
/var/lib/cloud/instances/nocloud/boot-finished
/var/lib/cloud/instances/nocloud/cloud-config.txt
/var/lib/cloud/instances/nocloud/datasource
/var/lib/cloud/instances/nocloud/obj.pkl
/var/lib/cloud/instances/nocloud/scripts/runcmd
/var/lib/cloud/instances/nocloud/sem/config_install_hotplug
/var/lib/cloud/instances/nocloud/sem/config_keys_to_console
/var/lib/cloud/instances/nocloud/sem/config_locale
/var/lib/cloud/instances/nocloud/sem/config_package_update_upgrade_install
/var/lib/cloud/instances/nocloud/sem/config_reset_rmc
/var/lib/cloud/instances/nocloud/sem/config_runcmd
/var/lib/cloud/instances/nocloud/sem/config_scripts_per_instance
/var/lib/cloud/instances/nocloud/sem/config_scripts_user
/var/lib/cloud/instances/nocloud/sem/config_scripts_vendor
/var/lib/cloud/instances/nocloud/sem/config_seed_random
/var/lib/cloud/instances/nocloud/sem/config_set_hostname
/var/lib/cloud/instances/nocloud/sem/config_set_passwords
/var/lib/cloud/instances/nocloud/sem/config_ssh
/var/lib/cloud/instances/nocloud/sem/config_ssh_authkey_fingerprints
/var/lib/cloud/instances/nocloud/sem/config_users_groups
/var/lib/cloud/instances/nocloud/sem/config_write_files
/var/lib/cloud/instances/nocloud/sem/config_write_files_deferred
/var/lib/cloud/instances/nocloud/sem/consume_data
/var/lib/cloud/instances/nocloud/sem/update_sources
/var/lib/cloud/instances/nocloud/user-data.txt
/var/lib/cloud/instances/nocloud/user-data.txt.i
/var/lib/cloud/instances/nocloud/vendor-data.txt
/var/lib/cloud/instances/nocloud/vendor-data.txt.i
/var/lib/cloud/instances/nocloud/vendor-data2.txt
/var/lib/cloud/instances/nocloud/vendor-data2.txt.i
/var/lib/cloud/sem/config_scripts_per_once.once
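
As far as I can tell, each config_* file under sem/ marks a module that has already run for that instance, so listing them gives a quick "what actually ran" view. A minimal sketch follows; ran_modules is a hypothetical helper name, and the default path is simply the one from the listing above.

```shell
#!/bin/sh
# Sketch only: ran_modules is a hypothetical helper, not a cloud-init command.
# Lists module names that left a config_* semaphore in the given directory.
# Usage: ran_modules /var/lib/cloud/instances/nocloud/sem
ran_modules() {
    semdir="${1:-/var/lib/cloud/instances/nocloud/sem}"
    for f in "$semdir"/config_*; do
        [ -e "$f" ] || continue      # no matches: the glob stays literal
        name="${f##*/}"              # strip the directory part
        echo "${name#config_}"       # strip the config_ prefix
    done
}
```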

Depending on the exact version of cloudinit, these files might be in /run/cloud-init/ instead of /var/run/cloud-init. The linuxisms are slowly being eradicated.

# fd -tf . /var/run/cloud-init/
/var/run/cloud-init/cloud-id-none
/var/run/cloud-init/cloud.cfg
/var/run/cloud-init/combined-cloud-config.json
/var/run/cloud-init/ds-identify.log
/var/run/cloud-init/instance-data-sensitive.json
/var/run/cloud-init/instance-data.json

Deploying

cloudinit suffers from almost infinite configurability. I’ll assume that in reality, you’re creating a VM or physical server using a vendor tool that accepts the user-data yaml format.

Here’s an example provisioning a FreeBSD 14.1-RELEASE server via the Equinix command line tool (the command substitutions use fish shell syntax):

$ metal device create \
      --operating-system freebsd_14 \
      --plan m3.small.x86 \
      --metro any \
      --hostname clown(random 0 9)(random 0 9) \
      --termination-time=(date -Iseconds -juv +1H) \
      --userdata '#cloud-config ... '

On Amazon EC2, use this syntax:

$ aws ec2 run-instances ... --user-data "#cloud-config ..."
$ aws ec2 run-instances ... --user-data file://my.yaml

Working Config

This exercises most of the working cloud-init functionality as of 2024Q3, using cloud-init 24.1.4 from FreeBSD ports quarterly.

There’s other cloudinit functionality, but in my experience the rest is either broken or unreliable. It’s FLOSS, so feel free to file bugs and fix them; the cloudinit project is both BSD-friendly and very helpful.

  • create a user called ansible
  • and add it to a single group wheel
  • create a homedir under /home/ansible
  • add a sudo config for that user
  • deploy ssh keys to primary user
  • create a custom file using write_files
  • run an arbitrary command via runcmd
  • install a package

#cloud-config
# deploy ssh key to primary user
# create a new account, one true shell, sudo, join wheel
users:
  - default
  - name: ansible
    groups: wheel
    shell: /bin/sh
    sudo: 'ALL=(ALL) NOPASSWD:ALL'
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3Nzadaves_secret_backdoor_9a567b567f9ace
# run an arbitrary command very early on
bootcmd:
  - echo bootcmd | tee -a /root/cloud/cloudinit_was_here
# touch arbitrary files very early on, note YAML list for multiple files
write_files:
- content: |
    writefiles
  path: /root/cloud/cloudinit_was_here
  append: true
- content: |
    writefiles
  path: /root/cloud/writefiles_was_here
  append: true
# run an arbitrary command later on
runcmd:
  - echo runcmd | tee -a /root/cloud/cloudinit_was_here
packages:
  - www/gurl
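
Since bootcmd, write_files and runcmd each append their own name to /root/cloud/cloudinit_was_here in the config above, that file doubles as a record of which of them actually fired. A small sketch to read it back; stages_seen is a hypothetical helper name, not anything cloudinit ships.

```shell
#!/bin/sh
# Sketch only: stages_seen is a hypothetical helper, not a cloud-init command.
# Reports which of the three markers from the example config are present.
# Usage: stages_seen /root/cloud/cloudinit_was_here
stages_seen() {
    marker="${1:-/root/cloud/cloudinit_was_here}"
    for stage in bootcmd writefiles runcmd; do
        # -x matches whole lines only, so "runcmd" won't match "xruncmd"
        if grep -qx "$stage" "$marker" 2>/dev/null; then
            echo "$stage: ran"
        else
            echo "$stage: missing"
        fi
    done
}
```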

Failing functionality includes at least:

  • using a custom homedir key
  • adding multiple groups using the groups key, a single group does work

Testing in a jail

Cloudinit expects a metadata server to provide vendor, server, and user metadata. These need to be faked using the NoCloud data source. You can put these anywhere, or serve them over HTTP on 169.254.169.254.

Install Snakes

  • create a jail, with network access, in the usual way
# pkg install -qy net/cloud-init
# sysrc cloudinit_enable=YES

This installs a pile of pythonic snakes, and 4 daemons.

Set up the Data Source

Inform cloudinit of the new data source, and disable fetching from network as otherwise, this takes a while to time out:

# printf 'datasource_list: ["NoCloud","None"]
datasource:
  NoCloud:
    seedfrom: file:///root/cloud/
network:
  config: disabled
  timeout: 1
'  > /usr/local/etc/cloud/cloud.cfg.d/00_nocloud.cfg

And populate it with a minimal meta-data and user-data. These files can go anywhere, so long as it matches the seedfrom path above. The datasource_list must be on a single line, and as a quoted list, or everything will break.

# mkdir -p /root/cloud
# cd /root/cloud
# touch meta-data
# printf '#cloud-config\nbootcmd:\n  - touch /root/cloud/hello\n' > user-data
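
Before going further, it can save a confusing debugging session to confirm the seed directory really contains what NoCloud will look for. A minimal sketch, assuming the cloud-config format used here (where user-data has to start with a #cloud-config line); check_seed is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: check_seed is a hypothetical helper, not a cloud-init command.
# Verifies that meta-data and user-data exist in the seed directory and that
# user-data begins with the #cloud-config header.
# Usage: check_seed /root/cloud
check_seed() {
    seed="${1:-/root/cloud}"
    for f in meta-data user-data; do
        [ -f "$seed/$f" ] || { echo "missing: $seed/$f"; return 1; }
    done
    first=$(head -n 1 "$seed/user-data")
    [ "$first" = "#cloud-config" ] || {
        echo "user-data does not start with #cloud-config"
        return 1
    }
    echo "seed ok: $seed"
}
```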

I recommend doing a zfs snapshot of your jail at this point, to roll back easily while testing and re-testing cloudinit.

Validate the user-data schema

cloudinit allows validating the schema. This should also tell you if any keys are present in your userdata file but not enabled or available in the current cloudinit installation.

# cloud-init schema --annotate -c user-data
Valid schema user-data

Cleaning up previous runs

cloudinit does provide a clean function, but it’s not extensive enough. Use the axe wisely. This won’t undo any work that cloudinit performed, like adding users and groups, of course.

# rm -rf /run/cloud-init /var/*/cloud*

After this you can just run service cloudinit start again and again without restarting your jail.
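
If the rm -rf above feels a bit too sharp, the same glob can be wrapped with a dry-run default. This is a sketch, not a replacement for thinking: clean_cloudinit is a hypothetical helper name, and the optional first argument is a path prefix (a jail root, say) that I have added for illustration.

```shell
#!/bin/sh
# Sketch only: clean_cloudinit is a hypothetical helper, not a cloud-init command.
# Lists what the cleanup would delete; pass "yes" as the second argument to
# actually delete it.
# Usage: clean_cloudinit ""  yes        # clean the running system
#        clean_cloudinit /jails/test    # dry run against a jail root
clean_cloudinit() {
    root="${1:-}"      # optional prefix; empty means the live filesystem
    doit="${2:-no}"
    for p in "$root"/run/cloud-init "$root"/var/*/cloud*; do
        [ -e "$p" ] || continue          # skip globs that matched nothing
        if [ "$doit" = "yes" ]; then
            rm -rf "$p" && echo "removed $p"
        else
            echo "would remove $p"
        fi
    done
}
```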

Phases

Modules run in the order defined in /usr/local/etc/cloud/cloud.cfg (src: cloud.cfg).

Run the local phase

This is the first phase of the daemon scripts, run manually. Typically this is used for early stage manipulation of filesystems, and bringing up the network, so that cloudinit can do further configurations and fetch additional data source providers.

This may run DHCP and similar scripts. In our specific case, these were already disabled earlier in 00_nocloud.cfg, via the network: config: disabled setting.

# cloud-init --debug init --local

Run the main phase

This typically does what you’d expect now. Things happened, and you can finally see what your supplied user-data was merged as, with cloud-init query -a.

# cloud-init --debug init

Using the earlier user-data example above, we see that:

  • users are created, and groups have been updated
  • write_files have run
  • but bootcmd, runcmd, and packages have not

Run module config

I haven’t found anything that uses this stage yet, let me know if you find one.

# cloud-init --debug modules --mode config

Run final module stage

Extensions such as OS-specific package installs run at this stage.

# cloud-init modules --mode final

If all is as you expect, clean all the runtime directories already mentioned, and “reboot” the jail from scratch.
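
The four manual steps above can be chained so that a failure stops the run at the phase that caused it. A minimal sketch: run_phases and the CLOUD_INIT override variable are my own inventions for illustration, and the sub-commands are the same ones the rc scripts run.

```shell
#!/bin/sh
# Sketch only: run_phases and CLOUD_INIT are hypothetical names.
# Runs the four cloud-init phases in boot order and stops at the first failure.
# Usage: run_phases                      # uses cloud-init from PATH
#        CLOUD_INIT=echo; run_phases     # dry run: just print the arguments
run_phases() {
    ci="${CLOUD_INIT:-cloud-init}"
    for phase in "init --local" "init" "modules --mode config" "modules --mode final"; do
        echo "==> $ci $phase"
        # $phase is deliberately unquoted so it splits into separate arguments
        "$ci" $phase || { echo "phase failed: $phase" >&2; return 1; }
    done
}
```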

Debugging

Various helpful functions, once cloudinit has successfully run.

# cloud-init query userdata
... prints out the userdata file that it received from server

# cloud-init analyze show
... prints out the duration of each step and final state
-- Boot Record 01 --
The total time elapsed since completing an event is printed after the "@" character.
The time the event takes is printed after the "+" character.

Starting stage: init-local
|`->no cache found @00.00600s +00.00000s
|`->no local data found from DataSourceEc2Local @00.02800s +00.00600s
Finished stage: (init-local) 00.03700 seconds

Starting stage: init-network
|`->no cache found @03.44200s +00.00000s
|`->no network data found from DataSourceEc2 @03.44600s +126.15600s
Finished stage: (init-network) 126.17500 seconds

Starting stage: init-network
|`->no cache found @132.35100s +00.00000s
|`->found network data from DataSourceEc2 @132.35500s +02.71100s
|`->setting up datasource @135.08500s +00.00000s
|`->reading and applying user-data @135.09300s +00.00400s
|`->reading and applying vendor-data @135.09700s +00.00000s
|`->reading and applying vendor-data2 @135.09700s +00.00000s
|`->activating datasource @135.11500s +00.00000s
|`->config-migrator ran successfully @135.12100s +00.00100s
|`->config-ssh ran successfully @135.12200s +00.12300s
Finished stage: (init-network) 02.90200 seconds

Starting stage: modules-final
|`->config-phone-home ran successfully @136.95300s +00.09400s
|`->config-scripts-user ran successfully @137.04700s +00.00000s
|`->config-ssh-authkey-fingerprints ran successfully @137.04800s +00.00000s
|`->config-keys-to-console ran successfully @137.04800s +00.01900s
|`->config-final-message ran successfully @137.06700s +00.00400s
Finished stage: (modules-final) 00.12900 seconds

Total Time: 129.24300 seconds

1 boot records analyzed
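
In output like the above, the interesting part is usually the one step that ate all the time (here, the 126 second wait on DataSourceEc2). A rough sketch that pulls the slowest steps out of the analyze show text; slowest_steps is a hypothetical helper name, and it assumes the "@elapsed +duration" line format shown above.

```shell
#!/bin/sh
# Sketch only: slowest_steps is a hypothetical helper, not a cloud-init command.
# Reads "cloud-init analyze show" output on stdin and prints the N slowest
# steps (default 3), longest first.
# Usage: cloud-init analyze show | slowest_steps 5
slowest_steps() {
    awk '$NF ~ /^\+[0-9.]+s$/ {
        d = $NF
        gsub(/[+s]/, "", d)
        line = $0
        sub(/^\|`->/, "", line)
        sub(/ @[0-9.]+s \+[0-9.]+s$/, "", line)
        printf "%10.5f  %s\n", d, line
    }' | sort -rn | head -n "${1:-3}"
}
```

cloud-init analyze blame gives a similar ordering out of the box; the sketch is mostly useful when all you have is a saved copy of the show output.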