Random Musings

O for a muse of fire, that would ascend the brightest heaven of invention!

A RAM-disk based workflow

Friday, 5 Sep 2014 Tags: hacks, shell, workflow

It’s easy to pick up a laptop today with 16 or more GB of memory, or to spin up a cloud instance with as much as you need. Ramdisks are scary fast, and as most cloud instances have poor IO, they’re a great way of getting high-performance servers without striping multiple disks. As an added benefit, if your cloud provider offers sub-hour pricing, as GCE does, you’ll even save cash as well as time by finishing well before an equivalent disk-based workload would, despite using a faster instance.

I use ramdisks almost exclusively now, with a couple of quick tricks to avoid data loss. This also has the benefit of keeping the load on my disk or SSD to an absolute minimum.

The general idea is that for each project, I typically have some virtual machines for testing stuff out, and a git repo that contains source.

What I want is that all git commits are stored on permanent storage, and that the VM contains a fully reproducible environment for both build and deployment.

By using a ramdisk, I can be pretty comfortable that the environment is reproducible, as I need to rebuild it each reboot, which is roughly weekly for me.

Let’s break this down into 3 parts:

  • create a ramdisk
  • link our git repo into it
  • spin up a ramdisk-based vm to work in

Creating and deleting a ramdisk

These two zsh shell functions are pretty straightforward. You could easily do this in bash with minor changes.

  • calculate ramdisk size from 1st parameter and the sectors/GB conversion factor
  • ensure we have no existing ramdisk
  • use hdiutil attach to create a new ramdisk device in /dev/disk* somewhere
  • format and mount it implicitly using diskutil erasevolume
ramdisk() {
    size=$(($1 * 2097152))
    diskutil eject /Volumes/ramdisk > /dev/null 2>&1
    diskutil erasevolume HFS+ 'ramdisk' `hdiutil attach -nomount ram://$size`
    cd /Volumes/ramdisk
}

rdestroy() { hdiutil eject /Volumes/ramdisk }

The second function is straightforward. When the ramdisk is ejected, the corresponding /dev/disk* device and its allocated RAM are freed as well.
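The magic 2097152 factor is just the number of 512-byte sectors in a GB, since hdiutil's ram:// scheme takes a size in sectors rather than bytes. A quick sanity check of the arithmetic:

```shell
# ram:// sizes are given in 512-byte sectors,
# so 1 GB = 1073741824 / 512 = 2097152 sectors
echo $((4 * 2097152))    # sector count for a 4 GB ramdisk
```

So `ramdisk 4` creates and mounts a 4 GB ramdisk, and `rdestroy` tears it down again.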

Pulling in the git repo

The key point here is that the git repo we are working from will have a permanent copy of any commits on disk, and we’ll be working from a ramdisk copy all the time. This uses a neat git trick called git-new-workdir that I learned from Markus Prinz.

git-new-workdir /projects/couch/git /ramdisk/couch 1.6.x

How git-new-workdir works is ludicrously simple: it creates the .git/ dir using softlinks to the original data, so any commits, stashes, config or branch changes we make get written to permanent storage, and it uses the current directory (which in our case is a ramdisk) to store the working tree. So all we need to do to ensure our changes are stored permanently is commit them, or push a branch. There are no extra commands and nothing to remember.

The first parameter is the on-disk location of the original repo we are using, the second is the new location we want to set up, and the optional third one is the branch we want to check out into our new ramdisk backed working dir.

I’ve got this aliased as gnw as I use it all the time.
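For reference, the alias is nothing more than a line in ~/.zshrc; git-new-workdir itself ships in git’s contrib/workdir directory and just needs to be somewhere on your $PATH:

```shell
# ~/.zshrc — git-new-workdir comes from git's contrib/workdir
alias gnw='git-new-workdir'
```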

Spinning up a VM

The same thing applies here with vagrant. It’s as simple as softlinking the Vagrantfile I am using into the ramdisk, assuming my working dir is in the ramdisk already:

ln -s /projects/couch/Vagrantfile

Then vagrant up as usual and Bob’s your uncle. As the image is already stored in ~/.vagrant.d/boxes/ we get a nice repeatable image for free. Finally, as part of my workflow, I already have a provisioner built in for vagrant, using ansible, which ensures that whether I run a local instance or a cloud server, the post-installation setup is identical and idempotent:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "precise64"
  config.vm.box_url = "http://files.vagrantup.com/precise64_vmware_fusion.box"
  config.vm.network :public_network, ip: ""

  config.vm.provider :vmware_fusion do |v|
    v.vmx["memsize"] = "1024"
  end

  config.vm.provision "ansible" do |ansible|
    ansible.host_key_checking = false
    ansible.playbook = "/projects/couch/deploy.yml"
  end
end

Bonus Hacks

Using ZFS on OSX

ZFS is an advanced filesystem supporting snapshots, inbuilt lz4 compression, automatic checksumming to prevent and detect bitrot, and many more features.

It was developed by Sun Microsystems, and luckily was open sourced before the Oracle buyout. It’s now available on Linux, OSX, FreeBSD, and many variants of Solaris such as Illumos or SmartOS.

While I’m not worried about bitrot, the compression and snapshot-based replication make ramdisks even better. Compression means that on my 16GB OSX laptop, I can comfortably run an entire 1GB RAM Windows 7 VM (20GB disk) in a 10GB ramdisk, and still have a reasonably functional OSX environment. On a larger 32GB FreeBSD server it’s scary fast. I can keep the original VM image safely snapshotted on my main disk, and replicate it into the ramdisk at almost raw disk throughput. With an SSD it takes under 5 seconds to copy and launch a fully encapsulated VM in the ramdisk.
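A minimal sketch of that replication step — the dataset names here are hypothetical (tank/vms/win7 standing in for the gold image on the main disk, zram being the ramdisk pool created by the zsh functions below):

```shell
# snapshot the gold image on permanent storage, then stream a
# copy of it into the ramdisk pool with zfs send/receive
sudo zfs snapshot tank/vms/win7@gold
sudo zfs send tank/vms/win7@gold | sudo zfs receive zram/win7
```

Because the receive side is RAM-backed, the copy runs at close to memory bandwidth, and the gold snapshot on disk is never touched.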

zsh functions

We need just two functions again: one to create what I named a zdisk, and another to destroy it:

zdisk() {
    size=$(($1 * 2097152))
    sudo zpool create -O compression=lz4 -fm /zram zram \
        `hdiutil attach -nomount ram://$size`
    sudo chown -R $USER /zram
    cd /zram
}

zdestroy() { sudo zpool export -f zram }
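Usage mirrors the HFS+ pair above; the pool’s compressratio property shows how much the lz4 compression is buying you (values will obviously vary with your data):

```shell
zdisk 10                      # create a 10 GB lz4-compressed ramdisk pool at /zram
zfs get compressratio zram    # see how well your data is compressing
zdestroy                      # export the pool, freeing the RAM
```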

Vagrant and ZFS

I store a gold image of all my projects in a zfs dataset, along with the provisioning script that sets that image up from a base OS. The base OS images themselves either come from the cloud provider (EC2 or GCE for example) or from a reference vagrant box. My entire vagrant setup is also stored in zfs, as VMs compress really well, i.e. ~/.vagrant.d/ is just a softlink to a zfs mountpoint. And as we are using the ramdisk-based workflow above, this data rarely changes.
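Setting that softlink up is just move-then-link; the dataset name tank/vagrant here is hypothetical, any zfs mountpoint will do:

```shell
# create a dataset for the vagrant box cache (hypothetical pool name)
sudo zfs create -o mountpoint=/tank/vagrant tank/vagrant
# move the existing cache onto zfs, then leave a symlink behind
mv ~/.vagrant.d /tank/vagrant/vagrant.d
ln -s /tank/vagrant/vagrant.d ~/.vagrant.d
```

Vagrant never notices the difference, and every box you download afterwards lands on the compressed, snapshottable dataset.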