Gandi.net
Refreshing the filers
- Nexenta-based since 2007
- Hard to provide an unattended setup
- Kernel patches needed for multipath on new disks
- Python stuck on an old (and buggy) version
- Very long boot time:
  - OS
  - zpool import
  - iSCSI export
Study
Requirements
- ZFS
- Ability to serve 1000 NFS and 900 iSCSI
- Possible extension to 1500 NFS and 1000 iSCSI
- Support NFSv4 with delegation
- Powerful debugging tools (in particular DTrace)
- Support accessing JBOD with multipath
- Open source with an active community
- Ability to easily upstream patches
- Ability to run containers
Study
Candidates
- Illumos family:
  - OpenIndiana
  - OmniOS
  - SmartOS
  - Newer Nexenta
- FreeBSD
- Linux (with ZoL)
Linux
Rejected:
- ZoL cannot be upstreamed due to license incompatibilities
- Lots of regressions because it is not part of the upstream kernel
Illumos
Nexenta
Rejected:
- Community version limited to 18TB
- Upstreaming not easy
Illumos
OpenIndiana
Rejected:
- Small community
- Fragile build system
- Old Python (2.6)
Illumos
SmartOS
Rejected:
- Global zone hard to customize
- No iSCSI/NFS management delegation
- Not designed for building filers
Illumos
- Exporting lots of iSCSI targets is still slow: more than 5 minutes
- Kernel still has to be patched for new disk manufacturers
FreeBSD
The good
- Strong reputation in the storage area
- Supports modern ZFS and DTrace
- ctld(8)
- Very fast iSCSI export: a few seconds
- Good NFSv4 support
- mdb replaced by sysctl/kgdb
- Fast zpool import (tip: disable TRIM support)
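As an illustration of the iSCSI export point, here is a minimal sketch that generates a ctl.conf(5) for ctld(8) with one target per ZFS volume. The IQN prefix, the pool name `tank`, the volume names and the wide-open authentication settings are assumptions made for the example, not Gandi's actual configuration.

```shell
#!/bin/sh
# Sketch: print a ctl.conf(5) with one iSCSI target per zvol.
# Hypothetical names: IQN prefix, pool "tank", volumes vol0..vol(N-1).

gen_ctl_conf() {
    count=$1

    # One shared portal group, no authentication (example only).
    cat <<'EOF'
portal-group pg0 {
    discovery-auth-group no-authentication
    listen 0.0.0.0
}
EOF

    # One target per volume, each exposing its zvol as LUN 0.
    i=0
    while [ "$i" -lt "$count" ]; do
        cat <<EOF
target iqn.2016-01.net.example:vol$i {
    auth-group no-authentication
    portal-group pg0
    lun 0 {
        path /dev/zvol/tank/vol$i
    }
}
EOF
        i=$((i + 1))
    done
}

# Print a two-target configuration on stdout.
gen_ctl_conf 2
```

On a real filer the output would typically be written to /etc/ctl.conf and picked up by starting or reloading ctld.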
FreeBSD
The bad
- Bad support for diskless netbooting
- Slow to boot on large MFSROOT
- No multiboot support == no proper iPXE support
Design: diskless
- Unattended setup via Puppet
- Upgradability: just reboot
- Easy backtracking: just reboot
- Free from admin heroes
- Easy migration from Nexenta
- Safe migration from Nexenta
Design: booting sequence
Early boot
- DHCP request
- tftp get pxeboot
- tftp get /boot/ configs
- tftp get kernel, modules, miniroot
Design: booting sequence
Boot miniroot
- run custom rc
- create a ramdisk
- http get filer.txz config.txz puppet-.txz
- extract into the ramdisk
- reroot on ramdisk
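The miniroot steps above could be sketched roughly as follows. This is a dry-run illustration, not Gandi's actual rc script: the download URL, the ramdisk size, the md unit number and the choice of a swap-backed md(4) device with UFS are all assumptions. The puppet archive is left out because its full name is not given on the slide.

```shell
#!/bin/sh
# Sketch of the miniroot stage. Dry-run by default: commands are
# printed instead of executed. Set DRY_RUN=0 to really run them.
DRY_RUN=${DRY_RUN:-1}
BASE_URL=${BASE_URL:-http://boot.example.net/filer}   # hypothetical URL

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

miniroot_boot() {
    # Create a swap-backed memory disk and put a filesystem on it
    # (size and unit number are assumptions).
    run mdconfig -a -t swap -s 4g -u 1
    run newfs -U /dev/md1
    run mount /dev/md1 /mnt

    # Fetch each archive over HTTP and extract it into the ramdisk.
    for archive in filer.txz config.txz; do
        run fetch -o "/mnt/$archive" "$BASE_URL/$archive"
        run tar -xJf "/mnt/$archive" -C /mnt
        run rm "/mnt/$archive"
    done
    run umount /mnt

    # Point the kernel at the new root and reroot onto it.
    run kenv vfs.root.mountfrom=ufs:/dev/md1
    run reboot -r
}

miniroot_boot
```

Because the real root only ever lives in memory, an upgrade or a rollback amounts to publishing different archives and rebooting.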
Design: booting sequence
Final boot
- zpool import
- puppet run
- start Gandi's middleware
- ready to serve
Contributions
py-libzfs (FreeNAS)
- Implement zfs clone support
- Implement zfs promote support
- Implement support for properties (including custom)
- Implement volume support
- Bug fixing
Contributions
mpsutil(8)/mprutil(8) (Netflix)
- Finish integration with FreeBSD build system
- Implement firmware/BIOS flashing
Contributions
Playing the guinea pig
- reroot (by trasz@)
- smarter mount root wait (by trasz@)
sesutil(8)
Managing SCSI Enclosure Services
- Blink the locate LED (disks only)
- Blink the fault LED (disks only)
- Show the detailed map of an enclosure
- Easy to use:
$ sesutil locate da3 on
$ sesutil locate all off
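A small wrapper around the commands above can make it easier to find a disk in a rack. This is only a sketch: the function name `blink_locate` and the `SESUTIL` override variable are made up for the example.

```shell
#!/bin/sh
# Sketch: turn the locate LED of one disk on for a few seconds.
# SESUTIL can be overridden (e.g. with "echo") to try the function
# on a machine without sesutil(8).
SESUTIL=${SESUTIL:-sesutil}

blink_locate() {
    disk=$1
    seconds=$2
    "$SESUTIL" locate "$disk" on
    sleep "$seconds"
    "$SESUTIL" locate "$disk" off
}

# Usage: blink_locate da3 30
```

The fault LED and the enclosure map listed above are reached the same way, with `sesutil fault da3 on` and `sesutil map`.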
sesutil(8)
Vendor tools
- Lots of noise in the logs
- 2 different tools for SAS2 and SAS3
- Unfriendly UI
sesutil(8)
sg_ses (sg3_utils)
- Unfriendly UI
- Mapping disks is complex
Rework pxeboot with TFTP support
Add support for root-path DHCP option to act like pxeboot with NFS support
Running HEAD
- Stable most of the time
- Needed features only available there
- Easier to upstream patches
- Find (and fix) bugs as early as possible
- Gandi's workload is very well identified
Test lab
- Driven by Zopkio
- Simulating broken disks using gnop(8)
- Simulating bad network access using ipfw(8) + dummynet(4)
- Simulating crash and reboot under high load from consumers
- Profile-based test lab
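The disk and network fault injection listed above can be sketched with gnop(8) and ipfw(8) as below. The helper names, the failure probabilities, the pipe number and the rule number are assumptions for illustration. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of fault injection helpers. Dry-run by default.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create <disk>.nop, which fails the given percentage of reads and
# writes with error 5 (EIO).
break_disk() {
    run gnop create -e 5 -r "$2" -w "$2" "$1"
}

# Remove the failing provider again.
heal_disk() {
    run gnop destroy "$1.nop"
}

# Add latency (in milliseconds) and a packet loss rate (0..1) to all
# IP traffic through a dummynet pipe.
degrade_network() {
    run ipfw pipe 1 config delay "$1" plr "$2"
    run ipfw add 1000 pipe 1 ip from any to any
}

break_disk da3 10
degrade_network 100 0.05
```

For the disk failures to reach ZFS, the pool under test has to be built on the `.nop` provider (for example `da3.nop`) rather than on the raw disk.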
Future plans
Improve sesutil(8)
- libxoify(?)
- Add microcode update support
- Extend locate to support other devices
Future plans
Improve ZFS(8)
- Improve zpool import speed
- Turn tunables like arc_max into safe read/write tunables
- Maybe new features to improve reliability
Future plans
Improve iPXE support
- Implement a FreeBSD specific loader
or
- Turn the FreeBSD kernel into multiboot
Future plans
Improve CTL(4)
- Convert the number of ports and LUNs per port into sysctls
- Convert ctl(4) to use libucl (too late)
Future plans
Storage related tooling
- Port some DTrace scripts from Illumos
- Improve geom_multipath algorithm to better match ZFS requirements