Random Musings

O for a muse of fire, that would ascend the brightest heaven of invention!


Adding a new NVMe drive

Wednesday, 12 May 2021 Tags: freebsd, nvme, storage, zfs

# nvmecontrol devlist
 nvme0: Samsung SSD 970 PRO 1TB
    nvme0ns1 (976762MB)
# nvmecontrol identify nvme0
Controller Capabilities/Features
================================
Vendor ID:                   144d
Subsystem Vendor ID:         144d
Serial Number:               S5JXNS0N606756R
Model Number:                Samsung SSD 970 PRO 1TB
Firmware Version:            1B2QEXP7
Recommended Arb Burst:       2
IEEE OUI Identifier:         38 25 00
Multi-Path I/O Capabilities: Not Supported
Max Data Transfer Size:      2097152 bytes
Controller ID:               0x0004
Version:                     1.3.0

Admin Command Set Attributes
============================
Security Send/Receive:       Supported
Format NVM:                  Supported
Firmware Activate/Download:  Supported
Namespace Managment:         Not Supported
Device Self-test:            Supported
Directives:                  Supported
NVMe-MI Send/Receive:        Not Supported
Virtualization Management:   Not Supported
Doorbell Buffer Config:      Not Supported
Get LBA Status:              Not Supported
Sanitize:                    Not Supported
Abort Command Limit:         8
Async Event Request Limit:   4
Number of Firmware Slots:    3
Firmware Slot 1 Read-Only:   No
Per-Namespace SMART Log:     Yes
Error Log Page Entries:      64
Number of Power States:      5
Total NVM Capacity:          1024209543168 bytes
Unallocated NVM Capacity:    0 bytes
Firmware Update Granularity: 00 (Not Reported)
Host Buffer Preferred Size:  0 bytes
Host Buffer Minimum Size:    0 bytes

NVM Command Set Attributes
==========================
Submission Queue Entry Size
  Max:                       64
  Min:                       64
Completion Queue Entry Size
  Max:                       16
  Min:                       16
Number of Namespaces:        1
Compare Command:             Supported
Write Uncorrectable Command: Supported
Dataset Management Command:  Supported
Write Zeroes Command:        Supported
Save Features:               Supported
Reservations:                Not Supported
Timestamp feature:           Supported
Verify feature:              Not Supported
Fused Operation Support:     Not Supported
Format NVM Attributes:       Crypto Erase, Per-NS Erase, All-NVM Format
Volatile Write Cache:        Present

NVM Subsystem Name:          

root@straylight /u/h/dch [64]# nvmecontrol perftest -n 32 -o read -s 512 -t 30 nvme0ns1
Threads: 32 Size:    512  READ Time:  30 IO/s:  401712 MB/s:  196
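
The MB/s column follows directly from the IO/s figure and the 512-byte transfer size. A quick check in shell arithmetic, assuming nvmecontrol counts a "MB" as 1024*1024 bytes (an assumption, though the reported numbers agree with it):

```shell
#!/bin/sh
# Values copied from the perftest output above.
iops=401712
io_size=512          # bytes per read (-s 512)

# Assumption: the tool's "MB" is 1024*1024 bytes.
bytes_per_sec=$(( iops * io_size ))
mib_per_sec=$(( bytes_per_sec / 1048576 ))

echo "bytes/s: $bytes_per_sec"
echo "MB/s:    $mib_per_sec"
```

With 512-byte reads the drive is bound by IOPS rather than bandwidth, which is why the throughput figure looks so small next to the IO/s one.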

root@straylight /u/h/dch [1]# gpart create -s gpt nda0
nda0 created
# gpart add -t efi -s 600M -a 4k -i 1 -l efiboot nda0
nda0p1 added
# newfs_msdos -F 16 -L efi /dev/nda0p1 
/dev/nda0p1: 1228448 sectors in 38389 FAT16 clusters (16384 bytes/cluster)
BytesPerSec=512 SecPerClust=32 ResSectors=1 FATs=2 RootDirEnts=512 Media=0xf0 FATsecs=150 SecPerTrack=63 Heads=255 HiddenSecs=0 HugeSectors=1228800

# sysctl vfs.zfs.min_auto_ashift=12
vfs.zfs.min_auto_ashift: 12 -> 12
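
ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev, so a minimum of 12 means new vdevs get at least 4 KiB alignment. A small sketch of that arithmetic; the sysctl.conf line in the comment is one way to persist the setting and is not part of the original session:

```shell
#!/bin/sh
# ashift is log2 of the sector size ZFS uses for a vdev.
ashift=12
sector_bytes=$(( 1 << ashift ))
echo "ashift=$ashift -> ${sector_bytes}-byte sectors"

# To keep the setting across reboots, one option (an assumption, not shown
# in the transcript) is a line in /etc/sysctl.conf:
#   vfs.zfs.min_auto_ashift=12
```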

# gpart show nda0
=>        40  2000409184  nda0  GPT  (954G)
          40     1228800     1  efi  (600M)
     1228840  1999180384        - free -  (953G)

# gpart add -t freebsd-zfs  -a 4k -i 2 -l zfs -s 940G /dev/nda0 
nda0p2 added
# gpart add -t freebsd-swap -a 4k -i 3 -l swap /dev/nda0
nda0p3 added
# gpart show nda0
=>        40  2000409184  nda0  GPT  (954G)
          40     1228800     1  efi  (600M)
     1228840  1971322880     2  freebsd-zfs  (940G)
  1972551720    27857504     3  freebsd-swap  (13G)
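
The start and size columns can be reproduced from the sizes passed to gpart. A sketch of the sector arithmetic, using only numbers from the output above and assuming 512-byte sectors (as reported by newfs_msdos):

```shell
#!/bin/sh
# All values are 512-byte sectors, copied from the gpart output above.
sectors_per_mib=2048        # 1 MiB / 512 B
sectors_per_gib=2097152     # 1 GiB / 512 B

first_usable=40
disk_sectors=2000409184     # usable sectors reported on the "=>" line

efi_size=$(( 600 * sectors_per_mib ))
zfs_start=$(( first_usable + efi_size ))
zfs_size=$(( 940 * sectors_per_gib ))
swap_start=$(( zfs_start + zfs_size ))
# No -s was given for swap, so it takes whatever is left.
swap_size=$(( first_usable + disk_sectors - swap_start ))

echo "efi:  start=$first_usable size=$efi_size"
echo "zfs:  start=$zfs_start size=$zfs_size"
echo "swap: start=$swap_start size=$swap_size (~$(( swap_size / sectors_per_gib ))G)"
```

Every start offset is a multiple of 8 sectors, so the -a 4k alignment costs no padding here.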

# zpool create \
    -o failmode=continue \
    -o autotrim=on \
    -O canmount=off \
    -O mountpoint=none \
    -O compress=zstd-9 \
    -O atime=off \
    -O checksum=skein  \
    -R /mnt newpool \
    /dev/nda0p2
# zpool list -v
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
newpool     936G   516K   936G        -         -     0%     0%  1.00x    ONLINE  /mnt
  nda0p2    936G   516K   936G        -         -     0%  0.00%      -  ONLINE  
zroot       424G   160G   264G        -         -    37%    37%  1.00x    ONLINE  -
  ada0p3    424G   160G   264G        -         -    37%  37.8%      -  ONLINE  

migration

# time zfs send -LevpR zroot@migration | zfs recv -Fuvs -o compression=zstd-9 -o checksum=skein newpool
full send of zroot@pristine estimated size is 12.6K
send from @pristine to zroot@20200805-0922 estimated size is 624B
send from @20200805-0922 to zroot@20200805-2014 estimated size is 624B
send from @20200805-2014 to zroot@20200807-0807 estimated size is 624B
send from @20200807-0807 to zroot@20200901-1647 estimated size is 624B
send from @20200901-1647 to zroot@wtf estimated size is 624B
send from @wtf to zroot@20201202-1423 estimated size is 624B
send from @20201202-1423 to zroot@20201216-1748 estimated size is 624B
send from @20201216-1748 to zroot@20201216-2343 estimated size is 624B
send from @20201216-2343 to zroot@20201230:weed-raft-panic estimated size is 624B
send from @20201230:weed-raft-panic to zroot@monthly-2021-04 estimated size is 624B
...
receiving incremental stream of zroot/jailed/w03@daily-2021-05-11 into newpool/jailed/w03@daily-2021-05-11
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zroot/jailed/w03@daily-2021-05-12 into newpool/jailed/w03@daily-2021-05-12
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of zroot/jailed/w03@migration into newpool/jailed/w03@migration
received 312B stream in 1 seconds (312B/sec)

________________________________________________________
Executed in  776.41 secs    fish           external
   usr time    0.75 secs    0.00 millis    0.75 secs
   sys time  826.31 secs    5.98 millis  826.31 secs
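
For reference, the flags in that pipeline spelled out as a dry run. This sketch only prints the commands. The recursive snapshot step is an assumption: the transcript starts from an existing zroot@migration snapshot and does not show how it was taken.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the commands, runs nothing against ZFS.
src=zroot
dst=newpool
snap=migration

# Assumption: a recursive snapshot like this was taken beforehand.
snapshot_cmd="zfs snapshot -r ${src}@${snap}"

# send: -L large blocks, -e embedded data, -v verbose,
#       -p include properties, -R replicate descendants and their snapshots
send_cmd="zfs send -LevpR ${src}@${snap}"

# recv: -F roll back the target if needed, -u do not mount,
#       -v verbose, -s keep partial state so an interrupted receive can resume
#       -o overrides the named property on the received datasets
recv_cmd="zfs recv -Fuvs -o compression=zstd-9 -o checksum=skein ${dst}"

echo "$snapshot_cmd"
echo "$send_cmd | $recv_cmd"
```

Because -p and -R carry the source properties across, the two -o overrides are what make the received datasets use zstd-9 and skein instead of whatever zroot had.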

# zpool list -v
NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
newpool     936G   135G   801G        -         -     0%    14%  1.00x    ONLINE  /mnt
  nda0p2    936G   135G   801G        -         -     0%  14.4%      -  ONLINE  
zroot       424G   160G   264G        -         -    37%    37%  1.00x    ONLINE  -
  ada0p3    424G   160G   264G        -         -    37%  37.8%      -  ONLINE  
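
A rough reading of those numbers, as integer shell arithmetic. Treat both results as ballpark figures: the ALLOC difference is probably mostly the zstd-9 recompression, but other factors can contribute, and the rate is measured in space landed on newpool rather than logical data sent.

```shell
#!/bin/sh
# Figures copied from the transcript above (GiB as printed by zpool list).
src_alloc=160      # zroot ALLOC
dst_alloc=135      # newpool ALLOC after the receive
elapsed=776        # wall-clock seconds, rounded down from 776.41

saved_pct=$(( (src_alloc - dst_alloc) * 100 / src_alloc ))
# Rough rate: space landed on newpool per second, in MiB.
mib_per_sec=$(( dst_alloc * 1024 / elapsed ))

echo "space saved on newpool: ~${saved_pct}%"
echo "average write rate:     ~${mib_per_sec} MiB/s"
```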

# zfs destroy -vrf newpool@migration

EFI boot selection

# efibootmgr -c -L "rEFInd" -l /mnt/EFI/refind/refind_aa64.efi -b 1
Boot to FW : false
BootCurrent: 0001
Timeout    : 5 seconds
BootOrder  : 0001, 001A, 0002, 0000, 001B, 001C
+Boot0001  rEFInd
 Boot001A* UEFI: Generic Flash Disk 8.07, Partition 1
 Boot0002* UEFI HTTP /pub/straylight
 Boot0000* Freeside HTTP Boot
 Boot001B* UEFI: HTTP IP4 Mellanox Network Adapter
 Boot001C* UEFI: HTTP IP4 Mellanox Network Adapter

# efibootmgr -v
Boot to FW : false
BootCurrent: 0001
Timeout    : 5 seconds
BootOrder  : 0001, 001A, 0002, 0000, 001B, 001C
+Boot0001  rEFInd HD(1,GPT,eaf39753-b35c-11eb-bac1-001b21e07d7b,0x28,0x12c000)/File(\EFI\refind\refind_aa64.efi)
                     nda0p1:/EFI/refind/refind_aa64.efi /mnt//EFI/refind/refind_aa64.efi
 Boot001A* UEFI: Generic Flash Disk 8.07, Partition 1 PciRoot(0xff)/Pci(0x4,0x0)/USB(0x0,0x0)/USB(0x1,0x0)/HD(1,GPT,c59336d6-dd58-11ea-b05a-002590ec5bf2,0x3,0x10418)
                                                     VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,350031003200340033003600410038000000)
 Boot0002* UEFI HTTP /pub/straylight PcieRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(001b21e07d7b,0x0)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(http://freeside.skunkwerks.at/pub/straylight)
 Boot0000* Freeside HTTP Boot PcieRoot(0x2)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(001b21e07d7b,0x0)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri(http://freeside.skunkwerks.at/pub/HTTPboot/arm64/boot/loader.efi)
 Boot001B* UEFI: HTTP IP4 Mellanox Network Adapter PcieRoot(0x0)/Pci(0x0,0x0)/Pci(0x0,0x0)/MAC(b8599f1a8226,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri()
                                                  VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,4800540054005000200049005000340020004d0065006c006c0061006e006f00780020004e006500740077006f0072006b00200041006400610070007400650072000000)
 Boot001C* UEFI: HTTP IP4 Mellanox Network Adapter PcieRoot(0x0)/Pci(0x0,0x0)/Pci(0x0,0x1)/MAC(b8599f1a8227,0x1)/IPv4(0.0.0.0,0x0,DHCP,0.0.0.0,0.0.0.0,0.0.0.0)/Uri()
                                                  VenHw(2d6447ef-3bc9-41a0-ac19-4d51d01b4ce6,4800540054005000200049005000340020004d0065006c006c0061006e006f00780020004e006500740077006f0072006b00200041006400610070007400650072000000)
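
The last two fields of the HD(...) node in the Boot0001 entry are the partition's start and size in sectors, in hex. Decoding them shows the entry points at the 600M efi partition created earlier:

```shell
#!/bin/sh
# Start and size fields of HD(1,GPT,<uuid>,0x28,0x12c000) from Boot0001.
hd_start=$(( 0x28 ))
hd_size=$(( 0x12c000 ))

echo "start=$hd_start size=$hd_size sectors"
echo "size in MiB: $(( hd_size / 2048 ))"
```

These match the start (40) and size (1228800) that gpart show reported for nda0p1.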

# zpool set bootfs=newpool/ROOT/13.0-RELEASE newpool

resizing the FreeBSD boot image

root@straylight /v/t/installer [1]# gpart show da0
=>     3  679128  da0  GPT  (2.0G) [CORRUPT]
       3   66584    1  efi  (33M)
   66587  612544    2  freebsd-ufs  (299M)

root@straylight /v/t/installer [1]# gpart recover da0
da0 recovered

root@straylight /v/t/installer# gpart show da0
=>      3  4108277  da0  GPT  (2.0G)
        3    66584    1  efi  (33M)
    66587   612544    2  freebsd-ufs  (299M)
   679131  3429149       - free -  (1.6G)

root@straylight /v/t/installer# gpart resize -i 2 da0
da0p2 resized

root@straylight /v/t/installer# growfs -y /dev/da0p2
super-block backups (for fsck_ffs -b #) at:
 612736, 1225280, 1837824, 2450368, 3062912, 3675456
root@straylight /v/t/installer# gpart show da0
=>      3  4108277  da0  GPT  (2.0G)
        3    66584    1  efi  (33M)
    66587  4041693    2  freebsd-ufs  (1.9G)
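
gpart resize without -s grows the partition to the maximum available, which here is all the free space that gpart recover exposed. A sketch checking the new size against the earlier output:

```shell
#!/bin/sh
# 512-byte sectors, copied from the gpart output above.
ufs_before=612544
free_after_recover=3429149

ufs_after=$(( ufs_before + free_after_recover ))
echo "da0p2 after resize: $ufs_after sectors"

# Size in tenths of a GiB, to compare with gpart's rounded "1.9G".
tenths=$(( ufs_after * 10 / 2097152 ))
echo "approx $(( tenths / 10 )).$(( tenths % 10 ))G"
```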


tuning for reckless performance

For datasets whose contents are easily regenerated, either fetched again from source (git or tarballs) or rebuilt as build artefacts, we tell ZFS to cut corners on:

  • sync writes (sync=disabled: say yes to the app, then write asynchronously at leisure)
  • log bias (logbias=throughput: optimise synchronous writes for pool throughput rather than latency)
  • metadata (redundant_metadata=most: fewer extra copies of metadata, so less redundancy of key ZFS structures)

See zfsprops(8) for details.

# zfs set sync=disabled cache
# zfs set logbias=throughput cache
# zfs set redundant_metadata=most cache

## do the same for the obj, ports and src datasets
# zfs set redundant_metadata=most zroot/usr/obj
# zfs set redundant_metadata=most zroot/usr/ports
# zfs set redundant_metadata=most zroot/usr/src
# zfs set logbias=throughput zroot/usr/obj
# zfs set logbias=throughput zroot/usr/ports
# zfs set logbias=throughput zroot/usr/src
# zfs set sync=disabled zroot/usr/obj
# zfs set sync=disabled zroot/usr/src
# zfs set sync=disabled zroot/usr/ports
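
The nine commands above are the same three properties applied to three datasets, so they collapse into a loop. A minimal sketch that prints the commands by default; ZFS_CMD is a made-up variable for this example, and setting it to zfs would apply the changes for real:

```shell
#!/bin/sh
# Dry run by default: prints each command instead of executing it.
# ZFS_CMD is an illustrative variable, not something zfs itself knows about.
ZFS_CMD="${ZFS_CMD:-echo zfs}"

# Properties and datasets copied from the commands above.
reckless_props="sync=disabled logbias=throughput redundant_metadata=most"
reckless_datasets="zroot/usr/obj zroot/usr/ports zroot/usr/src"

apply_reckless() {
    for ds in $reckless_datasets; do
        for prop in $reckless_props; do
            # Unquoted on purpose so "echo zfs" splits into two words.
            $ZFS_CMD set "$prop" "$ds"
        done
    done
}

apply_reckless
```

Adding the cache dataset to reckless_datasets would cover the first three commands as well.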