Proxmox VE 7 replace zfs boot disk
Everything dies, even enterprise hardware.
This is why having a failover is a good thing.
I am running my Proxmox VE 7 servers with a mirrored ZFS root pool, so I can protect myself against a single drive dying and taking down a proxmox server.
Today I received 8 SATADOMs that I wanted to use as boot drives instead of my SATA disks, which were tiny and slow.
So I had to dig up the proper way to replace the drives.
Proxmox partition schema for a boot disk is:
Partition 1 = BIOS Boot
Partition 2 = EFI Boot
Partition 3 = ZFS
Install the new boot drive(s) into the server. If you are lucky you have hot-plug drives and don’t need to power down the server.
For simplicity’s sake I will use this example of hardware and zfs pool.
Old disks are:
/dev/sda & /dev/sdb
New disks are:
/dev/sdc & /dev/sdd
The root pool looks like this:
  pool: rpool
 state: ONLINE
  scan: resilvered 1.72G in 00:00:28 with 0 errors on Mon May  2 17:28:09 2022
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            sda-part3  ONLINE       0     0     0
            sdb-part3  ONLINE       0     0     0
The names are simplified - in reality they would be called something similar to: ata-TOSHIBA_THNSN8960PCSE_26MS10GLTB1V-part3
So what needs to be done is simple - replace the disks one by one, waiting for the resilver process to complete each time, and then initialize the new disk so it can be booted from.
With that in mind, this is the process - we want to replace /dev/sda with /dev/sdc first.
Partitions & ZFS
sgdisk /dev/sda -R /dev/sdc
sgdisk -G /dev/sdc
zpool replace -f rpool sda-part3 /dev/disk/by-id/sdc-part3
The above steps copy the partition table from sda to sdc and initialize new GUIDs for the partitions on sdc, then replace sda-part3 with sdc-part3 in the zfs pool.
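Since these commands are destructive to the target disk, it can be worth reviewing them before running anything. Here is a small convenience helper of my own (not from the original process, and the function name is hypothetical) that just prints the three commands for a given pair of by-id disk names - the steps above mix /dev/sdX device nodes and by-id paths, while this sketch uses by-id throughout:

```shell
# Hypothetical helper (my addition): print the three replacement commands
# for an old/new disk given their /dev/disk/by-id names, so they can be
# eyeballed before being executed. Assumes the pool vdev is named after
# the by-id name plus "-part3", as in the listing above.
replace_cmds() {
  local old_id="$1" new_id="$2"
  printf 'sgdisk /dev/disk/by-id/%s -R /dev/disk/by-id/%s\n' "$old_id" "$new_id"
  printf 'sgdisk -G /dev/disk/by-id/%s\n' "$new_id"
  printf 'zpool replace -f rpool %s-part3 /dev/disk/by-id/%s-part3\n' "$old_id" "$new_id"
}

# Example (placeholder disk names):
# replace_cmds ata-OLD_DISK ata-NEW_DISK
```

Once the printed commands look right, they can be run one at a time.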
When the last command has been entered, zfs will start to resilver, which basically means copying the data from the old disk to the new disk.
You can check the status of the resilver process by entering
zpool status -v rpool
This command will output some stats about the resilver speed and the percentage that is done.
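If you want to wait for the resilver from a script rather than re-running zpool status by hand, a tiny sketch like the following works - the helper function is my own and simply greps the English zpool status output, which is an assumption about the output format:

```shell
# Sketch of a wait loop (my addition, assumes English OpenZFS output).
# While a resilver is running, `zpool status` reports
# "scan: resilver in progress ..."; when finished it reports
# "scan: resilvered ...".
resilver_in_progress() {
  grep -q 'resilver in progress' <<<"$1"
}

# Usage (commented out so nothing runs blindly):
# while resilver_in_progress "$(zpool status rpool)"; do
#   sleep 30
# done
# echo "resilver finished"
```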
Proxmox boot refreshing
When the resilver process is done, the Proxmox boot environment needs to be installed on the EFI partition of the new disk.
This is done via:
proxmox-boot-tool format /dev/sdc2
proxmox-boot-tool init /dev/sdc2
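One small gotcha when adapting these two commands to other hardware: partition device names differ between SATA and NVMe disks (/dev/sdc2 vs /dev/nvme0n1p2). A hypothetical helper of mine to build the EFI partition path for either naming scheme:

```shell
# Hypothetical helper (my addition): return the EFI partition (partition 2
# in the Proxmox layout) for a given disk. Disks whose names end in a digit
# (nvme0n1, mmcblk0) get a "p" before the partition number.
esp_part() {
  local disk="$1"
  case "$disk" in
    *[0-9]) printf '%sp2\n' "$disk" ;;  # e.g. /dev/nvme0n1 -> /dev/nvme0n1p2
    *)      printf '%s2\n'  "$disk" ;;  # e.g. /dev/sdc     -> /dev/sdc2
  esac
}

# Usage:
# proxmox-boot-tool format "$(esp_part /dev/sdc)"
# proxmox-boot-tool init   "$(esp_part /dev/sdc)"
```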
If you want to be 100% sure that everything is okay with the new disk, you can run:
proxmox-boot-tool refresh
This refreshes the boot environments on all EFI/BIOS boot partitions in the system. At this stage it will refresh sda, sdb and sdc - since all 3 disks are bootable at this point.
If you only had to replace one disk you can stop here and congratulate yourself on having paid for the insurance of being able to replace a failed boot drive.
If you want to replace the next drive, you simply repeat the process, replacing /dev/sdb with /dev/sdd.
When both drives have been replaced, a zpool status will show this:
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            sdc-part3  ONLINE       0     0     0
            sdd-part3  ONLINE       0     0     0
Expanding the size of the root partitions
If you, like me, replaced the boot drives with higher-capacity disks, then you could consider expanding the zfs partition, so Proxmox gets a little more disk space on the root partition.
This is done in multiple steps.
First ensure that you have parted installed - if not, install it by running:
apt-get install parted
# resize partition 3 of sdc to use 50% of the available space (partition 3 is the ZFS partition)
parted /dev/sdc resizepart 3 50%
# expand zfs on sdc to use the entire expanded partition
zpool online -e rpool /dev/disk/by-id/sdc-part3
# resize partition 3 of sdd to use 50% of the available space (partition 3 is the ZFS partition)
parted /dev/sdd resizepart 3 50%
# expand zfs on sdd to use the entire expanded partition
zpool online -e rpool /dev/disk/by-id/sdd-part3
In the above example I have expanded the partitions to only 50% of the available size.
This is called over-provisioning - which basically means that the SSD controller has more spare room to reallocate failed sectors to healthy cells.
Which in turn means your disk will last longer.
I also did this on my own servers. Since the boot drive is not being used for much - certainly not storing virtual machines, if you know what you are doing - you do not require a lot of space. What is important is that the drives last long, and this is where over-provisioning can help.
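The trade-off is simple arithmetic: whatever fraction you reserve for over-provisioning is subtracted from the usable capacity. A back-of-the-envelope helper (my own illustration - the 128 GiB figure is just an example disk size, not from this setup):

```shell
# Illustration (my addition): usable space left for rpool when only
# used_pct percent of a disk is partitioned, the rest being reserved
# for over-provisioning. Sizes in GiB, integer math.
usable_gib() {
  local disk_gib="$1" used_pct="$2"
  echo $(( disk_gib * used_pct / 100 ))
}

# Example: a hypothetical 128 GiB drive partitioned at 50%
# leaves 64 GiB for the root pool.
usable_gib 128 50
```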