Proxmox 6 to 7 to 8 Upgrade

While I've been away, my Proxmox instance has fallen out of date. Since this is the base OS on all my hardware, I really need to perform this upgrade. I've been putting it off for far too long because of how labor-intensive it seems from their documentation.

Backup

Before getting into this major upgrade, I really should take backups of my nodes, which shamefully I've never actually done before.

For Proxmox the best practice is to take two layers of backups:

  • Proxmox Host configuration
  • Guest VM Snapshots

Proxmox Host

The easiest way I found was a script on GitHub; you can store the result wherever you want. In my case I mounted a CIFS share and set that as my target.
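
Adding the CIFS share as Proxmox storage can also be done from the CLI; a minimal sketch, where the storage name, server, share, and credentials are placeholders for my actual values:

# all values below are placeholders, not my real environment
pvesm add cifs external-backup --server 10.100.0.5 --share backups --username backupuser --password 'secret' --content backup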

Here is the process I used:

  1. Log in to the machine to back up via SSH
  2. Download the script
wget -qO- https://raw.githubusercontent.com/DerDanilo/proxmox-stuff/master/prox_config_backup.sh > prox_config_backup.sh
  3. Edit line 16 DEFAULT_BACK_DIR to be the storage location in your Proxmox cluster you want to write the backup file to. If you want to add another location, do it from the Proxmox UI under Datacenter -> Storage.
  4. Make the script executable: chmod +x prox_config_backup.sh
  5. Execute the script! It is very verbose; I suggest you read the output.
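
Once it finishes, it's worth confirming the archive actually landed on the target; a quick check, where the mount point is just my setup:

ls -lh /mnt/pve/external-backup/    # expect a freshly dated tarball from the script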

Guest Snapshots

This can and should be completed through the Proxmox UI. Just be sure that you are writing the snapshots to a remote disk.

  1. Click each VM you need to back up and take a snapshot! Again, store this somewhere external.
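
If you'd rather script this, vzdump from the CLI does the same job; a minimal sketch, where the VM ID and storage name are placeholders:

vzdump 100 --storage external-backup --mode snapshot --compress zstd    # back up VM 100 to the external storage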

6.x to 7

I have two hosts to update; we'll call them Primary and Secondary. The Secondary server obtains its internet gateway through the Primary, so I have to add a direct ethernet connection for performing the upgrade instead.
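
For the direct connection, pointing the Secondary at the temporary link is just a default-route swap; a sketch with made-up addresses and interface name:

ip route replace default via 192.168.1.1 dev eno1    # temporary gateway over the direct connection
ip route show    # confirm the new default route took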

Proxmox has a guide for performing this procedure, which I will reiterate in more concise steps here:

My process will be:

  1. Connect serial console and ethernet cable for a direct internet connection
  2. Update to the latest 6.x: apt update && apt upgrade && apt dist-upgrade
  3. Execute the preflight check: pve6to7 --full
  4. Confirm MAC addresses of adapters are hardcoded at /etc/network/interfaces (see the sketch after this list)
  5. Update all Debian repos to Bullseye: sed -i 's/buster\/updates/bullseye-security/g;s/buster/bullseye/g' /etc/apt/sources.list
  6. Update the no-subscription repos from 6 to 7: sed -i -e 's/buster/bullseye/g' /etc/apt/sources.list.d/pve-install-repo.list
  7. View /etc/apt/sources.list.d/pve-enterprise.list and /etc/apt/sources.list to confirm the repositories match what's expected, see repositories
  8. Upgrade: apt update && apt dist-upgrade
  9. Move serial console and cable to the next host and repeat steps 2-8
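
For step 4, hardcoding a bridge MAC in /etc/network/interfaces looks roughly like this; the address, ports, and MAC below are made up, not my actual config:

auto vmbr0
iface vmbr0 inet static
    address 10.100.0.11/24
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    hwaddress aa:bb:cc:dd:ee:01    # pin the bridge MAC so the 7.x bridge MAC change can't surprise you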

7.x to 8

Again, a guide is provided by Proxmox, which I will reiterate into a concise process list:

  1. Connect serial console and ethernet cable
  2. Run preflight checks: pve7to8 --full
  3. Confirm the PVE version is 7.4-15 or newer with pveversion (sanity-check sketch after this list)
  4. Update to 'Bookworm': sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list
  5. Confirm no Bullseye repos remain in /etc/apt/sources.list.d/pve-enterprise.list and /etc/apt/sources.list
  6. Replace bullseye with bookworm for the PVE repos: sed -i -e 's/bullseye/bookworm/g' /etc/apt/sources.list.d/pve-install-repo.list
  7. Update and upgrade: apt update && apt dist-upgrade
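
A quick way to sanity-check steps 3 and 5 before committing (my own habit, not part of the official guide):

pveversion    # expect pve-manager/7.4-15 or newer
grep -rn bullseye /etc/apt/sources.list /etc/apt/sources.list.d/    # should print nothing once every repo points at bookworm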

6 to 7 Upgrade Log

I ran the initial updates and preflight scripts over SSH instead.

Primary Node:

= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.203-1-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 2
INFO: configured votes - qdevice: 0
INFO: current expected votes: 2
INFO: current total votes: 2
WARN: cluster consists of less than three quorum-providing nodes!

Checking nodelist entries..
PASS: nodelist settings OK

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

SKIP: no hyper-converged ceph setup detected!

= CHECKING CONFIGURED STORAGES =

WARN: storage 'external-backup' enabled but not active!
PASS: storage 'local' enabled and active.
PASS: storage 'local-lvm' enabled and active.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
WARN: 1 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'salmonsec' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '10.100.0.11' configured and active on single interface.
INFO: Checking backup retention settings..
INFO: storage 'local' - no backup retention settings defined - by default, PVE 7.x will no longer keep only the last backup, but all backups
PASS: no problems found.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note legnth..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list:10
SKIP: No containers on node detected.

= SUMMARY =

TOTAL:    25
PASSED:   20
SKIPPED:  2
WARNINGS: 3
FAILURES: 0

ATTENTION: Please check the output for detailed information!

Secondary Node:

= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages uptodate

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 6.4-1

Checking running kernel version..
PASS: expected running kernel '5.4.203-1-pve'.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 2
INFO: configured votes - qdevice: 0
INFO: current expected votes: 2
INFO: current total votes: 2
WARN: cluster consists of less than three quorum-providing nodes!

Checking nodelist entries..
PASS: nodelist settings OK

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

SKIP: no hyper-converged ceph setup detected!

= CHECKING CONFIGURED STORAGES =

WARN: storage 'external-backup' enabled but not active!
PASS: storage 'local' enabled and active.
PASS: storage 'local-lvm' enabled and active.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for running guests..
WARN: 1 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'pveworker0' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '10.100.0.12' configured and active on single interface.
INFO: Checking backup retention settings..
INFO: storage 'local' - no backup retention settings defined - by default, PVE 7.x will no longer keep only the last backup, but all backups
PASS: no problems found.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking custom roles for pool permissions..
INFO: Checking node and guest description/note legnth..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking storage content type configuration..
PASS: no problems found
INFO: Checking if the suite for the Debian security repository is correct..
INFO: Make sure to change the suite of the Debian security repository from 'buster/updates' to 'bullseye-security' - in /etc/apt/sources.list:6
SKIP: No containers on node detected.

= SUMMARY =

TOTAL:    25
PASSED:   20
SKIPPED:  2
WARNINGS: 3
FAILURES: 0

ATTENTION: Please check the output for detailed information!

Warnings:

  • WARN: storage 'external-backup' enabled but not active!
  • WARN: 1 running guest(s) detected - consider migrating or stopping them.
  • WARN: cluster consists of less than three quorum-providing nodes!

These are all acceptable:

  • 'external-backup' is a Samba share that is currently offline; this is fine
  • I'll stop all guests before I do the update procedure, once I connect via serial console
  • I only have two nodes. Sad, but that's all I can do!

My network interfaces do not have hardcoded MAC addresses on either node. What exactly does the Proxmox wiki say about this?

With Proxmox VE 7, the MAC address of the Linux bridge itself may change, as noted in Upgrade from 6.x to 7.0#Linux Bridge MAC-Address Change.

In hosted setups, the MAC address of a host is often restricted, to avoid spoofing by other hosts.

Each of my subnets has a bridge, so I certainly don't want them to get messed up; however, the MAC address is not restricted in my environment, so perhaps I don't need to worry? Solution A is to use ifupdown2, and I'm not sure if I'm already using that.
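
Checking is easy enough with plain dpkg:

dpkg -s ifupdown2 | grep -E '^(Status|Version)'    # shows whether it's installed and at what version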

Both hosts show it installed at a version above the minimum declared in the docs, so I'm just going to not worry about this and hope for the best!

Finally, I updated the repos and then logged out of the SSH session.

I stopped all virtual machines on the cluster, then plugged into the serial console and ran apt update && apt dist-upgrade.

I hit the issue:

Upgrade wants to remove package 'proxmox-ve'

To resolve it I followed the wiki suggestion and ran apt remove linux-image-amd64, but this package was not installed so it changed nothing.

Next, I installed the kernel helper and rebooted:

apt install pve-kernel-helper && reboot now

This didn't help.

Then I tried a suggestion from here. I used to have a Ceph cluster but stopped using it; apparently you need to add those repos back to get the upgrade to work:

echo "deb http://download.proxmox.com/debian/ceph-octopus bullseye main" > /etc/apt/sources.list.d/ceph.list
apt update
apt dist-upgrade -y

It worked!

I performed the same steps on the secondary node; however, that one got stuck at 99% with a repeated error message: proc: Bad value for 'hidepid'

Apparently this is harmless, so I just continued to wait... I know the disks in this machine are failing and extremely slow. It turns out I just had to hit Enter: there was a prompt buried under all the messages from proc.

I fixed my routes and rebooted both nodes to get the cluster back to a healthy state. One of my VMs wouldn't start on the secondary node, with the error:

TASK ERROR: activating LV 'pve/data' failed: Activation of logical volume pve/data is prohibited while logical volume pve/data_tmeta is active.
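
If you hit this, it helps to first see what state the thin pool volumes are actually in; this is standard LVM tooling, not something from the wiki:

lvs -a pve    # the Attr column shows which of data, data_tmeta, and data_tdata are currently active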

The following commands were suggested to resolve the issue:

lvchange -an pve/data_tdata
lvchange -an pve/data_tmeta
lvchange -ay pve/data

In my case they failed, with errors saying my PV metadata was corrupted... thankfully a restart resolved the issue.

7 to 8 Upgrade Log

Primary Node:

= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages up-to-date

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 7.4-1

Checking running kernel version..
PASS: running kernel '5.15.116-1-pve' is considered suitable for upgrade.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 2
INFO: configured votes - qdevice: 0
INFO: current expected votes: 2
INFO: current total votes: 2
WARN: cluster consists of less than three quorum-providing nodes!

Checking nodelist entries..
PASS: nodelist settings OK

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

SKIP: no hyper-converged ceph setup detected!

= CHECKING CONFIGURED STORAGES =

WARN: storage 'external-backup' enabled but not active!
PASS: storage 'local' enabled and active.
PASS: storage 'local-lvm' enabled and active.
INFO: Checking storage content type configuration..
PASS: no storage content problems found
WARN: activating 'external-backup' failed - storage 'external-backup' is not online

PASS: no storage re-uses a directory for multiple content types.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvescheduler.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for supported & active NTP service..
WARN: systemd-timesyncd is not the best choice for time-keeping on servers, due to only applying updates on boot.
  While not necessary for the upgrade it's recommended to use one of:
    * chrony (Default in new Proxmox VE installations)
    * ntpsec
    * openntpd

INFO: Checking for running guests..
WARN: 1 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'salmonsec' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '10.100.0.11' configured and active on single interface.
INFO: Check node certificate's RSA key size
PASS: Certificate 'pve-root-ca.pem' passed Debian Busters (and newer) security level for TLS connections (4096 >= 2048)
PASS: Certificate 'pve-ssl.pem' passed Debian Busters (and newer) security level for TLS connections (2048 >= 2048)
INFO: Checking backup retention settings..
PASS: no backup retention problems found.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking permission system changes..
INFO: Checking custom role IDs for clashes with new 'PVE' namespace..
PASS: no custom roles defined, so no clash with 'PVE' role ID namespace enforced in Proxmox VE 8
INFO: Checking if LXCFS is running with FUSE3 library, if already upgraded..
SKIP: not yet upgraded, no need to check the FUSE library version LXCFS uses
INFO: Checking node and guest description/note length..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking if the suite for the Debian security repository is correct..
PASS: found no suite mismatch
INFO: Checking for existence of NVIDIA vGPU Manager..
PASS: No NVIDIA vGPU Service found.
INFO: Checking bootloader configuration...
SKIP: not yet upgraded, no need to check the presence of systemd-boot
SKIP: No containers on node detected.

= SUMMARY =

TOTAL:    36
PASSED:   27
SKIPPED:  4
WARNINGS: 5
FAILURES: 0

ATTENTION: Please check the output for detailed information!

Secondary Node:

= CHECKING VERSION INFORMATION FOR PVE PACKAGES =

Checking for package updates..
PASS: all packages up-to-date

Checking proxmox-ve package version..
PASS: proxmox-ve package has version >= 7.4-1

Checking running kernel version..
PASS: running kernel '5.15.116-1-pve' is considered suitable for upgrade.

= CHECKING CLUSTER HEALTH/SETTINGS =

PASS: systemd unit 'pve-cluster.service' is in state 'active'
PASS: systemd unit 'corosync.service' is in state 'active'
PASS: Cluster Filesystem is quorate.

Analzying quorum settings and state..
INFO: configured votes - nodes: 2
INFO: configured votes - qdevice: 0
INFO: current expected votes: 2
INFO: current total votes: 2
WARN: cluster consists of less than three quorum-providing nodes!

Checking nodelist entries..
PASS: nodelist settings OK

Checking totem settings..
PASS: totem settings OK

INFO: run 'pvecm status' to get detailed cluster status..

= CHECKING HYPER-CONVERGED CEPH STATUS =

SKIP: no hyper-converged ceph setup detected!

= CHECKING CONFIGURED STORAGES =

WARN: storage 'external-backup' enabled but not active!
PASS: storage 'local' enabled and active.
PASS: storage 'local-lvm' enabled and active.
INFO: Checking storage content type configuration..
PASS: no storage content problems found
WARN: activating 'external-backup' failed - storage 'external-backup' is not online

PASS: no storage re-uses a directory for multiple content types.

= MISCELLANEOUS CHECKS =

INFO: Checking common daemon services..
PASS: systemd unit 'pveproxy.service' is in state 'active'
PASS: systemd unit 'pvedaemon.service' is in state 'active'
PASS: systemd unit 'pvescheduler.service' is in state 'active'
PASS: systemd unit 'pvestatd.service' is in state 'active'
INFO: Checking for supported & active NTP service..
WARN: systemd-timesyncd is not the best choice for time-keeping on servers, due to only applying updates on boot.
  While not necessary for the upgrade it's recommended to use one of:
    * chrony (Default in new Proxmox VE installations)
    * ntpsec
    * openntpd

INFO: Checking for running guests..
WARN: 1 running guest(s) detected - consider migrating or stopping them.
INFO: Checking if the local node's hostname 'pveworker0' is resolvable..
INFO: Checking if resolved IP is configured on local node..
PASS: Resolved node IP '10.100.0.12' configured and active on single interface.
INFO: Check node certificate's RSA key size
PASS: Certificate 'pve-root-ca.pem' passed Debian Busters (and newer) security level for TLS connections (4096 >= 2048)
PASS: Certificate 'pve-ssl.pem' passed Debian Busters (and newer) security level for TLS connections (2048 >= 2048)
INFO: Checking backup retention settings..
PASS: no backup retention problems found.
INFO: checking CIFS credential location..
PASS: no CIFS credentials at outdated location found.
INFO: Checking permission system changes..
INFO: Checking custom role IDs for clashes with new 'PVE' namespace..
PASS: no custom roles defined, so no clash with 'PVE' role ID namespace enforced in Proxmox VE 8
INFO: Checking if LXCFS is running with FUSE3 library, if already upgraded..
SKIP: not yet upgraded, no need to check the FUSE library version LXCFS uses
INFO: Checking node and guest description/note length..
PASS: All node config descriptions fit in the new limit of 64 KiB
PASS: All guest config descriptions fit in the new limit of 8 KiB
INFO: Checking container configs for deprecated lxc.cgroup entries
PASS: No legacy 'lxc.cgroup' keys found.
INFO: Checking if the suite for the Debian security repository is correct..
PASS: found no suite mismatch
INFO: Checking for existence of NVIDIA vGPU Manager..
PASS: No NVIDIA vGPU Service found.
INFO: Checking bootloader configuration...
SKIP: not yet upgraded, no need to check the presence of systemd-boot
SKIP: No containers on node detected.

= SUMMARY =

TOTAL:    36
PASSED:   27
SKIPPED:  4
WARNINGS: 5
FAILURES: 0

ATTENTION: Please check the output for detailed information!

Warning Summary:

  • Cluster not big enough for HA pair
  • external-backup not online
  • Guests are running
  • systemd-timesyncd is not the best choice for time-keeping on servers, due to only applying updates on boot.

These are all acceptable, except for the recommendation to change the time-keeping method. I have run into sync issues in the past on these nodes, so changing it is welcome.

To make this change, you simply install chrony, which automatically removes the previously used systemd-timesyncd.

On both nodes:

apt install chrony
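
A quick check that chrony actually took over (not part of the upgrade guide, just reassurance):

systemctl status chrony --no-pager    # should be active; systemd-timesyncd is removed automatically
chronyc sources    # lists the NTP sources chrony is syncing against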

I then re-ran pve7to8 to confirm that warning was resolved.

Then:

  • Confirm PVE version >7.4-15
  • Update package repos
  • This time there is no Bookworm suite for my old Ceph repo, so I removed it entirely (see below)
  • Connect via serial console, connect ethernet, and fix the default route
  • Perform the update
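
Removing the old Ceph repo is just deleting the list file I added back during the 6 to 7 upgrade:

rm /etc/apt/sources.list.d/ceph.list    # the Octopus repo has no Bookworm suite, so it goes away entirely
apt update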


Once again, everything went more smoothly than I expected. The biggest time sink on this project was taking backups and debugging storage issues.

References