VM migration seems to have cleared VM secure boot state
-
We are hitting a rather interesting case.
In a production environment (XCP-ng 8.2.1), the Secure Boot changes described in KB5025885 were applied to a Windows Server 2019 VM back in June (this changed the VM's DB and KEK). A month or so after the changes were completed, the VM was live migrated from one pool node to another, without a reboot.
For some reason, this seems to have cleared the Secure Boot state of the VM and probably applied the pool's default entries again, because a reboot another month later landed the VM in the UEFI shell. After endless hours of debugging, this makes sense: the new Windows bootloader is signed by a certificate that XCP-ng does not know about.
Disabling Secure Boot on the VM allows it to start. So we can get the following output from it:
[System.Text.Encoding]::ASCII.GetString((Get-SecureBootUEFI db).bytes) -match 'Windows UEFI CA 2023'
False
[System.Text.Encoding]::ASCII.GetString((Get-SecureBootUEFI dbx).bytes) -match 'Microsoft Windows Production PCA 2011'
True
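For anyone who wants to run the same check against several VMs, here is a minimal Python sketch of it. It assumes the raw bytes of the db and dbx variables have already been exported from the guest (for example, from the .bytes property used above). The function names are made up for illustration. Note that it does a case-sensitive substring search, whereas PowerShell's -match is a case-insensitive regex match.

```python
def contains_cert_name(blob: bytes, name: str) -> bool:
    """Return True if the ASCII-encoded name occurs anywhere in the raw variable bytes."""
    return name.encode("ascii") in blob


def secure_boot_report(db: bytes, dbx: bytes) -> dict:
    """Summarise the two checks from the post for one VM (hypothetical helper)."""
    return {
        # The new boot manager is signed under this CA, so it must be in db.
        "db_has_uefi_ca_2023": contains_cert_name(db, "Windows UEFI CA 2023"),
        # After the KB5025885 revocation step, the old CA appears in dbx.
        "dbx_revokes_pca_2011": contains_cert_name(
            dbx, "Microsoft Windows Production PCA 2011"
        ),
    }


if __name__ == "__main__":
    # Synthetic stand-ins for exported variables; real data would be read
    # from files dumped inside the guest. These mimic the state reported
    # above: db lacks the 2023 CA while dbx contains the 2011 PCA.
    fake_db = b"\x30\x82Microsoft Windows Production PCA 2011\x00"
    fake_dbx = b"\x30\x82Microsoft Windows Production PCA 2011\x00"
    print(secure_boot_report(fake_db, fake_dbx))
```

A VM in the broken state described here would report the first value as False and the second as True.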
Does anyone have any insights into how to re-enable secure boot on such a VM again? One option is probably to include the UEFI 2023 DB / KEK entries as described here: https://github.com/xcp-ng/uefistored/issues/52.
Other suggestions are more than welcome.
References:
- https://support.microsoft.com/en-us/topic/kb5025885-how-to-manage-the-windows-boot-manager-revocations-for-secure-boot-changes-associated-with-cve-2023-24932-41a975df-beb2-40c1-99a3-b3ff139f832d
- https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/windows-secure-boot-key-creation-and-management-guidance?view=windows-11
-
Thanks for this report. EFI variables are supposed to be migrated or exported along with the VM, so this is a surprising case. I wonder if there's a specific issue with cross-pool live migration.
-
You can download the recent certificates from Microsoft, and manually install them to the pool. See https://docs.xcp-ng.org/guides/guest-UEFI-Secure-Boot/#install-the-default-uefi-certificates-manually, to be adapted to the newer files.
Then you can trigger the propagation from the pool to the VM.
-
Thanks for your insights. After a lot of trial and error we were able to get the VM back online with secure boot enabled. The recovery was as follows:
- Disable Secure Boot in Xen Orchestra for the VM
- Boot Windows without Secure Boot
- Drop into the UEFI firmware settings via
shutdown /f /r /o /t 0
and selecting Troubleshoot -> Advanced Options -> UEFI Firmware Settings
- Then select Boot Maintenance Manager -> Boot from File
- Select the right volume and browse to EFI\Microsoft\Boot
- Select SecureBootRecovery.efi and hit Enter to start the program; this will re-apply the certificate "Windows UEFI CA 2023" to the Secure Boot DB
-
@stormi This did not work on a test system. The command simply errored out.
-
@stormi I have seen this state clearing for two VM migrations on the same pool. The hardware on all machines is identical and the migration is from one machine to the other, so no cross-pool migration involved.
What we have also observed after the fact is that Xen Orchestra shows the VM as created on the day of the migration, not on the day the VM was actually created. So it seems it was indeed "re-created" by the migration.
For "failed" machines it says:
Created by Unknown on 2024-08-25 18:11 with template Windows Server 2019 (64-bit)
For machines which were not migrated but were created at the same time:
Created by Unknown on 2022-01-07 10:09 with template Windows Server 2019 (64-bit)
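To spot other affected VMs, the creation line could be compared against the date the VM is known to have been created. This is a rough Python sketch that assumes the text has exactly the format shown above. The helper names are hypothetical, and the expected date has to come from your own records.

```python
import re
from datetime import date, datetime

# Matches the "Created by ... on YYYY-MM-DD HH:MM with template ..." line.
CREATED_RE = re.compile(
    r"Created by .+? on (\d{4}-\d{2}-\d{2} \d{2}:\d{2}) with template"
)


def parse_created(line: str) -> datetime:
    """Extract the creation timestamp from one creation line."""
    match = CREATED_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised creation line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M")


def looks_recreated(line: str, expected_day: date) -> bool:
    """True if the displayed creation day differs from the known one."""
    return parse_created(line).date() != expected_day


if __name__ == "__main__":
    known_day = date(2022, 1, 7)  # the day the VMs were really created
    for line in (
        "Created by Unknown on 2024-08-25 18:11 with template Windows Server 2019 (64-bit)",
        "Created by Unknown on 2022-01-07 10:09 with template Windows Server 2019 (64-bit)",
    ):
        print(looks_recreated(line, known_day), line)
```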
-
@conitrade-as said in VM migration seems to have cleared VM secure boot state:
@stormi This did not work on a test system. The command simply errored out.
Which command exactly, and with which error?
-
A VM's UEFI variables are not supposed to be erased after a migration. What kind of migration was it? Can you describe the exact operations, and whether the storage is local or shared?