Try removing the Xen tools and running the update, then reinstalling the tools afterwards. I have had luck doing this when upgrading Windows Server from 2022 to 2025; it may apply here as well.
-
RE: Win11 VM update 23H2 -> 24H2 fail
-
RE: CBT: the thread to centralize your feedback
@olivierlambert That could for sure be the case. We are using NBD without "Purge snapshot data" enabled for now, and it's been very reliable, but I'm hoping to keep chipping away at these issues so we can one day enable this on our production VMs.
If there is any testing you need me to do, just let me know, as we have a decent test environment set up where we prototype these things before deploying for real.
-
RE: CBT: the thread to centralize your feedback
@olivierlambert This seems to be something different, as I don't need to migrate a VM for this to happen. Simply have a VM with CBT enabled and a .cbtlog file present for it, then create a regular snapshot of the VM. Upon deleting that manually created snapshot, the CBT data is reset.
It happens to me on shared NFS SRs; I have not been able to make it happen on a local EXT SR, but I have this happening across 3 different pools using NFS SRs now. There is a ton of info in my posts in an attempt to be as detailed as possible! Happy to help any way I can!
-
RE: CBT: the thread to centralize your feedback
3rd update: this appears to happen on our test pool using NFS (TrueNAS NFS), our DR pool (Pure Storage NFS) and our production pool (Pure Storage NFS).
Testing more today, this seems to occur on shared NFS SRs where multiple hosts are connected; using local EXT storage I do not see this behaviour.
Is there any debug I could enable to help get to the bottom of this? Or, if someone else can also confirm this happens to them, we can rule out something in my environments.
-
RE: CBT: the thread to centralize your feedback
Another test on a different pool seems to yield the same result:
Create a VM and add it to a backup job using CBT with snapshot deletion. Run the backup job to generate the .cbtlog file.
After the first backup run:
[08:22 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n dfa26980-edc4-4127-a032-cfd99226a5b8.cbtlog
adde7aaf-6b13-498a-b0e3-f756a57b2e78
Next, take a snapshot of the VM from the Snapshots tab in Xen Orchestra and check the CBT log file again; it now references the newly created snapshot:
[08:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n dfa26980-edc4-4127-a032-cfd99226a5b8.cbtlog
994174ef-c579-44e6-bc61-240fb996867e
Remove the manually created snapshot, then check the CBT log file and find that it has been corrupted:
[08:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n dfa26980-edc4-4127-a032-cfd99226a5b8.cbtlog
00000000-0000-0000-0000-000000000000
So far I can make this happen on two different pools. It would be helpful if anyone else could confirm this; a rough check sequence is sketched below.
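For anyone who wants to try it, here is a sketch of the check sequence, with placeholder values for the SR mount path and the .cbtlog file name (substitute your own):
cd /var/run/sr-mount/<sr-uuid>
cbt-util get -c -n <vdi-uuid>.cbtlog   # note the UUID printed here
# take a snapshot of the VM in Xen Orchestra, then re-check:
cbt-util get -c -n <vdi-uuid>.cbtlog   # should now reference the new snapshot
# remove the snapshot, wait for the GC to finish, then re-check:
cbt-util get -c -n <vdi-uuid>.cbtlog   # all zeros means the CBT data was reset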
-
RE: CBT: the thread to centralize your feedback
MASSIVE EDIT AFTER FURTHER TESTING
So I have another one from my testing with CBT.
If I have a VM running CBT backups with snapshot deletion enabled, and I remove the pool setting that specifies a migration network, everything appears fine and the CBT data does not reset due to a migration.
However, if I take a manual snapshot of a VM and remove the snapshot afterwards, I find the CBT data sometimes resets itself:
SM log shows:
[15:53 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# grep -A 5 -B 5 -i exception /var/log/SMlog
Jan 28 11:55:00 xcpng-test-01 SMGC: [2041921] In cleanup
Jan 28 11:55:00 xcpng-test-01 SMGC: [2041921] SR 9330 ('Syn-TestLab-DS1') (0 VDIs in 0 VHD trees): no changes
Jan 28 11:55:00 xcpng-test-01 SM: [2041921] lock: closed /var/lock/sm/93308f90-1fcd-873b-292f-4a34dde2bfea/running
Jan 28 11:55:00 xcpng-test-01 SM: [2041921] lock: closed /var/lock/sm/93308f90-1fcd-873b-292f-4a34dde2bfea/gc_active
Jan 28 11:55:00 xcpng-test-01 SM: [2041921] lock: closed /var/lock/sm/93308f90-1fcd-873b-292f-4a34dde2bfea/sr
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] ***** sr_scan: EXCEPTION <class 'util.CommandException'>, Input/output error
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 111, in run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] return self._run_locked(sr)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] rv = self._run(sr, target)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 370, in _run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] return sr.scan(self.params['sr_uuid'])
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/ISOSR", line 594, in scan
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] if not util.isdir(self.path):
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/util.py", line 542, in isdir
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] raise CommandException(errno.EIO, "os.stat(%s)" % path, "failed")
Jan 28 11:55:09 xcpng-test-01 SM: [2041073]
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] Raising exception [40, The SR scan failed [opterr=Command os.stat(/var/run/sr-mount/d00054f9-e6a2-162f-f734-1c6c02541722) failed (failed): Input/output error]]
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] ***** ISO: EXCEPTION <class 'xs_errors.SROSError'>, The SR scan failed [opterr=Command os.stat(/var/run/sr-mount/d00054f9-e6a2-162f-f734-1c6c02541722) failed (failed): Input/output error]
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 385, in run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] ret = cmd.run(sr)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] File "/opt/xensource/sm/SRCommand.py", line 121, in run
Jan 28 11:55:09 xcpng-test-01 SM: [2041073] raise xs_errors.XenError(excType, opterr=msg)
Jan 28 11:55:09 xcpng-test-01 SM: [2041073]
--
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] lock: opening lock file /var/lock/sm/58242a5a-0a6f-4e4e-bada-8331ed32eae4/cbtlog
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] lock: acquired /var/lock/sm/58242a5a-0a6f-4e4e-bada-8331ed32eae4/cbtlog
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ['/usr/sbin/cbt-util', 'get', '-n', '/var/run/sr-mount/45e457aa-16f8-41e0-d03d-8201e69638be/58242a5a-0a6f-4e4e-bada-8331ed32eae4.cbtlog', '-c']
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] pread SUCCESS
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] lock: released /var/lock/sm/58242a5a-0a6f-4e4e-bada-8331ed32eae4/cbtlog
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] Raising exception [460, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ***** generic exception: vdi_list_changed_blocks: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 111, in run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] return self._run_locked(sr)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] rv = self._run(sr, target)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 326, in _run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] return target.list_changed_blocks()
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/VDI.py", line 759, in list_changed_blocks
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] "Source and target VDI are unrelated")
Jan 28 14:41:58 xcpng-test-01 SM: [2181235]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ***** NFS VHD: EXCEPTION <class 'xs_errors.SROSError'>, Failed to calculate changed blocks for given VDIs. [opterr=Source and target VDI are unrelated]
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 385, in run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] ret = cmd.run(sr)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 111, in run
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] return self._run_locked(sr)
Jan 28 14:41:58 xcpng-test-01 SM: [2181235] File "/opt/xensource/sm/SRCommand.py", line 161, in _run_locked
--
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] Removed leaf-coalesce from fe6e3edd(100.000G/7.483M?)
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] *~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] ***********************
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] * E X C E P T I O N *
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] ***********************
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] leaf-coalesce: EXCEPTION <class 'util.SMException'>, VDI fe6e3edd-4d63-4005-b0f3-932f5f34e036 could not be coalesced
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] File "/opt/xensource/sm/cleanup.py", line 2098, in coalesceLeaf
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] self._coalesceLeaf(vdi)
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] File "/opt/xensource/sm/cleanup.py", line 2380, in _coalesceLeaf
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532] .format(uuid=vdi.uuid))
Jan 28 15:53:18 xcpng-test-01 SMGC: [2250532]
I have been able to once again reproduce this multiple times.
Steps to reproduce:
-
Set up a backup job, enable CBT with snapshot removal, and run your initial full backup.
-
Take a manual snapshot of the VM. After a few minutes, remove the snapshot and let the GC run to completion.
-
Run the same backup job again. For me, this usually results in a full backup, with the above being dumped to the SM log.
-
Afterwards, all backups go back to being deltas and CBT works fine again, unless I take another manual snapshot.
Is anyone else able to reproduce this?
Edit 2: Here is an example of what I am running into.
After the initial backup job runs:
[23:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 2be6b6ec-9308-4e63-9975-19259108eba2.cbtlog
adde7aaf-6b13-498a-b0e3-f756a57b2e78
After taking a manual snapshot, the CBT log reference changes as expected:
[23:27 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 2be6b6ec-9308-4e63-9975-19259108eba2.cbtlog
b6e33794-120a-4a95-b035-af64c6605ee2
After removing the manual snapshot:
[23:29 xcpng-test-01 45e457aa-16f8-41e0-d03d-8201e69638be]# cbt-util get -c -n 2be6b6ec-9308-4e63-9975-19259108eba2.cbtlog
00000000-0000-0000-0000-000000000000
-
-
RE: XCP-ng 8.3 updates announcements and testing
@gduperrey Installed on the same 2 hosts as the last batch of test updates released in December.
No issues to report so far; I ran a backup job afterwards without issue.
-
RE: Manual snapshots retention
@olivierlambert I wonder if it would be beneficial to show all snapshots older than 30 days, not just the ones not created by an automated process. For example, what happens if an XO backup job runs and creates a snapshot, but the process is interrupted and the job fails? Will the next run of the job clean up the previous failed job's snapshot, or is there a chance a snapshot could be left behind?
-
RE: SR Garbage Collection running permanently
I have run into this numerous times. It's one of the reasons I have not switched to "Purge snapshot data when using CBT" on all my jobs yet.
I hope the fixes in testing solve the issue; what has been fixing it for me in the meantime is modifying the following:
Edit /opt/xensource/sm/cleanup.py and change LIVE_LEAF_COALESCE_MAX_SIZE and LIVE_LEAF_COALESCE_TIMEOUT to the following values:
LIVE_LEAF_COALESCE_MAX_SIZE = 1024 * 1024 * 1024  # bytes
LIVE_LEAF_COALESCE_TIMEOUT = 300  # seconds
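To see whether the GC is still tripping over coalesce timeouts after this change, a quick look at the SM log is usually enough; a hedged example (standard log location on XCP-ng, adjust the patterns to taste):
grep -i "leaf-coalesce" /var/log/SMlog | tail -n 20
grep -i "EXCEPTION" /var/log/SMlog | tail -n 20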
-
RE: Manual snapshots retention
I'm not sure what the best criteria would be for listing the snapshots on the health check page. Perhaps if the snapshot is over 30 days old? There is already "Too many snapshots" (not sure what counts as too many?) and "Orphaned VMs snapshot"; perhaps simply another section called "Old snapshots"? A rough CLI version of the idea is sketched below.
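This is only a sketch of the idea from the CLI, not an XO feature; the 30-day cutoff is arbitrary and the script assumes the standard xe snapshot fields (snapshot-time sorts correctly as a plain string in this format):
#!/bin/bash
# list snapshots older than 30 days (sketch)
CUTOFF=$(date -u -d '30 days ago' +%Y%m%dT%H:%M:%SZ)
for uuid in $(xe snapshot-list --minimal | tr ',' ' '); do
  ts=$(xe snapshot-param-get uuid=$uuid param-name=snapshot-time)
  name=$(xe snapshot-param-get uuid=$uuid param-name=name-label)
  # ISO-style timestamps compare correctly as strings
  [ "$ts" \< "$CUTOFF" ] && echo "$ts  $uuid  $name"
done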
-
RE: Manual snapshots retention
I have often thought this might be something worth having in the health check area. Coming from VMware, we used to frequently run a tool called "RVTools" that could show you old snapshots that may have been forgotten about, or snapshots that were created by a failed backup job and not removed properly. It would be useful to be able to find snapshots like that and remove them before they become a problem.
-
RE: CBT: the thread to centralize your feedback
As an update, we just spun up our DR pool yesterday, a fresh install of XCP-ng 8.3 on all hosts in the pool. Testing migrations and backups with CBT enabled shows the same behaviour we experience on the other pools: removing the default migration network allows CBT to work properly, however specifying a default migration network causes CBT to be reset after a VM migration. So I think this is pretty reproducible, at least using a file-based SR like NFS.
-
RE: Bonded interface viewing support in XO
@olivierlambert Not to dig up an old thread but was this ever added? I was looking around and wasn't able to find it anywhere.
-
RE: How to migrate XOA itself?
@DustinB Are there any downsides to having two XOA instances pointing at the same pool? Since the config itself is stored at the pool level, I'm guessing there's no downside?
I.e. a primary XOA running in the core DC and a secondary XOA running at your DR site. Is it just a matter of adding the pool on the secondary XOA so it downloads the existing config, or did you need to do a full export/import?
-
RE: How to migrate XOA itself?
@manilx When I did this I used the xe CLI; you can also use XCP-ng Center, but with 8.3 you'll need to download a beta version linked in the XCP-ng Center forum thread.
xe vm-migrate uuid=UUID_OF_XOA_VM remote-master=new_pool_master_IP remote-username=root remote-password=PASSWORD host-uuid=destination_host_uuid vdi:vdi_uuid=destination_sr_uuid vif:source_vif_uuid=destination_network_uuid
Docs: https://docs.xenserver.com/en-us/citrix-hypervisor/command-line-interface.html#vm-migrate
-
RE: CBT: the thread to centralize your feedback
So in the case where CBT is being reset, the network of the VM is not actually being changed during migration. The VM is moving from Host A to Host B within the same pool, using NFS shared storage, which is also not changing. However, when "Default Migration Network" is set in the pool's Advanced tab, CBT data is reset. When a default migration network is not set, the CBT data remains intact.
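One hedged way to confirm which code path is being hit is to grep the XAPI log on the pool master right after a test migration (standard log location; the exact line format varies by version):
grep -E "VM\.(pool_migrate|migrate_send)" /var/log/xensource.log | tail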
It seems like migrate_send will always reset CBT data during a migration, even if it's within the same pool on shared storage, and that this call is used when a default migration network is specified in XO's Pool - Advanced tab. Meanwhile, vm.pool_migrate will not reset CBT, but is only used when a default migration network is NOT set in XO's Pool - Advanced tab. I'm not sure how we work around that, short of not using a dedicated migration network?
-
RE: CBT: the thread to centralize your feedback
Thanks for the tip!
Looking at the output:
command name    : vm-migrate
reqd params     :
optional params : live, host, host-uuid, remote-master, remote-username, remote-password, remote-network, force, copy, compress, vif:, vdi:, <vm-selectors>
It does not appear there is a way for me to specify a migration network using the vm-migrate command?
It sounds to me like vm.migrate_send is causing CBT to be reset while vm.pool_migrate is leaving it intact? The difference being a migration that is known to stay within a pool vs. one that could potentially be migrating a VM anywhere?
-
RE: CBT: the thread to centralize your feedback
I think we have a pretty good idea of the cause now; it seems to be related to having a migration network specified at the pool level.
I think we are closer than ever to having this worked out, and it should help a lot of us using a dedicated migration network (as was best practice in VMware land). What are the next steps we need to take?
-
RE: CBT: the thread to centralize your feedback
@olivierlambert @MathieuRA Once you are able to provide me the xe migrate flag to specify a migration network, I will test this ASAP. I think we're really close to getting to the bottom of this issue!
-
RE: Replicating a Back Repository using ZFS send/Rsync
@olivierlambert Makes sense! I would schedule the replication to occur a couple of hours after the backup runs are complete, to ensure it's a replica of all the data and not a partial one. A rough sketch of what I mean is below.
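A minimal sketch, assuming the backup repository lives on a ZFS dataset (tank/xo-backups, dr-host and the 04:00 schedule are placeholders) and that the backup jobs finish a couple of hours before it runs from cron:
#!/bin/sh
# replicate-xo-backups.sh -- snapshot the backup dataset and send it to the DR box
set -eu
DS=tank/xo-backups
DEST=root@dr-host
TODAY=$(date +%F)
YESTERDAY=$(date -d yesterday +%F)

zfs snapshot "$DS@$TODAY"
if zfs list -t snapshot "$DS@$YESTERDAY" >/dev/null 2>&1; then
  # incremental send against yesterday's snapshot
  zfs send -i "$DS@$YESTERDAY" "$DS@$TODAY" | ssh "$DEST" zfs recv -F "$DS"
else
  # first run: full send
  zfs send "$DS@$TODAY" | ssh "$DEST" zfs recv -F "$DS"
fi
# e.g. from cron on the repository host: 0 4 * * * root /usr/local/sbin/replicate-xo-backups.sh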