So I already have a ticket open for this, but I thought I'd post here to draw on the knowledge of the community as well.
We have a 5-host pool with several backup and replication jobs configured in XOA. Randomly, a job will stall with some VMs only partially backed up (the estimated end time shoots up to 2-3 days and progress stops). The only way I can cancel it is to restart the toolstack on the pool master and restart the xo-server service on the XOA appliance, then let GC run and retry the failed VMs. The retry usually results in a full backup of the failed VM. This can happen to both CR jobs and regular backup jobs.
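For context, the recovery I go through each time is roughly this (a sketch of what I run; exact service names may vary depending on your setup):

# On the pool master (dom0): restart the toolstack
xe-toolstack-restart

# On the XOA appliance: restart the xo-server service
systemctl restart xo-server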
The error I see in the XOA log for the failed job is:
"message": "HTTP connection has timed out",
"name": "Error",
"stack": "Error: HTTP connection has timed out\n at ClientRequest.<anonymous> (/usr/local/lib/node_modules/xo-server/node_modules/http-request-plus/index.js:61:25)\n at ClientRequest.emit (node:events:518:28)\n at ClientRequest.patchedEmit [as emit] (/usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/log/configure.js:52:17)\n at TLSSocket.emitRequestTimeout (node:_http_client:849:9)\n at Object.onceWrapper (node:events:632:28)\n at TLSSocket.emit (node:events:530:35)\n at TLSSocket.patchedEmit [as emit] (/usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/log/configure.js:52:17)\n at Socket._onTimeout (node:net:595:8)\n at listOnTimeout (node:internal/timers:581:17)\n at process.processTimers (node:internal/timers:519:7)"
}
Now, I understand I can increase the timeout value, and that is what support has suggested, as does the XO documentation here:
https://docs.xen-orchestra.com/backup_troubleshooting#error-http-connection-has-timed-out
What I can't figure out is why this timeout occurs in the first place.
I have ensured the XOA appliance runs on the pool master at all times, and I have also ensured the pool master is the least loaded of the 5 hosts in our production pool, in an attempt to mitigate this, with no success. The pool master is only running 17 VMs and has 32 cores and 384 GB of RAM, with 8 GB assigned to dom0.
Each host has 4x 10GbE SFP+ ports: two are in a multi-chassis LAG (LACP) dedicated to storage (NFSv3), and the other two are in another multi-chassis LAG (LACP) dedicated to VM/management/backup traffic.
So I don't think anything is overloaded enough to cause this. Any suggestions, or something to look for? I have increased the timeout value for now (roughly as shown below), but the docs seem to imply I need to get to the bottom of what is causing it for a longer-term solution.
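For reference, this is roughly the workaround I applied on the XOA appliance, following the linked doc (the file name and config key are from memory, so double-check them against the page above before copying):

# Add a config override with a longer inactivity timeout for HTTP connections to XAPI
cat > /etc/xo-server/config.httpInactivityTimeout.toml <<'EOF'
[xapiOptions]
# timeout in milliseconds (here: 30 minutes)
httpInactivityTimeout = 1800000
EOF

# Restart xo-server so the new config is picked up
systemctl restart xo-server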