@olivierlambert Thank you for bringing this to the attention of other folks
Posts
-
RE: USB Passthrough has stopped working after update and updating usb-policy.conf
-
RE: USB Passthrough has stopped working after update and updating usb-policy.conf
@andriy.sultanov great, thank you for taking that on!
I must have skimmed over that part of the docs too quickly to notice the recommendation to run usb_scan.py.
I think it would be helpful to explain more verbosely how rule order and specificity impact the final outcome, as it's not clear to me. It seems that if I place a more specific rule for my Yubikey before a more general rule that would block devices of the same class, it works, but perhaps not in reverse?
I'm reminded of the Apache Order directive, where it can be Allow,Deny or Deny,Allow -- my dyslexic brain has a hard time keeping track of binaries.
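To illustrate what I mean, here's a hypothetical sketch of first-match rule evaluation (which my testing suggests is how the policy behaves); this is my own illustrative code, not xapi's, and the device attributes are made up for the example:

```python
def first_match(rules, device):
    """Return True (allow) or False (deny) based on the first matching rule."""
    for action, criteria in rules:
        if all(device.get(k) == v for k, v in criteria.items()):
            return action == "ALLOW"
    return False  # no rule matched; real policies end with a catch-all

# Hypothetical device attributes for a Yubikey's keyboard-like interface.
yubikey = {"vid": "1050", "pid": "0407",
           "class": "03", "subclass": "01", "prot": "01"}

# Specific ALLOW before the general class-based DENY: the Yubikey passes.
specific_first = [
    ("ALLOW", {"vid": "1050", "pid": "0407"}),
    ("DENY", {"class": "03", "subclass": "01", "prot": "01"}),  # HID Boot keyboards
    ("ALLOW", {}),  # catch-all
]
print(first_match(specific_first, yubikey))  # True

# Reversed order: the general DENY matches first and wins.
deny_first = [
    ("DENY", {"class": "03", "subclass": "01", "prot": "01"}),
    ("ALLOW", {"vid": "1050", "pid": "0407"}),
    ("ALLOW", {}),
]
print(first_match(deny_first, yubikey))  # False
```

If that mental model is right, the first rule whose criteria all match the device decides the outcome and later rules are never consulted, which would explain exactly the behavior I saw.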
-
RE: USB Passthrough has stopped working after update and updating usb-policy.conf
tl;dr - empty line(s) in /etc/xensource/usb-policy.conf crash /opt/xensource/libexec/usb_scan.py
I did a bit of scanning through the xapi source, in particular https://github.com/xapi-project/xen-api/blob/master/python3/libexec/usb_scan.py
I'm not a python expert, so I could generally follow the flow of things, but I wasn't totally sure what was happening at a detailed level. Then I did some googling and found this xenserver help doc regarding troubleshooting usb passthrough: https://support.citrix.com/external/article/235040/how-to-troubleshoot-xenserver-usb-passth.html
This article suggested running /opt/xensource/libexec/usb_scan.py with the -d parameter for additional details, and that led me to discover that the script fails when it encounters an empty line in usb-policy.conf:
[23:46 xcp-ng-4 ~]# /opt/xensource/libexec/usb_scan.py -d
Traceback (most recent call last):
  File "/opt/xensource/libexec/usb_scan.py", line 681, in <module>
    pusbs = make_pusbs_list(devices, interfaces)
  File "/opt/xensource/libexec/usb_scan.py", line 660, in make_pusbs_list
    policy = Policy()
  File "/opt/xensource/libexec/usb_scan.py", line 384, in __init__
    self.parse_line(line)
  File "/opt/xensource/libexec/usb_scan.py", line 444, in parse_line
    if action.lower() == "allow":
UnboundLocalError: local variable 'action' referenced before assignment
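For the curious, here's a minimal, self-contained reproduction of that failure pattern (my own sketch, not the actual usb_scan.py code): `action` only gets assigned when the line splits into an action and its matches, so an empty line skips the assignment and the later read blows up:

```python
def parse_line(line):
    fields = line.split(":")
    if len(fields) == 2:
        action, _matches = fields
    # An empty line yields len(fields) == 1, so `action` was never bound...
    if action.lower() == "allow":  # ...and this read raises UnboundLocalError
        return True
    return False

parse_line("ALLOW: vid=1050 pid=0407")  # fine
try:
    parse_line("")  # crashes, just like usb_scan.py on a blank line
except UnboundLocalError as e:
    print(e)
```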
After removing all empty lines, usb_scan.py produced output properly, but it was giving me an empty array:

[23:53 xcp-ng-4 ~]# /opt/xensource/libexec/usb_scan.py -d
[]

Since it was no longer crashing, I decided to go back to the default usb-policy.conf, then try adding only my single allow rule without any extra lines and test:

cp /etc/xensource/usb-policy.conf.default /etc/xensource/usb-policy.conf
I inserted a new line at line 10 just above the first rule and added my allow rule:
ALLOW:vid=1050 pid=0407
and that was all that was needed! Now I can see the device after running usb_scan.py:

[23:53 xcp-ng-4 ~]# /opt/xensource/libexec/usb_scan.py -d
[{"path": "2-1.1", "version": "2.00", "vendor-id": "1050", "product-id": "0407", "vendor-desc": "Yubico.com", "product-desc": "Yubikey 4 OTP+U2F+CCID", "speed": "12", "serial": "", "description": "Yubico.com_Yubikey 4 OTP+U2F+CCID"}]
I also learned that the last good output of xe pusb-scan seems to be cached somewhere and is quietly returned without hesitation when usb_scan.py fails. Maybe the failure is logged somewhere, but I don't know. In any case, it was as simple as an empty line -- don't take anything for granted!
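If you want to spot the offending blank lines up front, something like this works (a sketch using an inline sample instead of reading /etc/xensource/usb-policy.conf):

```python
# Report the line numbers of blank lines that would crash the parser.
sample = """DENY: vid=17e9 # All DisplayLink USB displays

ALLOW:vid=1050 pid=0407
"""

blank = [n for n, line in enumerate(sample.splitlines(), 1) if not line.strip()]
print(blank)  # [2]
```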
Lastly, I did a bit of testing to confirm that for my Yubikey to be detected and allowed, the allow rule must be BEFORE the rule that denies HID Boot Keyboards.
This results in detection:
# When you change this file, run 'xe pusb-scan' to confirm
# the file can be parsed correctly.
#
# Syntax is an ordered list of case insensitive rules where # is line comment
# and each rule is (ALLOW | DENY) : ( match )*
# and each match is (class|subclass|prot|vid|pid|rel) = hex-number
# Maximum hex value for class/subclass/prot is FF, and for vid/pid/rel is FFFF
#
# USB Hubs (class 09) are always denied, independently of the rules in this file
DENY: vid=17e9 # All DisplayLink USB displays
DENY: class=02 # Communications and CDC-Control
ALLOW:vid=056a pid=0315 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=0314 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=00fb class=03 # Wacom DTU tablet
# @jeff - allow passthrough of Yubikey 5 FIPS, "Yubikey 4 OTP+U2F+CCID"
ALLOW:vid=1050 pid=0407
DENY: class=03 subclass=01 prot=01 # HID Boot keyboards
DENY: class=03 subclass=01 prot=02 # HID Boot mice
DENY: class=0a # CDC-Data
DENY: class=0b # Smartcard
DENY: class=e0 # Wireless controller
DENY: class=ef subclass=04 # Miscellaneous network devices
ALLOW: # Otherwise allow everything else
This does not:
# When you change this file, run 'xe pusb-scan' to confirm
# the file can be parsed correctly.
#
# Syntax is an ordered list of case insensitive rules where # is line comment
# and each rule is (ALLOW | DENY) : ( match )*
# and each match is (class|subclass|prot|vid|pid|rel) = hex-number
# Maximum hex value for class/subclass/prot is FF, and for vid/pid/rel is FFFF
#
# USB Hubs (class 09) are always denied, independently of the rules in this file
DENY: vid=17e9 # All DisplayLink USB displays
DENY: class=02 # Communications and CDC-Control
ALLOW:vid=056a pid=0315 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=0314 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=00fb class=03 # Wacom DTU tablet
DENY: class=03 subclass=01 prot=01 # HID Boot keyboards
# @jeff - allow passthrough of Yubikey 5 FIPS, "Yubikey 4 OTP+U2F+CCID"
ALLOW:vid=1050 pid=0407
DENY: class=03 subclass=01 prot=02 # HID Boot mice
DENY: class=0a # CDC-Data
DENY: class=0b # Smartcard
DENY: class=e0 # Wireless controller
DENY: class=ef subclass=04 # Miscellaneous network devices
ALLOW: # Otherwise allow everything else
Thanks to everyone who took a look. Hopefully you don't get caught by this same gotcha!
-
RE: Xen Orchestra from Sources unreachable after applying XCPng Patch updates
For what it's worth, since you're building from sources, you might look into using a community tool that does most of the hard work for you:
- https://github.com/Jarli01/xenorchestra_installer
- https://github.com/ronivay/XenOrchestraInstallerUpdater (forked from above. I use this one in my homelab, personally)
These tools are given no official support from Vates as they are 3rd-party tools, and per Vates, you ought not be using the compiled-from-sources version in production, so YMMV.
IIRC Jarli01 is an active user on this forum but they have a different username here that I can't recall.
If nothing else, scripts like these make the process consistent which makes troubleshooting MUCH easier when things do go wrong.
Just a bit of food for thought.
Cheers -
RE: USB Passthrough has stopped working after update and updating usb-policy.conf
Does anyone have any suggestions for troubleshooting or investigating this further? I didn't really change the process that has worked through several updates, but now things are behaving differently.
-
RE: USB Passthrough has stopped working after update and updating usb-policy.conf
@knightjoel Thanks for the suggestion. Since my original message, I've tried moving my allow rule to the top, before any deny rules, and after any deny rules; I even tried commenting out all of the deny rules to see if any of those would make a difference. Unfortunately, none of them did.
I've tried simply saving the file and then initiating a xe pusb-scan on the host. I also tried rebooting to see if that would have an effect, but it doesn't seem to.
-
USB Passthrough has stopped working after update and updating usb-policy.conf
Hello,
As always, I am grateful for such an awesome open-source solution! Thank you all!
Asking regarding my home lab -- I have a VM whose disk is located on my master host's local storage and I've been using USB passthrough to pass a Yubikey to the VM.
I know that in the past, after an update, I often had to update /etc/xensource/usb-policy.conf to allow my YubiKey, reboot the host, then run xe pusb-scan host-uuid=<host-uuid> before I could configure passthrough of my YubiKey.

I installed updates the other day, but as the weather was hot and I wasn't using my homelab, I shut everything down before any of my TLS certificates expired, so I didn't notice that the usb-policy.conf file had been overwritten by the update (I know that this is to be expected).

Today, I booted up my system and noticed that the usb-policy.conf file had been overwritten during the update, so I backed up /etc/xensource/usb-policy.conf, then added the line plus comments that have historically allowed my YubiKey to be passed through.

Original /etc/xensource/usb-policy.conf:

# When you change this file, run 'xe pusb-scan' to confirm
# the file can be parsed correctly.
#
# Syntax is an ordered list of case insensitive rules where # is line comment
# and each rule is (ALLOW | DENY) : ( match )*
# and each match is (class|subclass|prot|vid|pid|rel) = hex-number
# Maximum hex value for class/subclass/prot is FF, and for vid/pid/rel is FFFF
#
# USB Hubs (class 09) are always denied, independently of the rules in this file
DENY: vid=17e9 # All DisplayLink USB displays
DENY: class=02 # Communications and CDC-Control
ALLOW:vid=056a pid=0315 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=0314 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=00fb class=03 # Wacom DTU tablet
DENY: class=03 subclass=01 prot=01 # HID Boot keyboards
DENY: class=03 subclass=01 prot=02 # HID Boot mice
DENY: class=0a # CDC-Data
DENY: class=0b # Smartcard
DENY: class=e0 # Wireless controller
DENY: class=ef subclass=04 # Miscellaneous network devices
ALLOW: # Otherwise allow everything else
Updated /etc/xensource/usb-policy.conf:

# When you change this file, run 'xe pusb-scan' to confirm
# the file can be parsed correctly.
#
# Syntax is an ordered list of case insensitive rules where # is line comment
# and each rule is (ALLOW | DENY) : ( match )*
# and each match is (class|subclass|prot|vid|pid|rel) = hex-number
# Maximum hex value for class/subclass/prot is FF, and for vid/pid/rel is FFFF
#
# USB Hubs (class 09) are always denied, independently of the rules in this file
DENY: vid=17e9 # All DisplayLink USB displays
DENY: class=02 # Communications and CDC-Control
ALLOW:vid=056a pid=0315 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=0314 class=03 # Wacom Intuos tablet
ALLOW:vid=056a pid=00fb class=03 # Wacom DTU tablet
DENY: class=03 subclass=01 prot=01 # HID Boot keyboards
DENY: class=03 subclass=01 prot=02 # HID Boot mice
DENY: class=0a # CDC-Data
DENY: class=0b # Smartcard
DENY: class=e0 # Wireless controller
DENY: class=ef subclass=04 # Miscellaneous network devices
### Jeff
# YubiKey 5 FIPS Series PID 0x0407 - YubiKey OTP+FIDO+CCID
ALLOW: VID=1050 PID=0407
ALLOW: # Otherwise allow everything else
Today, however, the YubiKey won't show up as a PUSB device when viewing the host's advanced tab, nor is it in the list of available devices when I attempt to create a VUSB for my VM.
I have rebooted the system, I have run xe pusb-scan host-uuid=... for the appropriate host, I have physically disconnected and reconnected the YubiKey, and I have powered the host down and back on, but running xe pusb-list doesn't show the YubiKey and I can't select it for passthrough.

When I run lsusb, I do see the YubiKey listed (though it is detected as a YubiKey 4 series instead of 5 series; I can't recall whether that's consistent with past behavior):

[14:03 xcp-ng-4 ~]# lsusb
Bus 002 Device 004: ID 1050:0407 Yubico.com Yubikey 4 OTP+U2F+CCID
Bus 002 Device 005: ID 413c:2113 Dell Computer Corp.
Bus 002 Device 003: ID 0557:8021 ATEN International Co., Ltd Hub
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0557:2221 ATEN International Co., Ltd Winbond Hermon
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
[14:03 xcp-ng-4 ~]#
Is there anything I'm missing? Any suggestions for where to look or what to check?
Thank you!
-
RE: Invalid Health Check SR causes Backup to fail with no error
@olivierlambert - I'm not sure who would be the best person at Vates to ping or whether there is another channel I should be using to request enhancements. I'm happy to be directed to the correct place if that's not here.
Despite the fact that I brought this upon myself...
I do think it would be nice if Xen Orchestra could improve the error handling/messaging for situations where a task fails due to an invalid object UUID. It seems like the UI is already making a simple XAPI call to look up the name-label of the SR, which, upon failure, results in the schedule configured with the invalid/unknown UUID displaying that UUID in red text with a red triangle.
-
RE: Default templates
It's also not hard to "copy" a template from one pool to another. So if you create your "golden image" template, you can just copy that template to another pool.
You can see the template Intangible Debian Bookworm 12 (Cloud Init)_2023-09-26T21:48:00.318Z that I originally created in my "performance" pool; later, when I set up my "efficiency" pool, I simply copied it to an SR in that pool.

In order for a pool to utilize a template, the template needs to be on one of the shared SRs within that pool. Once it has been copied to an SR in the destination pool, that pool can create new VMs using the template.
-
RE: Invalid Health Check SR causes Backup to fail with no error
@DustinB said in Invalid Health Check SR causes Backup to fail with no error:
but I hadn't made any changes to the shares or the underlying storage on that host so I really wasn't sure what could have caused it.
But you did make a change to the pool, you
Correct... and I spoke somewhat ambiguously. I was using the term "host" in the generic sense to describe the TrueNAS Scale machine that was hosting my backup SRs, not in the sense of a proper xcp-ng host. In retrospect, "NAS" would have been more appropriate.
I have 2 TrueNAS machines, tns-01 and tns-02. tns-01 is the "primary", with solid-state drives; it hosts both the old SR I had deleted and the new SR with which I replaced it. tns-02 is the "backup", with spinning drives; it hosts the SR where my backups are stored.
My backup jobs back up to the Remotes on tns-02, but I use the primary SR backed with solid-state drives for restoring health checks because I don't want to wait all night.
So I was confused, because I hadn't modified any of the Remotes or shares or anything on tns-02; but because my backup jobs use the old SR that I had removed from tns-01, they failed and didn't give me much information to figure out why.
If I wanted to externalize the responsibility, I would probably attribute it to the Health Check configuration living inside the schedule configuration, which has always seemed unintuitive to me, though that might just be my brain.
-
RE: Invalid Health Check SR causes Backup to fail with no error
@DustinB Yup, that's correct. I did it to myself!
I overlooked that the SR I had removed was being utilized for restoring Health Check VMs in that backup job... and a few others too -- yay, homelab fun! lol
Naturally, when I attempted to run the backup job, it failed, presumably because it detected that the UUID of the Health Check SR was invalid / not in the database; however, the error I got was essentially a default or fallback without any context-specific details. It feels like XO attempted to run the job, detected that the UUID wasn't valid, but didn't have a specific error message to describe the exception or erroneous situation that was caught/encountered.
I agree that it would be extra nice if the cautionary yellow triangle used to denote warnings elsewhere in the application could be used to denote a backup job with one or more "invalid" configuration entries.
Also, my guess is that XAPI is unaware of the Health Checks beyond Xen Orchestra using discrete calls to facilitate the health check process, and if that's the case, I suspect the error was generated by Xen Orchestra. If so, then I have hope that the XO devs could simply add an additional call, for example the equivalent of xe sr-list uuid={health-check-sr-uuid}, to validate that the SR does in fact exist.

I do quality assurance testing and report bugs for a living, but I'm not familiar with this exact codebase, so my message is intended for illustrative, inspirational purposes.
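To sketch what I have in mind (purely illustrative Python, not XO's actual internals; the function name and UUIDs are mine):

```python
# Fail early, with a specific message, when a job references an SR UUID
# that the pool no longer knows about.
def validate_health_check_sr(job_sr_uuid, known_sr_uuids):
    if job_sr_uuid not in known_sr_uuids:
        raise ValueError(
            f"Health Check SR {job_sr_uuid} does not exist; "
            "was it deleted? Update the backup schedule."
        )

known = {"sr-aaaa", "sr-bbbb"}  # hypothetical UUIDs of SRs XAPI knows about
validate_health_check_sr("sr-aaaa", known)  # passes silently
try:
    validate_health_check_sr("sr-gone", known)
except ValueError as e:
    print(e)  # a context-specific error instead of an empty one
```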
-
Invalid Health Check SR causes Backup to fail with no error
TL;DR - does your Health Check SR still exist? It turns out mine didn't!
This is a story about finding an unhandled edge case in the Xen Orchestra Backup [NG...? it's just the backup tool now and we don't call it "NG" anymore, right?] utility: when you delete the SR to which your backup job restores VMs for Health Checks, the job fails without much helpful information.
On the latest commit to master:
So I recently moved my VM disks to a new SR, made the new SR the pool default, and removed the previous SR. Then I noticed that my backups were failing with no error message, which was quite strange. I decided to update my Xen Orchestra "community edition" (installed using the ronivay XenOrchestraInstallerUpdater tool) to the latest master commit, but the issue persisted.
An example log from this evening before I solved the mystery:
{
  "data": {
    "mode": "delta",
    "reportWhen": "failure"
  },
  "id": "1740978958075",
  "jobId": "9017a533-4a2a-42ad-9319-cba19247e062",
  "jobName": "Daily Delta Backup of step-ca at 7:05pm",
  "message": "backup",
  "scheduleId": "a2229c74-dc47-42f6-90fd-a86ef7e6529d",
  "start": 1740978958075,
  "status": "failure",
  "end": 1740978959190,
  "result": {}
}
And when I went to the log entry under the Settings menu in Xen Orchestra, I saw an empty error message and this text when I clicked the eyeball icon to display details:
And it wasn't just one backup job either; over the course of the next day, 3 of 4 backup jobs that all point to different shares on the same backup host were failing -- but I hadn't made any changes to the shares or the underlying storage on that host, so I really wasn't sure what could have caused it. Anyway, it was the end of the weekend and time to go to bed.
This is all in my homelab, so it's not a big deal if I miss backups for a few days. I was doing this on the weekend near the end of February, and I knew I was a few days before an update, which is probably when a number of last-minute approved commits get merged, so I figured I would wait a few days for the dust to settle and it would sort itself out after I updated again at the next official release at the end of the month.
Just tonight I decided to update to the latest Xen Orchestra again, and my jobs still failed, immediately and with no error message. I did a bit of googling and found One of the backups fail with no error.
After skimming through, I noticed their reported results were really similar to mine, but I hadn't restored from a backup and I didn't know what was wrong. I figured it would be just as easy for me to follow the same advice given there: recreate the job.
As I was reviewing the schedule of the original job, I noticed that the Health Check SR was in red text and just showed an unknown UUID, which is when I realized that my backup jobs were still configured to restore Health Check VMs to the SR that I had destroyed; I had forgotten to update them to restore to the new SR.
So I am partly sharing a learning experience, and partly reporting that the error handling for this situation ought to be improved.
I have replicated this many times in my instance and I would be happy to provide any logs that might be useful beyond what I've already included.
Anyway, thanks for making this awesome project open-source so people like me can tinker at home.
-
RE: Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"
@tjkreidl I'm using the "Backup" feature of Xen Orchestra that was previously called Backup-ng, IIRC. A person creates a backup job that determines: the type of backup, i.e. Delta, Continuous Replication, etc. (those are the old names; the terminology is going to change with XO6/XO Lite); the destination "remote" for the backup; either a discrete list of VMs to back up or "smart mode", which is dynamic based on pools to include/exclude and VM tags to include/exclude; and lastly a schedule, which has an option to perform a health check (XO restores the backed-up VM to the SR of your choice, waits for it to boot successfully, then deletes the restored VM, since it was only temporary and not needed). The schedule displays the equivalent cron syntax, but I'm not sure whether that is implemented by cron or just displayed that way as a convenience.
AFAIK, the backup tool is a higher-level abstraction built on top of XAPI, but with additional niceties, like health checks in this particular case.
My two overlapping jobs both use "smart mode" to determine the list of VMs to back up based on the tags assigned to the VMs, and they both perform health checks. The first is a Delta backup that starts at midnight and usually completes fairly quickly, but sometimes it runs past 2am, when my other backup job starts (Continuous Replication to the local storage of one of my xcp-ng hosts).
The issue I'm encountering is that sometimes the second backup begins before the first is finished, and sometimes a health check VM is in the middle of booting, which results in the second backup job including that health check VM in its list of VMs to back up. Later, by the time the second job gets around to actually backing up the health check VM, that VM will have been deleted (the health check is complete), but the second job doesn't know that; so when it starts making XAPI calls against that health check VM's UUID, XAPI responds that no VM exists with that UUID, and XO reports the INVALID_UUID for that particular VM in the backup. Thankfully the backup job is smart enough to know that only that VM failed, and it continues with the other VMs.
-
RE: Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"
Thanks for the suggestion, @tjkreidl.
I'm not sure what commands I would run with this cron/systemd job/service.
I assume I would need to utilize the XO API to determine the list of running backups and then kill the second if the first is still running. The issue I see with your suggestion is that my backup log would end up with many failures, when I currently get only one, if any.
While this home-lab thing is a hobby and a platform for learning, I have a feeling that your suggestion would require me to invest time into learning how to build, and then building, a Rube Goldberg machine that I would become dependent upon; or I could let the seemingly amenable devs work on my low-hanging suggested improvement to their relatively new feature: backup health checks. I suppose I could also look into submitting a pull request.
Regardless, these backups don't hold anything critical per se; only the feeling of satisfaction I get from maintaining moderately resilient backups (I can't afford "3-2-1", but I can afford "2") and getting that sweet notification from my xcp-ng-hosted internal mail server that the backup was successful. TBH, I could lose "everything" and not really lose anything, because I still have the knowledge and experience, and it would give me an excuse to practice setting things up again from scratch.
Also, solutions like adding or upgrading hardware to speed up backups are not options at this point in life due to financial, electrical, and space limitations. As it stands, all of my hardware is 5-10+ years old and second-hand (probably 3rd-, 4th-hand or more in some cases -- several pieces were valued so poorly a few years ago that they were donated to Goodwill), and I have only a single 20A 120V circuit breaker powering all the lights and outlets in the upstairs of the apartment
-- the joys of being an American millennial who graduated high school with little familial wealth just before the Great Recession and has never managed to get a degree.
The neat thing is that these computers give me computational power and learning potential while heating our apartment, instead of turning on a heater, which only consumes money. I really need to move the computational heaters downstairs for more effective heating... one of these days!
TL;DR - After some consideration I don't think your suggestion fits my use case, but it did provide for a good thought experiment!
-
RE: how to get syslog to remote to work?
@djingo, FWIW, configuring with XO (from sources) has worked just fine for me. I actually just tested this the other day because I saw that my log server wasn't getting anything. When I looked at the pool settings I realized that I had made a typo. After I fixed it the logs started flowing.
TL;DR, I think just setting the remote syslog host is enough, but I could be wrong.
The host's logs might have some insights you can glean: https://xcp-ng.org/docs/troubleshooting.html#log-files
-
RE: Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"
@florent thank you! Please let me know if you would like any more information or further assistance from me. As of yet, the scenario I described is just a theory, as I wanted to get feedback about whether it is a reasonable hypothesis before attempting to conclusively replicate it.
Also, I realized the other day that I had a typo in my remote syslog host address (for who knows how long -- apparently I don't check my logs often, which I'm calling a sign of reliable tools and setup), so I don't have logs beyond the backup report, which doesn't give much more information than the UUID of a VM that no longer exists.
In any case, now that my logging is fixed, if I see this happen again, I'll try to gather more details and share them.
-
RE: VMs migrated from xcp-ng-3 to xcp-ng-3 (the same host!)
@pdonias thanks for the update! I took a look at the latest commit and apparently I was right even though it didn't feel right.
Thanks again to you all for this amazing set of FOSS tools!
-
Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"
Hello,
I have two backup jobs that I attempted to offset to prevent them from running at the same time, but sometimes they take longer than others and they end up overlapping which causes problems when using "smart mode" to match VMs to backup by their tags.
I've noticed that sometimes, if a health check VM from backup job A is being "restored" while the other job is running, I will get the UUID_INVALID error for a single VM that doesn't exist. I suspect that backup job B is attempting to back up the health check VM because it has matching tags, but then the health check VM is deleted after the check is complete, which triggers the error I'm seeing.

Obviously, I could make efforts to avoid the two backup jobs running at the same time, but I'm hoping that there may be some sort of tag applied to a health check VM that indicates it is being used for a health check, which would allow me to configure "smart mode" to exclude those VMs.

If this isn't already a feature, I would like to vote for it being added -- the tag could be something like xo-backup-healthcheck, which would fit with similar tags.

Any other advice or suggestions are appreciated as well.
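To make the idea concrete, here's a hedged sketch of how smart mode could skip health check VMs via such a tag (the tag name and all of this code are my suggestion, not an existing XO feature):

```python
HEALTHCHECK_TAG = "xo-backup-healthcheck"  # hypothetical well-known tag

def smart_mode_select(vms, include_tags, exclude_tags=(HEALTHCHECK_TAG,)):
    """Pick VMs with at least one include tag and no exclude tag."""
    return [
        vm["name"] for vm in vms
        if set(vm["tags"]) & set(include_tags)
        and not set(vm["tags"]) & set(exclude_tags)
    ]

vms = [
    {"name": "step-ca", "tags": ["backup-daily"]},
    {"name": "step-ca (health check)", "tags": ["backup-daily", HEALTHCHECK_TAG]},
]
print(smart_mode_select(vms, ["backup-daily"]))  # ['step-ca']
```

With something like this, an in-flight health check VM would never make it into job B's list in the first place, even if the two schedules overlap.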
Thanks again!
-
VMs migrated from xcp-ng-3 to xcp-ng-3 (the same host!)
Hello. I don't have a bug to report per se, but more of a curious observation. I'm looking to see if there is an obvious explanation for what I saw that I'm not familiar with. Also not sure if this belongs here or in the XO category.
I'm using XO built from sources (just updated to commit 3c047 this morning after the 5.87 Release).
I have 3 xcp-ng 8.2.1 hosts: xcp-ng-1, xcp-ng-2, and xcp-ng-3 (master). To save power, I have been using only xcp-ng-1 and xcp-ng-3 while keeping xcp-ng-2 powered off. I have several VMs with host-affinity set to xcp-ng-3 because it has slower, lower-TDP CPUs, and I want to reserve xcp-ng-1 for VMs running game servers since it has faster CPUs.
Today I powered up xcp-ng-2 to perform a rolling pool update which went smoothly and was uneventful except that once the final host had updated, several VMs appeared to migrate from xcp-ng-3 to xcp-ng-3 (the same host, no typo) which seemed strange to me...
I checked in several places -- the list of running VMs, the VMs themselves, the list of hosts, tasks, etc. -- but they all consistently showed that the VMs were being migrated from xcp-ng-3 to xcp-ng-3, and the only machine that was "busy" (yellow/amber) was xcp-ng-3, which is the pool master.
Perhaps the load balancer plugin decided that it should run after the rolling pool update to redistribute compute resources, and some of the VMs it picked had host-affinity for the host they were already running on, which resulted in them being "migrated" to the same host? This sounds silly to say, haha -- I didn't think it was possible, or that there would ever be a reason to migrate this way, and I'm feeling a bit gaslit.
All jokes aside, I'm "reporting" this behavior in case it might be indicative to the devs of some greater issue, but mostly I'm curious why this would happen. Has anyone seen something like this before?
Thanks in advance!
EDIT Here's a screenshot of my task history that shows what I saw: