XCP-ng

    Posts

    • RE: Invalid Health Check SR causes Backup to fail with no error

      @olivierlambert - I'm not sure who would be the best person at Vates to ping or whether there is another channel I should be using to request enhancements. I'm happy to be directed to the correct place if that's not here.

      Despite the fact that I brought this upon myself... 😅 I do think it would be nice if Xen Orchestra could improve the error handling/messaging for situations where a task fails due to an invalid object UUID. It seems like the UI is already making a simple XAPI call to look up the name-label of the SR. When that lookup fails, the schedule configured with the invalid/unknown UUID displays that UUID in red text with a red triangle.

      posted in Backup
      techjeff
    • RE: Default templates

      @irtaza9 happy to help!

      posted in Management
      techjeff
    • RE: Default templates

      It's also not hard to "copy" a template from one pool to another. So if you create your "golden image" template, you can just copy that template to another pool.

      You can see the template Intangible Debian Bookworm 12 (Cloud Init)_2023-09-26T21:48:00.318Z that I originally created in my "performance" pool, then later when I set up my "efficiency" pool, I simply copied to an SR in my "efficiency" pool.

      Screenshot 2025-03-03 153911.png

      In order for a pool to utilize a template, the template needs to be within one of the shared SRs within that pool. Once it has been copied to an SR in the destination pool, that pool can now create new VMs using that template.

      posted in Management
      techjeff
    • RE: Invalid Health Check SR causes Backup to fail with no error

      @DustinB said in Invalid Health Check SR causes Backup to fail with no error:

      but I hadn't made any changes to the shares or the underlying storage on that host so I really wasn't sure what could have caused it.
      

      But you did make a change to the pool, you

      Correct... And, I spoke somewhat ambiguously. I was using the term "host" in the generic sense to describe the TrueNAS Scale that was hosting my backup SRs, not in the sense of a proper xcp-ng host. In retrospect, "NAS" would have been more appropriate.

      I have 2 TrueNASs, tns-01 and tns-02. tns-01 is the "primary" with solid state drives which hosts both the Old SR I had deleted and the new SR with which I replaced it. tns-02 is the "backup" with spinning drives and it hosts the SR where my backups are stored.

      My backup jobs back up to the Remotes on tns-02, but I use the primary SR backed by solid state drives for restoring health check VMs because I don't want to wait all night.

      So I was confused because I hadn't modified any of the Remotes or shares or anything on tns-02, but because my backup jobs were still using the old SR that I had removed from tns-01 for health checks, they failed without giving me much information to figure out why.

      If I wanted to externalize the responsibility, I would probably attribute it to the Health Check configuration living inside the schedule configuration, which has always seemed unintuitive to me, though that might just be my brain 😁

      posted in Backup
      techjeff
    • RE: Invalid Health Check SR causes Backup to fail with no error

      @DustinB Yup, that's correct. I did it to myself! 😆 I overlooked that the SR I had removed was being utilized for restoring Health Check VMs in that Backup job...and a few others too--yay homelab fun! lol

      Naturally, when I attempted to run the backup job it failed, presumably because it detected that the UUID of the Health Check SR was invalid / not in the database; however, the error I got was essentially a default or fallback without any context-specific details. This feels like XO attempted to run the job, detected that the UUID wasn't valid, but didn't have a specific error message to describe the exception or erroneous situation that was caught/encountered.

      I agree that it would be extra nice if the cautionary yellow triangle used to denote warnings elsewhere in the application could be used to denote a backup job with one or more "invalid" configuration entries.

      Also, my guess is that XAPI is unaware of the Health Checks beyond Xen Orchestra using discrete calls to facilitate the health check process. If that's the case, I suspect the error was generated by Xen Orchestra itself, and I have hope that the XO devs could simply add an additional call, for example xe sr-list uuid={health-check-sr-uuid}, to validate that the SR does in fact exist.
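
As a rough illustration of the pre-flight check suggested above, here is a minimal Python sketch (Xen Orchestra itself is JavaScript, so this is not its actual code). `known_sr_uuids` stands in for whatever SR lookup XO would perform, and `validate_schedule`, `InvalidHealthCheckSr`, and the `healthCheckSr` key are hypothetical names chosen only for this example.

```python
class InvalidHealthCheckSr(Exception):
    """Raised when a schedule points at an SR that no longer exists."""


def validate_schedule(schedule: dict, known_sr_uuids: set) -> None:
    """Fail early, with context, if the health check SR is unknown.

    `schedule` is a hypothetical dict; the "healthCheckSr" key is an
    assumption for this sketch, not XO's confirmed schema.
    """
    sr_uuid = schedule.get("healthCheckSr")
    if sr_uuid is None:
        return  # no health check configured, nothing to validate
    if sr_uuid not in known_sr_uuids:
        raise InvalidHealthCheckSr(
            f"schedule {schedule['id']}: health check SR {sr_uuid} "
            "does not exist (was it removed?)"
        )


# Example: the old SR was deleted, so only the new SR is known.
known = {"new-sr-uuid"}
schedule = {"id": "example-schedule", "healthCheckSr": "old-sr-uuid"}
try:
    validate_schedule(schedule, known)
    error_message = ""
except InvalidHealthCheckSr as exc:
    error_message = str(exc)
print(error_message)
```

The point of the sketch is only that the failure carries the schedule and the missing SR UUID, rather than an empty result.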

      I do quality assurance testing and report bugs for a living, but I'm not familiar with this exact codebase, so my message is intended for illustrative, inspirational purposes.

      posted in Backup
      techjeff
    • Invalid Health Check SR causes Backup to fail with no error

      TL;DR - does your Health Check SR still exist? It turns out mine didn't!

      This is a story about finding an unhandled edge case in the Xen Orchestra Backup [NG...? it's just the backup tool now and we don't call it "NG" anymore, right?] utility: when you delete the SR to which your backup job restores VMs for Health Checks, the job fails without much helpful information.

      On the latest commit to master:

      Screenshot from 2025-03-02 22-44-52.png

      So I recently moved my VM disks to a new SR, made the new SR the pool default, and removed the previous SR. Then I noticed that my backups were failing and I was getting no error message. It was quite strange. I decided to update my Xen Orchestra "community edition" (installed using the ronivay XenOrchestraInstallerUpdater tool) to the latest master commit, but the issue was still happening.

      Screenshot from 2025-03-02 21-55-55.png

      An example log from this evening before I solved the mystery:

      {
        "data": {
          "mode": "delta",
          "reportWhen": "failure"
        },
        "id": "1740978958075",
        "jobId": "9017a533-4a2a-42ad-9319-cba19247e062",
        "jobName": "Daily Delta Backup of step-ca at 7:05pm",
        "message": "backup",
        "scheduleId": "a2229c74-dc47-42f6-90fd-a86ef7e6529d",
        "start": 1740978958075,
        "status": "failure",
        "end": 1740978959190,
        "result": {}
      }
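
To make the problem with this log concrete, here is a small Python sketch that flags entries whose status is "failure" but whose result carries no details. The helper name `is_uninformative_failure` is made up for this example; the field values are copied from the log above.

```python
import json


def is_uninformative_failure(log: dict) -> bool:
    """True when a log entry failed but carries no error details."""
    return log.get("status") == "failure" and not log.get("result")


log_text = """
{
  "jobName": "Daily Delta Backup of step-ca at 7:05pm",
  "status": "failure",
  "start": 1740978958075,
  "end": 1740978959190,
  "result": {}
}
"""
log = json.loads(log_text)
duration_ms = log["end"] - log["start"]
print(is_uninformative_failure(log), duration_ms)  # True 1115
```

The timestamps show the job ending about 1.1 seconds after it started, which matches the "fails immediately, with no error" behavior described in this post.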
      

      And when I went to the log entry under the Settings menu in Xen Orchestra, I saw an empty error message and this text when I clicked the eyeball icon to display details:

      Screenshot from 2025-03-02 21-59-18.png

      Screenshot from 2025-03-02 21-55-32.png

      And it wasn't just one backup job either. Over the course of the next day, 3/4 backup jobs that all point to different shares on the same backup host were failing -- but I hadn't made any changes to the shares or the underlying storage on that host, so I really wasn't sure what could have caused it. Anyway, it was the end of the weekend and time to go to bed.

      This is all in my homelab, so it's not a big deal if I miss backups for a few days. I was doing this on a weekend near the end of February, a few days before the monthly release, which is probably when a number of last-minute approved commits get merged. So I figured I would wait a few days for the dust to settle and it would sort itself out once I updated again after the next official release at the end of the month.

      Just tonight I decided to update to the latest Xen Orchestra again, and my jobs still failed, like immediately with no error message. I did a bit of googling and found One of the backups fail with no error.

      After skimming through I noticed their reported results were really similar to mine, but I hadn't restored from a backup and I didn't know what was wrong. I figured it would be just as easy for me to follow the same advice given, however: to recreate the job.

      As I was referencing the schedule of the original job, I noticed that the Health Check SR was in red text and just showed an unknown UUID. That's when I realized that my backup jobs were still configured to restore Health Check VMs to the SR that I had destroyed; I had forgotten to update my jobs to restore to the new SR.

      Screenshot from 2025-03-02 21-46-21.png

      So, I am partially sharing a learning experience, and also reporting that the error handling for this situation ought to be improved.

      I have replicated this many times in my instance and I would be happy to provide any logs that might be useful beyond what I've already included.

      Anyway, thanks for making this awesome project open-source so people like me can tinker at home.

      posted in Backup
      techjeff
    • RE: Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"

      @tjkreidl I'm using the "Backup" feature of Xen Orchestra that was previously called Backup-ng, IIRC. A person creates a backup job that determines: the type of backup, e.g. Delta, Continuous Replication, etc. (those are the old names, though the terminology is going to be changing with XO6/XOLite); the destination "remote" for the backup; either a discrete list of VMs to back up or "smart mode", which is dynamic based on pools to in/exclude and VM tags to in/exclude; and lastly a schedule, which has an option to perform a health check (XO restores the backed up VM to the SR of your choice, waits for it to boot successfully, then deletes the restored VM since it was only temporary and not needed). The schedule displays the equivalent cron job syntax, but I'm not sure whether that is implemented by cron or if it's just displayed like that as a convenience.

      AFAIK, the backup tool is a higher-level abstraction built on top of XAPI, but with additional niceties, like health checks in this particular case.

      My two overlapping jobs are both using "smart mode" to determine the list of VMs to back up based on the tags assigned to the VMs, and they both perform health checks. The first is a Delta backup that starts at midnight and usually completes fairly quickly, but sometimes it runs later than 2am, when my other backup job starts (Continuous Replication to the local storage of one of my xcp-ng hosts).

      The issue I'm encountering is that sometimes the second backup begins before the first is finished while a healthcheck VM is in the middle of booting, which results in the second backup job including that healthcheck VM in the list of VMs that it needs to back up. Later, by the time the second backup gets around to actually backing up the healthcheck VM, that VM will have been deleted (the health check is complete), but the second backup job doesn't know that it was deleted. So when it starts making XAPI calls against that healthcheck VM's UUID, XAPI responds indicating that no VM exists with that UUID and XO reports the UUID_INVALID error for that particular VM in the backup. Thankfully the backup job is smart enough to know that only that VM failed and it continues with the other VMs.
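
The suspected sequence can be modeled with a toy Python simulation. This is only a sketch of the hypothesis, not XO's real scheduling code: `pool`, `back_up`, and `UuidInvalid` are invented stand-ins, and the assumption that the restored health check VM carries the same tags comes from the description in this thread.

```python
class UuidInvalid(Exception):
    """Stand-in for the UUID_INVALID error described above."""


def back_up(vm_uuid: str, pool: dict) -> str:
    """Pretend to back up a VM; fail if it no longer exists in the pool."""
    if vm_uuid not in pool:
        raise UuidInvalid(vm_uuid)
    return "success"


# Pool state when job B starts: job A's health check VM is still booting.
pool = {
    "vm-1": {"tags": ["backup"]},
    "healthcheck-vm": {"tags": ["backup"]},  # assumed to carry matching tags
}

# Job B resolves its smart-mode selection once, up front.
selected = [uuid for uuid, vm in pool.items() if "backup" in vm["tags"]]

# Job A finishes its health check and deletes the temporary VM.
del pool["healthcheck-vm"]

# Job B works through its now-stale list; one failure does not stop the rest.
results = {}
for uuid in selected:
    try:
        results[uuid] = back_up(uuid, pool)
    except UuidInvalid:
        results[uuid] = "UUID_INVALID"
print(results)  # {'vm-1': 'success', 'healthcheck-vm': 'UUID_INVALID'}
```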

      posted in Backup
      techjeff
    • RE: Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"

      Thanks for the suggestion, @tjkreidl.

      I'm not sure what commands I would run with this cron/systemd job/service.

      🤔

      I assume I would need to utilize the XO API calls to determine the list of running backups and then kill the second if the first is still running. The issue I see with your suggestion is that my backup log would end up with many failures, whereas I currently get just one, if any.

      While this home-lab thing is a hobby and platform for learning, I have a feeling that your suggestion would require that I invest time into learning how to build, and then building, a Rube Goldberg machine that would result in me becoming dependent upon it. Or I could let the seemingly amenable devs work on my low-hanging suggested improvement to their relatively new feature: backup health checks. I suppose I could also look into submitting a pull request 🤔

      Regardless, these backups don't hold anything critical per se; only the feeling of satisfaction I get from maintaining moderately resilient backups (I can't afford "3-2-1", but I can afford "2") and getting that sweet notification from my xcp-ng hosted internal mail server that the backup was successful. TBH, I could lose "everything" and not really lose anything because I still have the knowledge and experience and it would give me the excuse to practice setting things up again from scratch.

      Also, solutions like adding/upgrading hardware to speed up backups are not options at this point in life due to financial, electrical, and space limitations. As it stands, all of my hardware is 5-10+ years old, second-hand (probably 3rd, 4th or more in some cases--several pieces were donated to GoodWill they were so poorly valued several years ago), and I have only a single 20A 120V circuit breaker powering all lights and outlets in the upstairs of the apartment 😢 -- the joys of being an American millennial that graduated high school with little familial wealth just before the great recession that has never managed to get a degree 😑

      The neat thing is that these computers give me computational power and learning potential while heating our apartment instead of turning on a heater which only consumes money. I really need to move the computational heaters downstairs for more effective heating.. one of these days! 🤣

      TL;DR - After some consideration I don't think your suggestion fits my use case, but it did provide for a good thought experiment!

      posted in Backup
      techjeff
    • RE: how to get syslog to remote to work?

      @djingo, FWIW, configuring with XO (from sources) has worked just fine for me. I actually just tested this the other day because I saw that my log server wasn't getting anything. When I looked at the pool settings I realized that I had made a typo. After I fixed it the logs started flowing.

      TL;DR, I think just setting the remote syslog host is enough, but I could be wrong.

      The host's logs might have some insights you can glean: https://xcp-ng.org/docs/troubleshooting.html#log-files

      posted in Xen Orchestra
      techjeff
    • RE: Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"

      @florent thank you! Please let me know if you would like any more information or further assistance from me. As of yet, the scenario I described is just a theory, as I wanted to get feedback about whether it is a reasonable hypothesis before attempting to conclusively replicate it.

      Also, I realized the other day that I had a typo in my remote syslog host address (for who knows how long--apparently I don't check my logs often, which I'm calling a sign of reliable tools and setup 😂 ) so I don't have logs beyond the backup report, which doesn't give much more information than the UUID of a VM that doesn't exist anymore.

      In any case, now that my logging is fixed, if I see this happen again, I'll try to gather more details and share them.

      posted in Backup
      techjeff
    • RE: VMs migrated from xcp-ng-3 to xcp-ng-3 (the same host!)

      @pdonias thanks for the update! I took a look at the latest commit and apparently I was right even though it didn't feel right. 😂

      Thanks again to you all for this amazing set of FOSS tools!

      posted in Advanced features
      techjeff
    • Overlapping backup schedules - healthcheck vms lead to "UUID_INVALID"

      Hello,

      I have two backup jobs that I attempted to offset to prevent them from running at the same time, but sometimes they take longer than usual and end up overlapping, which causes problems when using "smart mode" to match VMs to back up by their tags.

      I've noticed that sometimes, if a health check VM from backup job A is being "restored" while the other job is running, I get the UUID_INVALID error for a single VM that doesn't exist. I suspect that backup job B is attempting to back up the healthcheck VM because it has matching tags, but then the healthcheck VM is deleted after the check is complete, which triggers the error I'm seeing.

      Obviously, I could make efforts to avoid the two backup jobs running at the same time, but I'm hoping that there may be some sort of tag applied to a healthcheck VM that indicates that it is being used for a health-check which would allow me to configure the "smart mode" to exclude those VMs.

      If this isn't already a feature, I would like to vote for it being added -- the tag could be something like xo-backup-healthcheck, which would fit with similar tags.
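
A minimal Python sketch of how such a tag could be used for exclusion. The tag name xo-backup-healthcheck is the one proposed in this post, not a confirmed existing tag, and `smart_mode_select` is a hypothetical helper rather than XO's real selection logic.

```python
def smart_mode_select(vms: dict, include_tags: set, exclude_tags: set) -> list:
    """Return UUIDs with at least one included tag and no excluded tag."""
    return [
        uuid
        for uuid, vm in vms.items()
        if (include_tags & set(vm["tags"]))
        and not (exclude_tags & set(vm["tags"]))
    ]


vms = {
    "vm-1": {"tags": ["backup"]},
    # The restored copy keeps the original tags plus the proposed marker tag.
    "healthcheck-vm": {"tags": ["backup", "xo-backup-healthcheck"]},
}
selected = smart_mode_select(vms, {"backup"}, {"xo-backup-healthcheck"})
print(selected)  # ['vm-1']
```

With the marker tag excluded, the temporary VM never enters job B's list, so there is nothing left to go missing mid-run.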

      Any other advice or suggestions are appreciated as well.

      Thanks again!

      posted in Backup
      techjeff
    • VMs migrated from xcp-ng-3 to xcp-ng-3 (the same host!)

      Hello. I don't have a bug to report per se, but more of a curious observation. I'm looking to see if there is an obvious explanation for what I saw that I'm not familiar with. Also not sure if this belongs here or in the XO category.

      I'm using XO built from sources (just updated to commit 3c047 this morning after the 5.87 Release).

      I have 3 xcp-ng 8.2.1 hosts, xcp-ng-1, xcp-ng-2, xcp-ng-3 (master). To save power, I have been only using xcp-ng-1 and xcp-ng-3 while keeping xcp-ng-2 powered off. I have several VMs with host-affinity set to xcp-ng-3 because it has slower, lower TDP CPUs and I want to reserve xcp-ng-1 for VMs running game servers as it has faster CPUs.

      Today I powered up xcp-ng-2 to perform a rolling pool update which went smoothly and was uneventful except that once the final host had updated, several VMs appeared to migrate from xcp-ng-3 to xcp-ng-3 (the same host, no typo) which seemed strange to me...

      I checked in several places--list of running VMs, the VMs themselves, list of hosts, tasks, etc.--but they all consistently showed that the VMs were being migrated from xcp-ng-3 to xcp-ng-3 and the only machine that was "busy" (yellow/amber) was xcp-ng-3 which is the pool master.

      Perhaps the load balancer plugin decided that it should run after the rolling pool update to redistribute compute resources and some of the VMs it picked had host-affinity for the host they were already running on which resulted in them being migrated to the same host? This sounds silly to say haha -- I didn't think it was possible or that there would ever be a reason to migrate this way and I'm feeling a bit gaslit 🤣
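
Purely to illustrate that guess (it says nothing about how the load balancer plugin or the rolling pool update actually works), here is a Python sketch of a migration planner that skips no-op moves. `plan_migrations` and the dict layout are invented for this example.

```python
def plan_migrations(vms: list) -> list:
    """Return (vm, source, destination) moves, skipping no-op migrations."""
    moves = []
    for vm in vms:
        destination = vm.get("affinity") or vm["host"]
        if destination == vm["host"]:
            continue  # already on the preferred host, nothing to migrate
        moves.append((vm["name"], vm["host"], destination))
    return moves


vms = [
    {"name": "vm-a", "host": "xcp-ng-3", "affinity": "xcp-ng-3"},
    {"name": "vm-b", "host": "xcp-ng-1", "affinity": "xcp-ng-3"},
]
moves = plan_migrations(vms)
print(moves)  # [('vm-b', 'xcp-ng-1', 'xcp-ng-3')]
```

Without the source/destination comparison, vm-a would show up as a migration from xcp-ng-3 to xcp-ng-3, which is the kind of entry seen in the task list.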

      All jokes aside, I'm "reporting" this behavior in case it might be indicative to devs of some greater issue, but mostly I'm curious why this would happen. Has anyone seen something like this before?

      Thanks in advance! 🙂

      EDIT Here's a screenshot of my task history that shows what I saw:
      Screenshot from 2023-09-29 13-23-00.png

      posted in Advanced features
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      @olivierlambert Thank you and your team again for your commitment to this fantastic FOSS tool and for allowing me to build it myself!

      I very much appreciate the personal touch of my issues being triaged by the CEO and Co-Founder. It's refreshing to see an executive officer stay in touch with their customer base.

      posted in Xen Orchestra
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      @florent the CR job was completed with health checks. The issue appears to be fixed in the fix_cr_healthcheck branch.

      posted in Xen Orchestra
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      @florent I reviewed the change you made and got slightly embarrassed that my brain didn't notice the missing "l" when I had previously reviewed MixinXapiWriter.mjs 😆

      After checking out the fix_cr_healthcheck branch and rebuilding, I restarted the backup of just one VM in the backup job and it was successful!

      Now, I will run the full backup job and report back for the sake of being thorough.

      posted in Xen Orchestra
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      @florent thank you for your effort as well!

      I will check out that branch, attempt to restart the backup job for one of the small VMs, then I'll report back here, hopefully before the end of the day (UTC-7).

      posted in Xen Orchestra
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      Hi, @florent. The CR job is configured for "Normal" snapshots, not with memory.

      Also, for curiosity's sake, I decided to rebuild this same rule from scratch--the hypothesis was that the record of the original CR job could have been somehow "broken" or now incompatible. The backup job was rebuilt with exactly the same settings apart from "Rebuild of " being added to the beginning of the job and schedule names.

      Because it's a new CR backup of ~20 VMs to really cheap spinning disks, the backup is still in progress after 8 hours (another delta-backup job also started ~30 minutes after I started the new job), but so far it has yielded the same results: apart from the 2 VMs that are in progress, all other VMs have failed with the same TypeError.

      I'm currently running Xen Orchestra, commit abd0a built from sources using ronivay's XenOrchestraInstallerUpdater.

      posted in Xen Orchestra
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      Thank you for your assistance, @florent! Please let me know if you need additional information or want me to perform tests.

      posted in Xen Orchestra
      techjeff
    • RE: Continuous Replication job fails "TypeError: Cannot read properties of undefined (reading 'uuid')" at #isAlreadyOnHealthCheckSr

      @florent I just updated to Xen Orchestra, commit 6b936, then attempted to restart the failed backup (only a single VM, not the whole job) and I am still getting the error.

      It looks like there was another commit about an hour ago, but it doesn't look to be related to this at all.

      Here's the full JSON error log:

      {
        "data": {
          "mode": "delta",
          "reportWhen": "failure"
        },
        "id": "1690565798795",
        "jobId": "7ee43819-d12c-416c-ad07-95dad15bc47d",
        "jobName": "Weekly continuous replication of core services and Work VMs to TrueNAS - NFS - HDDs-3.5in - VM Backups at 2am on Mondays",
        "message": "backup",
        "scheduleId": "d9076c80-5b4f-4f8e-ae59-9621ed2575c6",
        "start": 1690565798795,
        "status": "failure",
        "infos": [
          {
            "data": {
              "vms": [
                "69ed2d54-b28a-4ec5-195f-dee283f730cf"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "69ed2d54-b28a-4ec5-195f-dee283f730cf",
              "name_label": "mgmt-win10.jeff.intangible.home.arpa"
            },
            "id": "1690565809617",
            "message": "backup VM",
            "start": 1690565809617,
            "status": "failure",
            "tasks": [
              {
                "id": "1690565811948",
                "message": "snapshot",
                "start": 1690565811948,
                "status": "success",
                "end": 1690565818268,
                "result": "97927cb2-10fc-e0ff-5c6a-0e3a0b3e405d"
              },
              {
                "data": {
                  "id": "abbf0a1c-d358-3e0d-f697-94791671b9d9",
                  "isFull": false,
                  "name_label": "TrueNAS - NFS - HDDs-3.5in - VM Backups",
                  "type": "SR"
                },
                "id": "1690565818269",
                "message": "export",
                "start": 1690565818269,
                "status": "failure",
                "tasks": [
                  {
                    "id": "1690565821451",
                    "message": "transfer",
                    "start": 1690565821451,
                    "status": "success",
                    "end": 1690565843248,
                    "result": {
                      "size": 312320
                    }
                  },
                  {
                    "id": "1690565868762",
                    "message": "health check",
                    "start": 1690565868762,
                    "status": "failure",
                    "end": 1690565868794,
                    "result": {
                      "message": "Cannot read properties of undefined (reading 'uuid')",
                      "name": "TypeError",
                      "stack": "TypeError: Cannot read properties of undefined (reading 'uuid')\n    at #isAlreadyOnHealthCheckSr (file:///opt/xo/xo-builds/xen-orchestra-202307280948/@xen-orchestra/backups/_runners/_writers/_MixinXapiWriter.mjs:21:49)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///opt/xo/xo-builds/xen-orchestra-202307280948/@xen-orchestra/backups/_runners/_writers/_MixinXapiWriter.mjs:43:17"
                    }
                  }
                ],
                "end": 1690565868794,
                "result": {
                  "message": "Cannot read properties of undefined (reading 'uuid')",
                  "name": "TypeError",
                  "stack": "TypeError: Cannot read properties of undefined (reading 'uuid')\n    at #isAlreadyOnHealthCheckSr (file:///opt/xo/xo-builds/xen-orchestra-202307280948/@xen-orchestra/backups/_runners/_writers/_MixinXapiWriter.mjs:21:49)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///opt/xo/xo-builds/xen-orchestra-202307280948/@xen-orchestra/backups/_runners/_writers/_MixinXapiWriter.mjs:43:17"
                }
              }
            ],
            "end": 1690565868806,
            "result": {
              "message": "Cannot read properties of undefined (reading 'uuid')",
              "name": "TypeError",
              "stack": "TypeError: Cannot read properties of undefined (reading 'uuid')\n    at #isAlreadyOnHealthCheckSr (file:///opt/xo/xo-builds/xen-orchestra-202307280948/@xen-orchestra/backups/_runners/_writers/_MixinXapiWriter.mjs:21:49)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///opt/xo/xo-builds/xen-orchestra-202307280948/@xen-orchestra/backups/_runners/_writers/_MixinXapiWriter.mjs:43:17"
            }
          }
        ],
        "end": 1690565868807
      }
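
Because the same TypeError is repeated at every level of the nested log, it can help to walk the task tree and report only the innermost failing step. The Python sketch below does that on a trimmed copy of the log above (IDs, timestamps, and stack traces omitted); `failed_leaf_tasks` is a hypothetical helper, not part of Xen Orchestra.

```python
def failed_leaf_tasks(task: dict, path: tuple = ()) -> list:
    """Collect (path, error message) for failed tasks with no failed children."""
    here = path + (task.get("message", "?"),)
    failed_children = [
        t for t in task.get("tasks", []) if t.get("status") == "failure"
    ]
    if task.get("status") == "failure" and not failed_children:
        result = task.get("result")
        message = result.get("message", "") if isinstance(result, dict) else ""
        return [(" > ".join(here), message)]
    found = []
    for child in failed_children:
        found.extend(failed_leaf_tasks(child, here))
    return found


error = "Cannot read properties of undefined (reading 'uuid')"
log = {
    "message": "backup",
    "status": "failure",
    "tasks": [{
        "message": "backup VM",
        "status": "failure",
        "tasks": [
            {"message": "snapshot", "status": "success"},
            {
                "message": "export",
                "status": "failure",
                "tasks": [
                    {"message": "transfer", "status": "success"},
                    {
                        "message": "health check",
                        "status": "failure",
                        "result": {"name": "TypeError", "message": error},
                    },
                ],
            },
        ],
    }],
}
leaves = failed_leaf_tasks(log)
print(leaves)
```

For this log the walk ends at "backup > backup VM > export > health check", showing that the snapshot and transfer succeeded and only the health check step threw.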
      
      posted in Xen Orchestra
      techjeff