XCP-ng

    Posts by DustyArmstrong

    • RE: Console keyboard problems using Firefox

      For anyone who comes across this: you can just add an exception for your management page and Shift will work on the console.

      Settings > Privacy & Security > Enhanced Tracking Protection > Manage Exceptions > add the site URL, e.g. https://xo.fqdn.com.

      posted in Xen Orchestra
    • RE: Lots of "host.getMdadmHealth" Failure Logs

      Updated all my hosts but ended up with a bunch of stuck tasks for API host calls, which didn't seem too healthy! They looked genuinely stuck, and I kept seeing an unhealthy host power state warning repeatedly pop up and disappear.

      I opted to select and delete all the tasks, and did the same with my logs (I monitor externally anyway), which appears to have resolved this for the moment. I no longer see the mdadm errors being generated and everything appears normal.
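
      For anyone who prefers dom0 to the XO UI, pending tasks can also be listed and cancelled with xe; a minimal sketch (the UUID is a placeholder):

      xe task-list params=uuid,name-label,status
      xe task-cancel uuid=<task-uuid>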

      posted in Management
    • RE: Lots of "host.getMdadmHealth" Failure Logs

      @stormi thanks for the reply, the output is (on both hosts):

      mdadm: cannot open /dev/md127

      I do have a third host that does make use of a software RAID, but even that one shows nothing for /dev/md127.

      I am updating the hosts today, so it's possible they're just too far behind.
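
      For completeness, a quick way to see whether a host actually has any md arrays at all (nothing XCP-ng specific here, just standard mdadm tooling):

      cat /proc/mdstat            # active md arrays, essentially empty if there are none
      ls /dev/md* 2>/dev/null     # any md device nodes present?
      mdadm --detail --scan       # one line per assembled array, no output if none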

      posted in Management
    • Lots of "host.getMdadmHealth" Failure Logs

      I'm getting tons of mdadm errors from Xen Orchestra, but I'm not really sure why.

      host.getMdadmHealth
      {
        "id": "d2de9e76-ffbf-4640-9d68-43178c7c4006"
      }
      {
        "code": "-1",
        "params": [
          "Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1",
          "",
          "Traceback (most recent call last):
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 101, in wrapper
          return func(*args, **kwds)
        File \"/etc/xapi.d/plugins/raid.py\", line 21, in check_raid_pool
          result = run_command(['mdadm', '--detail', device])
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 70, in run_command
          raise subprocess.CalledProcessError(process.returncode, command, None)
      CalledProcessError: Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1
      "
        ],
        "task": {
          "uuid": "34429da6-56ee-9b5c-c465-b0493920b3f4",
          "name_label": "Async.host.call_plugin",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20250117T09:42:09Z",
          "finished": "20250117T09:42:09Z",
          "status": "failure",
          "resident_on": "OpaqueRef:f0015d71-0ac1-4a79-bf0d-3700f79ba394",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "-1",
            "Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1",
            "",
            "Traceback (most recent call last):
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 101, in wrapper
          return func(*args, **kwds)
        File \"/etc/xapi.d/plugins/raid.py\", line 21, in check_raid_pool
          result = run_command(['mdadm', '--detail', device])
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 70, in run_command
          raise subprocess.CalledProcessError(process.returncode, command, None)
      CalledProcessError: Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1
      "
          ],
          "other_config": {},
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename ocaml/xapi-client/client.ml)(line 7))((process xapi)(filename ocaml/xapi-client/client.ml)(line 19))((process xapi)(filename ocaml/xapi-client/client.ml)(line 8780))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
        },
        "message": "-1(Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1, , Traceback (most recent call last):
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 101, in wrapper
          return func(*args, **kwds)
        File \"/etc/xapi.d/plugins/raid.py\", line 21, in check_raid_pool
          result = run_command(['mdadm', '--detail', device])
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 70, in run_command
          raise subprocess.CalledProcessError(process.returncode, command, None)
      CalledProcessError: Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1
      )",
        "name": "XapiError",
        "stack": "XapiError: -1(Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1, , Traceback (most recent call last):
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 101, in wrapper
          return func(*args, **kwds)
        File \"/etc/xapi.d/plugins/raid.py\", line 21, in check_raid_pool
          result = run_command(['mdadm', '--detail', device])
        File \"/etc/xapi.d/plugins/xcpngutils/__init__.py\", line 70, in run_command
          raise subprocess.CalledProcessError(process.returncode, command, None)
      CalledProcessError: Command '['mdadm', '--detail', '/dev/md127']' returned non-zero exit status 1
      )
          at Function.wrap (file:///home/node/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)
          at default (file:///home/node/xen-orchestra/packages/xen-api/_getTaskResult.mjs:13:29)
          at Xapi._addRecordToCache (file:///home/node/xen-orchestra/packages/xen-api/index.mjs:1068:24)
          at file:///home/node/xen-orchestra/packages/xen-api/index.mjs:1102:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///home/node/xen-orchestra/packages/xen-api/index.mjs:1092:12)
          at Xapi._watchEvents (file:///home/node/xen-orchestra/packages/xen-api/index.mjs:1265:14)"
      }
      

      Neither the host with ID d2de9e76-ffbf-4640-9d68-43178c7c4006 nor the one with ID f0015d71-0ac1-4a79-bf0d-3700f79ba394 is using a software RAID. It may be because I haven't updated the hosts in quite some time. There is no output on either host from cat /proc/mdstat.
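
      For reference, the failing check can be reproduced by hand from dom0. The plugin and function names below are inferred from the traceback, so treat them as an assumption rather than documented usage:

      # the same command the plugin runs
      mdadm --detail /dev/md127
      # or call the plugin through xapi (plugin/fn names taken from the traceback)
      xe host-call-plugin host-uuid=<host-uuid> plugin=raid.py fn=check_raid_pool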

      Is there a way I can just turn off this check?

      posted in Management
    • RE: Backups (Config & VMs) Fail Following Updates

      An update, if anyone ever comes across this via a search engine.

      Turns out it was my container's timezone. The image is set to plain UTC, no timezone, by default, so I believe that when it wrote files to my network storage it introduced a discrepancy. My network share was recording the file metadata accurately in real time, so I assume that when it came time to do another backup, the file times XO expected were different, making it think the files were "stale" or still being "held".

      Have now run both scheduled metadata and VM backups without any errors 😊.

      In summary: make sure your time, date and timezone are set correctly!
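
      If you run XO in Docker like I do, a quick sanity check and an illustrative fix (the container name, image and timezone below are placeholders, and I'm assuming the image honours the standard TZ variable):

      # check what date/time/zone the container thinks it has
      docker exec <xo-container> date
      # many images honour TZ; in docker compose that's an environment entry
      # such as TZ=Europe/London, or for plain docker run:
      docker run -d -e TZ=Europe/London <xo-image>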

      posted in Backup
    • RE: Backups (Config & VMs) Fail Following Updates

      @magran17 thanks Mark.

      My config backup runs on a Tuesday and my VMs on Friday night, so that happened last night. It did fail at first with the lockfile error as expected, but it was successful on the retry. My concurrency is currently set to 2; I did have it on 1 originally but it doesn't seem to make a difference.

      I use Ronivay's image too; it seems to work, but yeah, just these three random errors that I can only get rid of by blowing away all my backups/schedules and starting the chain(s) again.

      I'm not really sure why it happens; I can only assume rebooting/updating breaks some sort of cache in the way I have it set up. I am running it in a very unintended way (a Raspberry Pi 4, ARM64, using binfmt emulation of x86) so I can't really expect perfection. It's slightly slow, but it works super well other than this!

      posted in Backup
    • RE: Backups (Config & VMs) Fail Following Updates

      Update: this seems to happen every time I reboot the server or, in particular, update XO. I get the same three errors and have to rebuild my backup schedules from scratch each time. Once rebuilt, they run perfectly until the next time I update. It may be because I run it in Docker, I'm not sure, but I'd love to understand what causes this and whether there's any way to rectify it without the rebuild. I don't really understand it and would appreciate any insight.

      I get the following 3 problems every time.

      EEXIST - this happens on my configuration backups.

      Error: EEXIST: file already exists, open '/run/xo-server/mounts/f5bb7b65-ddea-496b-b193-878f19ba137c/xo-config-backups/d166d7fa-5101-4aff-9e9d-11fb58ec1694/20240819T140003Z/data.json'
      

      ENOENT - this also happens on my configuration backups, on the same job.

      Error: ENOENT: no such file or directory, rmdir '/run/xo-server/mounts/f5bb7b65-ddea-496b-b193-878f19ba137c/xo-pool-metadata-backups/d166d7fa-5101-4aff-9e9d-11fb58ec1694/ff3e6fa0-6552-e96a-989c-fc8db748d984/20240729T140002Z'
      

      LOCKFILE HELD - This happens on my VM incremental backups. This log is from a prior run a while ago, but I expect my next run will do this as I rebooted.

      >> the writer IncrementalRemoteWriter has failed the step writer.beforeBackup() with error Lock file is already being held. It won't be used anymore in this job execution.
      Retry the VM backup due to an error
      the writer IncrementalRemoteWriter has failed the step writer.beforeBackup() with error Lock file is already being held. It won't be used anymore in this job execution.
      
      Start: 2024-06-29 01:01
      End: 2024-06-29 01:41
      Duration: 41 minutes
      Error: Lock file is already being held
      

      I only have one schedule for config and one schedule for VMs. The files for the config backup don't change, and I don't reboot or do anything mid-backup, but it seems to totally break the chain. For the VMs there is only the one backup schedule, so there should never be another job running that holds the lockfile. Something about restarting the container causes an issue - it feels like something is being cached but not flushed on restart, leaving some sort of zombified file(s) behind.
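
      Since all three errors reference paths under /run/xo-server/mounts, one purely diagnostic thing worth doing after a container restart is checking whether the remote is actually (re)mounted inside the container; the container name is a placeholder:

      docker exec <xo-container> mount | grep xo-server/mounts
      docker exec <xo-container> ls -la /run/xo-server/mounts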

      posted in Backup
    • RE: Google Coral TPU PCIe Passthrough Woes

      @olivierlambert Thanks, don't worry in that case; it was just to see if there was something like "oh yeah, XCP does [something] with vUSBs when passing through, which could explain it". The server is a mini PC, so no PCIe card slots or capability unfortunately.

      I'll just live with 40 ms via VirtualHere (I don't know why that's so high either, as others get 15-20 ms with that method)! It works well enough.

      posted in Compute
    • RE: Google Coral TPU PCIe Passthrough Woes

      So I eventually got round to trying the USB Coral via passthrough, which worked great, but the TPU itself exhibited some behavior that made it nonviable, which sucks. The USB device was detected by XO as Google Inc. and Frigate actually loaded the TPU, but the inference speed was in excess of 180 ms (it should be around 10 ms; over USB-over-IP it's 40 ms). So it worked, but didn't.

      The normal procedure with a Coral is to run make reset from their utilities, which switches the TPU back to runtime mode. This worked under my current (and now reverted) setup of VirtualHere USB over IP, but it didn't work when passed through.

      Output of make reset:

      dfu-util: Warning: Invalid DFU suffix signature
      dfu-util: A valid DFU suffix will be required in a future dfu-util release
      dfu-util: No DFU capable USB device available
      

      It should look like this:

      Opening DFU capable USB device...
      Device ID 1a6e:089a
      Device DFU version 0101
      Claiming USB DFU Interface...
      Setting Alternate Interface #0 ...
      Determining device status...
      DFU state(2) = dfuIDLE, status(0) = No error condition is present
      DFU mode device DFU version 0101
      Device returned transfer size 256
      Copying data from PC to DFU device
      Download	[=========================] 100%        10783 bytes
      Download done.
      DFU state(2) = dfuIDLE, status(0) = No error condition is present
      Done!
      Resetting USB to switch back to Run-Time mode
      

      Sorry to ping you @olivierlambert, but would you happen to know what might cause this in XCP/XO? Is there something going on when the device is made into a vUSB that would cause it to error out or be inaccessible in DFU (I assume this means Device Firmware Upgrade)?
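
      For anyone debugging the same thing, a check I'd run inside the VM before and after make reset, since the expected output above lists the DFU-mode device as 1a6e:089a (purely to see whether the device re-enumerates at all once it's a vUSB):

      lsusb | grep -iE '1a6e|google'   # does the Coral show up, and under which ID?
      dmesg -w                         # watch for disconnect/re-enumeration during make reset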

      posted in Compute
    • RE: Backups (Config & VMs) Fail Following Updates

      @magran17 Hey Mark, thanks for the info.

      I've also made those changes, with concurrency set to 2, and I've rebuilt my backups from scratch, so hoping it goes OK from here! My issue with double restore points was my own fault; I had an SMB and an NFS remote running together (I had switched to NFS as it's much quicker).

      I'm not sure what will happen when I update the image again. Out of interest, which Docker image do you run? Ronivay or Ezka77?

      posted in Backup
    • Backups (Config & VMs) Fail Following Updates

      I am having an issue whereby any time I update XO my backups start to fail for one of a few reasons. They generally fall into one of the following:

      • ENOENT: no such file or directory

      • Lock file is already being held

      • EEXIST: file already exists

      The above, other than the lock file error, all reference /run/xo-server/mounts followed by the relevant UUID paths. The lock file error returns "the writer IncrementalRemoteWriter has failed the step writer.beforeBackup() with error Lock file is already being held". I am running XO in a Docker container via Docker Compose, but the run directory is not mounted as a volume. My backups go to an NFS remote.

      Is there a procedure to rectify this, i.e. is a path being cached (that needs to be cleared) which the backups still reference but which is no longer present, or has been modified, after an update? Is there something I should be doing pre/post update to account for this? I appreciate I may be running XO in a non-standard way and this may simply be a quirk of that, but it generally runs fine until I update. If I re-run the failed backup it does tend to succeed, but it will then fail again on the next scheduled run.

      Also, to add: my backup retention is set to 3 but I am seeing 6 restore points. It seems each entry is duplicated (same size, date and time), both key and delta. I have verified on my remote that there are actually only 3.

      posted in Backup
    • RE: Google Coral TPU PCIe Passthrough Woes

      @olivierlambert Thanks, that's good to know. That functionality would be great down the line!

      I do have a spare M.2 E-key slot on the XCP host running the VM the Coral is needed for, but it seems like I'd have trouble going by this thread. I might even have trouble with the USB Coral; it hasn't been much better so far in terms of wacky non-standard behavior...

      posted in Compute
    • RE: XO Backups - Offline Storage Best Practices?

      @planedrop Not opposed to cloud of course, but it's a network with no internet!

      posted in Backup
    • RE: Google Coral TPU PCIe Passthrough Woes

      @olivierlambert Seems as reasonable a place to ask as any - I am currently using a USB Coral over IP (VirtualHere) but would rather attach it to my VM directly - what's the current status of snapshots/backups with a vUSB?

      I've been reading that XO can now support disk exclusions with [NOBAK] but this probably doesn't apply to a Coral. Is an offline backup still the best available method?

      posted in Compute
    • RE: XO Backups - Offline Storage Best Practices?

      Thanks for the replies, that gives me a good idea on things - appreciate it.

      Cloud isn't really an option for the environment but yes the drive is held separately in the event of a disaster.

      posted in Backup
    • XO Backups - Offline Storage Best Practices?

      Afternoon! I just have a quick query to see if what I'm doing is viable.

      I currently have a delta backup running and storing to a remote NAS, and all seems to be working fine. I have started copying the entire backup folder off to an external drive once a week in case of some catastrophic problem. This includes:

      xo-config-backups
      xo-pool-metadata-backups
      xo-vm-backups
      [somestring].dek
      xenserver - post drive fail.xbk
      xenserver - post fixes.xbk
      xenserver.xbk
      

      Essentially I am wondering: should all else fail, if I restore this folder to its original location (or could I place it anywhere and point XO at it?) and then restore the XO configuration from it (if needed), will it be treated as a valid repository I can restore VMs from? Will XO simply take that folder as it was at that point in time and allow me to restore, or will it have issues because it's a "different" repository from what it expects? If I'm restoring the config from within this offline repository, though, I'd guess that config would be in line with the deltas also in there, so it would be fine?

      I suppose I could run a second separate delta job once per week to an external drive that only gets plugged in when that job is due to run, and removed shortly after, should the above be inadvisable.
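
      For reference, the weekly copy itself is nothing clever, just a plain recursive copy that preserves timestamps; the paths here are illustrative:

      # -a keeps permissions and mtimes; --delete mirrors removals so pruned
      # restore points don't pile up on the external drive
      rsync -a --delete /mnt/nas/xo-backups/ /mnt/external/xo-backups/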

      Thank you!

      posted in Backup
    • RE: Has REST API changed (Cannot GET backup logs)?

      @julien-f Thank you and thank you for the quick resolution, you guys rock.

      posted in REST API
    • RE: Has REST API changed (Cannot GET backup logs)?

      @olivierlambert Yes, but I didn't see any changes for backup/logs, which is where the issue seemed to arise. I pull the status from each log entry, not from the job info itself.

      @julien-f awesome, thanks!

      posted in REST API