XCP-ng
    Latest posts made by peo

    • RE: Misleading messages during restore from backup

      @olivierlambert Thanks, that explains the "on .." part of the message, and as you mention, it's not a "problem", but just confusing in the context of that full message.
      Ideally, the task would be launched on the host "closest" to the SR, in this case a local SR on xcp-ng-2.

      This one (an unrelated list of previous tasks) makes it clearer what the "on .." part refers to:

      xo-tasks.png

      posted in Backup
    • RE: Misleading messages during restore from backup

      @olivierlambert but the "importing content" message is incorrect:
      "on SR Local h2 SSD new (on xcp-ng-1)"

      That's the problem: not where it's restored to (and has to run from, since it's a local SR), but the host named in the message.

      What does "xcp-ng-1" have to do with this operation? It's not involved at all.

      xo-local-SRs.png

    • RE: Misleading messages during restore from backup

      @olivierlambert I can make it even more confusing 🙂

      But first, you're correct about me misnaming the BR as "Remote SR".

      Just found out that the machine I backed up is NOT running on xcp-ng-1, but on xcp-ng-3 (I assumed it was xcp-ng-1 because of the message):

      xo-misleading-restore-source-machine.png

    • RE: Misleading messages during restore from backup

      @olivierlambert The source machine for the backup is on a local SR on 'xcp-ng-1'. I'm restoring it to a local SR on 'xcp-ng-2' (the backup is read from a remote SR)

    • Misleading messages during restore from backup

      Hi,

      I have seen this before but not reported it: during a restore of a VM backup to another host, the message shown during the restore is a bit misleading.
      I'm restoring from the remote named "xcp-ng-appservers" to (as indicated by the ongoing status) "Local h2 SSD new". The misleading part is "(on xcp-ng-1)", as I'm restoring to an SR on "xcp-ng-2" (and will run the VM from there).

      Maybe a simple fix would be to make the meaning of the "on" a bit clearer, or to just remove it (the backed-up machine runs on 'xcp-ng-1', and I will have my duplicate from the latest backup started on 'xcp-ng-2')

      xo-misleading-restore-message.png

    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      @olivierlambert no, and all VMs were working right up until I rebooted the two hosts (not the third one, since that one had no problem accessing /run/sr-mount/)

      I understand that 'df' will lock up if an NFS or SMB share does not respond, but listing /run/sr-mount/ itself (without touching a subfolder) should have no reason to lock up (unless /run/sr-mount is not an ordinary folder, which it appears to be)
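      Not a fix, but a safer way to run such checks: wrap the probe in `timeout` (part of coreutils on XCP-ng's CentOS base) so a dead NFS/SMB mount can only stall the call for a bounded time. A sketch; `probe_mount` is my own helper name:

```shell
# Wrap metadata calls in a hard time limit so a dead NFS/SMB mount
# cannot hang the shell. probe_mount is a hypothetical helper name.
probe_mount() {
    timeout 5 stat "$1" >/dev/null 2>&1
}

# Check every SR mount point (the /run/sr-mount layout is XCP-ng's default)
for mnt in /run/sr-mount/*; do
    if probe_mount "$mnt"; then
        echo "OK    $mnt"
    else
        echo "DEAD? $mnt (stat failed or timed out after 5s)"
    fi
done
```

      The same wrapper works for 'df' itself, e.g. `timeout 5 df -h`.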

    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      @olivierlambert I found a "solution" to the problem by simply rebooting the two involved hosts, but this might still point to an issue somewhere (in XO or even XCP-ng):

      By the time I started the hosts after the power failure, their dependencies had already been running for a long while (mainly my internet connectivity and the NAS that holds one of the SRs). All three hosts also have a local 2 TB SSD, used for different purposes (faster disk access, temporary storage and replication from other hosts).

      I actually forgot to reconnect the network cable on the third host (unplugged because I was reorganizing the cables to the switch at the same time; this host is not involved in these recent problems). It turned out it hadn't started up properly either (at least I got no video output when I went to check its status after connecting the cable), so I gave it a hard reboot and got it up and running.

      Machines with their disks on the local SSDs of the two other hosts have worked fine since I powered them up, so what follows (and the replication issue) was not expected at all:

      Lock up on 'df' and 'ls /run/sr-mount/':

      [11:21 xcp-ng-1 ~]# df -h
      ^C
      [11:21 xcp-ng-1 ~]# ^C
      
      [11:21 xcp-ng-1 ~]# ls /run/sr-mount/
      ^C
      [11:22 xcp-ng-1 ~]# ls /run/
      

      ('ls /run/' worked fine)

      According to XO the disks were accessible and their content showed up as usual.
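      One likely explanation for the 'ls' hang (an assumption on my part, not confirmed): on CentOS-based dom0, 'ls' is usually aliased to 'ls --color=auto', which stats every directory entry to colorize it, and a stat on the mountpoint of a dead NFS/SMB share blocks. Reading the kernel mount table avoids touching the mount points entirely; `sr_mounts` below is my own helper name:

```shell
# List SR mounts from the kernel mount table instead of touching the
# mount points; reading /proc/self/mounts never blocks on a dead server.
# sr_mounts is a hypothetical helper; pass a path prefix to filter on.
sr_mounts() {
    awk -v prefix="${1:-/run/sr-mount}" \
        'index($2, prefix) == 1 { print $3, $2, "(" $1 ")" }' /proc/self/mounts
}

# Usage: sr_mounts            # mounts under /run/sr-mount
#        sr_mounts /run       # everything mounted under /run
```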

    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      Since yesterday, even the replication jobs have started to fail (I'm again 12 commits behind the current version, but other scheduled jobs kept failing even when I was up to date with XO)

      The replication is set to run from one host and store on the SSD of another. I had a power failure yesterday, but both hosts needed for this job (xcp-ng-1 and xcp-ng-2) were back up and running by the time the job started.

      {
        "data": {
          "mode": "delta",
          "reportWhen": "failure"
        },
        "id": "1753705802804",
        "jobId": "0bb53ced-4d52-40a9-8b14-7cd1fa2b30fe",
        "jobName": "Admin Ubuntu 24",
        "message": "backup",
        "scheduleId": "69a05a67-c43b-4d23-b1e8-ada77c70ccc4",
        "start": 1753705802804,
        "status": "failure",
        "infos": [
          {
            "data": {
              "vms": [
                "1728e876-5644-2169-6c62-c764bd8b6bdf"
              ]
            },
            "message": "vms"
          }
        ],
        "tasks": [
          {
            "data": {
              "type": "VM",
              "id": "1728e876-5644-2169-6c62-c764bd8b6bdf",
              "name_label": "Admin Ubuntu 24"
            },
            "id": "1753705804503",
            "message": "backup VM",
            "start": 1753705804503,
            "status": "failure",
            "tasks": [
              {
                "id": "1753705804984",
                "message": "snapshot",
                "start": 1753705804984,
                "status": "success",
                "end": 1753712867640,
                "result": "4afbdcd9-818f-9e3d-555a-ad0943081c3f"
              },
              {
                "data": {
                  "id": "46f9b5ee-c937-ff71-29b1-520ba0546675",
                  "isFull": false,
                  "name_label": "Local h2 SSD",
                  "type": "SR"
                },
                "id": "1753712867640:0",
                "message": "export",
                "start": 1753712867640,
                "status": "interrupted"
              }
            ],
            "infos": [
              {
                "message": "will delete snapshot data"
              },
              {
                "data": {
                  "vdiRef": "OpaqueRef:c2504c79-d422-3f0a-d292-169d431e5aee"
                },
                "message": "Snapshot data has been deleted"
              }
            ],
            "end": 1753717484618,
            "result": {
              "name": "BodyTimeoutError",
              "code": "UND_ERR_BODY_TIMEOUT",
              "message": "Body Timeout Error",
              "stack": "BodyTimeoutError: Body Timeout Error\n    at FastTimer.onParserTimeout [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202507262229/node_modules/undici/lib/dispatcher/client-h1.js:646:28)\n    at Timeout.onTick [as _onTimeout] (/opt/xo/xo-builds/xen-orchestra-202507262229/node_modules/undici/lib/util/timers.js:162:13)\n    at listOnTimeout (node:internal/timers:588:17)\n    at process.processTimers (node:internal/timers:523:7)"
            }
          }
        ],
        "end": 1753717484619
      }
      

      Also, the replication job for my Debian XO machine fails with the same 'timeout' problem.
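      For logs like the one above, the failing step can be pulled out without scrolling through the whole JSON. A sketch using python3 from the shell; `xo_failures` is my own name, and the exported log is read from stdin:

```shell
# xo_failures: read an exported XO backup log (JSON) on stdin and print
# every non-successful VM task with its error name/code, plus its subtasks.
xo_failures() {
    python3 -c '
import json, sys
log = json.load(sys.stdin)
for task in log.get("tasks", []):
    if task.get("status") != "success":
        r = task.get("result", {})
        print(task.get("data", {}).get("name_label", "?"), r.get("name"), r.get("code"))
        for sub in task.get("tasks", []):
            print(" ", sub["message"], sub["status"])
'
}

# Usage: xo_failures < backup-log.json
```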

    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      Even though I updated 'everything' involved yesterday, the problems remain (last night's backups failed with a similar problem). As I'm again 6 commits behind the current version, I cannot create a useful bug report, so I'll just update and wait for the next scheduled backups to run (nothing runs during the night towards Thursday; the next sequence runs during the night towards Friday)

    • RE: Backups started to fail again (overall status: failure, but both snapshot and transfer returns success)

      @DustinB said in Backups started to fail again (overall status: failure, but both snapshot and transfer returns success):

      @peo said in Backups started to fail again (overall status: failure, but both snapshot and transfer returns success):

      @olivierlambert Thanks, I will update every machine and XO instance involved in the backup process, and possibly even the individual machines that fail. The first failure on vm-cleanup was 15 July, a few days before I patched the hosts (as part of troubleshooting and preventing further failures). Still, these backups will (probably) be fully restorable (as I have verified with the always-failing Docker VM)

      So you patch your host, but not the administrative tools for the hosts?

      Seems a little cart before the horse there, no?

      That's a fault-finding procedure: don't patch everything at once (but now I have, after finding out that patching the hosts did not solve the problem)
