XCP-ng

    CBT: the thread to centralize your feedback

    • olivierlambert Vates 🪐 Co-Founder CEO @flakpyro

      @flakpyro Don't you have a small download button for the backup log to get the entire log?

    • flakpyro @olivierlambert

      @olivierlambert My bad, I see the button to download the log now!

        Here is the log: https://drive.google.com/file/d/1OO8Rs6pST-W0qplpVu-E5lafYFXzhJA2/view?usp=drive_link

    • SylvainB

      Hi,

      Same here: I updated XOA to 5.98 and I now get this error on some VMs:

      "can't create a stream from a metadata VDI, fall back to a base"

      I have an active support contract.

      Here is the detailed log:

          {
                "data": {
                  "type": "VM",
                  "id": "96cfde06-61c0-0f3e-cf6d-f637d41cc8c6",
                  "name_label": "blabla_VM"
                },
                "id": "1725081943938",
                "message": "backup VM",
                "start": 1725081943938,
                "status": "failure",
                "tasks": [
                  {
                    "id": "1725081943938:0",
                    "message": "clean-vm",
                    "start": 1725081943938,
                    "status": "success",
                    "end": 1725081944676,
                    "result": {
                      "merge": false
                    }
                  },
                  {
                    "id": "1725081944876",
                    "message": "snapshot",
                    "start": 1725081944876,
                    "status": "success",
                    "end": 1725081978972,
                    "result": "46334bc0-cb3c-23f7-18e1-f25320a6c4b4"
                  },
                  {
                    "data": {
                      "id": "122ddf1f-090d-4c23-8c5e-fe095321f8b9",
                      "isFull": false,
                      "type": "remote"
                    },
                    "id": "1725081978972:0",
                    "message": "export",
                    "start": 1725081978972,
                    "status": "success",
                    "tasks": [
                      {
                        "id": "1725082089246",
                        "message": "clean-vm",
                        "start": 1725082089246,
                        "status": "success",
                        "end": 1725082089709,
                        "result": {
                          "merge": false
                        }
                      }
                    ],
                    "end": 1725082089719
                  },
                  {
                    "data": {
                      "id": "beee944b-e502-61d7-e03b-e1408f01db8c",
                      "isFull": false,
                      "name_label": "BLABLA_SR_HDD-01",
                      "type": "SR"
                    },
                    "id": "1725081978972:1",
                    "message": "export",
                    "start": 1725081978972,
                    "status": "pending"
                  }
                ],
                "infos": [
                  {
                    "message": "will delete snapshot data"
                  },
                  {
                    "data": {
                      "vdiRef": "OpaqueRef:1b614f6b-0f69-47a1-a0cd-eee64007441d"
                    },
                    "message": "Snapshot data has been deleted"
                  }
                ],
                "warnings": [
                  {
                    "data": {
                      "error": {
                        "code": "VDI_IN_USE",
                        "params": [
                          "OpaqueRef:989f7dd8-0b73-4a87-b249-6cfc660a90bb",
                          "data_destroy"
                        ],
                        "call": {
                          "method": "VDI.data_destroy",
                          "params": [
                            "OpaqueRef:989f7dd8-0b73-4a87-b249-6cfc660a90bb"
                          ]
                        }
                      },
                      "vdiRef": "OpaqueRef:989f7dd8-0b73-4a87-b249-6cfc660a90bb"
                    },
                    "message": "Couldn't deleted snapshot data"
                  }
                ],
                "end": 1725082089719,
                "result": {
                  "message": "can't create a stream from a metadata VDI, fall back to a base ",
                  "name": "Error",
                  "stack": "Error: can't create a stream from a metadata VDI, fall back to a base \n    at Xapi.exportContent (file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/xapi/vdi.mjs:202:15)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/backups/_incrementalVm.mjs:57:32\n    at async Promise.all (index 0)\n    at async cancelableMap (file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/backups/_cancelableMap.mjs:11:12)\n    at async exportIncrementalVm (file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/backups/_incrementalVm.mjs:26:3)\n    at async IncrementalXapiVmBackupRunner._copy (file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/backups/_runners/_vmRunners/IncrementalXapi.mjs:44:25)\n    at async IncrementalXapiVmBackupRunner.run (file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/backups/_runners/_vmRunners/_AbstractXapi.mjs:379:9)\n    at async file:///usr/local/lib/node_modules/xo-server/node_modules/@xen-orchestra/backups/_runners/VmsXapi.mjs:166:38"
                }
              },
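
      For readers hitting the same log: the warning above shows VDI.data_destroy failing with VDI_IN_USE before the export falls back. One thing worth checking, as a minimal xe sketch rather than a confirmed diagnosis (the UUIDs are placeholders, not taken from this log), is whether the snapshot VDI is already metadata-only and whether something still holds a VBD on it:

      # Check whether the VDI's data has already been destroyed (type "cbt_metadata")
      xe vdi-param-get uuid=<snapshot-vdi-uuid> param-name=type

      # List any VBDs that still reference it (the VDI_IN_USE above points at one)
      xe vbd-list vdi-uuid=<snapshot-vdi-uuid> params=uuid,vm-name-label,currently-attached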
          
    • rtjdamen

      So far I have only seen this "fall back to a base" error once, and it looks like it finishes correctly on the retry. I will keep an eye on this.

    • CJ

      I was having a backup fail with the "VDI must be free" error, but updating XOA to the latest commit fixed that.

      Now I'm getting VDI_IN_USE errors when backing up. The Health tab of the Dashboard lists the VDIs as still attached to the Control Domain. However, when I try to forget a VDI, I get an OPERATION_NOT_PERMITTED error because the VBD is still attached.

      I've enabled maintenance mode on the host, which migrated the VMs to my other host, but that didn't fix the issue; I assume because I'm using shared storage. I also tried coalescing the leaves, but there were none to coalesce.

      Any suggestions for the next step?
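
      For anyone who wants to see the same information as the Health tab from the CLI, a minimal xe sketch (the UUIDs are placeholders, not taken from this thread):

      # List the control domains (one per host) and note their UUIDs
      xe vm-list is-control-domain=true params=uuid,name-label,resident-on

      # List the VBDs of a given control domain and the VDIs they point to
      xe vbd-list vm-uuid=<dom0-uuid> params=uuid,vdi-uuid,currently-attached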

    • rtjdamen @CJ

      @CJ You need to check which host the VDI is attached to and reboot that host. That will release the VDI.
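
      A rough sketch of how that lookup can be done with xe (placeholder UUIDs; adapt to your pool):

      # Find the VBD that references the stuck VDI; its VM will be a control domain
      xe vbd-list vdi-uuid=<vdi-uuid> params=uuid,vm-uuid,vm-name-label,currently-attached

      # Find out which host that control domain lives on, i.e. the host to reboot
      xe vm-param-get uuid=<dom0-uuid> param-name=resident-on
      xe host-list uuid=<host-uuid> params=name-label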

    • CJ @rtjdamen

      @rtjdamen The VMs were running on the master, so I had rebooted it, since I don't recall how to match the UUIDs. I'll try rebooting the other node and see if that works.

      EDIT: That worked. Even though the VMs were running on the master, the VDIs were attached to the other node.

    • flakpyro @rtjdamen

      @rtjdamen I notice this too: on retry it does run, but it seems to take much longer than a normal incremental backup would, so I'm not entirely sure what's going on there. For me it ONLY happens if I migrate a VM from one host to another (on shared NFS storage).

    • rtjdamen @CJ

      @CJ NBD picks a random host for the transfer, so it is not necessarily the pool master. You should be able to determine the host holding this VDI from the error message.

    • rtjdamen @flakpyro

      @flakpyro This is because it creates a new full backup. I think the CBT data is considered invalid, which is what causes it to run a new full.

    • flakpyro @rtjdamen

      @rtjdamen Correct, it does appear to run a full, even though the backup report afterwards says it ran a "delta". After that initial run it will run backups error-free again, unless I migrate the VM to another host, in which case the same error occurs once more.
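
      If you want to check whether CBT is still enabled on the VM's disks after such a migration, a minimal xe sketch (placeholder UUIDs; assumes the flag is exposed as cbt-enabled on the VDI):

      # List the VM's disks and their VDI UUIDs
      xe vbd-list vm-uuid=<vm-uuid> params=vdi-uuid,device

      # Check the CBT flag on each VDI; "false" after a migration would explain a new full
      xe vdi-param-get uuid=<vdi-uuid> param-name=cbt-enabled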

    • rtjdamen @flakpyro

      @flakpyro Yes. We do not see this error in relation to a migration, but it does sometimes just occur. In the prior version it failed on the retry; with the latest version it resolves itself, so it has improved a bit.

      We also still see the VDI_IN_USE errors; it would be nice if those were improved as well.

    • CJ

      I'm not sure if this happened after my initial manual run of the backup job or the scheduled one that ran afterwards, but one of the VMs is now showing as attached to the control domain again.

      Is this something I need to keep checking, or should it resolve itself? The backup job completed hours ago.

    • rtjdamen @CJ

      @CJ Normally this should not happen; I don't see this at our end. Mostly it happens after an incomplete backup job.

    • rtjdamen @CJ

      @CJ said in CBT: the thread to centralize your feedback:

      @rtjdamen The VMs were running on the master, so I had rebooted it, since I don't recall how to match the UUIDs. I'll try rebooting the other node and see if that works.

      EDIT: That worked. Even though the VMs were running on the master, the VDIs were attached to the other node.

      If anyone has this issue and does not want to reboot hosts, you can migrate the VM to a different SR to partly work around it: the VDI will still be orphaned and attached until the next host reboot, but the backup will run without issues.
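
      For reference, the CLI equivalent of that workaround is a live storage migration of the affected VDI to another SR; a minimal sketch, assuming you already know the VDI and destination SR UUIDs (placeholders here). Xen Orchestra's VDI migration in the VM's Disks tab should do the same thing.

      # Move the VDI to another SR while the VM keeps running (Storage XenMotion)
      xe vdi-pool-migrate uuid=<vdi-uuid> sr-uuid=<destination-sr-uuid>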

    • CJ @rtjdamen

      @rtjdamen Unfortunately it's still happening to me, and it's getting worse.

      Yesterday I had one VM with the issue. When the backup ran, I got the report stating that that VM had failed to back up but all the others succeeded. When I just checked the Dashboard's Health view, I saw that I now have three VMs with control-domain-attached VDIs, while the backup job only lists the one VM as having failed.

      One unusual thing is that these three are three of the four VMs that had problems originally. So I'm not sure if there's an issue with the VMs themselves or something else causing them to error.

    • Tristis Oris Top contributor

      The VM refused to start until I disabled CBT.

                                      vm.start
                                      {
                                        "id": "59f0ba04-5814-7154-22d2-51ae24ecf146",
                                        "bypassMacAddressesCheck": false,
                                        "force": false
                                      }
                                      {
                                        "code": "FAILED_TO_START_EMULATOR",
                                        "params": [
                                          "OpaqueRef:064cdc5a-49c4-4c58-8bdf-5fe4f04b2624",
                                          "domid 29",
                                          "QMP failure at File \"xc/device.ml\", line 3491, characters 71-78"
                                        ],
                                        "call": {
                                          "method": "VM.start",
                                          "params": [
                                            "OpaqueRef:064cdc5a-49c4-4c58-8bdf-5fe4f04b2624",
                                            false,
                                            false
                                          ]
                                        },
                                        "message": "FAILED_TO_START_EMULATOR(OpaqueRef:064cdc5a-49c4-4c58-8bdf-5fe4f04b2624, domid 29, QMP failure at File \"xc/device.ml\", line 3491, characters 71-78)",
                                        "name": "XapiError",
                                        "stack": "XapiError: FAILED_TO_START_EMULATOR(OpaqueRef:064cdc5a-49c4-4c58-8bdf-5fe4f04b2624, domid 29, QMP failure at File \"xc/device.ml\", line 3491, characters 71-78)
                                          at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202408301255/packages/xen-api/_XapiError.mjs:16:12)
                                          at file:///opt/xo/xo-builds/xen-orchestra-202408301255/packages/xen-api/transports/json-rpc.mjs:38:21
                                          at runNextTicks (node:internal/process/task_queues:60:5)
                                          at processImmediate (node:internal/timers:454:9)
                                          at process.callbackTrampoline (node:internal/async_hooks:130:17)"
                                      }
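
      For anyone who needs to do the same, CBT can also be disabled per disk from the CLI; a minimal xe sketch (placeholder UUIDs, not taken from this log):

      # List the VM's disks to get their VDI UUIDs
      xe vbd-list vm-uuid=<vm-uuid> params=vdi-uuid,device

      # Disable changed block tracking on each VDI (this also drops its CBT metadata)
      xe vdi-disable-cbt uuid=<vdi-uuid>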
                                      
    • rtjdamen @CJ

      @CJ Do you run XOA or XO from sources? Are you on the latest build? We do not experience this issue the way you do. I had some issues with hanging jobs that caused VDIs to stay attached to the Control Domain, but normally this should be handled by the backup job itself. However, we have seen a case where this happens when a speed limit is set on the backup job. Could it be that you have set one? Maybe you can try disabling it and see what it brings.

      I believe this issue in general is one that should be resolved: having disks that stay attached to the control domain causes problems, and it's not workable to restart hosts every time this happens. There needs to be a good mechanism to recover from this kind of issue. @olivierlambert is this something that we can expect in the near future?
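
      Until there is such a mechanism, the manual cleanup some people attempt looks roughly like the sketch below; entirely at your own risk, only once you are sure no backup job is still using the disk, and with placeholder UUIDs:

      # Find the leftover VBD that keeps the VDI attached to the control domain
      xe vbd-list vdi-uuid=<vdi-uuid> params=uuid,vm-name-label,currently-attached

      # Try to unplug and remove it; if the unplug is refused, a host reboot remains the fallback
      xe vbd-unplug uuid=<vbd-uuid>
      xe vbd-destroy uuid=<vbd-uuid>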

    • olivierlambert Vates 🪐 Co-Founder CEO

      Well, it's hard to fix the problem if we cannot reproduce it on our side. The QMP failure really looks like something else 🤔

    • rtjdamen @olivierlambert

      @olivierlambert I agree, but there should be a better way to recover from them.
