Latest posts made by tts

tts

Hi,

To my understanding, the orphaned snapshot VDIs listed under Dashboard/Health or under SR/Disks/Orphaned are always safe to delete. Is that correct?

We noticed that if you create a snapshot with memory, XO lists this snapshot with an additional symbol but also lists the corresponding suspend_image under orphaned disks. If we clean these up, assuming they are safe to delete, the snapshot with memory is still listed but is broken and no longer restorable.

So I guess there might be some bug as XO may not be aware that this suspend_image is still bound to a snapshot?
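To illustrate the check I would expect before cleanup, here is a minimal sketch. It assumes the VM and snapshot records have already been fetched as plain dicts keyed by reference, and that each record carries the XAPI `suspend_VDI` field (set to `OpaqueRef:NULL` when there is no suspend image). The function name and the sample references are hypothetical, and this is not how XO actually builds its orphaned list.

```python
NULL_REF = "OpaqueRef:NULL"


def split_orphaned_vdis(orphaned_vdi_refs, vm_records):
    """Split 'orphaned' VDIs into those still used as a suspend image
    by some VM or snapshot, and those that nothing references.

    orphaned_vdi_refs: list of VDI references reported as orphaned.
    vm_records: dict mapping VM/snapshot reference -> record dict.
    """
    # Collect every VDI that a VM or snapshot uses as its suspend image.
    in_use = {
        rec["suspend_VDI"]
        for rec in vm_records.values()
        if rec.get("suspend_VDI", NULL_REF) != NULL_REF
    }
    keep = [ref for ref in orphaned_vdi_refs if ref in in_use]
    deletable = [ref for ref in orphaned_vdi_refs if ref not in in_use]
    return keep, deletable
```

With a check like this, the suspend_image of a snapshot with memory would end up in `keep` instead of being offered for deletion.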

Can someone clarify this behaviour?

Thank you,
-T

tts

Looks like XOA on the stable channel only updates to v5.43.2.

I really don't see how 'rolling back' to a much older version could help solve the backup problems in the more recent 'from sources' version 5.56+.

Is this a completely different codebase?

Quite lost now!

tts

Ok thanks, will do that and report back later.

tts

I'm currently on it; we set up the XOA appliance recently. It just needs the trial activated, which I think I can do today in a few hours.

Can we safely import the settings from our current 'from sources' version, or do we need to set up everything from scratch?

tts

Maybe there are additional problems. I just noticed that one of our rolling-snapshot jobs was started twice again.

This time it's even weirder: both runs are listed as successful.

The first job displays nothing at all in the detail window:

json:

{
  "data": {
    "mode": "full",
    "reportWhen": "failure"
  },
  "id": "1583917200011",
  "jobId": "22050542-f6ec-4437-a031-0600260197d7",
  "jobName": "odoo_hourly",
  "message": "backup",
  "scheduleId": "ab9468b4-9b8a-44a0-83a8-fc6b84fe8a95",
  "start": 1583917200011,
  "status": "success",
  "end": 1583917207066
}

The second job shows 4(?) VMs, although the job only contains one VM. That VM failed, yet the job is displayed as successful:

json:

{
  "data": {
    "mode": "full",
    "reportWhen": "failure"
  },
  "id": "1583917200014",
  "jobId": "22050542-f6ec-4437-a031-0600260197d7",
  "jobName": "odoo_hourly",
  "message": "backup",
  "scheduleId": "ab9468b4-9b8a-44a0-83a8-fc6b84fe8a95",
  "start": 1583917200014,
  "status": "success",
  "tasks": [
    {
      "id": "1583917200019",
      "message": "snapshot",
      "start": 1583917200019,
      "status": "success",
      "end": 1583917201773,
      "result": "32ba0201-26bd-6f29-c507-0416129058b4"
    },
    {
      "id": "1583917201780",
      "message": "add metadata to snapshot",
      "start": 1583917201780,
      "status": "success",
      "end": 1583917201791
    },
    {
      "id": "1583917206856",
      "message": "waiting for uptodate snapshot record",
      "start": 1583917206856,
      "status": "success",
      "end": 1583917207064
    },
    {
      "data": {
        "type": "VM",
        "id": "b6b63fab-bd5f-8640-305d-3565361b213f"
      },
      "id": "1583917200022",
      "message": "Starting backup of HALOffice01. (22050542-f6ec-4437-a031-0600260197d7)",
      "start": 1583917200022,
      "status": "failure",
      "tasks": [
        {
          "id": "1583917200028",
          "message": "snapshot",
          "start": 1583917200028,
          "status": "success",
          "end": 1583917220144,
          "result": "62e06abc-6fca-9564-aad7-7933dec90ba3"
        },
        {
          "id": "1583917220148",
          "message": "add metadata to snapshot",
          "start": 1583917220148,
          "status": "success",
          "end": 1583917220167
        }
      ],
      "end": 1583917220405,
      "result": {
        "message": "no object with UUID or opaque ref: 1536b84b-412d-07ad-2dcd-9db3162296e6",
        "name": "Error",
        "stack": "Error: no object with UUID or opaque ref: 1536b84b-412d-07ad-2dcd-9db3162296e6\n    at Xapi.apply (/opt/xen-orchestra/packages/xen-api/src/index.js:573:11)\n    at Xapi.getObject (/opt/xen-orchestra/packages/xo-server/src/xapi/index.js:128:24)\n    at Xapi.deleteVm (/opt/xen-orchestra/packages/xo-server/src/xapi/index.js:692:12)\n    at iteratee (/opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.js:1268:23)\n    at /opt/xen-orchestra/@xen-orchestra/async-map/src/index.js:32:17\n    at Promise._execute (/opt/xen-orchestra/node_modules/bluebird/js/release/debuggability.js:384:9)\n    at Promise._resolveFromExecutor (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:518:18)\n    at new Promise (/opt/xen-orchestra/node_modules/bluebird/js/release/promise.js:103:10)\n    at /opt/xen-orchestra/@xen-orchestra/async-map/src/index.js:31:7\n    at arrayMap (/opt/xen-orchestra/node_modules/lodash/_arrayMap.js:16:21)\n    at map (/opt/xen-orchestra/node_modules/lodash/map.js:50:10)\n    at asyncMap (/opt/xen-orchestra/@xen-orchestra/async-map/src/index.js:30:5)\n    at BackupNg._backupVm (/opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.js:1265:13)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (internal/process/task_queues.js:97:5)\n    at handleVm (/opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.js:734:13)"
      }
    }
  ],
  "end": 1583917207065
}

So the results are clearly wrong, and the job ran twice, this time at the same point in time.
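Both symptoms in the logs above can be checked mechanically. The following is a rough sketch that works on the exported job logs as parsed JSON dicts with the fields shown above (`id`, `jobId`, `scheduleId`, `start`, `status`, `tasks`). The function names and the one-second window are my own assumptions, not anything from xo-server.

```python
def find_duplicate_runs(logs, window_ms=1000):
    """Return (id, id) pairs of runs that belong to the same job and
    schedule and were started within window_ms of each other."""
    by_key = {}
    for log in logs:
        by_key.setdefault((log["jobId"], log["scheduleId"]), []).append(log)

    pairs = []
    for runs in by_key.values():
        ordered = sorted(runs, key=lambda r: r["start"])
        # Compare each run with the next one in start order.
        for first, second in zip(ordered, ordered[1:]):
            if second["start"] - first["start"] <= window_ms:
                pairs.append((first["id"], second["id"]))
    return pairs


def has_hidden_failure(log):
    """True if the run reports success although a nested task failed."""
    def any_failed(task):
        if task.get("status") == "failure":
            return True
        return any(any_failed(sub) for sub in task.get("tasks", []))

    return log.get("status") == "success" and any(
        any_failed(task) for task in log.get("tasks", [])
    )
```

Fed with the two logs above, the first function should pair runs 1583917200011 and 1583917200014 (started 3 ms apart), and the second should flag the second run, since its VM task has status "failure" while the run itself says "success".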

Would it help to delete and recreate all of the backup jobs?
What extra information could I offer to get this issue fixed?

I'm already thinking about setting up a new xo-server from scratch.

Please help and advise,
thanks Toni

tts

It wasn't XOA, we used xo-server from sources.

I planned to test with XOA, but Olivier wrote yesterday that it's probably fixed upstream.

So I updated our installation to xo-server 5.57.3 / xo-web 5.57.1 and will see if it's fixed.

fingers crossed

tts

Hi,

I can give a little update on our issues, but our 'workaround' is really time consuming.

Besides the problem of randomly stuck CR jobs, which can mostly be worked around by:

stopping and starting xo-server (sometimes the XO VM needs a restart too) and restarting the interrupted jobs again and again until all VMs are successfully backed up.

Now, without us changing anything, the jobs are again started twice most of the time, but not always.

I cannot find a correlation to this.

At first it looked like this could be fixed by disabling and re-enabling the backup jobs, but as of now this doesn't make any difference.

-Toni

tts

Some additional info that might be relevant to this issue:

When stopping and starting xo-server after a hanging job, the summary sent as the report is interesting: it states that the SR and VM aren't available / not found, which is obviously wrong.

  ##  Global status: interrupted
  
  - **Job ID**: fd08162c-228a-4c49-908e-de3085f75e46
  - **Run ID**: 1582618375292
  - **mode**: delta
  - **Start time**: Tuesday, February 25th 2020, 9:12:55 am
  - **Successes**: 13 / 32
  
  ---
  
  ## 19 Interrupted

  ### VM not found
  
  - **UUID**: d627892d-661c-9b7c-f733-8f52da7325e2
  - **Start time**: Tuesday, February 25th 2020, 9:12:55 am
  - **Snapshot** ✔
    - **Start time**: Tuesday, February 25th 2020, 9:12:55 am
    - **End time**: Tuesday, February 25th 2020, 9:12:56 am
    - **Duration**: a few seconds
  - **SRs**
    - **SR Not found** (ea4f9bd7-ccae-7a1f-c981-217565c8e08e) ⚠️
      - **Start time**: Tuesday, February 25th 2020, 9:14:26 am
      - **transfer** ⚠️
        - **Start time**: Tuesday, February 25th 2020, 9:14:26 am
[snip]

tts

In addition to the above errors, we noticed a lot of 'VDI_IO_ERROR' errors while the jobs were running.

I remember there was a similar problem with CR which got fixed using the 'guessVhdSizeOnImport: true' option. Did anything change since then?

I will try with XOA trial when I have some spare time.

EDIT:
Can we import the config from our XO from sources into XOA, or should we set up everything from scratch?

tts

Unfortunately, our CR jobs are not working properly after the upgrade. Some VMs get backed up successfully, while other jobs hang randomly; it is not pinned down to one particular VM as we first thought.

It looks like it suddenly stops transferring the deltas / importing the VDI. We noticed some strange backtrace errors in xensource.log that coincide with the time the VM was being backed up.

xen02 xapi: [error|xen02|12074002 ||backtrace] Async.VM.snapshot_with_quiesce R:f6f758fd5e3f failed with exception Server_error(VM_SNAPSHOT_WITH_QUIESCE_NOT_SUPPORTED, [ OpaqueRef:80746e86-06bf-87e4-9b5f-e6bdb81094ae ])
xen02 xapi: [error|xen02|12074002 ||backtrace] Raised Server_error(VM_SNAPSHOT_WITH_QUIESCE_NOT_SUPPORTED, [ OpaqueRef:80746e86-06bf-87e4-9b5f-e6bdb81094ae ])                                                             
xen02 xapi: [error|xen02|12074002 ||backtrace] 1/1 xapi @ xen02 Raised at file (Thread 12074002 has no backtrace table. Was with_backtraces called?, line 0                                                                
xen02 xapi: [error|xen02|12074002 ||backtrace]

xen02 xapi: [error|xen02|12074140 INET :::80|VDI.add_to_other_config D:d82237120feb|sql] Duplicate key in set or map: table VDI; field other_config; ref OpaqueRef:317a9b0c-3f1b-c88c-f5ff-13d92395fa67; key xo:copy_of
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] VDI.add_to_other_config D:d82237120feb failed with exception Db_exn.Duplicate_key("VDI", "other_config", "OpaqueRef:317a9b0c-3f1b-c88c-f5ff-13d92395fa67"
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] Raised Db_exn.Duplicate_key("VDI", "other_config", "OpaqueRef:317a9b0c-3f1b-c88c-f5ff-13d92395fa67", "xo:copy_of")
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 1/8 xapi @ xen02 Raised at file db_cache_impl.ml, line 263
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 2/8 xapi @ xen02 Called from file lib/pervasiveext.ml, line 22
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 3/8 xapi @ xen02 Called from file rbac.ml, line 236
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 4/8 xapi @ xen02 Called from file server_helpers.ml, line 80 
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 5/8 xapi @ xen02 Called from file server_helpers.ml, line 99 
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 6/8 xapi @ xen02 Called from file lib/pervasiveext.ml, line 22
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 7/8 xapi @ xen02 Called from file map.ml, line 117 
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace] 8/8 xapi @ xen02 Called from file src/conv.ml, line 215 
xen02 xapi: [error|xen02|12074140 INET :::80|dispatch:VDI.add_to_other_config D:7048d4831ced|backtrace]

xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] VDI.get_by_uuid D:0438f795bf3a failed with exception Db_exn.Read_missing_uuid("VDI", "", "OLD_8693f679-6023-4776-accb-c4d2488d367b")
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] Raised Db_exn.Read_missing_uuid("VDI", "", "OLD_8693f679-6023-4776-accb-c4d2488d367b")                                              
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 1/9 xapi @ xen02 Raised at file db_cache_impl.ml, line 211                                                                          
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 2/9 xapi @ xen02 Called from file db_actions.ml, line 14838                                                                         
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 3/9 xapi @ xen02 Called from file rbac.ml, line 227                                                                                 
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 4/9 xapi @ xen02 Called from file rbac.ml, line 236                                                                                 
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 5/9 xapi @ xen02 Called from file server_helpers.ml, line 80                                                                        
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 6/9 xapi @ xen02 Called from file server_helpers.ml, line 99                                                                        
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 7/9 xapi @ xen02 Called from file lib/pervasiveext.ml, line 22                                                                      
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 8/9 xapi @ xen02 Called from file map.ml, line 117                                                                                  
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace] 9/9 xapi @ xen02 Called from file src/conv.ml, line 215                                                                             
xen02 xapi: [error|xen02|12074217 UNIX /var/lib/xcp/xapi|dispatch:VDI.get_by_uuid D:357c7a87bcbf|backtrace]
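To get an overview of which exceptions show up around a hanging job, the log lines can be grouped by exception name. This is only a small sketch: the regular expression is written against the "failed with exception ..." lines pasted above and may miss other xapi log formats, and the function name is made up.

```python
import re
from collections import Counter

# Matches e.g. 'failed with exception Db_exn.Duplicate_key("VDI", ...'
# and captures the exception name plus its first argument.
EXC_RE = re.compile(r"failed with exception ([\w.]+)\(([^,)\s]*)")


def summarize_exceptions(lines):
    """Count exceptions per name; Server_error is split by its error code."""
    counts = Counter()
    for line in lines:
        match = EXC_RE.search(line)
        if not match:
            continue  # continuation lines of a backtrace carry no name
        name, first_arg = match.group(1), match.group(2)
        if name == "Server_error" and first_arg:
            name = f"{name}:{first_arg}"
        counts[name] += 1
    return counts
```

Running it over the relevant time window of xensource.log (for example `summarize_exceptions(open(path))`, with `path` pointing at the log file) gives one counter entry per exception type, which makes it easier to see whether the same errors precede every hang.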

The hanging backup job is still displayed as running in XO when this happens and cannot be canceled in the web interface. The only way to get out of this state is restarting xo-server.

Then the job switches to interrupted. If we start the backup again (restart failed or restart whole backup job), the problematic VM eventually backs up successfully, but the job gets stuck again while backing up another random VM, leading to the same errors in the logs.

Please help, I am absolutely out of ideas on how to solve this. The remaining backups (delta, snapshot) are working as expected; only CR is now problematic.
