Continuous replication failing
-
I'm using continuous replication between two hosts, each with local storage, that has been working wonderfully.
Recently I had a storage issue on the source host (thick provisioning/filesystem filled up). At this point continuous replication stopped working, I assume because it couldn't snapshot the VMs.
I have resolved the storage issue on the source host, but continuous replication is still broken - it immediately fails with no log messages that I've been able to find.
What do I need to do to get continuous replication working again? Should I remove the remaining older copies on the destination server, which (I presume) forces another initial seed copy?
Thanks!
BTW - LOVE Xen Orchestra/XCP-ng!
-
Have you looked under Dashboard > Health in XO? If you ran out of disk space, then it's likely that you are encountering coalesce issues.
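If you want to dig in from the host side, coalesce activity (and failures) is recorded in SMlog. A quick sketch, run as root on the XCP-ng host (paths are the standard ones):

```
# Show recent coalesce activity in the storage manager log
grep -i coalesce /var/log/SMlog | tail -n 20

# SM backend exceptions often explain why a snapshot failed
grep -i exception /var/log/SMlog | tail -n 20
```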
-
@Danp
Thanks for the reply! I did have one item under "VDIs to coalesce" - it was an old/unused VDI, so I removed it.
Unfortunately continuous replication still isn't happy when I try to run it manually - I get an error popup ("unknown error from peer") and nothing is getting logged to SMlog. I tried restarting the toolstack, but no change.
Any suggestions?
Thanks!
-
How much free space is present on the source and destination SRs? Have you checked the XO logs?
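If it's easier from the CLI, something like this on the pool master should show utilisation per SR (a sketch using standard xe parameters):

```
# Physical size vs. utilisation for every SR in the pool
xe sr-list params=uuid,name-label,physical-size,physical-utilisation
```

And for the XO logs, assuming xo-server runs as a systemd unit named xo-server (the usual name for an install from sources):

```
journalctl -u xo-server -f
```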
-
Do you have any snapshots related to the CR on the source? You can remove them to ensure the next replication is a full one.
If you have a lot of VMs, do it progressively, a few VMs at a time. Also, the XO logs should have more information; if you can, paste the relevant part here.
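To find them from the CLI, something like this on the pool master will list every snapshot and the VM it belongs to (a sketch; CR snapshots usually carry the backup job name in their label):

```
# Snapshots are VM objects flagged with is-a-snapshot=true
xe vm-list is-a-snapshot=true params=uuid,name-label,snapshot-of
```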
-
No snapshots on src, which has 1 TB free (out of 2.58 TB); dest SR has 1 TB free (out of 2.6 TB)
Ah yes - Xen Orchestra is complaining about a missing object:
```
2023-04-06T18:07:11.954Z xo:xo-mixins:backups-ng WARN no such object 67a9f559-cc92-c17f-50fc-24a7b58a8d5c {
  error: XoError: no such object 67a9f559-cc92-c17f-50fc-24a7b58a8d5c
      at noSuchObject (/opt/xen-orchestra/packages/xo-common/api-errors.js:26:11)
      at Xo.getObject (file:///opt/xen-orchestra/packages/xo-server/src/xo.mjs:81:13)
      at default.getXenServerIdByObject (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:199:26)
      at handleRecord (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.mjs:184:36)
      at executor (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.mjs:196:13)
      at file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/jobs/index.mjs:263:30
      at Jobs._runJob (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/jobs/index.mjs:292:22)
      at Jobs.runJobSequence (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/jobs/index.mjs:332:7)
      at Api.#callApiMethod (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:394:20)
    { code: 1, data: { id: '67a9f559-cc92-c17f-50fc-24a7b58a8d5c', type: undefined } }
}

2023-04-06T18:07:14.028Z xo:backups:backupWorker WARN incorrect value passed to logger {
  level: 40,
  value: {
    message: 'no XAPI associated to 67a9f559-cc92-c17f-50fc-24a7b58a8d5c',
    name: 'Error',
    stack: 'Error: no XAPI associated to 67a9f559-cc92-c17f-50fc-24a7b58a8d5c\n' +
      '    at BackupWorker.getConnectedRecord (/opt/xen-orchestra/@xen-orchestra/backups/_backupWorker.js:55:17)\n' +
      '    at getConnectedRecord.next (<anonymous>)\n' +
      '    at wrapCall (/opt/xen-orchestra/node_modules/promise-toolbox/wrapCall.js:7:23)\n' +
      '    at loop (/opt/xen-orchestra/node_modules/promise-toolbox/Disposable.js:96:25)\n' +
      '    at BackupWorker.getConnectedRecord (/opt/xen-orchestra/node_modules/promise-toolbox/Disposable.js:98:10)\n' +
      '    at /opt/xen-orchestra/@xen-orchestra/backups/Backup.js:249:58\n' +
      '    at pTry (/opt/xen-orchestra/node_modules/promise-toolbox/try.js:7:20)\n' +
      '    at module.exports (/opt/xen-orchestra/node_modules/promise-toolbox/_evalDisposable.js:5:49)\n' +
      '    at /opt/xen-orchestra/node_modules/promise-toolbox/Disposable.js:78:62\n' +
      '    at Function.from (<anonymous>)'
```
How can I find/remove (or otherwise remediate) this?
-
@mwxoa3 said in Continuous replication failing:
no XAPI associated to 67a9f559-cc92-c17f-50fc-24a7b58a8d5c
This means you probably have a disconnected pool; check your Settings > Servers view.
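If you want to double-check from the CLI, xo-cli can dump the servers XO knows about, including their connection status (assuming xo-cli is installed; the URL and user below are placeholders):

```
# One-time registration against your XO instance
xo-cli --register https://xo.example.org admin@example.org

# List registered servers and their status
xo-cli server.getAll
```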
-
Nothing is jumping out at me that appears related to a disconnected pool; all pools I can see are valid and connected to a host
Settings > Servers just shows my hosts; nothing unusual-looking there (maybe I'm not looking in the right place?)
Dashboard > Health also is clean
-
I've deleted all continuous replication VMs from the backup/dest SR and rebooted the dest host, but am still hitting this same error.
How can I tell what/where this UUID (67a9f559-cc92-c17f-50fc-24a7b58a8d5c) is, and either fix the issue with it or remove it if it isn't needed?
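Is probing each xe object class for it the right approach? I was thinking of something like this on the pool master (a sketch):

```
# Ask each object class about the mystery UUID; whichever call
# prints a record tells you what kind of object it is
UUID=67a9f559-cc92-c17f-50fc-24a7b58a8d5c
for class in vm vdi sr vbd host pool; do
  echo "== $class =="
  xe ${class}-list uuid="$UUID"
done
```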
Thanks!
-
Update: I disabled the CR job and created a new one, which started successfully - so I'm hoping that fixed it.