Continuous replication failing
-
I'm using continuous replication between two hosts, each with local storage, which has been working wonderfully.
Recently I had a storage issue on the source host (thick provisioning/filesystem filled up). At this point continuous replication stopped working, I assume because it couldn't snapshot the VMs.
I have resolved the storage issue on the source host, but continuous replication is still broken - it immediately fails with no log messages that I've been able to find.
What do I need to do to get continuous replication working again? Should I remove the remaining older copies on the destination server, which (I presume) forces another initial seed copy?
Thanks!
BTW - LOVE Xen Orchestra/XCPNG!
-
Have you looked under Dashboard > Health in XO? If you ran out of disk space, then it's likely that you are encountering coalesce issues.
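If you want to double-check from the host side as well, a rough sketch (standard XCP-ng log locations assumed) is to look for coalesce/GC activity directly:

# on the source host: recent coalesce / garbage-collection activity
grep -i coalesce /var/log/SMlog | tail -n 50
# any long-running storage tasks still pending
xe task-list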
-
@Danp
Thanks for the reply! I did have one item under "VDIs to coalesce" - it was an old/unused VDI, so I removed it.
Unfortunately continuous replication still isn't happy when I try to run it manually - I get an error popup "unknown error from peer" and nothing is getting logged to SMlog. Tried restarting the toolstack, no change.
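For reference, this is roughly what I was doing on the source host while re-running the job (standard XCP-ng log location assumed):

# watch the storage manager log while the CR job runs
tail -f /var/log/SMlog
# restart the toolstack afterwards
xe-toolstack-restart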
Any suggestions?
Thanks!
-
How much free space is present on the source and destination SRs? Have you checked the XO logs?
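If it helps, one quick way to check both from the CLI (assuming xo-server runs as a systemd service named xo-server; adjust if your install differs):

# free space on every SR, run from any host in the pool
xe sr-list params=uuid,name-label,physical-size,physical-utilisation
# follow the xo-server log on the XO machine
journalctl -u xo-server -f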
-
Do you have any snapshots related to the CR on the source? You can remove them to ensure the next replication is a full one (one way to find them from the CLI is sketched below).
If you have a lot of VMs, maybe do it progressively, a few VMs at a time. Also, the XO logs should have more information - if you can, paste the relevant part here.
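One way to spot the CR snapshots from a host console is something like this (just a sketch - the snapshot names depend on how the job was configured, so double-check before deleting anything):

# list every VM snapshot with its name and parent VM
xe snapshot-list params=uuid,name-label,snapshot-of
# remove one you have confirmed belongs to the CR job (irreversible;
# check 'xe help snapshot-uninstall' for the exact parameters on your version)
xe snapshot-uninstall snapshot-uuid=<snapshot-uuid> force=true

Deleting them from the VM's Snapshots tab in XO works too.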
-
No snapshots on the source, which has 1 TB free (out of 2.58 TB); the dest SR has 1 TB free (out of 2.6 TB).
Ah yes - Xen Orchestra is complaining about a missing object:
2023-04-06T18:07:11.954Z xo:xo-mixins:backups-ng WARN no such object 67a9f559-cc92-c17f-50fc-24a7b58a8d5c {
  error: XoError: no such object 67a9f559-cc92-c17f-50fc-24a7b58a8d5c
      at noSuchObject (/opt/xen-orchestra/packages/xo-common/api-errors.js:26:11)
      at Xo.getObject (file:///opt/xen-orchestra/packages/xo-server/src/xo.mjs:81:13)
      at default.getXenServerIdByObject (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/xen-servers.mjs:199:26)
      at handleRecord (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.mjs:184:36)
      at executor (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/backups-ng/index.mjs:196:13)
      at file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/jobs/index.mjs:263:30
      at Jobs._runJob (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/jobs/index.mjs:292:22)
      at Jobs.runJobSequence (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/jobs/index.mjs:332:7)
      at Api.#callApiMethod (file:///opt/xen-orchestra/packages/xo-server/src/xo-mixins/api.mjs:394:20) {
    code: 1,
    data: { id: '67a9f559-cc92-c17f-50fc-24a7b58a8d5c', type: undefined }
  }
}
2023-04-06T18:07:14.028Z xo:backups:backupWorker WARN incorrect value passed to logger {
  level: 40,
  value: {
    message: 'no XAPI associated to 67a9f559-cc92-c17f-50fc-24a7b58a8d5c',
    name: 'Error',
    stack: 'Error: no XAPI associated to 67a9f559-cc92-c17f-50fc-24a7b58a8d5c\n' +
      '    at BackupWorker.getConnectedRecord (/opt/xen-orchestra/@xen-orchestra/backups/_backupWorker.js:55:17)\n' +
      '    at getConnectedRecord.next (<anonymous>)\n' +
      '    at wrapCall (/opt/xen-orchestra/node_modules/promise-toolbox/wrapCall.js:7:23)\n' +
      '    at loop (/opt/xen-orchestra/node_modules/promise-toolbox/Disposable.js:96:25)\n' +
      '    at BackupWorker.getConnectedRecord (/opt/xen-orchestra/node_modules/promise-toolbox/Disposable.js:98:10)\n' +
      '    at /opt/xen-orchestra/@xen-orchestra/backups/Backup.js:249:58\n' +
      '    at pTry (/opt/xen-orchestra/node_modules/promise-toolbox/try.js:7:20)\n' +
      '    at module.exports (/opt/xen-orchestra/node_modules/promise-toolbox/_evalDisposable.js:5:49)\n' +
      '    at /opt/xen-orchestra/node_modules/promise-toolbox/Disposable.js:78:62\n' +
      '    at Function.from (<anonymous>)'
How can I find/remove (or otherwise remediate) this?
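The only idea I've had so far is to probe it object type by object type from a host console, along these lines, but I'm not sure that's the right approach:

# does any object type on the pool know this UUID?
UUID=67a9f559-cc92-c17f-50fc-24a7b58a8d5c
xe vm-list uuid=$UUID
xe snapshot-list uuid=$UUID
xe vdi-list uuid=$UUID
xe sr-list uuid=$UUID
xe pool-list uuid=$UUID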
-
@mwxoa3 said in Continuous replication failing:
no XAPI associated to 67a9f559-cc92-c17f-50fc-24a7b58a8d5c
This means you probably have a disconnected pool - check your Settings > Servers view.
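If nothing looks wrong in the UI, you can also ask XO directly which objects it knows about and search for that UUID with xo-cli (the URL and user below are placeholders - register against your own instance):

# one-time registration of xo-cli against your XO instance
xo-cli --register http://localhost admin@admin.net
# dump all known objects and look for the UUID from the error
xo-cli --list-objects | grep -A 5 67a9f559-cc92-c17f-50fc-24a7b58a8d5c

If the UUID doesn't show up anywhere, it may just be a stale reference held by the backup job itself.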
-
Nothing is jumping out at me that appears related to a disconnected pool; all pools I can see are valid and connected to a host
Settings > Server just shows my hosts, nothing unusual looking there (maybe I'm not looking at the right place?)
Dashboard > Health also is clean
-
I've deleted all continuous replication VMs from the backup/dest SR and rebooted the dest host, but am still hitting this same error.
How can I tell what/where this UUID (67a9f559-cc92-c17f-50fc-24a7b58a8d5c) is, and either fix the issue with it or remove it if it isn't needed?
Thanks!
-
Update: I disabled the CR job and created a new one, which successfully started - so I'm hoping this has fixed it.