Categories

  • All news regarding Xen and XCP-ng ecosystem

    143 Topics
    4k Posts
    @dthenot Great! I'm happy I was able to help test it. I look forward to the update release. Interesting note: CR is faster when the snapshots are not deleted... or CR is faster because of the update; I'll test again after the fix.
  • Everything related to the virtualization platform

    1k Topics
    15k Posts
    @yannsionneau said: Can you retry with an up-to-date XCP-ng 8.3, please? FYI, on recent XCP-ng 8.3 versions the PCI passthrough will enable the ROM expansion BAR. The guest VM will have access to it, so there is no need to pass it via QEMU anymore. See my comment on GitHub: https://github.com/xcp-ng/xcp/issues/786#issuecomment-4281846490 Regards, Yann

    Yes, I tried. I also confirmed on GitHub that the patch does what it is supposed to do - it exposes the ROM BAR - but that alone is not sufficient to get our cards working. More in my other topic. I believe it will still fix a lot of issues, and it could potentially fix AMD GPU passthrough for some cards, but not Phoenix, Raphael, and this generation of Ryzen iGPUs (not sure specifically about Barcelo). It is good progress anyway, as I would never have been able to fix this on my own, so I am really glad it is done. Now I will probably get back to the topic, try to patch whatever else is needed, and offer it to people as an RPM package for the time being. We will see; it is all about time.
  • 3k Topics
    28k Posts
    @florent After some digging, this is what I have come up with. Please double-check everything... I can PM you the whole chat session if you like.

    Bug Report: XO Backup Intermittent Failure — RequestAbortedError During NBD Stream Init

    Environment:
      - XCP-ng: 8.3.0 (build 20260408, xapi 26.1.3)
      - xapi-nbd: 26.1.3-1.6.xcpng8.3
      - xo-server: community edition (xen-orchestra built from source)
      - Pool: 2-node pool (host1 10.100.2.10, host2 10.100.2.11)
      - Backup NFS target: 10.100.2.23:/volume1/backup

    Symptom: Scheduled backup jobs intermittently fail with RequestAbortedError: Request aborted during NBD stream initialization. The failure is transient — the same VMs back up successfully on subsequent runs.

      xo:backups:worker ERROR unhandled error event
      error: RequestAbortedError [AbortError]: Request aborted
        at BodyReadable.destroy (undici/lib/api/readable.js:51:13)
        at QcowStream.close (@xen-orchestra/qcow2/dist/disk/QcowStream.mjs:40:22)
        at XapiQcow2StreamSource.close (@xen-orchestra/disk-transform/dist/DiskPassthrough.mjs:86:28)
        at XapiQcow2StreamSource.close (@xen-orchestra/xapi/disks/XapiQcow2StreamSource.mjs:61:18)
        at DiskLargerBlock.close (@xen-orchestra/disk-transform/dist/DiskLargerBlock.mjs:87:28)
        at TimeoutDisk.close (@xen-orchestra/disk-transform/dist/DiskPassthrough.mjs:34:29)
        at XapiStreamNbdSource.close (@xen-orchestra/disk-transform/dist/DiskPassthrough.mjs:34:29)
        at XapiStreamNbdSource.init (@xen-orchestra/xapi/disks/XapiStreamNbd.mjs:66:17)
        at async #openNbdStream (@xen-orchestra/xapi/disks/Xapi.mjs:108:7)

    Root Cause Analysis: The error chain is misleading — QcowStream.close and BodyReadable.destroy are cleanup, not the cause. The actual failure is inside connectNbdClientIfPossible(), called at XapiStreamNbd.mjs:66. The sequence in #openNbdStream (Xapi.mjs) is:
      1. #openExportStream() — opens a qcow2/VHD HTTP stream from XAPI (succeeds)
      2. new XapiStreamNbdSource(streamSource, ...) — wraps it
      3. await source.init() — calls super.init(), then connectNbdClientIfPossible()
      4. If connectNbdClientIfPossible() throws for any reason other than NO_NBD_AVAILABLE, execution goes to the catch block in #openNbdStream, which calls source?.close() — this closes the already-open qcow2 HTTP stream, producing the BodyReadable.destroy → AbortError cascade.

    The underlying NBD connection failure: MultiNbdClient.connect() opens nbdConcurrency (default 2) sequential connections. Each NbdClient.connect() failure causes the candidate host to be removed and retried with another candidate. With only 2 hosts in the pool and nbdConcurrency=2, a single transient TLS or TCP failure on one host during NBD option negotiation can exhaust all candidates, causing MultiNbdClient to throw NO_NBD_AVAILABLE — but that error IS caught and falls back to stream export. So the failure here is something else: a connection that partially succeeds and then aborts, throwing a non-NO_NBD_AVAILABLE error that propagates uncaught to #openNbdStream's catch block.

    Specific issue: When nbdClient.connect() throws UND_ERR_ABORTED (an undici abort), the error code is not NO_NBD_AVAILABLE, so #openNbdStream re-throws it instead of falling back to stream export. The backup then fails entirely rather than degrading gracefully.
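The failure mode described above can be sketched as follows. This is an illustrative reduction, not the actual Xen Orchestra code: the names (openNbdStream, connectNbdClientIfPossible, NO_NBD_AVAILABLE, UND_ERR_ABORTED) come from the report, while the bodies are assumed stand-ins.

```javascript
// Minimal sketch of the reported bug: only NO_NBD_AVAILABLE triggers the
// HTTP-stream fallback, so any other error code escapes and fails the job.

class CodedError extends Error {
  constructor(code) {
    super(code);
    this.code = code; // error code checked by the catch block below
  }
}

// Stand-in for MultiNbdClient.connect(): succeeds, or throws with the
// given code (hypothetical parameter used only to drive the sketch).
async function connectNbdClientIfPossible(failureCode) {
  if (failureCode !== undefined) {
    throw new CodedError(failureCode);
  }
  return { kind: 'nbd' };
}

// Simplified stand-in for #openNbdStream: the fallback is keyed on a
// single error code, so UND_ERR_ABORTED is re-thrown instead of
// degrading to the already-open HTTP stream export.
async function openNbdStream(streamSource, failureCode) {
  try {
    return await connectNbdClientIfPossible(failureCode);
  } catch (err) {
    if (err.code === 'NO_NBD_AVAILABLE') {
      return streamSource; // graceful fallback to HTTP stream export
    }
    throw err; // UND_ERR_ABORTED escapes here and aborts the backup
  }
}
```

Calling openNbdStream(stream, 'NO_NBD_AVAILABLE') falls back to the stream, while openNbdStream(stream, 'UND_ERR_ABORTED') rejects — mirroring the intermittent job failures in the logs.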
    Proposed Fix: In Xapi.mjs, the catch block in #openNbdStream should treat any NBD connection failure as fallback-eligible, not just NO_NBD_AVAILABLE:

      } catch (err) {
        if (err.code === 'NO_NBD_AVAILABLE' || err.code === 'UND_ERR_ABORTED') {
          warn("can't connect through NBD, fall back to stream export", { err })
          if (streamSource === undefined) {
            throw new Error("Can't open stream source")
          }
          return streamSource
        }
        await source?.close().catch(warn)
        throw err
      }

    Or, more robustly, treat any NBD connection error as fallback-eligible rather than hardcoding error codes:

      } catch (err) {
        warn("can't connect through NBD, fall back to stream export", { err })
        if (streamSource === undefined) {
          throw new Error("Can't open stream source")
        }
        return streamSource
      }

    This matches the intent of the existing NO_NBD_AVAILABLE fallback — NBD is opportunistic, and any failure to establish it should degrade gracefully to HTTP stream export rather than failing the entire backup job.

    Observed Timeline:
      02:22:11 — xo-server opens VHD + qcow2 export streams
      02:22:12–15 — NBD connections attempted, fail mid-handshake
      02:22:15 — backup fails with UND_ERR_ABORTED, no fallback
      02:33:51 — retry attempt also fails within 5 seconds
      23:03 — same VMs back up successfully (transient condition resolved)

    Impact: Backup jobs fail entirely on transient NBD connectivity issues instead of falling back to HTTP stream export, which is already implemented and working.

    You can file this at the XO GitHub issues or the XCP-ng forum. The fix is straightforward and low-risk — the fallback path already exists and works; it's just not being reached for UND_ERR_ABORTED errors.
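The second (error-code-agnostic) variant of the proposed fix can be exercised in isolation. This is a hedged sketch of the intended control flow under assumed signatures, not the real Xapi.mjs: connectNbd and warn are hypothetical injected parameters standing in for the NBD connection attempt and the logger.

```javascript
// Sketch of the error-code-agnostic fallback: any NBD connection failure
// degrades to the already-open HTTP stream export instead of failing the
// whole backup job. Names and flow follow the report; bodies are assumed.

async function openDiskSource(streamSource, connectNbd, warn) {
  try {
    // NBD is opportunistic: try it first for better throughput.
    return await connectNbd();
  } catch (err) {
    // Fall back on ANY failure, regardless of err.code.
    warn("can't connect through NBD, fall back to stream export", { err });
    if (streamSource === undefined) {
      // No stream to fall back to: only then does the job fail.
      throw new Error("Can't open stream source");
    }
    return streamSource;
  }
}
```

With this shape, a transient UND_ERR_ABORTED during NBD negotiation produces a warning plus a successful stream-export backup, and the job only fails when the HTTP stream itself is unavailable.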
  • Our hyperconverged storage solution

    44 Topics
    731 Posts
    olivierlambert
    Different use cases: Ceph is better with more hosts (at least 6 or 7), while XOSTOR is better for 3 to 7 or 8 hosts. We might add better Ceph support for large clusters in the future.
  • 34 Topics
    102 Posts
    The remark has been integrated into the article: https://www.myprivatelab.tech/xcp_lab_v2_ha#perte-master Thanks again for the feedback.