Latest posts made by peo

peo

@florent said in Continuous Replication jobs creates full backups every time since 2025-09-06 (xo from source):

@Andrew @peo
are you using purge snapshot data ?
In both case, can you try disabling it and disabling CBT on the relevant VMs ?

Yes, I was using CBT + purge snapshots, but that might not have been the reason for the sluggish transfer speeds with the later versions of XO.

I discovered that the SSD on my destination host were going bad, a write of a /dev/null-filled test file (1GB then 5GB) on the device started at expected 450MB/s, but slowed down to less than 10MB/s. Found a bunch of unexpected errors (because the disk was more or less new) in the logs by dmesg.
Doing the replication to another host made it fast even when transferred in full.
CBT+delete = still "Backup fell back to a full" (this was the third since my XO update)
CBT w/o delete = first backup=full transfer (4 min) and garbage collection (a few more minutes), second=delta transfer (a few seconds backup time)

peo

I'm currently testing the "fixed" version. At the time I ran the replications the first time, I expected and accepted it doing new "first" full backups/transfers, but even this small, singe-disk VM (the one for my XO) is still doing full transfers:

This simple operation previously taking 2-5 minutes now takes 44 minutes:

Compared to the previous "normal" backup time:

The same goes for "Admin Ubuntu 24", single but slightly larger disk (50GB) which yesterday took 4 hours (initial "new" "first" backup, so I don't know if it still takes that time)

peo

Too bad it becomes that unexplainable slow when it has to fall back to a full backup:

The first (full) backup was 10x faster:

peo

@olivierlambert said in Continuous Replication jobs creates full backups every time since 2025-09-06 (xo from source):

No, no confirmation so far, that's why we need more to investigate

Great (or maybe not) news.. At least on my system this is very easy to replicate.. On a tiny test machine that only have had a normal backup before, first time doing replication copy (to H2 SSD) it sends it as full as expected. Just did some simple operations on the machine (like apt update, no upgrade), then retried:

peo

@olivierlambert Has the problem been confirmed ? I can try to pinpoint where the always-full replications were introduced, but more time efficient to do it on a machine that is fast do replicate, not the one that now takes 2.5 hours (a few minutes for the diff before it borked starting at the backup made 6 Sept)

peo

@olivierlambert said in Continuous Replication jobs creates full backups every time since 2025-09-06 (xo from source):

git bissect

That might be quite a bunch of commits in between the working one and when I discovered the problems (I actually let it do it's thing a couple of days, quickly filling up the backup drive)
I keep only the current and the previous build of XO, so I only know it's somewhere between Aug 16 (from the 'lastlog' to find out when I was logged in) and late Sep 5.

peo

Anyone else experiencing the same problem, or is it just a problem of my imagination?

Recap:

Since 2025-09-06, all Continuous Replication jobs are creating full backups instead of incrementals.

Environment: XO from source, updated to commit c2144 (2 commits behind at the time of test).

Replication target: a standalone host that is part of the pool, on the same local network, with local SSD storage.

Logs and status screenshots were attached in my first post.

What I’ve done so far:
2025-09-11

Updated Xen Orchestra to the latest commits (as of 2025-09-11).
Restarted the XO VM and the source VM after update.

2025-09-12:

Applied pool patches as they became available.
Installed host updates and rebooted hosts.
Restarted the source and target hosts (since all hosts were restarted).
Verified behavior again today → still a full backup for the “Admin Ubuntu 24” VM.

Observation:
The issue began exactly after the 2025-09-05 updates (which included a set of host patches).
Since then, every daily replication runs as a full.

peo

Discarded my start of this post yesterday because I was too far behind with the updates. I yesterday updated to Xen Orchestra, commit c2144 (currently 2 commits behind), and the problem remains.
I have, after the update of XO restarted that machine, also updated and restarted the other (first below) machine.

Replication is transferred to a host with its own local storage on SSD.

{
  "data": {
    "mode": "delta",
    "reportWhen": "failure"
  },
  "id": "1757507401899",
  "jobId": "0bb53ced-4d52-40a9-8b14-7cd1fa2b30fe",
  "jobName": "Admin Ubuntu 24",
  "message": "backup",
  "scheduleId": "69a05a67-c43b-4d23-b1e8-ada77c70ccc4",
  "start": 1757507401899,
  "status": "success",
  "infos": [
    {
      "data": {
        "vms": [
          "1728e876-5644-2169-6c62-c764bd8b6bdf"
        ]
      },
      "message": "vms"
    }
  ],
  "tasks": [
    {
      "data": {
        "type": "VM",
        "id": "1728e876-5644-2169-6c62-c764bd8b6bdf",
        "name_label": "Admin Ubuntu 24"
      },
      "id": "1757507403766",
      "message": "backup VM",
      "start": 1757507403766,
      "status": "success",
      "tasks": [
        {
          "id": "1757507404364",
          "message": "snapshot",
          "start": 1757507404364,
          "status": "success",
          "end": 1757507406083,
          "result": "79b28f7e-ded1-ed95-4b3b-005f64e69796"
        },
        {
          "data": {
            "id": "9d2121f8-6839-39d4-4e90-850a5b6f1bbb",
            "isFull": false,
            "name_label": "Local h2 SSD new",
            "type": "SR"
          },
          "id": "1757507406083:0",
          "message": "export",
          "start": 1757507406083,
          "status": "success",
          "tasks": [
            {
              "id": "1757507408827",
              "message": "transfer",
              "start": 1757507408827,
              "status": "success",
              "end": 1757512215875,
              "result": {
                "size": 52571406336
              }
            }
          ],
          "end": 1757512216056
        }
      ],
      "warnings": [
        {
          "message": "Backup fell back to a full"
        }
      ],
      "infos": [
        {
          "message": "will delete snapshot data"
        },
        {
          "data": {
            "vdiRef": "OpaqueRef:47180590-40ab-f301-57bf-85c2fbe9b51d"
          },
          "message": "Snapshot data has been deleted"
        }
      ],
      "end": 1757512217031
    }
  ],
  "end": 1757512217032
}

{
  "data": {
    "mode": "delta",
    "reportWhen": "failure"
  },
  "id": "1757574000010",
  "jobId": "883e2ee8-00c8-43f8-9ecd-9f9aa7aa01d1",
  "jobName": "Deb12-XO",
  "message": "backup",
  "scheduleId": "19aab592-cd48-431e-a82c-525eba60fcc7",
  "start": 1757574000010,
  "status": "success",
  "infos": [
    {
      "data": {
        "vms": [
          "30829107-2a1b-6b20-a08a-f2c1e612b2ee"
        ]
      },
      "message": "vms"
    }
  ],
  "tasks": [
    {
      "data": {
        "type": "VM",
        "id": "30829107-2a1b-6b20-a08a-f2c1e612b2ee",
        "name_label": "Deb12-XO"
      },
      "id": "1757574001999",
      "message": "backup VM",
      "start": 1757574001999,
      "status": "success",
      "tasks": [
        {
          "id": "1757574002518",
          "message": "snapshot",
          "start": 1757574002518,
          "status": "success",
          "end": 1757574004197,
          "result": "73a6e7d5-fcd6-e65f-ddd8-9dd700596505"
        },
        {
          "data": {
            "id": "9d2121f8-6839-39d4-4e90-850a5b6f1bbb",
            "isFull": false,
            "name_label": "Local h2 SSD new",
            "type": "SR"
          },
          "id": "1757574004197:0",
          "message": "export",
          "start": 1757574004197,
          "status": "success",
          "tasks": [
            {
              "id": "1757574006959",
              "message": "transfer",
              "start": 1757574006959,
              "status": "success",
              "end": 1757576397226,
              "result": {
                "size": 28974252032
              }
            }
          ],
          "end": 1757576397820
        }
      ],
      "warnings": [
        {
          "message": "Backup fell back to a full"
        }
      ],
      "infos": [
        {
          "message": "will delete snapshot data"
        },
        {
          "data": {
            "vdiRef": "OpaqueRef:b0ab2f7c-6606-bc15-cdcc-f7f002f3181a"
          },
          "message": "Snapshot data has been deleted"
        }
      ],
      "end": 1757576398882
    }
  ],
  "end": 1757576398883
}

peo

@olivierlambert Thanks, that explains the "on .." part of the message, and as you mention, it's not a "problem", but just confusing in the context of that full message.
The optimal would of course be that the task was launched on the host that is "closest" to the SR, in this case a local SR on xcp-ng-2.

This one (unrelated list of previous tasks) makes the message more clear what the "on.." part is about:

peo

@olivierlambert but the "importing content" message is incorrect:
"on SR Local h2 SSD new (on xcp-ng-1)"

That's the problem (not to where it's restored to, and has to be run from because it's a local SR)..

In the message, what do "xcp-ng-1" have to do with this operation ? It's not involved at all..