[Solved] SR_SOURCE_SPACE_INSUFFICIENT - Problems enabling HA

jr-m4

The next step in our lab testing was to enable HA. For the heartbeat SR we chose a 500MB Fibre Channel LUN.

According to the docs, the minimum requirement is 356MB. https://xcp-ng.org/blog/2024/08/22/xcp-ng-high-availability-a-guide/

However, when I try to enable HA, I get an error in the logs saying SR_SOURCE_SPACE_INSUFFICIENT (full log below).

So my question is: are the docs out of date, or could this be something else under the hood?
Note: I'm currently waiting on the storage guy to expand the SR to 1GB.

XO CE: 5811d
Node: 24
Pool: Fully updated with latest patches as of May 26, 2026

pool.enableHa
{
  "pool": "37e7a3b9-8c45-c7f2-7d09-249a935dd33d",
  "heartbeatSrs": [
    "3efef95a-4594-5a36-a182-f2b039d51ffa"
  ],
  "configuration": {}
}
{
  "code": "SR_SOURCE_SPACE_INSUFFICIENT",
  "params": [
    "OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567"
  ],
  "call": {
    "duration": 21,
    "method": "pool.enable_ha",
    "params": [
      "* session id *",
      [
        "OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567"
      ],
      {}
    ]
  },
  "message": "SR_SOURCE_SPACE_INSUFFICIENT(OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567)",
  "name": "XapiError",
  "stack": "XapiError: SR_SOURCE_SPACE_INSUFFICIENT(OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567)
    at XapiError.wrap (file:///opt/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)
    at file:///opt/xen-orchestra/packages/xen-api/transports/json-rpc.mjs:38:21
    at runNextTicks (node:internal/process/task_queues:65:5)
    at processImmediate (node:internal/timers:472:9)"
}

Update: Changed title to better describe problem

olivierlambert

Yes, the doc is outdated. It's 4GiB now.

Ping @thomas-dkmt

Source:

XAPI code (this part was updated in 2023 upstream via commit 55f6d90a2)
XS doc https://docs.xenserver.com/en-us/xencenter/current-release/pools-ha-requirements.html

But there's a reason we missed it: having an SR that full is a big mistake in itself, so this never came up before (nobody will have less than 4GiB left on their NAS/SAN).

In your case, you have a dedicated LUN for it, why?

The fundamental issue: the heartbeat should monitor the same failure domain as the workload. A dedicated HA LUN tests a different storage path, which creates two dangerous failure modes:

HA LUN up, VM LUN down: Hosts keep heartbeating happily, HA sees no problem, but VMs are stuck on dead storage. No fencing, no restart. This is the worst case: silent failure.
HA LUN down, VM LUN up: HA fences hosts even though VMs are running fine. Unnecessary disruption.
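
As a rough illustration of the two failure modes above, here is a toy Python model. It is a sketch, not how XAPI or xha actually decides: StorageState and ha_outcome are hypothetical names, the network heartbeat is ignored, and the "both up" and "both down" rows are added here only for completeness.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class StorageState:
    """Hypothetical view of a pool with a dedicated heartbeat LUN."""
    heartbeat_lun_up: bool
    vm_lun_up: bool


def ha_outcome(state: StorageState) -> str:
    """Simplified outcome when the heartbeat lives on its own LUN."""
    if state.heartbeat_lun_up and state.vm_lun_up:
        return "healthy: nothing to do"
    if state.heartbeat_lun_up and not state.vm_lun_up:
        # Heartbeat still succeeds, so HA never notices the dead VM storage.
        return "silent failure: no fencing, no restart, VMs stuck"
    if not state.heartbeat_lun_up and state.vm_lun_up:
        # Heartbeat is lost although the workload is fine.
        return "unnecessary fencing: VMs were running fine"
    # Assumption: when both fail together, the HA reaction matches the outage.
    return "real outage: heartbeat and workload fail together"


if __name__ == "__main__":
    for hb_up, vm_up in product([True, False], repeat=2):
        state = StorageState(heartbeat_lun_up=hb_up, vm_lun_up=vm_up)
        print(f"heartbeat_up={hb_up!s:<5} vm_up={vm_up!s:<5} -> {ha_outcome(state)}")
```

With the heartbeat on the same SR as the VMs, the two mixed rows cannot occur, which is the point of the recommendation.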

jr-m4

Update:

We just expanded the SR to 2GB, and I'm still getting the same error.

Write permissions were verified by creating a VDI on the SR, which succeeded without any problem.
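
If the 4GiB figure quoted earlier in this thread is the threshold being enforced, the 2GB result is expected. A minimal Python sketch of the arithmetic follows. Assumptions: the threshold is exactly 4 GiB, the LUN sizes from the thread are read as decimal units, and the check is a plain comparison. MIN_BYTES, LUN_SIZES and meets_minimum are hypothetical names, not XAPI identifiers.

```python
GIB = 2**30

# Assumed threshold, taken from the 4GiB figure mentioned in this thread.
MIN_BYTES = 4 * GIB

# LUN sizes mentioned in the thread, read as decimal units (assumption).
LUN_SIZES = {
    "500MB": 500 * 10**6,
    "1GB": 1 * 10**9,
    "2GB": 2 * 10**9,
    "8GB": 8 * 10**9,
}


def meets_minimum(size_bytes: int, minimum: int = MIN_BYTES) -> bool:
    """Return True when the given size reaches the assumed minimum."""
    return size_bytes >= minimum


if __name__ == "__main__":
    for label, size in LUN_SIZES.items():
        verdict = "ok" if meets_minimum(size) else "too small"
        print(f"{label:>6}: {size / GIB:5.2f} GiB -> {verdict}")
```

The usable space on the SR will be somewhat lower than the raw LUN size, so a LUN sized exactly at the threshold may still fall short.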

olivierlambert

Question for @Team-OS-Platform-Release

stormi

Forwarding to @Team-Storage

ronan-a

@jr-m4 Hello, can you share your xensource.log file after running the enable_ha call?

jr-m4

@ronan-a

Sure

xensource.txt

jr-m4

@olivierlambert

Thank you for your reply.

The reasoning is that we are experimenting quite heavily at the moment, and the idea is to have three LUNs, each with its own purpose.

LUNs
1 for VHD-based VDIs
1 for qcow2-based VDIs
(The two above will be used and modified quite a bit.)
1 for the heartbeat, left alone to stay as standard and out-of-the-box as possible.

All of the LUNs reside on the same storage system (Dell PowerStore at the moment). So my reasoning is that they're all on the same storage cluster and will therefore be affected similarly regardless. Exotic corner cases may of course show up.

But I will absolutely take your recommendations into account! Once we've stopped messing with the storage for the VMs, I will put the heartbeat on that storage as well!

olivierlambert

It's fine when you experiment, and if you are using the exact same storage and storage path, it's probably a very small risk. The only one I see is the storage removing the LUN for whatever reason, in which case the HA statefile would stop working. I think it's acceptable.

Anyway, thanks for the report, we'll update our doc!

jr-m4

@olivierlambert

Thanks again for your input and recommendations! I'll have the LUN expanded to 8GB to verify that this solves it. Afterwards I'll mark your answer as the solution!