XCP-ng

    [Solved] SR_SOURCE_SPACE_INSUFFICIENT - Problems enabling HA

    • J Online
      jr-m4
      last edited by jr-m4

      The next step in our lab testing was to enable HA. For this we chose a 500MB Fibre Channel LUN.

      • According to the docs, the minimum requirement is 356MB. https://xcp-ng.org/blog/2024/08/22/xcp-ng-high-availability-a-guide/

      However, when I try to enable HA, I get an error in the logs saying SR_SOURCE_SPACE_INSUFFICIENT (full log below).

      So my question is: are the docs out of date, or could this be something else under the hood?
      Note: I'm currently waiting on the storage guy to expand the SR to 1GB.

      XO CE: 5811d
      Node: 24
      Pool: Fully updated with latest patches as of May 26, 2026

      pool.enableHa
      {
        "pool": "37e7a3b9-8c45-c7f2-7d09-249a935dd33d",
        "heartbeatSrs": [
          "3efef95a-4594-5a36-a182-f2b039d51ffa"
        ],
        "configuration": {}
      }
      {
        "code": "SR_SOURCE_SPACE_INSUFFICIENT",
        "params": [
          "OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567"
        ],
        "call": {
          "duration": 21,
          "method": "pool.enable_ha",
          "params": [
            "* session id *",
            [
              "OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567"
            ],
            {}
          ]
        },
        "message": "SR_SOURCE_SPACE_INSUFFICIENT(OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567)",
        "name": "XapiError",
        "stack": "XapiError: SR_SOURCE_SPACE_INSUFFICIENT(OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567)
          at XapiError.wrap (file:///opt/xen-orchestra/packages/xen-api/_XapiError.mjs:16:12)
          at file:///opt/xen-orchestra/packages/xen-api/transports/json-rpc.mjs:38:21
          at runNextTicks (node:internal/process/task_queues:65:5)
          at processImmediate (node:internal/timers:472:9)"
      }
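
The error payload above is plain JSON, so the relevant fields can be pulled out programmatically. A minimal sketch in Python, assuming a trimmed copy of the payload (stack trace omitted); `summarize_xapi_error` is a hypothetical helper for illustration, not part of Xen Orchestra or XAPI:

```python
import json

# Trimmed copy of the error payload from the post (stack trace omitted).
ERROR_PAYLOAD = """
{
  "code": "SR_SOURCE_SPACE_INSUFFICIENT",
  "params": ["OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567"],
  "call": {
    "duration": 21,
    "method": "pool.enable_ha",
    "params": ["* session id *", ["OpaqueRef:601b3d23-c211-b853-0a9a-2c12e16a6567"], {}]
  },
  "name": "XapiError"
}
"""


def summarize_xapi_error(payload: str) -> dict:
    """Extract the error code, the failing method, and the refs named by the error."""
    data = json.loads(payload)
    call = data.get("call", {})
    # Refs reported alongside the error code.
    refs = [p for p in data.get("params", []) if isinstance(p, str) and p.startswith("OpaqueRef:")]
    # Refs that were passed as list arguments to the call (here: the heartbeat SR list).
    call_refs = [r for p in call.get("params", []) if isinstance(p, list) for r in p]
    return {
        "code": data.get("code"),
        "method": call.get("method"),
        "refs": refs,
        "refs_were_call_arguments": bool(refs) and all(r in call_refs for r in refs),
    }


summary = summarize_xapi_error(ERROR_PAYLOAD)
print(summary)
```

In this log, the ref reported with the error is the same one passed in the call's arguments, which suggests the error is about the SR given as the heartbeat SR.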
      

      Update: Changed title to better describe problem

      • olivierlambertO Offline
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        Yes the doc is outdated. It's 4GiB now.

        Ping @thomas-dkmt

        Source:

        • XAPI code (this part was updated in 2023 upstream via commit 55f6d90a2)
        • XS doc https://docs.xenserver.com/en-us/xencenter/current-release/pools-ha-requirements.html

        But there's a reason we missed it. Having an SR that full is truly a big mistake, so it never came up before (nobody will have less than 4GiB left on their NAS/SAN).

        In your case, you have a dedicated LUN for it. Why?

        The fundamental issue: the heartbeat should monitor the same failure domain as the workload. A dedicated HA LUN tests a different storage path, which creates two dangerous failure modes:

        1. HA LUN up, VM LUN down: Hosts keep heartbeating happily, HA sees no problem, but VMs are stuck on dead storage. No fencing, no restart. This is the worst case: silent failure.
        2. HA LUN down, VM LUN up: HA fences hosts even though VMs are running fine. Unnecessary disruption.
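
The two failure modes above can be laid out as a small decision table. A sketch in Python; `ha_outcome` and its wording are illustrative only, and the both-up and both-down rows are assumptions added for completeness, since the post only describes the two mismatched cases:

```python
def ha_outcome(heartbeat_lun_up: bool, vm_lun_up: bool) -> str:
    """Describe what HA would observe with a dedicated heartbeat LUN."""
    if heartbeat_lun_up and vm_lun_up:
        # Assumption: nothing is wrong, so nothing happens.
        return "healthy: heartbeats succeed and VMs run"
    if heartbeat_lun_up and not vm_lun_up:
        # Case 1 from the post: the worst case.
        return "silent failure: heartbeats succeed, VMs stuck, no fencing or restart"
    if not heartbeat_lun_up and vm_lun_up:
        # Case 2 from the post.
        return "unnecessary fencing: hosts fenced while VMs were running fine"
    # Assumption: both paths are gone, so fencing matches a real outage.
    return "outage: heartbeat and VM storage both down"


for hb in (True, False):
    for vm in (True, False):
        print(f"heartbeat LUN up={hb}, VM LUN up={vm} -> {ha_outcome(hb, vm)}")
```

When the heartbeat lives on the same SR as the VMs, the two mismatched rows cannot occur, which is the point of monitoring the same failure domain.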
        • J Online
          jr-m4
          last edited by

          Update:

          We just expanded the SR to 2GB, and I'm still getting the same error.

          Write permissions were verified by creating a VDI on the SR, which succeeded without any problem.

          • J jr-m4 marked this topic as a question
          • olivierlambertO Offline
            olivierlambert Vates 🪐 Co-Founder CEO
            last edited by

            Question for @Team-OS-Platform-Release

            • stormiS Offline
              stormi Vates 🪐 XCP-ng Team
              last edited by

              Forwarding to @Team-Storage

              • ronan-aR Offline
                ronan-a Vates 🪐 XCP-ng Team @jr-m4
                last edited by

                @jr-m4 Hello, can you share your xensource.log file after running the enable_ha call?

                • J Online
                  jr-m4 @ronan-a
                  last edited by

                  @ronan-a

                  Sure

                  xensource.txt

                    • J Online
                      jr-m4 @olivierlambert
                      last edited by jr-m4

                      @olivierlambert

                      Thank you for your reply.

                      The reasoning here is that we are experimenting quite heavily at the moment, and the idea is to have three LUNs, each for its intended purpose.

                      LUNs:
                      • 1 for VHD-based VDIs
                      • 1 for qcow2-based VDIs
                      • 1 for the heartbeat, left alone to be as standard out-of-the-box as possible

                      The first two will be used and modified quite a bit.

                      All of the LUNs reside on the same storage system (Dell PowerStore at the moment). So my reasoning here is that they're all on the same storage cluster and will therefore be affected similarly regardless. Exotic corner cases may of course show up.

                      But I will absolutely take your recommendations into account! Once we've stopped messing with the storage for the VMs, I will move the heartbeat onto that SR as well!

                      • olivierlambertO Offline
                        olivierlambert Vates 🪐 Co-Founder CEO
                        last edited by olivierlambert

                        It's fine when you experiment, and if you are using the exact same storage and storage path, it's probably a very small risk. The only one I see is the storage removing the LUN for whatever reason, so that the HA statefile stops working. I think it's acceptable.

                        Anyway, thanks for the report, we'll update our doc!

                        • J Online
                          jr-m4 @olivierlambert
                          last edited by

                          @olivierlambert

                          Thanks again for your input and recommendations! I'll verify that this is solved by having the LUN expanded to 8GB instead. Afterwards I'll mark your answer as the solution!
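
For reference, the sizes tried in this thread can be compared with the 4GiB figure quoted earlier. A sketch in Python, assuming the check is a plain comparison against 4 GiB (the thread does not show the actual XAPI logic); `meets_ha_minimum` is a hypothetical helper:

```python
GIB = 1024 ** 3
HA_MINIMUM_BYTES = 4 * GIB  # 4 GiB, per the reply in this thread


def meets_ha_minimum(size_bytes: int) -> bool:
    """Return True if the given size reaches the assumed 4 GiB minimum."""
    return size_bytes >= HA_MINIMUM_BYTES


# Sizes mentioned in the thread, read generously as binary units.
tried = {"500MB": 500 * 1024 ** 2, "2GB": 2 * GIB, "8GB": 8 * GIB}

for label, size in tried.items():
    verdict = "ok" if meets_ha_minimum(size) else "too small"
    print(f"{label}: {verdict}")
```

Under this assumption, 500MB and 2GB both fall short, which is consistent with the error persisting after the first expansion, while 8GB clears the threshold even if read as decimal gigabytes.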

                          • J jr-m4 has marked this topic as solved
