XCP-ng

    Adding new host to pool fails - Stunnel SSL certificate verification failure

    • Bryanvh (last edited by Danp)

      Posting this here in the hopes that someone has an answer and that it helps anyone else encountering the issue.

      I have a pool of a few hosts that I recently upgraded from XCP-ng 8.2 to 8.3. Now I am attempting to add a new host to this pool to increase my resource capacity. However, after adding the new server in Xen Orchestra, I go to my primary pool to begin adding the new server to it, but that fails with the error "Internal_Error(Stunnel.Stunnel [some text that runs off the screen] routines::certificate verify failed"))"

      The full error is as follows:

      "Stunnel.Stunnel_verify_error("0A000086:SSL routines::certificate verify failed")"

      And the complete readout of the event is as follows:

      {
        "id": "0mpn7bwnk",
        "properties": {
          "method": "pool.mergeInto",
          "params": {
            "sources": [
              "65c279b5-5a9d-db33-92f1-3f057fbafda6"
            ],
            "target": "f735841b-af37-0547-5d1e-8cb11bc51f0d",
            "force": true
          },
          "name": "API call: pool.mergeInto",
          "userId": "905ebdb9-6698-4902-8e60-9a028d1aa441",
          "type": "api.call"
        },
        "start": 1779834203408,
        "status": "failure",
        "updatedAt": 1779834206165,
        "end": 1779834206165,
        "result": {
          "code": "INTERNAL_ERROR",
          "params": [
            "Stunnel.Stunnel_verify_error(\"0A000086:SSL routines::certificate verify failed\")"
          ],
          "call": {
            "duration": 2713,
            "method": "pool.join_force",
            "params": [
              "* session id *",
              "192.168.1.11",
              "root",
              "* obfuscated *"
            ]
          },
          "message": "INTERNAL_ERROR(Stunnel.Stunnel_verify_error(\"0A000086:SSL routines::certificate verify failed\"))",
          "name": "XapiError",
          "stack": "XapiError: INTERNAL_ERROR(Stunnel.Stunnel_verify_error(\"0A000086:SSL routines::certificate verify failed\"))\n    at Function.wrap (file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/_XapiError.mjs:16:12)\n    at file:///usr/local/lib/node_modules/xo-server/node_modules/xen-api/transports/json-rpc.mjs:38:21\n    at runNextTicks (node:internal/process/task_queues:60:5)\n    at processImmediate (node:internal/timers:454:9)\n    at process.callbackTrampoline (node:internal/async_hooks:130:17)"
        }
      }
      

      Obviously, it's unhappy about the certs, but I can't figure out why. For additional context, I have never touched the certs on these servers. Based on some other forum posts, I checked the cert at /etc/stunnel/xapi-stunnel-ca-bundle.pem on both the pool master and the new host. It exists, but since I was unsure whether it was still intact, I also ran xe host-refresh-server-certificate host=hostname on both, just in case. Despite that, the error persists. Does anyone have any insight into the error, or a possible fix from something they have encountered before?

          • olivierlambert Vates 🪐 Co-Founder CEO

          Ping @Team-OS-Platform-Release

            • semarie Vates 🪐 XCP-ng Team XAPI & Network Team

            Just my 2 cents, but with SSL involved, time matters: could you check that the date is accurate on both hosts?

            Having the output of the following commands might help too:

            • stat /etc/stunnel/xapi-stunnel-ca-bundle.pem
            • openssl x509 -in /etc/stunnel/xapi-stunnel-ca-bundle.pem -noout -text
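A quick way to tell an empty or malformed PEM file apart from a genuinely bad certificate is to inspect the file before handing it to openssl. The helper below is a hypothetical sketch (check_pem_file is not an XCP-ng tool); it only assumes bash, grep and the standard PEM "BEGIN CERTIFICATE" marker.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with XCP-ng: report whether a PEM file
# exists, is non-empty, and contains at least one certificate block.
# Exit status: 0 = certificate(s) found, 1 = empty or no certificate block,
# 2 = file missing.
check_pem_file() {
    local path="$1"
    local marker='-----BEGIN CERTIFICATE-----'
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        return 2
    fi
    if [ ! -s "$path" ]; then
        # An empty file is enough to make "openssl x509 -in" fail to load it.
        echo "empty: $path"
        return 1
    fi
    local count
    # grep -c prints 0 and exits 1 when nothing matches; keep the "0".
    count=$(grep -c -e "$marker" "$path" || true)
    if [ "$count" -eq 0 ]; then
        echo "no certificate block: $path"
        return 1
    fi
    echo "ok: $path ($count certificate(s))"
}
```

For example, check_pem_file /etc/stunnel/xapi-stunnel-ca-bundle.pem on each host; only when it reports "ok" does the openssl x509 -noout -text output become meaningful. Comparing the output of date -u on both hosts covers the clock question.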

              • Bryanvh @semarie

              @semarie

              Maybe this points to an issue: it looks like the cert file is empty, even after I ran the command to refresh the cert. I get this same output on both the pool master and the host I am trying to add.

              [Screenshot]

              Then the openssl x509 command says it's unable to load or read the cert. I assume that's because the file is empty?

              As for the time and date, yes, the pool master and this server are in sync. At first, I had forgotten to set the new host to use the NTP pool during setup, and Xen Orchestra helpfully yelled at me about that. Haha

                • semarie Vates 🪐 XCP-ng Team XAPI & Network Team

                Yes, if the file is empty, the openssl x509 command is expected to fail.
                Is it the same on the master?

                  • Bryanvh @semarie

                  @semarie
                  Yes. This screenshot is from the pool master. But, both it and the new host had the same output.

                  For clarity's sake, I have never applied an SSL cert to these hosts. This seems to be whatever built-in certs the system is using and signing.

                  Is there a way to fix these certs? Was the xe host-refresh-server-certificate host=hostname command not the correct command to fix this?

                    • semarie Vates 🪐 XCP-ng Team XAPI & Network Team

                    Sorry, but it is outside my competence zone. I prefer to not tell you to try something that I don't know the exact consequences of.

                    Could someone else reply?

                      • LucienLassalle Vates 🪐 XCP-ng Team Security Team @semarie

                      @semarie I'll try to investigate to help you.

                      Is it possible to run:

                      • stat /etc/xensource/xapi-pool-tls.pem
                      • openssl x509 -in /etc/xensource/xapi-pool-tls.pem -noout -text
                      • stat /etc/xensource/xapi-ssl.pem
                      • openssl x509 -in /etc/xensource/xapi-ssl.pem -noout -text

                      (This file must exist; if not, I'd like the output of cat /etc/stunnel/xapi.conf.)
                      And I'd like the same output for /etc/xensource/xapi-ssl.pem.

                      If the certificate in /etc/xensource/xapi-pool-tls.pem has expired or is empty, you can run:
                      xe host-refresh-server-certificate host=$(hostname)
                      If the certificate in /etc/xensource/xapi-ssl.pem has expired or is empty, you can run:
                      xe host-emergency-reset-server-certificate

                      After running either of the two commands above, I recommend running: xe-toolstack-restart
                      (This should also restart stunnel@xapi.service.)
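The decision above (which command applies to which file) can be written down as a small dry-run script. This is a sketch under assumptions: the function names are hypothetical, a file that openssl cannot parse is treated the same as an expired one, and the script only prints the commands suggested above instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print which of the commands suggested in this
# thread would apply, without executing anything.

# needs_renewal FILE: succeeds (0) if FILE is missing, empty, expired, or
# cannot be parsed by openssl; fails (1) if it holds a currently valid cert.
needs_renewal() {
    local pem="$1"
    [ -s "$pem" ] || return 0
    # "-checkend 0" makes openssl exit non-zero if the certificate has
    # already expired (it also fails if the file cannot be read at all).
    if openssl x509 -in "$pem" -noout -checkend 0 >/dev/null 2>&1; then
        return 1
    fi
    return 0
}

suggest_cert_fix() {
    local pool_tls="${1:-/etc/xensource/xapi-pool-tls.pem}"
    local server_ssl="${2:-/etc/xensource/xapi-ssl.pem}"
    local restart=0
    if needs_renewal "$pool_tls"; then
        # Single quotes: print the command literally, do not expand $(hostname).
        echo 'xe host-refresh-server-certificate host=$(hostname)'
        restart=1
    fi
    if needs_renewal "$server_ssl"; then
        echo 'xe host-emergency-reset-server-certificate'
        restart=1
    fi
    if [ "$restart" -eq 1 ]; then
        echo 'xe-toolstack-restart'
    fi
}
```

Called with no arguments, suggest_cert_fix checks the default paths; if it prints nothing, neither file looks empty or expired. Keeping it print-only is deliberate, since xe host-emergency-reset-server-certificate replaces the host's TLS certificate.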

                      I hope this helps.

                      • Bryanvh @LucienLassalle

                        @LucienLassalle

                        Here's the output from the pool master.
                        The xapi-pool-tls cert, at least, isn't empty.
                        [Screenshot]

                        And it still appears to be valid:
                        [Screenshot]

                        The xapi-ssl cert also looks correct and unexpired:
                        [Screenshot]
                        [Screenshot]

                          • LucienLassalle Vates 🪐 XCP-ng Team Security Team @Bryanvh

                          @Bryanvh Thank you for your feedback.
                          Your previous certificates look correct. I have not been able to reproduce the issue on my side, but I will try to diagnose it based on the code.

                          [MASTER]
                          I have a few preliminary commands. The first one is to retrieve the MASTER_UUID:
                          cat /etc/xensource-inventory | grep INSTALLATION_UUID | cut -d'=' -f2 | tr -d "'"
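The same lookup can be wrapped in a function so the UUID can be reused in the commands that follow. This sketch assumes, as the pipeline above does, that /etc/xensource-inventory stores the value as INSTALLATION_UUID='<uuid>'; the function name is hypothetical. Anchoring the match on ^INSTALLATION_UUID= avoids picking up any other key that merely contains that string.

```shell
#!/usr/bin/env bash
# get_installation_uuid [INVENTORY]: print this host's INSTALLATION_UUID.
# Hypothetical helper; assumes inventory lines of the form KEY='value'.
get_installation_uuid() {
    local inventory="${1:-/etc/xensource-inventory}"
    # Match only the exact key, keep everything after "=", strip quotes.
    grep '^INSTALLATION_UUID=' "$inventory" | cut -d'=' -f2- | tr -d "'\""
}
```

On the master, MASTER_UUID=$(get_installation_uuid) then lets the next commands use /etc/stunnel/certs-pool/${MASTER_UUID}.pem directly.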

                          Then we can compare fingerprints between the master certificate and the one stored for the pool:
                          openssl x509 -in /etc/xensource/xapi-pool-tls.pem -noout -fingerprint -sha256
                          openssl x509 -in /etc/stunnel/certs-pool/{MASTER_UUID}.pem -noout -fingerprint -sha256
                          (please replace {MASTER_UUID} with the value retrieved above)

                          Normally, both fingerprints should match.
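The fingerprint comparison can be scripted so that a missing stored copy is reported separately from a real mismatch, since those call for different follow-ups. A minimal sketch, assuming the default paths used in this post; compare_master_cert and cert_fingerprint are hypothetical names, not XCP-ng tools.

```shell
#!/usr/bin/env bash
# cert_fingerprint FILE: print only the SHA-256 fingerprint value of the
# first certificate in FILE (empty output if openssl cannot read it).
cert_fingerprint() {
    openssl x509 -in "$1" -noout -fingerprint -sha256 2>/dev/null | cut -d'=' -f2
}

# compare_master_cert UUID [POOL_DIR] [OWN_CERT]: compare the master's own
# pool certificate with the copy stored as POOL_DIR/UUID.pem.
# Exit status: 0 = match, 1 = mismatch or unreadable, 2 = stored copy missing.
compare_master_cert() {
    local uuid="$1"
    local pool_dir="${2:-/etc/stunnel/certs-pool}"
    local own="${3:-/etc/xensource/xapi-pool-tls.pem}"
    local stored="${pool_dir}/${uuid}.pem"
    if [ ! -f "$stored" ]; then
        echo "missing: $stored"
        return 2
    fi
    local own_fp stored_fp
    own_fp=$(cert_fingerprint "$own")
    stored_fp=$(cert_fingerprint "$stored")
    # An empty fingerprint means openssl could not parse the file.
    if [ -n "$own_fp" ] && [ "$own_fp" = "$stored_fp" ]; then
        echo "match: $own_fp"
        return 0
    fi
    echo "mismatch or unreadable: $own vs $stored"
    return 1
}
```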
                          Also check that the CA bundle exists and is not empty:
                          ls -l /etc/stunnel/xapi-pool-ca-bundle.pem

                          If you previously ran:
                          xe host-refresh-server-certificate
                          you should probably run:
                          xe pool-certificate-sync

                          [JOINER]
                          Based on the code, the first phase has already been completed. You should therefore have files under /etc/stunnel/certs-pool/, including the master certificate:
                          openssl x509 -in /etc/stunnel/certs-pool/{MASTER_UUID}.pem -noout -fingerprint -sha256

                          [Additional checks]
                          Are all hosts synchronized to the same NTP server? date & timedatectl
                          Are all hosts fully updated to XCP-ng 8.3 and rebooted after updates?
                          Do you see the same error when joining the pool using XCP-ng (via Console or CLI) instead of Xen Orchestra?
                          Is there a more detailed error in /var/log/xensource.log?
                          How many hosts are in your pool?
                          Is stunnel running correctly on all hosts? systemctl status stunnel@xapi

                          Do certificate chains validate correctly?
                          openssl verify -CAfile /etc/stunnel/xapi-pool-ca-bundle.pem /etc/stunnel/certs-pool/{MASTER_UUID}.pem
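To gather answers to the checks above in one pass on each host, something like the following could be used. It is a sketch, not an XCP-ng tool: run_check and collect_pool_join_diagnostics are hypothetical names, the master UUID has to be passed in by hand, and a check whose command is unavailable is skipped rather than aborting the report.

```shell
#!/usr/bin/env bash
# run_check LABEL CMD [ARGS...]: run one diagnostic, then print a one-line
# verdict. A missing tool or a failing check does not stop the caller.
run_check() {
    local label="$1"
    shift
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "[$label] skipped: $1 not found"
        return 0
    fi
    if "$@"; then
        echo "[$label] ok"
    else
        # In the else branch, $? still holds the exit status of "$@".
        echo "[$label] FAILED (exit $?)"
    fi
}

# collect_pool_join_diagnostics MASTER_UUID: wraps the checks suggested in
# this post (clock, stunnel service state, certificate chain validation).
collect_pool_join_diagnostics() {
    local uuid="$1"
    run_check "clock (UTC)" date -u
    run_check "timedatectl" timedatectl
    run_check "stunnel@xapi active" systemctl is-active stunnel@xapi
    run_check "master cert chain" openssl verify \
        -CAfile /etc/stunnel/xapi-pool-ca-bundle.pem \
        "/etc/stunnel/certs-pool/${uuid}.pem"
}
```

Each command's own output is printed before its verdict line, so the result can be pasted into a reply as-is.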

                          Respectfully,

                          • Bryanvh @LucienLassalle

                            @LucienLassalle
                            I'm not sure whether this points toward an issue, but when I run the openssl command to check the pool cert using the UUID retrieved in the first step, I get this error:
                            [Screenshot]

                            I get the same error when checking for the pool cert on the host that is trying to join the pool. Even if the pool cert was copied to the joining host, a problem with that cert could, I suppose, be the cause of the error?

                            For the additional questions:
                            Yes, they are time synchronized and are all using pool.ntp.org
                            Yes, they are all up to date. 3 of the hosts (the existing pool) were previously on 8.2 but were updated to 8.3 and the new host I am trying to join was set up fresh on 8.3.
                            Yes, the stunnel service reports that it is running correctly.

                            And, as expected based on the previous error, verifying the cert fails with the same error as shown when trying to check the pool's cert fingerprint.

                            Here's what I see in the logs after trying to join the host to the pool:
                            Pool Master
                            [Screenshot]
                            Joining Host
                            [Screenshot]

                              • LucienLassalle Vates 🪐 XCP-ng Team Security Team @Bryanvh

                              @Bryanvh I think I've managed to reproduce the issue. The fact that the master's certificate is missing from /etc/stunnel/certs-pool/ seems to be the problem.

                              On the master, run xe host-refresh-server-certificate host=$(hostname) and then xe pool-certificate-sync.

                              Then, if you run ls -l /etc/stunnel/certs-pool, you should see a certificate named after your master's UUID, ending in .pem. If it ends in .new.pem instead, I recommend copying the certificate to the same name without the .new part (the .new suffix can apparently cause problems).

                              You should then be able to join the pool from your host.
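The final check in this procedure (a .pem named after the master UUID, with the .new.pem case handled by copying) can be sketched as follows. Assumptions: the helper name is hypothetical, the UUID is the master's INSTALLATION_UUID from /etc/xensource-inventory, and it is meant to be run on the master only after xe host-refresh-server-certificate host=$(hostname) and xe pool-certificate-sync have completed.

```shell
#!/usr/bin/env bash
# ensure_master_cert_name UUID [POOL_DIR]: hypothetical helper. Succeeds if
# POOL_DIR/UUID.pem exists and is non-empty; otherwise, if only
# POOL_DIR/UUID.new.pem exists, copies it to UUID.pem.
ensure_master_cert_name() {
    local uuid="$1"
    local dir="${2:-/etc/stunnel/certs-pool}"
    local final="${dir}/${uuid}.pem"
    local pending="${dir}/${uuid}.new.pem"
    if [ -s "$final" ]; then
        echo "present: $final"
        return 0
    fi
    if [ -s "$pending" ]; then
        # Copy rather than move, so the original .new.pem file stays in place.
        cp -p -- "$pending" "$final" || return 1
        echo "copied: $pending -> $final"
        return 0
    fi
    echo "missing: neither $final nor $pending"
    return 1
}
```

Copying (as suggested above) instead of renaming leaves the .new.pem file untouched, so the change is easy to inspect or undo; ls -l /etc/stunnel/certs-pool afterwards confirms the result.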

                              I hope this works. Please let me know.
                              Respectfully,

                                • Bryanvh @LucienLassalle

                                @LucienLassalle

                                Thanks for the quick response and the effort in recreating the issue!

                                It all played out exactly as you laid it out, even the cert showing up as a .new.pem at first.

                                Out of curiosity, what in your testing caused this issue? Is it possible that my upgrade from 8.2 to 8.3 caused the underlying issue?

                                  • LucienLassalle Vates 🪐 XCP-ng Team Security Team @Bryanvh

                                  @Bryanvh Looking at the code, I saw that an exchange was taking place via this certificate.

                                  So when you told me that the master certificate was missing, I tried to put myself in the same situation as you (by removing the certificate) and tried to join the pool.
                                  Having encountered the same error as you, I determined that running these commands fixed the problem.

                                  Yes, I think the upgrade from 8.2 to 8.3 is the cause. To be more precise, XAPI's certificate exchange changed during version 8.2, and I think it's possible that your 8.2 hosts weren't fully up to date when you upgraded them to 8.3 (I'm not sure).

                                  In any case, I'm glad your problem is solved.

                                    • Bryanvh @LucienLassalle

                                    @LucienLassalle

                                    Interesting. I'm not sure I was all the way up to date when I upgraded to 8.3 and it's possible I was a month or two behind. I only upgraded because I ran across a need for the virtualized TPM support (which is cool to see implemented!).

                                    Thanks again for all the effort in looking at this!

                                      • LucienLassalle marked this topic as a question
                                      • LucienLassalle marked this topic as solved
                                      • LucienLassalle Vates 🪐 XCP-ng Team Security Team @Bryanvh

                                      @Bryanvh No problem 🙂

                                      The error message you encountered wasn't very clear. Therefore, I've proposed a change to XAPI to make the error more explicit (this will likely be included in future XAPI releases).

                                      So instead of an SSL certificate verification failure, the message will be: POOL_JOINING_MASTER_CERTIFICATE_NOT_IN_POOL_BUNDLE.

                                      Thank you very much for your patience and for bringing this issue to our attention.

                                      References:
                                      https://github.com/xapi-project/xen-api/pull/7112
                                      xapi: Improve error reporting when pool join fails on TLS verification #7112 (pull request opened by LucienLassalle in xapi-project/xen-api; status: closed)
