XCP-ng

    DNS queries during backup job

    • ronivayR Offline
      ronivay Top contributor @Forza
      last edited by ronivay

      Could very well be. With multiple hosts and simultaneous backups, the number of requests can easily become overwhelming if it multiplies in such a scenario. I actually created an issue about a backup stalled by failed DNS resolution back in 2019 (https://github.com/vatesfr/xen-orchestra/issues/4122). I didn't dig this deep into it then; I kinda assumed it was something specific to my environment and switched to using just the IP address. I switched back to the domain name at some point. Personally, I haven't seen failed backups for the same reason recently, so it could be that it retries nowadays, at least to some extent.

      /etc/hosts sure is also an option but as a static config it takes away the benefit you get from using DNS.

      ronivay created this issue in vatesfr/xen-orchestra

      closed Failed name resolution stalling backup #4122

      • olivierlambertO Offline
        olivierlambert Vates 🪐 Co-Founder CEO
        last edited by

        Adding @florent and @julien-f in the conversation (and also @marcungeschikts )

        • A Offline
          Andrew Top contributor @ronivay
          last edited by Andrew

          @ronivay Try installing nscd, the "name service caching daemon". It allows the OS to cache lookups for a short time. XO may have an issue of repeating lookups too often, but this will allow the OS to cover up that flaw.

          apt install nscd
          

          or

          yum install nscd
          
          • ronivayR Offline
            ronivay Top contributor @Andrew
            last edited by ronivay

            This is not really an issue for me, just an observation about something that could have all sorts of undesired effects for many XO users.

            I know multiple ways of getting around it, a couple of them already mentioned here. Local caching with nscd is one more of those. The point is, this should be fixed in XO rather than worked around.

            • A Offline
              Andrew Top contributor
              last edited by

              I agree it's worth looking into. I do an hourly continuous replication update and my stats show about 100 DNS requests/second (additional) for the 5 minutes that it runs.

              With NSCD installed I no longer see the burst of requests. It's not a big deal as the DNS servers regularly get 10x that number of requests and can deal with 100x that.

              It's still an XO issue and a waste of time and resources that can cause delays and failures in some cases. Some DNS servers or firewalls could see it as an attack and block requests.

              • ForzaF Offline
                Forza @Andrew
                last edited by Forza

                Why isn't there a local caching agent in XCP-ng/XOA? There are many caching DNS servers/relays available: Unbound, dnsmasq or even nscd (although it doesn't cache DNS by default?).

                • olivierlambertO Offline
                  olivierlambert Vates 🪐 Co-Founder CEO
                  last edited by

                  Are you having this issue on XOA or on XO from the sources?

                  • A Offline
                    Andrew Top contributor @Forza
                    last edited by

                    @Forza NSCD is a local lookup cache, not a DNS cache/proxy. XO (like any normal app) requests a lookup through the OS library code rather than querying DNS directly (it could talk to the resolver directly, but normally doesn't). The host then uses its rules to look up the records. That could be from /etc/hosts or LDAP or DNS or cache or anything that's configured as a source. XO does not need to truly cache the information, but it should not make repeated requests for the same records.

                    @olivierlambert I'm using XO source (current master) and have the issue...
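
As an illustration of the lookup path described in this post, here is a minimal Python sketch (not XO code; XO runs on Node). It assumes a typical Linux host where the C library decides which sources to consult. `system_lookup` is a hypothetical helper name chosen for this example.

```python
import socket


def system_lookup(host):
    """Resolve host through the OS library (getaddrinfo), not DNS directly."""
    # getaddrinfo consults whatever sources the host is configured with
    # (/etc/hosts, DNS, a local cache such as nscd, ...); the caller
    # cannot tell which of them produced the answer.
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `system_lookup("localhost")` would normally be answered from /etc/hosts without any DNS traffic, which is why a cache placed at this layer is invisible to the application.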

                    • olivierlambertO Offline
                      olivierlambert Vates 🪐 Co-Founder CEO
                      last edited by

                      Okay I would be curious to see if you have a similar behavior on XOA 🙂

                      • ForzaF Offline
                        Forza @olivierlambert
                        last edited by

                        @olivierlambert said in DNS queries during backup job:

                        Okay I would be curious to see if you have a similar behavior on XOA 🙂

                        I can have a look at work during the week.

                        • julien-fJ Offline
                          julien-f Vates 🪐 Co-Founder XO Team @Forza
                          last edited by

                          Hello,

                          I have investigated a bit, and indeed Node does not cache DNS queries and calls system methods directly (e.g. gethostbyname).

                          I've created a test branch which improves the situation: https://github.com/vatesfr/xen-orchestra/pull/6196

                          But I'm wondering if it's the right approach; maybe this responsibility should be left to the system and we should add nscd to our XOAs.

                          Let me know if you have any opinions on this or feedback on my branch.

                          julien-f opened this pull request in vatesfr/xen-orchestra

                          closed feat(@vates/cached-dns.lookup): small DNS cache #6196
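
For readers wondering what a small in-process lookup cache can look like, here is a minimal Python sketch of the general idea. It is not the code from the pull request above (which targets Node), and every name in it (`CachedLookup`, `ttl`, `resolver`, `clock`) is an assumption made for illustration. Results are memoized per (host, port) for a fixed time-to-live, so repeated lookups within that window never reach the system resolver.

```python
import socket
import time


class CachedLookup:
    """Memoize resolver results per (host, port) for `ttl` seconds."""

    def __init__(self, resolver=socket.getaddrinfo, ttl=60.0, clock=time.monotonic):
        self._resolver = resolver  # called as resolver(host, port)
        self._ttl = ttl
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # (host, port) -> (expires_at, result)

    def lookup(self, host, port=None):
        key = (host, port)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no system call at all
        result = self._resolver(host, port)
        self._cache[key] = (now + self._ttl, result)
        return result
```

A fixed TTL is the simplest possible policy; it ignores the TTL of the DNS record itself and does not cache failures, both of which a real implementation would have to decide on.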

                          • ronivayR Offline
                            ronivay Top contributor @julien-f
                            last edited by

                            I'll put this to the test and see tomorrow what the DNS query stats look like.

                            Just my two cents, but I feel like one shouldn't "fix" a flaw or bad behaviour in an application by relying on an external dependency to deal with it, especially if it's fixable. Sure, using something like nscd in XOA would kinda fix the issue there, but wouldn't the possible perf issue etc. still exist in Node? I'm not competent to review the code, so I can't say anything about the actual implementation in the feature branch.

                            • olivierlambertO Offline
                              olivierlambert Vates 🪐 Co-Founder CEO
                              last edited by

                              It's not trivial to decide where to put that "frontier". XOA is meant to be an entire system, not just with XO code, but also the updater and other things.

                              For the DNS thing, I have to admit I don't know yet what the best practice is. I suppose it also depends on where you want to draw the line between handling "non-core" features (i.e. DNS caching) internally and leaving them to the system. Should we also implement other "system" stuff? It's not trivial to answer that 🙂

                              • ronivayR Offline
                                ronivay Top contributor @olivierlambert
                                last edited by

                                I think the main point to focus on here is that XO is making totally unnecessary DNS queries with excessive frequency. I don't see this as implementing a non-core feature, but as a fix to the logic of how the application figures out where to connect and how often. Exactly how, and what options there are, is outside of my knowledge 🙂

                                • ForzaF Offline
                                  Forza @ronivay
                                  last edited by

                                  IMHO applications in general don't have internal DNS caching; they rely on system-provided functionality. With that in mind, it is sensible to use a system package rather than a fix inside the XO code, especially considering XO can run on platforms other than XOA.

                                  • A Offline
                                    Andrew Top contributor @Forza
                                    last edited by

                                    @Forza I agree that the OS is responsible for caching host records. The real question is why is XO doing so many lookups repeatedly. Maybe it is actually a Node problem (in addition to code issues).

                                    In most applications, once a socket is opened to a host it stays open, and no further lookup is needed until it is closed and a new connection is made. If XO or Node is stateless and opens a new connection for each block read/write (or group of blocks), then it may do a lot of lookups. The mass lookups seem to be a sign of a lot of overhead that could be reduced to improve performance.

                                    Yes, nscd can be a host query (DNS) cache solution (for XO source and XOA) but can the code be improved to reduce overhead and improve general performance?

                                    Here is a quick MRTG graph of DNS requests. You can see when I enabled nscd, which caches lookup requests (hint: Sunday night):
                                    [Image: dns-requests-week.jpg]

                                    • olivierlambertO Offline
                                      olivierlambert Vates 🪐 Co-Founder CEO @Andrew
                                      last edited by

                                      @Andrew said in DNS queries during backup job:

                                      If XO or Node is stateless and opens a new connection for each block read/write (or group of blocks) then it may do a lot of lookups. The mass lookups seem to be a sign of a lot of overhead that could be reduced to improve performance.

                                      I agree that's a good question (for @julien-f I assume)

                                      • ronivayR Offline
                                        ronivay Top contributor @julien-f
                                        last edited by

                                        @julien-f this changed the situation from thousands of queries in minutes to no noticeable spike in query graphs during backup job, so huge improvement.

                                        • H Offline
                                          hoerup
                                          last edited by

                                          Although it is nice that there are workarounds for the DNS spikes, with either nscd or the in-process DNS cache, I think the DNS spikes are a symptom of a whole different issue.

                                          I think we can safely assume that each DNS lookup corresponds to one attempt at establishing a TCP connection. If so, there is some code somewhere that spawns an awful lot of short-lived connections instead of reusing/pooling them, with all the issues that follow in that area (insufficient ulimit NOFILE, connections in TIME_WAIT, exhaustion of client ports, etc.).
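
The link between connection reuse and lookup volume can be demonstrated with a small, self-contained Python sketch. It says nothing about how XO itself opens connections; it only shows the general effect under stated assumptions: a throwaway local HTTP server on 127.0.0.1, and a counter wrapped around `socket.getaddrinfo` as a stand-in for "one lookup" (no real DNS packets are sent). `OkHandler` and `count_lookups` are names invented for this example.

```python
import http.client
import http.server
import socket
import threading
from unittest import mock


class OkHandler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_lookups(n_requests, reuse):
    """Return how many getaddrinfo calls n_requests GETs trigger."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]

    real_getaddrinfo = socket.getaddrinfo
    calls = 0

    def counting_getaddrinfo(*args, **kwargs):
        nonlocal calls
        calls += 1
        return real_getaddrinfo(*args, **kwargs)

    conn = None
    try:
        with mock.patch("socket.getaddrinfo", counting_getaddrinfo):
            for _ in range(n_requests):
                if conn is None:
                    conn = http.client.HTTPConnection("127.0.0.1", port)
                conn.request("GET", "/")
                conn.getresponse().read()  # drain so the socket can be reused
                if not reuse:
                    conn.close()
                    conn = None
    finally:
        if conn is not None:
            conn.close()
        server.shutdown()
        server.server_close()
    return calls
```

With `reuse=False` every request opens a fresh connection and therefore performs its own lookup; with `reuse=True` the single kept-alive connection needs only one. Short-lived connections also each consume a file descriptor and a client port, which is where the NOFILE and TIME_WAIT concerns come from.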

                                          • julien-fJ Offline
                                            julien-f Vates 🪐 Co-Founder XO Team @hoerup
                                            last edited by

                                            @hoerup I agree with your analysis. I'm not sure how easy it will be to fix; we'll investigate.
