XCP-ng

    Very scary host reboot issue

      darabontors @tuxen

      @tuxen

      1. I have absolutely no idea how to do this in Windows. I looked for an MTU setting but couldn't find one.
      2. This is not a viable workaround for me, although it might help pin the issue on the Xen PV driver. Maybe I'll do some more testing on spare hardware.
      3. I read this, but I don't know how to test it. I didn't have any MTUs set manually, so I don't know what the values were before the update.

      What definitely fixed the issue for me was using PCIe passthrough for the WAN interface. I used a 10 GbE NIC, which uses the ix driver (ix0), so I don't know whether that is related. Somehow the combination of PPPoE + WireGuard + a Windows client on the virtual interface (Xen PV driver) in OPNsense produces this issue.
      At the moment I am happy with this mitigation.
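
      For anyone who wants to reproduce this mitigation, here is a minimal sketch of how PCI passthrough is typically set up on XCP-ng 8.2, based on the XCP-ng PCI passthrough documentation rather than on the exact steps used in this post (check the docs for your version first): the NIC is hidden from Dom0 with a xen-pciback.hide=(...) boot parameter set through /opt/xensource/libexec/xen-cmdline --set-dom0, the host is rebooted, and the device is then assigned to the VM via its other-config:pci key. The two functions below are hypothetical helpers that only build those argument strings from full PCI addresses (as listed by lspci -D); they change nothing on the host. The address 0000:04:00.0 and <vm-uuid> are placeholders.

```shell
#!/bin/sh
# Hypothetical helpers for XCP-ng PCI passthrough. They only PRINT argument
# strings; nothing on the host is changed. Pass full PCI addresses
# (domain:bus:device.function), e.g. as listed by `lspci -D`.

# Dom0 boot parameter that hides the given devices from Dom0, e.g.
#   xen-pciback.hide=(0000:04:00.0)(0000:04:00.1)
pciback_hide_arg() {
    arg="xen-pciback.hide="
    for bdf in "$@"; do
        arg="${arg}(${bdf})"
    done
    printf '%s\n' "$arg"
}

# Value for the VM's other-config:pci key, e.g.
#   0/0000:04:00.0,0/0000:04:00.1
vm_pci_config() {
    cfg=""
    for bdf in "$@"; do
        # prepend a comma separator only when cfg already has an entry
        cfg="${cfg:+${cfg},}0/${bdf}"
    done
    printf '%s\n' "$cfg"
}

# Example usage (placeholders; check the XCP-ng docs before running):
#   /opt/xensource/libexec/xen-cmdline --set-dom0 "$(pciback_hide_arg 0000:04:00.0)"
#   reboot
#   xe vm-param-set uuid=<vm-uuid> other-config:pci="$(vm_pci_config 0000:04:00.0)"
```

      Keeping the helpers print-only means the generated strings can be reviewed before anything touches the boot configuration.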

      I'm short on free time at the moment. Does anyone care to test this further?

        Mark @darabontors

        Hi there,

        is there any news on this topic? I've got a similar problem.

        I'll try to explain the setup:

        "Model A":

        • one XCP-ng Host v8.0 (no typo, old version)

        • PX82 machine rented from the Hetzner datacenter in Falkenstein, Germany

        • Dom0 acting as an IPv4 router (maps the assigned IPv4 subnet to one of the xapi interfaces)

        • an OPNsense Firewall "24.1.6-amd64 FreeBSD 13.2-RELEASE-p11 OpenSSL 3.0.13" as a VM

        • OPNsense does all the external-to-internal stuff, OpenVPN and IPsec

        • OPNsense "sees" 4 virtual NICs (xn0-3) and has Guest Tools installed for reporting back to XCP-ng

        • XCP-ng handles VLAN tagging/untagging

        • IPv4 routing only

        • network driver is e1000e (Intel Corporation Ethernet Connection (7) I219-LM (rev 10))

        • very, very stable (currently at 673 days uptime)

        • LAST CHANGE: added IPv4/IPv6 dual-stack operation: enabled IPv6 and IPv6 forwarding in Dom0, and a static IPv6 network in OPNsense

        • works flawlessly, still no crash

        "Model B":

        • one XCP-ng Host v8.2.1

        • AX101 machine rented from the Hetzner datacenter in Falkenstein, Germany

        • Dom0 acting as an IPv4 router (maps the assigned IPv4 subnet to one of the xapi interfaces)

        • an OPNsense Firewall "24.1.6-amd64 FreeBSD 13.2-RELEASE-p11 OpenSSL 3.0.13" as a VM

        • OPNsense does all the external-to-internal stuff and OpenVPN (no IPsec)

        • OPNsense "sees" 4 virtual NICs (xn0-3) and has Guest Tools installed for reporting back to XCP-ng

        • XCP-ng handles VLAN tagging/untagging

        • IPv4 routing only

        • network driver is igb (Intel Corporation I210 Gigabit Network Connection (rev 03))

        • WAS very stable (hundreds of days uptime)

        • LAST CHANGE: added IPv4/IPv6 dual-stack operation: enabled IPv6 and IPv6 forwarding in Dom0, and a static IPv6 network in OPNsense

        • now the HOST crashes (reboots) about twice a day

        I'm not using WireGuard, but I DO use identical OPNsense versions on two different hosts with different XCP-ng versions, and the problems only started after activating IPv6 on the "Model B" machine.

        I noticed that the two Intel NICs use different drivers. Could the "igb" driver be the source of the problem (here: in conjunction with IPv6), and could/should I switch the "Model B" system to e1000e (as long as there is no risk of the host staying offline after a reboot)?
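
        For reference, the Linux driver is determined by the NIC chip rather than chosen freely: e1000e drives the I219-LM (Model A) and igb drives the I210 (Model B), so moving Model B to e1000e would in practice mean different NIC hardware. A quick way to confirm what each Dom0 interface actually uses is ethtool -i <interface>, which prints a "driver:" line. The sketch below wraps that; nic_driver and list_nic_drivers are hypothetical helper names, and eth0/eth1 are assumed interface names.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -i` output on stdin, print the driver name.
nic_driver() {
    awk -F': ' '$1 == "driver" { print $2 }'
}

# Print "<interface> <driver>" for each interface given (run in Dom0).
list_nic_drivers() {
    for iface in "$@"; do
        printf '%s %s\n' "$iface" "$(ethtool -i "$iface" | nic_driver)"
    done
}

# Example (interface names are assumptions):
#   list_nic_drivers eth0 eth1
```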

        I do have a server status report exported from XCP-ng Center, but I'm not very experienced in reading it: there is so much in it, and I'm sure most of it is irrelevant.

          olivierlambert Vates 🪐 Co-Founder CEO

          IIRC, a fix was released that prevents the issue from occurring again.

          Note that there's no IPv6 support in Dom0 in 8.2 (only in 8.3), so I'm not sure how you ended up configuring v6 on 8.2 🤔

            Mark @olivierlambert

            Hi Olivier!

            @olivierlambert said in Very scary host reboot issue:

            IIRC, a fix was released that prevents the issue from occurring again.

            Note that there's no IPv6 support in Dom0 in 8.2 (only in 8.3), so I'm not sure how you ended up configuring v6 on 8.2 🤔

            XCP-ng v8.2.1 may not have real IPv6 support (it remains unaware of IPv6 and uses a single IPv4 address for management), but the CentOS base in Dom0 does. This is how I activated it:

            In /etc/sysctl.d/90-net.conf:

            # Enable IPv6 on interfaces
            net.ipv6.conf.all.disable_ipv6 = 0
            net.ipv6.conf.default.disable_ipv6 = 0
            
            # Enable IPv4 forwarding
            net.ipv4.ip_forward = 1
            
            # Enable IPv6 forwarding
            net.ipv6.conf.all.forwarding = 1
            net.ipv6.conf.default.forwarding = 1
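
            A small sketch for checking that these settings are actually live, assuming a CentOS 7-based Dom0: files in /etc/sysctl.d/ are applied at boot, and sysctl --system reloads them without a reboot. The functions below compare the running kernel's values under /proc/sys with the ones in 90-net.conf. check_sysctl, check_all and the PROC_SYS override (which exists only so the check can be pointed at a test directory) are hypothetical names, not part of XCP-ng.

```shell
#!/bin/sh
# Sketch: compare live kernel values with the ones set in 90-net.conf.
# PROC_SYS is a hypothetical override; it defaults to the real /proc/sys.
PROC_SYS="${PROC_SYS:-/proc/sys}"

# check_sysctl KEY EXPECTED: prints OK or MISMATCH, returns 1 on mismatch.
check_sysctl() {
    key="$1"; expected="$2"
    # sysctl keys map to paths: net.ipv4.ip_forward -> net/ipv4/ip_forward
    # (fine for these keys; names containing dots, e.g. VLAN devices, would need care)
    path="$PROC_SYS/$(printf '%s' "$key" | tr '.' '/')"
    actual="$(cat "$path" 2>/dev/null)" || actual=""
    if [ "$actual" = "$expected" ]; then
        echo "OK       $key = $actual"
    else
        echo "MISMATCH $key = ${actual:-<missing>} (expected $expected)"
        return 1
    fi
}

# Check every value from 90-net.conf; returns 1 if any of them differs.
check_all() {
    rc=0
    check_sysctl net.ipv6.conf.all.disable_ipv6 0     || rc=1
    check_sysctl net.ipv6.conf.default.disable_ipv6 0 || rc=1
    check_sysctl net.ipv4.ip_forward 1                || rc=1
    check_sysctl net.ipv6.conf.all.forwarding 1       || rc=1
    check_sysctl net.ipv6.conf.default.forwarding 1   || rc=1
    return $rc
}
```

            Running check_all in Dom0 after a reboot prints one line per setting; a MISMATCH line points at a value that did not stick.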
            

            Addition to /etc/rc.local:

            # SAFETY: generous grace time period for XCP-ng xapi network start (allow xenbr0 to come up)
            sleep 60
            
            # activate additional IPv4 subnet
            
            ip addr add xxx.251.xxx.1/28 dev xapi0
            
            # prevent broadcasts leaking out externally
            iptables -A FORWARD -m pkttype --pkt-type broadcast -i xenbr0 -j DROP
            
            # IPv4 done
            
            # activate IPv6 router address on xapi0
            ip addr add 2a01:XXX:XXX:8041:ffff::2/127 dev xapi0
            # add IPv6 default gw on xenbr0 (this is link-local)
            ip -6 ro add default via fe80::1 dev xenbr0
            # add IPv6 route for our /64 towards OPNsense
            ip -6 ro add 2a01:XXX:XXX:8041::/64 via 2a01:XXX:XXX:8041:ffff::3 dev xapi0
            
            # IPv6 done
            

            This works flawlessly. The OPNsense VM is ::3, the next hop of the /64 route.
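
            Two practical notes on the rc.local part, assuming stock CentOS 7 behaviour in Dom0 (none of this is something XCP-ng supports): systemd only runs /etc/rc.d/rc.local, which /etc/rc.local links to, at boot if the file is executable, so chmod +x /etc/rc.d/rc.local is needed; and after a reboot it is worth confirming that the routes really exist, for example with ip -6 route show. The hypothetical has_route helper below checks that output for a given prefix, gateway and device. ip prints addresses in lowercase, compressed form, so pass them that way.

```shell
#!/bin/sh
# Hypothetical helper: succeed if `ip -6 route show` output (read on stdin)
# contains a route for PREFIX via GATEWAY on DEVICE.
#   usage: ip -6 route show | has_route PREFIX GATEWAY DEVICE
has_route() {
    awk -v p="$1" -v v="$2" -v d="$3" '
        $1 == p {
            gw = ""; dev = ""
            # find the "via <gw>" and "dev <iface>" pairs anywhere on the line
            for (i = 2; i < NF; i++) {
                if ($i == "via") gw = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
            if (gw == v && dev == d) found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# On the host (fill in your own prefix and OPNsense address):
#   ip -6 route show | has_route default fe80::1 xenbr0 && echo "default route OK"
#   ip -6 route show | has_route <your-/64-prefix> <opnsense-::3-address> xapi0
```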

            As for the fix you mentioned, is it specific to 8.3 or is it also available for 8.2.1? Can you point me to the relevant information?

            Thanks a lot for your quick response.

              olivierlambert Vates 🪐 Co-Founder CEO

              This works flawlessly.

              Obviously it does not 😄 It's absolutely not supported, and I have no idea what impact it could have on the existing network stack. XCP-ng is NOT your average Linux distro; this kind of modification sends you into the twilight zone.

              To get an idea of the amount of work needed: https://xcp-ng.org/blog/2021/02/09/ipv6-in-xcp-ng/ (and this article is 3 years old; it doesn't account for the huge amount of testing we did and bugs we found since then).

              So if you need IPv6 support in Dom0, you have to rely on 8.3, and feedback/bug reports are very welcome 🙂

                Mark @olivierlambert

                @olivierlambert said in Very scary host reboot issue:

                This works flawlessly.

                Obviously it does not 😄 It's absolutely not supported, and I have no idea what impact it could have on the existing network stack. XCP-ng is NOT your average Linux distro; this kind of modification sends you into the twilight zone.

                To get an idea of the amount of work needed: https://xcp-ng.org/blog/2021/02/09/ipv6-in-xcp-ng/ (and this article is 3 years old; it doesn't account for the huge amount of testing we did and bugs we found since then).

                So if you need IPv6 support in Dom0, you have to rely on 8.3, and feedback/bug reports are very welcome 🙂

                The linked article refers mainly to management and other "internal" XCP-ng handling of IPv6.

                I don't see the relevance, since I'm just passing through IPv6 within the Linux kernel without XCP-ng itself interacting with it.

                Mind you, it works crash-free on the "Model A" server, which runs the very old XCP-ng 8.0, while XCP-ng 8.2.1 on "Model B" crashes. So XCP-ng 8.0 has better IPv6 compatibility? 🤔

                XCP-ng itself only uses IPv4. Isn't that the point of "dual stack"? Two worlds coexisting, usable together but also completely separately. A process can freely choose the address family it wants to use.

                  olivierlambert Vates 🪐 Co-Founder CEO

                  To rephrase: we support IPv6 in guests, but we do not support any modification of Dom0. This doesn't prevent you from doing it, but then it's harder to know what the problem is (maybe it's related, maybe not?).

                  Anyway, your issue might be worth another thread, since I think the original problem was solved here 🙂

                    Mark @olivierlambert

                    @olivierlambert said in Very scary host reboot issue:

                    IIRC, a fix was released that prevents the issue from occurring again.

                    You still haven't told me anything about the fix you mentioned. Care to explain instead of teasing without telling? 😉

                      olivierlambert Vates 🪐 Co-Founder CEO

                      I am very, very busy, so I don't have time to search for it myself, but maybe someone else around with a few minutes could point you to the blog post about this.

                      edit: luckily, I found it in a few seconds: https://xcp-ng.org/blog/2024/01/26/january-2024-security-update/

                        Mark @olivierlambert

                        @olivierlambert said in Very scary host reboot issue:

                        I am very, very busy, so I don't have time to search for it myself, but maybe someone else around with a few minutes could point you to the blog post about this.

                        edit: luckily, I found it in a few seconds: https://xcp-ng.org/blog/2024/01/26/january-2024-security-update/

                        Thanks. I'll check this out.
