Posts made by bleader | XCP-ng and XO forum

bleader

I think whatever solution suits you will work.

Personally, if I know there are issues with it, I would tend to disable it in the BIOS, to be sure nobody tries to use it later and wastes their time. In an enterprise setting, that can be important.

One thing to keep in mind if you keep it: if you want to add other hosts to the pool, they will need to have a similar network topology. So if you end up having eth0 and eth1 with your current management network on eth1, any new host should be able to have its management on eth1 as well. You may work around it with interface renaming, but that tends to get messy over time.

That being said, I'm not sure that even disabling the Realtek NIC in the BIOS will change the interface numbering, now that eth1 already exists and is configured.

If you don't plan to add hosts to the pool, and don't have a team with people who may act on these machines in the future without being aware of this setup history, leaving it connected and disabling the port on the switch should not be an issue.

bleader

@dnikola said in [HELP] XCP-ng 4.17.5 dom0 kernel panic — page fault in TCP stack, crashdump attached:

Has anyone experienced similar page faults in the dom0 TCP stack on 4.19 kernels or XCP-ng 4.17.5?

Not that I know of.

Are there any known issues with network drivers on this kernel/hypervisor combo?

No known issues, although there can be issues with some drivers. It would help to know which NICs and drivers you are using.

Would you recommend moving to a newer dom0 kernel or hypervisor build?

On XCP-ng, the latest version is 8.3. You didn't specify the version in your post, but you're running the latest version of Xen, so I assume it is an up-to-date 8.3, in which case there is no newer build.

Could a memory issue cause this specific kind of page table inconsistency during a kernel panic?

Yes. It can be a bug in the code, but it absolutely could be a hardware issue.

Any advice on additional debug steps or log files I should collect next time?

I would start by running a memtest on that host to make sure the memory is not having issues.

Do you know if there was a specific VM doing something specific at that time? We had some issues in the past with FreeBSD VMs using WireGuard, but this does not look similar, and it should be fixed now.
What kind of guests were running on that host? Linux, Windows, some BSD-based?
If you are running Windows guests, please make sure you have read this blog post and comply with the guidelines there.

From a quick look, I don't see anything obvious. Follow Olivier's suggestion first. If you still have issues after that, you can share an additional report using xen-bugtool -y. But please be sure to update your BIOS first, check your memory, and then do that.

bleader

ping @Team-Hypervisor-Kernel

bleader

@JBlessing since it does seem to start, the networking side appears to be working, at least at first.

Just for debugging purposes, you could try to switch that VM to BIOS instead of UEFI if possible. Maybe it is related to what the PXE is starting in the VM.

You could also try switching the VM between the Realtek and e1000 NICs. At this stage the PV drivers are not there, so it is using an emulated NIC. Maybe the image your PXE starts doesn't like the one you're using and gets stuck somehow.

As you're already using it with VMware, I assume you know how to size your VM. But if you went for a tight RAM value for this VM, you could try giving it more RAM to see if that is related: everything has to fit in RAM at some point, and we may be using more at startup than VMware…

Hope one of these helps.

bleader

@dmz0001 In theory, XCP-ng 8.3 is mostly on par with XenServer 8.4. The version number differs only because they changed their naming, as the previous scheme was confusing some of their customers. Long story short, XCP-ng 8.3 ~= XenServer 8.4, so what you see in the XenServer 8.4 compatibility list is the right reference for XCP-ng 8.3.

Unfortunately we do not have much feedback on Zen 5 at this time on our side.

bleader

I don't think there is such a thing (if I understood your question properly), I have a simple script to create a network, add a vlan to it, create a vif and add it to a given VM. That's using xe commands directly on a single host.
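As a rough illustration of those steps (this is not the script mentioned above; the function name and its arguments are made up for the example, and it assumes you run it in dom0 of a single host with the PIF and VM UUIDs already at hand):

```sh
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: create_vlan_vif <network-name> <pif-uuid> <vlan-tag> <vm-uuid> <device>
create_vlan_vif() {
    name=$1 pif_uuid=$2 vlan_tag=$3 vm_uuid=$4 device=$5

    # 1. Create the network object; xe prints the new network UUID.
    net_uuid=$(xe network-create name-label="$name") || return 1

    # 2. Create a VLAN on top of the given physical PIF, bound to that network.
    xe vlan-create network-uuid="$net_uuid" pif-uuid="$pif_uuid" \
        vlan="$vlan_tag" >/dev/null || return 1

    # 3. Create a VIF on the VM, attached to the network; xe prints the VIF UUID.
    xe vif-create vm-uuid="$vm_uuid" network-uuid="$net_uuid" device="$device"
}
```

If the VM is running, the new VIF still has to be plugged afterwards, for example with xe vif-plug uuid=<vif-uuid>.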

From my understanding, here you'll need to parse the networks and VLANs of one pool and recreate them on another, so this has to be driven from outside the hosts/pools.

I guess you could go the ssh + key way, that would be my first instinct. But it may also be doable, and maybe easier, through XO's API, or through xo-cli if you do not want to mess with an API directly and prefer a shell script.

bleader

The original problem could be a known issue when creating a bond that includes the management interface, which we still have to investigate. The emergency network reset should have fixed that, though, so maybe it is a mix of the bond creation issue and an MTU issue.

In the reinstalled pool, did you create the bonds already? If so, I would think changing the MTU should be fine, especially as it worked on other PIFs. But MTU issues are often quite sneaky, so I would not make any promises either.

bleader

Update published: https://xcp-ng.org/blog/2025/05/14/may-2025-security-update-for-xcp-ng-8-2-8-3/

Thank you for the tests.

bleader

Sorry for the delay, I'm a bit swamped. That does not ring a bell to me right now.

What is your setup like? How many pools, how many hosts per pool, and are there bonds on some of them?

Then, to debug what was actually created, since you said the network does exist but there is no traffic:

On your hosts:

xe network-list to get the uuid of one of these private networks you created
xe network-param-list uuid=<network-uuid> should tell you which bridge it is on
ovs-vsctl show shows all bridges and their ports; in there you should see the bridge you found in the previous step. This bridge should have:
- a port with type vxlan and options in which you have a remote_ip to the network center
- the VIFs for the VMs
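
The host-side checks above can be chained roughly as follows. This is only a sketch: the function names are made up, and it assumes the private network uses VXLAN (replace type=vxlan with type=gre for a GRE network).

```sh
#!/bin/sh
# Hypothetical helpers, to be run in dom0.

# Print the bridge backing a given network UUID.
network_bridge() {
    xe network-param-get uuid="$1" param-name=bridge
}

# Show what OVS has on that bridge: its ports (the VIFs should be there),
# then the VXLAN tunnel interfaces with their options (look for remote_ip).
inspect_bridge() {
    bridge=$(network_bridge "$1") || return 1
    echo "bridge: $bridge"
    ovs-vsctl list-ports "$bridge"
    ovs-vsctl --columns=name,type,options find interface type=vxlan
}
```

Typical use would be inspect_bridge <network-uuid>, with the uuid taken from xe network-list.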

On your VM:

is that network assigned to the VMs?
do the VMs have new devices showing up in ip link or similar when you attach the network?
is there any error in the VMs' dmesg on device creation?

We'll see from there if we can get an idea of what is happening.

bleader

Home host updated successfully, no issue.

bleader

Home host, no XOSTOR, updated fine, no issue with my usual VMs.

bleader

@nick.lloyd As mentioned by Olivier, the documentation has an email address to contact us. Writing to it creates an internal ticket, unrelated to support, that reaches the security team directly.

bleader

The question that I'm asking here is how does the Vates Team evaluate these vulnerabilities, Qualys, Greenbone, something else?

I'm not sure what you mean by evaluate vulnerability, especially the list about Qualys, Greenbone…

If you mean how do we track and process them, I cannot talk about XO side, but I can shed some light on XCP-ng side:

we have an internal dependency-track (DT) with various projects (8.2 default install, 8.2 available packages, same split for 8.3), with a custom SBOM generation, to feed DT
- this is based on CVEs and their Common Platform Enumeration (CPE)
- the main issue here is that not all CVEs fill the CPEs the same way, so there may be some misses
- we're trying to improve the SBOM generation to minimize this
we also monitor the oss-security mailing list, and some other sources
DT reports the CVEs that matched, and we can keep them in or mark them internally as not impacted, fixed, etc
we evaluate the priority for us based on their general criticality, but modulate this depending on whether it is in the base install or not, whether it is a part of the software that is meant to be used as a server, whether it is related to remote access, and more
for the ones we're impacted by and feel are important, we either update to the latest package version (though now that CentOS 7 is end of life, that's less likely to happen) or try to backport the fix ourselves when possible.

That's for the dom0 side. On the hypervisor side, we're part of the security list of the Xen Project, so we receive the XSAs and integrate them as fast as we can while following our release process, sometimes integrating the patches ourselves, sometimes going with the XenServer fixes. When we integrate them ourselves, we most of the time later drop our own integration and move to the one from XenServer, as the people working on these fixes are mostly the ones who worked on the XSAs in the first place, so they have better knowledge and insight than we do.

I hope this answers this question.

Is the Vates team open to the community reporting these vulnerabilities openly or would a ticket be best?

On the XCP-ng side, anything concerning packages that come from open source projects would be a report of a publicly disclosed CVE, so you can report those openly. If people were to find new vulnerabilities, it would depend, but they should follow a classic private disclosure process in the first place:

if it's in an open source package, the upstream would be the best place to do so
along the same lines, if it's regarding Xen, XAPI, or other Xen Project software, reporting it upstream through the security process is the best way, and it would be nice to drop us a ticket as a heads-up too, but that's not mandatory
if it is for some of the packages directly coming from us, creating a ticket for us to be able to work on it before a public disclosure would be best.

Sorry, you asked about the whole ecosystem, but I'm only able to answer from the XCP-ng side of things.

bleader

Running tcpdump switches the interface to promiscuous mode so that all traffic reaching the NIC can be dumped. So I assume the issue you had on your switches allowed traffic to reach the host, which forwarded it to the VMs, and it wasn't dropped there because tcpdump had switched the VIF into promiscuous mode.
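
One way to see this for yourself is to look at the promiscuity counter that ip reports for the interface, before and while tcpdump runs. A minimal sketch, assuming an iproute2 version whose detailed output includes the "promiscuity N" field (the function name and the interface name eth0 are only examples):

```sh
#!/bin/sh
# Print the promiscuity counter of an interface (0 means not promiscuous).
promiscuity_of() {
    ip -d link show dev "$1" |
        sed -n 's/.*promiscuity \([0-9][0-9]*\).*/\1/p'
}
```

Running promiscuity_of eth0 in the guest with and without a tcpdump session open on eth0 should show the counter change.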

If it seems resolved, that's good. Otherwise, let us know if we need to investigate this further.

bleader

@carldotcliff if you are 100% positive you see traffic on the VMs that should not reach them, it is worth opening a ticket, as this is not intended behavior. If you do, mention in the ticket that this was discussed on the forum with David (me), so our support team can assign it to me if they want to.

As for the dropped packets, I do not see any on my home setup, which is a pretty "small" network, but in our lab we do have some on our hosts. On a bigger network, that could be pretty much anything: broadcast or multicast traffic reaching the host that the NIC chooses to drop itself, and some NICs will also drop some discovery protocol frames. It would be hard to identify, unfortunately, but it would not worry me as long as the count is not high and it is not impacting performance.

bleader

I think the promiscuous mode is due to the fact that the interfaces end up in OVS bridges. Without it, the traffic coming from the outside to the VMs' MAC addresses would be dropped.

Once traffic reaches the OVS bridge the interface is in, it is up to OVS to act as a switch and only forward packets to the MACs it knows on its ports, so traffic should not be forwarded to every VIF.
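
To check what OVS has actually learned, its MAC learning table can be dumped with ovs-appctl fdb/show. The small filter below is a sketch (the function name, the bridge name xenbr0 and the MAC address are placeholders); it reads that table on stdin and prints the port on which a given MAC was learned.

```sh
#!/bin/sh
# Read "ovs-appctl fdb/show <bridge>" output on stdin (columns: port, VLAN,
# MAC, Age) and print the port number where the given MAC was learned.
port_for_mac() {
    awk -v mac="$1" 'tolower($3) == tolower(mac) { print $1 }'
}

# Example, in dom0:
#   ovs-appctl fdb/show xenbr0 | port_for_mac 52:54:00:aa:bb:cc
```

If a VM's MAC shows up on a single port, unicast traffic for that MAC should only be sent to that port.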

I just tested on 8.2 and 8.3:

running tcpdump for ICMP on 2 VMs: pinging VM1 does not show traffic on VM2, pinging VM2 does not show traffic on VM1, and pinging the host shows no traffic on the VMs
running tcpdump for everything except ssh (as I was logged in on both VMs over ssh): the only traffic I see is the multicast traffic on the network.

So to answer your question: yes, it is normal that the NICs are in promiscuous mode, but that should not lead to all traffic going to all the VMs.
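
For anyone wanting to repeat the test inside their own VMs, the two captures described above can be run along these lines. This is a sketch: the function names are made up, and eth0 is only the assumed default guest interface name.

```sh
#!/bin/sh
# Capture only ICMP on the given interface (defaults to eth0).
watch_icmp() {
    tcpdump -n -i "${1:-eth0}" icmp
}

# Capture everything except ssh, so your own session does not flood the output.
watch_all_but_ssh() {
    tcpdump -n -i "${1:-eth0}" not port 22
}
```

Run watch_icmp on one VM while pinging the other VM or the host: if OVS is switching correctly, nothing should show up.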

bleader

@irtaza9 Xen Orchestra premium (and from sources) has an SDN Controller plugin. It lets you create private networks and relies on GRE or VXLAN to do so, so as long as there is IP connectivity, this can do the trick.

There are 2 blog posts on the subject:
https://xen-orchestra.com/blog/xo-sdn-controller/
https://xen-orchestra.com/blog/devblog-3-extending-the-sdn-controller/

And the documentation:
https://docs.xen-orchestra.com/sdn_controller

There are 2 main issues:

the star topology, with an elected center that will be a bottleneck, as all the traffic on this network goes through it
there is (for now) no automated network management (DHCP, DNS, gateway…). That should be part of our microsegmentation solution later on, but there is no ETA at this time

Does that answer your question?

bleader

To be honest, I'm unsure. Generally the XSAs have a pretty clear impact description, but here it just states:

resulting in e.g. guest user processes
being able to read data they ought not have access to.

There is no detail here on whether that's only inside the guest or whether it could reach data outside its domain scope, so I would not be able to say. But XSAs are generally pretty clear when there is a risk of accessing other guests' data, so my assumption would be that this is only inside the guest domain.

bleader

Hello @NielsH, no, that XSA is on the guest side, and the fixes will be in the kernel used by the guest. Unless we missed something, there is currently nothing to be done on the host kernel side.