Posts made by fasterfourier | XCP-ng and XO forum

fasterfourier

Plenty of Citrix customers were probably affected, but they would rather reboot on a schedule than spend months working through the support process.

fasterfourier

@olivierlambert

Official Citrix update has been posted: https://support.citrix.com/article/CTX306529

fasterfourier

Citrix has worked our ticket, and they concluded that the NIC driver is to blame here as well. They had us collect debug info using:

/opt/xensource/libexec/xen-cmdline --set-dom0 page_owner=on

They then confirmed that the memory leak was coming from the NIC driver. They intend to release a public hotfix for this issue.

fasterfourier

One more observation here. This issue does not occur on a different pool of ours that's also running CH8.2LTSR. That pool has a lower overall load, has 2 hosts instead of 7, and does not contain any NICs using the ixgbe driver. Other aspects of the pool are identical.
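For anyone who wants to make the same comparison between pools, here is a minimal sketch of listing which kernel driver backs each NIC on a host. It assumes the standard Linux sysfs layout, where `/sys/class/net/<iface>/device/driver` is a symlink to the driver directory. The function name `list_nic_drivers` and its optional sysfs-root argument are made up for illustration and are not part of any Citrix or XCP-ng tooling.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <driver>" for every network
# interface that is backed by a kernel driver.
# The sysfs root is a parameter (default: /sys) only so the function can
# be pointed at a fake directory tree when trying it out.
list_nic_drivers() {
    sysfs_root="${1:-/sys}"
    for dev in "$sysfs_root"/class/net/*; do
        # Purely virtual interfaces (lo, bridges) have no driver symlink.
        [ -L "$dev/device/driver" ] || continue
        driver=$(basename "$(readlink "$dev/device/driver")")
        printf '%s %s\n' "$(basename "$dev")" "$driver"
    done
}
```

Running `list_nic_drivers | grep ixgbe` in dom0 on each host should show whether any ixgbe-driven NICs are present.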

fasterfourier

Update: we have disabled dynamic memory on all VMs in our pool and the issue is still occurring. I expect this to be sent to the Citrix developers shortly, since the normal support team has exhausted their troubleshooting options.

fasterfourier

@stormi

I am also suspicious of this diagnosis, and I think this is likely related to checking off the "misalignments" in our configuration before escalating the case to the next level of troubleshooting support. That said, I figured I'd run it by the group here to see if there's any correlation between users with dynamic memory on their VMs and this issue.

fasterfourier

I have another observation to throw in the thread here. In working with Citrix support on our dom0 memory exhaustion issue in CH8.2LTSR, they are focusing on several of our VMs that had dynamic memory control enabled, which is deprecated in CH8.x. They believe this is related to the control domain memory exhaustion.

I have disabled this on every VM I can find with the feature enabled and will continue to monitor. I don't have much hope that this is the underlying issue, since we are seeing the memory issue on our pool master, which could only have hosted a VM with DMC enabled for very brief periods while other VMs were shuffled around for maintenance.

Does this track with anyone else here?

fasterfourier

@garyabrahams

Sorry I was unclear, but we are not running Kubernetes in our environment. We are running Citrix Hypervisor 8.2 LTSR.

fasterfourier

FWIW, no Kubernetes in our environment with this issue.

fasterfourier

@r1 If it helps at all, I have seen this more often on the pool master than in other pool hosts. We are using XO delta backup on 125 VMs in this pool daily. So, the master is busy doing a lot of snapshot coalesce operations (lots of iSCSI storage IO) compared to other hosts. The other host that has hit 95% control domain memory use is also IO heavy (it has some database server VMs).
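If it is useful for tracking this over time, below is a rough sketch of computing a memory-in-use percentage inside dom0 from `/proc/meminfo`. It assumes the dom0 kernel exposes the `MemAvailable` field, and the figure may not match the control domain memory number shown in the management UI, which could be calculated differently. The function name `mem_used_pct` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the percentage of memory in use, computed as
# (MemTotal - MemAvailable) / MemTotal from a meminfo-style file.
# The file path is a parameter (default: /proc/meminfo) so the function
# can be tried against a sample file.
mem_used_pct() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$meminfo"
}
```

Calling `mem_used_pct` periodically (from cron, for example) and logging the result would give a growth curve per host, which makes it easier to compare the master against the other pool members.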

fasterfourier

@stormi The ID for the case I just opened is 80240347. If you have a bugtracker issue open, you may want to mention that ticket. I just now opened the ticket, though, so it will be a while before it makes its way out of tier 1, etc.

EDIT: Had the wrong case number at first. Updated case number.

fasterfourier

@stormi I've just opened a Citrix case on the issue, but I wouldn't expect much help there, and definitely wouldn't expect anything quickly.

fasterfourier

I am seeing similar behavior with Citrix Hypervisor 8.2LTSR after upgrading from 7.1CU2, which was not affected. We have a pool with 5 PowerEdge R730 hosts and 2 R720 hosts. All have Intel 10G and 1G NICs (ixgbe and igb drivers) and we use iSCSI storage. I have had two hosts use up all their control domain memory, requiring an evacuate/reboot of the host. One host was the pool master, which runs only one VM (the Xen Orchestra appliance) but is generally busy with various iSCSI tasks due to snapshot coalesce after daily backups. The other host has ~20 VMs that are pretty busy with network activity. No userspace processes seem to be using an abnormal amount of memory.