Endless Xapi#getResource /rrd_updates in tasks list

olivierlambert

Okay thanks, we'll try to see if we can reproduce it here.

Mathieu

@olivierlambert
I can also confirm that the issue is still there on my host with XO updated to master with latest commit a548a.

uwood

It is gone with a548a after I've restarted XO and toolstack.

olivierlambert

That's confusing. @Mathieu do you confirm?

Mathieu

@olivierlambert
Yes, confusing, indeed
I just restarted XO and the toolstack one more time to be sure. Yesterday, it was OK at the beginning, but the issue reappeared after a few hours.
I'll let you know ASAP.

olivierlambert

Thanks!

uwood

@olivierlambert Now, about 10 hours after restarting XO and the toolstack, I have two tasks too.

olivierlambert

Thanks, might be useful for @julien-f

Mathieu

@olivierlambert
Same on my host, the first stuck task appeared 5 hours after toolstack and XO reboot.

uwood

With 8b7e1 I still get the tasks.

uwood

@uwood With 0794a there are still some tasks.

olivierlambert

That's tricky: @julien-f couldn't reproduce it in our lab, so we'll need more details on your setups, guys.

14wkinnersley

@olivierlambert
I have one pool with three nodes:
Node 1 – Dell R420, 2x 10 Gb NICs (Master) – 192.168.1.4
Node 2 – Dell R620, 2x 10 Gb NICs – 192.168.1.7
Node 3 – Dell R420, 2x 10 Gb NICs – 192.168.1.3
XO VM – Ubuntu 22.04.4 LTS – 192.168.1.15

Each node has a bond of NICs 2 and 3. (Node 2, the R620, has its MAC addresses reassigned to work correctly.)

Network-wise, upstream of the nodes are two switches (2x Unifi USW-Agg), and I use the Unifi Dream Machine Pro as my router. I am able to ping the other nodes from each node.

Local DNS uses Technitium DNS (primary and secondary) as a recursive resolver.

My SRs are two iSCSI datastores hosted on a separate Dell R320 server running TrueNAS Scale.

Within my pool, I run about 25 virtual machines. I run nightly backups for ~5 VMs, and weekly backups for all VMs. Backups go to a remote NFS storage location hosted on a separate server. I have 3 VMs that run on separate network VLANs from the rest, and those networks are set up under the pool and upstream on the router.

I have the following plugins enabled:
• Backup-reports
• Load-balancer (performance mode)
• Perf-alert
• Transport-email
• Usage-report

From my testing, this was introduced with commit 6c16055 (Mar 15). I have since rolled back to c6451cf and have stayed on that commit for the past several days.

olivierlambert

Thanks for the details @14wkinnersley !

olivierlambert

@14wkinnersley can you disable the perf-alert and load-balancer plugins and see if it is still happening?

Mathieu

@olivierlambert
Simpler setup on my side:

Pool 1 - 1 x ASRock Rack 1U4LW-X570/2L2T RPSU with about 10 VMs
Pool 2 - 1 x HP DL360 Gen9 with only the XO VM (Debian 11).

Each host has 2 x 10 Gb NICs in use (one for VMs/MGMT, the other for NFS storage).

The storage is an NFS share on a QNAP NAS (except for the XO VM, which is on the local storage of the DL360 host).

The 2 hosts and the NFS storage are connected to the same 10 Gbit Ubiquiti EdgeSwitch.

Same plugins as @14wkinnersley + audit and sdn-controller.

The stuck-task issue only appears on pool 1, not on pool 2, which has only the XO VM.

14wkinnersley

@olivierlambert Will do. Plugins are disabled and I'm updating back to master right now. Will report back.

olivierlambert

Thank you very much, both of you.

Mathieu

I'm gonna try the same and will let you know.