JamfoFL

@jr-m4 I just wanted to report that today, February 24, 2025, I updated to Commit f18b0, and that seems to have fixed the issue. I can see all of the drop-downs once again, and everything seems to be working normally.

@Danp and Support Team... thank you for looking into the issue!

JamfoFL

@ph7 I'm seeing the same thing as you: a mismatch between the server that sends the alert and the server named in the end-of-alert message. Just like in your case, the XO server is the one that should truly be alerting. The second server (and it's always the same second server) is NOT having any issues with CPU or memory usage, yet it is being dragged into the alerts for some strange reason.

I'm currently on Commit 2e8d3, running Xen Orchestra from sources. Yes, I know I'm 5 commits behind right now, and I will update as soon as I finish this message. However, this issue has been going on for me for some time now, and when I saw others with the same issue, I figured I'd add to the chain.

One other thing happened around the same time this issue started... it seems the Average Length value for alerts is being ignored, or is at least being handled differently than before. For example, I have my CPU alert set to trip if usage exceeds 90% for over 600 seconds. Before the issue started, a long-running backup could push my CPU over 90% and sometimes keep it there for an hour or more. In that case, I would get a single alert once the CPU had been over 90% for the configured period, and XO was "smart" enough to keep an eye on the average, so a brief dip of a couple of seconds below 90% would NOT send out an "end of alert" followed by a second "alert" message when the CPU went over 90% again. This is not happening anymore... if the CPU spikes over my 90% threshold, I get an almost immediate alert message. The instant the CPU drops below the threshold, I get an immediate end-of-alert message. If it goes back over 90%, even a few seconds later, I get yet another alert message.

As a result, instead of getting a single message that spans the whole time the threshold is exceeded, with dips of only a few seconds ignored, I now get an alert/end-of-alert/alert sequence for every seconds-long dip in CPU usage. Last night, for example, I received over 360 alert e-mails because of this, many of them within seconds of each other.
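The difference between the old and new behavior described above (alerting on the average over the Average Length window versus alerting on every instantaneous crossing) can be illustrated with a small simulation. This is a hypothetical sketch, not Xen Orchestra's alerting code: the function names, the one-sample-per-second rate, and the CPU trace are all assumptions chosen to mirror a 90% threshold with a 600-second window.

```python
from collections import deque

def instantaneous_events(samples, threshold):
    """Alert/end every time a single sample crosses the threshold."""
    events, alerting = [], False
    for t, value in enumerate(samples):
        above = value > threshold
        if above != alerting:
            events.append((t, "alert" if above else "end"))
            alerting = above
    return events

def averaged_events(samples, threshold, window):
    """Alert/end only when the mean of the last `window` samples crosses the threshold."""
    events, alerting = [], False
    recent = deque(maxlen=window)  # the oldest sample drops off automatically
    for t, value in enumerate(samples):
        recent.append(value)
        if len(recent) < window:
            continue  # not enough history yet for a full-window average
        above = sum(recent) / window > threshold
        if above != alerting:
            events.append((t, "alert" if above else "end"))
            alerting = above
    return events

# Hypothetical CPU trace, one sample per second: idle, then a 30-minute backup
# at 95% with a 3-second dip to 85% every minute, then idle again.
samples = [50] * 120
for _ in range(30):
    samples += [95] * 57 + [85] * 3
samples += [50] * 900

print(len(instantaneous_events(samples, 90)))                # 60
print([e for _, e in averaged_events(samples, 90, 600)])     # ['alert', 'end']
```

With instantaneous checks, each of the 30 dips produces an end-of-alert plus a fresh alert (60 messages); averaging over 600 seconds absorbs the dips and yields a single alert/end pair, which matches the behavior the post says used to happen.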

So... just confirming what @ph7 has been seeing: alerts are sent from one server while the end-of-alert messages come from a second one, and XO no longer averages over the selected period, so a message goes out for every seconds-long dip below the threshold.

Thanks!

JamfoFL

@julien-f I just ran a "yarn build" this morning, and other than still seeing the chunk-size warning:

(!) Some chunks are larger than 500 KiB after minification. Consider:
- Using dynamic import() to code-split the application
- Use build.rollupOptions.output.manualChunks to improve chunking: https://rollupjs.org/guide/en/#outputmanualchunks
- Adjust chunk size limit for this warning via build.chunkSizeWarningLimit.

Everything else ran fine... no errors or OOM issues.

JamfoFL

@alcaraz73 @screame1 @AtaxyaNetwork @florent @Danp @olivierlambert I know I'm late getting back to the party, but I just got back into our office after Hurricane Ian. Thanks to all for your well wishes... I was very fortunate to come through with no damage and was only without power for about 12 hours.

Now that I'm back, I also applied the fixes here (and updated to the latest commit) and can confirm continuous replication is working like a champ.

Thanks to everyone involved for the help!

JamfoFL

@Danp Current commit is 3d3b6.

xo-server: 5.103.0
xo-web: 5.104.0

Yes... as I had mentioned, everything worked until I updated yesterday at around 3:50 PM. I noticed that five different commits were released yesterday, and the one you linked to is one of them. So, as I figured, one of yesterday's commits "broke" the ability to run continuous replications.

Now that it looks like you've focused on a cause, I'm sure it's only a matter of time until a new commit is published to fix the issue. Once it's released, I'll get it installed and, hopefully, we can put this behind us.

Thanks!!

JamfoFL

I just wanted to post an update as of Monday, February 24, 2025...

I updated to Commit f18b0, and that seems to have fixed the issue. All of the dropdown options that were missing in the above screenshots, as well as many others, are back and responding normally once again. So, it looks like the issue has been addressed and operations have returned to normal.

The only oddity I see now is that when I run yarn after pulling the updates, I receive an error message saying the expected version of Node is now v20. I upgraded to v20, but it seems the backup portion of XO still needs v18... so be sure to keep both on the system, or things will break. I have both v18 and v20 installed, with v20 set as the default.

Other than that little oddity, it seems everything else is doing everything it should be doing!

Thanks to the excellent Vates teams for getting everything back on track.

JamfoFL

@probain Thanks for the update... I will monitor that ticket, as well.

JamfoFL

@olivierlambert I would agree... one of those odd glitches that occurred during the build process that corrected itself on a second run.

Thanks for everyone's help... please go ahead and close this out!

JamfoFL

I just wanted to chime in that I was having the same issue after updating to the latest commit earlier today, 1b5157e9a7a7ba9a49ebc9484737c34ef3da95ed.

I rolled back to my previous commit (granted, it's a bit of an older one as I hadn't updated since last week) and the CR backups started working again perfectly.

In my case, both XCP-ng hosts are on the same subnet.

So, something seems to have broken CR backups between last week and today. Fortunately, rolling back got everything working again.

If there's anything I can do on my end that might help, just let me know!

JamfoFL

@olivierlambert A-ha! Thanks! So it was just a coincidence... and I was a little ahead of the game.

Sorry for the confusion. Thanks for putting my head on straight!

JamfoFL

@Andrew Interesting... up until this time we've never needed to completely rebuild XO every time a new Commit was released or a Node update was required. The existing installation just kept working. I hope this isn't something that is going to become standard as having to completely rebuild XO every time a new Commit is available would be a bit of a pain...

JamfoFL

@Andrew I guess maybe that's the difference? I update Node and then do the usual update process (git checkout, yarn, yarn build, etc.), and my backups were still unable to run without Node 18 installed. It sounds like you went completely back to square one and built XO as if it were a brand-new machine.
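For reference, the in-place update mentioned here (as opposed to rebuilding from a fresh clone) can be scripted. This is a minimal sketch under stated assumptions: the post only lists "git checkout, yarn, yarn build", so the exact `git checkout .` and `git pull --ff-only` commands are assumptions based on the usual from-sources routine, and the checkout path is hypothetical.

```python
import subprocess

# In-place update steps. The exact git commands are assumptions; the post
# only says "git checkout, yarn, yarn build".
UPDATE_STEPS = [
    ["git", "checkout", "."],      # discard local edits to tracked files
    ["git", "pull", "--ff-only"],  # fast-forward to the newest commit
    ["yarn"],                      # install/refresh dependencies
    ["yarn", "build"],             # rebuild the packages in the checkout
]

def update_xo(repo_dir, dry_run=True):
    """Run the update steps in repo_dir, or only print them when dry_run is True.

    Returns the commands as strings so the plan can be inspected.
    """
    for cmd in UPDATE_STEPS:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises on the first failing step, so later steps don't run
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return [" ".join(cmd) for cmd in UPDATE_STEPS]

if __name__ == "__main__":
    update_xo("/opt/xen-orchestra")  # hypothetical path; dry run only prints
```

A from-scratch rebuild would instead start from a new clone, so nothing left over from the previous installation (such as already-installed dependencies) carries into the new build.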

JamfoFL

@Andrew Oh... you rebuilt XO from scratch?

JamfoFL

@Andrew Do you still have Node 18 installed even though you are on Node 22 now? When I upgraded to Node 20 I uninstalled Node 18 and immediately all backups started to fail. I received error messages about needing Node 18.19.1 for the backups to work.

I reinstalled 18.19.1, and that got everything working again. So, for whatever reason, it seems that newer versions of Node are supported, but you have to keep the older version around or some things will break. In fact, I found that if I installed the latest version of Node 18, 18.20.1, my backups still failed. I specifically had to use Node 18.19.1 to keep things working.
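The workaround described in this thread (keep Node 18.19.1 installed alongside a newer default Node) amounts to two requirements that must both be met by whatever is on the machine. Below is a minimal Python sketch of that check, assuming version strings in the `node --version` format. The rules encode only what was observed here, not documented XO policy, and `v20.11.1` is illustrative example data. It only checks that a suitable version is installed, not which version each process actually uses.

```python
def parse_node_version(text):
    """Turn `node --version` output such as 'v18.19.1' into (18, 19, 1)."""
    major, minor, patch = text.strip().lstrip("v").split(".")
    return int(major), int(minor), int(patch)

# Requirements as observed in this thread, not documented XO policy.
REQUIREMENTS = {
    # yarn reported that Node v20 is expected; treating 20 as a minimum is an assumption
    "yarn install/build": lambda v: v[0] >= 20,
    # backups only worked with this exact release in the poster's setup (18.20.1 failed)
    "backups": lambda v: v == (18, 19, 1),
}

def missing_requirements(installed):
    """Return the names of requirements that no installed Node version satisfies."""
    versions = [parse_node_version(s) for s in installed]
    return [name for name, ok in REQUIREMENTS.items()
            if not any(ok(v) for v in versions)]

print(missing_requirements(["v18.19.1", "v20.11.1"]))  # []
print(missing_requirements(["v20.11.1"]))              # ['backups']
print(missing_requirements(["v18.20.1", "v20.11.1"]))  # ['backups']
```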

JamfoFL

XCP-ng 8.3

Xen Orchestra from Sources
Commit 97230

After updating to the latest commit shown above, all of the drop-down boxes are empty on all screens.

Everything else seems to be working just fine... it's only an inability to go into any drop-down on any screen to see the available options, machines, backups, etc.

Any ideas on what to look for?

Thank you!

JamfoFL

@MathieuRA Thanks so much! I appreciate all the effort!

JamfoFL

@MathieuRA Yes, I can confirm I am using the All Running VMs and All Running Hosts (I am not using All Running SRs, but I never get alerts for those because I have a LOT of free disk space).

I did place an exclusion for one of my VMs (the one that was generating dozens and dozens of alerts) to cut down on some of that chatter, but even with one machine excluded, when I do get a report from one of the other VMs it still has the same issue: the proper VM will generate the alert, but an improper VM will be reported in the end of alert message.

So... as far as I can tell, we still have two issues: the wrong machine is identified in the end-of-alert message, and the Average Length field is ignored. A machine that pops over the threshold, dips under it for a few seconds, and then goes back over it generates three messages (alert, end of alert, alert) within several seconds, instead of XO looking at the average to confirm the dip isn't just a brief one.
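One way to see why an end-of-alert should never name a different VM: if alert state is stored separately for each monitored object, the object that closes an alert is by construction the one that opened it. This is a hypothetical sketch for illustration only. It is not XO's perf-alert code and makes no claim about the actual cause of the bug; the class name, the VM names, and the assumption that an averaged value arrives per object are all invented here.

```python
class PerObjectAlertTracker:
    """Track alert state separately for each monitored object (VM or host)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.alerting = {}  # object name -> True while that object is in alert

    def observe(self, name, average):
        """Return ('alert' | 'end', name) when this object's state changes, else None."""
        was = self.alerting.get(name, False)
        now = average > self.threshold
        self.alerting[name] = now
        if now and not was:
            return ("alert", name)
        if was and not now:
            return ("end", name)
        return None

tracker = PerObjectAlertTracker(threshold=90)
readings = [("xo-vm", 95), ("other-vm", 20), ("xo-vm", 40), ("other-vm", 25)]
events = [e for e in (tracker.observe(n, v) for n, v in readings) if e]
print(events)  # [('alert', 'xo-vm'), ('end', 'xo-vm')]
```

Because the quiet VM never crosses the threshold, it never appears in any message, alert or end-of-alert.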

Hopefully that makes sense.

Thanks again!
