Hi @KPS,
Thank you for reporting this behavior. We haven't been able to reproduce the bug yet, but we'll look into it with @MathieuRA. We're a bit busy at the moment, so we probably won't be able to fix this issue before the November release.
@jshiells this value is the average load across all cores on a host. To be more precise, it is a weighted average of this value over the last 30 minutes. Migrations are triggered if this average exceeds 85% of the critical threshold defined in the plugin configuration, which comes out to roughly 64% (85% × 75% = 63.75%) if you set the critical threshold to 75%.
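To illustrate, the trigger condition can be sketched like this (a minimal sketch; the function and variable names are hypothetical, not the plugin's actual code):

```javascript
// Hypothetical sketch of the migration trigger described above.
// `criticalThreshold` comes from the plugin configuration (e.g. 75 for 75%).
function shouldTriggerMigration (weightedAvgCpuLoad, criticalThreshold) {
  // Migrations start when the 30-minute weighted average load
  // exceeds 85% of the configured critical threshold.
  const effectiveThreshold = 0.85 * criticalThreshold
  return weightedAvgCpuLoad > effectiveThreshold
}

// With a critical threshold of 75%, the effective trigger is 63.75% (~64%)
console.log(shouldTriggerMigration(64, 75)) // true
console.log(shouldTriggerMigration(60, 75)) // false
```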
Other circumstances can trigger migrations:
Hi @McHenry
If you still have the problem, you can increase the healthCheckTimeout value as Olivier recommended (e.g. `healthCheckTimeout = '30m'`). However, this value should go in the `[backups.defaultSettings]` section of the configuration file (or in the `[backups.vm.defaultSettings]` section) rather than in the `[backups]` section.
We've expanded the documentation a bit to make this clearer: https://github.com/vatesfr/xen-orchestra/blob/a945888e63dd37227262eddc8a14850295fa303f/packages/xo-server/config.toml#L78-L79
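Concretely, the relevant part of the xo-server configuration file would look like this (the 30-minute value is just the example from above; adjust it to your needs):

```toml
[backups.defaultSettings]
# Maximum duration to wait for the health check VM to boot
healthCheckTimeout = '30m'
```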
As Mathieu answered in this topic, the bug has been reproduced on our side but isn't trivial to fix. We'll discuss with the team to schedule this task, and we'll keep you informed when it's fixed.
You are right: the documentation isn't up to date; this isn't configurable at the moment.
We are currently working on the load balancer, so this may come in future versions.
It appears that the value of healthCheckTimeout from the config file is indeed not taken into account.
We've created a card on our side to fix this issue, and we'll soon plan it with the team.
If you can't wait for the fix to be released, you can modify the default value of healthCheckTimeout in the code, in the files @xen-orchestra/backups/_runners/VmsRemote.mjs and @xen-orchestra/backups/_runners/VmsXapi.mjs, then restart xo-server; this should fix it until the next update.
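If you do patch those files, keep in mind that timeouts in the code are typically expressed in milliseconds. A small helper like the following (purely illustrative, not XO's actual code) shows how a human-readable duration such as '30m' maps to a millisecond value:

```javascript
// Illustrative only: convert a human-readable duration string such as
// '30m' to milliseconds, the unit a patched default would likely need.
function durationToMs (duration) {
  const match = /^(\d+)(ms|s|m|h)$/.exec(duration)
  if (match === null) {
    throw new Error(`unsupported duration: ${duration}`)
  }
  const [, amount, unit] = match
  const factors = { ms: 1, s: 1e3, m: 60e3, h: 3600e3 }
  return Number(amount) * factors[unit]
}

console.log(durationToMs('30m')) // 1800000
```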
Hi @nicols,
As Dan said, we are indeed investigating this issue, and we will try to provide a fix in the coming weeks. We will keep you informed.
Regards
There is indeed a bug in the perf-alert plugin with removable storages.
This will be fixed in an upcoming XO version by removing these SRs from the perf-alert monitoring.
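Conceptually, the fix will look something like the following (a hypothetical sketch, not the actual patch; the SR object shape and the exclusion predicate are assumptions):

```javascript
// Hypothetical sketch: exclude removable SRs (e.g. udev-backed ones such
// as CD drives and USB devices) from the SRs monitored by perf-alert.
const REMOVABLE_SR_TYPES = new Set(['udev'])

function getMonitoredSrs (srs) {
  return srs.filter(sr => !REMOVABLE_SR_TYPES.has(sr.SR_type))
}

const srs = [
  { name_label: 'Local storage', SR_type: 'lvm' },
  { name_label: 'DVD drives', SR_type: 'udev' },
]
console.log(getMonitoredSrs(srs).map(sr => sr.name_label)) // [ 'Local storage' ]
```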
@olivierlambert I'm not familiar with the perf-alert plugin, but it seems that @KPS is right; see: https://github.com/vatesfr/xen-orchestra/blob/7882debe7adc7d4945da7e27d9825b1ffcf78ca9/packages/xo-server-perf-alert/src/index.js#L31
@manilx Ok, then I'll have to investigate this to see if the density plan is still working properly for us, or if there are other things that may be preventing it from working for you.