Hi @KPS,
Thank you for reporting this behavior. We haven't been able to reproduce the bug yet, but we'll look into it with @MathieuRA. We're a bit busy at the moment, so we probably won't be able to fix this issue before the November release.
@jshiells this value is the average load across all cores of a host. To be more precise, it is a weighted average of that load over the last 30 minutes. Migrations are triggered when this average exceeds 85% of the critical threshold defined in the plugin configuration, which works out to roughly 64% if the critical threshold is set to 75%.
Other circumstances can trigger migrations:
Hi @McHenry
If you still have the problem, you can increase the healthCheckTimeout value as Olivier recommended (e.g. healthCheckTimeout = '30m'). However, this value should go in the [backups.defaultSettings] section of the configuration file (or in the [backups.vm.defaultSettings] section) rather than in the [backups] section.
We've expanded the documentation a bit to make this clearer: https://github.com/vatesfr/xen-orchestra/blob/a945888e63dd37227262eddc8a14850295fa303f/packages/xo-server/config.toml#L78-L79
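For reference, here is a minimal sketch of what that part of xo-server's configuration file could look like, reusing the 30-minute value from Olivier's example (adjust the value to your needs):

[backups.defaultSettings]
# raise the backup health check timeout (example value from above)
healthCheckTimeout = '30m'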
As Mathieu answered in this topic, the bug has been reproduced on our side but isn't trivial to fix. We'll discuss with the team to schedule this task, and we'll keep you informed when it's fixed.
You are right: the documentation isn't up to date, and this isn't configurable at the moment.
We are currently working on the load balancer, so this may come in future versions.
It appears that the value of healthCheckTimeout from the config file is indeed not taken into account.
We've created a card on our side to fix this issue, and we'll plan it with the team soon.
If you can't wait for the fix to be released, you can modify the default value of healthCheckTimeout in the code, in the files @xen-orchestra/backups/_runners/VmsRemote.mjs and @xen-orchestra/backups/_runners/VmsXapi.mjs, then restart xo-server. This should work around the issue until the next update.
Hi @nicols,
As Dan said, we are indeed investigating this issue, and we will try to provide a fix in the coming weeks. We will keep you informed.
Regards
There is indeed a bug in the perf-alert plugin with removable storages.
This will be fixed in an upcoming XO version by removing these SRs from the perf-alert monitoring.
Some of the errors you encountered are intended. We don't allow values in the "Virtual Machines" field if "Exclude VMs" is disabled and "All running VMs" is enabled, because it would make the plugin configuration confusing.
However you're right, there seems to be an issue when the VMs are selected and then removed. The value becomes an empty list instead of being undefined, which causes the validation to fail when we try to turn off the "Exclude VMs" option.
I'm going to create a task on our side so that we can plan to resolve this problem.
In the meantime you can work around the problem by deleting the monitor and recreating a new one with the same parameters.
Hi @Forza,
This is not possible at the moment. The XO log retention is set to 20,000 entries.
Regarding audit logs, we have a task planned in the coming months to try to add a retention configuration. It's a bit complicated, as deleting old audit logs interferes with the audit log chain verification, so I don't think we can expect this feature to be released for at least a few months.
Nice to hear it
In the meantime I've made a bugfix which will soon be available so you can edit the monitors instead of recreating them.
Hi @AlexQuorum,
I think you are getting this error because you are using the smart mode ("All running VMs" / "All running hosts" / "All SRs") while also specifying elements in the "Virtual Machines" field. This field has two purposes: it either lets you select the VMs you want to monitor when the "All running VMs" option is off, or it lets you select the VMs you do not want to be monitored when both the "All running VMs" and "Exclude VMs" options are on (the same applies to the "SRs" and "Hosts" fields in the other monitors). Outside of those two cases, we don't allow values in this field, to avoid confusion about the plugin's behavior.
There is currently a bug in the plugin configuration that doesn't let users empty the "VMs" field without getting an error (it will be patched soon), so I recommend creating new monitors with "All running VMs" on and the "Virtual Machines" field empty, and deleting the monitors you previously created.
Hi @KPS,
The difference between these two settings is that sessionCookieValidity determines how long a user stays logged in when they did not check the "Remember me" option, while permanentCookieValidity determines it when that option was checked.
If you want to force users to be disconnected after 12 hours regardless of how they logged in, I think you need to set both sessionCookieValidity = '12 hours' and permanentCookieValidity = '12 hours'.
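For illustration, here is a minimal sketch of what this could look like in xo-server's config file. Note that the [authentication] section name is my assumption about where these keys live in recent config.toml layouts, so please double-check it against your own file:

[authentication]
# section name assumed; check where these keys are defined in your config.toml
# disconnect users after 12 hours, whether or not they checked "Remember me"
sessionCookieValidity = '12 hours'
permanentCookieValidity = '12 hours'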
However, the memory increase you're experiencing is intriguing; it is not an intended behaviour.
We have just merged to master a fix for this spam issue. Can you test these changes and confirm that the problem has been solved for you?
Hi @kagbasi-ngc,
We have just merged changes to the perf-alert plugin on the master branch, which should resolve this spam problem that appeared some time ago. This fix will be available in the next XO version (5.105).
Please let us know if you encounter frequent alerts after upgrading to this version.