Hi @KPS,
Thank you for reporting this behavior. We haven't been able to reproduce the bug yet, but we'll look into it with @MathieuRA. We're a bit busy at the moment, so we probably won't be able to fix this issue before the November release.
@jshiells this value is the average load across all cores on a host. To be more precise, it is a weighted average of that load over the last 30 minutes. Migrations are triggered when this average exceeds 85% of the critical threshold defined in the plugin configuration; for example, if you set the critical threshold to 75%, migrations are triggered at roughly 64%.
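To make the arithmetic explicit:

```
migration threshold = 0.85 × critical threshold
                    = 0.85 × 75%
                    ≈ 64%
```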
Other circumstances can trigger migrations:
Hi @McHenry,
If you still have the problem, you can increase the healthCheckTimeout value as Olivier recommended (e.g. healthCheckTimeout = '30m'). However, this value should go in the [backups.defaultSettings] section of the configuration file (or in the [backups.vm.defaultSettings] section) rather than in the [backups] section.
We've expanded the documentation a bit to make this clearer: https://github.com/vatesfr/xen-orchestra/blob/a945888e63dd37227262eddc8a14850295fa303f/packages/xo-server/config.toml#L78-L79
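For reference, a minimal sketch of what this could look like in your xo-server configuration override (the '30m' value is just the example above; adjust it to your needs):

```toml
# in your xo-server config override file
[backups.defaultSettings]
healthCheckTimeout = '30m'

# or, to set it only for VM backups, per the documentation linked above:
# [backups.vm.defaultSettings]
# healthCheckTimeout = '30m'
```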
As Mathieu answered in this topic, the bug has been reproduced on our side but isn't trivial to fix. We'll discuss it with the team to schedule this task, and we'll keep you informed when it's fixed.
You are right, the documentation isn't up to date: this isn't configurable at the moment.
We are currently working on the load balancer, so this may come in future versions.
It appears that the value of healthCheckTimeout from the config file is indeed not taken into account. We've created a card on our side to fix this issue, and we'll plan it with the team soon.
If you can't wait for the fix to be released, you can modify the default value of healthCheckTimeout in the code, in the files @xen-orchestra/backups/_runners/VmsRemote.mjs and @xen-orchestra/backups/_runners/VmsXapi.mjs, then restart XO server. This should fix it until the next update.
Hi @nicols,
As Dan said, we are indeed investigating this issue, and we will try to provide a fix in the coming weeks. We will keep you informed.
Regards
There is indeed a bug in the perf-alert plugin with removable storages.
This will be fixed in an upcoming XO version by removing these SRs from the perf-alert monitoring.
Some of the errors you encountered are intended. We don't allow values in the "Virtual Machines" field if "Exclude VMs" is disabled and "All running VMs" is enabled, because it would make the plugin configuration confusing.
However, you're right: there seems to be an issue when the VMs are selected and then removed. The value becomes an empty list instead of being undefined, which causes the validation to fail when we try to turn off the "Exclude VMs" option.
I'm going to create a task on our side so that we can plan to resolve this problem.
In the meantime you can work around the problem by deleting the monitor and recreating a new one with the same parameters.
Hi @KPS,
The difference between these two settings is that sessionCookieValidity determines how long a user stays logged in if they did not check the "Remember me" option, while permanentCookieValidity determines this when the option was checked.
If you want to force users to be disconnected after 12 hours regardless of how they connected, I think you need to set both sessionCookieValidity = '12 hours' and permanentCookieValidity = '12 hours'.
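For example, assuming both keys live in the [authentication] section of your xo-server config override (as in xo-server's sample config.toml; please double-check against your version), a sketch:

```toml
[authentication]
sessionCookieValidity = '12 hours'   # used when "Remember me" is unchecked
permanentCookieValidity = '12 hours' # used when "Remember me" is checked
```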
However, the memory increase you're experiencing is intriguing; it is not intended behaviour.
We have just merged a fix for this spam issue to master. Could you test these changes and confirm that the problem has been solved for you?
Hi @kagbasi-ngc,
We have just merged changes to the perf-alert plugin on the master branch, which should resolve this spam problem that appeared some time ago. This fix will be available in the next XO version (5.105).
Please let us know if you still encounter frequent alerts after upgrading to this version.