"Host time and XOA time are not consistent with each other" error after a reboot
-
On this server we are testing XCP-ng 8.0.0 before going to production.
We have 2 RAIDs on this server: SSD (sda) and SAS (sdb).
The XCP-ng installer (without asking) created a single LVM partition merging both disks.
We need two separate SRs, so (from XOA) we removed the single LVM SR. When we tried to create the new ones, we got errors ("storage is in use").
We rebooted the server, and the "Host time and XOA time are not consistent with each other" error suddenly appeared in XOA.
Now the server icon is orange and it is not fully manageable.
Please note that all servers attached to XOA have the same timezone (Europe/Rome) and all dates/times are correct (even hwclock is synced with systime).
Any idea what caused the issue and how to recover the server status?
-
I assume you are running XO as a VM. Did you check to see that the date/time in this VM is set correctly?
-
I would like to report this as a bug to be honest.
I have had the same issue on 4 servers running as 2 pools on XCP 8. Never had the problem before. All 4 servers are set to UTC time in the BIOS. After the first time I got the message, I made sure XO and all 4 XCP installs were set to the correct timezone and double-checked that they reported the same time and date. They did, though I'm not sure whether they did before I checked them thoroughly. Yet I still get the same error on occasion.
-
Pinging @pdonias in case he has an idea of what could trigger that. I assume we have a "basic" check comparing the times. I know that if the toolstack is restarted we probably get an empty field or something like that. Maybe we could improve some edge cases?
-
We compare the time reported by XAPI's method host.get_servertime with Node's Date.now() in XOA and check whether they're more than 30 seconds apart. We work with timestamps, so the timezones shouldn't matter.
-
What if host.get_servertime is empty/null? Are we able to differentiate that from a 30sec+ time diff?
-
If it's not a UTC formatted string, then it will indeed behave as if the 2 dates were more than 30 seconds apart. Is it supposed to happen? We can easily fix that.
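To make the distinction concrete, here is a minimal sketch of a check that separates "no usable time" from "real drift". It is not XO's actual code: `parseXapiDate`, `checkHostTime`, the `TimeCheck` type, and the assumed string format (an ISO 8601 UTC string such as `20190830T10:00:10Z`) are all illustrative assumptions. Only the 30-second threshold comes from the description above.

```typescript
// Hypothetical three-state result: 'unknown' means the host gave us nothing usable.
type TimeCheck = 'consistent' | 'inconsistent' | 'unknown'

// 30-second tolerance, as described in this thread.
const MAX_DRIFT_MS = 30 * 1000

// Assumption: the server time arrives as an ISO 8601 UTC string, either
// compact ("20190830T10:00:10Z") or dashed ("2019-08-30T10:00:10Z").
// Anything else (empty string, null, garbage) yields NaN.
function parseXapiDate(raw: unknown): number {
  if (typeof raw !== 'string') return NaN
  const m = /^(\d{4})-?(\d{2})-?(\d{2})T(\d{2}):(\d{2}):(\d{2})Z$/.exec(raw.trim())
  if (m === null) return NaN
  return Date.UTC(
    Number(m[1]),
    Number(m[2]) - 1, // Date.UTC months are zero-based
    Number(m[3]),
    Number(m[4]),
    Number(m[5]),
    Number(m[6])
  )
}

// Compare host time with the local clock, reporting 'unknown' instead of
// 'inconsistent' when the host value cannot be parsed.
function checkHostTime(rawServerTime: unknown, nowMs: number = Date.now()): TimeCheck {
  const hostMs = parseXapiDate(rawServerTime)
  if (Number.isNaN(hostMs)) return 'unknown'
  return Math.abs(hostMs - nowMs) > MAX_DRIFT_MS ? 'inconsistent' : 'consistent'
}
```

With something like this, an empty answer right after a toolstack restart would map to 'unknown', and the UI could skip the warning instead of showing a time mismatch.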
-
I think we might have cases with something else or an empty result (just after restarting the toolstack?)
What could be interesting before searching for those cases: displaying the host time and the XOA time in the tooltip, so we can actually see what we are comparing. That way, people experiencing the bug will be able to report what they are actually seeing.
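As a rough illustration of that idea, the tooltip text could show the raw value received from the host next to the XOA time. This is only a sketch: `describeTimes` and the message format are hypothetical, not an existing XO function.

```typescript
// Hypothetical helper: build a tooltip string showing exactly what is being compared.
// The raw host value is shown untouched (quoted) so that an empty or malformed
// answer is visible in bug reports.
function describeTimes(rawHostTime: string | null | undefined, xoaNowMs: number): string {
  const host =
    rawHostTime == null || rawHostTime === '' ? '(empty)' : JSON.stringify(rawHostTime)
  const xoa = new Date(xoaNowMs).toISOString()
  return `Host time: ${host} | XOA time: ${xoa}`
}
```

Showing the raw value rather than a parsed date is deliberate here: if the host returned nothing or something unexpected, that is precisely what the report should reveal.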
-
Sorry for the late reply.
If it helps, I'll explain exactly what happened to me.
We have had 2 Cisco C210 M2 servers running XCP since release (and Xen prior) as a single pool. This had worked for around 3 years. Never once saw the time error.
A few months ago we set up 2 R720s with XCP 8 as a single pool. Once it was all running, I ran VMs on them just fine for around 2 weeks. Then, for some reason, 1 of them completely disappeared from the pool. It was still there: the iDRAC was still accessible, and so was SSH to the server. When SSHing in and looking at the console, it reported there were no network cards or storage. It seemed completely broken, yet everything was still functioning. The VMs that were running on it had moved over to the second server. I rebooted the machine and it was back to normal.
I can't be precise as I hadn't written it all down, but I remember looking through the logs (via XO) and seeing something about XOA time not matching something time...
I didn't take much notice at that point, and within 24 hours the second server did exactly what the first server did.
After that, I saw the same error messages again regarding time. I then double-checked the BIOS time on the servers (both set to UTC) and both were correct. I then went into xsconsole and saw that neither had the timezone set, so I set them, and also set up NTP on both. For a few weeks, they seemed to start behaving.
I had to run some XCP patches on them recently, and in the process (doing the install, rebooting, restarting the toolstack, etc.) I did notice that for a moment XO reported the time error again. My guess is that in this instance it had to do with the toolstack restarting. Once the updates were complete, everything returned to normal and I haven't seen the time error since (yet).
I hope the above information helps.
-
Yeah, if XAPI isn't answering correctly, the time diff might just be a symptom of something else. I would strongly advise taking a deeper look at your host logs to find the root cause.
We are improving XO's behavior here by checking whether we got a real time or just a glitch.
-
@olivierlambert I did dig through the logs and couldn't find anything.
That said, since the last time it happened (mentioned above), it hasn't happened again. All servers are up to date and I haven't had any problems at all with XCP-ng.
-
I think we fixed the display issue in the latest XOA.