xapi seems to be broken after a power failure
-
After a power failure, our server running XCP-ng has a problem: once it boots up, it can't seem to detect any NIC or any pool anymore.
This error shows up if I try to go into All VMs on the server itself:
"xenserver nonetype object has no attribute xenapi"
I also tried to log in over SSH, and it just keeps spamming this error message: xapi-nbd[2641]: main: Failed to log in via xapi's Unix domain socket in 300.000000 seconds
This is what I found in /var/log/xensource.log:
xensource small.txt
XCP-ng Version 8.2.1 -
@LunarstarPony I had to shorten the log so I could upload it here.
I also think it just keeps repeating itself. Oh, and even though the GUI says no network is configured, I can still connect to the host over SSH. -
It seems that the XAPI DB is corrupted:
Mar 4 14:59:03 xcp-ng-ksu xapi: [debug||0 |starting up database engine D:9e8add083ea4|xapi] Dbconf contains: /var/lib/xcp/state.db (generation 0)
Mar 4 14:59:03 xcp-ng-ksu xapi: [debug||0 |starting up database engine D:9e8add083ea4|xapi] Most recent db is /var/lib/xcp/state.db (generation 0)
Mar 4 14:59:03 xcp-ng-ksu xapi: [debug||0 |starting up database engine D:9e8add083ea4|sql] attempting to restore database from /var/lib/xcp/state.db
Mar 4 14:59:04 xcp-ng-ksu xapi: [error||0 ||backtrace] starting up database engine D:9e8add083ea4 failed with exception Xmlm.Error(2:266234, "expected one of these character sequence: "/", ">", found "v"")
Mar 4 14:59:04 xcp-ng-ksu xapi: [error||0 ||backtrace] Raised Xmlm.Error(2:266234, "expected one of these character sequence: "/", ">", found "v"")
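To see what the parser tripped over, one option is to print the text around the position given in the Xmlm.Error (line 2, column 266234) from a copy of the database. This is only a minimal sketch, not an official procedure: show_context is a hypothetical helper name, the copy path under /root is an assumption, and the column is treated as a byte offset, which is approximate if the line contains multi-byte characters.

```shell
# show_context FILE LINE COL [RADIUS]
# Print the part of line LINE in FILE that lies within RADIUS positions
# of column COL (hypothetical helper, not part of XCP-ng).
show_context() {
    file=$1
    line=$2
    col=$3
    radius=${4:-60}
    start=$(( col > radius ? col - radius : 1 ))
    end=$(( col + radius ))
    # GNU cut counts bytes here, so the window is approximate for UTF-8 data.
    sed -n "${line}p" "$file" | cut -c "${start}-${end}"
}

# Example use on the damaged host (work on a copy, never on the live file):
#   cp /var/lib/xcp/state.db /root/state.db.broken
#   show_context /root/state.db.broken 2 266234
```

If the output shows a tag that stops mid-attribute or runs into unrelated bytes, that would be consistent with a write that was cut off by the power loss.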
That's not a usual problem, but I suppose it happened precisely while a write was going to the disk. In general, it's a combination of bad/non-pro hardware and a power failure (pro hardware generally has a mechanism to prevent incomplete writes).
Can you tell us more about the hardware? Also:
- if you have a metadata backup, you can restore the XAPI DB
- if you don't but you have multiple hosts in the pool, you can promote a slave to master
- if you don't have any backup at all, then you'll have to reinstall
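As a rough sketch of the first two options: xe pool-restore-database takes a metadata dump made earlier with xe pool-dump-database, and dry-run=true checks the file without applying it, while xe pool-emergency-transition-to-master is run on a surviving slave. The function names and the backup path are placeholders, and xe needs a responding xapi, so on a host whose database fails to load these may not work until xapi starts again.

```shell
# Option 1: verify a metadata backup without applying it.
# $1 is the path to a dump created earlier with "xe pool-dump-database".
check_metadata_backup() {
    xe pool-restore-database file-name="$1" dry-run=true
}

# Option 2: run on a surviving slave to make it the new pool master,
# then tell the remaining slaves to follow it.
promote_this_host() {
    xe pool-emergency-transition-to-master
    xe pool-recover-slaves
}

# Example (placeholder path):
#   check_metadata_backup /root/pool-metadata.dump
```

Dropping dry-run=true performs the real restore, so it is worth running the dry run first.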
-
@olivierlambert We do have another server that we connect together using XOA, but I didn't set this up, so I'm not sure whether it's a slave or not.
I also don't think there's a backup anywhere, unless there's an automatic backup.
Hardware: it's an ASUS RS720 with two E5-2620 CPUs -
@LunarstarPony Around 60 GB of RAM
-
What kind of drives do you have inside?
If you don't have any backup, you are clear to reinstall. You still have your VM data, but how easy it is to recover depends on whether you lost a member of a pool or your host was a pool by itself.
So you have no idea if you had a pool with multiple machines?
-
@olivierlambert Not sure. Is there any way I can check that on my other machine?
-
@LunarstarPony That one is still running at the moment.
-
In Xen Orchestra, go to the Home/host view and clear the search field to display all hosts (even those halted). What do you see there?
Alternatively, you can type
xe host-list
from the working host. -
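To answer the pool question from the command line, here is a small sketch: xe host-list --minimal prints the host UUIDs as one comma-separated line, so counting them shows whether the working host sees any other pool member. count_pool_hosts is a made-up helper name.

```shell
# Count the hosts known to the pool this host belongs to.
count_pool_hosts() {
    xe host-list --minimal | tr ',' '\n' | grep -c .
}

# Only query xapi when the xe CLI is present on this machine.
if command -v xe >/dev/null 2>&1; then
    echo "Hosts in this pool: $(count_pool_hosts)"
    xe pool-list params=name-label,master
fi
```

A count of 1 means the host is a pool by itself.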
@olivierlambert It seems to show only itself, and since XOA is hosted on the server that's currently down, I don't have access to it.
-
So to recap: you had 2 different hosts not in the same pool, without any backup (metadata or VM data backup).
I suppose you also used only local storage?
-
@olivierlambert Well, yes, both servers are running on local storage, but I do sometimes transfer VMs between them.
-
@LunarstarPony That's probably all I did between them
-
Hopefully, since there's no backup and no power protection on the drives or the server, it's likely a lab, so no big deal. Just reinstall XCP-ng on the damaged host.
If you really want to salvage the VM data on that host, then you'll need to be careful not to remove the previous storage partition. After the reinstall you'll be able to rescan the SR and get "unknown" disks (no name, but usable to attach to a new VM).
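For a local SR, reattaching it after a reinstall usually goes along the following lines. Treat this as a sketch under assumptions rather than a verified procedure for this host: the SR type (lvm or ext), the device path and the function names are placeholders, and it assumes the installer was told not to use the old storage for a new SR. On local SRs the volume group name carries the SR UUID (VG_XenStorage-<uuid> for lvm, XSLocalEXT-<uuid> for ext), which is what the first helper extracts.

```shell
# Strip the known prefixes from a volume group name to get the SR UUID.
sr_uuid_from_vg() {
    printf '%s\n' "$1" | sed -e 's/^VG_XenStorage-//' -e 's/^XSLocalEXT-//'
}

# reattach_local_sr SR_UUID SR_TYPE DEVICE HOST_UUID
# Re-register an existing local SR with a freshly installed host.
reattach_local_sr() {
    sr_uuid=$1; sr_type=$2; device=$3; host_uuid=$4
    xe sr-introduce uuid="$sr_uuid" type="$sr_type" \
        name-label="Recovered local SR" content-type=user
    pbd_uuid=$(xe pbd-create host-uuid="$host_uuid" sr-uuid="$sr_uuid" \
        device-config:device="$device")
    xe pbd-plug uuid="$pbd_uuid"
    xe sr-scan uuid="$sr_uuid"
    # Recovered disks show up without their old names.
    xe vdi-list sr-uuid="$sr_uuid" params=uuid,name-label,virtual-size
}

# Example (all values are placeholders):
#   vgs --noheadings -o vg_name        # find the old volume group
#   reattach_local_sr "$(sr_uuid_from_vg VG_XenStorage-<uuid>)" lvm /dev/sda3 <host-uuid>
```

The VDIs listed at the end are the ones that can then be attached to newly created VMs.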
If you want more security, you need to think in terms of defense in depth: power protection, better drives, metadata and data backup, replication, etc. There are so many ways to avoid data loss. If you need custom training related to your future production usage, please contact us on https://xcp-ng.com
-
@olivierlambert Well, is there a tutorial of some sort that tells me how to reinstall XOA without losing VM data?
-
Reinstall XCP-ng, not XOA. You can redeploy a fresh XOA on your working host; that's not a problem.
I'm not sure there's a tutorial like this around. The goal is basically to not reformat the partitions/disks where you had your SR. Since I have no idea about your setup, it's a bit hard to give you directions.
-
@olivierlambert Maybe I can take some pics while I'm doing it. So I just do a clean install of XCP-ng?
-
You first have to get a clear idea of your existing setup. You still haven't answered regarding the disk setup: is the SR on a dedicated disk or partition?
In any case, you'll have to be certain not to modify the existing SR. If your install was already an upgrade, you might salvage the XAPI.
There are so many ways to do it, and so many different setups, that there's no universal answer.
-
@olivierlambert I'm pretty sure these servers were upgraded from an older XenServer.
And since only one drive shows up in the BIOS, it should be a partition. -
@LunarstarPony If I remember correctly, I think it was updated from XenServer 5.6 to XCP-ng 7.0 and then to XCP-ng 8.2.