@KPS The DRBD volume of the LINSTOR database was not created by the driver. We just restarted a few services and the hosts to fix that. Unfortunately, we have no explanation for what happened, so I don't have much more useful information to give. However, if someone finds themselves in this situation again, I can assist them to see if we can obtain more useful logs.

Posts made by ronan-a
-
RE: Lost access to all servers
@fred974 Yes, if you can.
Send me the code via the chat.
-
RE: Lost access to all servers
@fred974 I was a bit busy; I can take a look at your problems tomorrow.
In the worst case, do you have a way to open an SSH connection to your servers?
-
RE: Lost access to all servers
@fred974 Thanks, I'll take a look at the logs. What's the output of `lvs`? If the database is not active, execute `vgchange -ay linstor_group`.
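To make that `lvs` check a bit more systematic, here is a minimal Python sketch that flags inactive logical volumes in `linstor_group`. It assumes `lvs --noheadings -o lv_name,vg_name,lv_attr` is run in dom0. It relies on the fifth character of `lv_attr` being the activation state (`a` when active). The helper names are hypothetical and not part of any XCP-ng tool.

```python
import subprocess
from typing import List


def inactive_lvs(lvs_output: str, vg_name: str = "linstor_group") -> List[str]:
    """Return the names of LVs in vg_name that are not active.

    Expects the output of: lvs --noheadings -o lv_name,vg_name,lv_attr
    (hypothetical helper, for illustration only).
    """
    inactive = []
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        lv_name, vg, attr = fields
        # The 5th lv_attr character is the state: 'a' means active.
        if vg == vg_name and (len(attr) < 5 or attr[4] != "a"):
            inactive.append(lv_name)
    return inactive


def check_local_host(vg_name: str = "linstor_group") -> List[str]:
    """Run lvs on this host and return the inactive LVs of vg_name."""
    # stdout=PIPE instead of capture_output keeps this usable on Python 3.6.
    result = subprocess.run(
        ["lvs", "--noheadings", "-o", "lv_name,vg_name,lv_attr"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return inactive_lvs(result.stdout, vg_name)
```

A non-empty result is the case where activating the volume group with `vgchange -ay linstor_group` makes sense.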
-
RE: Lost access to all servers
@fred974 Hi. First, how many hosts do you have?
We recommend using at least 3 hosts (4 is more robust). Also, what's the replication count of your LINSTOR SR?
I ask these questions because it's possible that a problem on one host caused reboots across the whole pool and, eventually, the emergency state. Now, can you share the kern.log file of each host? And please run this command on each machine:
`drbdsetup status xcp-persistent-database`
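When comparing the `drbdsetup status xcp-persistent-database` output across hosts, the interesting fields are the local `disk:` state and each `peer-disk:` state. The sketch below collects every one that is not `UpToDate`. It assumes the default human-readable DRBD 9 layout: the resource line first, then indented `key:value` tokens, with one line per peer. The helper name is hypothetical.

```python
from typing import List, Tuple


def non_uptodate_disks(status_output: str) -> List[Tuple[str, str, str]]:
    """Return (name, key, state) for each disk/peer-disk that is not UpToDate.

    Assumes the plain-text layout of `drbdsetup status <resource>` (DRBD 9).
    """
    problems = []
    current = "?"
    for line in status_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # Lines like "host2 role:Secondary" start with a bare name:
        # the resource itself on the first line, then each peer.
        if ":" not in tokens[0]:
            current = tokens[0]
        for token in tokens:
            key, sep, value = token.partition(":")
            if sep and key in ("disk", "peer-disk") and value != "UpToDate":
                problems.append((current, key, value))
    return problems
```

An empty list on every host means all replicas report `UpToDate`. Anything else points at the host or peer to look at in kern.log.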
-
RE: RunX: tech preview
@jchua We currently don't have a test version planned for XCP-ng 8.3. Only 8.2 is supported for the moment.
-
RE: Xen Orchestra Load Balancer - turning on hosts
@berish-lohith Just FYI, I created a card in our backlog; I don't see many blocking points to implementing it correctly.
-
RE: XCP-ng 8.3 public alpha 🚀
@Anonabhar Could you upload the other logs (xensource.log, daemon.log, etc.)? There is no valid reason to have a call to `cleanup.py` (not an SR scan) every 30 s if there is nothing to coalesce.
-
RE: RunX: tech preview
@jmccoy555 said in RunX: tech preview:
> this uuid is of the SR created by the step in the first post?

Right!
Regarding all the `xe vm-param-set` / `xe vm-disk-remove` / `xe template-param-set` commands, you must use the VM UUID returned by the `xe vm-install` command. I edited the post accordingly.
-
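Regarding the UUID reply above: `xe vm-install` prints the UUID of the new VM on standard output, so a script can capture it once and pass it to the follow-up commands. Below is a minimal Python sketch of that flow. The template name, VM name, and `vm-param-set` parameters are placeholders to replace with the ones from the RunX instructions. Only `vm-param-set` is shown. The `run` function is injectable so the flow can be tried without a real host.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_xe(argv: Sequence[str]) -> str:
    """Run an xe command and return its stripped stdout."""
    result = subprocess.run(
        list(argv), stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return result.stdout.strip()


def install_vm(template: str, name: str, params: List[str],
               run: Runner = run_xe) -> str:
    """Create a VM from a template, then apply params to that VM.

    `params` are hypothetical "key=value" strings for xe vm-param-set.
    """
    # xe vm-install prints the new VM's UUID on stdout.
    vm_uuid = run(["xe", "vm-install", "template=" + template,
                   "new-name-label=" + name])
    for param in params:
        # Target the VM UUID returned above, not the SR UUID.
        run(["xe", "vm-param-set", "uuid=" + vm_uuid, param])
    return vm_uuid
```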
RE: RunX: tech preview
@etomm Why this `yum remove` command? You just deleted what allows you to manage VMs. You can try to reinstall the packages using `yum install`.
-
RE: Updates announcements and testing
@JeffBerntsen I think I will release a new `linstor` RPM to override the `sm` testing package. The current one is `sm-2.30.7-1.2.0.linstor.1.xcpng8.2.x86_64.rpm`. For the moment, you can downgrade if you want.
-
RE: Updates announcements and testing
@JeffBerntsen What's your `sm` version? I suppose you updated your hosts and no longer have the right one.
Please send me the output of `rpm -qa | grep sm-`.
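To read that output quickly, a small Python sketch can pick out the `sm` package itself and tell whether it is a LINSTOR build. This matters because `grep sm-` also matches other package names containing `sm-`. The sketch assumes LINSTOR builds carry `.linstor.` in their release field, as in `sm-2.30.7-1.2.0.linstor.1.xcpng8.2.x86_64`. The helper names are hypothetical.

```python
from typing import Optional


def find_sm_package(rpm_qa_output: str) -> Optional[str]:
    """Return the sm package NVRA from `rpm -qa | grep sm-` output, if present."""
    for name in rpm_qa_output.split():
        # grep also matches other packages whose names contain "sm-";
        # the sm package itself is "sm-" immediately followed by its version.
        if name.startswith("sm-") and name[3:4].isdigit():
            return name
    return None


def is_linstor_build(nvra: str) -> bool:
    """Assumption: LINSTOR builds carry ".linstor." in the release field."""
    return ".linstor." in nvra
```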
-
RE: Create VM Error SR_BACKEND_FAILURE_1200, No such Tapdisk.
@geoffbland No problem.
It's the first time I've seen this error with tapdisk (and it's even more surprising to see it on this type of SR...).
> It had all been working fine and I had not changed anything on the share - it just stopped working

In this case, maybe there was a problem with XAPI, a lock on the device, or something else. It's not easy to find the cause without remote access. Don't hesitate to ping us if this problem comes back.
-
RE: Create VM Error SR_BACKEND_FAILURE_1200, No such Tapdisk.
Also:
```
Jun 8 22:39:00 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'open', '-p', '11505', '-m', '5', '-a', 'aio:/var/run/sr-mount/ec87c10e-1499-c1c5-cf3f-c234062bb459/ubuntu-22.04-live-server-amd64.iso', '-R']
Jun 8 22:39:00 XCPNG02 SM: [11473] = 13
Jun 8 22:39:00 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'close', '-p', '11505', '-m', '5', '-t', '30']
Jun 8 22:39:00 XCPNG02 SM: [11473] = 0
Jun 8 22:39:00 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'detach', '-p', '11505', '-m', '5']
Jun 8 22:39:01 XCPNG02 SM: [11473] = 0
Jun 8 22:39:01 XCPNG02 SM: [11473] ['/usr/sbin/tap-ctl', 'free', '-m', '5']
Jun 8 22:39:01 XCPNG02 SM: [11473] = 0
```
The tapdisk `open` call fails with this error: `Permission denied` (errno 13).
Are you sure the data on your SR can be accessed correctly? The last exception is raised in `blktap2.py`:
```python
try:
    tapdisk = cls.__from_blktap(blktap)
    node = '/sys/dev/block/%d:%d' % (tapdisk.major(), tapdisk.minor)
    util.set_scheduler_sysfs_node(node, 'noop')
    return tapdisk
except:
    TapCtl.close(pid, minor)
    raise
```
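Since errno 13 is `EACCES`, one quick check is to try reading the ISO from dom0 using the same path tapdisk used. Below is a minimal Python sketch. It assumes it runs on the host and takes the path from the `tap-ctl open` line in the log. `probe_read` is a hypothetical helper. A successful read here does not rule out other causes, since tapdisk may open the file differently.

```python
import errno
import os


def probe_read(path: str) -> str:
    """Try to read one byte from path and describe the outcome."""
    try:
        with open(path, "rb") as f:
            f.read(1)
    except OSError as e:
        name = errno.errorcode.get(e.errno, "unknown")
        return "%s (errno %d): %s" % (name, e.errno, os.strerror(e.errno))
    return "ok"


if __name__ == "__main__":
    # Path taken from the tap-ctl open call in the log above.
    iso = ("/var/run/sr-mount/ec87c10e-1499-c1c5-cf3f-c234062bb459/"
           "ubuntu-22.04-live-server-amd64.iso")
    print(probe_read(iso))
```

An `EACCES` result here would point at the share's permissions (for example, how the remote side maps root) rather than at tapdisk itself.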
-
RE: Create VM Error SR_BACKEND_FAILURE_1200, No such Tapdisk.
@geoffbland You can downgrade your `sm` version on each host using `yum downgrade sm-2.30.6-1.1.xcpng8.2.x86_64`.
But I'm not sure your problem is related to the LINSTOR version of `sm`.
-
RE: RunX: tech preview
@bc-23 You don't have the patched RPMs because there is a new hotfix for 8.2 and 8.2.1 on the main branch, so the current xenopsd package version is greater than the runx one. We must therefore build a new version of the runx packages on our side to correct this issue. We will fix that.
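This illustrates why the hotfix wins: yum installs the package with the highest version-release, compared segment by segment. The sketch below is a simplified take on RPM's `rpmvercmp` rules:

- numeric segments are compared as integers;
- a numeric segment beats an alphabetic one;
- when one string runs out of segments, the longer one wins.

It ignores epochs and the `~`/`^` operators. The xenopsd version strings in the example are hypothetical, not the real package versions.

```python
import re
from itertools import zip_longest
from typing import List, Tuple


def _segments(s: str) -> List[str]:
    # Split into runs of digits or letters; separators like '.' are dropped.
    return re.findall(r"\d+|[A-Za-z]+", s)


def vercmp(a: str, b: str) -> int:
    """Simplified rpmvercmp: return -1, 0 or 1 (no epoch, no ~ or ^ handling)."""
    for x, y in zip_longest(_segments(a), _segments(b)):
        if x is None:
            return -1  # b has segments left over, so b is newer
        if y is None:
            return 1
        if x.isdigit() and y.isdigit():
            if int(x) != int(y):
                return -1 if int(x) < int(y) else 1
        elif x.isdigit() != y.isdigit():
            return 1 if x.isdigit() else -1  # numeric beats alphabetic
        elif x != y:
            return -1 if x < y else 1
    return 0


def compare_vr(a: Tuple[str, str], b: Tuple[str, str]) -> int:
    """Compare (version, release) pairs: version first, then release."""
    return vercmp(a[0], b[0]) or vercmp(a[1], b[1])


# Hypothetical versions: a runx rebuild vs. a later upstream hotfix.
runx_build = ("0.150.12", "1.1.0.runx.1.xcpng8.2")
hotfix = ("0.150.13", "1.xcpng8.2")
print(compare_vr(hotfix, runx_build))  # prints 1: the hotfix is newer
```

With the same version, the runx release string would sort higher. A version bump in the hotfix overrides that, which is why the runx packages need to be rebuilt on top of it.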
-
RE: RunX: tech preview
@bc-23 What's your xenopsd version? We haven't yet updated the runx-modified xenopsd package to support runx on XCP-ng 8.2.1, so it's possible that you are using the latest packages without the right patches. ^^"
So please confirm this using `rpm -qa | grep xenops`.
-
RE: RunX: tech preview
@theaeon said in RunX: tech preview:
> Oh now that's interesting. Turns out the containers (both archlinux and the one i just created) are exiting w/ error 143. They're getting sigterm'ed from somewhere.

It's related to how we terminate the VM process: it's a wrapper, not the real process that manages the VM. But we shouldn't show this exit code to users, since it's not the real one. I will create an issue on our side. Thanks for the feedback!
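For reference, 143 follows the usual shell convention for a process killed by a signal: exit status = 128 + signal number. Since 143 - 128 = 15, the signal is SIGTERM. In RunX's case, as explained above, that status comes from the wrapper, so it does not describe the container's own exit. Below is a minimal Python sketch decoding such statuses; `describe_exit_status` is a hypothetical helper.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Decode a shell-style exit status (values above 128 mean 128 + signal)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            return "exit status %d (no known signal %d)" % (status, signum)
        return "terminated by %s (signal %d)" % (name, signum)
    return "exit status %d" % status


print(describe_exit_status(143))  # terminated by SIGTERM (signal 15)
```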