RunX: tech preview
-
@ronan-a Confirmed all three points. I did set GUEST_TOOLS in runx.conf, and I do wonder if that could be the problem here.
For what it's worth, --log-level debug shouldn't ever make it over to runx, so those earlier pastes should effectively be
podman start archlinux
edit: a quick test says that GUEST_TOOLS isn't it.
edit: more logs for ya:
[04:01 lenovo150 ~]# podman logs archlinux
connect to container console with 'xl console abb22f4d68083424252ac7116427dfd9c3644291e858039750f135cae499f8b4'
mount: mount(2) failed: Not a directory
mount: mount(2) failed: Not a directory
mkdir: cannot create directory '/var/lib/containers/storage/overlay/742626ef59426855d765f2cee7b24cac06ecacc60c5ae37668d1f95ff649cd22/merged/etc/hosts': File exists
mount: mount(2) failed: Not a directory
rm: cannot remove '/var/lib/containers/storage/overlay/742626ef59426855d765f2cee7b24cac06ecacc60c5ae37668d1f95ff649cd22/merged//etc/hosts': Device or resource busy
cp: '/var/run/containers/storage/overlay-containers/abb22f4d68083424252ac7116427dfd9c3644291e858039750f135cae499f8b4/userdata/hosts' and '/var/lib/containers/storage/overlay/742626ef59426855d765f2cee7b24cac06ecacc60c5ae37668d1f95ff649cd22/merged//etc/hosts' are the same file
-
@theaeon Hi!
I've just set up a runx host following @olivierlambert's instructions.
I've reproduced your error when --log-level debug
is put in the podman command. Do you still have an error without it?
podman start archlinux
should create a VM named container:<something>
that starts, stops, and is then removed, just like a container would.
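If you want to watch that lifecycle from dom0, a simple two-shell setup works (xl is the same tool as in the console hint above; the one-second interval is arbitrary):
# dom0 shell 1: refresh the domain list every second
watch -n 1 xl list
# dom0 shell 2: start the container; a "container:<...>" domain should appear, then vanish once its process exits
podman start archlinux
-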
@theaeon Like I said, we must change how arguments are parsed in the runx script, so avoid additional params like
--log-level
. -
For what it's worth, the podman logs archlinux command from above is w/o debug. I didn't quite realize it vanishing immediately was intended behavior though; tells you how versed I am in containers.
I'll try setting up the matrixdotorg/mjolnir thing again now that I have a command I know is working on runc.
-
Oh, now that's interesting. Turns out the containers (both archlinux and the one I just created) are exiting with error 143. They're getting SIGTERM'd from somewhere.
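(143 is the usual 128 + signal-number convention, and SIGTERM is signal 15, so 128 + 15 = 143. Easy to double-check with podman's standard inspect output:)
podman inspect --format '{{.State.ExitCode}}' archlinux   # prints 143 after such an exit
kill -l 15                                                # prints TERM: signal 15 is SIGTERM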
-
@theaeon said in RunX: tech preview:
For what it's worth, the podman logs archlinux command from above is w/o debug. I didn't quite realize it vanishing immediately was intended behavior though; tells you how versed I am in containers.
I'll try setting up the matrixdotorg/mjolnir thing again now that I have a command I know is working on runc.
Yeah, by default the archlinux image executes the bash command, so when the container is started, bash is launched and dies just after that. Finally the VM is stopped. This behavior is the same on docker with this image. Using interactive mode it's not the case, but we must implement that for a next runx version.
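A minimal sketch of the same behavior with plain podman, plus the usual way to keep such a container alive until then (sleep is only a stand-in for a real workload):
podman run --rm archlinux            # default command is bash; with no TTY it exits at once
podman run --rm archlinux sleep 60   # an explicit long-running command keeps the container up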
-
@theaeon said in RunX: tech preview:
Oh, now that's interesting. Turns out the containers (both archlinux and the one I just created) are exiting with error 143. They're getting SIGTERM'd from somewhere.
It's related to how we terminate the VM process: it's a wrapper, not the real process that manages the VM. But we shouldn't show this exit code to users, it's not the real code. I will create an issue on our side, thanks for the feedback.
-
@ronan-a Oop, good to know. Now I guess I need to figure out why the new image I created is exiting instead of, well, working.
Unless there's something in this command that I shouldn't be invoking.
podman create --health-cmd="wget --no-verbose --tries=1 --spider http://127.0.0.1:8080/ || exit 1" --volume=/root/mjolnir:/data:Z matrixdotorg/mjolnir
(I start it separately, later)
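For reference, a minimal way to start it and poke that health check afterwards; the $CID variable is my own shorthand, assuming the ID printed by podman create was saved somewhere:
podman start "$CID"
podman healthcheck run "$CID"   # runs the configured wget probe; exit status 0 means healthy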
-
Hi,
I have started to play around with this feature. I think it's a great idea!
At the moment I'm running into an issue on container start:
message: xenopsd internal error: Could not find BlockDevice, File, or Nbd implementation: {"implementations":[["XenDisk",{"backend_type":"9pfs","extra":{},"params":"vdi:80a85063-9b59-4fda-82c9-017be0fe967a share_dir none ///srv/runx-sr/1"}]]}
I have created the SR as described above:
uuid ( RO)                : 968d0b84-213e-a269-3a7a-355cd54f1a1c
          name-label ( RW): runx-sr
    name-description ( RW):
                host ( RO): fraxcp04
  allowed-operations (SRO): VDI.introduce; unplug; plug; PBD.create; update; PBD.destroy; VDI.resize; VDI.clone; scan; VDI.snapshot; VDI.create; VDI.destroy; VDI.set_on_boot
  current-operations (SRO):
                VDIs (SRO): 80a85063-9b59-4fda-82c9-017be0fe967a
                PBDs (SRO): 0d4ca926-5906-b137-a192-8b55c5b2acb6
  virtual-allocation ( RO): 0
physical-utilisation ( RO): -1
       physical-size ( RO): -1
                type ( RO): fsp
        content-type ( RO):
              shared ( RW): false
       introduced-by ( RO): <not in database>
         is-tools-sr ( RO): false
        other-config (MRW):
           sm-config (MRO):
               blobs ( RO):
 local-cache-enabled ( RO): false
                tags (SRW):
           clustered ( RO): false

# xe pbd-param-list uuid=0d4ca926-5906-b137-a192-8b55c5b2acb6
uuid ( RO)                  : 0d4ca926-5906-b137-a192-8b55c5b2acb6
     host ( RO) [DEPRECATED]: a6ec002d-b7c3-47d1-a9f2-18614565dd6c
           host-uuid ( RO): a6ec002d-b7c3-47d1-a9f2-18614565dd6c
     host-name-label ( RO): fraxcp04
             sr-uuid ( RO): 968d0b84-213e-a269-3a7a-355cd54f1a1c
       sr-name-label ( RO): runx-sr
       device-config (MRO): file-uri: /srv/runx-sr
  currently-attached ( RO): true
        other-config (MRW): storage_driver_domain: OpaqueRef:a194af9f-fd9e-4cb1-a99f-3ee8ad54b624
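(For context, the SR creation step boils down to something like the following; this is reconstructed from the device-config shown above, so the exact flags may differ from the original instructions.)
xe sr-create name-label=runx-sr type=fsp device-config:file-uri=/srv/runx-sr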
I see also that in
/srv/runx-sr
a symlink 1
is created, pointing to an overlay image. The VM is in state paused after the error above.
The template I used was an old Debian PV template, where I removed the PV-bootloader and install-* attributes from other-config. What template would you recommend using?
Any idea what could cause the error above?
Thanks,
Florian -
@bc-23 What's your xenopsd version? We haven't updated the modified runx build of xenopsd to support runx on XCP-ng 8.2.1 yet. It's possible that you are using the latest packages without the right patches. ^^"
So please confirm this issue using
rpm -qa | grep xenops
. -
@ronan-a The server is still running on 8.2
[11:21 fraxcp04 ~]# rpm -qa | grep xenops
xenopsd-0.150.5.1-1.1.xcpng8.2.x86_64
xenopsd-xc-0.150.5.1-1.1.xcpng8.2.x86_64
xenopsd-cli-0.150.5.1-1.1.xcpng8.2.x86_64
Are there patches for this version?
-
@bc-23 You don't have the patched RPMs because there is a new hotfix for the 8.2 and 8.2.1 versions on the main branch, so the stock xenopsd package version is now greater than the runx one... We must build a new version of the runx packages on our side to correct this issue. We will fix that.
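A quick way to see the mismatch is to compare the installed build against what the repos offer (the runx suffix in the version string is what marks the patched builds):
rpm -q xenopsd                     # currently installed build
yum --showduplicates list xenopsd  # every version available in the enabled repos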
-
@ronan-a I have seen there are updated packages, thanks!
After the update I'm able to start the container/VM. -
For those who would like to try this using
xe
, here is what I did to create the correct template. I started from a Debian 10 template; replace the UUIDs with your own (this example uses 2 vCPUs):
xe vm-install template=Debian\ Buster\ 10 new-name-label=tempforrunx sr-uuid=7c5212f3-97b2-cdeb-b735-ad26638926e3 --minimal
xe vm-param-set uuid=fc5c67c2-ee5a-4b90-8e0f-eb6ff9fdd29a HVM-boot-policy=""
xe vm-param-set uuid=fc5c67c2-ee5a-4b90-8e0f-eb6ff9fdd29a PV-args=""
xe vm-param-set VCPUs-max=2 uuid=fc5c67c2-ee5a-4b90-8e0f-eb6ff9fdd29a
xe vm-param-set VCPUs-at-startup=2 uuid=fc5c67c2-ee5a-4b90-8e0f-eb6ff9fdd29a
xe vm-disk-remove device=0 uuid=fc5c67c2-ee5a-4b90-8e0f-eb6ff9fdd29a
xe template-param-set is-a-template=true uuid=fc5c67c2-ee5a-4b90-8e0f-eb6ff9fdd29a
The template is listed when you issue
xe template-list
.
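To check just the new template, a filtered listing works too (name-label and params are standard xe options):
xe template-list name-label=tempforrunx params=uuid,name-label
-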
Hi,
Same as @bc-23, I get the error:
message: xenopsd internal error: Could not find File, BlockDevice, or Nbd implementation: {"implementations":[["XenDisk",{"backend_type":"9pfs","extra":{},"params":"vdi:85daf561-836e-48f1-9b74-1dfef38abe9e share_dir none ///root/runx-sr/1"}]]}
This is my
rpm -qa | grep xenops
output (my XCP-ng is up to date):
xenopsd-0.150.9-1.1.0.runx.1.xcpng8.2.x86_64
xenopsd-xc-0.150.9-1.1.0.runx.1.xcpng8.2.x86_64
xenopsd-cli-0.150.9-1.1.0.runx.1.xcpng8.2.x86_64
Is it still the runx package that is causing problems? Thank you all.
-
@r3m8 Weird, did you run a xe-toolstack-restart?
-
@ronan-a We have reviewed our SR and template configuration (especially the
xe vm-disk-remove device=0
step) and it works fine now (we had already done an xe-toolstack-restart to avoid restarting the hypervisor). -
Hello all! After testing this and following the guidelines, my XCP-ng is no longer able to run VMs.
When I restart the host and try to run a VM, it complains that HVM is needed. I just checked the BIOS, and VT-d is enabled, as are all the other settings that were there before testing this out.
What can I do?
-
@etomm Could you share the full error message/trace please?
-
@ronan-a I could make it start again by doing a yum update.
Then I think I made a mistake, because I ran the following line:
yum remove --enablerepo=epel -y qemu-dp xenopsd xenopsd-cli xenopsd-xc xcp-ng-xapi-storage runx
This killed my xapi.service; it's not starting anymore. If you tell me how to find the log, I can give you the trace.
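For reference, the usual places to look on an XCP-ng host (assuming the stock systemd setup; /var/log/xensource.log is xapi's traditional log file):
systemctl status xapi.service        # shows why the unit failed to start
journalctl -u xapi.service -b        # journal entries for xapi since boot
tail -n 100 /var/log/xensource.log   # xapi's own log file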