Broken Host Consoles in fresh xcp-ng 8 installation. VM consoles work.



  • 😢 Version 8 Dom0 console is blank in both XOA and XCP-ng center. Version 7.6 host Dom0 still renders fine.

    Both the latest XOA and XCP-ng center 8.0.1 will not render the console of the hypervisor machines, but they do render the consoles of virtual machines.

    Screenshot (978269db-42e9-4388-961a-e2a538d05654-image.png): XOA xo-server 5.52.1 NOT rendering the Dom0 console for XCP-ng 8 built 2019-11-04.

    Screenshot (2019-12-23-BlankDom0.png): XCP-ng Center 8.0.1 gives an all-white screen for the host console.

    The same XCP-ng Center 8.0.1 correctly displays the console of an XCP-ng 7.6 host built 2019-06-25. XOA with xo-server 5.44.0 works as well, but again only with 7.6 hosts.

    Would graphics cards installed in the servers cause this? I have done nothing to configure the cards in these machines. These are old desktop graphics cards with physical video out ports. XCP-ng Center recognizes them as the "Pass-through whole GPU" type.

    p.s. Firefox "Enhanced Tracking Protection" and/or Adblock Plus interferes with picture upload.



  • I've seen that before: when you initially set the management IP via DHCP, the console breaks when the IP changes. XCP-ng does not seem to register that the IP has changed, so it does not update the console URI.

    We were able to resolve the issue by setting the IP statically on the XCP-ng node (I believe a reboot was also required). That updated the console's URI, and it was working again in XCP-ng Center and XO.



  • Two fresh installations of xcp-ng 8, neither one has a visible Dom0 console except the physical console on the hardware and ssh.

    Have tried three different client machines. One fresh OS install, but no luck in seeing the Dom0 console. So most likely, the problem is server side. Any ideas at all? Anyone?

    Will have to try disabling the PCIe slots with the graphics adapter cards.


  • XCP-ng Team

    Is the XCP-ng host behind a NAT?



  • I'm having exact same issue. Fresh install of XCP-ng 8.

    I can ssh into the server and run xsconsole and it works fine, but can't use XCP-ng Center 8.0.1 to see the console.

    The consoles for the various VMs I have running work fine. No NAT involved, everything on same subnet.



  • @olivierlambert
    No NAT. Same subnet.
    xcp-ng 7 host consoles work fine in same xcp center instance.
    Fresh install, reboot, and yum update and rebooted.
    I will disable physical GPUs via disabling their PCI slots and report back.



  • @woodguy908 Do you happen to have any dedicated graphics cards in your host? I threw old desktop cards with physical outputs in for a test. But this was not a problem earlier. The yum update is the likely source of the problem.



  • @rjt No - I don't have a GPU installed on that server.


  • XCP-ng Team

    Hmm strange… I can't reproduce the issue. Anyone else with the problem?

    Can you share your server specs so we can try to find a pattern?



  • @olivierlambert Are you using a Linux client? Wasn't there a recent patch to fix redrawing the screen on Linux clients?


  • XCP-ng Team

    I'm not sure I understand the connection with this issue. Regardless of the OS/client, I don't have the issue on Xen Orchestra.



  • [01:05 xcp-ng-G ~]# xe console-list
    uuid ( RO)             : fb69931e-ae2b-c2d3-dbce-3e9a9ae57646
              vm-uuid ( RO): 2149b4df-f113-42dd-8ac5-4eff82ac1b0b
        vm-name-label ( RO): Control domain on host: xcp-ng-G
             protocol ( RO): VT100
             location ( RO): https://192.168.10.192/console?ref=OpaqueRef:8159b2a9-8e7b-450c-ac1c-2d6ac7d64e84
    
    uuid ( RO)             : 12aae74a-4d4f-7ed3-976d-b36e0ae1905d
              vm-uuid ( RO): 2149b4df-f113-42dd-8ac5-4eff82ac1b0b
        vm-name-label ( RO): Control domain on host: xcp-ng-G
             protocol ( RO): RFB
             location ( RO): https://192.168.10.192/console?ref=OpaqueRef:71434657-a37c-4ba2-bdb1-64d8f45a1a3c
    
    [01:07 xcp-ng-G ~]# ip a show | egrep inet | egrep -v '(inet 127)'
        inet 192.168.10.193/16 brd 192.168.255.255 scope global xenbr0
    

    192.168.10.193 != 192.168.10.192. The control domain consoles are still using a very old IP address, different from the statically leased DHCP address of Dom0, even after an emergency network reset. I cannot delete this VM, and the location parameter is read-only, so I cannot change it either. I am going to try another emergency network reset and configure the leased address as a static IP address. If that does not work, is there a way to blow away these bad consoles, or at least their IP addresses?

    Because i knew i would wipe the machine and start over fresh anyway, i tried to save time by using DHCP.
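
    The comparison above (console location IP versus the address actually on xenbr0) can be checked mechanically. A minimal sketch, assuming the `xe console-list` output format shown in this thread; `console_hosts` and `check_console_hosts` are hypothetical helper names, not XCP-ng tools:

```shell
#!/bin/sh
# Sketch only: helpers for spotting stale console URLs.
# Both functions read `xe console-list` output from stdin.

# Print the host part of every "location" URL, one per line.
console_hosts() {
    sed -n 's|^ *location ( RO): https\{0,1\}://\([^/]*\)/.*$|\1|p'
}

# Report each console whose URL host differs from the expected IP.
# Returns 0 if every console matches, 1 if any stale entry is found.
check_console_hosts() {
    expected=$1
    status=0
    for host in $(console_hosts); do
        if [ "$host" != "$expected" ]; then
            echo "stale console host: $host (expected $expected)"
            status=1
        fi
    done
    return $status
}
```

    On a host this would be used as `xe console-list | check_console_hosts 192.168.10.193`, with the expected address taken from `ip a` as above.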



  • Example from another machine on which the IP address in the control domain console location does not match the actual IP address of Dom0 (192.168.2.141 != 192.168.10.192):

    [01:23 eceoxen-B ~]# xe console-list vm-name-label=Control\ domain\ on\ host:\ eceoxen-B
    uuid ( RO)             : d5039d1a-64ad-c8a9-a309-51e568ba2926
              vm-uuid ( RO): 1593da28-8e85-4252-878e-778eb414c549
        vm-name-label ( RO): Control domain on host: eceoxen-B
             protocol ( RO): VT100
             location ( RO): https://192.168.2.141/console?ref=OpaqueRef:7be93ca1-76cf-4649-8276-74891eac0a06
    
    uuid ( RO)             : 7bcf725e-ae83-cf6f-7997-7dd63469929f
              vm-uuid ( RO): 1593da28-8e85-4252-878e-778eb414c549
        vm-name-label ( RO): Control domain on host: eceoxen-B
             protocol ( RO): RFB
             location ( RO): https://192.168.2.141/console?ref=OpaqueRef:1361b935-2d25-4929-9b27-4b0483cbb0f7
    
    [01:25 eceoxen-B ~]# ip a show  | egrep inet | egrep -v '(inet 127)'
        inet 192.168.10.192/16 brd 192.168.255.255 scope global dynamic xenbr0
    


  • Attempts to reset or clear the Dom0 console entries ....

    [02:37 xcp-ng-G ~]# xe console-param-clear param-name=location uuid=fb69931e-ae2b-c2d3-dbce-3e9a9ae57646
    Error: Can only clear RW parameters
    
    [02:38 xcp-ng-G ~]# xe console-param-clear uuid=12aae74a-4d4f-7ed3-976d-b36e0ae1905d param-name=location
    Error: Can only clear RW parameters
    
    [02:38 xcp-ng-G ~]# xe console-param-remove uuid=12aae74a-4d4f-7ed3-976d-b36e0ae1905d param-key=location param-name=location
    Error: Can only remove from parameters of type Set or Map
    
    [02:38 xcp-ng-G ~]# xe console-param-set uuid=12aae74a-4d4f-7ed3-976d-b36e0ae1905d
    

    Is there some way to blow these old Dom0 consoles away?


  • XCP-ng Team

    Does a reboot fix anything? Try to see if you have network parameters set somewhere. Maybe XCP-ng Center saved some config?



  • I have rebooted the hosts numerous times. xcp center configuration looks good.
    Tried reapplying network config, but nothing.
    Tried the xsconsole emergency network reset.
    Tried the firstboot service and /etc/firstboot.d/, but no luck.
    Wonder if it is picking up configuration information from another harddrive used to boot xcp.
    Resigned to wipe and reinstall.


  • XCP-ng Team

    I don't know how this thing might be stuck in your config. At worst, you can edit the XAPI DB manually and change the value.

    1. Stop XAPI service on all hosts of the pool
    2. Copy /var/xapi/state.db somewhere else, just in case
    3. Edit it, and find the bad IP, replace it with the right one
    4. Save
    5. Start XAPI on all hosts, master first
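
    The steps above could be scripted roughly as follows. This is a sketch under assumptions, not a supported procedure: `replace_ip_in_db` is a hypothetical helper, the database path in the usage comment is the one reported later in this thread, and stopping and starting XAPI is left to the operator.

```shell
#!/bin/sh
# Sketch only: replace a stale IP inside a XAPI state.db-style file.
# replace_ip_in_db is a hypothetical helper, not part of XCP-ng.

replace_ip_in_db() {
    db=$1; old_ip=$2; new_ip=$3
    [ -f "$db" ] || { echo "no such file: $db" >&2; return 1; }

    # Step 2: keep an untouched copy next to the original.
    cp -p "$db" "$db.bak" || return 1

    # Escape the dots so they match literally, not "any character".
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')

    # Step 3: rewrite from the backup. The trailing ([^0-9]|$) keeps
    # 192.168.2.14 from also matching inside 192.168.2.141.
    sed -E "s/${old_re}([^0-9]|\$)/${new_ip}\\1/g" "$db.bak" > "$db"
}

# Intended use (assumed: run as root, XAPI stopped on every pool host):
#   systemctl stop xapi
#   replace_ip_in_db /var/lib/xcp/state.db 192.168.2.141 192.168.10.192
#   systemctl start xapi      # master first
```

    Writing from the backup into the original file, rather than editing in place, means the untouched copy is always available if XAPI refuses to start afterwards.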


  • @olivierlambert
    The /var/lib/xcp/state.db is basically a single line of XML with about half a million characters.
    The usual egrep, sed, or vi searching did not narrow things down like it normally does, so I used xmllint to pretty-print the db with line breaks and indentation, then vi to edit. Of course, as soon as I start xapi, the formatting is lost and the file is back on one line. I assume there must be a massive speedup in lookups when it is all on a single line.

    [23:59 xen-B xcp]# pushd /var/lib/xcp/
    [23:59 xen-B xcp]# systemctl stop xapi
    [23:59 xen-B xcp]# cp state.db ./state-YYYYMMDD-HHMM.db
    [23:59 xen-B xcp]# egrep  '(192.168.2.141)' ./state.db | wc
          1   11757  454034
    [23:59 xen-B xcp]# echo " :( 1 line with over 11,000 words :(" 
    [23:59 xen-B xcp]# xmllint --format state.db > state.xmllint--pretty.db
    [23:59 xen-B xcp]# egrep  '(192.168.2.141)' ./state.xmllint--pretty.db | wc
         10     193   13526
    [23:59 xen-B xcp]# echo "i can deal with 193 words :) and edited with vi" 
    [23:59 xen-B xcp]# mv state.xmllint--pretty.db ./state.db
    [23:59 xen-B xcp]# systemctl start xapi
    

    xe console-list now shows the proper IP addresses:

    [23:59 xen-B xcp]# xe console-list  vm-name-label=Control\ domain\ on\ host:\ xen-B
    
    uuid ( RO)             : d5039d1a-64ad-c8a9-a309-51e568ba2926
              vm-uuid ( RO): 1593da28-8e85-4252-878e-778eb414c549
        vm-name-label ( RO): Control domain on host: xen-B
             protocol ( RO): VT100
             location ( RO): https://192.168.10.192/console?ref=OpaqueRef:7be93ca1-76cf-4649-8276-74891eac0a06
    
    
    uuid ( RO)             : 7bcf725e-ae83-cf6f-7997-7dd63469929f
              vm-uuid ( RO): 1593da28-8e85-4252-878e-778eb414c549
        vm-name-label ( RO): Control domain on host: xen-B
             protocol ( RO): RFB
             location ( RO): https://192.168.10.192/console?ref=OpaqueRef:1361b935-2d25-4929-9b27-4b0483cbb0f7
    
    
    [00:00 xen-B xcp]# ip a show dev xenbr0 | egrep inet
        inet 192.168.10.192/16 brd 192.168.255.255 scope global dynamic xenbr0
    

    But there is still a blank host console in both XOA and XCP-ng Center. I suppose those OpaqueRefs have to be fixed up as well. EFI started crashing on this R720, so I have many other severe issues to deal with.
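
    For a database that sits on one enormous line, an alternative to pretty-printing is to have grep print only a short window of text around each match. A minimal sketch, assuming a grep that supports `-o` and `-E` (GNU grep does); `show_ip_context` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: show a little context around each occurrence of an IP
# in a file whose content is one very long line.

show_ip_context() {
    file=$1; ip=$2; width=${3:-40}

    # Escape the dots so they match literally, not "any character".
    ip_re=$(printf '%s' "$ip" | sed 's/\./\\./g')

    # -o prints only the matched text: up to $width characters on each
    # side of the IP, one match per output line.
    grep -oE ".{0,${width}}${ip_re}.{0,${width}}" "$file"
}

# Example (path as reported in this thread):
#   show_ip_context /var/lib/xcp/state.db 192.168.2.141 60
```

    This only reads the file, so it can be run against a copy of state.db without stopping XAPI.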



  • Anybody have an idea of what to do to get our consoles back?


  • XCP-ng Center Team

    Do a recursive grep in /etc to find all files that have the old IP address

    Like grep -R 'IP-Address' /etc



  • @borzel I had tried that, but found grep did not work the way I am used to. egrep worked better, but still not how I am used to.

     egrep -R '(10\.40\.|192\.168\.)'  /etc
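
    One possible reason grep behaves unexpectedly here is that an unescaped `.` in a pattern matches any character. A minimal sketch that sidesteps this with a fixed-string search; `find_ip_refs` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: list every line under a directory that contains an IP.

find_ip_refs() {
    dir=$1; ip=$2
    # -R  recurse into the directory
    # -n  print line numbers
    # -F  treat the IP as a fixed string, so dots are literal
    grep -RnF -- "$ip" "$dir"
}

# Example:
#   find_ip_refs /etc 192.168.2.141
```

    Note that a fixed-string search for 192.168.2.14 would still also hit 192.168.2.141, so the full address should be given.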
    

