XCP-ng

    XOA Create VM and Delete VM Struggling..... Tasks Getting Stuck.....?

    Xen Orchestra
      MichaelCropper

      Hmm.....

      yum install -y nc
      
      nc -zv {IP of Windows Machine where SRs live} 445
      Ncat: Version 7.50 ( https://nmap.org/ncat )
      Ncat: Connected to {IP of Windows Machine where SRs live}:445.
      Ncat: 0 bytes sent, 0 bytes received in 0.01 seconds.
      
      

      Does the "0 bytes sent" mean the traffic isn't getting out of XCP-ng, so it never actually reaches the Windows machine where the SMB SR lives?

      What could be causing that?
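      For reference: `-z` puts Ncat into zero-I/O mode. In this mode it completes the TCP handshake and then closes the connection without sending anything, so "0 bytes sent" is expected. The "Connected to ...:445" line is what shows the XCP-ng host reached the Windows machine on the SMB port. Below is a minimal Python sketch of the same check. `tcp_port_open` is a hypothetical helper name. Like `nc -z`, it only tests that the port accepts TCP connections, not that SMB itself is healthy.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port succeeds.

    Like `nc -z`, no payload is sent: the connection is opened and
    closed straight away, so "0 bytes sent" is the normal outcome.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

      For example, `tcp_port_open("{IP of Windows Machine where SRs live}", 445)` returning `True` corresponds to Ncat's "Connected" line.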

        MichaelCropper

        Nothing weird is being blocked by the XCP-ng firewall for outbound connections:

        iptables -L
        Chain INPUT (policy ACCEPT)
        target     prot opt source               destination
        xapi_nbd_input_chain  tcp  --  anywhere             anywhere             tcp dpt:nbd
        ACCEPT     gre  --  anywhere             anywhere
        ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:rtps-dd-mt
        RH-Firewall-1-INPUT  all  --  anywhere             anywhere
        ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:rtps-dd-mt
        
        Chain FORWARD (policy ACCEPT)
        target     prot opt source               destination
        RH-Firewall-1-INPUT  all  --  anywhere             anywhere
        
        Chain OUTPUT (policy ACCEPT)
        target     prot opt source               destination
        xapi_nbd_output_chain  tcp  --  anywhere             anywhere             tcp spt:nbd
        
        Chain RH-Firewall-1-INPUT (2 references)
        target     prot opt source               destination
        ACCEPT     all  --  anywhere             anywhere
        ACCEPT     icmp --  anywhere             anywhere             icmp any
        ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootps
        ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
        ACCEPT     udp  --  anywhere             anywhere             ctstate NEW udp dpt:ha-cluster
        ACCEPT     tcp  --  anywhere             anywhere             ctstate NEW tcp dpt:ssh
        ACCEPT     tcp  --  anywhere             anywhere             ctstate NEW tcp dpt:http
        ACCEPT     tcp  --  anywhere             anywhere             ctstate NEW tcp dpt:https
        ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:21064
        ACCEPT     udp  --  anywhere             anywhere             multiport dports hpoms-dps-lstn,netsupport
        REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited
        
        Chain xapi_nbd_input_chain (1 references)
        target     prot opt source               destination
        REJECT     all  --  anywhere             anywhere             reject-with icmp-port-unreachable
        
        Chain xapi_nbd_output_chain (1 references)
        target     prot opt source               destination
        REJECT     all  --  anywhere             anywhere             reject-with icmp-port-unreachable
        

        All the above is just out of the box default, no additional configurations.
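        The listing above can also be checked programmatically. The sketch below parses non-verbose `iptables -L` output into each chain's default policy and its list of rule targets. It assumes exactly the layout shown above: a `Chain NAME (policy X)` or `Chain NAME (N references)` header, a `target ...` column line, then one rule per line. `parse_iptables_list` is a hypothetical name. Applied to this output, it shows that OUTPUT has policy ACCEPT and a single rule jumping to `xapi_nbd_output_chain`. That rule only matches TCP with source port `nbd`, so outbound SMB on TCP 445 is not filtered here.

```python
import re
from typing import Dict, List, Optional

# Chain header: built-in chains show a policy, user chains a reference count.
CHAIN_RE = re.compile(r"^Chain (\S+) \((?:policy (\w+)|\d+ references?)\)")


def parse_iptables_list(text: str) -> Dict[str, dict]:
    """Map chain name -> {"policy": str or None, "rules": [target, ...]}."""
    chains: Dict[str, dict] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        match = CHAIN_RE.match(line)
        if match:
            current = match.group(1)
            policy: Optional[str] = match.group(2)  # None for user-defined chains
            rules: List[str] = []
            chains[current] = {"policy": policy, "rules": rules}
        elif current and line and not line.startswith("target"):
            # The first column of a rule line is its target (ACCEPT, REJECT, a chain...).
            chains[current]["rules"].append(line.split()[0])
    return chains
```

        On the host, the text could come from `subprocess.run(["iptables", "-L"], capture_output=True, text=True).stdout`, run as root.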

          MichaelCropper

          I've spotted various similar issues on Reddit over the last 12-18 months about Windows updates breaking SMB mounts. It's impossible to track down the specifics, so I'm not even going to bother.

          Then I found this issue too: https://xcp-ng.org/forum/topic/10545/long-delays-at-46-when-creating-or-starting-a-new-vm

          It was solved by moving from an SMB mount to an NFS mount.

          That isn't going to work with my infrastructure setup, as the Windows machine runs Home edition and NFS is only supported on the Business or Enterprise editions.

          As it stands, with nothing I've tried getting the SMB mount working again, it feels like I'm at a dead end. It looks like I'll need to rethink the architecture for storing the ISOs and for one of the backup routes (thankfully there are many types of backups in place for redundancy).

          Not the end of the world, but a tad annoying. It feels like one of the almost daily automated Windows updates/shutdowns/restarts probably broke this. It only started happening yesterday; all previous backups via the SR were working fine until then. That's the only thing I can really put this down to.

            MichaelCropper

            Bit more info.....

            From reading up on this, the issue appears to be related to SMB v1, which is commonly referred to as "insecure" (more like "less secure", if we're being accurate.....)

            I've just run the following in PowerShell (as an Admin) on the Windows machine where the SMB shares live:

            Get-WindowsOptionalFeature -Online -FeatureName SMB1Protocol
            

            This shows that SMB1 is currently enabled, which throws a spanner in the works: I was expecting it to be disabled, which would have backed up the theory in my previous comment.

            FeatureName      : SMB1Protocol
            DisplayName      : SMB 1.0/CIFS File Sharing Support
            Description      : Support for the SMB 1.0/CIFS file sharing protocol and the Computer Browser protocol.
            RestartRequired  : Possible
            State            : Enabled
            CustomProperties :
                               ServerComponent\Description : Support for the SMB 1.0/CIFS file sharing protocol and the Computer
                               Browser protocol.
                               ServerComponent\DisplayName : SMB 1.0/CIFS File Sharing Support
                               ServerComponent\Id : 487
                               ServerComponent\Type : Feature
                               ServerComponent\UniqueName : FS-SMB1
                               ServerComponent\Deploys\Update\Name : SMB1Protocol
            
              MichaelCropper

              SMB v2 and v3 are also turned on:

              Get-SmbServerConfiguration | Select EnableSMB2Protocol
              
              

              Which outputs

              EnableSMB2Protocol
              ------------------
                            True
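              `EnableSMB2Protocol` only shows what the Windows side is configured to offer. To see what the server actually negotiates over the network, a client can send a bare SMB2 NEGOTIATE request and read the `DialectRevision` field from the reply. The sketch below does that. It assumes the message layout from Microsoft's MS-SMB2 specification: direct TCP on port 445 with a 4-byte length prefix, a 64-byte SMB2 header, and a 36-byte NEGOTIATE request followed by the dialect list. The helper names are hypothetical. It only offers dialects 2.0.2 to 3.0.2, so it says nothing about SMB1 or SMB 3.1.1.

```python
import socket
import struct

# SMB 2/3 dialect revision codes (MS-SMB2); 3.1.1 is left out on purpose.
DIALECTS = {0x0202: "2.0.2", 0x0210: "2.1", 0x0300: "3.0", 0x0302: "3.0.2"}


def build_negotiate(dialects=tuple(DIALECTS)) -> bytes:
    """Build an SMB2 NEGOTIATE request (header + body, no length prefix)."""
    header = struct.pack(
        "<4sHHIHHIIQIIQ16s",
        b"\xfeSMB",     # ProtocolId
        64,             # StructureSize of the SMB2 header
        0,              # CreditCharge
        0,              # Status
        0,              # Command: NEGOTIATE
        1,              # CreditRequest
        0,              # Flags
        0,              # NextCommand
        0,              # MessageId
        0,              # Reserved (ProcessId)
        0,              # TreeId
        0,              # SessionId
        b"\x00" * 16,   # Signature
    )
    body = struct.pack(
        "<HHHHI16sQ",
        36,             # StructureSize of the NEGOTIATE request
        len(dialects),  # DialectCount
        1,              # SecurityMode: signing enabled
        0,              # Reserved
        0,              # Capabilities
        b"\x00" * 16,   # ClientGuid
        0,              # ClientStartTime (no 3.1.1 negotiate contexts)
    )
    return header + body + struct.pack(f"<{len(dialects)}H", *dialects)


def parse_negotiate_response(data: bytes) -> str:
    """Return the dialect the server picked, e.g. "3.0.2"."""
    if len(data) < 70 or data[:4] != b"\xfeSMB":
        raise ValueError("not an SMB2 response")
    status = struct.unpack_from("<I", data, 8)[0]
    if status != 0:
        raise ValueError(f"NEGOTIATE failed with NTSTATUS 0x{status:08X}")
    # Response body starts at 64: StructureSize, SecurityMode, DialectRevision.
    dialect = struct.unpack_from("<H", data, 68)[0]
    return DIALECTS.get(dialect, f"0x{dialect:04X}")


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before full reply")
        buf += chunk
    return buf


def smb2_dialect(host: str, port: int = 445, timeout: float = 5.0) -> str:
    msg = build_negotiate()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Direct TCP framing: one zero byte plus a 3-byte big-endian length.
        sock.sendall(struct.pack(">I", len(msg)) + msg)
        length = struct.unpack(">I", _recv_exact(sock, 4))[0] & 0xFFFFFF
        return parse_negotiate_response(_recv_exact(sock, length))
```

              Run from the XCP-ng host, or any machine with Python 3 that can reach the share, as `smb2_dialect("{IP of Windows Machine where SRs live}")`. A result such as "3.0.2" means the server is answering SMB 2/3 negotiation whatever the SMB1 feature state. A connection error or `ValueError` points at the network path or the SMB service instead.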
              
                Greg_E @MichaelCropper

                @MichaelCropper

                I would restart the XCP-ng toolstack or do a rolling pool reboot.

                Since your SR is hosted on Windows, I might uninstall the latest updates and see if things start working, then make a plan to move storage to Truenas.

                  MichaelCropper

                  Solved.....

                  Turns out that everything above was basically a red herring.....

                  After a lot of playing around with the carrot and stick, I noticed a lot of tasks in XOA just getting stuck at 0%. These are the kinds of tasks that usually appear and disappear in no time, to the point that you don't really notice they are happening. Something just felt off.

                  I restarted the toolstack multiple times by running the following command after SSH'ing into the XCP-ng server:

                  xe-toolstack-restart
                  

                  It cleared the task list in XOA, but things started to back up again with no obvious insight into what was really going on. I did notice the progress % bar wasn't showing up though, which was odd....

                  Anyhow, after all this "fun" I decided to give the server a physical kick, aka a manual hard power-off and power-on.

                  Lo and behold, the age-old IT solution of turning it off and on again has magically solved everything 🤣 #FFS 🤦🤣

                  Hopefully the above self-documenting debugging process helps future readers (myself included, more than likely)

                  Anyhow, just a thought @olivierlambert @greg_e: I clearly have no idea what fell over under the hood here, and you probably have much more insight into this than I do. So the questions are.....

                  1. What went on here? (gut feel)

                  2. What insights could be surfaced to the user of XOA/XO to help debug these things?

                  3. What friendly error messages could be added to help with #2 etc.

                  Kind of rhetorical questions really, so just adding some thoughts for future product development.

                  Personally, my gut feel is that some low-level Linux service/package/driver/etc. fell over and didn't auto-recover. XOA/XO had no way of detecting what had happened, so it was a bit blind and went a bit wacky.

                  If it helps to improve the platform, I'm happy to chat via private DM or in this thread about any commands I can run on the server to help pinpoint what failed (if the logs persisted after the hard reboot). The output of those commands will mean more to you than to me.
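                  On question 2, one kind of insight a UI could surface is a flag on tasks whose progress hasn't moved for a while, like the ones sitting at 0% here. The sketch below is hypothetical. `TaskSample`, `find_stalled_tasks` and the 120-second `stall_after` default are illustrative names and values, not part of XO or XAPI. It assumes periodic (timestamp, progress) samples are already being collected per task, for example by polling the host's task list. It also assumes progress never goes backwards, so equal progress at both ends of the window means no movement in between.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskSample:
    """One observation of a task: progress in [0.0, 1.0] at time t (seconds)."""
    t: float
    progress: float


def find_stalled_tasks(
    history: Dict[str, List[TaskSample]], stall_after: float = 120.0
) -> List[str]:
    """Return sorted IDs of unfinished tasks with no progress for `stall_after` s.

    `history` maps a task ID to its samples, oldest first.
    """
    stalled = []
    for task_id, samples in history.items():
        if not samples:
            continue
        latest = samples[-1]
        if latest.progress >= 1.0:
            continue  # finished tasks are never "stuck"
        # Samples taken at least `stall_after` seconds before the latest one.
        older = [s for s in samples if latest.t - s.t >= stall_after]
        # Assuming monotonic progress, an unchanged value over the window
        # means the task made no progress at all during it.
        if older and older[-1].progress == latest.progress:
            stalled.append(task_id)
    return sorted(stalled)
```

                  A task that has only been observed for less than `stall_after` seconds is never flagged, which avoids false alarms on tasks that have just started.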

                    MichaelCropper @Greg_E

                    @Greg_E Looks like you were writing a reply as I was...

                    But yeah, Truenas is on the roadmap at some point in the future, once I've had the time to dig into the details and understand where this fits in the overall system architecture.

                    Out of interest though, where would you see Truenas fitting into the setup outlined below?

                    My initial thoughts are that Truenas in this context would be either:

                    1. A glorified USB Stick storing ISOs
                    2. A glorified (RAID-redundant) external HDD running as the backup platform
                    3. Or both of the above

                    Have I understood that context correctly?

                    Current setup is essentially.....

                    • Bare Metal
                      • HBA Raid Controllers for Storage
                        • Primary Raid Array
                        • Backup Raid Array
                      • Dom0
                        • XCP-ng Server
                      • DomU
                        • XOA VM
                          • Backups: Primary Raid Array -> Backup Raid Array
                          • Backups: Primary Raid Array -> Remote PC
                          • Backups: Primary Raid Array -> Dom0
                    • Remote PC
                      • ISOs via SMB Share

                    In the above context, Truenas would simply be an additional backup target, either:

                    1. A Truenas VM on the same bare metal
                    2. A physically separate bare-metal machine running Truenas (aka a glorified RAID'ed external USB disk).

                    Is that where you would see things fitting into the overall system architecture?

                    Curious to get your thoughts on that.....

                      Greg_E @MichaelCropper

                      @MichaelCropper

                      My Truenas runs on bare metal. I have an SMB SR for ISOs, an NFS SR for VMs, and another SMB SR for VMs.

                      In production I have a second, older Truenas with SMB and NFS that I use for storage updates. I migrate from faster to slower storage (which is still generally enough for my needs), update Truenas, then migrate back to the faster storage. I also have a third Truenas that holds user data. It's big, so I set up a remote SMB backup to spread out my disaster footprint.

                      My lab is just a single bare-metal Truenas with whatever kind of share I need, generally just SMB and NFS shares for ISOs and VMs.

                      Both environments run three XCP-ng hosts, so I can do things like rolling pool updates (RPU) with no VM downtime. RPU is genius: it's all automated. You just click a button and watch the VMs move and the hosts reboot.

                        MichaelCropper @Greg_E

                        @Greg_E Interesting, I've not had a play with RPUs yet, sounds pretty handy though.

                        I can see how that works when Truenas is the host OS on its own bare metal. I've got XCP-ng on my bare metal, though, so I'd need some more machines to give that a go in due course.
