XCP-ng

    Booting to Dracut (I trusted ChatGPT)

    12 Posts 4 Posters 155 Views 4 Watching
    • nuentes @dthenot

      @dthenot Thanks for getting back to me.

      Yes, it seems we still have time to prepare for the robot uprising.

      I did boot from the initrd fallback before, and ChatGPT walked me through hosing that one as well.

      I ran the command from that doc with --verbose.


      I ran the exact command a 2nd time as:

      dracut -f --verbose /boot/initrd-4.19-xen.img 4.19-xen
      

      No change: it still boots to the dracut shell, and the keyboard doesn't work there. I've tried multiple kernels.
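For reference: dracut's positional kernel-version argument has to match a module directory under /lib/modules inside the chroot, and when you run from install or rescue media, `uname -r` reports the rescue kernel rather than the installed one. Whether `4.19-xen` matches such a directory on this install isn't shown in the thread. A minimal sketch, assuming you are already chrooted into the installed dom0 root; the function names are hypothetical, and the default output path is an assumption (use whichever initrd file your GRUB entry actually loads, and check `ls -l /boot` in case it is a symlink):

```shell
#!/bin/sh
# Hypothetical helpers; run them INSIDE the chroot of the installed dom0 root.

# Print every kernel version that has a module tree (one per line).
list_installed_kvers() {
    moddir="${1:-/lib/modules}"
    for d in "$moddir"/*/; do
        [ -d "$d" ] || continue   # the glob matched nothing
        basename "$d"
    done
}

# Rebuild the initrd for one explicit kernel version.
# $2 (optional) is the output file; the default path is an ASSUMPTION.
# Set DRY_RUN=1 to print the command instead of running it.
rebuild_initrd() {
    kver="$1"
    out="${2:-/boot/initrd-${kver}.img}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "dracut -f --verbose $out $kver"
    else
        dracut -f --verbose "$out" "$kver"
    fi
}
```

Run `list_installed_kvers` first, then `DRY_RUN=1 rebuild_initrd <version>` to check the command before running it for real. The dry-run switch is there because `dracut -f` overwrites the existing image.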

      • nuentes @nuentes

        Does anybody have any ideas?

        Am I mounting the correct partition from the shell? p1 and p2 look very similar, but not identical.

        Why does the dracut output list so many modules that "will not be installed because x could not be found!"?

        I also have metadata backups. Would these be helpful?

        • bvitnik @nuentes

          @nuentes said in Booting to Dracut (I trusted ChatGPT):

          Other things that I haven't mentioned yet that may or may not be relevant:

          • the enclosure is connected with a USB-C cable to a USB-C input
          • the enclosure hosts 4 disks
          • Other disks/enclosures were not experiencing the disconnect issue
          • The enclosure was actually working fine until a few days after I troubleshot/resolved an issue with one disk having slow transfer speeds. I switched the enclosure from USB-A to USB-C and also disabled spindown for the affected disk in the enclosure. Disabling spindown was done at the VM level, so I didn't mention it above.

          😬 there is your problem

          USB is very unreliable for any kind of serious data transfer. Disconnecting devices, data transfer errors, corruptions are just the tip of the iceberg. For anything reliable you have to go for network attached storage or eSATA.

          I'm the type of person who verifies the md5 sums of all files copied to a USB flash drive, disk, enclosure, etc. I've spotted data corruption sooooo many times regardless of the OS, version, HW, USB type, storage device type... The only common factor was USB. eSATA and Ethernet never produced such corruption. As far as I'm concerned, USB is for mice and keyboards... and maybe a lamp or fan 😂

          • nuentes @bvitnik

            @bvitnik I'm having a hard time figuring out if you're trying to be helpful, so I'm going to assume you are.

            All of my USB disks are data disks. XCP-ng can boot just fine without them attached. My OS runs on an NVMe and a hardwired SSD in software RAID 1 (through XCP-ng).

            So yes, this issue initially began as a USB issue, but that's definitely not related to why I'm unable to boot now. In fact, the USB disks have been fully disconnected during all of my troubleshooting so as not to disturb them (or their data) accidentally.

            • Pilow @nuentes

              @nuentes Where you're at, would it not be possible to simply wipe, reinstall, and restore the metadata?
              Or rejoin the pool, if this is a multi-host pool?

              Is this a single host with all VMs currently sitting on the inaccessible USB storage?

              • nuentes @Pilow

                @Pilow I run a single host, so I've been fully offline for a few days now. I know it can be recovered; I'm just not sure of the right next steps. I'm looking for the best way forward right now.

                I do have a metadata backup on one of the USB hard drives. Would restoring from metadata backup actually resolve this?

                • Pilow @nuentes

                  @nuentes I GUESS if you do a clean reinstall
                  and then restore the metadata, you should be up and running again.

                  BEWARE: before reinstalling, make sure you can actually restore said metadata! I don't want you to get into any more trouble.

                  • bvitnik @nuentes

                    @nuentes No. My intention was to raise awareness of USB (un)reliability, especially the reliability of USB-attached storage. Also, either I'm blind or there was no mention of your system not being installed on USB storage.

                    You said everything yourself. Your problems started with USB, which you assumed could be fixed by flipping some kernel parameters. In the process of "fixing" it, you destroyed your system. Unfortunately, I believe the system is now beyond repair via an interactive forum session, because no one knows what really happened. Backup is your best friend.

                    • nuentes @bvitnik

                      @bvitnik You're right. I added a lot of details, but neglected to mention that I'm not booting from USB.

                      I'm really not convinced I've destroyed my system. I truly think that's an overreaction. I think I ruined my initrd and initramfs files, yes, but that should be recoverable. I haven't done nearly as much as you think I have. The reason I haven't succeeded yet is that I'm not convinced I've been doing it the right way.

                      Since my disks run in RAID, my system has six partitions:

                      md127p1
                      md127p2
                      md127p3
                      md127p4
                      md127p5
                      md127p6

                      From memory: p1 and p2 are very similar, except that p1 doesn't include GRUB (/boot/efi/EFI) and p2 does. p4 is GRUB. p3 holds my VHDs. p5 may be swap, and I can't remember what the remaining one is.

                      My point is that I don't believe I've mounted everything correctly from the shell, so I can't successfully chroot into the system and run the dracut commands. When I run them, I see failures for applications that I can see in the sbin folder.

                      So there is something that I'm missing in mounting these disks in the shell that is preventing me from solving this issue. This is why I'm here. I'm not here for lectures about the dangers of USB.
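For the mount-and-chroot step, a minimal sketch of the usual pattern: mount the installed root, then its other partitions, then bind the live system's /dev, /proc and /sys so tools inside the chroot see real devices. The partition roles below are an ASSUMPTION based on the commonly documented XCP-ng 8.x layout (p1 root, p2 backup copy of root, p3 local SR, p4 boot/EFI, p5 /var/log, p6 swap), which roughly matches the list above; verify them with `lsblk -f` or `blkid` before running anything. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: mount the INSTALLED dom0 from install/rescue media, then chroot in.
# ASSUMED partition layout (verify with `lsblk -f` or `blkid` first):
#   p1 = dom0 root filesystem   (p2 is assumed to be a backup copy; do not use it here)
#   p4 = EFI system partition, mounted at /boot/efi
#   p5 = /var/log
# Set DRY_RUN=1 to print each command instead of executing it.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_installed_dom0() {
    dev="${1:-/dev/md127}"   # md device as assembled by the rescue environment
    mnt="${2:-/mnt}"
    run mount "${dev}p1" "$mnt" || return 1
    run mount "${dev}p4" "$mnt/boot/efi" || return 1
    run mount "${dev}p5" "$mnt/var/log" || return 1
    # Bind the pseudo-filesystems the tools inside the chroot rely on.
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$mnt/$fs" || return 1
    done
}
```

Afterwards, `chroot /mnt /bin/bash` and run dracut from inside. Many "will not be installed because ... could not be found" lines are usually just optional dracut modules whose tools aren't present; but if dracut reports tools missing that you can see under /mnt/sbin, that may suggest it is running against the rescue system rather than inside the chroot.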

                      Alternatively, I could boot the install media and simply perform a metadata/pool restore from backup, but I just want someone to tell me that's an actual viable option.

                      I'm not going to simply re-install the OS. If I do, I'll clone it first, and then boot the clone and test a metadata restore. But that's a lot of work for it to fail.

                      • bvitnik @nuentes

                        @nuentes Oh no, no. Your system is not destroyed beyond repair. It can be repaired. It's just that it is almost impossible, or too much of a hassle, for anyone to help you with it over a forum. Someone has to sit in front of your machine to do it.

                        My only guess is that ChatGPT instructed you to make changes based on a CentOS system, but XCP-ng, and Xen virtualization in general, is quite different from regular CentOS. It has a two-stage boot process: first the Xen hypervisor boots, and then a special virtual machine called Dom0 is booted. What you are accessing and reconfiguring is in fact this VM, not the underlying "system". So it's a two-layer system: some configuration must be done at the Xen layer and some at the Dom0 layer. Unfortunately, I'm not familiar with the exact specifics of kernel and initrd image generation in this setup, so I can't spot where things have gone wrong.
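To make the two layers concrete: a Xen GRUB entry typically loads the hypervisor with a `multiboot2` line (older configs use `multiboot`) and then hands Dom0 its kernel and initrd as `module2` (or `module`) lines, so the initrd that dracut rebuilds is the second module of that entry. A small sketch that prints this chain from a grub.cfg, assuming that common layout; the function name is hypothetical, and since the grub.cfg location on this install isn't given in the thread, the path is passed explicitly.

```shell
#!/bin/sh
# Hypothetical helper: for each GRUB menu entry, print the hypervisor, dom0 kernel and dom0 initrd.
# ASSUMES the common Xen layout: multiboot(2) loads Xen, the first module(2) line is the
# dom0 kernel and the second is the dom0 initrd.
show_boot_chain() {
    awk '
        $1 == "menuentry"    { n = 0; print "entry:", $0; next }
        $1 ~ /^multiboot2?$/ { print "  hypervisor:", $2; next }
        $1 ~ /^module2?$/    {
            n++
            label = (n == 1) ? "  dom0 kernel:" : "  dom0 initrd:"
            print label, $2
        }
    ' "$1"
}
```

Usage: `show_boot_chain /path/to/grub.cfg`, then compare the "dom0 initrd" path against the file you regenerated.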

                        In short: instead of going back and forth trying a lot of different things, it's simpler and saves more time to reinstall the system and restore the metadata, if you already have a backup.
