XCP-ng

    Upgrade 8.2.1 -> 8.3 failed (manually fixed)

      cg

      After upgrading my first server successfully, I upgraded another one recently (different environments, no pool), but it failed.

      I remembered from the forum that I should check the installer logs, so I copied the whole log directory (in case it contains useful info) and switched to another console (ALT+F3?) to see where it failed.
      I don't know if it's documented somewhere, but following that console is pretty informative, rather than just watching a progress bar.
      Older Windows versions didn't hide what the installer was doing. It's a bit sad that XCP-ng "hides" that.

      tl;dr: The problem seemed to be:
      STANDARD ERROR:

      cp: error reading '/tmp/primary-jqbXmQ/usr/lib64/python2.7/lib-dynload/_codecs_hk.so': Input/output error
      cp: failed to extend '/tmp/backup-TbutMQ/usr/lib64/python2.7/lib-dynload/_codecs_hk.so': Input/output error
      

      As this happened during the backup phase, nothing was broken and I could just retry... only to end up with the same problem.
      As the file looked like a Hong Kong character codec, I just removed it and tried again: with success.
      The backup ran through and the install/upgrade went fine. The box has been running ever since.

      I didn't back the file up, but with "ls" it looked just as fine as everything else. Also, nobody ever touched that file. I can't say why it failed, but I wanted to drop it here for archival purposes, in case someone else stumbles over the same or a similar problem.
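Why "ls" looked fine: it only reads the directory entry and the file's metadata, not its data blocks, so a file sitting on an unreadable sector looks normal until something actually reads it, as `cp` did during the backup. A minimal Python 3 sketch, assuming you can run Python 3 on the affected system (or on a live system with the disk mounted), that reads every regular file under a directory and lists the ones that raise an error. `find_unreadable_files` is a hypothetical helper, not part of the XCP-ng installer.

```python
import os
import sys


def find_unreadable_files(root, chunk_size=1024 * 1024):
    """Read every regular file under root; return (path, reason) for files that fail.

    Unlike `ls`, this reads the file contents, so a file on a bad sector
    fails here the same way `cp` failed during the installer backup.
    """
    unreadable = []
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only read real files: skip symlinks, devices, FIFOs and sockets.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while f.read(chunk_size):  # read to EOF in chunks
                        pass
            except OSError as exc:  # EIO shows up as "Input/output error"
                unreadable.append((path, exc.strerror))
    return unreadable


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/usr/lib64"
    for path, reason in find_unreadable_files(root):
        print(f"{path}: {reason}")
```

Reading every file puts extra load on a disk that may already be failing, and permission problems also end up in the list, so run it as root.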

      As the logs and the other terminal give quite a lot of information about the current actions, debugging was actually fun, and it was interesting to dig a bit into what the installer is doing. A big plus over Microsoft... which is often a real pain to debug.

      If you want the whole installer log directory: I still have it, but will delete it in the next few days if nobody asks.

      Greetings

      • Christof
        bleader (Vates 🪐 XCP-ng Team)

        When you say retry, did you completely restart the installation process, or did you just go back to the installer screen and retry the step that does the backup?

        If you retried the step, you should still have all the installer logs on the upgraded host in /var/log/installer/. In that case, I would suggest checking the dmesg log; that sounds like a disk error to me.
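When checking dmesg as suggested, the lines that matter are the `I/O error, dev <disk>, sector <n>` reports, like the ones in the log cg posts further down. A minimal Python 3 sketch that counts them per device and sector. The regular expression is based only on that log, and the wording differs between kernel versions, so treat it as an assumption to adjust. `failing_sectors` is a hypothetical helper, and the name of the dmesg file inside the installer log directory is not given in this thread, so you pass the path yourself.

```python
import re
import subprocess
import sys
from collections import Counter

# Based on kernel lines like:
#   print_req_error: I/O error, dev sda, sector 47475840
# Assumption: the exact prefix/wording varies by kernel version; adjust if nothing matches.
IO_ERROR_RE = re.compile(r"I/O error, dev (\w+), sector (\d+)")


def failing_sectors(dmesg_text):
    """Count I/O error reports per (device, sector) in dmesg output."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = IO_ERROR_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # A saved log file, e.g. one copied from the installer log directory.
        with open(sys.argv[1], errors="replace") as f:
            text = f.read()
    else:
        # The live kernel ring buffer; reading it usually needs root.
        text = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    for (dev, sector), n in sorted(failing_sectors(text).items()):
        print(f"{dev}: sector {sector}: {n} error(s)")
```

The same sector showing up repeatedly, as in cg's log, points at one bad spot on the disk rather than a random glitch.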

        Having removed the file would not impact the newly upgraded host in any way; it just means the file is missing from the backup, which could matter if you ever had to restore, which I hope you won't need to.

        Other than that, I would encourage you to check your disk health, as this could be a sign of a hardware error.

          cg

          @bleader

          IIRC I just "tried again".
          It failed twice; then I looked at the logs on the other console, removed the file (which shouldn't be of any importance for our instance) and retried without rebooting.

          I copied the whole installer log directory to the USB stick before finishing the install. 🙂
          (This could actually be a good hint, or even a menu option, for cases where the install fails and leaves nothing on the hard drive, e.g. when evaluating hardware.)
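Saving the installer logs to a USB stick was done by hand here. A minimal sketch of the same step from a shell console, assuming a Python 3 interpreter is available there, that the stick is already mounted, and that you pass the log directory yourself: /var/log/installer/ is where the logs end up on the upgraded host, while where they sit during a still-running install is not stated in this thread. `save_logs`, `save_logs.py` and the `/mnt/usb` mount point are hypothetical; a plain `cp -r` does the same job.

```python
import os
import shutil
import sys
import time


def save_logs(src_dir, dest_root):
    """Copy src_dir into a new timestamped folder under dest_root and return its path.

    The timestamp keeps logs from repeated attempts (e.g. several failed
    upgrade runs) from overwriting each other.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(dest_root, "installer-logs-" + stamp)
    shutil.copytree(src_dir, dest)  # raises FileExistsError if dest already exists
    if hasattr(os, "sync"):
        os.sync()  # flush buffered writes before the stick is unplugged
    return dest


if __name__ == "__main__":
    # Usage (hypothetical paths): python3 save_logs.py /var/log/installer /mnt/usb
    print("copied to", save_logs(sys.argv[1], sys.argv[2]))
```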

          [  128.517356] ata1.00: exception Emask 0x0 SAct 0x800000 SErr 0x0 action 0x0
          [  128.517357] ata1.00: irq_stat 0x40000008
          [  128.517359] ata1.00: failed command: READ FPDMA QUEUED
          [  128.517362] ata1.00: cmd 60/80:b8:10:6c:d4/00:00:02:00:00/40 tag 23 ncq dma 65536 in
                   res 41/40:10:80:6c:d4/00:00:02:00:00/00 Emask 0x409 (media error) <F>
          [  128.517363] ata1.00: status: { DRDY ERR }
          [  128.517364] ata1.00: error: { UNC }
          [  128.518008] ata1.00: configured for UDMA/133
          [  128.518018] sd 0:0:0:0: [sda] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
          [  128.518020] sd 0:0:0:0: [sda] tag#23 Sense Key : Medium Error [current] 
          [  128.518021] sd 0:0:0:0: [sda] tag#23 Add. Sense: Unrecovered read error - auto reallocate failed
          [  128.518024] sd 0:0:0:0: [sda] tag#23 CDB: Read(10) 28 00 02 d4 6c 10 00 00 80 00
          [  128.518025] print_req_error: I/O error, dev sda, sector 47475840
          [  128.518039] ata1: EH complete
          [  128.581286] ata1.00: exception Emask 0x0 SAct 0x2000000 SErr 0x0 action 0x0
          [  128.581287] ata1.00: irq_stat 0x40000008
          [  128.581288] ata1.00: failed command: READ FPDMA QUEUED
          [  128.581291] ata1.00: cmd 60/08:c8:80:6c:d4/00:00:02:00:00/40 tag 25 ncq dma 4096 in
                   res 41/40:08:80:6c:d4/00:00:02:00:00/00 Emask 0x409 (media error) <F>
          [  128.581292] ata1.00: status: { DRDY ERR }
          [  128.581293] ata1.00: error: { UNC }
          [  128.582111] ata1.00: configured for UDMA/133
          [  128.582117] sd 0:0:0:0: [sda] tag#25 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
          [  128.582118] sd 0:0:0:0: [sda] tag#25 Sense Key : Medium Error [current] 
          [  128.582119] sd 0:0:0:0: [sda] tag#25 Add. Sense: Unrecovered read error - auto reallocate failed
          [  128.582121] sd 0:0:0:0: [sda] tag#25 CDB: Read(10) 28 00 02 d4 6c 80 00 00 08 00
          [  128.582122] print_req_error: I/O error, dev sda, sector 47475840
          [  128.582133] ata1: EH complete
          [  128.629307] ata1.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x0
          [  128.629309] ata1.00: irq_stat 0x40000008
          [  128.629310] ata1.00: failed command: READ FPDMA QUEUED
          [  128.629313] ata1.00: cmd 60/08:48:80:6c:d4/00:00:02:00:00/40 tag 9 ncq dma 4096 in
                   res 41/40:08:80:6c:d4/00:00:02:00:00/00 Emask 0x409 (media error) <F>
          [  128.629314] ata1.00: status: { DRDY ERR }
          [  128.629315] ata1.00: error: { UNC }
          [  128.630068] ata1.00: configured for UDMA/133
          [  128.630074] sd 0:0:0:0: [sda] tag#9 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
          [  128.630076] sd 0:0:0:0: [sda] tag#9 Sense Key : Medium Error [current] 
          [  128.630077] sd 0:0:0:0: [sda] tag#9 Add. Sense: Unrecovered read error - auto reallocate failed
          [  128.630078] sd 0:0:0:0: [sda] tag#9 CDB: Read(10) 28 00 02 d4 6c 80 00 00 08 00
          [  128.630079] print_req_error: I/O error, dev sda, sector 47475840
          [  128.630092] ata1: EH complete
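The `CDB: Read(10)` lines show which blocks were requested, and decoding them ties the three errors to the same spot. In a SCSI READ(10) command (opcode 0x28), bytes 2 to 5 hold the starting logical block address and bytes 7 and 8 the transfer length in blocks, both big-endian. A small Python 3 sketch; `decode_read10` is a hypothetical helper name.

```python
def decode_read10(cdb_hex):
    """Return (start_lba, block_count) for a SCSI READ(10) CDB in kernel hex notation."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between byte pairs are accepted
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB: " + cdb_hex)
    start_lba = int.from_bytes(cdb[2:6], "big")  # bytes 2-5: logical block address
    block_count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return start_lba, block_count


if __name__ == "__main__":
    reported_sector = 47475840  # from the print_req_error lines
    for cdb in ("28 00 02 d4 6c 10 00 00 80 00", "28 00 02 d4 6c 80 00 00 08 00"):
        lba, count = decode_read10(cdb)
        hit = lba <= reported_sector < lba + count
        print(f"{cdb}: LBA {lba}, {count} blocks, covers {reported_sector}: {hit}")
```

The first read (LBA 47475728, 128 blocks) covers sector 47475840, and the two later 8-block reads start exactly at 47475840 and fail again, so all three errors point at one unreadable spot. Assuming 512-byte sectors, the unit the kernel uses for the `sector` value it prints, that is about 24.3 GB into the SSD.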
          

          Indeed it looks like the SSD should be replaced.

          8.3 has been running stable on this host (and on all the other hosts I upgraded) so far.
          It's a system at a UAS that has been running various student projects for several years now, originally on XenServer. I maintain it voluntarily. Thanks for the hint!
