XCP-ng
    EddieA

    @EddieA


    Best posts made by EddieA

    • RE: Kernel trap (??) booting TrueNAS with 2 x Kingston NVMe SSDs

      @olivierlambert said in Kernel trap (??) booting TrueNAS with 2 x Kingston NVMe SSDs:

      Not all NVMe are created equals

      Now all I need to do is determine whether these errors come from the NVMe drives themselves or from the 4 x NVMe sled they're inserted in. LOL.

      Again, thanks for the help.

      posted in Hardware
      EddieA
    • RE: TrueNAS VM failing to start

      OK, not really sure what's going on. I fired XCP back up to try what @teddyastie suggested.

      Looking at the specs for the TrueNAS VM before booting it, I found it now had zero passthrough devices attached, which (from memory) wasn't the state it was in the last time I tried. So I re-added all but one passthrough device, a GPU. Booted TrueNAS and this time it came up.

      Bingo, I thought, the GPU is the issue, but based on my background, I had to try again with the GPU included to prove it was the culprit. Well, what do you know, after adding it back in, TrueNAS now starts perfectly. One theory destroyed.

      All I can think is that the passthrough definitions in the VM config were somehow corrupted, and that finding them all gone and re-adding them fixed this. Who knows.

      But all appears to be good again (for now).

      posted in Compute
      EddieA
    • RE: TrueNAS VM failing to start

      @yannsionneau Uploaded contents of /var/crash together with the output of "xen-bugtool --yestoall".

      Cheers.

      posted in Compute
      EddieA
    • RE: xo-server executable not found

      @poddingue said:

      usually looks like an update that got interrupted or only half-applied

      Thinking back on it, I think that may be the issue, in that I was too quick off the mark rebooting after the base upgrades.

      @poddingue said:

      I think the gentler recovery before rebuilding would have been re-running the updater from the CLI

      Kinda tried that, but:

      [18:47 09] xoa@xoa:~$ xoa check
      -bash: xoa: command not found
      [18:47 09] xoa@xoa:~$ sudo xoa-updater --upgrade
      [sudo] password for xoa:
      sudo: xoa-updater: command not found
      [18:48 09] xoa@xoa:~$
      

      But regardless, I'm all good now.

      Cheers.

      posted in Xen Orchestra
      EddieA

    Latest posts made by EddieA

    • RE: TrueNAS VM failing to start

      @tuxen Doing some research, it doesn't look like the Xeons I have are affected.

      But I'm willing to try the next time I need to reboot. Will report back after that.

      posted in Compute
      EddieA
    • RE: TrueNAS VM failing to start

      Wearing my best Lazarus cosplay outfit, I'll apologise for the resurrection.

      Today I had an issue with my UPS which caused me to reboot XCP a few times. During those reboots I had at least 2, maybe 3, recurrences of this, where XCP would lock up while TrueNAS was booting. Most of the time, after a power cycle of the server, the next boot would start TrueNAS cleanly. One time it took 2 power cycles before success.

      Unfortunately, only one of the crashes resulted in a /var/crash report, but that did show the same symptoms as my original report:

      (XEN) [   81.101362] Non-responding CPUs: {24-47}
      (XEN) [   81.101363]
      (XEN) [   81.101364] ****************************************
      (XEN) [   81.101365] Panic on CPU 5:
      (XEN) [   81.101366] FATAL TRAP: vec 2, NMI[0000] IN INTERRUPT CONTEXT
      (XEN) [   81.101366] ****************************************
      (XEN) [   81.101367]
      (XEN) [   81.101368] Reboot in five seconds...
      (XEN) [   81.101369] Executing kexec image on cpu5
      (XEN) [   82.101441] Failed to shoot down CPUs {24-47}
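
For comparing one crash against another, a minimal Python sketch along these lines could pull the key facts out of a panic excerpt. It assumes only the message wording seen in the lines above; the function names `expand_cpu_set` and `summarise_panic` are hypothetical and are not part of any Xen or XCP-ng tooling.

```python
import re

# Sample lines copied from the panic excerpt above.
SAMPLE = """\
(XEN) [   81.101362] Non-responding CPUs: {24-47}
(XEN) [   81.101365] Panic on CPU 5:
(XEN) [   81.101366] FATAL TRAP: vec 2, NMI[0000] IN INTERRUPT CONTEXT
(XEN) [   82.101441] Failed to shoot down CPUs {24-47}
"""


def expand_cpu_set(spec):
    """Expand a CPU set such as '24-47' or '0,2-3' into a list of ints."""
    cpus = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def summarise_panic(log_text):
    """Return the panicking CPU, the trap text and the non-responding CPUs."""
    panic = re.search(r"Panic on CPU (\d+):", log_text)
    trap = re.search(r"FATAL TRAP: (.+)", log_text)
    stuck = re.search(r"Non-responding CPUs: \{([^}]*)\}", log_text)
    return {
        "panic_cpu": int(panic.group(1)) if panic else None,
        "trap": trap.group(1).strip() if trap else None,
        "non_responding": expand_cpu_set(stuck.group(1)) if stuck else [],
    }


if __name__ == "__main__":
    print(summarise_panic(SAMPLE))
```

Run against the excerpt above, this reports a panic on CPU 5 with 24 CPUs (24 through 47) not responding, which makes it easy to check whether the same set of CPUs is involved each time.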
      

      Between my original report and today, I have rebooted other times, following updates, when this issue has not surfaced.

      Does anyone think this could be hardware related, despite all the memory testing and stress testing I did when I built the server and again after the original issue, all with no faults? Or have I just hit an unlucky set of circumstances with some sort of race condition?

      posted in Compute
      EddieA
    • xo-server executable not found

      Had some time today, so went to update my home XCP-ng with the latest updates, on both the base and XOA. After rebooting I wasn't able to connect to the XOA appliance UI.

      ssh'ing into XOA and poking around, I find this in the logs:

      Jun 09 18:57:31 xoa systemd[1]: Started xo-server.service - XO Server.
      Jun 09 18:57:31 xoa (o-server)[557]: xo-server.service: Failed to locate executable /usr/local/bin/xo-server: No such file or directory
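
A quick way to spot this kind of failure in a saved journal dump is sketched below in Python. It relies only on the message format shown in the two lines above; the helper name `missing_executables` is hypothetical, and how the journal text gets captured is left as an assumption.

```python
import re

# Sample lines copied from the journal excerpt above.
SAMPLE = """\
Jun 09 18:57:31 xoa systemd[1]: Started xo-server.service - XO Server.
Jun 09 18:57:31 xoa (o-server)[557]: xo-server.service: Failed to locate executable /usr/local/bin/xo-server: No such file or directory
"""

# Matches the "Failed to locate executable" message seen in the excerpt.
MISSING = re.compile(
    r"(?P<unit>\S+\.service): Failed to locate executable "
    r"(?P<path>\S+): No such file or directory"
)


def missing_executables(lines):
    """Return (unit, path) pairs for every unit whose executable was not found."""
    found = []
    for line in lines:
        match = MISSING.search(line)
        if match:
            found.append((match.group("unit"), match.group("path")))
    return found


if __name__ == "__main__":
    for unit, path in missing_executables(SAMPLE.splitlines()):
        print(f"{unit}: missing {path}")
```

On the excerpt above it flags xo-server.service with the missing path /usr/local/bin/xo-server.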
      

      I've looked through the update notes and don't see anything related. Did I miss something, or did my updates screw up somehow?

      How do I recover from this? I did find one write-up on deleting the XOA appliance and reinstalling, but the delete fails with:

      [17:15 xcp-ng ~]# xe vm-destroy uuid="3d80a504-3460-96ab-d438-2e6f2e9e0487"
      You attempted an operation that was explicitly blocked (see the blocked_operations field of the given object).
      ref: 3d80a504-3460-96ab-d438-2e6f2e9e0487 (XOA)
      code: true
      [17:15 xcp-ng ~]#
      

      ***** Update *****

      Was able to delete and re-create the XOA appliance.

      So now I'd just like to understand what happened.

      Cheers.

      posted in Xen Orchestra
      EddieA
    • RE: TrueNAS VM failing to start

      Give me a couple of days to try. It is (obviously) down to the combination of devices passed through, as I reported this earlier:

      said in TrueNAS VM failing to start:

      Re-boot XCP and start the TrueNAS VM with NO passthrough devices. As expected, that started up fine.

      Cheers.

      posted in Compute
      EddieA
    • RE: TrueNAS VM failing to start

      @olivierlambert Any further thoughts or suggestions? (Move the PCIe cards around again?)

      Cheers.

      posted in Compute
      EddieA
    • RE: TrueNAS VM failing to start

      Not sure if this helps, from the bottom of the xen.log:

      (XEN) [  919.901833] Watchdog timer detects that CPU23 is stuck!
      (XEN) [  919.901837] ----[ Xen-4.17.5-23  x86_64  debug=n  Not tainted ]----
      (XEN) [  919.901838] CPU:    23
      (XEN) [  919.901839] RIP:    e008:[<ffff82d04032ca4a>] arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901843] RFLAGS: 0000000000000012   CONTEXT: hypervisor
      (XEN) [  919.901845] rax: 0000000000000030   rbx: ffff83103fff7cf8   rcx: 0000000000000017
      (XEN) [  919.901846] rdx: ffff83103fff7df8   rsi: 0000000000000000   rdi: ffff83103fff7cf8
      (XEN) [  919.901847] rbp: 0000000000000017   rsp: ffff831033b87d00   r8:  0000000000000030
      (XEN) [  919.901849] r9:  ffff83103fff7cf8   r10: 0000000000000000   r11: 0000000000000000
      (XEN) [  919.901850] r12: 0000000000000000   r13: ffff82d040987680   r14: 00000000000000fb
      (XEN) [  919.901851] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000007526e0
      (XEN) [  919.901852] cr3: 000000006162f000   cr2: 00007f233881e010
      (XEN) [  919.901853] fsb: 0000000000000000   gsb: 0000000000000000   gss: ffff9ee10f280000
      (XEN) [  919.901854] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
      (XEN) [  919.901857] Xen code around <ffff82d04032ca4a> (arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0):
      (XEN) [  919.901858]  1f 80 00 00 00 00 f3 90 <8b> 0a 39 c8 75 f8 eb 97 66 0f 1f 44 00 00 31 ff
      (XEN) [  919.901862] Xen stack trace from rsp=ffff831033b87d00:
      (XEN) [  919.901863]    ffff82d040987680 ffff82d04023201c ffff831033b87d98 00000000000000fb
      (XEN) [  919.901865]    ffff82d04031166c 0000000000000202 0000000000000000 0000000080000000
      (XEN) [  919.901867]    0000000000000000 0000000000000000 ffff831033b87fff 0000000000000000
      (XEN) [  919.901869]    0000000000000000 0000000000000000 0000000000000000 0000000000000000
      (XEN) [  919.901870]    ffff831033b87fff 0000000000000000 ffff82d040201916 000000d5308191d5
      (XEN) [  919.901872]    000000d529ac0122 0000000000000017 ffff831033b916a0 ffff831033b91738
      (XEN) [  919.901873]    0000000000000060 0000000000000001 ffff82d040987680 ffff831033b87ef8
      (XEN) [  919.901875]    ffff82d040988200 ffff831033b8d06c 000000d5308187aa 0000000000000000
      (XEN) [  919.901877]    000000d5308191d5 ffff831033b916d0 000000fb00000000 ffff82d0402931f4
      (XEN) [  919.901879]    000000000000e008 0000000000000246 ffff831033b87e48 0000000000000000
      (XEN) [  919.901880]    ffff82d0402931ed 0000000000000000 0000000000000000 0000000000000000
      (XEN) [  919.901882]    ffff82d0409875e0 0000000000000017 ffff82d0409d5340 0000000000000017
      (XEN) [  919.901884]    0000000000000017 0000000000007fff ffff82d040820c00 ffff82d040987680
      (XEN) [  919.901885]    ffff82d0409d5340 ffff82d0403001bb ffff82d040988200 ffff82d0409803b0
      (XEN) [  919.901887]    ffff82d0403000e0 ffff831033b92000 ffff83132018e000 ffff83103ffc9000
      (XEN) [  919.901889]    0000000000000017 ffff8323a572e000 ffff82d040301f5e 000000000000003b
      (XEN) [  919.901891]    00007f2339a6a948 0000000000000003 00007f2338828840 00007f232b42a840
      (XEN) [  919.901893]    0000000000000002 00007f2339a6a8d8 00007f2339a6a950 0000000000000001
      (XEN) [  919.901894]    00000000004a2950 00007f2338813740 0000000000000000 0000000000000003
      (XEN) [  919.901896]    00000000009465e0 00007f233881dff0 000000fa00000000 00000000004a9499
      (XEN) [  919.901898] Xen call trace:
      (XEN) [  919.901899]    [<ffff82d04032ca4a>] R arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901902]    [<ffff82d04023201c>] S smp_call_function_interrupt+0x4c/0x90
      (XEN) [  919.901905]    [<ffff82d04031166c>] S do_IRQ+0x2bc/0x710
      (XEN) [  919.901907]    [<ffff82d040201916>] S common_interrupt+0x136/0x150
      (XEN) [  919.901911]    [<ffff82d0402931f4>] S arch/x86/cpu/mwait-idle.c#mwait_idle+0x204/0x3c0
      (XEN) [  919.901913]    [<ffff82d0402931ed>] S arch/x86/cpu/mwait-idle.c#mwait_idle+0x1fd/0x3c0
      (XEN) [  919.901916]    [<ffff82d0403001bb>] S arch/x86/domain.c#idle_loop+0xdb/0xf0
      (XEN) [  919.901918]    [<ffff82d0403000e0>] S arch/x86/domain.c#idle_loop+0/0xf0
      (XEN) [  919.901919]    [<ffff82d040301f5e>] S context_switch+0x1ee/0x900
      (XEN) [  919.901920] 
      (XEN) [  919.901927] CPU3	d[IDLE]v3	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901930] CPU2	d[IDLE]v2	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901934] CPU1	d[IDLE]v1	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901937] CPU0	d[IDLE]v0	e008:ffff82d04032c9d2 in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0x42/0xe0
      (XEN) [  919.901941] CPU4	d[IDLE]v4	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901945] CPU5	d[IDLE]v5	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901949] CPU6	d[IDLE]v6	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901952] CPU7	d[IDLE]v7	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901956] CPU8	d[IDLE]v8	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901960] CPU9	d[IDLE]v9	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901964] CPU10	d[IDLE]v10	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901969] CPU16	d[IDLE]v16	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901972] CPU17	d[IDLE]v17	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901975] CPU11	d[IDLE]v11	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901978] CPU22	d[IDLE]v22	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901983] CPU20	d0v11	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901986] CPU21	d[IDLE]v21	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901991] CPU14	d[IDLE]v14	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901994] CPU15	d[IDLE]v15	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.901998] CPU18	d[IDLE]v18	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.902002] CPU19	d[IDLE]v19	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.902006] CPU13	d[IDLE]v13	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.902009] CPU12	d[IDLE]v12	e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
      (XEN) [  919.912921] Non-responding CPUs: {24-47}
      (XEN) [  919.912922] 
      (XEN) [  919.912923] ****************************************
      (XEN) [  919.912923] Panic on CPU 23:
      (XEN) [  919.912924] FATAL TRAP: vec 2, NMI[0000] IN INTERRUPT CONTEXT
      (XEN) [  919.912925] ****************************************
      (XEN) [  919.912926] 
      (XEN) [  919.912926] Reboot in five seconds...
      (XEN) [  919.912928] Executing kexec image on cpu23
      (XEN) [  920.912554] Failed to shoot down CPUs {24-47}
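
To make the per-CPU part of a dump like this easier to read, a small Python sketch can group the CPUs by the code location each one reported. It assumes the line layout shown above (CPU number, vCPU name, address, then symbol+offset); the function name `group_cpus_by_location` is hypothetical, and the sample uses spaces where the log has tabs.

```python
import re
from collections import defaultdict

# Three per-CPU lines copied from the dump above (timestamps trimmed).
SAMPLE = """\
(XEN) CPU3 d[IDLE]v3 e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
(XEN) CPU0 d[IDLE]v0 e008:ffff82d04032c9d2 in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0x42/0xe0
(XEN) CPU20 d0v11 e008:ffff82d04032ca4a in Xen: arch/x86/time.c#time_calibration_std_rendezvous+0xba/0xe0
"""

# CPU number, vCPU name, segment:address, then symbol+offset/size.
CPU_LINE = re.compile(
    r"CPU(?P<cpu>\d+)\s+(?P<vcpu>\S+)\s+[0-9a-f]+:[0-9a-f]+ in Xen: "
    r"(?P<symbol>\S+)\+(?P<offset>0x[0-9a-f]+)/"
)


def group_cpus_by_location(log_text):
    """Map (symbol, offset) to the sorted list of CPUs reported at that spot."""
    groups = defaultdict(list)
    for match in CPU_LINE.finditer(log_text):
        key = (match.group("symbol"), match.group("offset"))
        groups[key].append(int(match.group("cpu")))
    return {key: sorted(cpus) for key, cpus in groups.items()}


if __name__ == "__main__":
    for (symbol, offset), cpus in group_cpus_by_location(SAMPLE).items():
        print(f"{symbol}+{offset}: CPUs {cpus}")
```

In the full dump above, CPU0 is the only one listed at +0x42; CPUs 1 through 22 are all listed at +0xba in the same function, and CPUs 24-47 do not appear at all.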
      
      

      Cheers.

      posted in Compute
      EddieA
    • RE: TrueNAS VM failing to start

      @olivierlambert As I pointed out earlier, everything was working perfectly until I shut down to replace an NVMe stick, which involved moving around a couple of PCIe cards, hence changing their IDs for passthrough.

      It's a Supermicro X11DPH-T running a pair of Xeon Gold 5118. The BIOS was up to date as of the middle of last year, with a date of 3/5/24.

      Cheers.

      posted in Compute
      EddieA