XCP-ng

    XCP-ng 7.5 - MegaRAID SAS 9240-8i hang/reboot issue.

    • R Offline
      r1 XCP-ng Team
      last edited by

      Let us see whether a newer kernel helps. Another option is backporting the newer driver to the older kernel, which may require code changes. Both will be experimental, though!

      • mpyuskoM Offline
        mpyusko @r1
        last edited by

        @r1

        The same thing happens in XS 7.6 after "upgrading" from XS 7.5. Interestingly enough, post-upgrade the upper-left corner says XenServer 7.5, but the stats field and XenCenter report 7.6.
        The same thing happens in a clean installation of XS 7.6 too.

        • R Offline
          r1 XCP-ng Team
          last edited by

          @mpyusko Let me see if we can build driver 07.703.05.00-rc1 for your XCP-ng 7.5/7.6. I will let you know if it becomes available.

          • R Offline
            r1 XCP-ng Team
            last edited by r1

            @mpyusko Please get the driver from link and
            [root@xcp-ng-rjv ~]# yum install megaraid_sas-07.703.05.00-1.x86_64.rpm
            [root@xcp-ng-rjv ~]# rmmod megaraid_sas
            [root@xcp-ng-rjv ~]# modprobe megaraid_sas

            Then check your lspci output.

            // Additional info

            [root@xcp-ng-rjv ~]# modinfo /usr/lib/modules/4.4.0+10/weak-updates/megaraid_sas/megaraid_sas.ko
            filename:       /usr/lib/modules/4.4.0+10/weak-updates/megaraid_sas/megaraid_sas.ko
            description:    Avago MegaRAID SAS Driver
            author:         megaraidlinux.pdl@avagotech.com
            version:        07.703.05.00
            license:        GPL
            srcversion:     2A8AB66F9A16F0542FC2173
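A rough sketch (not from r1's post) of how to confirm that the reloaded module is really the new build. The helper name driver_versions is hypothetical. It relies only on modinfo -F and on the /sys/module/<name>/version file, which is present only while a module that declares a version is loaded.

```shell
# Sketch: compare the driver version on disk with the one currently loaded.
# driver_versions is a hypothetical helper name, not an XCP-ng tool.
driver_versions() {
    local mod="${1:-megaraid_sas}"
    # Version of the .ko file that modprobe would pick for the running kernel.
    echo "on disk: $(modinfo -F version "$mod" 2>/dev/null || echo unknown)"
    # Version reported by the module that is loaded right now, if any.
    if [ -r "/sys/module/$mod/version" ]; then
        echo "loaded:  $(cat "/sys/module/$mod/version")"
    else
        echo "loaded:  not loaded"
    fi
}
```

Calling driver_versions with no argument after the rmmod/modprobe pair checks megaraid_sas. If the two lines disagree, the old module is probably still the one in memory.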
            
            • mpyuskoM Offline
              mpyusko @mpyusko
              last edited by

              For the record...

              @mpyusko said in XCP-ng 7.5 - MegaRAID SAS 9240-8i hang/reboot issue.:

              @r1

              The same thing happens in XS 7.6 after "upgrading" from XS 7.5. Interestingly enough, post-upgrade the upper-left corner says XenServer 7.5, but the stats field and XenCenter report 7.6.
              The same thing happens in a clean installation of XS 7.6 too.

              XS 7.5

              ***** megaraid_sas Version Info *****
              version:        07.701.18.00-rc1
              srcversion:     550B32DFFACE241631510C5
              vermagic:       4.4.0+10 SMP mod_unload modversions
              

              XS 7.6

              ***** megaraid_sas Version Info *****
              version:        07.701.18.00-rc1
              srcversion:     550B32DFFACE241631510C5
              vermagic:       4.4.0+10 SMP mod_unload modversions
              
              • R Offline
                r1 XCP-ng Team
                last edited by r1

                @r1 said in XCP-ng 7.5 - MegaRAID SAS 9240-8i hang/reboot issue.:

                get the driver from link and

                Try with this driver.

                • mpyuskoM Offline
                  mpyusko @r1
                  last edited by mpyusko

                  @r1 said in XCP-ng 7.5 - MegaRAID SAS 9240-8i hang/reboot issue.:

                  @mpyusko Please get the driver from link and
                  [root@xcp-ng-rjv ~]# yum install megaraid_sas-07.703.05.00-1.x86_64.rpm
                  [root@xcp-ng-rjv ~]# rmmod megaraid_sas
                  [root@xcp-ng-rjv ~]# modprobe megaraid_sas

                  I did what you requested....

                  [root@vincent Downloads]# yum install megaraid_sas-07.703.05.00-1.x86_64.rpm
                  Loaded plugins: fastestmirror
                  Cannot open: megaraid_sas-07.703.05.00-1.x86_64.rpm. Skipping.
                  Error: Nothing to do
                  [root@vincent Downloads]# rpm -Uhv megaraid_sas-07.703.05.00-1.x86_64.rpm                                         
                  error: megaraid_sas-07.703.05.00-1.x86_64.rpm: not an rpm package (or package manifest):
                  [root@vincent Downloads]#
                  
                  • R Offline
                    r1 XCP-ng Team
                    last edited by r1

                    Can you post the ls -lh and md5sum output for it?

                    • mpyuskoM Offline
                      mpyusko @r1
                      last edited by

                      @r1 said in XCP-ng 7.5 - MegaRAID SAS 9240-8i hang/reboot issue.:

                      Can you post the ls -lh and md5sum output for it?

                      -rw-r--r-- 1 root root  40K Oct  5 13:28 megaraid_sas-07.703.05.00-1.x86_64.rpm
                      e1e232eab5d90308144bf3c47665cedd  megaraid_sas-07.703.05.00-1.x86_64.rpm
                      
                      • R Offline
                        r1 XCP-ng Team
                        last edited by

                        You seem to have downloaded the wrong file.

                        My output is:

                        [root@xcp-ng-rjv ~]# wget "https://github.com/rushikeshjadhav/MegaRAID-SAS-07.703.05.00/raw/master/megaraid_sas-07.703.05.00-1.x86_64.rpm"
                        [root@xcp-ng-rjv ~]# ls -lh megaraid_sas-07.703.05.00-1.x86_64.rpm 
                        -rw-r--r-- 1 root root 388K Oct  4 21:26 megaraid_sas-07.703.05.00-1.x86_64.rpm
                        [root@xcp-ng-rjv ~]# md5sum megaraid_sas-07.703.05.00-1.x86_64.rpm 
                        ef3064607545e0d390445f9e82ab8930  megaraid_sas-07.703.05.00-1.x86_64.rpm
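The size and checksum mismatch above can be checked in one step. This is a minimal sketch, assuming a hypothetical helper named verify_rpm. It tests for the 4-byte RPM lead magic (ed ab ee db) before comparing the MD5 sum, so a file that is not an RPM at all (for example a saved web page) is reported as such.

```shell
# Sketch: check that a downloaded file is an RPM and matches an expected MD5.
# verify_rpm is a hypothetical helper. Usage: verify_rpm FILE EXPECTED_MD5
verify_rpm() {
    local file="$1" expected="$2" actual magic
    # RPM packages begin with the lead magic bytes ed ab ee db.
    magic="$(head -c 4 "$file" | od -An -tx1 | tr -d ' \n')"
    if [ "$magic" != "edabeedb" ]; then
        echo "not an RPM (first bytes: $magic)"
        return 1
    fi
    # Compare the checksum only once the file looks like an RPM.
    actual="$(md5sum "$file" | cut -d' ' -f1)"
    if [ "$actual" != "$expected" ]; then
        echo "md5 mismatch: got $actual"
        return 2
    fi
    echo "ok"
}
```

With the checksum r1 posted, the call would be: verify_rpm megaraid_sas-07.703.05.00-1.x86_64.rpm ef3064607545e0d390445f9e82ab8930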
                        
                        • R Offline
                          r1 XCP-ng Team
                          last edited by

                          @mpyusko Did you happen to check this?

                          • mpyuskoM Offline
                            mpyusko
                            last edited by

                            Just got to it again....

                            ***** ahci Version Info *****
                            version:        3.0
                            srcversion:     35F0A9078B4BB938E54A1E7
                            vermagic:       4.4.0+10 SMP mod_unload modversions
                            
                            
                            ***** megaraid_sas Version Info *****
                            version:        07.703.05.00
                            srcversion:     2A8AB66F9A16F0542FC2173
                            vermagic:       4.4.0+10 SMP mod_unload modversions
                            
                            

                            lspci -v output

                            [root@vincent nfs]# lspci -v -s 07:00.0
                            07:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
                                    Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9240-8i
                                    Flags: bus master, fast devsel, latency 0, IRQ 40
                                    I/O ports at ec00 [size=256]
                                    Memory at df2bc000 (64-bit, non-prefetchable) [size=16K]
                                    Memory at df2c0000 (64-bit, non-prefetchable) [size=256K]
                                    Expansion ROM at df200000 [disabled] [size=256K]
                                    Capabilities: [50] Power Management version 3
                                    Capabilities: [68] Express Endpoint, MSI 00
                                    Capabilities: [d0] Vital Product Data
                                    Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                                    Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                                    Capabilities: [100] Advanced Error Reporting
                                    Capabilities: [138] Power Budgeting <?>
                                    Capabilities: [150] Single Root I/O Virtualization (SR-IOV)
                                    Capabilities: [190] Alternative Routing-ID Interpretation (ARI)
                                    Kernel driver in use: megaraid_sas
                            
                            [root@vincent nfs]#
                            

                            lspci -vv output

                            [root@vincent nfs]# lspci -vv -s 07:00.0
                            07:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
                                    Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9240-8i
                                    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                    Latency: 0, Cache Line Size: 64 bytes
                                    Interrupt: pin A routed to IRQ 40
                                    Region 0: I/O ports at ec00 [size=256]
                                    Region 1: Memory at df2bc000 (64-bit, non-prefetchable) [size=16K]
                                    Region 3: Memory at df2c0000 (64-bit, non-prefetchable) [size=256K]
                                    Expansion ROM at df200000 [disabled] [size=256K]
                                    Capabilities: [50] Power Management version 3
                                            Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                                            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                                    Capabilities: [68] Express (v2) Endpoint, MSI 00
                                            DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                                                    ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                                            DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                                                    RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                                                    MaxPayload 256 bytes, MaxReadReq 512 bytes
                                            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                                            LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
                                                    ClockPM- Surprise- LLActRep- BwNot-
                                            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                                    ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                                            LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                            DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                                            DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                                            LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                                                     Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                                     Compliance De-emphasis: -6dB
                                            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                                                     EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                            
                            

                            And then same result. Ugh.

                            • R Offline
                              r1 XCP-ng Team
                              last edited by

                              @mpyusko If I understand correctly, lspci -vv -s 07:00.0 crashes the host, even with megaraid_sas version 07.703.05.00? But the Kali Linux host does not crash with the same megaraid_sas version.

                              To resolve this: do you have console access to the host, or a remote KVM?

                              I suggest booting your host in "XCP-ng in Safe Mode". This menu comes up when the host starts to boot. Instead of the default "XCP-ng" entry, choose "XCP-ng in Safe Mode".

                              This will let us see the messages about the crash in kern.log or on screen, which should point us to the problem.

                              Meanwhile, if you have any stack traces in kern.log, please share them.
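One possible way to pull candidate crash messages out of kern.log is sketched below. The helper name find_crash_traces and the pattern list are assumptions chosen for illustration, and the default path /var/log/kern.log may differ on a given host.

```shell
# Sketch: list log lines that often accompany a kernel crash.
# find_crash_traces and its pattern list are illustrative assumptions.
find_crash_traces() {
    local log="${1:-/var/log/kern.log}"
    # -n prints line numbers so the surrounding context is easy to find.
    grep -n -i -E 'call trace|kernel panic|oops|BUG:|machine check|megaraid|megasas' "$log"
}
```

An empty result does not prove there was no crash; it only means none of these patterns were written to the log before the host went down.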

                              • mpyuskoM Offline
                                mpyusko
                                last edited by

                                Yes, you understand correctly. I have root, console, iDRAC, KVM, and physical access to the machine.

                                The SEL reports:

                                 Normal           0.000202  Mon Oct 15 2018 03:24:03  An OEM diagnostic event has occurred.
                                 Normal           0.000201  Mon Oct 15 2018 03:24:03  An OEM diagnostic event has occurred.
                                 Normal           0.000200  Mon Oct 15 2018 03:24:03  An OEM diagnostic event has occurred.
                                 Normal           0.000199  Mon Oct 15 2018 03:24:03  An OEM diagnostic event has occurred.
                                 Non-Recoverable  0.000198  Mon Oct 15 2018 03:24:03  CPU 1 machine check detected.
                                 Normal           0.000197  Mon Oct 15 2018 03:24:00  An OEM diagnostic event has occurred.
                                 Critical         0.000196  Mon Oct 15 2018 03:24:00  A bus fatal error was detected on a component at bus 0 device 9 function 0.
                                 Critical         0.000195  Mon Oct 15 2018 03:23:59  A bus fatal error was detected on a component at slot 3.
                                
                                

                                Please note, I have tried changing slots; the same issue occurs, and the SEL reports accordingly. kern.log does not have any applicable output, in either 'normal' mode or "safe mode". The same applies to dmesg. I'm running tail -f on both files. If there is any output, it's not being logged or displayed.
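Since the SEL is the only place the failure shows up, it can help to reduce a saved copy of it to the severe entries. A minimal sketch, assuming the SEL text has been pasted into a file with the severity as the first field, as in the listing above. The helper name sel_severe is hypothetical.

```shell
# Sketch: keep only Critical and Non-Recoverable lines from SEL text on stdin.
# sel_severe is a hypothetical helper; it assumes severity is the first field.
sel_severe() {
    grep -E '^[[:space:]]*(Critical|Non-Recoverable)[[:space:]]'
}
```

For example, sel_severe < sel.txt (where sel.txt is a hypothetical file holding the pasted log) leaves the machine check and the two bus fatal errors from the listing above.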

                                Under "safe mode" the output is:

                                # lspci -vv -s 07:00.0
                                07:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2008 [Falcon] (rev 03)
                                        Subsystem: LSI Logic / Symbios Logic MegaRAID SAS 9240-8i
                                        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                        Latency: 0, Cache Line Size: 64 bytes
                                        Interrupt: pin A routed to IRQ 40
                                        Region 0: I/O ports at ec00 [size=256]
                                        Region 1: Memory at df2bc000 (64-bit, non-prefetchable) [size=16K]
                                        Region 3: Memory at df2c0000 (64-bit, non-prefetchable) [size=256K]
                                        Expansion ROM at df200000 [disabled] [size=256K]
                                        Capabilities: [50] Power Management version 3
                                                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                                                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                                        Capabilities: [68] Express (v2) Endpoint, MSI 00
                                                DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                                                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                                                DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                                                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                                                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                                                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                                                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
                                                        ClockPM- Surprise- LLActRep- BwNot-
                                                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                                                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                                DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                                                DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                                                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                                                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                                         Compliance De-emphasis: -6dB
                                                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                                                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                                
                                
                                

                                I really don't feel I should be having this issue, since this is all mainstream enterprise hardware. The only "odd" thing about this server is that I pulled out the PERC 6/i controller and installed a brand-new LSI controller because my drives exceed the 2TB limit of the PERC. Even when idle in Maintenance Mode, it will still randomly reboot with the same SEL output, which makes it too unstable for production or even a dev environment. It could be minutes, hours, or days between random reboots, probably due to the kernel accessing the controller for some health check or similar. In Maintenance Mode there are no VMs running, just XCP-ng.

                                The system is on a conditioned power source with battery backup, so I am ruling out dips and spikes. The iDRAC also reports on power quality, usage, and health, and everything is good. As I said before, this does not happen under Kali. I probably have boot flash drives for other OSes and distros I can try. But the fact is, if it were hardware-related, it would never be stable.

                                • olivierlambertO Offline
                                  olivierlambert Vates 🪐 Co-Founder CEO
                                  last edited by

                                  We'll have a more recent kernel to test, thanks to @r1's work. This could be interesting to test I suppose.

                                  • mpyuskoM Offline
                                    mpyusko @mpyusko
                                    last edited by

                                    Debian Stretch reports:

                                    # modinfo megaraid_sas | grep version
                                    version:        06.811.02.00-rc1
                                    srcversion:     64B34706678212A7A9CC1B1
                                    vermagic:       4.9.0-7-amd64 SMP mod_unload modversions
                                    

                                    and it completed successfully.

                                    Unfortunately, the NIC drivers are not configured for this boot flash, so I can't copy and paste the console output.

                                    • R Offline
                                      r1 XCP-ng Team
                                      last edited by

                                      @mpyusko I'm surprised that the host is "rebooting" even in "safe mode"; that's not expected behavior.

                                      Step 1:
                                      I know of only one situation, from the earlier Xen days (3.x), where BIOS CPU C-states caused the CPU to black out, resulting in an undetectable crash. To rule this out, please set your server to "performance mode" in the BIOS so that it does not randomly try to enter a power-saving mode.

                                      Step 2:
                                      Please update the Xen line in your grub configuration as follows:
                                      multiboot2 /boot/xen.gz noreboot no-mce ....

                                      Step 3:
                                      A newer driver version, 07.707.00.00, was released on 12 Sept. I'll make it available to you so we can isolate whether the "driver" is the reason for the crash.

                                      Step 4:
                                      I have a test kernel which may fix this issue, or at least help us isolate whether the "kernel" is the reason for the crash.

                                      [root@xcp-ng-kernel ~]# modinfo /usr/lib/modules/4.9.133/kernel/drivers/scsi/megaraid/megaraid_sas.ko | grep -i ver
                                      filename:       /usr/lib/modules/4.9.133/kernel/drivers/scsi/megaraid/megaraid_sas.ko
                                      description:    Avago MegaRAID SAS Driver
                                      version:        06.811.02.00-rc1
                                      srcversion:     E452D341082401C48444BC7
                                      vermagic:       4.9.133 SMP mod_unload modversions 
                                      [root@xcp-ng-kernel ~]# uname -a
                                      Linux xcp-ng-kernel 4.9.133 #1 SMP Sun Oct 14 15:48:31 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
                                      [root@xcp-ng-kernel ~]# 
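For Step 2, the edit to the Xen line can be scripted rather than typed by hand. The following is only a sketch: add_xen_opts is a hypothetical helper, the location of grub.cfg is not given in this thread and has to be supplied by the caller, and it is safest to run it on a copy of the file first. It adds each option right after xen.gz on every multiboot2 line that loads xen.gz, skipping options already present on that line.

```shell
# Sketch: add Xen command-line options to the multiboot2 xen.gz lines.
# add_xen_opts is a hypothetical helper. Usage: add_xen_opts GRUB_CFG OPT...
add_xen_opts() {
    local cfg="$1" tmp
    shift
    tmp="$(mktemp)" || return 1
    awk -v opts="$*" '
        $1 == "multiboot2" && $2 ~ /xen\.gz$/ {
            n = split(opts, want, " ")
            add = ""
            for (i = 1; i <= n; i++) {
                found = 0
                # Look for the option among the existing arguments of this line.
                for (f = 3; f <= NF; f++) if ($f == want[i]) found = 1
                if (!found) add = add " " want[i]
            }
            # Insert the missing options directly after the xen.gz path.
            if (add != "") sub(/xen\.gz/, "xen.gz" add)
        }
        { print }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

Called as add_xen_opts /path/to/grub.cfg noreboot no-mce, it leaves the rest of each line, including the options elided as "...." above, untouched.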
                                      
                                      • mpyuskoM Offline
                                        mpyusko @olivierlambert
                                        last edited by

                                        @olivierlambert It will be. Kali with 4.15 and Debian with 4.9 both do not exhibit the issue. However, XenServer and XCP-ng both do. I'd be interested to compare their compiler settings to see what they do and do not include.
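One reading of "compiler settings" is the kernel build configuration. Under that assumption, a sketch like the following compares two saved config files. The helper name config_diff is hypothetical, and whether a given distro ships its config as /boot/config-$(uname -r) is also an assumption to check on each host.

```shell
# Sketch: show which CONFIG_ options differ between two kernel config files.
# config_diff is a hypothetical helper. Usage: config_diff CONFIG_A CONFIG_B
config_diff() {
    local a b rc
    a="$(mktemp)" || return 2
    b="$(mktemp)" || return 2
    # Keep only options that are set, sorted so file order does not matter.
    grep -E '^CONFIG_' "$1" | sort > "$a"
    grep -E '^CONFIG_' "$2" | sort > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

Lines starting with "<" are set only in the first file and lines starting with ">" only in the second. Like diff itself, the function returns 1 when the files differ.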

                                        • mpyuskoM Offline
                                          mpyusko
                                          last edited by

                                          I just received an LSI SAS-9211 I ordered. I have the exact same controller running in a production environment for a web-hosting company, on similar architecture (HP DL380 G6), and it operates flawlessly with XS 7.1, so I figured I would try one in this machine. Initially I wanted the 9240 for its hardware-based RAID6. However, with ZFS support rolling out in the new versions of XS/XCP-ng, there are greater gains to be had there, so opting for a different JBOD-mode controller was a fair choice.

                                          Interesting things to note.

                                          • Both are brand-new LSI cards.
                                          • They each use different drivers.
                                          • The 9240 supports a broad range of RAID levels (0, 1, 10, 5, 6, etc.) and JBOD.
                                          • The 9211 supports RAID levels 0, 1, 10, 1E/10E and JBOD.
                                          • My system was originally configured for two drives in a 1TB RAID1 volume and 4 drives in a 9TB RAID5 volume.

                                          I removed the 9240 and installed the 9211, making sure port 0 = 0 and 1 = 1. I then booted the system and entered the LSI setup. Unlike the 9240, there was no option to import existing arrays. Rather than destroy everything on the 1TB array (the 9TB was still empty), I opted to just boot straight to XCP-ng 7.6 without any changes. The last time I shut down the server, I detached all the storage volumes from the VMs. (A quick trick I learned: detach volumes, export VMs, which only takes a couple of minutes and a few KB, then dd the flash drive to a backup flash drive. Upgrade Xen, reattach drives. This keeps you from risking your data, especially on critical machines.) When I booted this time, all the SRs were broken, but a quick repair brought them back, and when I reattached the volumes, the VMs booted. Remember, this was originally a hardware-level array. I'm still trying to peek into the array to see if both drives are functioning, but it appears healthy so far.
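The detach-then-export trick can be written down as a dry run. This sketch only prints xe commands and does not execute them. print_detach_plan is a hypothetical helper, the UUIDs and export path are supplied by the caller, and it assumes the VM is halted so that destroying a VBD just removes the link between the VM and its disk.

```shell
# Sketch: print (not run) the xe commands for detaching disks and exporting a VM.
# print_detach_plan is a hypothetical helper.
# Usage: print_detach_plan VM_UUID EXPORT_FILE VBD_UUID...
print_detach_plan() {
    local vm="$1" out="$2" vbd
    shift 2
    for vbd in "$@"; do
        # Destroying the VBD detaches the disk from the VM; the VDI itself stays.
        echo "xe vbd-destroy uuid=$vbd"
    done
    # With no disks attached, the export should contain little more than the VM definition.
    echo "xe vm-export uuid=$vm filename=$out"
}
```

Reviewing the printed commands before running any of them keeps the procedure reversible. Reattaching after the upgrade is a separate step not covered here.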

                                          The big question is: will the system still generate the same issue? Well, one is a MegaRAID controller and the other isn't, so they use different drivers. Here is the output...

                                          [root@vincent ~]# lspci |grep LSI
                                          07:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
                                          [root@vincent ~]# lspci -vv -s 07:00.0
                                          07:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
                                                  Subsystem: LSI Logic / Symbios Logic Device 3020
                                                  Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
                                                  Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                                                  Latency: 0, Cache Line Size: 64 bytes
                                                  Interrupt: pin A routed to IRQ 40
                                                  Region 0: I/O ports at ec00 [size=256]
                                                  Region 1: Memory at df2bc000 (64-bit, non-prefetchable) [size=16K]
                                                  Region 3: Memory at df2c0000 (64-bit, non-prefetchable) [size=256K]
                                                  Expansion ROM at df200000 [disabled] [size=512K]
                                                  Capabilities: [50] Power Management version 3
                                                          Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                                                          Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
                                                  Capabilities: [68] Express (v2) Endpoint, MSI 00
                                                          DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                                                                  ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
                                                          DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+
                                                                  RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                                                                  MaxPayload 256 bytes, MaxReadReq 512 bytes
                                                          DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                                                          LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s, Exit Latency L0s <64ns, L1 <1us
                                                                  ClockPM- Surprise- LLActRep- BwNot-
                                                          LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                                                                  ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                                                          LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                                          DevCap2: Completion Timeout: Range BC, TimeoutDis+, LTR-, OBFF Not Supported
                                                          DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled
                                                          LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                                                                   Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                                                   Compliance De-emphasis: -6dB
                                                          LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                                                                   EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                                                  Capabilities: [d0] Vital Product Data
                                          pcilib: sysfs_read_vpd: read failed: Input/output error
                                                          Not readable
                                                  Capabilities: [a8] MSI: Enable- Count=1/1 Maskable- 64bit+
                                                          Address: 0000000000000000  Data: 0000
                                                  Capabilities: [c0] MSI-X: Enable+ Count=15 Masked-
                                                          Vector table: BAR=1 offset=00002000
                                                          PBA: BAR=1 offset=00003800
                                                  Capabilities: [100 v1] Advanced Error Reporting
                                                          UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                                                          UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                                                          UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
                                                          CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                                                          CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+
                                                          AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
                                                  Capabilities: [138 v1] Power Budgeting <?>
                                                  Capabilities: [150 v1] Single Root I/O Virtualization (SR-IOV)
                                                          IOVCap: Migration-, Interrupt Message Number: 000
                                                          IOVCtl: Enable- Migration- Interrupt- MSE- ARIHierarchy+
                                                          IOVSta: Migration-
                                                          Initial VFs: 16, Total VFs: 16, Number of VFs: 0, Function Dependency Link: 00
                                                          VF offset: 1, stride: 1, Device ID: 0072
                                                          Supported Page Size: 00000553, System Page Size: 00000001
                                                          Region 0: Memory at 0000000000000000 (64-bit, non-prefetchable)
                                                          Region 2: Memory at 0000000000000000 (64-bit, non-prefetchable)
                                                          VF Migration: offset: 00000000, BIR: 0
                                                  Capabilities: [190 v1] Alternative Routing-ID Interpretation (ARI)
                                                          ARICap: MFVC- ACS-, Next Function: 0
                                                          ARICtl: MFVC- ACS-, Function Group: 0
                                                  Kernel driver in use: mpt3sas
                                          
                                          [root@vincent ~]#
                                          
                                          

                                          And specifically, the module in use:

                                          [root@vincent ~]# modinfo mpt3sas |grep -i version
                                          version:        22.00.00.00
                                          srcversion:     80624A1362CD953ED59AF65
                                          vermagic:       4.4.0+10 SMP mod_unload modversions
                                          [root@vincent ~]#
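
As an aside, the `grep -i version` filter above also matches `srcversion` and `vermagic`. A minimal sketch of pulling out one field at a time follows. The helper name `modinfo_field` is hypothetical (it is not part of kmod), and the sample text is simply copied from the output above so the snippet runs without the module installed.

```shell
#!/bin/sh
# modinfo_field: hypothetical helper (not part of kmod) that prints the value
# of one "key: value" line from modinfo-style text read on stdin.
modinfo_field() {
    awk -v key="$1" -F': *' '$1 == key { print $2; exit }'
}

# Sample input copied from the output above, so the sketch runs on any machine.
sample='version:        22.00.00.00
srcversion:     80624A1362CD953ED59AF65
vermagic:       4.4.0+10 SMP mod_unload modversions'

printf '%s\n' "$sample" | modinfo_field version
printf '%s\n' "$sample" | modinfo_field vermagic

# On the host itself (assumes the mpt3sas module is installed):
#   modinfo mpt3sas | modinfo_field version
#   modinfo -F version mpt3sas
```

The exact key match is what keeps `version` from also picking up `srcversion`. On a live host, `modinfo -F <field>` does the same job without a helper.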
                                          

                                          (Yes, my server is named after the robot in The Black Hole.)

                                          • mpyuskoM Offline
                                            mpyusko @mpyusko
                                            last edited by mpyusko

                                            @mpyusko BTW, yes, it did "repair" the SR, but it did not rebuild the array. It is running off one drive only; the other is now out of sync. Not a perfect transition, but it's nice to know my data isn't lost by simply swapping adapters. Clearly I'll need to build a software array to rectify the issue.
