XCP-ng
    dave (@dave)
    Reputation: 20, Profile views: 59, Posts: 26, Topics: 3, Followers: 0, Following: 0, Groups: 0

    Best posts made by dave

    • RE: Server Locks Up Periodically with ASRock X570D4I-2T AMD Ryzen 9 3900X and Intel X550-AT2

      R2rho Over the past few years we have built dozens of systems based on ASRock Rack mainboards and barebones. We started with the X470D4U, which worked really well. Since the X570D4, things have gotten messy, and the B650D4U is also affected. We saw random, periodic reboots and freezes, mostly after several weeks or months of uptime.

      Interestingly, we also have identical systems with an uptime of over a year. I would say about 60% of the systems were affected.

      BIOS version and attached hardware did not really matter.

      I once contacted ASRock support, but they did not know of a general problem. Instead, they suggested checking other components (which we also did).

      We went the RMA route, and some of the replacement mainboards we got back from RMA were also faulty.

      But the most recent mainboard returning from RMA seems to work... so maybe you're lucky 🙂

      posted in XCP-ng
    • RE: Alert: Control Domain Memory Usage

      stormi

      I upgraded an affected pool from 8.1 to 8.2 this weekend and installed the driver on one of the hosts. It's a little early, but as you can see, there seems to be a difference in memory usage:

      Stock driver:

      [screenshot: memory usage graph]

      stormi's driver:

      [screenshot: memory usage graph]

      You can already see memory usage growing slowly and steadily in "small steps" on the server with the stock driver. The server with stormi's driver seems to be stable.
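To put a number on "slowly growing in small steps", one option is to fit a straight line to periodic Dom0 memory samples and look at the slope. The sketch below is a plain least-squares fit in Python. The sample format (hours since the first sample, used MiB) and the 50 MiB/day threshold are assumptions for illustration. They are not values taken from the graphs.

```python
from typing import Sequence


def memory_growth_per_day(hours: Sequence[float], used_mib: Sequence[float]) -> float:
    """Least-squares slope of memory usage, expressed in MiB per day.

    hours:    sample times in hours since the first sample
    used_mib: used memory (MiB) at each sample time
    """
    n = len(hours)
    if n != len(used_mib) or n < 2:
        raise ValueError("need at least two paired samples")
    mean_x = sum(hours) / n
    mean_y = sum(used_mib) / n
    sxx = sum((x - mean_x) ** 2 for x in hours)
    if sxx == 0:
        raise ValueError("sample times must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, used_mib))
    return sxy / sxx * 24.0  # slope is MiB/hour; scale to MiB/day


def looks_like_leak(hours, used_mib, threshold_mib_per_day: float = 50.0) -> bool:
    # Hypothetical threshold: flag steady growth above ~50 MiB/day.
    return memory_growth_per_day(hours, used_mib) > threshold_mib_per_day


if __name__ == "__main__":
    # Illustrative numbers only, not the values from the graphs above.
    hours = [0, 24, 48, 72]
    print(memory_growth_per_day(hours, [2000, 2100, 2200, 2300]))  # ~100 MiB/day
    print(looks_like_leak(hours, [2000, 2001, 1999, 2000]))
```

A slope fit is less sensitive to a single spike than comparing the first and last sample. It still needs a few days of samples before "small steps" become visible.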

      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      Today another customer called:

      He had a host (pool master) with 16 GB of Dom0 memory and an uptime of 119 days.

      So far, all my affected systems have been using megaraid_sas, iSCSI, and 10G Intel NICs.

      megaraid_sas also appears in the module lists from MrMike and @inaki-martinez.

      This is the customer's lsmod output:

      Module                  Size  Used by
      tun                    49152  0
      ebtable_filter         16384  0
      ebtables               36864  1 ebtable_filter
      nls_utf8               16384  0
      cifs                  929792  0
      ccm                    20480  0
      fscache               380928  1 cifs
      iscsi_tcp              20480  16
      libiscsi_tcp           28672  1 iscsi_tcp
      libiscsi               61440  2 libiscsi_tcp,iscsi_tcp
      scsi_transport_iscsi   110592  3 iscsi_tcp,libiscsi
      bonding               176128  0
      bridge                196608  1 bonding
      8021q                  40960  0
      garp                   16384  1 8021q
      mrp                    20480  1 8021q
      stp                    16384  2 bridge,garp
      llc                    16384  3 bridge,stp,garp
      ipt_REJECT             16384  3
      nf_reject_ipv4         16384  1 ipt_REJECT
      xt_tcpudp              16384  8
      xt_multiport           16384  1
      xt_conntrack           16384  5
      nf_conntrack          163840  1 xt_conntrack
      nf_defrag_ipv6         20480  1 nf_conntrack
      nf_defrag_ipv4         16384  1 nf_conntrack
      libcrc32c              16384  1 nf_conntrack
      iptable_filter         16384  1
      dm_multipath           32768  0
      sunrpc                413696  1
      sb_edac                24576  0
      intel_powerclamp       16384  0
      crct10dif_pclmul       16384  0
      crc32_pclmul           16384  0
      ghash_clmulni_intel    16384  0
      pcbc                   16384  0
      aesni_intel           200704  0
      aes_x86_64             20480  1 aesni_intel
      crypto_simd            16384  1 aesni_intel
      cryptd                 28672  3 crypto_simd,ghash_clmulni_intel,aesni_intel
      glue_helper            16384  1 aesni_intel
      dm_mod                151552  285 dm_multipath
      ipmi_si                65536  0
      ipmi_devintf           20480  0
      intel_rapl_perf        16384  0
      ipmi_msghandler        61440  2 ipmi_devintf,ipmi_si
      i2c_i801               28672  0
      sg                     40960  0
      lpc_ich                28672  0
      acpi_power_meter       20480  0
      ip_tables              28672  2 iptable_filter
      x_tables               45056  7 ebtables,xt_conntrack,iptable_filter,xt_multiport,xt_tcpudp,ipt_REJECT,ip_tables
      hid_generic            16384  0
      usbhid                 57344  0
      hid                   122880  2 usbhid,hid_generic
      sd_mod                 53248  9
      isci                  163840  0
      ahci                   40960  0
      libsas                 86016  1 isci
      libahci                40960  1 ahci
      scsi_transport_sas     45056  2 isci,libsas
      xhci_pci               16384  0
      ehci_pci               16384  0
      igb                   233472  0
      libata                274432  3 libahci,ahci,libsas
      ehci_hcd               90112  1 ehci_pci
      xhci_hcd              258048  1 xhci_pci
      e1000e                286720  0
      megaraid_sas          167936  12
      scsi_dh_rdac           16384  0
      scsi_dh_hp_sw          16384  0
      scsi_dh_emc            16384  0
      scsi_dh_alua           20480  1
      scsi_mod              253952  15 isci,scsi_dh_emc,scsi_transport_sas,sd_mod,dm_multipath,scsi_transport_iscsi,scsi_dh_alua,iscsi_tcp,libsas,libiscsi,megaraid_sas,libata,sg,scsi_dh_rdac,scsi_dh_hp_sw
      ipv6                  548864  545 bridge
      crc_ccitt              16384  1 ipv6
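You can compare hosts against the pattern described above (megaraid_sas, iSCSI, 10G Intel NICs) by parsing `lsmod` output and checking which of those modules are loaded. This is a minimal Python sketch. The SUSPECT_GROUPS mapping is hypothetical. Treating ixgbe and i40e as "the 10G Intel NIC drivers" is an assumption, so adjust it to the actual hardware.

```python
from typing import Dict, List

# Hypothetical grouping of the modules mentioned in the post. ixgbe and i40e are
# assumed to stand in for "10G Intel NICs"; adjust to the actual hardware.
SUSPECT_GROUPS = {
    "megaraid_sas": {"megaraid_sas"},
    "iscsi": {"iscsi_tcp", "libiscsi"},
    "intel_10g_nic": {"ixgbe", "i40e"},
}


def parse_lsmod(text: str) -> Dict[str, dict]:
    """Parse `lsmod` output into {module: {"size", "used", "users"}}."""
    modules: Dict[str, dict] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Module":
            continue  # skip the header and blank or wrapped lines
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = {"size": int(fields[1]), "used": int(fields[2]), "users": users}
    return modules


def suspect_profile(modules: Dict[str, dict]) -> Dict[str, List[str]]:
    """For each suspect group, list which of its modules are loaded."""
    return {group: sorted(names.intersection(modules)) for group, names in SUSPECT_GROUPS.items()}


if __name__ == "__main__":
    import sys

    print(suspect_profile(parse_lsmod(sys.stdin.read())))
```

Usage, assuming the script is saved under a hypothetical name: `lsmod | python3 suspect_modules.py`. For the listing above, the intel_10g_nic group would come back empty, because the only Intel NIC drivers loaded there are igb and e1000e.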
      
      
      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      Don't restart openvswitch if you have active iSCSI storage attached.
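If restarting openvswitch has to be scripted, you can follow this advice with a guard that refuses while iSCSI sessions exist. This Python sketch assumes the usual Linux sysfs layout, where open-iscsi sessions appear as `session<N>` entries under /sys/class/iscsi_session. It only inspects state and does not restart anything itself.

```python
import os
from typing import List

# Assumption: the standard Linux sysfs location for open-iscsi sessions.
ISCSI_SESSION_DIR = "/sys/class/iscsi_session"


def active_iscsi_sessions(sysfs_dir: str = ISCSI_SESSION_DIR) -> List[str]:
    """Return session entries (e.g. 'session1') found under sysfs_dir."""
    try:
        entries = os.listdir(sysfs_dir)
    except FileNotFoundError:
        return []  # iSCSI transport not loaded: no sessions
    return sorted(e for e in entries if e.startswith("session"))


def safe_to_restart_openvswitch(sysfs_dir: str = ISCSI_SESSION_DIR) -> bool:
    # Refuse whenever any iSCSI session exists, since the storage traffic
    # may run over the bridges that openvswitch provides.
    return not active_iscsi_sessions(sysfs_dir)


if __name__ == "__main__":
    sessions = active_iscsi_sessions()
    if sessions:
        raise SystemExit(f"refusing: active iSCSI sessions {sessions}")
    print("no active iSCSI sessions found")
```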

      posted in Compute
    • RE: XcpNG - Xen kernel crash (FATAL TRAP: vector = 2 (nmi))

      @petr-bena Thanks.

      I can confirm: so far everything is stable for us, too (with nmi=dom0).

      olivierlambert Since I only have production servers with the affected hardware at the moment, I can't test the 8.1 beta right now, but I will try 8.1 final after the release. Do you think there is a real chance that this error won't appear in stock 8.1? Or should I make the same change there?
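nmi=dom0 is a Xen boot option. As far as I know, on XCP-ng it is usually added with `/opt/xensource/libexec/xen-cmdline --set-xen nmi=dom0` and takes effect after a reboot. To confirm that a host is actually running with it, check the `xen_commandline` line of `xl info`. Below is a minimal Python sketch. The parsing assumes the `key : value` layout of `xl info`, and the sample values in the test are invented.

```python
from typing import Dict


def xen_cmdline_options(xl_info_text: str) -> Dict[str, str]:
    """Extract the Xen boot options from `xl info` output."""
    for line in xl_info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "xen_commandline":
            options: Dict[str, str] = {}
            for token in value.split():
                name, eq, val = token.partition("=")
                options[name] = val if eq else ""  # bare flags map to ""
            return options
    return {}


def nmi_routed_to_dom0(xl_info_text: str) -> bool:
    return xen_cmdline_options(xl_info_text).get("nmi") == "dom0"


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(["xl", "info"], capture_output=True, text=True, check=True).stdout
    print("nmi=dom0 active:", nmi_routed_to_dom0(out))
```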

      posted in Compute
    • RE: XcpNG - Xen kernel crash (FATAL TRAP: vector = 2 (nmi))

      Hi!

      @petr-bena, have you had any crashes since you changed to nmi=dom0?

      We have a similar problem.

      We have 4 servers in different locations, all with the same hardware. Two are standalone and two are in pools:

      Supermicro X11SRA-RF Version: 1.02
      and
      Intel(R) Xeon(R) W-2145 CPU

      We tried all BIOS versions and a lot of different settings.

      Two of them are running XCP-ng 7.6 and have uptimes of 143 and 160 days, with no problems at all.

      The other two are running XCP-ng 8.0 and crash regularly, every 2 to 30 days, each time with the same error:

      NMI - PCI system error (SERR)

      A crash is more likely to happen if we produce high I/O and/or network load on those hosts.

      We suspected a hardware error, so we took one of the crashing servers to our workshop. We tested it for almost two weeks with Prime95, Memtest86, and anything else that came to mind.

      We were not able to produce a crash, nor did we detect any errors.

      We put this particular server back into production, and it crashed within the first few hours while we were migrating some VMs back to it (with storage migration).

      So I think it has something to do with XCP-ng 8.0.

      I will try the nmi=dom0 change next.

      (XEN) [395218.940883] 
      (XEN) [395218.940886] 
      (XEN) [395218.940886] NMI - PCI system error (SERR)
      (XEN) [395218.940889] ----[ Xen-4.11.1-7.8.xcpng8.0  x86_64  debug=n   Not tainted ]----
      (XEN) [395218.940889] CPU:    0
      (XEN) [395218.940890] RIP:    e008:[<ffff82d0802c6d38>] mwait_idle_with_hints+0xf8/0x160
      (XEN) [395218.940894] RFLAGS: 0000000000000046   CONTEXT: hypervisor
      (XEN) [395218.940896] rax: 0000000000000001   rbx: 000167730fc96b09   rcx: 0000000000000001
      (XEN) [395218.940897] rdx: 0000000000000000   rsi: ffff83006f667ef8   rdi: ffff83006f667fff
      (XEN) [395218.940898] rbp: 0000000000000000   rsp: ffff83006f667e00   r8:  0000000000000048
      (XEN) [395218.940899] r9:  000530dec0dbea3e   r10: 0000000000000008   r11: ffff83207cac1a68
      (XEN) [395218.940900] r12: 0000000000000000   r13: 0000000000000001   r14: 0000000000000001
      (XEN) [395218.940902] r15: ffff82d080573d00   cr0: 0000000080050033   cr4: 0000000000362660
      (XEN) [395218.940903] cr3: 00000012c899a000   cr2: ffffe783981eb000
      (XEN) [395218.940904] fsb: 0000000000000000   gsb: ffff88827bf40000   gss: 0000000000000000
      (XEN) [395218.940906] ds: 002b   es: 002b   fs: 0000   gs: 0000   ss: e010   cs: e008
      (XEN) [395218.940908] Xen code around <ffff82d0802c6d38> (mwait_idle_with_hints+0xf8/0x160):
      (XEN) [395218.940908]  89 f0 44 89 e9 0f 01 c9 <0f> b6 47 f5 80 a6 fd 00 00 00 fe 44 89 c1 0f 30
      (XEN) [395218.940912] Xen stack trace from rsp=ffff83006f667e00:
      (XEN) [395218.940913]    ffff83207cac4f08 0000000000000000 ffff83207cac4e90 ffff82d080573d00
      (XEN) [395218.940915]    ffff82d0805baa50 ffff82d080592b20 ffff83207cac4f08 ffff82d0802ccd07
      (XEN) [395218.940916]    000167730faf9015 0000000100000002 00000108000004c9 0000000000000000
      (XEN) [395218.940918]    0000000000000000 ffff82d08035b43e ffff83006f7fc000 ffffffffffffffff
      (XEN) [395218.940919]    ffff82d08035b400 ffff82d080573d00 ffff82d0805baa50 0000000000000000
      (XEN) [395218.940921]    0000000000000000 ffff82d080592b20 ffff83006f667fff ffff82d08026e505
      (XEN) [395218.940922]    ffff83006f7fc000 ffff83006f7fc000 ffff83006f7bf000 ffff83207cb69000
      (XEN) [395218.940924]    00000000ffffffff ffff8320246cc000 ffff82d080592b20 ffff88827ae3d700
      (XEN) [395218.940926]    ffff88827ae3d700 0000000000000000 0000000000000000 0000000000000005
      (XEN) [395218.940927]    ffff88827ae3d700 0000000000000246 ffffc9004106b930 0000000000000000
      (XEN) [395218.940928]    000000000001ca00 0000000000000000 ffffffff810013aa ffffffff8203c190
      (XEN) [395218.940930]    0000000000000000 0000000000000001 0000010000000000 ffffffff810013aa
      (XEN) [395218.940931]    000000000000e033 0000000000000246 ffffc90040113eb0 000000000000e02b
      (XEN) [395218.940933]    6f5b7c2b6f667fe0 6f5b7cae00097f76 6f5b7da200000000 6f5b79516f667fe0
      (XEN) [395218.940934]    0000e01000000000 ffff83006f7fc000 0000000000000000 0000000000362660
      (XEN) [395218.940936]    0000000000000000 800000207caef002 0000070100000000 6f5b883e00097f00
      (XEN) [395218.940938] Xen call trace:
      (XEN) [395218.940939]    [<ffff82d0802c6d38>] mwait_idle_with_hints+0xf8/0x160
      (XEN) [395218.940942]    [<ffff82d0802ccd07>] mwait-idle.c#mwait_idle+0x337/0x3d0
      (XEN) [395218.940945]    [<ffff82d08035b43e>] lstar_enter+0xae/0x120
      (XEN) [395218.940946]    [<ffff82d08035b400>] lstar_enter+0x70/0x120
      (XEN) [395218.940950]    [<ffff82d08026e505>] domain.c#idle_loop+0x85/0xb0
      (XEN) [395218.940951] 
      (XEN) [395218.940952] 
      (XEN) [395218.940953] ****************************************
      (XEN) [395218.940953] Panic on CPU 0:
      (XEN) [395218.940954] FATAL TRAP: vector = 2 (nmi)
      (XEN) [395218.940955] [error_code=0000] , IN INTERRUPT CONTEXT
      (XEN) [395218.940955] ****************************************
      (XEN) [395218.940956] 
      (XEN) [395218.940956] Reboot in five seconds...
      (XEN) [395218.940958] Executing kexec image on cpu0
      (XEN) [395218.941963] Shot down all CPUs
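When comparing crashes across several hosts, it can help to reduce each Xen panic log to the NMI reason, the CPU, the trap vector, and the call trace. The Python sketch below matches regular expressions against the log format shown above. The field names in the returned dict are my own choice, and logs from other Xen versions may need adjusted patterns.

```python
import re

PANIC_CPU = re.compile(r"Panic on CPU (\d+)")
FATAL_TRAP = re.compile(r"FATAL TRAP: vector = (\d+) \((\w+)\)")
NMI_REASON = re.compile(r"NMI - (.+)$")
TRACE_FRAME = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)")


def summarize_xen_panic(log: str) -> dict:
    """Pull the NMI reason, CPU, trap vector and call trace out of a Xen log."""
    summary = {"reason": None, "cpu": None, "vector": None, "trap": None, "call_trace": []}
    in_trace = False
    for line in log.splitlines():
        if in_trace:
            frame = TRACE_FRAME.search(line)
            if frame:
                summary["call_trace"].append(frame.group(1))
                continue
            in_trace = False  # the first non-frame line ends the trace
        if "Xen call trace:" in line:
            in_trace = True
            continue
        m = NMI_REASON.search(line)
        if m and summary["reason"] is None:
            summary["reason"] = m.group(1).strip()
        m = PANIC_CPU.search(line)
        if m:
            summary["cpu"] = int(m.group(1))
        m = FATAL_TRAP.search(line)
        if m:
            summary["vector"], summary["trap"] = int(m.group(1)), m.group(2)
    return summary
```

For the log above, this would yield the reason "PCI system error (SERR)", CPU 0, vector 2 (nmi), and a call trace that starts at mwait_idle_with_hints and ends in domain.c#idle_loop.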
      
      posted in Compute

    Latest posts made by dave

    • RE: Server Locks Up Periodically with ASRock X570D4I-2T AMD Ryzen 9 3900X and Intel X550-AT2

      R2rho Yeah, there are Supermicro AM5 systems that can handle a decent amount of load, for example ones based on the H13SAE-MF, such as:

      https://www.supermicro.com/de/products/system/mainstream/1u/as-1015a-mt
      (with less depth)

      They seem to be stable, but we currently have a small issue with the onboard graphics:

      https://xcp-ng.org/forum/topic/9976/black-screen-after-install-on-supermicro-h13sae-mf-with-ryzen-9950x/3?_=1734419502978

      posted in XCP-ng
    • RE: Troubleshooting Backups (in general)

      olivierlambert Yes, I will update and see whether it changes the behaviour at all.

      "For the rest, it's hard to answer without digging more."

      That's exactly what I was asking about: where can I find information to dig deeper?

      I'm looking for logs, error traces, or something similar.

      It's not that I just want this particular problem solved 🙂

      In my limited understanding of the internal processes, I think a backup process somehow dies and xo-server does not report or recognize this.

      But shouldn't it, somehow?

      posted in Xen Orchestra
    • RE: Troubleshooting Backups (in general)

      olivierlambert OK, sorry 🙂 Xen Orchestra commit 17027, installed on Debian 10.

      Just to add: I don't think this is specific to this version. I have seen such things happen for a few years on different systems.

      Until now, I have lived with it (quite well) 🙂

      I'm just trying to find out where I can get additional information and to improve my understanding of what's happening under the hood.

      posted in Xen Orchestra
    • RE: Troubleshooting Backups (in general)

      olivierlambert It's XO, as I wrote: Server 5.107.5 and Web 5.109.0.

      posted in Xen Orchestra
    • Troubleshooting Backups (in general)

      Hi, I want to learn how to troubleshoot backup job problems.

      When an error happens, the job status is usually set to failed and I get an error message that I can then trace and resolve.

      But occasionally this does not happen, as in the following example:

      I have a job that runs a full backup once a week and delta backups on the other days.

      This job started a full backup on Dec 31, 2022 at 5:00 AM. It was still in the "started" state 24 hours later.

      • there was no visible activity anymore (no tasks, no traffic)

      • 3 VMs were backed up successfully

      • the timeout of this job was set to 23 hours (so shouldn't it have been killed already?)
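The timeline above makes the expected behaviour concrete. A run that started at 05:00 on Dec 31 with a 23-hour timeout should have ended by 04:00 on Jan 1. Still being "started" 24 hours after the start therefore means it was an hour past its deadline. An external check for such runs could look like the Python sketch below. The run records (id, status, start, timeout) are a hypothetical structure, not Xen Orchestra's internal format or API.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def run_deadline(start: datetime, timeout: timedelta) -> datetime:
    return start + timeout


def overdue_runs(runs: Iterable[dict], now: datetime) -> List[str]:
    """IDs of runs still in the 'started' state after start + timeout.

    Each run is a hypothetical dict: {"id", "status", "start", "timeout"}.
    """
    return [
        r["id"]
        for r in runs
        if r["status"] == "started" and now > run_deadline(r["start"], r["timeout"])
    ]


if __name__ == "__main__":
    start = datetime(2022, 12, 31, 5, 0)
    run = {"id": "full-2022-12-31", "status": "started", "start": start, "timeout": timedelta(hours=23)}
    print(run_deadline(start, run["timeout"]))  # 2023-01-01 04:00:00
    print(overdue_runs([run], start + timedelta(hours=24)))  # ['full-2022-12-31']
```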

      Because this job was stuck in "Started", the runs on the following days failed with "job already running".

      On Jan 3, I restarted the xo-server service, and the job was then set to "interrupted" without an end time.

      The delta backup on Jan 4 started as planned, but is stuck again.

      I could probably reconfigure the job and it would be OK, but since this happens from time to time, I would like to understand what is going on.

      Where can I get additional information?
      Why is the job not set to failed, at least once the configured timeout has passed?

      BTW: I am running XO from source with Server 5.107.5 and Web 5.109.0, but I have had such things happen in earlier versions too.

      posted in Xen Orchestra
    • Backup on encrypted, exchangeable disks

      Hi,
      in small environments with a single host, we often use https://github.com/NAUbackup/VmBackup to back up to USB disks encrypted with LUKS.

      This basically does an "xe vm-export" and works well.

      We wrap this in a small shell script that identifies the currently attached USB drive, mounts it, runs the backup, and then unmounts it.
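The wrapper described above (identify the attached disk, unlock, mount, back up, unmount) can be sketched in Python as follows. This is not the actual script. The known LUKS UUIDs, key file path, mapper name, mountpoint, and backup command are hypothetical placeholders, and the backup command stands in for the VmBackup invocation. The try/finally ensures that the disk is unmounted and locked again even if the backup fails.

```python
import os
import subprocess
from typing import List, Optional, Sequence, Tuple

Cmd = List[str]


def find_attached_disk(known_uuids: Sequence[str], by_uuid_dir: str = "/dev/disk/by-uuid") -> Optional[str]:
    """Return the first known LUKS UUID whose device node is currently present."""
    for uuid in known_uuids:
        if os.path.exists(os.path.join(by_uuid_dir, uuid)):
            return uuid
    return None


def backup_plan(luks_uuid: str, key_file: str, mapper: str, mountpoint: str,
                backup_cmd: Cmd) -> Tuple[List[Cmd], List[Cmd], List[Cmd]]:
    """Build (setup, work, undo) command lists; undo[i] reverses setup[i]."""
    setup = [
        ["cryptsetup", "open", "--key-file", key_file, f"/dev/disk/by-uuid/{luks_uuid}", mapper],
        ["mount", f"/dev/mapper/{mapper}", mountpoint],
    ]
    undo = [
        ["cryptsetup", "close", mapper],
        ["umount", mountpoint],
    ]
    return setup, [backup_cmd], undo


def run_plan(setup: List[Cmd], work: List[Cmd], undo: List[Cmd]) -> None:
    done = 0
    try:
        for cmd in setup:
            subprocess.run(cmd, check=True)
            done += 1
        for cmd in work:
            subprocess.run(cmd, check=True)
    finally:
        # Undo only the setup steps that succeeded, newest first
        # (unmount before closing the LUKS mapping).
        for cmd in reversed(undo[:done]):
            subprocess.run(cmd, check=False)
```

Design note: undo[i] reverses setup[i]. This means only the steps that actually succeeded are rolled back, newest first.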

      Everything is handled by the host itself.

      Now I am thinking about using an XO VM for such tasks, and the following questions came to mind:

      • What would be the best way to expose the attached USB drives to an XO VM: NFS, USB passthrough, a udev SR, or something else?

      • How do I configure XO for backup remotes that get plugged in or unplugged?

      • Which backup strategy would be best suited for such a task?

      posted in Xen Orchestra
    • RE: Can't install guest utilities in pfSense

      Ascar Maybe this helps:

      https://forum.netgate.com/topic/97553/pfsense-2-3-on-xen-server/5

      Also, don't forget to disable offloading.
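For pfSense guests on XCP-ng, "disable offloading" usually means turning off TX checksum offload on the VM's VIFs from Dom0. Inside pfSense, it also means disabling hardware checksum offloading in the advanced networking settings. The small Python sketch below builds the `xe` calls, assuming the `other-config:ethtool-tx=off` key. The change typically only applies once the VIF is re-plugged, for example after rebooting the VM.

```python
from typing import List


def parse_minimal(output: str) -> List[str]:
    """Split the comma-separated output of an `xe ... --minimal` call."""
    return [item for item in output.strip().split(",") if item]


def vif_list_cmd(vm_uuid: str) -> List[str]:
    return ["xe", "vif-list", f"vm-uuid={vm_uuid}", "params=uuid", "--minimal"]


def disable_tx_offload_cmds(vif_uuids: List[str]) -> List[List[str]]:
    # Set the per-VIF TX checksum offload flag to off in other-config.
    return [["xe", "vif-param-set", f"uuid={u}", "other-config:ethtool-tx=off"] for u in vif_uuids]


if __name__ == "__main__":
    import subprocess
    import sys

    out = subprocess.run(vif_list_cmd(sys.argv[1]), capture_output=True, text=True, check=True).stdout
    for cmd in disable_tx_offload_cmds(parse_minimal(out)):
        subprocess.run(cmd, check=True)
```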

      posted in Compute
    • RE: Alert: Control Domain Memory Usage
      [10:36 xs03 ~]# free -m
                    total        used        free      shared  buff/cache   available
      Mem:          11921       11322         171         151         427         175
      Swap:          1023          37         986
      [10:36 xs03 ~]# ps -ef | grep CROND | wc -l
      1
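The `free -m` output above shows how little headroom is left: 175 MiB available out of 11921 MiB total, roughly 1.5%. To track this across hosts, you can parse the output with a small Python sketch. It assumes the procps-ng column layout shown above, including the `available` column.

```python
from typing import Dict, List


def parse_free_m(text: str) -> Dict[str, int]:
    """Parse `free -m` output (procps-ng layout with an 'available' column)."""
    rows: Dict[str, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("Mem:", "Swap:"):
            rows[fields[0].rstrip(":")] = [int(f) for f in fields[1:]]
    mem = rows["Mem"]  # total, used, free, shared, buff/cache, available
    swap = rows.get("Swap", [0, 0, 0])  # total, used, free
    return {"total": mem[0], "used": mem[1], "free": mem[2],
            "available": mem[5], "swap_used": swap[1]}


def available_percent(stats: Dict[str, int]) -> float:
    return 100.0 * stats["available"] / stats["total"]
```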
      
      

      BTW: None of my affected pools ever used dynamic memory.

      posted in Compute