XCP-ng

    inaki.martinez (@inaki.martinez)

    Latest posts made by inaki.martinez

    • RE: Alert: Control Domain Memory Usage

      @stormi currently loaded modules:

      Module                  Size  Used by
      bridge                196608  0 
      tun                    49152  0 
      nfsv3                  49152  1 
      nfs_acl                16384  1 nfsv3
      nfs                   307200  5 nfsv3
      lockd                 110592  2 nfsv3,nfs
      grace                  16384  1 lockd
      fscache               380928  1 nfs
      bnx2fc                159744  0 
      cnic                   81920  1 bnx2fc
      uio                    20480  1 cnic
      fcoe                   32768  0 
      libfcoe                77824  2 fcoe,bnx2fc
      libfc                 147456  3 fcoe,bnx2fc,libfcoe
      scsi_transport_fc      69632  3 fcoe,libfc,bnx2fc
      openvswitch           147456  53 
      nsh                    16384  1 openvswitch
      nf_nat_ipv6            16384  1 openvswitch
      nf_nat_ipv4            16384  1 openvswitch
      nf_conncount           16384  1 openvswitch
      nf_nat                 36864  3 nf_nat_ipv6,nf_nat_ipv4,openvswitch
      8021q                  40960  0 
      garp                   16384  1 8021q
      mrp                    20480  1 8021q
      stp                    16384  2 bridge,garp
      llc                    16384  3 bridge,stp,garp
      ipt_REJECT             16384  3 
      nf_reject_ipv4         16384  1 ipt_REJECT
      xt_tcpudp              16384  9 
      xt_multiport           16384  1 
      xt_conntrack           16384  6 
      nf_conntrack          163840  6 xt_conntrack,nf_nat,nf_nat_ipv6,nf_nat_ipv4,openvswitch,nf_conncount
      nf_defrag_ipv6         20480  2 nf_conntrack,openvswitch
      nf_defrag_ipv4         16384  1 nf_conntrack
      libcrc32c              16384  3 nf_conntrack,nf_nat,openvswitch
      iptable_filter         16384  1 
      dm_multipath           32768  0 
      sunrpc                413696  20 lockd,nfsv3,nfs_acl,nfs
      sb_edac                24576  0 
      intel_powerclamp       16384  0 
      crct10dif_pclmul       16384  0 
      crc32_pclmul           16384  0 
      ghash_clmulni_intel    16384  0 
      pcbc                   16384  0 
      aesni_intel           200704  0 
      aes_x86_64             20480  1 aesni_intel
      cdc_ether              16384  0 
      crypto_simd            16384  1 aesni_intel
      usbnet                 49152  1 cdc_ether
      cryptd                 28672  3 crypto_simd,ghash_clmulni_intel,aesni_intel
      glue_helper            16384  1 aesni_intel
      hid_generic            16384  0 
      mii                    16384  1 usbnet
      dm_mod                151552  1 dm_multipath
      usbhid                 57344  0 
      hid                   122880  2 usbhid,hid_generic
      sg                     40960  0 
      intel_rapl_perf        16384  0 
      mei_me                 45056  0 
      mei                   114688  1 mei_me
      lpc_ich                28672  0 
      i2c_i801               28672  0 
      ipmi_si                65536  0 
      acpi_power_meter       20480  0 
      ipmi_devintf           20480  0 
      ipmi_msghandler        61440  2 ipmi_devintf,ipmi_si
      ip_tables              28672  2 iptable_filter
      x_tables               45056  6 xt_conntrack,iptable_filter,xt_multiport,xt_tcpudp,ipt_REJECT,ip_tables
      sd_mod                 53248  4 
      xhci_pci               16384  0 
      ehci_pci               16384  0 
      tg3                   192512  0 
      xhci_hcd              258048  1 xhci_pci
      ehci_hcd               90112  1 ehci_pci
      ixgbe                 380928  0 
      megaraid_sas          167936  3 
      scsi_dh_rdac           16384  0 
      scsi_dh_hp_sw          16384  0 
      scsi_dh_emc            16384  0 
      scsi_dh_alua           20480  0 
      scsi_mod              253952  13 fcoe,scsi_dh_emc,sd_mod,dm_multipath,scsi_dh_alua,scsi_transport_fc,libfc,bnx2fc,megaraid_sas,sg,scsi_dh_rdac,scsi_dh_hp_sw
      ipv6                  548864  926 bridge,nf_nat_ipv6
      crc_ccitt              16384  1 ipv6
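
      (A minimal sketch, not part of the original post: the listing above is lsmod output, and one quick way to spot an unusually large module in Dom0 is to sort it by the Size column, as below.)

      # Sort loaded modules by size (bytes), largest first
      lsmod | tail -n +2 | sort -k2,2nr | head -20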
      
      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      @stormi

      • grub.cfg grub.txt
      • xl top for Dom0
        Domain-0 -----r 5461432 0.0 8388608 1.6 8388608 1.6 16 0 0 0 0 0 0 0 0 0 0
      • xe param list for Dom0 (memory)
                               memory-target ( RO): <unknown>
                             memory-overhead ( RO): 118489088
                           memory-static-max ( RW): 8589934592
                          memory-dynamic-max ( RW): 8589934592
                          memory-dynamic-min ( RW): 8589934592
                           memory-static-min ( RW): 4294967296
                            last-boot-record ( RO): '('struct' ('uuid' '5e1386d5-e2c9-47eb-8445-77674d76c803') ('allowed_operations' ('array')) ('current_operations' ('struct')) ('power_state' 'Running') ('name_label' 'Control domain on host: bc2-vi-srv03') ('name_description' 'The domain which manages physical devices and manages other domains') ('user_version' '1') ('is_a_template' ('boolean' '0')) ('is_default_template' ('boolean' '0')) ('suspend_VDI' 'OpaqueRef:NULL') ('resident_on' 'OpaqueRef:946c6678-044a-62ab-2a98-f8c93e34ade9') ('affinity' 'OpaqueRef:946c6678-044a-62ab-2a98-f8c93e34ade9') ('memory_overhead' '84934656') ('memory_target' '4294967296') ('memory_static_max' '4294967296') ('memory_dynamic_max' '4294967296') ('memory_dynamic_min' '4294967296') ('memory_static_min' '4294967296') ('VCPUs_params' ('struct')) ('VCPUs_max' '48') ('VCPUs_at_startup' '48') ('actions_after_shutdown' 'destroy') ('actions_after_reboot' 'destroy') ('actions_after_crash' 'destroy') ('consoles' ('array' 'OpaqueRef:aa16584e-48c6-70a3-98c0-a2ee63b3cfa4' 'OpaqueRef:01efe105-d6fe-de5e-e214-9c6e2b5be498')) ('VIFs' ('array')) ('VBDs' ('array')) ('crash_dumps' ('array')) ('VTPMs' ('array')) ('PV_bootloader' '') ('PV_kernel' '') ('PV_ramdisk' '') ('PV_args' '') ('PV_bootloader_args' '') ('PV_legacy_args' '') ('HVM_boot_policy' '') ('HVM_boot_params' ('struct')) ('HVM_shadow_multiplier' ('double' '1')) ('platform' ('struct')) ('PCI_bus' '') ('other_config' ('struct' ('storage_driver_domain' 'OpaqueRef:166e5128-4906-05cc-bb8d-ec99a3c13dc0') ('is_system_domain' 'true'))) ('domid' '0') ('domarch' 'x64') ('last_boot_CPU_flags' ('struct')) ('is_control_domain' ('boolean' '1')) ('metrics' 'OpaqueRef:2207dad4-d07f-d7f9-9ebb-796072aa37e1') ('guest_metrics' 'OpaqueRef:NULL') ('last_booted_record' '') ('recommendations' '') ('xenstore_data' ('struct')) ('ha_always_run' ('boolean' '0')) ('ha_restart_priority' '') ('is_a_snapshot' ('boolean' '0')) ('snapshot_of' 'OpaqueRef:NULL') ('snapshots' ('array')) ('snapshot_time' ('dateTime.iso8601' '19700101T00:00:00Z')) ('transportable_snapshot_id' '') ('blobs' ('struct')) ('tags' ('array')) ('blocked_operations' ('struct')) ('snapshot_info' ('struct')) ('snapshot_metadata' '') ('parent' 'OpaqueRef:NULL') ('children' ('array')) ('bios_strings' ('struct')) ('protection_policy' 'OpaqueRef:NULL') ('is_snapshot_from_vmpp' ('boolean' '0')) ('snapshot_schedule' 'OpaqueRef:NULL') ('is_vmss_snapshot' ('boolean' '0')) ('appliance' 'OpaqueRef:NULL') ('start_delay' '0') ('shutdown_delay' '0') ('order' '0') ('VGPUs' ('array')) ('attached_PCIs' ('array')) ('suspend_SR' 'OpaqueRef:NULL') ('version' '0') ('generation_id' '') ('hardware_platform_version' '0') ('has_vendor_device' ('boolean' '0')) ('requires_reboot' ('boolean' '0')) ('reference_label' ''))'
                                      memory (MRO): <not in database>
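
      (A minimal sketch, not part of the original post: data like the above can typically be collected in Dom0 with the commands below; the <dom0-uuid> placeholder is an assumption, not a value from the thread.)

      # Live memory/CPU figures per domain (source of the "xl top" line above)
      xl top
      # Memory parameters of the control domain VM; replace <dom0-uuid> with the Dom0 VM UUID
      xe vm-param-list uuid=<dom0-uuid> | grep -i memory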
      
      
      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      @stormi this is the current ps aux output: ps-aux.txt
      @r1 the sar file is too big to attach here, so here is a link: sar.txt (valid for one day), along with the kernel OOM messages: messages.txt. From what I can see, only around 3GB were accounted for when the OOM killer was triggered (Dom0 has 8GB of memory available).
      In this case rsyslog was killed, but I have seen xapi killed on other occasions. I can dig up those logs if they would help.
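
      (A minimal sketch, not part of the original post: one way to compare the totals reported by the OOM killer against Dom0's memory is to look at the kernel log around the event and at /proc/meminfo, roughly as below.)

      # Kernel log context around the OOM event
      dmesg | grep -i -B 5 -A 30 'out of memory'
      # Overall memory accounting in Dom0, including slab usage
      grep -E 'MemTotal|MemFree|Buffers|Cached|Slab|SReclaimable|SUnreclaim' /proc/meminfo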

      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      @stormi This is the info for the current pool master with memory issues. The machine's last OOM event was on October 12th.
      Slabtop: slabopt.txt
      meminfo: meminfo.txt
      sorted top: top_memsort.png
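
      (A minimal sketch, not from the original post: the attachments above can usually be produced in Dom0 with commands along these lines; the exact options are an assumption.)

      slabtop -o -s c        # slab caches, one-shot, sorted by cache size
      cat /proc/meminfo      # overall memory accounting
      top -b -n 1 -o RES     # batch-mode top sorted by resident memory (RES)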

      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      @olivierlambert we will upgrade our test environment and see if the issue happens again.
      @stormi nothing is using a particularly large amount of RAM; here is a listing of processes sorted by their RSS: rss_usage.txt
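
      (A minimal sketch, not from the original post: a listing like rss_usage.txt can typically be produced with ps, sorted by resident set size.)

      # Processes sorted by RSS (KiB), largest first
      ps -eo pid,rss,comm --sort=-rss | head -20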

      posted in Compute
    • RE: Alert: Control Domain Memory Usage

      Just to add that we too have been experiencing the issue pointed out by Dave since we upgraded to 8.0. Even though the in-place upgrade did bump the Dom0 memory from 4 to 8GB, we started to get out-of-memory errors on the pool master after an uptime of around 60 to 70 days.

      Our current workaround, as also mentioned in the thread, is to increase the Dom0 memory to 32GB, but this only buys us more time until the next reboot.
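
      (A minimal sketch, not part of the original post: on XCP-ng/XenServer the Dom0 memory allocation is normally changed with the xen-cmdline helper and takes effect after a host reboot; the 32 GiB value here just matches what is described above.)

      # Set Dom0 memory to 32 GiB (applies after the next host reboot)
      /opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=32768M,max:32768M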

      The main problem is that once this happens, backups start to fail and the only solution is to empty the host and reboot, which can be disruptive to some large VMs that don't seem to support live migration very well.

      To add some more data, here is a graph of the memory consumption on the master of one of our pools; uptime starts at around week 33 and week 43 is the current time (a reboot and memory increase are pending for that host). This is a pool of three hosts and 80 VMs.
      master-03.png

      Let me know if we can help with log data or anything else.

      posted in Compute