XCP-ng

    Posts by McHenry

    • pfSense Guest Tools

      I have been using pfSense with xcp-ng for a while now without installing the guest tools.

      Due to some networking complications, I have decided to install the guest tools to rule them out as the cause.

      Q1) Are the guest tools required on pfSense and what do they do?

      Q2) Are these tools being maintained?
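
      For reference, a minimal sketch of how the guest utilities are usually installed on a FreeBSD-based guest such as pfSense, assuming the standard FreeBSD xe-guest-utilities package and its xenguest rc service (check the Netgate/XCP-ng docs for your release before running):

      # Install the Xen guest utilities from the FreeBSD package repository
      pkg install xe-guest-utilities

      # Enable the guest agent at boot and start it (service name assumed from the FreeBSD port)
      echo 'xenguest_enable="YES"' >> /etc/rc.conf.local
      service xenguest start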

      posted in Management
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      @olivierlambert

      Host started and issue resolved.

      posted in Backup
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      Appears to be the same as:
      https://xcp-ng.org/forum/topic/1751/smgc-stuck-with-xcp-ng-8-0?_=1761802212787

      It appears this snapshot is locked by a slave host that is currently offline.

      Oct 30 08:30:09 HST106 SMGC: [1866514] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check'
      

      When using shared storage, how does a snapshot become locked by a host?

      In the scenario where a slave host is offline, how can this lock be cleared?
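
      A rough way to see which host the GC is waiting on, and what state the pool hosts are in, from the pool master (a sketch using the standard xe CLI; the lock path is the one from the log above):

      # List pool hosts and whether they are enabled (an offline slave will show up here)
      xe host-list params=uuid,name-label,enabled

      # Watch the storage GC to see which slave it keeps checking with
      grep SMGC /var/log/SMlog | tail -n 50

      # The lock files referenced in the log live under the SR's lock directory
      ls -l /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/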

      posted in Backup
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      @olivierlambert

      As with yesterday, the backups are still being "Skipped". Checking the logs, I see the following messages being repeated:

      Oct 30 08:30:09 HST106 SMGC: [1866514] Found 1 orphaned vdis
      Oct 30 08:30:09 HST106 SM: [1866514] lock: tried lock /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr, acquired: True (exists: True)
      Oct 30 08:30:09 HST106 SMGC: [1866514] Found 1 VDIs for deletion:
      Oct 30 08:30:09 HST106 SMGC: [1866514]   *d4a17b38(100.000G/21.652G?)
      Oct 30 08:30:09 HST106 SMGC: [1866514] Deleting unlinked VDI *d4a17b38(100.000G/21.652G?)
      Oct 30 08:30:09 HST106 SMGC: [1866514] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check', {'path': '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/d4a17b38-5a3c-438a-b394-fcbb64784499.vhd'})
      Oct 30 08:30:09 HST106 SM: [1866514] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
      Oct 30 08:30:09 HST106 SM: [1866514] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
      Oct 30 08:30:09 HST106 SMGC: [1866514] GC process exiting, no work left
      Oct 30 08:30:09 HST106 SM: [1866514] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
      Oct 30 08:30:09 HST106 SMGC: [1866514] In cleanup
      Oct 30 08:30:09 HST106 SMGC: [1866514] SR be74 ('Shared NAS002') (166 VDIs in 27 VHD trees): no changes
      Oct 30 08:30:09 HST106 SM: [1866514] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
      Oct 30 08:30:09 HST106 SM: [1866514] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
      Oct 30 08:30:09 HST106 SM: [1866514] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
      

      It appears the unlinked VDI is never deleted. Could this be blocking the backups, and should it be deleted manually?

      Deleting unlinked VDI *d4a17b38(100.000G/21.652G?)
      

      In regards to the following line, I can identify the VM UUID; however, is the second UUID (d4a17b38-5a3c-438a-b394-fcbb64784499.vhd) a snapshot?

      Oct 30 08:30:09 HST106 SMGC: [1866514] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check', {'path': '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/d4a17b38-5a3c-438a-b394-fcbb64784499.vhd'})
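
      One way to check what that second UUID refers to is to query it directly on the pool master (a sketch using the standard xe CLI; the UUIDs are taken from the log line above):

      # Look up the VDI whose VHD file appears in the log and check whether it is a snapshot
      xe vdi-list uuid=d4a17b38-5a3c-438a-b394-fcbb64784499 params=uuid,name-label,is-a-snapshot,snapshot-of

      # List all VDIs on the shared SR to spot the one the GC considers orphaned
      xe vdi-list sr-uuid=be743b1c-7803-1943-0a70-baf5fcbfeaaf params=uuid,name-label,is-a-snapshot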
      
      posted in Backup
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      I have the following entries in the logs, repeated over and over. I am not sure if this is a problem:

      Oct 29 15:25:08 HST106 SMGC: [1009624] Found 1 orphaned vdis
      Oct 29 15:25:08 HST106 SM: [1009624] lock: tried lock /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr, acquired: True (exists: True)
      Oct 29 15:25:08 HST106 SMGC: [1009624] Found 1 VDIs for deletion:
      Oct 29 15:25:08 HST106 SMGC: [1009624]   *d4a17b38(100.000G/21.652G?)
      Oct 29 15:25:08 HST106 SMGC: [1009624] Deleting unlinked VDI *d4a17b38(100.000G/21.652G?)
      Oct 29 15:25:08 HST106 SMGC: [1009624] Checking with slave: ('OpaqueRef:16797af5-c5d1-08d5-0e26-e17149c2807b', 'nfs-on-slave', 'check', {'path': '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/d4a17b38-5a3c-438a-b394-fcbb64784499.vhd'})
      Oct 29 15:25:08 HST106 SM: [1009624] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
      Oct 29 15:25:08 HST106 SM: [1009624] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
      Oct 29 15:25:08 HST106 SMGC: [1009624] GC process exiting, no work left
      Oct 29 15:25:08 HST106 SM: [1009624] lock: released /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
      Oct 29 15:25:08 HST106 SMGC: [1009624] In cleanup
      Oct 29 15:25:08 HST106 SMGC: [1009624] SR be74 ('Shared NAS002') (166 VDIs in 27 VHD trees): no changes
      Oct 29 15:25:08 HST106 SM: [1009624] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/running
      Oct 29 15:25:08 HST106 SM: [1009624] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/gc_active
      Oct 29 15:25:08 HST106 SM: [1009624] lock: closed /var/lock/sm/be743b1c-7803-1943-0a70-baf5fcbfeaaf/sr
      
      posted in Backup
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      @olivierlambert

      I spoke too soon. The backups started working, however the problem has returned.

      I do see 44 items waiting to coalesce. This is new, as previously these would coalesce faster without causing this issue.

      Is there a reason the coalesce is taking longer now, or is there a way I can add resources to speed up the process?
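
      Coalescing is handled by the storage manager's garbage collector running in the control domain on the host, so dom0's resources are what matter; a rough way to watch its progress from the pool master (a sketch, standard tools only):

      # Follow the coalesce / garbage-collector activity in the SM log
      tail -f /var/log/SMlog | grep -i -E 'coalesce|SMGC'

      # Coalescing runs in dom0, so check dom0's vCPU and memory allocation
      xe vm-list is-control-domain=true params=uuid,name-label,VCPUs-max,memory-static-max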

      posted in Backup
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      @olivierlambert

      Is it XO or xcp-ng that manages the coalescing? Can more resources be applied to assist?

      posted in Backup
      M
      McHenry
    • RE: Job canceled to protect the VDI chain

      @olivierlambert

      I think you are correct. When I checked the Health view it showed 46 to coalesce, and the number then started dropping down to zero. Now the backups appear to be running again 🙂

      I have never seen this before and I am curious as to why it appeared yesterday.

      My fear was storage corruption, as with shared storage it would impact all VMs. I checked TrueNAS and everything appears to be healthy.


      posted in Backup
      M
      McHenry
    • Job canceled to protect the VDI chain

      Yesterday our backup job started failing for all VMs with the message:
      "Job canceled to protect the VDI chain"


      I have checked the docs regarding VDI chain protection:
      https://docs.xen-orchestra.com/backup_troubleshooting#vdi-chain-protection

      The xcp-ng logs do not show any errors.

      I am using TrueNAS as shared storage.
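
      For troubleshooting, a rough way to confirm whether the garbage collector is actively coalescing on the pool master, and to inspect the VHD chains on the NFS SR (a sketch; the SR path is taken from the SMGC log excerpts elsewhere in this thread, and vhd-util options can vary between releases, so check its usage on your host first):

      # Check whether the SM garbage collector is currently coalescing anything
      grep -E 'SMGC|Coalesce' /var/log/SMlog | tail -n 30

      # Optionally show the VHD parent/child trees on the NFS SR
      vhd-util scan -f -p -m '/var/run/sr-mount/be743b1c-7803-1943-0a70-baf5fcbfeaaf/*.vhd'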

      posted in Backup
      M
      McHenry
    • RE: VM association with shared storage

      @acebmxer

      Perfect, thanks. The issue is that we have an IP address locked to that host, so the router needs to live there. Host affinity looks like the correct solution.


      Does host affinity also prevent the VM from being migrated manually?
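
      For reference, host affinity in XAPI is a soft placement preference (it influences where the VM is started rather than blocking migration); setting it from the CLI looks roughly like this, with the UUIDs as placeholders:

      # Set the VM's preferred (home) host
      xe vm-param-set uuid=<vm-uuid> affinity=<host-uuid>

      # Verify the setting
      xe vm-param-get uuid=<vm-uuid> param-name=affinity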

      posted in Management
      M
      McHenry
    • RE: VM association with shared storage

      @ph7

      When a rolling pool update is performed, I imagine the VMs are moved off the host being updated to another host. When the update is completed, are the VMs moved back again?

      I ask as I have a VM that must run on a particular host.

      posted in Management
      M
      McHenry
    • RE: VMs on OVH with Additional IP unable to be agile

      @olivierlambert

      Thank you. I never thought of using an automation. I'll look into that.

      OVH does allow you to associate an additional IP with either a server or a vRack. We use the vRack for the LAN, so I think we can only associate the additional IP with the server.

      posted in Management
      M
      McHenry
    • VMs on OVH with Additional IP unable to be agile

      We have recently implemented shared storage for our VMs and they are now agile; this works well.

      We have xcp-ng hosted on OVH Cloud and have "Additional IPs" for various VMs to allow external access.

      In OVH the additional IPs are associated with a server (xcp-ng host), so if an agile VM is moved to another host the additional IP no longer connects. The only solution I have found is to move the additional IP to the new host when the VM is migrated. This works, however a more seamless solution would be better.

      Is there a better way to manage VM migration when the VM has an additional IP?
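
      One hypothetical approach would be an automation that watches which host the VM is resident on and, when it changes, calls the OVHcloud API to move the Additional IP to that server; the XCP-ng side of such a check could look roughly like this (the VM name and host UUID are placeholders):

      # Find the host the VM is currently running on
      xe vm-list name-label=<vm-name> params=name-label,resident-on

      # Resolve that host UUID to a name and management address for the OVH API call
      xe host-list uuid=<host-uuid-from-above> params=name-label,address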

      posted in Management
      M
      McHenry
    • RE: VM association with shared storage

      @ph7

      To automatically update the hosts? I expect a host reboot would be required for this to work, however how can that be automated if the host has running VMs?

      posted in Management
      M
      McHenry
    • RE: Alarms in XO

      @Danp @DustinB @ph7

      This host does not run any VMs; it is just used for CR.

      I've increased the dom0 RAM to 4GB and there have been no more alarms.
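
      For anyone following along, the usual way dom0 memory is increased on XCP-ng is via xen-cmdline followed by a host reboot (a sketch matching the 4GB value mentioned above; double-check against the XCP-ng docs before applying):

      # Set dom0 memory to 4 GiB (takes effect after the host is rebooted)
      /opt/xensource/libexec/xen-cmdline --set-xen dom0_mem=4096M,max:4096M

      # Confirm the current dom0 memory from inside dom0
      free -m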


      posted in Management
      M
      McHenry
    • Alarms in XO

      When I check "Health" in XO everything appears fine, but I do see a number of Alarms; the problem is I have no idea what they mean. I do not think I have any system performance issues but am sure these should not be ignored.

      HST150 is a host for disaster recovery using CR.
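
      The alarms XO shows correspond to XAPI messages raised by the hosts; listing them from dom0 shows the raw alarm name and body (a sketch using the standard xe CLI; the filter assumes the usual xe list field filtering and the host UUID is a placeholder):

      # List recent alarm/message entries raised in the pool
      xe message-list

      # Narrow to a single host by its UUID if the list is long
      xe message-list obj-uuid=<host-uuid>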


      posted in Management
      M
      McHenry
    • RE: VM association with shared storage

      @olivierlambert

      Why did I not do this sooner 🙂

      posted in Management
      M
      McHenry
    • VM association with shared storage

      I have recently changed our setup to use FreeNAS shared storage for VMs. Now that I have shared storage and two hosts, I can move running VMs between hosts. This makes it easy to patch & restart a host by moving the VMs off it first.

      As opposed to moving the VMs, I could schedule a maintenance window, shut down the VMs, then patch and reboot the host. In this scenario, if the host were to fail I expect nothing would be lost, as the shared storage is independent. Then I can simply start the VMs on the remaining host, meaning there is no hard link between the host and VM.

      Does this sound correct?
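
      If a host did fail with VMs still assigned to it, recovery from the shared SR typically involves something like the following sketch (the forced power-state reset must only be used once the failed host is definitely down; the UUIDs and host name are placeholders):

      # Tell XAPI the VM is no longer running anywhere (only safe when the old host is really dead)
      xe vm-reset-powerstate uuid=<vm-uuid> --force

      # Start the VM on the surviving host from the shared SR
      xe vm-start uuid=<vm-uuid> on=<surviving-host-name>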

      posted in Management
      M
      McHenry
    • RE: Backup Issue: "timeout reached while waiting for OpaqueRef"

      @stevewest15

      I believe this issue was resolved when the health check system was changed to detect network connectivity at startup, so it did not need to wait for the entire VM to boot. This requires the Xen tools to be installed. I have not had an issue since this change.
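
      A quick way to confirm a guest is reporting the data the health check relies on is to look at the tools and networks fields XAPI exposes for the VM (a sketch; the VM name is a placeholder):

      # Show whether the guest tools are reporting and which IPs the guest advertises
      xe vm-list name-label=<vm-name> params=name-label,PV-drivers-version,networks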

      posted in Backup
      M
      McHenry