XCP-ng

    Parent VHD Missing Errors During SMB Backup

    • planedrop (Top contributor)

      I've been investigating this for some time without finding a solution, so I'm hoping someone can point me in the right direction or confirm they've seen the same thing. I'll also try to replicate the issue in my lab, but so far I haven't been able to.

      In a production setup I have quite a few VMs that back up nightly to a very fast TrueNAS CORE server. The backups generally work well, but every once in a while a VM backup reports as failed with the errors below. It's almost always a single VM, and after 3 or 4 additional backups the error goes away (despite retention being 7 and the full backup interval being 30 days). Also, if I wipe that VM's directory from the TrueNAS box, its next backup succeeds.

      • VHD Check Error
      • Parent VHD is Missing
      • Under the remote logs: VHD Check Error
      • EBUSY: resource busy or locked, unlink (VHD path)
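One way to see which file triggers "Parent VHD is Missing" is to read the parent name recorded inside each differencing VHD and check that the file exists. The following is only a rough sketch, not how XO itself checks the chain. The header offsets follow the published VHD format specification; treating the stored name as a UTF-16 big-endian path relative to the child's directory is an assumption about how the backup files are written, and `parent_name` and `missing_parents` are hypothetical helper names.

```python
import os
import struct

FOOTER_SIZE = 512
DISK_TYPE_DIFFERENCING = 4  # VHD spec: 2 = fixed, 3 = dynamic, 4 = differencing


def parent_name(vhd_path):
    """Return the parent name stored in a differencing VHD, or None if it has no parent."""
    with open(vhd_path, "rb") as f:
        # The footer occupies the last 512 bytes of the file.
        f.seek(-FOOTER_SIZE, os.SEEK_END)
        footer = f.read(FOOTER_SIZE)
        if footer[:8] != b"conectix":
            raise ValueError(f"{vhd_path}: no VHD footer found")
        (data_offset,) = struct.unpack(">Q", footer[16:24])
        (disk_type,) = struct.unpack(">I", footer[60:64])
        if disk_type != DISK_TYPE_DIFFERENCING:
            return None
        # For dynamic and differencing disks, data_offset points at the sparse header.
        f.seek(data_offset)
        header = f.read(1024)
    if header[:8] != b"cxsparse":
        raise ValueError(f"{vhd_path}: no dynamic disk header found")
    # Parent Unicode name: 512 bytes at offset 64, null padded.
    # Assumption: encoded as UTF-16 big-endian.
    raw = header[64:576]
    return raw.decode("utf-16-be", errors="replace").split("\x00", 1)[0]


def missing_parents(directory):
    """List (child_path, parent_name) pairs whose parent file is absent.

    Assumption: the stored parent name is relative to the child's directory.
    """
    missing = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".vhd"):
            continue
        child = os.path.join(directory, name)
        parent = parent_name(child)
        if parent is None:
            continue
        parent_path = os.path.normpath(os.path.join(directory, parent))
        if not os.path.exists(parent_path):
            missing.append((child, parent))
    return missing
```

Calling `missing_parents()` on the directory that holds one disk's VHD files (on the mounted remote or directly on the TrueNAS dataset) returns an empty list when every delta can find its parent. It only reads files, so it should be safe to run between backup jobs.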

      It's worth noting that these errors always seem to come up while the TrueNAS machine is backing up its directory to a cloud provider. That would make sense if TrueNAS were working with the VHD that XCP-ng was trying to access. However, TrueNAS is set up to snapshot first, and my understanding is that TrueNAS then ONLY touches the snapshot during the backup process, so the file shouldn't be locked. I may be wrong, but long ago I did NOT have TrueNAS set to snapshot before cloud backups and I got this same EBUSY error ALL the time; the issue (mostly) went away after enabling "snapshot first".

      For reference, this Reddit post discusses the "snapshot first" feature: https://www.reddit.com/r/freenas/comments/gpz701/clarity_on_take_snapshot_for_cloud_sync_tasks/

      In short, it appears TrueNAS should snapshot the directory, back up that snapshot, and then remove it, so that "live" data isn't affected or written to during the backup.

      Also, my TrueNAS machine starts its backups BEFORE XO does, so the snapshot shouldn't be taken at the same time XO tries to access the directory. The backup of this directory usually takes several hours, so the snapshot isn't being deleted while XO backs up either.
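The timing argument above can be checked against real job logs with a small sketch like the one below. It assumes the snapshot is created when the cloud sync starts and destroyed when it ends (as the "snapshot first" description suggests); the function names and the example times are made up for illustration, not taken from any actual log.

```python
from datetime import datetime


def windows_overlap(start_a, end_a, start_b, end_b):
    """True if the two time windows share any moment."""
    return start_a < end_b and start_b < end_a


def snapshot_events_during(job_start, job_end, sync_start, sync_end):
    """Return which snapshot events fall inside the XO job window.

    Assumption: the snapshot is created at sync_start and destroyed at sync_end.
    """
    events = []
    if job_start <= sync_start <= job_end:
        events.append("snapshot created")
    if job_start <= sync_end <= job_end:
        events.append("snapshot destroyed")
    return events


if __name__ == "__main__":
    # Hypothetical times: cloud sync 22:00 to 03:00, XO job 23:00 to 01:00.
    sync_start = datetime(2024, 1, 1, 22, 0)
    sync_end = datetime(2024, 1, 2, 3, 0)
    job_start = datetime(2024, 1, 1, 23, 0)
    job_end = datetime(2024, 1, 2, 1, 0)
    print(windows_overlap(sync_start, sync_end, job_start, job_end))
    print(snapshot_events_during(job_start, job_end, sync_start, sync_end))
```

With times like these the two jobs overlap, but neither snapshot event lands inside the XO window, which matches the reasoning above. If a failed run shows the XO job running long enough to cross the end of the cloud sync, the snapshot removal would fall inside it, which might be worth comparing against the failure timestamps.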

      It's entirely possible this is more of a TrueNAS issue than an XCP-ng/XO thing, but wanted to post about it.

      Anyone else seen this with large SMB VM backups?

      I'll keep trying to replicate in my lab too and report back if I can duplicate the issue.

      This isn't urgent (which is why I'm just posting and not filing a ticket, haha) since I have the same VMs backed up directly to a cloud provider, so it isn't a data resilience issue.

      • planedrop (Top contributor), in reply to @planedrop

        @planedrop Another interesting note: the backup list for this VM no longer shows the key backups on TrueNAS, but TrueNAS definitely has them.

        The VHD file that was locked or busy DOES exist on the TrueNAS directory though.

        I have tried force restarting these backups, but the same error usually happens even at times when TrueNAS isn't snapshotting or backing up.

        • imaginapix, in reply to @planedrop

          @planedrop
          I usually don't use SMB for remotes; I prefer NFS for stability reasons.
          But in the past, on customer systems, we occasionally had the problem that the SMB client and server processes stalled for a moment (both sides had the problem, so they may have stalled each other), which can cause file locks not to be set or released correctly.

          Maybe it's something like this in your case too.
          Catching that is a bit of a pain in the ***, because you need to do process memory tracking on both sides to see if and when they stall for a short moment.

          I'm not sure if there is a way on TrueNAS to check whether a file has a lock set. If there is, it might already be enough to remove the lock on the file (via CLI, if possible) to make it visible to the XCP-ng host again.
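Since TrueNAS serves SMB through Samba, one possible way to look for a lingering lock is Samba's `smbstatus -L`, which prints the table of locked files. The sketch below is a minimal, untested-on-TrueNAS example that assumes `smbstatus` is on the PATH and is run with enough privileges (usually root) on the TrueNAS box; `lock_lines` and `locks_for` are hypothetical helper names, and the filter simply matches a substring of the file name rather than parsing the table columns.

```python
import subprocess


def lock_lines(smbstatus_output, needle):
    """Return the lines of `smbstatus -L` output that mention `needle`."""
    return [line for line in smbstatus_output.splitlines() if needle in line]


def locks_for(needle):
    """Run `smbstatus -L` and return the lock entries mentioning `needle`.

    Assumption: smbstatus is installed and this runs as a user allowed to read
    Samba's lock database.
    """
    result = subprocess.run(
        ["smbstatus", "-L"],
        capture_output=True,
        text=True,
        check=True,
    )
    return lock_lines(result.stdout, needle)
```

Calling `locks_for()` with the name of the VHD from the EBUSY message right after a failed backup would show whether Samba still lists an open handle on it, and the first column of a matching line is the PID of the `smbd` process holding it. This only reports locks; it does not release them.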
