XCP-ng

    Posts

    • RE: Proof-of-concept for VSS-like quiesce agent in Linux

      Maybe we could just fork xe-guest-utilities and add it there? Or, if we want a distinct package, how about a more generic name like xcp-guest-agent? The agent's purpose could be broader than just snapshot/quiesce: watching and writing xenstore keys from inside domU opens up lots of possibilities.
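      To illustrate what a more general agent could look like, here is a minimal sketch that routes xenstore watch events to independent per-path handlers. It assumes events arrive as (path, token) pairs, which is what pyxs's Monitor.wait() yields in the proof-of-concept agent. WatchDispatcher and its methods are hypothetical names, and the event list stands in for a live xenstore connection.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A handler receives the xenstore path whose watch fired
Handler = Callable[[bytes], None]


class WatchDispatcher:
    """Route xenstore watch events to per-path handlers (hypothetical helper)."""

    def __init__(self) -> None:
        self._handlers: Dict[bytes, List[Handler]] = {}

    def on(self, path: bytes, handler: Handler) -> None:
        # Several features (quiesce, custom hooks, ...) may share one path
        self._handlers.setdefault(path, []).append(handler)

    def paths(self) -> List[bytes]:
        # The paths an agent would register watches for, in registration order
        return list(self._handlers)

    def run(self, events: Iterable[Tuple[bytes, bytes]]) -> None:
        # Each event is a (path, token) pair; paths without a handler are skipped
        for path, _token in events:
            for handler in self._handlers.get(path, []):
                handler(path)


if __name__ == "__main__":
    d = WatchDispatcher()
    d.on(b"control/snapshot/action", lambda p: print("snapshot event:", p))
    d.on(b"control/shutdown", lambda p: print("shutdown event:", p))
    # Stand-in for monitor.wait(); a real agent would be fed by xenstore
    d.run([(b"control/shutdown", b""), (b"data/other", b""),
           (b"control/snapshot/action", b"")])
```

      Each feature then only registers the paths it cares about, and the single watch loop stays the same.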

      posted in Development
      lavamind
    • Proof-of-concept for VSS-like quiesce agent in Linux

      Dear XCP-ng community,

      In my unending quest for bulletproof backups, I've stumbled upon a technique that might interest XCP-ng developers and users. In short, it's a method for freezing filesystem I/O inside a domU during a snapshot. It could potentially be extended into a hook provider that not only blocks filesystem I/O but also coordinates any tasks that must run just before a snapshot is created, such as acquiring database locks or unmounting filesystems.
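      A rough sketch of the hook-provider idea, assuming each hook is a pair of hypothetical prepare/release callables (freeze/thaw, lock/unlock) and that take_snapshot stands in for the xenstore handshake. Hook and run_with_hooks are illustrative names, not part of any existing agent. Hooks are prepared in order, and whatever was prepared is released in reverse order even if a later step fails.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Hook:
    """Hypothetical pre/post snapshot hook, e.g. freeze/thaw or lock/unlock."""
    name: str
    prepare: Callable[[], None]  # runs just before the snapshot
    release: Callable[[], None]  # runs afterwards, even if something failed


def run_with_hooks(hooks: List[Hook], take_snapshot: Callable[[], None]) -> None:
    """Prepare hooks in order, take the snapshot, release in reverse order.

    Only hooks whose prepare() succeeded are released; any exception from
    prepare() or take_snapshot() propagates after the release pass.
    """
    prepared: List[Hook] = []
    try:
        for hook in hooks:
            hook.prepare()
            prepared.append(hook)
        take_snapshot()
    finally:
        # Last locked, first unlocked; keep going if one release fails
        for hook in reversed(prepared):
            try:
                hook.release()
            except Exception as exc:
                print(f"release of {hook.name} failed: {exc}")


if __name__ == "__main__":
    log: List[str] = []
    hooks = [
        Hook("db-lock", lambda: log.append("lock db"), lambda: log.append("unlock db")),
        Hook("fsfreeze", lambda: log.append("freeze /"), lambda: log.append("thaw /")),
    ]
    run_with_hooks(hooks, lambda: log.append("snapshot"))
    print(log)  # ['lock db', 'freeze /', 'snapshot', 'thaw /', 'unlock db']
```

      Releasing in a finally block is the property that matters most here: a filesystem left frozen or a database left locked after a failed snapshot is worse than a snapshot that was never taken.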

      Requirements

      By leveraging the existing vm-snapshot-with-quiesce XAPI call in XenServer/XCP-ng, no modifications whatsoever are required in dom0. In domU, the Python agent requires the third-party psutil module, the standard-library subprocess module and a slightly modified version of pyxs (the upstream version needs a small fix to work around a bug in XenBus). The agent uses fsfreeze, a standard tool included in util-linux, to freeze the guest filesystems. Supported filesystems are ext3, ext4, xfs, jfs and reiserfs.
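      The filesystem selection the agent performs can be isolated into a small, testable function. This sketch assumes partitions carry the same device, mountpoint and fstype fields as psutil.disk_partitions() entries, and that xen_devs holds block device names such as xvda read from each VBD backend's dev node. freezable_mountpoints is a hypothetical helper, not part of psutil or util-linux; unlike the agent, it also skips a device that is mounted twice.

```python
from typing import Iterable, List, NamedTuple, Sequence

# Filesystem types the post lists as supported by fsfreeze
FREEZABLE = {"ext3", "ext4", "xfs", "jfs", "reiserfs"}


class Partition(NamedTuple):
    """Stand-in with the same field names as psutil.disk_partitions() entries."""
    device: str
    mountpoint: str
    fstype: str


def freezable_mountpoints(parts: Iterable[Partition], xen_devs: Sequence[str]) -> List[str]:
    """Return one mountpoint per freezable filesystem on a snapshotted disk."""
    selected: List[str] = []
    seen_devices = set()
    for p in parts:
        if p.fstype not in FREEZABLE:
            continue
        # Same prefix match as the agent: /dev/xvda1 belongs to disk "xvda"
        if not any(p.device.startswith("/dev/" + d) for d in xen_devs):
            continue
        # A device mounted twice (e.g. via a bind mount) is one filesystem;
        # freezing it a second time would fail because it is already frozen
        if p.device in seen_devices:
            continue
        seen_devices.add(p.device)
        selected.append(p.mountpoint)
    return selected


if __name__ == "__main__":
    parts = [
        Partition("/dev/xvda1", "/", "ext4"),
        Partition("/dev/xvda2", "/boot/efi", "vfat"),
        Partition("/dev/xvdb1", "/srv", "xfs"),
        Partition("/dev/xvda1", "/mnt/bind", "ext4"),
        Partition("tmpfs", "/tmp", "tmpfs"),
    ]
    print(freezable_mountpoints(parts, ["xvda", "xvdb"]))  # ['/', '/srv']
```

      In the agent, each returned mountpoint would be passed to fsfreeze -f, and the same list reused later for fsfreeze -u.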

      Disclaimer

      The code below is just a rough proof-of-concept. It is far from well tested and lacks many things, such as proper error handling. I'm sharing it as an experiment for the benefit of the community. If you decide to test this on your own systems, make sure you have good backups. I'm not responsible for any damage to your data and/or systems!

      Code

      import psutil
      import subprocess
      from pyxs import Client
      
      with Client(xen_bus_path="/dev/xen/xenbus") as c:
      
          domid = c.read(b"domid")
          vsspath = c.read(b"vss")
          dompath = c.get_domain_path(domid)
      
          print("Got domain path:", dompath.decode('ascii'))
          print("Got VSS path:", vsspath.decode('ascii'))
      
          print("Enabling quiesce feature")
          c.write(dompath + b"/control/feature-quiesce", b"1")
      
          print("Establishing watches")
          m = c.monitor()
          m.watch(dompath + b"/control/snapshot/action", b"")
          m.watch(vsspath + b"/status", b"")
      
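          # Mountpoints frozen for the snapshot in progress; defined up front so
          # a status event arriving first cannot reference it before it exists
          fslist = []
      
          print("Waiting for snapshot signal...")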
      
          for wpath, _token in m.wait():
              if wpath == dompath + b"/control/snapshot/action" and c.exists(dompath + b"/control/snapshot/action"):
                  action = c.read(dompath + b"/control/snapshot/action")
      
                  if action == b"create-snapshot":
                      print("Received snapshot-create event")
      
                      # Acknowledge VSS request
                      c.delete(dompath + b"/control/snapshot/action")
                      c.write(vsspath + b"/status", b"provider-initialized")
      
                      # Construct list of VDIs to snapshot
                      devlist = []
                      vdilist = []
                      vbdlist = c.list(dompath + b"/device/vbd")
                      for vbd in vbdlist:
                          state = c.read(dompath + b"/device/vbd/" + vbd + b"/state")
                          devtype = c.read(dompath + b"/device/vbd/" + vbd + b"/device-type")
                          # Only connected (XenbusStateConnected = 4) disks, not CD drives
                          if state == b"4" and devtype == b"disk":
                              backend = c.read(dompath + b"/device/vbd/" + vbd + b"/backend")
                              vdiuuid = c.read(backend + b"/sm-data/vdi-uuid")
                              devlist.append(c.read(backend + b"/dev").decode('ascii'))
                              vdilist.append(vdiuuid)
      
                      # Populate VDI snapshot list
                      for vdi in vdilist:
                          c.mkdir(vsspath + b"/snapshot/" + vdi)
      
                      # Freeze filesystems
                      print("Begin freezing filesystems")
                      fslist = []
                      for p in psutil.disk_partitions():
                          if p.fstype not in ['ext3', 'ext4', 'xfs', 'jfs', 'reiserfs']:
                              continue
                          for d in devlist:
                              if p.device.startswith("/dev/" + d):
                                  r = subprocess.run(["/sbin/fsfreeze", "-f", p.mountpoint])
                                  if r.returncode == 0:
                                      fslist.append(p.mountpoint)
                                      print("Successfully froze " + p.mountpoint)
                                  else:
                                      print("Error, unable to freeze " + p.mountpoint)
      
                      # Instruct snapwatchd to create VM snapshot
                      print("Sending create-snapshots signal...")
                      c.write(vsspath + b"/status", b"create-snapshots")
      
              elif wpath == vsspath + b"/status" and c.exists(vsspath + b"/status"):
                  status = c.read(vsspath + b"/status")
      
                  if status == b"snapshots-created":
                      print("Received snapshot-created event")
      
                      # Unfreeze filesystems
                      for f in fslist:
                          r = subprocess.run(["/sbin/fsfreeze", "-u", f])
                          if r.returncode == 0:
                              print("Successfully unfroze", f)
                          else:
                              print("Error, unable to unfreeze", f) # this should not happen ...
                      fslist = []
      
                      c.write(vsspath + b"/status", b"create-snapshotinfo")
      
                  elif status == b"snapshotinfo-created":
                      print("Received snapshotinfo-created event")
      
                      # Create fake VSS transport id (Windows-only...)
                      c.mkdir(dompath + b"/control/snapshot/snapid")
                      c.write(dompath + b"/control/snapshot/snapid/0", b"0")
                      c.write(dompath + b"/control/snapshot/snapid/1", b"1")
      
                      # Record snapshot uuid
                      snapuuid = c.read(vsspath + b"/snapuuid")
                      print("Snapshot created:", snapuuid)
                      c.write(dompath + b"/control/snapshot/snapuuid", snapuuid)
      
                      # Signal snapshot creation
                      print("Sending snapshot-created signal...")
                      c.write(dompath + b"/control/snapshot/status", b"snapshot-created")
      
                      # Cleanup vsspath
                      print("Cleaning up")
                      c.delete(vsspath + b"/snapshot")
                      c.delete(vsspath + b"/snaptype")
                      c.delete(vsspath + b"/snapinfo")
                      c.delete(vsspath + b"/snapuuid")
                      c.delete(vsspath + b"/status")
      
                  elif status == b"snapshots-failed":
                      print("Received snapshots-failed event")
      
                      # Unfreeze filesystems
                      for f in fslist:
                          r = subprocess.run(["/sbin/fsfreeze", "-u", f])
                          if r.returncode == 0:
                              print("Successfully unfroze", f)
                          else:
                              print("Error, unable to unfreeze", f) # this should not happen ...
                      fslist = []
      
                      # Signal snapshot error
                      print("Sending snapshot-error event...")
                      c.write(dompath + b"/control/snapshot/status", b"snapshot-error")
      
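      The xenstore handshake implemented above can be condensed into a table. This sketch is derived only from the code in this post (the dom0 side is snapwatchd, per the comment in the code); the key names are shorthand, respond is a hypothetical helper, and the bookkeeping steps (deleting the action key, the VDI list, the snapid/snapuuid writes) are left out.

```python
from typing import Dict, List, Tuple

# A step is ("write", key, value) or ("local", what, "")
Step = Tuple[str, str, str]
# An event is (watched key, value read from it)
Event = Tuple[str, str]

# Shorthand keys: "action" = <dompath>/control/snapshot/action,
# "vss/status" = <vsspath>/status, "snapshot/status" = <dompath>/control/snapshot/status
PROTOCOL: Dict[Event, List[Step]] = {
    ("action", "create-snapshot"): [
        ("write", "vss/status", "provider-initialized"),
        ("local", "freeze filesystems", ""),
        ("write", "vss/status", "create-snapshots"),
    ],
    ("vss/status", "snapshots-created"): [
        ("local", "thaw filesystems", ""),
        ("write", "vss/status", "create-snapshotinfo"),
    ],
    ("vss/status", "snapshotinfo-created"): [
        ("write", "snapshot/status", "snapshot-created"),
        ("local", "clean up vss path", ""),
    ],
    ("vss/status", "snapshots-failed"): [
        ("local", "thaw filesystems", ""),
        ("write", "snapshot/status", "snapshot-error"),
    ],
}


def respond(event: Event) -> List[Step]:
    """Steps the agent takes for one watch event; unknown events are ignored.

    The agent's own writes to vss/status fire its watch too, so values such
    as "provider-initialized" arrive here and map to no steps.
    """
    return PROTOCOL.get(event, [])


if __name__ == "__main__":
    # A successful quiesced snapshot as seen from inside the guest; the
    # second event is the agent's own write echoing back through its watch
    trace = [
        ("action", "create-snapshot"),
        ("vss/status", "provider-initialized"),
        ("vss/status", "snapshots-created"),
        ("vss/status", "snapshotinfo-created"),
    ]
    for event in trace:
        print(event[1], "->", respond(event))
```

      Laid out this way, it is easy to check that both replies dom0 can send after a freeze, snapshots-created and snapshots-failed, lead to a thaw.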

      Resources

      The most useful resource while developing this was the xenstore command-line tool, available in both domU and dom0, in particular xenstore ls, which unfortunately is only available in dom0. The logs in /var/log/SMlog and /var/log/xensource.log were also invaluable, as was the source code of both XAPI and the snapwatchd component of the storage manager (sm).

      Having a Windows Server guest VM on hand also helped me understand the sequence of xenstore messages exchanged during a quiesced snapshot that uses the Citrix Xen VSS provider.

      Development

      I'm curious to see whether this can actually be useful to the XCP-ng community.

      If you're familiar with Xen and want to try it, please test it and report back!

      posted in Development quiesce vss snapshot
      lavamind
    • RE: Backup Job HTTP connection abruptly closed

      For the record, the issue hasn't recurred at all since upgrading to 5.63.

      posted in Xen Orchestra
      lavamind