XCP-ng

    lavamind

    @lavamind

    Reputation: 14 · Profile views: 3.1k · Posts: 13 · Followers: 0 · Following: 0


    Best posts made by lavamind

    • RE: Proof-of-concept for VSS-like quiesce agent in Linux

      Maybe we could just fork xe-guest-utilities and figure out how to add it in there? Or, if we want a distinct package, how about a more generic name like xcp-guest-agent, the rationale being that the agent's purpose can be broader than just snapshot/quiesce. Watches and communication with xenstore open up lots of possibilities in domU.

      posted in Development
    • Proof-of-concept for VSS-like quiesce agent in Linux

      Dear XCP-ng community,

      In my unending quest for bulletproof backups, I've stumbled upon a technique which might be of interest to XCP-ng developers and users. In short, it's a method for freezing filesystem I/O inside a domU during a snapshot event. It could potentially be extended into a hook provider, allowing not just blocking of filesystem I/O but also coordinating the execution of arbitrary tasks moments before a snapshot is created, like taking database locks, unmounting filesystems, etc.
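
      To give a rough idea of what such a hook provider could look like, here is a minimal sketch (entirely hypothetical, including the directory names, and not part of the proof-of-concept below): every executable dropped into a pre-snapshot or post-snapshot directory would be run around the freeze/thaw sequence.

      import os
      import subprocess

      # Hypothetical hook directories, named here only for illustration.
      PRE_SNAPSHOT_DIR = "/etc/xcp-guest-agent/pre-snapshot.d"
      POST_SNAPSHOT_DIR = "/etc/xcp-guest-agent/post-snapshot.d"

      def run_hooks(hook_dir):
          """Run every executable file in hook_dir, in sorted order."""
          if not os.path.isdir(hook_dir):
              return
          for name in sorted(os.listdir(hook_dir)):
              path = os.path.join(hook_dir, name)
              if os.path.isfile(path) and os.access(path, os.X_OK):
                  r = subprocess.run([path])
                  if r.returncode != 0:
                      print("Hook", path, "exited with code", r.returncode)

      # A hook-aware agent would call run_hooks(PRE_SNAPSHOT_DIR) just before
      # freezing filesystems, and run_hooks(POST_SNAPSHOT_DIR) right after thawing.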

      Requirements

      By leveraging the existing vm-snapshot-with-quiesce XAPI call in XenServer/XCP-ng, no modifications whatsoever are required in dom0. In domU, the Python-based agent requires the psutil and subprocess modules, as well as a slightly modified version of pyxs: the upstream version needs a small fix to work around a bug in XenBus. The agent uses fsfreeze, a standard tool included in util-linux, to freeze the guest filesystems. Supported filesystems are ext3, ext4, xfs, jfs and reiserfs.
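
      As a quick sanity check before trying the agent (this snippet is my own addition, not part of the agent itself), you can verify that fsfreeze is present and see which mounted filesystems would be eligible for freezing:

      import os
      import psutil

      SUPPORTED = {'ext3', 'ext4', 'xfs', 'jfs', 'reiserfs'}

      # fsfreeze ships with util-linux; bail out early if it isn't there.
      if not os.path.exists("/sbin/fsfreeze"):
          print("fsfreeze not found at /sbin/fsfreeze, please install util-linux")

      # List the mounted filesystems the agent would be able to freeze.
      for p in psutil.disk_partitions():
          if p.fstype in SUPPORTED:
              print("Freezable:", p.mountpoint, "(" + p.fstype + ")")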

      Disclaimer

      The code below is just a rough proof-of-concept. It's far from being well tested and is missing lots of stuff like better error handling. I'm sharing it as an experiment for the benefit of the community. If you decide to test this on your own systems, make sure you have good backups. I'm not responsible for any damage to your data and/or systems!

      Code

      import psutil
      import subprocess
      from pyxs import Client
      
      with Client(xen_bus_path="/dev/xen/xenbus") as c:
      
          domid = c.read(b"domid")
          vsspath = c.read(b"vss")
          dompath = c.get_domain_path(domid)
      
          print("Got domain path:", dompath.decode('ascii'))
          print("Got VSS path:", vsspath.decode('ascii'))
      
          print("Enabling quiesce feature")
          c.write(dompath + b"/control/feature-quiesce", b"1")
      
          print("Establishing watches")
          m = c.monitor()
          m.watch(dompath + b"/control/snapshot/action", b"")
          m.watch(vsspath + b"/status", b"")
      
          print("Waiting for snapshot signal...")
      
          for wpath, _token in m.wait():
              if wpath == dompath + b"/control/snapshot/action" and c.exists(dompath + b"/control/snapshot/action"):
                  action = c.read(dompath + b"/control/snapshot/action")
      
                  if action == b"create-snapshot":
                      print("Received snapshot-create event")
      
                      # Acknowledge VSS request
                      c.delete(dompath + b"/control/snapshot/action")
                      c.write(vsspath + b"/status", b"provider-initialized")
      
                      # Construct list of VDIs to snapshot
                      devlist = []
                      vdilist = []
                      vbdlist = c.list(dompath + b"/device/vbd")
                      for vbd in vbdlist:
                          state = c.read(dompath + b"/device/vbd/" + vbd + b"/state")
                          devtype = c.read(dompath + b"/device/vbd/" + vbd + b"/device-type")
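                          # state 4 is XenbusStateConnected, i.e. the backend disk is attached and usable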
                          if state == b"4" and devtype == b"disk":
                              backend = c.read(dompath + b"/device/vbd/" + vbd + b"/backend")
                              vdiuuid = c.read(backend + b"/sm-data/vdi-uuid")
                              devlist.append(c.read(backend + b"/dev").decode('ascii'))
                              vdilist.append(vdiuuid)
                          else:
                              continue
      
                      # Populate VDI snapshot list
                      for vdi in vdilist:
                          c.mkdir(vsspath + b"/snapshot/" + vdi)
      
                      # Freeze filesystems
                      print("Begin freezing filesystems")
                      fslist = []
                      for p in psutil.disk_partitions():
                          if p.fstype not in ['ext3', 'ext4', 'xfs', 'jfs', 'reiserfs']:
                              continue
                          for d in devlist:
                              if p.device.startswith("/dev/" + d):
                                  r = subprocess.run(["/sbin/fsfreeze", "-f", p.mountpoint])
                                  if r.returncode == 0:
                                      fslist.append(p.mountpoint)
                                      print("Successfully froze " + p.mountpoint)
                                  else:
                                      print("Error, unable to freeze " + p.mountpoint)
      
                      # Instruct snapwatchd to create VM snapshot
                      print("Sending create-snapshots signal...")
                      c.write(vsspath + b"/status", b"create-snapshots")
      
              elif wpath == vsspath + b"/status" and c.exists(vsspath + b"/status"):
                  status = c.read(vsspath + b"/status")
      
                  if status == b"snapshots-created":
                      print("Received snapshot-created event")
      
                      # Unfreeze filesystems
                      for f in fslist:
                          r = subprocess.run(["/sbin/fsfreeze", "-u", f])
                          if r.returncode == 0:
                              print("Successfully unfroze", f)
                          else:
                              print("Error, unable to unfreeze", f)  # this should not happen ...
      
                      c.write(vsspath + b"/status", b"create-snapshotinfo")
      
                  elif status == b"snapshotinfo-created":
                      print("Received snapshotinfo-created event")
      
                      # Create fake VSS transport id (Windows-only...)
                      c.mkdir(dompath + b"/control/snapshot/snapid")
                      c.write(dompath + b"/control/snapshot/snapid/0", b"0")
                      c.write(dompath + b"/control/snapshot/snapid/1", b"1")
      
                      # Record snapshot uuid
                      snapuuid = c.read(vsspath + b"/snapuuid")
                      print("Snapshot created:", snapuuid)
                      c.write(dompath + b"/control/snapshot/snapuuid", snapuuid)
      
                      # Signal snapshot creation
                      print("Sending snapshot-created signal...")
                      c.write(dompath + b"/control/snapshot/status", b"snapshot-created")
      
                      # Cleanup vsspath
                      print("Cleaning up")
                      c.delete(vsspath + b"/snapshot")
                      c.delete(vsspath + b"/snaptype")
                      c.delete(vsspath + b"/snapinfo")
                      c.delete(vsspath + b"/snapuuid")
                      c.delete(vsspath + b"/status")
      
                  elif status == b"snapshots-failed":
                      print("Received snapshots-failed event")
      
                      # Unfreeze filesystems
                      for f in fslist:
                          r = subprocess.run(["/sbin/fsfreeze", "-u", f])
                          if r.returncode == 0:
                              print("Successfully unfroze", f)
                          else:
                              print("Error, unable to unfreeze", p.mountpoint) # this should not happen ...
      
                      # Signal snapshot error
                      print("Sending snapshot-error event...")
                      c.write(dompath + b"/control/snapshot/status", b"snapshot-error")
      

      Resources

      The key resources that allowed me to develop this were the xenstore command, available in both domU and dom0, and in particular xenstore ls, which is unfortunately only available in dom0. Logs in /var/log/SMlog and /var/log/xensource.log were also invaluable, as was the source code of both XAPI and snapwatchd, part of the storage manager (sm) component.
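
      For exploring xenstore from inside domU, where xenstore ls isn't available, a small recursive dump built on the same pyxs client is a workable substitute (a convenience sketch under the same assumptions as the agent, not part of it):

      from pyxs import Client

      def dump(c, path):
          """Recursively print a xenstore subtree, roughly like xenstore-ls."""
          try:
              print(path.decode('ascii'), "=", c.read(path))
              children = c.list(path)
          except Exception:
              return  # node vanished or isn't readable from this domU
          for child in children:
              dump(c, path + b"/" + child)

      with Client(xen_bus_path="/dev/xen/xenbus") as c:
          domid = c.read(b"domid")
          dump(c, c.get_domain_path(domid))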

      Having a Windows Server guest VM on hand was also useful for understanding the xenstore message sequence of a quiesced snapshot using the Citrix Xen VSS provider.

      Development

      I'm curious to see if this can actually be useful to the XCP-ng community.

      If you're curious and familiar with Xen, please do test it and report back!

      posted in Development quiesce vss snapshot
    • RE: Backup Job HTTP connection abruptly closed

      For the record, since upgrading to 5.63 the issue hasn't re-occurred at all.

      posted in Xen Orchestra

    Latest posts made by lavamind

    • RE: Backup Job HTTP connection abruptly closed

      "FYI, we do our best to ensure master is not broken but we only do the complete QA process just before an XOA release"

      Is that still the case?

      From https://github.com/vatesfr/xen-orchestra/issues/3784#issuecomment-447797895

      (Referenced issue, opened by jcharaoui in vatesfr/xen-orchestra and now closed: [Backup NG] Delta backup base VHDs missing after hitting retention limit #3784)

      posted in Xen Orchestra
    • RE: Backup Job HTTP connection abruptly closed

      @olivierlambert Yeah that's definitely the next thing we'll try. For now we're using sources on release 5.59. If the problem persists we'll upgrade to 5.63 next week.

      Not too keen on following master, since we've had issues with it in the past (including bad backups)...

      posted in Xen Orchestra
    • RE: Backup Job HTTP connection abruptly closed

      We've been having the same problem with our Delta backups for several weeks now. The job runs every day and, roughly one day in three, we get failures like this. It seems to affect random VMs, but one or two seem to be affected more often than others.

      We tried increasing the ring buffers on the physical network interfaces but it didn't help. Now we're going to try to pause GC during the backups to see if it helps.

      We looked at SMlog and daemon.log and could not find any obvious problems on the host occurring at the time of the error. If it's a problem with networking, how could we verify this?

      posted in Xen Orchestra
    • Mandatory 2FA/OTP for login

      Hello, I'm trying to figure out if it's possible to make 2FA (one-time password) mandatory for a subset of users in Xen Orchestra? Having the option is great, but some users just seem to "forget" to set it up, decreasing the security of the whole platform. Thanks!

      posted in Xen Orchestra
    • Hardened systemd unit file for xo-server

      It's generally considered risky to run long-running, network-facing daemons with root privileges. And while you can run Xen Orchestra as an unprivileged user, some functionality will be missing.

      A good compromise is to run Xen Orchestra with restricted root privileges. The service file below should considerably limit the xo-server daemon's ability to misbehave.

      [Unit]
      Description=Xen-Orchestra server
      After=network-online.target
      
      [Service]
      WorkingDirectory=/opt/xen-orchestra/packages/xo-server/
      ExecStart=/usr/bin/node ./bin/xo-server
      Restart=always
      SyslogIdentifier=xo-server
      NoNewPrivileges=yes
      PrivateTmp=yes
      DevicePolicy=closed
      DeviceAllow=block-loop rwm
      DeviceAllow=/dev/fuse rwm
      ProtectSystem=strict
      ReadWritePaths=/var/lib/xo-server
      ProtectHome=read-only
      ProtectControlGroups=yes
      ProtectKernelModules=yes
      ProtectKernelTunables=yes
      RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
      RestrictRealtime=yes
      RestrictNamespaces=yes
      
      [Install]
      WantedBy=multi-user.target
      

      If you store backups locally you need to add an extra ReadWritePaths entry, and if you use the file restore feature, you need to make sure the loop kernel module is loaded at boot.
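
      For example, assuming backups are stored under /srv/xo-backups (adjust the path to your setup), a drop-in plus a modules-load entry could look like this:

      # /etc/systemd/system/xo-server.service.d/local.conf
      [Service]
      ReadWritePaths=/srv/xo-backups

      # /etc/modules-load.d/loop.conf
      loop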

      posted in Xen Orchestra
    • RE: Proof-of-concept for VSS-like quiesce agent in Linux

      @olivierlambert @stormi I don't mind hosting it under XCP-ng as long as I get commit access, at least in a dev branch 🙂

      posted in Development
    • RE: XCP-ng 8.2 updates announcements and testing

      In a pool environment, does this package need to be installed on the master only, or on all the nodes?

      posted in News