Best posts made by lavamind
-
RE: Proof-of-concept for VSS-like quiesce agent in Linux
Maybe we could just fork xe-guest-utilities and figure out how to add it in there? Or, if we want a distinct package, how about a more generic name like xcp-guest-agent, the rationale being that the agent's purpose can be broader than just snapshot/quiesce. Watches and communication with xenstore open up lots of possibilities in domU.
-
Proof-of-concept for VSS-like quiesce agent in Linux
Dear XCP-ng community,
In my unending quest for bulletproof backups, I've stumbled upon a technique which might be of interest to XCP-ng developers and users. In short, it's a method for freezing filesystem I/O inside a domU during a snapshot event. It could potentially be extended into a hook provider, allowing not just the blocking of filesystem I/O but also the coordination of arbitrary tasks moments before a snapshot is created, such as taking database locks, unmounting filesystems, etc.
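To illustrate the hook-provider idea, here is a minimal, purely hypothetical sketch: the agent would run every executable found in a stage directory before freezing and after thawing. The directory layout and the run_hooks helper are my own invention, not part of the proof-of-concept below.

import os
import subprocess

def run_hooks(stage):
    # Hypothetical helper: run every executable in /etc/quiesce-hooks/<stage>.d,
    # for example "pre" right before the freeze and "post" right after the thaw.
    hookdir = "/etc/quiesce-hooks/%s.d" % stage
    if not os.path.isdir(hookdir):
        return
    for name in sorted(os.listdir(hookdir)):
        path = os.path.join(hookdir, name)
        if os.access(path, os.X_OK):
            # Each hook can take a database lock, sync an application, etc.
            subprocess.run([path], check=False)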
Requirements
By leveraging the existing vm-snapshot-with-quiesce XAPI call in XenServer/XCP-ng, no modifications whatsoever are required in dom0. In domU, the Python-based agent requires the psutil and subprocess modules, as well as a slightly modified version of pyxs: the upstream version needs a small fix to work around a bug in XenBus. The agent uses fsfreeze, a standard tool included in util-linux, to freeze the guest filesystems. Supported filesystems are ext3, ext4, xfs, jfs and reiserfs.
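For context, the quiesced snapshot is requested from the XAPI side; a minimal sketch using the XenAPI Python bindings might look like the following (host URL, credentials and the VM name label are placeholders, not taken from this post):

import XenAPI

session = XenAPI.Session("https://xcp-host.example")
session.xenapi.login_with_password("root", "password")
try:
    # Equivalent to the vm-snapshot-with-quiesce call mentioned above
    vm = session.xenapi.VM.get_by_name_label("my-vm")[0]
    snap = session.xenapi.VM.snapshot_with_quiesce(vm, "quiesced-snapshot")
    print("Snapshot ref:", snap)
finally:
    session.xenapi.session.logout()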
Disclaimer
The code below is just a rough proof-of-concept. It's far from well tested and is missing a lot, such as proper error handling. I'm sharing it as an experiment for the benefit of the community. If you decide to test this on your own systems, make sure you have good backups. I'm not responsible for any damage to your data and/or systems!
Code
import psutil
import subprocess

from pyxs import Client

with Client(xen_bus_path="/dev/xen/xenbus") as c:
    domid = c.read(b"domid")
    vsspath = c.read(b"vss")
    dompath = c.get_domain_path(domid)
    print("Got domain path:", dompath.decode('ascii'))
    print("Got VSS path:", vsspath.decode('ascii'))

    print("Enabling quiesce feature")
    c.write(dompath + b"/control/feature-quiesce", b"1")

    print("Establishing watches")
    m = c.monitor()
    m.watch(dompath + b"/control/snapshot/action", b"")
    m.watch(vsspath + b"/status", b"")

    print("Waiting for snapshot signal...")
    for wpath, _token in m.wait():
        if wpath == dompath + b"/control/snapshot/action" and c.exists(dompath + b"/control/snapshot/action"):
            action = c.read(dompath + b"/control/snapshot/action")
            if action == b"create-snapshot":
                print("Received snapshot-create event")

                # Acknowledge VSS request
                c.delete(dompath + b"/control/snapshot/action")
                c.write(vsspath + b"/status", b"provider-initialized")

                # Construct list of VDIs to snapshot
                devlist = []
                vdilist = []
                vbdlist = c.list(dompath + b"/device/vbd")
                for vbd in vbdlist:
                    state = c.read(dompath + b"/device/vbd/" + vbd + b"/state")
                    devtype = c.read(dompath + b"/device/vbd/" + vbd + b"/device-type")
                    if state == b"4" and devtype == b"disk":
                        backend = c.read(dompath + b"/device/vbd/" + vbd + b"/backend")
                        vdiuuid = c.read(backend + b"/sm-data/vdi-uuid")
                        devlist.append(c.read(backend + b"/dev").decode('ascii'))
                        vdilist.append(vdiuuid)
                    else:
                        continue

                # Populate VDI snapshot list
                for vdi in vdilist:
                    c.mkdir(vsspath + b"/snapshot/" + vdi)

                # Freeze filesystems
                print("Begin freezing filesystems")
                fslist = []
                for p in psutil.disk_partitions():
                    if p.fstype not in ['ext3', 'ext4', 'xfs', 'jfs', 'reiserfs']:
                        continue
                    for d in devlist:
                        if p.device.startswith("/dev/" + d):
                            r = subprocess.run(["/sbin/fsfreeze", "-f", p.mountpoint])
                            if r.returncode == 0:
                                fslist.append(p.mountpoint)
                                print("Successfully froze " + p.mountpoint)
                            else:
                                print("Error, unable to freeze " + p.mountpoint)

                # Instruct snapwatchd to create VM snapshot
                print("Sending create-snapshots signal...")
                c.write(vsspath + b"/status", b"create-snapshots")

        elif wpath == vsspath + b"/status" and c.exists(vsspath + b"/status"):
            status = c.read(vsspath + b"/status")
            if status == b"snapshots-created":
                print("Received snapshot-created event")

                # Unfreeze filesystems
                for f in fslist:
                    r = subprocess.run(["/sbin/fsfreeze", "-u", f])
                    if r.returncode == 0:
                        print("Successfully unfroze", f)
                    else:
                        print("Error, unable to unfreeze", f)  # this should not happen ...

                c.write(vsspath + b"/status", b"create-snapshotinfo")

            elif status == b"snapshotinfo-created":
                print("Received snapshotinfo-created event")

                # Create fake VSS transport id (Windows-only...)
                c.mkdir(dompath + b"/control/snapshot/snapid")
                c.write(dompath + b"/control/snapshot/snapid/0", b"0")
                c.write(dompath + b"/control/snapshot/snapid/1", b"1")

                # Record snapshot uuid
                snapuuid = c.read(vsspath + b"/snapuuid")
                print("Snapshot created:", snapuuid)
                c.write(dompath + b"/control/snapshot/snapuuid", snapuuid)

                # Signal snapshot creation
                print("Sending snapshot-created signal...")
                c.write(dompath + b"/control/snapshot/status", b"snapshot-created")

                # Cleanup vsspath
                print("Cleaning up")
                c.delete(vsspath + b"/snapshot")
                c.delete(vsspath + b"/snaptype")
                c.delete(vsspath + b"/snapinfo")
                c.delete(vsspath + b"/snapuuid")
                c.delete(vsspath + b"/status")

            elif status == b"snapshots-failed":
                print("Received snapshots-failed event")

                # Unfreeze filesystems
                for f in fslist:
                    r = subprocess.run(["/sbin/fsfreeze", "-u", f])
                    if r.returncode == 0:
                        print("Successfully unfroze", f)
                    else:
                        print("Error, unable to unfreeze", f)  # this should not happen ...

                # Signal snapshot error
                print("Sending snapshot-error event...")
                c.write(dompath + b"/control/snapshot/status", b"snapshot-error")
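As an aside, shelling out to fsfreeze could be avoided by issuing the FIFREEZE/FITHAW ioctls directly. Below is a small sketch of a context manager that guarantees the thaw even if the snapshot exchange fails; the ioctl numbers are the standard Linux values, and the helper name is mine, not part of the agent above.

import fcntl
import os
from contextlib import contextmanager

FIFREEZE = 0xC0045877  # _IOWR('X', 119, int)
FITHAW   = 0xC0045878  # _IOWR('X', 120, int)

@contextmanager
def frozen(mountpoint):
    # Freeze the filesystem backing mountpoint, and thaw it again no matter what
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FIFREEZE, 0)
        try:
            yield
        finally:
            fcntl.ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)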
Resources
The key resources that allowed me to develop this were the xenstore command, available in both domU and dom0, and in particular xenstore ls, which is unfortunately only available in dom0. Logs in /var/log/SMlog and /var/log/xensource.log were also invaluable, as was the source code for both XAPI and the snapwatchd storage-manager (sm) component. Having a Windows Server guest VM on hand was also useful to understand the xenstore messaging task sequence of a quiesce snapshot using the Citrix Xen VSS provider.
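Since xenstore ls isn't available in domU, one rough workaround is to walk the relevant subtree with pyxs from inside the guest. A quick sketch, with the starting path chosen only as an example:

from pyxs import Client

def dump(c, path, indent=0):
    # Print a xenstore key and recurse into its children, xenstore-ls style
    try:
        value = c.read(path)
    except Exception:
        value = None
    print("  " * indent + path.decode('ascii') + ("" if value is None else " = " + repr(value)))
    try:
        children = c.list(path)
    except Exception:
        children = []
    for child in children:
        dump(c, path + b"/" + child, indent + 1)

with Client(xen_bus_path="/dev/xen/xenbus") as c:
    dompath = c.get_domain_path(c.read(b"domid"))
    dump(c, dompath + b"/control")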
Development
I'm curious to see if this can actually be useful to the XCP-ng community.
If you're intrigued and familiar with Xen, please do test it and report back!
-
RE: Backup Job HTTP connection abruptly closed
For the record, since upgrading to 5.63 the issue hasn't re-occurred at all.
Latest posts made by lavamind
-
RE: Backup Job HTTP connection abruptly closed
"FYI, we do our best to ensure master is not broken but we only do the complete QA process just before an XOA release"
Is that still the case?
From https://github.com/vatesfr/xen-orchestra/issues/3784#issuecomment-447797895
-
RE: Backup Job HTTP connection abruptly closed
@olivierlambert Yeah that's definitely the next thing we'll try. For now we're using sources on release 5.59. If the problem persists we'll upgrade to 5.63 next week. Not too keen on following master, since we've had issues with it in the past (including bad backups)...
-
RE: Backup Job HTTP connection abruptly closed
We've been having the same problem with our Delta backups for several weeks now. The job runs every day, and roughly one day in three we get failures like this. It seems to affect random VMs, but one or two seem to be affected more often than the others.
We tried increasing the ring buffers on the physical network interfaces but it didn't help. Now we're going to try to pause GC during the backups to see if it helps.
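For anyone wanting to try the same thing, ring buffer sizes can be inspected and raised with ethtool; the interface name and values below are just placeholders:

# Show current and maximum ring sizes for the interface
ethtool -g eth0
# Raise the RX/TX rings (values are placeholders, capped by the NIC's maximums)
ethtool -G eth0 rx 4096 tx 4096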
We looked at SMlog and daemon.log and could not find any obvious problems on the host occurring at the time of the error. If it's a problem with networking, how could we verify this?
-
Mandatory 2FA/OTP for login
Hello, I'm trying to figure out whether it's possible to make 2FA (one-time passwords) mandatory for a subset of users in Xen Orchestra. Having the option is great, but some users just seem to "forget" to set it up, decreasing the security of the whole platform. Thanks!
-
Hardened systemd unit file for xo-server
It's generally considered risky to leave long-running, network-facing daemons with full root privileges. And while you can run Xen Orchestra as an unprivileged user, some functionality will be missing.
A good compromise is to run Xen Orchestra with restricted root privileges. The service file below should considerably limit the xo-server daemon's ability to misbehave.
[Unit]
Description=Xen-Orchestra server
After=network-online.target

[Service]
WorkingDirectory=/opt/xen-orchestra/packages/xo-server/
ExecStart=/usr/bin/node ./bin/xo-server
Restart=always
SyslogIdentifier=xo-server
NoNewPrivileges=yes
PrivateTmp=yes
DevicePolicy=closed
DeviceAllow=block-loop rwm
DeviceAllow=/dev/fuse rwm
ProtectSystem=strict
ReadWritePaths=/var/lib/xo-server
ProtectHome=read-only
ProtectControlGroups=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6 AF_NETLINK
RestrictRealtime=yes
RestrictNamespaces=yes

[Install]
WantedBy=multi-user.target
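To try it, assuming the file above is saved as /etc/systemd/system/xo-server.service (the unit name is my assumption), something like the following should work, with systemd-analyze giving a rough exposure score for the hardened unit:

systemctl daemon-reload
systemctl enable --now xo-server
# Report how confined the unit is according to systemd's own heuristics
systemd-analyze security xo-server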
If you store backups locally you need to add an extra ReadWritePaths entry, and if you use the file restore feature, you need to make sure the loop kernel module is loaded at boot.
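For example, assuming the unit is installed as xo-server.service, a drop-in and a modules-load entry could look like this (the backup path is a placeholder):

# /etc/systemd/system/xo-server.service.d/local.conf
[Service]
ReadWritePaths=/mnt/backups

# /etc/modules-load.d/loop.conf
loop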
-
RE: Proof-of-concept for VSS-like quiesce agent in Linux
@olivierlambert @stormi I don't mind hosting it under XCP-ng as long as I get commit access, at least in a dev branch.
-
RE: XCP-ng 8.2 updates announcements and testing
In a pool environment, does this package need to be installed on the master only, or on all the nodes?