Categories

  • All news regarding Xen and XCP-ng ecosystem

    143 Topics
    4k Posts
    @rzr Updated and running on most systems now. Rolling pool reboot worked correctly (backups need to be disabled first). VMs and CR are running normally. After a large S3 backup session (that succeeded), GC on the master got stuck in a loop and failed (lock issue). A reboot of the pool master was required to resolve the problem, after which GC coalesced the VHDs without additional action. Other pools did not have the same problem. I have not seen that issue before, but everything seems fine now, so I can't blame the update.
  • Everything related to the virtualization platform

    1k Topics
    15k Posts
    julienXOvates
    Hi @mdm, here is the response from an SME:

    1- The import runs in these steps:
       XO gets the VM metadata through the SOAP API and creates the VM and network.
       XO uses the nbdkit + nbdkit-vddk packages, connecting to each disk.
       XO connects through NBD to nbdkit.
       XO lists the allocated blocks.
       XO creates the disk.
       XO reads the blocks and imports the data into the disk (through a VHD or qcow2 stream, depending on the size).
    2- Through XO. There is a lot of disk transformation occurring here.
    3- Not anymore. It was too brittle, especially around whether the disks were locked or not. It also did not scale well.
    4- See 3-. In addition, VDDK transfer is overwhelmingly faster than the previous access through HTTP, is compatible with vSAN, and doesn't lock the disks. (Also see https://xen-orchestra.com/blog/xen-orchestra-5-110/ section "VMware to Vates (V2V)", where we made key improvements.)
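    The "list the allocated blocks, then read and import them" steps above can be illustrated with a minimal Python sketch. This is not XO's code: the function name `copy_allocated_extents`, the `(offset, length)` extent format, and the use of plain file-like objects in place of an NBD connection are all assumptions made for illustration. It only shows the general idea of copying allocated ranges and skipping the holes.

```python
import io


def copy_allocated_extents(source, target, extents, chunk_size=1024 * 1024):
    """Copy only the given (offset, length) ranges from source to target.

    Hypothetical helper: `source` and `target` are seekable binary
    file-like objects standing in for the remote disk and the new disk.
    Returns the number of bytes copied.
    """
    copied = 0
    for offset, length in extents:
        position = offset
        remaining = length
        while remaining > 0:
            source.seek(position)
            data = source.read(min(chunk_size, remaining))
            if not data:
                # The extent list claims data past the end of the source.
                raise EOFError(f"source ended inside extent at offset {position}")
            target.seek(position)
            target.write(data)
            position += len(data)
            remaining -= len(data)
            copied += len(data)
    return copied


if __name__ == "__main__":
    # Two allocated 4-byte extents separated by a 4-byte hole.
    src = io.BytesIO(b"AAAA....BBBB....")
    dst = io.BytesIO()
    total = copy_allocated_extents(src, dst, [(0, 4), (8, 4)])
    print(total, dst.getvalue())
```

    Unallocated ranges are never read from the source, which is the point of listing allocated blocks before the transfer.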
  • 3k Topics
    28k Posts
    Hello all,

    Here we (the XCP-ng team) share a script that helps work around the snapshot-of issue, also known as VDI not showing in XO. But first, a bit of context.

    The issue

    This issue has existed for quite some time; the first report was in late August / September 2025. Sadly, it gets more and more visible over time. The issue is that two database fields relating to snapshots are redundant, and sometimes they contradict each other. The database field that gets changed (snapshot_of) can be modified both by xapi and by SM (smapi). To fix this design issue, a new revert operation is needed, but designing a new API takes time and a lot of care. On the XAPI side, this change is implemented. The smapi part is work in progress and high on the priority list.

    The script

    This script only treats the symptoms of the issue, not the cause. It allows users to fix XAPI database records with incongruent snapshot metadata. Changing the XAPI database is always a risky operation and should be considered carefully.

    To operate, the script temporarily disables HA and stops XAPI to apply the changes to the database. This means that the pool is not running operations like handling backups, migrating, starting or stopping VMs, and HA is disabled during the operation. The script needs to be run on the master host of the affected pool.

    In the unlikely case the script corrupts the database, the script creates a backup before modifying the database and provides an option to restore the database from this backup.

    Usage

    Download the snapshot-fixer.py script here. Check that the content of the file is correct:

        # sha256sum snapshot-fixer.py
        3aad01563f813571364357364f803c61cc59049fee6e8b24cfa964a03444f609  snapshot-fixer.py

    snapshot-fixer.py usage:

        # python3 snapshot-fixer.py -h
        usage: snapshot-fixer.py [-h] {dry-run,restore-backup,rewrite} ...

        Rewrite erroneous VM snapshot links.

        positional arguments:
          {dry-run,restore-backup,rewrite}
            dry-run             Prints invalid values in the database, does not modify any file
            restore-backup      Find a previous backup and restore it
            rewrite             Backup xapi's database, and rewrite it.

        optional arguments:
          -h, --help            show this help message and exit

    First run the script in dry-run mode. This does not disable HA or XAPI; it only prints the invalid values detected that could be corrected by the script:

        # python3 snapshot-fixer.py dry-run

    Check the output and confirm that the VDIs are the ones you need to fix before continuing with the next steps.

    The rewrite operation could take 10 seconds or more because disabling HA and XAPI are slow operations. Be patient and don't stop the script. Remember that this script should run on the master host of the affected pool.

        # python3 snapshot-fixer.py rewrite
        INFO:root:Check HA...
        INFO:root:Shutting down xapi...
        INFO:root:Regenerating database...
        INFO:root:Writing database to /var/lib/xcp/state.db
        INFO:root:Starting up xapi...

    The last command available is restore-backup. Like rewrite, this command disables HA and XAPI to operate, which can take several seconds, so be patient.

        # python3 snapshot-fixer.py restore-backup

    The next section lists error messages that could be displayed if something goes wrong.

    Troubleshooting

        Starting xapi timed out. Please make sure it's working by running `systemctl status xapi`

    After a rewrite or restore-backup operation, the script waits 15 seconds to re-enable XAPI. If the timeout is reached, you will see this message asking you to re-enable it yourself.

        HA was disabled and needs to be enabled back again manually. Please re-enable it by running `xe pool-ha-enable`

    After a rewrite or restore-backup operation, the script waits 30 seconds to re-enable HA (if it was enabled before). If the timeout happens, you will see this message with the command to re-enable HA by hand.

        File '/var/lib/xcp/state.db.snapshot_of.backup' already exists, aborting. If you are sure you want to run the command again, please delete the file

    This message means that the rewrite command was already run and a backup was created during the operation.

    Anything else?

    Don't hesitate to share your feedback on this thread to get help on this issue.
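    The checksum check from the Usage steps of the post above can also be sketched in Python, for cases where a scripted check is preferred over reading the sha256sum output by eye. This is only a sketch: the helper names `sha256_of` and `matches_expected` are made up for illustration, the expected digest is the one quoted in the post, and the file is assumed to be in the current directory.

```python
import hashlib
from pathlib import Path

# Digest quoted in the post for snapshot-fixer.py.
EXPECTED_SHA256 = "3aad01563f813571364357364f803c61cc59049fee6e8b24cfa964a03444f609"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected one, ignoring hex case."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    script = Path("snapshot-fixer.py")  # assumed download location
    if not script.exists():
        print("snapshot-fixer.py not found in the current directory")
    elif matches_expected(script):
        print("checksum OK")
    else:
        print("checksum MISMATCH, do not run the script")
```

    A mismatch means the downloaded file differs from the published one and should not be run against the XAPI database.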
  • Our hyperconverged storage solution

    47 Topics
    745 Posts
    @Mathieu-L linstor n l was included in my original post. All nodes were updated to May 2026 Security and Maintenance Updates for XCP-ng 8.3 LTS, and all nodes were restarted. May 2026 Updates #2 for XCP-ng 8.3 LTS was released, and a couple of days later I installed it on all hosts. No host was restarted. When xen04 was restarted, that is when this issue happened. I had used systemctl restart linstor-controller here (https://xcp-ng.org/forum/post/105309) to restart the controller.
  • 35 Topics
    113 Posts
    olivierlambert
    Ah, excellent news. I'm marking the topic as resolved!