XCP-ng
    anthoineb

    @anthoineb

    Vates 🪐 XCP-ng Team
    Storage Team Vates 🪐 XCP-ng Team

    Best posts made by anthoineb

    • RE: VDI not showing in XO 5 from Source.

      Hello all,

      Here we (the XCP-ng team) share a script that helps work around the snapshot-of issue, also known as "VDI not showing in XO". But first, a bit of context.

      The issue

      This issue has existed for quite some time; the first report dates from late August / September 2025. Sadly, it has become more and more visible over time.

      The root cause is that two database fields relating to snapshots are redundant, and they sometimes contradict each other.

      One of these fields, snapshot_of, can be changed by both XAPI and SM (SMAPI). Fixing this design issue requires a new revert operation, but designing a new API takes time and a lot of care.

      On the XAPI side, this change is implemented. The SMAPI part is a work in progress and high on the priority list.

      The script

      ℹ This script only treats the symptoms of the issue and not the cause.

      This script allows users to fix XAPI database records with incongruent snapshot metadata.

      ⚠ Changing the XAPI database is always a risky operation and should be considered carefully ⚠

      To operate, the script temporarily disables HA and stops XAPI so that it can apply the changes to the database. During the operation, the pool cannot handle backups, migrate VMs, or start or stop VMs, and HA is disabled.

      The script needs to be run on the master host of the affected pool.
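
      As a pre-flight check, the host's role can be confirmed before running anything. The sketch below is not part of snapshot-fixer.py; it assumes the usual XCP-ng layout where /etc/xensource/pool.conf contains "master" on the pool master and "slave:<address>" on the other members. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: on XCP-ng hosts this file holds "master" on the pool master
# and "slave:<master address>" on the other pool members.
POOL_CONF = Path("/etc/xensource/pool.conf")


def is_pool_master(pool_conf_text: str) -> bool:
    """Return True if the given pool.conf content describes the pool master."""
    return pool_conf_text.strip() == "master"


def current_host_is_master(path: Path = POOL_CONF) -> bool:
    """Read pool.conf from disk and report whether this host is the master."""
    return is_pool_master(path.read_text())
```

      Calling current_host_is_master() on the host where you plan to run the script should return True; if it does not, connect to the master instead.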

      The script creates a backup before modifying the database and provides an option to restore the database from this backup, in the unlikely case that the database gets corrupted.

      Usage

      Download the snapshot-fixer.py script here.

      Check that the content of the file is correct:

      # sha256sum snapshot-fixer.py 
      3aad01563f813571364357364f803c61cc59049fee6e8b24cfa964a03444f609  snapshot-fixer.py
      
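
      If you prefer to verify the checksum programmatically, the same comparison can be sketched in Python with the standard hashlib module. The expected digest is the one given above; the function names are only illustrative.

```python
import hashlib

# Digest published in this post for snapshot-fixer.py.
EXPECTED_SHA256 = "3aad01563f813571364357364f803c61cc59049fee6e8b24cfa964a03444f609"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

      matches_expected("snapshot-fixer.py") should return True for an intact download; do not run the script if it returns False.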

      snapshot-fixer.py usage:

      # python3 snapshot-fixer.py -h
      usage: snapshot-fixer.py [-h] {dry-run,restore-backup,rewrite} ...
      
      Rewrite erroneous VM snapshot links.
      
      positional arguments:
        {dry-run,restore-backup,rewrite}
          dry-run             Prints invalid values in the database, does not modify
                              any file
          restore-backup      Find a previous backup and restore it
          rewrite             Backup xapi's database, and rewrite it.
      
      optional arguments:
        -h, --help            show this help message and exit
      

      First, run the script in dry-run mode. This neither disables HA nor stops XAPI; it only prints the invalid values that the script detected and could correct:

      # python3 snapshot-fixer.py dry-run
      

      Check the output and confirm that the VDIs are the ones you need to fix before continuing with the next steps.

      The rewrite operation can take 10 seconds or more because disabling HA and stopping XAPI are slow operations. Be patient and don't stop the script. Remember that the script must run on the master host of the affected pool.

      # python3 snapshot-fixer.py rewrite
      INFO:root:Check HA...
      INFO:root:Shutting down xapi...
      INFO:root:Regenerating database...
      INFO:root:Writing database to /var/lib/xcp/state.db
      INFO:root:Starting up xapi...
      

      The last available command is restore-backup. Like rewrite, this command disables HA and stops XAPI while it operates, which can take several seconds. Be patient.

      # python3 snapshot-fixer.py restore-backup
      

      The next section lists the error messages that can be displayed if something goes wrong.

      Troubleshooting

      Starting xapi timed out. Please make sure it's working by running `systemctl status xapi`
      

      After a rewrite or restore-backup operation, the script waits 15 seconds for XAPI to come back up. If the timeout is reached, you will see this message asking you to check XAPI yourself.
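
      If you hit this timeout, you can keep polling the service yourself. The following is a minimal sketch, not the script's own implementation: it assumes the systemd unit is named xapi (as in the message above) and uses `systemctl is-active --quiet`, whose exit code is 0 when the unit is active. The helper names are hypothetical.

```python
import subprocess
import time


def xapi_is_active():
    """Ask systemd whether the xapi unit is active (exit code 0 means active)."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", "xapi"], check=False
    )
    return result.returncode == 0


def wait_until(check, timeout=15.0, interval=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

      For example, wait_until(xapi_is_active, timeout=60) gives XAPI a full minute before you start investigating with systemctl status xapi.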

      HA was disabled and needs to be enabled back again manually. Please re-enable it by running `xe pool-ha-enable`
      

      After a rewrite or restore-backup operation, the script waits 30 seconds to re-enable HA (if it was enabled before). If the timeout is reached, you will see this message with the command to re-enable HA by hand.

      File '/var/lib/xcp/state.db.snapshot_of.backup' already exists, aborting. If you are sure you want to run the command again, please delete the file
      

      This message means that the command rewrite was already run and a backup was created during the operation.
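
      To know in advance whether rewrite would abort for this reason, you can check for the backup file before running it. This is a small sketch using the path from the message above; the function name is hypothetical.

```python
from pathlib import Path

# Backup path taken from the error message above.
BACKUP_PATH = Path("/var/lib/xcp/state.db.snapshot_of.backup")


def rewrite_would_abort(backup_path=BACKUP_PATH):
    """Return True if a previous backup exists, in which case rewrite aborts."""
    return Path(backup_path).exists()
```

      Only delete the existing backup if you are sure you no longer need to restore from it.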

      Anything else?

      Don't hesitate to share your feedback on this thread to get help on this issue.

      posted in Management
      anthoineb
    • RE: Recovery from lost node

      @acp Here is the procedure to re-insert your host in the SR.

      Make sure the host has the required packages by running these commands on it:

      yum install -y xcp-ng-release-linstor
      yum install -y xcp-ng-linstor
      

      They should already be installed, since the node was running the services before, but it is better to check.

      Then restart the toolstack so that it detects the LINSTOR driver:

      xe-toolstack-restart
      

      Ensure that every PBD of your XOSTOR SR has the same configuration, using this command:

      xe pbd-list sr-uuid=<UUID>
      

      All device-config must be the same.
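
      With more than a few hosts, comparing the device-config values by eye gets tedious. The sketch below assumes you have already collected each PBD's device-config into a dictionary (for instance by copying the values from the xe pbd-list output); the function name and the keys in the usage note are illustrative, not an authoritative list of XOSTOR settings.

```python
def find_device_config_mismatches(pbds):
    """Return the device-config keys whose values differ between PBDs.

    pbds maps a PBD UUID to its device-config dictionary.
    The result maps each differing key to {pbd_uuid: value}; a key that is
    missing on a PBD is reported with the value None.
    """
    all_keys = set()
    for config in pbds.values():
        all_keys.update(config)

    mismatches = {}
    for key in sorted(all_keys):
        values = {uuid: config.get(key) for uuid, config in pbds.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches
```

      An empty result means all PBDs agree. For example, passing {"pbd-a": {"provisioning": "thin"}, "pbd-b": {"provisioning": "thick"}} reports the provisioning key with both values.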

      Then, use this command with the correct <GROUP_NAME> and <HOST_UUID> to add the node to the SR:

      xe host-call-plugin host-uuid=<HOST_UUID> plugin=linstor-manager fn=addHost args:groupName=<GROUP_NAME>
      

      In short, this command (re)creates a PBD, opens the DRBD/LINSTOR ports, starts specific services, and adds the node to the LINSTOR database.

      This does not yet add a storage layer to the node. You can verify the storage state like this:

      linstor sp list
      

      You shouldn't see the storage pool of your node yet.

      Run the appropriate command on the host where the controller is running to add the volume group to the LINSTOR database:

      # For thin:
      linstor storage-pool create lvmthin <NODE_NAME> <SP_NAME> <VG_NAME>
      
      # For thick:
      linstor storage-pool create lvm <NODE_NAME> <SP_NAME> <VG_NAME>
      

      Running linstor sp list again should now show the storage pool of your node.
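
      If you script this step, the choice between the thin and thick variants can be made explicit. This is only a sketch that builds the argument list for the two commands shown above; it does not run anything, and the function name is hypothetical.

```python
def storage_pool_create_command(node_name, sp_name, vg_name, thin):
    """Build the linstor command line for a thin (lvmthin) or thick (lvm) pool."""
    driver = "lvmthin" if thin else "lvm"
    return ["linstor", "storage-pool", "create", driver,
            node_name, sp_name, vg_name]
```

      The returned list can be passed to subprocess.run on the host where the controller is running, once <NODE_NAME>, <SP_NAME> and <VG_NAME> have been replaced with your own values.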

      posted in XOSTOR
      anthoineb
    • RE: VDI not showing in XO 5 from Source.

      @andrewperry Hi, we are working on 2 things:

      1. a proper fix, under development, to prevent new cases where the snapshot-of field is garbage.
      2. a script to fix the wrongly set snapshot-of values already present on affected pools.

      Both are necessary, and we hope to release the script soon so that there is at least a workaround if the issue happens again before we release the fix.
      posted in Management
      anthoineb
    • RE: VDI not showing in XO 5 from Source.

      @olivierlambert Yes, we saw this before, we are investigating.

      posted in Management
      anthoineb
    • RE: XCP-ng 8.3 updates announcements and testing

      Hello @igorglock, Damien is on holiday this week, but he identified the issue and a patch should be tested for the next release of SM.

      posted in News
      anthoineb

    Latest posts made by anthoineb

    • RE: VDI not showing in XO 5 from Source.

      @limezest You're right, we'll fix this.

      With this typo, the elif branch is always false and never executed: it is dead code. Since it covers a less common case than the first branch of the if, the script will leave that particular case behind but will still fix the majority of snapshot_of problems.
      The script can still be used safely.
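
      To illustrate the effect in general terms, here is a purely hypothetical example (it is not the code of snapshot-fixer.py, and the names and values are invented): a typo in the elif condition means that branch can never match, so the rarer case falls through untouched while the common case is still handled.

```python
def handle(record):
    """Toy dispatcher with a deliberate typo in the second condition."""
    kind = record.get("kind")
    if kind == "common":
        return "fixed (common case)"
    elif kind == "rrae":  # typo for "rare": this comparison is never true
        return "fixed (rare case)"
    return "left untouched"
```

      Here handle({"kind": "rare"}) falls through to "left untouched", which is harmless but incomplete.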

      posted in Management
      anthoineb
    • RE: Orphan VDIs in XO show health problem

      @wilsonqanda The qcow2 packages are in a separate repository. You should have set up the repo; running grep -r "qcow2" /etc/yum.repos.d/ should tell you whether it was set up on your host.
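
      If grep is not convenient, the same search can be sketched in Python. It simply lists the files under a directory that contain the string "qcow2"; the function name is hypothetical, and the directory is the one from the grep command above.

```python
from pathlib import Path


def files_mentioning(directory, needle="qcow2"):
    """Return the files under directory whose text contains needle."""
    matches = []
    for path in sorted(Path(directory).rglob("*")):
        if path.is_file():
            text = path.read_text(errors="ignore")
            if needle in text:
                matches.append(path)
    return matches
```

      files_mentioning("/etc/yum.repos.d/") returning an empty list would suggest that the repository has not been set up on that host.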

      posted in Xen Orchestra
      anthoineb
    • RE: Orphan VDIs in XO show health problem

      @wilsonqanda, can you share /var/log/SMlog? Did you install the qcow2 release, and do you use any qcow2 VDIs?

      posted in Xen Orchestra
      anthoineb