XCP-ng

    Maelstrom96's Posts

    • RE: Icon appears in the XOA interface

      @olivierlambert Thanks for the suggestion! The problem is I have no idea where I would start to build it for Slackware. I'll see if I can figure it out, but based on my research so far, I'm not sure I'll be able to.
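
      For reference, the general Slackware packaging pattern seems to be roughly the following. This is only a sketch: it assumes the xe-guest-utilities build can install into a staging directory (DESTDIR), and the version in the package name is a placeholder.

      # Build into a staging root, then wrap it with Slackware's pkgtools
      make install DESTDIR=/tmp/pkg-xe-guest-utilities
      cd /tmp/pkg-xe-guest-utilities
      makepkg -l y -c n /tmp/xe-guest-utilities-<version>-x86_64-1.txz

      # On the UnRAID (Slackware) guest:
      installpkg /tmp/xe-guest-utilities-<version>-x86_64-1.txz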

      posted in Management
    • RE: Icon appears in the XOA interface

      @eurodrigolira Sorry to revive this topic, but do you have any pointers on how to build a Slackware package for xe-guest-utilities? I'm trying to add the VM guest tools to UnRAID and I'm not having much luck.

      posted in Management
    • RE: Three-node Networking for XOSTOR

      @T3CCH What you might be looking for: https://xcp-ng.org/docs/networking.html#full-mesh-network

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Thanks a lot for that procedure.

      Ended up needing to do a little bit more, since for some reason, "evacuate" failed. I deleted the node and then went and just manually recreated my resources using:

      linstor resource create --auto-place +1 <resource_name>
      

      Which didn't work at first because the new node didn't have a storage-pool configured, so this command was required first (NOTE - this is only valid if your SR was set up as thin):

      linstor storage-pool create lvmthin <node_name> xcp-sr-linstor_group_thin_device linstor_group/thin_device
      

      Also, worth noting that before actually re-creating the resources, you might want to manually clean up the lingering logical volumes that weren't removed when evacuate failed.

      Find volumes with:

      lvdisplay
      

      and then delete them with:

      lvremove <LV Path>
      

      example:

      lvremove /dev/linstor_group/xcp-persistent-database_00000
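
      After re-creating everything, the standard LINSTOR listing commands are enough to confirm that each node got its storage pool and replicas back:

      linstor storage-pool list
      linstor resource list
      linstor volume list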
      
      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Do you know of a way to update a node name in Linstor? I've tried to look in their documentation and checked through CLI commands but couldn't find a way.

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a I will be testing my theory a little bit later today, but I believe it might be a hostname mismatch between the node name Linstor expects and what it's set to now on Dom0. We had the hostname of the node updated before the cluster was spun up, but I think it still had the previous name active when the linstor SR was created.

      This means that the node name doesn't match here:
      https://github.com/xcp-ng/sm/blob/e951676098c80e6da6de4d4653f496b15f5a8cb9/drivers/linstorvolumemanager.py#L2641C21-L2641C41

      I will try to revert the hostname and see if it fixes everything.

      Edit: Just tested and reverted the hostname to the default one, which matches what's in linstor, and it works again. So it seems like changing a hostname after the cluster is provisioned is a no-no.
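
      For anyone hitting the same thing, a quick way to spot the mismatch before it bites (a minimal check, run from any host that can reach the controller):

      # Hostname as seen by Dom0 on each host
      uname -n
      # Node names registered in LINSTOR -- these must match the Dom0 hostnames
      linstor node list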

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a said in XOSTOR hyperconvergence preview:

      drbdsetup events2

      Host1:

      [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-controller
      ● linstor-controller.service - drbd-reactor controlled linstor-controller
         Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
        Drop-In: /run/systemd/system/linstor-controller.service.d
                 └─reactor.conf
         Active: active (running) since Thu 2024-05-02 13:24:32 PDT; 20h ago
       Main PID: 21340 (java)
         CGroup: /system.slice/linstor-controller.service
                 └─21340 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Controller --logs=/var/log/linstor-controller --config-directory=/etc/linstor
      [09:49 xcp-ng-labs-host01 ~]# systemctl status linstor-satellite
      ● linstor-satellite.service - LINSTOR Satellite Service
         Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/linstor-satellite.service.d
                 └─override.conf
         Active: active (running) since Wed 2024-05-01 16:04:05 PDT; 1 day 17h ago
       Main PID: 1947 (java)
         CGroup: /system.slice/linstor-satellite.service
                 ├─1947 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                 ├─2109 drbdsetup events2 all
                 └─2347 /usr/sbin/dmeventd
      [09:49 xcp-ng-labs-host01 ~]# systemctl status drbd-reactor
      ● drbd-reactor.service - DRBD-Reactor Service
         Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/drbd-reactor.service.d
                 └─override.conf
         Active: active (running) since Wed 2024-05-01 16:04:11 PDT; 1 day 17h ago
           Docs: man:drbd-reactor
                 man:drbd-reactorctl
                 man:drbd-reactor.toml
       Main PID: 1950 (drbd-reactor)
         CGroup: /system.slice/drbd-reactor.service
                 ├─1950 /usr/sbin/drbd-reactor
                 └─1976 drbdsetup events2 --full --poll
      [09:49 xcp-ng-labs-host01 ~]# mountpoint /var/lib/linstor
      /var/lib/linstor is a mountpoint
      [09:49 xcp-ng-labs-host01 ~]# drbdsetup events2
      exists resource name:xcp-persistent-database role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary
      exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.202:7000 established:yes
      exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7000 peer:ipv4:10.100.0.201:7000 established:yes
      exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary
      exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.202:7001 established:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.200:7001 peer:ipv4:10.100.0.201:7001 established:yes
      exists -
      

      Host2:

      [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-controller
      ● linstor-controller.service - drbd-reactor controlled linstor-controller
         Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
        Drop-In: /run/systemd/system/linstor-controller.service.d
                 └─reactor.conf
         Active: inactive (dead)
      [09:51 xcp-ng-labs-host02 ~]# systemctl status linstor-satellite
      ● linstor-satellite.service - LINSTOR Satellite Service
         Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/linstor-satellite.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:26:59 PDT; 23h ago
       Main PID: 1990 (java)
         CGroup: /system.slice/linstor-satellite.service
                 ├─1990 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                 ├─2128 drbdsetup events2 all
                 └─2552 /usr/sbin/dmeventd
      [09:51 xcp-ng-labs-host02 ~]# systemctl status drbd-reactor
      ● drbd-reactor.service - DRBD-Reactor Service
         Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/drbd-reactor.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:27:07 PDT; 23h ago
           Docs: man:drbd-reactor
                 man:drbd-reactorctl
                 man:drbd-reactor.toml
       Main PID: 1989 (drbd-reactor)
         CGroup: /system.slice/drbd-reactor.service
                 ├─1989 /usr/sbin/drbd-reactor
                 └─2035 drbdsetup events2 --full --poll
      [09:51 xcp-ng-labs-host02 ~]# mountpoint /var/lib/linstor
      /var/lib/linstor is not a mountpoint
      [09:51 xcp-ng-labs-host02 ~]# drbdsetup events2
      exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary
      exists connection name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.200:7000 established:yes
      exists peer-device name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7000 peer:ipv4:10.100.0.202:7000 established:yes
      exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Primary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 connection:Connected role:Secondary
      exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.200:7001 established:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:1 conn-name:xcp-ng-labs-host03 local:ipv4:10.100.0.201:7001 peer:ipv4:10.100.0.202:7001 established:yes
      exists -
      

      Host3:

      [09:51 xcp-ng-labs-host03 ~]# systemctl status linstor-controller
      ● linstor-controller.service - drbd-reactor controlled linstor-controller
         Loaded: loaded (/usr/lib/systemd/system/linstor-controller.service; disabled; vendor preset: disabled)
        Drop-In: /run/systemd/system/linstor-controller.service.d
                 └─reactor.conf
         Active: inactive (dead)
      [09:52 xcp-ng-labs-host03 ~]# systemctl status linstor-satellite
      ● linstor-satellite.service - LINSTOR Satellite Service
         Loaded: loaded (/usr/lib/systemd/system/linstor-satellite.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/linstor-satellite.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:10:16 PDT; 23h ago
       Main PID: 1937 (java)
         CGroup: /system.slice/linstor-satellite.service
                 ├─1937 /usr/lib/jvm/jre-11/bin/java -Xms32M -classpath /usr/share/linstor-server/lib/conf:/usr/share/linstor-server/lib/* com.linbit.linstor.core.Satellite --logs=/var/log/linstor-satellite --config-directory=/etc/linstor
                 ├─2151 drbdsetup events2 all
                 └─2435 /usr/sbin/dmeventd
      [09:52 xcp-ng-labs-host03 ~]# systemctl status drbd-reactor
      ● drbd-reactor.service - DRBD-Reactor Service
         Loaded: loaded (/usr/lib/systemd/system/drbd-reactor.service; enabled; vendor preset: disabled)
        Drop-In: /etc/systemd/system/drbd-reactor.service.d
                 └─override.conf
         Active: active (running) since Thu 2024-05-02 10:10:26 PDT; 23h ago
           Docs: man:drbd-reactor
                 man:drbd-reactorctl
                 man:drbd-reactor.toml
       Main PID: 1939 (drbd-reactor)
         CGroup: /system.slice/drbd-reactor.service
                 ├─1939 /usr/sbin/drbd-reactor
                 └─1981 drbdsetup events2 --full --poll
      [09:52 xcp-ng-labs-host03 ~]# mountpoint /var/lib/linstor
      /var/lib/linstor is not a mountpoint
      [09:52 xcp-ng-labs-host03 ~]# drbdsetup events2
      exists resource name:xcp-persistent-database role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 connection:Connected role:Primary
      exists connection name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 connection:Connected role:Secondary
      exists device name:xcp-persistent-database volume:0 minor:1000 backing_dev:/dev/linstor_group/xcp-persistent-database_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:0 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.200:7000 established:yes
      exists peer-device name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-persistent-database peer-node-id:2 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7000 peer:ipv4:10.100.0.201:7000 established:yes
      exists resource name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 role:Secondary suspended:no force-io-failures:no may_promote:no promotion_score:10103
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 connection:Connected role:Secondary
      exists connection name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 connection:Connected role:Primary
      exists device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 volume:0 minor:1001 backing_dev:/dev/linstor_group/xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0_00000 disk:UpToDate client:no quorum:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:2 conn-name:xcp-ng-labs-host01 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.200:7001 established:yes
      exists peer-device name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 volume:0 replication:Established peer-disk:UpToDate peer-client:no resync-suspended:no
      exists path name:xcp-volume-ace70b43-4950-49f7-9de2-cf9c358dc2b0 peer-node-id:0 conn-name:xcp-ng-labs-host02 local:ipv4:10.100.0.202:7001 peer:ipv4:10.100.0.201:7001 established:yes
      exists -
      
      

      Will be sending the debug file as a DM.

      Edit: Just as a sanity check, I tried to reboot the master instead of just restarting the toolstack, and the linstor SR seems to be working as expected again. The XOSTOR tab in XOA now populates (it just errored out before) and the SR scan now goes through.

      Edit2: Was able to move a VDI, but then, the same exact error started to happen again. No idea why.

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Since XOSTOR is supposed to be stable now, I figured I would try it out with a new setup of 3 newly installed 8.2 nodes.

      I used the CLI to deploy it. It all went well, and the SR was quickly ready. I was even able to migrate a disk to the Linstor SR and boot the VM. However, after rebooting the master, the SR doesn't want to allow any disk migration, and manual scans are failing. I've tried fully unmounting/remounting the SR and restarting the toolstack, but nothing seems to help. The disk that was on Linstor is still accessible and the VM is able to boot.

      Here is the error I'm getting:

      sr.scan
      {
        "id": "e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9"
      }
      {
        "code": "SR_BACKEND_FAILURE_47",
        "params": [
          "",
          "The SR is not available [opterr=Database is not mounted]",
          ""
        ],
        "task": {
          "uuid": "a467bd90-8d47-09cc-b8ac-afa35056ff25",
          "name_label": "Async.SR.scan",
          "name_description": "",
          "allowed_operations": [],
          "current_operations": {},
          "created": "20240502T21:40:00Z",
          "finished": "20240502T21:40:01Z",
          "status": "failure",
          "resident_on": "OpaqueRef:b3e2f390-f45f-4614-a150-1eee53f204e1",
          "progress": 1,
          "type": "<none/>",
          "result": "",
          "error_info": [
            "SR_BACKEND_FAILURE_47",
            "",
            "The SR is not available [opterr=Database is not mounted]",
            ""
          ],
          "other_config": {},
          "subtask_of": "OpaqueRef:NULL",
          "subtasks": [],
          "backtrace": "(((process xapi)(filename lib/backtrace.ml)(line 210))((process xapi)(filename ocaml/xapi/storage_access.ml)(line 32))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 35))((process xapi)(filename ocaml/xapi/message_forwarding.ml)(line 131))((process xapi)(filename lib/xapi-stdext-pervasives/pervasiveext.ml)(line 24))((process xapi)(filename ocaml/xapi/rbac.ml)(line 205))((process xapi)(filename ocaml/xapi/server_helpers.ml)(line 95)))"
        },
        "message": "SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )",
        "name": "XapiError",
        "stack": "XapiError: SR_BACKEND_FAILURE_47(, The SR is not available [opterr=Database is not mounted], )
          at Function.wrap (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_XapiError.mjs:16:12)
          at default (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/_getTaskResult.mjs:11:29)
          at Xapi._addRecordToCache (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1029:24)
          at file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1063:14
          at Array.forEach (<anonymous>)
          at Xapi._processEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1053:12)
          at Xapi._watchEvents (file:///opt/xo/xo-builds/xen-orchestra-202404270302/packages/xen-api/index.mjs:1226:14)"
      }
      

      I quickly glanced over the source code and the SM logs to see if I could identify what was going on but it doesn't seem to be a simple thing.

      Logs from SM:

      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] LinstorSR.scan for e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] Raising exception [47, The SR is not available [opterr=Database is not mounted]]
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] lock: released /var/lock/sm/e1a9bf4d-26ad-3ef6-b4a5-db98d012e0d9/sr
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242] ***** generic exception: sr_scan: EXCEPTION <class 'SR.SROSError'>, The SR is not available [opterr=Database is not mounted]
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 110, in run
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return self._run_locked(sr)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 159, in _run_locked
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     rv = self._run(sr, target)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/SRCommand.py", line 364, in _run
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return sr.scan(self.params['sr_uuid'])
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 536, in wrap
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return load(self, *args, **kwargs)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 521, in load
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return wrapped_method(self, *args, **kwargs)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 381, in wrapped_method
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     return method(self, *args, **kwargs)
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]   File "/opt/xensource/sm/LinstorSR", line 777, in scan
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]     opterr='Database is not mounted'
      May  2 13:22:02 xcp-ng-labs-host01 SM: [19242]
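
      A quick sanity check that seems relevant here (just a sketch, assuming the database resource is named xcp-persistent-database as in the driver) is to confirm on the pool master whether the LINSTOR database volume is actually mounted and what DRBD thinks of it:

      # Is the LINSTOR database volume mounted where the controller expects it?
      mountpoint /var/lib/linstor
      # State of the backing DRBD resource
      drbdsetup status xcp-persistent-database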
      
      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a said in XOSTOR hyperconvergence preview:

      @Maelstrom96 We must update our documentation for that, This will probably require executing commands manually during an upgrade.

      Any news on that? We're still pretty much blocked until that's figured out.

      Also, any news on when it will be officially released?

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @Maelstrom96 said in XOSTOR hyperconvergence preview:

      Is there a procedure on how we can update our current 8.2 XCP-ng cluster to 8.3? My understanding is that if I update the host using the ISO, it will effectively wipe all changes that were made to DOM0, including the linstor/sm-linstor packages.

      Any input on this @ronan-a?

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      Is there a procedure on how we can update our current 8.2 XCP-ng cluster to 8.3? My understanding is that if I update the host using the ISO, it will effectively wipe all changes that were made to DOM0, including the linstor/sm-linstor packages.

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @gb-123 said in XOSTOR hyperconvergence preview:

      @ronan-a

      VMs would be using LUKS encryption.

      So if only VDI is replicated and hypothetically, if I lose the master node or any other node actually having the VM, then I will have to create the VM again using the replicated disk? Or would it be something like DRBD where there are actually 2 VMs running in Active/Passive mode and there is an automatic switchover ? Or would it be that One VM is running and the second gets automatically started when 1st is down ?

      Sorry for the noob questions. I just wanted to be sure of the implementation.

      The VM metadata lives at the pool level, meaning you wouldn't have to re-create the VM if the host currently running it fails. However, memory isn't replicated across the cluster, except during a live migration, which temporarily copies the VM memory to the new host so it can be moved.

      DRBD only replicates the VDI, in other words the disk data, across the active LINSTOR members. If the VM is stopped or is terminated because of a host failure, you should be able to start it back up on another host in your pool, but by default this requires manual intervention, and you will have to enter your encryption password since it will be a cold boot.

      If you want the VM to automatically restart in case of failure, you can use the HA feature of XCP-ng. This wouldn't remove the need to enter your encryption password since, as explained earlier, the memory isn't replicated and the VM would cold boot from the replicated VDI. Also, keep in mind that enabling HA adds maintenance complexity and might not be worth it.
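
      If you do go the HA route, the XCP-ng side is the standard xe workflow (a sketch; the UUIDs are placeholders, and the heartbeat SR must be a shared SR such as the XOSTOR one):

      # Enable HA on the pool, using the shared SR for the heartbeat/state files
      xe pool-ha-enable heartbeat-sr-uuids=<shared_sr_uuid>
      # Ask HA to restart this VM automatically if its host fails
      xe vm-param-set uuid=<vm_uuid> ha-restart-priority=restart order=1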

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Any news on when the new version of linstor SM will be released? We're actually hard blocked by the behavior with 4 nodes right now so we can't move forward with a lot of other tests we want to do.

      We also worked on a custom build of linstor-controller and linstor-satellite to support CentOS 7 with its lack of setsid -w support, and we might want to see if we could get a satisfactory PR merged into linstor-server master so that people using XCP-ng can also use linstor's built-in snapshot shipping. Since the K8s linstor snapshotter uses that functionality to provide volume backups, using K8s with linstor on XCP-ng is not really possible unless this is fixed.

      Would that be something that you guys could help us push to linstor?

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a said in XOSTOR hyperconvergence preview:

      You're lucky, I just produced a fix yesterday to fix this kind of problem on pools with more than 3 machines: https://github.com/xcp-ng/sm/commit/f916647f44223206b24cf70d099637882c53fee8

      Unfortunately, I can't release a new version right away, but I think this change can be applied to your pool.
      In the worst case I'll see if I can release a new version without all the fixes in progress...

      Thanks, that does look like it would fix the missing drbd/by-res/ volumes.

      Do you have an idea about the missing StoragePool for the new host that was added using linstor-manager.addHost? I've checked the code and it seems like the storage pool might only be provisioned on sr.create?

      Also, I'm not sure how feasible it would be for SM but having a nightly-style build process for those cases seems like it would be really useful for hotfix testing.

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a

      We were able to finally add our new #4 host to the linstor SR after killing all VMs with attached VDIs. However, we've hit a new bug that we're not sure how to fix.

      Once we added the new host, we were curious to see if a live migration to it would work - It did not. It actually just resulted in the VM being in a zombie state and we had to manually destroy the domains on both the source and destination servers, and reset the power state of the VM.

      That first bug was most likely caused by our custom linstor configuration, where we have set up an additional linstor node interface on each node and changed their PrefNics. It wasn't applied to the new host, so the drbd connection wouldn't have worked (a sketch of applying it follows the output below).

      [16:51 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 node interface list ovbh-pprod-xen12
      ╭─────────────────────────────────────────────────────────────────────╮
      ┊ ovbh-pprod-xen12 ┊ NetInterface ┊ IP        ┊ Port ┊ EncryptionType ┊
      ╞═════════════════════════════════════════════════════════════════════╡
      ┊ + StltCon        ┊ default      ┊ 10.2.0.21 ┊ 3366 ┊ PLAIN          ┊
      ┊ +                ┊ stornet      ┊ 10.2.4.12 ┊      ┊                ┊
      ╰─────────────────────────────────────────────────────────────────────╯
      [16:41 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21 node list-properties ovbh-pprod-xen12
      ╭────────────────────────────────────╮
      ┊ Key             ┊ Value            ┊
      ╞════════════════════════════════════╡
      ┊ Aux/xcp-ng.node ┊ ovbh-pprod-xen12 ┊
      ┊ Aux/xcp-ng/node ┊ ovbh-pprod-xen12 ┊
      ┊ CurStltConnName ┊ default          ┊
      ┊ NodeUname       ┊ ovbh-pprod-xen12 ┊
      ┊ PrefNic         ┊ stornet          ┊
      ╰────────────────────────────────────╯
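
      For reference, applying the same configuration to the new node would look something like this (a sketch; the storage-network IP is a placeholder for our convention):

      # Add the dedicated storage-network interface on the new node, then prefer it for DRBD traffic
      linstor node interface create ovbh-pprod-xen13 stornet <storage_network_ip>
      linstor node set-property ovbh-pprod-xen13 PrefNic stornet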
      
      

      However, once the VM was down and all the linstor configuration was updated to match the rest of the cluster, I tried to manually start that VM on the new host, but it's not working. It seems like linstor is never asked to add the volume to the host as a diskless resource, since the data isn't on that host.

      SMLog:

      Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] lock: opening lock file /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
      Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] lock: acquired /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
      Feb 28 17:01:31 ovbh-pprod-xen13 SM: [25108] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'True'}) returned: True
      Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
      Feb 28 17:01:33 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 0
      Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
      Feb 28 17:01:35 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 1
      Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
      Feb 28 17:01:37 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 2
      Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
      Feb 28 17:01:39 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 3
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] Got exception: No such file or directory. Retry number: 4
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] ', stderr: ''
      Feb 28 17:01:41 ovbh-pprod-xen13 SM: [25108] failed to execute locally vhd-util (sys 2)
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] call-plugin (getVHDInfo with {'devicePath': '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0', 'groupName': 'linstor_group/thin_device', 'includeParent': 'True'}) returned: {"uuid": "02ca1b5b-fef4-47d4-8736-40908385739c", "parentUuid": "1ad76dd3-14af-4636-bf5d-6822b81bfd0c", "sizeVirt": 53687091200, "sizePhys": 1700033024, "parentPath": "/dev/drbd/by-res/xcp-v$
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] VDI 02ca1b5b-fef4-47d4-8736-40908385739c loaded! (path=/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0, hidden=0)
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] lock: released /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] vdi_epoch_begin {'sr_uuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'subtask_of': 'DummyRef:|3f01e26c-0225-40e1-9683-bffe5bb69490|VDI.epoch_begin', 'vdi_ref': 'OpaqueRef:f25cd94b-c948-4c3a-a410-aa29a3749943', 'vdi_on_boot': 'persist', 'args': [], 'vdi_location': '02ca1b5b-fef4-47d4-8736-40908385739c', 'host_ref': 'OpaqueRef:3cd7e97c-4b79-473e-b925-c25f8cb393d8', 'session_ref': '$
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25108] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'False'}) returned: True
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25278] lock: opening lock file /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
      Feb 28 17:01:42 ovbh-pprod-xen13 SM: [25278] lock: acquired /var/lock/sm/a8b860a9-5246-0dd2-8b7f-4806604f219a/sr
      Feb 28 17:01:43 ovbh-pprod-xen13 SM: [25278] call-plugin on ff631fff-1947-4631-a35d-9352204f98d9 (linstor-manager:lockVdi with {'groupName': 'linstor_group/thin_device', 'srUuid': 'a8b860a9-5246-0dd2-8b7f-4806604f219a', 'vdiUuid': '02ca1b5b-fef4-47d4-8736-40908385739c', 'locked': 'True'}) returned: True
      Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] ', stderr: ''
      Feb 28 17:01:44 ovbh-pprod-xen13 SM: [25278] Got exception: No such file or directory. Retry number: 0
      Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] ['/usr/bin/vhd-util', 'query', '--debug', '-vsfp', '-n', '/dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0']
      Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] FAILED in util.pread: (rc 2) stdout: 'error opening /dev/drbd/by-res/xcp-volume-fb565237-b169-434d-b694-4707e6f51f4c/0: -2
      Feb 28 17:01:46 ovbh-pprod-xen13 SM: [25278] ', stderr: ''
      [...]
      

      The folder /dev/drbd/by-res/ doesn't exist currently.

      Also, not sure why, but it seems like when adding the new host, a new storage pool linstor_group_thin_device for its local storage wasn't provisioned automatically, although we can see that a diskless storage pool was provisioned (a possible manual fix is sketched after the output below).

      [17:26 ovbh-pprod-xen10 lib]# linstor --controllers=10.2.0.19,10.2.0.20,10.2.0.21  storage-pool list
      ╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
      ┊ StoragePool                      ┊ Node                                     ┊ Driver   ┊ PoolName                  ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName ┊
      ╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
      ┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen10                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen11                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen12                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-pprod-xen13                         ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-vprod-k8s04-worker01.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-vprod-k8s04-worker02.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-vprod-k8s04-worker03.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-vtest-k8s02-worker01.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-vtest-k8s02-worker02.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ DfltDisklessStorPool             ┊ ovbh-vtest-k8s02-worker03.floatplane.com ┊ DISKLESS ┊                           ┊              ┊               ┊ False        ┊ Ok    ┊            ┊
      ┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen10                         ┊ LVM_THIN ┊ linstor_group/thin_device ┊     3.00 TiB ┊      3.49 TiB ┊ True         ┊ Ok    ┊            ┊
      ┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen11                         ┊ LVM_THIN ┊ linstor_group/thin_device ┊     3.03 TiB ┊      3.49 TiB ┊ True         ┊ Ok    ┊            ┊
      ┊ xcp-sr-linstor_group_thin_device ┊ ovbh-pprod-xen12                         ┊ LVM_THIN ┊ linstor_group/thin_device ┊     3.06 TiB ┊      3.49 TiB ┊ True         ┊ Ok    ┊            ┊
      ╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
      
      
      [17:32 ovbh-pprod-xen13 ~]# lsblk
      NAME                                                                                                MAJ:MIN RM    SIZE RO TYPE  MOUNTPOINT
      nvme0n1                                                                                             259:0    0    3.5T  0 disk
      ├─nvme0n1p1                                                                                         259:1    0      1T  0 part
      │ └─md128                                                                                             9:128  0 1023.9G  0 raid1
      └─nvme0n1p2                                                                                         259:2    0    2.5T  0 part
        ├─linstor_group-thin_device_tdata                                                                 252:1    0      5T  0 lvm
        │ └─linstor_group-thin_device                                                                     252:2    0      5T  0 lvm
        └─linstor_group-thin_device_tmeta                                                                 252:0    0     80M  0 lvm
          └─linstor_group-thin_device                                                                     252:2    0      5T  0 lvm
      sdb                                                                                                   8:16   1  447.1G  0 disk
      └─md127                                                                                               9:127  0  447.1G  0 raid1
        ├─md127p5                                                                                         259:10   0      4G  0 md    /var/log
        ├─md127p3                                                                                         259:8    0  405.6G  0 md
        │ └─XSLocalEXT--ea64a6f6--9ef2--408a--039f--33b119fbd7e8-ea64a6f6--9ef2--408a--039f--33b119fbd7e8 252:3    0  405.6G  0 lvm   /run/sr-mount/ea64a6f6-9ef2-408a-039f-33b119fbd7e8
        ├─md127p1                                                                                         259:6    0     18G  0 md    /
        ├─md127p6                                                                                         259:11   0      1G  0 md    [SWAP]
        ├─md127p4                                                                                         259:9    0    512M  0 md    /boot/efi
        └─md127p2                                                                                         259:7    0     18G  0 md
      nvme1n1                                                                                             259:3    0    3.5T  0 disk
      ├─nvme1n1p2                                                                                         259:5    0    2.5T  0 part
      │ └─linstor_group-thin_device_tdata                                                                 252:1    0      5T  0 lvm
      │   └─linstor_group-thin_device                                                                     252:2    0      5T  0 lvm
      └─nvme1n1p1                                                                                         259:4    0      1T  0 part
        └─md128                                                                                             9:128  0 1023.9G  0 raid1
      sda                                                                                                   8:0    1  447.1G  0 disk
      └─md127                                                                                               9:127  0  447.1G  0 raid1
        ├─md127p5                                                                                         259:10   0      4G  0 md    /var/log
        ├─md127p3                                                                                         259:8    0  405.6G  0 md
        │ └─XSLocalEXT--ea64a6f6--9ef2--408a--039f--33b119fbd7e8-ea64a6f6--9ef2--408a--039f--33b119fbd7e8 252:3    0  405.6G  0 lvm   /run/sr-mount/ea64a6f6-9ef2-408a-039f-33b119fbd7e8
        ├─md127p1                                                                                         259:6    0     18G  0 md    /
        ├─md127p6                                                                                         259:11   0      1G  0 md    [SWAP]
        ├─md127p4                                                                                         259:9    0    512M  0 md    /boot/efi
        └─md127p2                                                                                         259:7    0     18G  0 md
      
      
      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      Not sure what we're doing wrong - we attempted to add a new host to the linstor SR and it's failing. I've run the install command with the disks we want on the host, but running the "addHost" function fails.

      [13:25 ovbh-pprod-xen13 ~]# xe host-call-plugin host-uuid=6e845981-1c12-4e70-b0f7-54431959d630 plugin=linstor-manager fn=addHost args:groupName=linstor_group/thin_device
      There was a failure communicating with the plug-in.
      status: addHost
      stdout: Failure
      stderr: ['VDI_IN_USE', 'OpaqueRef:f25cd94b-c948-4c3a-a410-aa29a3749943']
      
      

      Edit : So it's not documented, but it looks like it's failing because the SR is in use? Does that mean that we can't add or remove hosts from linstor without unmounting all VDIs?
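
      One way to see what's actually holding the VDI (a sketch; the OpaqueRef from the error would first have to be mapped back to a VDI UUID, shown here as a placeholder):

      # Which VBDs reference the VDI, and are they currently attached?
      xe vbd-list vdi-uuid=<vdi_uuid> params=vm-name-label,currently-attached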

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a I will copy those logs soon - do you have a way I can provide you the logs off-forum, since it's a production system?

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Is there a way to easily check if the process is managed by the daemon and not started manually? We might have at some point restarted the controller manually.

      Edit :

      ● minidrbdcluster.service - Minimalistic high-availability cluster resource manager
         Loaded: loaded (/usr/lib/systemd/system/minidrbdcluster.service; enabled; vendor preset: disabled)
         Active: active (running) since Wed 2023-01-25 15:58:01 EST; 1 weeks 0 days ago
       Main PID: 2738 (python2)
         CGroup: /system.slice/minidrbdcluster.service
                 ├─2738 python2 /opt/xensource/libexec/minidrbdcluster
                 ├─2902 /usr/sbin/dmeventd
                 └─2939 drbdsetup events2
      
      [11:58 ovbh-pprod-xen10 system]# systemctl status var-lib-linstor.service
      ● var-lib-linstor.service - Mount filesystem for the LINSTOR controller
         Loaded: loaded (/etc/systemd/system/var-lib-linstor.service; static; vendor preset: disabled)
         Active: active (exited) since Wed 2023-01-25 15:58:03 EST; 1 weeks 0 days ago
        Process: 2947 ExecStart=/bin/mount -w /dev/drbd/by-res/xcp-persistent-database/0 /var/lib/linstor (code=exited, status=0/SUCCESS)
       Main PID: 2947 (code=exited, status=0/SUCCESS)
         CGroup: /system.slice/var-lib-linstor.service
      

      Also, what logs would you like to have?

      Edit2 : Also, I don't believe that service would've actually caught what happened, since it was mounted read-write at first, but it seems like DRBD had an issue while the mount was active and changed it to read-only. The controller service was still healthy and active, just unable to complete DB writes.
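
      For the record, one quick way to check whether the running controller actually belongs to a systemd unit rather than a manual start (a sketch):

      # Which systemd unit (if any) owns the running controller process?
      ps -o unit= -p "$(pgrep -of 'com.linbit.linstor.core.Controller')"
      systemctl status linstor-controller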

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      We just hit a weird issue that we managed to fix, but it wasn't really clear at first what was wrong; it might be a good idea for you guys to add some type of healthcheck / error handling to catch this and fix it.

      What happened was that, for some unknown reason, our /var/lib/linstor mount (xcp-persistent-database) became read-only. Everything kinda kept working-ish, but some operations would randomly fail, like attempting to delete a resource. Looking at the logs, we saw this:

      Error message:                      The database is read only; SQL statement:
       UPDATE SEC_ACL_MAP SET ACCESS_TYPE = ?  WHERE OBJECT_PATH = ? AND        ROLE_NAME = ? [90097-197]
      

      We did a quick write test on the mount /var/lib/linstor and saw that it was indeed in RO mode. We also noticed that the last update time on the db file was 2 days ago.

      Unmounting and remounting it got the controller to start back up, but the first time some nodes were missing from the node list, so we restarted the linstor-controller service again and everything is now up and healthy.
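
      For anyone wanting a quick check for this failure mode, something like the following catches a silent read-only remount of the database volume (a sketch):

      # Does the kernel report the mount as read-only?
      grep ' /var/lib/linstor ' /proc/mounts
      # Simple write probe
      touch /var/lib/linstor/.rw-test && rm /var/lib/linstor/.rw-test || echo 'LINSTOR database mount is read-only'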

      posted in XOSTOR
    • RE: XOSTOR hyperconvergence preview

      @ronan-a Perfect, thanks a lot for your input 🙂

      posted in XOSTOR