XCP-ng

    SR.Scan performance within XOSTOR

    • D Offline
      denis.grilli
      last edited by

      Hello, we have a pool of 3 hosts with XOSTOR and we are having various performance issues with tasks like starting/stopping VMs, VM migrations and similar, which according to Vates support come down to the task waiting for SR.scan to complete.

      I can see SR.scan running all the time, and some of the scans take around 4-5 minutes...

      The XOSTOR storage is around 27 TB and stores about 107 VM base disks, plus the same number of snapshots, plus what I think are the current changes for each of them, for a total of 348 disks.

      What do your environments look like? How long do your SR.scans take, and most importantly, do you have problems starting/stopping and migrating VMs?

      For instance, when we migrate VMs, the VMs freeze for about 3-4 minutes just before re-attaching the VDI on the target host, and according to support this is due to the SR.scan.
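
      For anyone who wants to check the same thing on their pool, this is roughly how I watch the scans from the pool master. It is only a sketch: on my system the task label shows up as SR.scan, so adjust the filter to whatever appears in the full task list.

      # list all currently running tasks with their start time and progress
      xe task-list params=uuid,name-label,status,created,progress

      # assuming the scan tasks are labelled "SR.scan", watch just those every 5 seconds
      watch -n 5 "xe task-list name-label=SR.scan params=uuid,status,created,progress"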

      • P Online
        Pilow @denis.grilli
        last edited by Pilow

        @denis.grilli My experience with XOSTOR was very similar (3 hosts, 18 TB) on XCP-ng 8.2 at the time

        but fewer VMs... ~20

        we had catastrophic failures, with tapdisk locking the VDIs, and VMs were hard to start/stop (10+ minutes to start a VM, while the same VM on local RAID5 storage, or even NFS storage, started in 5 seconds max)

        more problems with large VDIs (1.5 TB) on XOSTOR, and backups were painful to obtain

        after many exchanges with support, we decided to get our VMs off XOSTOR for the time being, back to local RAID5 with replicas between hosts. No VM mobility, but redundancy anyway.

        I think that the way XOSTOR is implemented is not really the root of the problem.
        The DRBD + SMAPIv1 combo is OK for a small number of small VMs; at scale it's another story.

        we still have to upgrade to 8.3 and give it another try.

        the more VDIs we moved off XOSTOR, the more 'normal' and expected the behavior became.

        • D Offline
          denis.grilli
          last edited by

          @pilow: thanks for letting me know about your experience.

          I was afraid someone would say that.

          Unfortunately for us, moving off XOSTOR is not so simple, because we really need VM mobility to allow for host maintenance, which we otherwise would not be able to perform, and budgeting for redundant external storage is not an option either.

          The annoying thing is that before starting this journey with Vates we engaged with them and made them aware of our environment (it is a migration from VMware), and no one ever mentioned that storing so many VMs in XOSTOR could be a problem, so I really hope that support can shed some light and provide a fix for the situation.

          Overall it is not a bad experience, but waiting 3-4 minutes for a VM to start when you are in a hurry is not great.

          • D Offline
            denis.grilli
            last edited by

            From another post I gathered that there is an auto-scan feature that runs by default every 30 seconds, which seems to cause a lot of issues when the SR contains a lot of disks or you have a lot of storage.

            It is not completely clear whether this auto-scan feature is actually necessary, and for some customers the Vates helpdesk has suggested reducing the scan frequency from 30 seconds to 2 minutes, which seems to have improved the overall experience.

            The command would be this:

            xe host-param-set other-config:auto-scan-interval=120 uuid=<Host UUID>

            where <Host UUID> is the pool master's UUID.

            Of course I won't run that in production without reassurance from Vates support that it won't have a negative impact, but I think it is worth mentioning.
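
            If anyone does try it, this is roughly how I would check and undo the change afterwards. It is only a sketch and assumes the key is stored exactly as other-config:auto-scan-interval, as in the command above:

            # read the current value back from the pool master
            xe host-param-get uuid=<Host UUID> param-name=other-config param-key=auto-scan-interval

            # remove the override to fall back to the default 30-second interval
            xe host-param-remove uuid=<Host UUID> param-name=other-config param-key=auto-scan-interval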

            In my situation I can see how frequent scans would delay other tasks, considering that my system is effectively always being scanned, with the scan task itself probably being affected by it as well.

            • I Online
              idar21
              last edited by

              Very similar situation on our end. Lots of back and forth with support. Basically, we have given up for now. The latest patch seems to have helped a bit, but overall it is not a production-ready product yet.

              • ForzaF Online
                Forza @idar21
                last edited by

                What is the purpose of the SR scans, and why do they have to run so frequently?

                • D Offline
                  denis.grilli
                  last edited by

                  After a back and forth with Vates support, they have confirmed my issue, which is connected to SR.scan but not to the auto-scan. The auto-scan is just part of the normal routine scanning and does not necessarily affect migrations, starting/stopping of VMs, or anything else.

                  Reducing the frequency might help a little, but the issue is mainly the scan procedure itself, which runs multiple times during a migration or other tasks like starting/stopping VMs, and the time taken by the scan itself is what causes the problem.
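
                  For reference, one way to get a feel for how long a single scan takes on a given SR is to trigger one manually and time it. This is just a sketch; <SR UUID> stands for the XOSTOR SR's UUID, and bear in mind that this queues yet another scan on an already busy SR:

                  # trigger a single scan of the XOSTOR SR and measure how long it takes
                  time xe sr-scan uuid=<SR UUID>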

                  I have confirmation from the support team that they have already started rewriting the scan procedure, so the issue should (I hope) be fixed in the next XCP-ng upgrade.

                  Hope this helps.

                  • P Online
                    Pilow @denis.grilli
                    last edited by

                    @denis.grilli really big news, I need to have XOSTOR working 😃
                    Thanks for reporting your problems, and to support for correcting them 😄
