Let's Test the HA

nikade

@456Q Yeah, in a failover cluster the secondary SQL service is not started; it starts when the failover is initiated.
How do you handle your applications? For example, we have multiple customers with the following setup:

2 VMs running Windows Server with identical CPU/RAM/disk, sharing 1 disk for the WSFC witness and 1 disk for the WSFC SQL Server data.
On this WSFC we have the following "roles" or whatever to call them:

SQL Server: The client's databases are in this SQL Server.
File server: The client's files are on this file server.
Applications and Services: The client's applications and services run here, many using the databases hosted in the same SQL Server.

When we have maintenance on VM1 we just fail over the whole WSFC, and the SQL Server, File Server and Applications and Services running on VM1 are failed over to VM2. Everything is done within 1-2 minutes.

I don't think AlwaysOn supports this kind of scenario because it does not have any shared storage. Am I correct in my assumption?

456Q

@nikade We have dedicated VMs for each role, so SQL is only SQL.

You would have to check whether your application has some built-in HA that would work without a shared disk.

We are set up so that multiple application servers run behind a load balancer (HAProxy).

File services are provided by TrueNAS, which is behind a load balancer as well in an active/backup configuration.

The app and file services files are synced by Syncthing.

We came from a configuration similar to yours, but had to change it to scale further and increase redundancy. We also considered the shared disks a single point of failure.

Just take your time to look into each component. I'm sure you will find a way.

nikade

@456Q OK I understand, totally different from our business.

Some of our bigger clients demand total segmentation for the system that we're hosting (it's developed by our sister company), so we're building a separate infrastructure for each client with AD, App, DB, File and RDS on its own private VLAN.

It is a traditional 3-layer application with a DB backend, an App server, and then different clients connected to the App server. To ensure redundancy we run the client's App server on the WSFC, along with other related services and the file server.
There can only be 1 App server running, otherwise the DB will be corrupted, since it keeps locks in the DB and other old-school stuff.
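
The "only 1 App server at a time" rule can be pictured as a lease that exactly one node holds. The sketch below is a simplified Python illustration, not how WSFC implements role ownership; the `ActiveLease` class, the node names and the 30-second TTL are hypothetical and chosen only for the example.

```python
# Illustrative lease guard so that only one app server instance is active.
# In a real WSFC the cluster service decides role ownership; this class is
# a hypothetical stand-in to show the idea.

class ActiveLease:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node, now):
        """Grant or renew the lease if it is free, expired, or already held by node."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.ttl
            return True
        return False


lease = ActiveLease(ttl_seconds=30)
print(lease.try_acquire("VM1", now=0))    # True: VM1 becomes the active node
print(lease.try_acquire("VM2", now=10))   # False: VM1 still holds the lease
print(lease.try_acquire("VM2", now=45))   # True: VM1's lease expired, VM2 takes over
```

A node would only start the App server after `try_acquire` returns True, which is roughly the guarantee the cluster gives us today.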

456Q

@nikade I just looked up some old documentation from 2019. At that time we used the vSAN iSCSI Target Service to create (multiple) LUNs. We also enabled Multipath I/O and the Microsoft iSCSI initiator in Windows and mapped those LUNs into the two VMs. We then successfully created a WSFC that presents the disks exactly the way you have them today. It was our first design for file services and was later replaced with an active/active TrueNAS.

vSAN was used for iSCSI, but I don't remember this being limited to VMware in particular. I believe you could use any iSCSI storage system for this.

I also see that LINSTOR has a feature that allows creating an iSCSI target. Not sure if this is available for XOSTOR yet.
https://linbit.com/blog/create-a-highly-available-iscsi-target-using-linstor-gateway/

nikade

@456Q vSAN now natively supports SCSI-3 persistent reservations, which the traditional WSFC SQL Server failover cluster requires, so there is no need to use the iSCSI service.

Although, if XOSTOR offered the possibility to run an iSCSI service, that would easily resolve our issue. I haven't found any information about that, so I'm pretty sure it doesn't support it.

Having both compute and storage on the same nodes and cluster is really nice when it comes to continuity, since you can stretch the vSAN cluster over 2 sites: if one of the sites goes down, the other will restart all VMs and the WSFC will just restart all the services within minutes.

In our current setup with vSAN and a stretched cluster we can recover from a total site failure within 5 minutes for all of our bigger customers; that's pretty hard to beat.
To avoid split-brain we have a 3rd site where the vSAN witness sits, and a fiber ring connecting all 3 sites.
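
Why the 3rd-site witness prevents split-brain can be shown with a small majority-vote model. This is a simplified Python sketch, not vSAN's actual implementation; the site names and the `has_quorum` function are made up for illustration.

```python
# Simplified majority-quorum model: two data sites plus a witness site.
# Site names and this function are illustrative assumptions, not a vSAN API.

ALL_VOTERS = frozenset({"site_a", "site_b", "witness"})


def has_quorum(reachable):
    """Return True if the reachable voters form a strict majority of all voters."""
    votes = len(ALL_VOTERS & set(reachable))
    return votes > len(ALL_VOTERS) // 2


# The link between the two data sites is cut, but site_a still reaches the witness.
partition_a = {"site_a", "witness"}
partition_b = {"site_b"}
print(has_quorum(partition_a))  # True: this side keeps running
print(has_quorum(partition_b))  # False: this side stops, so no split-brain
```

With only 2 voters, neither side of a cut link could claim a majority, which is why the witness sits at a separate site.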

456Q

@nikade Keep your eyes open. Maybe reach out to support and ask whether iSCSI is a feature or on the roadmap.

nikade

@456Q I've commented on pretty much every post I've seen about this. I think they know it is a nice feature, but there are so many nice things people want, so we'll have to be patient.
Meanwhile we are in no rush; we've just refreshed our hardware and renewed our VMware licensing, so we'll be set for another 3 years before this topic comes up again.

456Q

@nikade Sounds like a plan. We could not afford another renewal.

nikade

@456Q Hehe, many couldn't, but we pretty much had no choice because our customers expect a certain level of SLA and redundancy.
We looked very closely at XCP-NG + XOA since we're already using it for 2 other clusters, but since we were not able to resolve this SQL Server failover cluster situation we were forced to build new VMware clusters instead.

BHellman

Disclaimer: I work for LINBIT, makers of DRBD and LINSTOR, the tech behind XOSTOR.

We can do highly available iSCSI targets; we even have guides on our website that take you through it step by step. These would sit outside of XCP-NG, but would serve the same purpose.

If there is any interest from Vates to integrate DRBD/HA into XCP-NG, we're always open to discussions.

nikade

@BHellman Sounds interesting for pretty much everyone coming from VMware vSAN and running SQL Server in a failover cluster, if I'm allowed to jump the gun.