vCenter Server database and MSCS issue in vSphere 5.1

A couple of months ago, a customer of mine moved their vCenter Server database from a standalone MSSQL 2008 SP2 installation to an MSSQL 2008 R2 SP2 installation based on Microsoft Cluster Service (MSCS).

The intention is good: it increases the availability of the MSSQL server, because maintenance of the MSCS nodes (at least operating system maintenance) can be done without taking the MSSQL service out of service. This is only partially true, since the MSSQL service is unavailable for as long as it takes MSCS to fail it over from one MSCS node (VM or physical server) to another.

My customer runs the following environment:
  • vCenter Server 5.1.0, build 1123961
  • ODBC file version on the vCenter Server: 6.1.7600.16385 (date modified 2009-07-14)
  • MSSQL 2008 R2 SP2 (10.50.4000)

So guess what happens during an MSCS failover of the MSSQL service from node 1 to node 2?

The vCenter Server shuts down!

The application log in Event Viewer on the VM running vCenter Server gives the explanation:

Event ID 1000:
General message:
The following information was included with the event:
An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Error: Error[VdbODBCError] (-1) “ODBC error: (42000) – [Microsoft][SQL Server Native Client 10.0][SQL Server]SHUTDOWN is in progress.” is returned when executing SQL statement “UPDATE VPX_ENTITY WITH (ROWLOCK) SET NAME = ? , TYPE_ID = ? , PARENT_ID = ? WHERE ID = ?”

The most critical part of the message above is “SHUTDOWN is in progress.” If vCenter Server talks to the MSSQL Server during an MSCS failover, it receives a message via the ODBC driver in which the MSSQL Server says that it is being shut down.
This is exactly what happens: the MSSQL Server service is stopped on one MSCS node and started on another MSCS node.
When vCenter Server receives such a message from the MSSQL Server, it shuts itself down by design or, if a clean shutdown is not possible, crashes.

The thing is that MSCS is not a supported platform for the vCenter Server database in any vCenter Server version prior to vSphere 5.5.
This is stated in VMware KB 1024051:

“As of vCenter Server 5.5 in vSphere 5.5, VMware introduced support for using Microsoft SQL cluster service for use as a back end database. Previously, using Microsoft SQL Cluster was not supported for any version of vSphere”
The temporary solution, in place until my customer upgrades the VMware environment to vSphere 5.5, includes:
  • Operational procedures, e.g. vSphere admins must be informed before an MSCS failover.
  • The vCenter Server service recovery setting “Restart service after” was changed from 0 minutes to 4 minutes. This covers restarting the vCenter Server service in the cases where vCenter Server crashes and cannot perform a clean shutdown.
    [Screenshot: vCenter Server service recovery settings]
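The same recovery setting can also be applied from the command line with Windows' sc.exe failure command. The sketch below only builds that command. It assumes that the vCenter Server service name is vpxd, that the first, second and subsequent failure actions should all restart the service, and that the fail count resets after one day. All three are assumptions to check against your own server, and build_sc_failure_args is a hypothetical helper, not a VMware tool.

```python
import subprocess
import sys

# Assumption: "vpxd" is the Windows service name of the VMware VirtualCenter
# Server service. Verify it with "sc query" before applying anything.
SERVICE_NAME = "vpxd"


def build_sc_failure_args(service, delay_minutes, restarts=3, reset_seconds=86400):
    """Hypothetical helper: build an sc.exe argument list that restarts
    `service` after `delay_minutes` on each of the first `restarts` failures.
    reset_seconds (assumed one day) is when the failure count goes back to 0."""
    delay_ms = delay_minutes * 60 * 1000  # sc.exe takes restart delays in milliseconds
    actions = "/".join(f"restart/{delay_ms}" for _ in range(restarts))
    # sc.exe expects "option= value", so each option and its value are
    # passed as separate arguments (this yields the required space after "=").
    return ["sc.exe", "failure", service,
            "reset=", str(reset_seconds),
            "actions=", actions]


if __name__ == "__main__":
    args = build_sc_failure_args(SERVICE_NAME, delay_minutes=4)
    print(" ".join(args))
    # Only execute when explicitly asked to, and only on Windows.
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(args, check=True)
```

Without arguments the script only prints the command (sc.exe failure vpxd reset= 86400 actions= restart/240000/restart/240000/restart/240000); run it with --apply from an elevated prompt on the vCenter Server to execute it.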
An MSSQL Server failover from MSCS node one to MSCS node two takes a minimum of one minute and a maximum of three minutes, so the vCenter Server restart attempt made after the default “Restart service after” setting of 0 minutes failed.
The new “Restart service after” setting has worked during a couple of tests and during at least one live failover.
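The choice of 4 minutes is simple arithmetic against the observed failover window. A minimal sketch, assuming vCenter Server stops at roughly the moment the failover starts and ignoring the time the vCenter Server service itself needs to start; restart_margin is a hypothetical name:

```python
# Slowest MSCS failover of the MSSQL service observed at this customer, in minutes.
FAILOVER_MAX_MINUTES = 3


def restart_margin(restart_delay_minutes, failover_max=FAILOVER_MAX_MINUTES):
    """Minutes between the end of the slowest observed failover and the
    restart attempt. A negative value means the restart is attempted while
    the cluster may still be failing over (assumes vCenter stops when the
    failover starts)."""
    return restart_delay_minutes - failover_max


for delay in (0, 4):
    margin = restart_margin(delay)
    verdict = "clears the failover" if margin > 0 else "too early"
    print(f"Restart service after {delay} min: margin {margin} min ({verdict})")
```

With a 0-minute delay the restart can come up to three minutes too early, while 4 minutes leaves roughly a one-minute margin over the slowest failover seen so far.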
