I want to highlight a risk in one of my previous designs and explain what you can do to avoid it, provided you get approval from the customer. The VMware vSphere versions referenced in this blog post are VMware vCenter Server 5.1.0b (build 947939) and VMware ESXi 5.1.
Background
Below I have listed a few customer requirements that heavily influenced the design:
- Physical separation between traffic types according to:
  - ESXi Management – All management traffic must be handled by its own physical network infrastructure.
  - FCoE & virtual machine traffic – FCoE and virtual machine traffic must be handled by their own physical network infrastructure.
- Two separate physical network infrastructures were already in place:
  - 1 Gbps for ESXi management traffic.
  - 10 Gbps for FCoE and virtual machine traffic.
- The physical servers to be used as ESXi hosts were already determined, including (from a network perspective):
  - 2 on-board NICs connected to the 1 Gbps physical network infrastructure.
  - 2 CNAs connected to the 10 Gbps physical network infrastructure.
- The ESXi hosts must use FCoE (via the ESXi host CNAs) to connect to the storage array/arrays.
Based on the above requirements, I needed to implement the ESXi host networking according to the below figure.
VDS (VDS-01) = vNetwork Distributed Switch
vSwitch (vSwitch-01) = vNetwork Standard Switch
The below table outlines the same configuration as the figure.
The vSphere HA feature relies on two things before taking the action specified as the vSphere HA “Isolation response”:
- Network connectivity using the ESXi management network connection. In my case using vmnic0 and vmnic1.
- Datastore heartbeat using the ESXi storage connection. In my case using vmnic2 and vmnic3, which are the same vmnics used for virtual machine traffic.
Network connectivity is always checked before datastore connectivity. The datastore heartbeat mechanism can be turned off (even though I have never used that option) by selecting the vSphere HA configuration “Use datastores only from the specified list”:
When using the above configuration you’ll get the following warning per ESXi host:
To disable the warning you need to use the advanced vSphere HA configuration option “das.ignoreInsufficientHbDatastore = true”.
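If you prefer to apply the heartbeat datastore settings with a script instead of the vSphere Client, the sketch below shows one way to do it using pyVmomi. It is a minimal example only: the vCenter name, cluster name and datastore names are placeholders for illustration, so adjust them to your own environment.

```python
# Minimal pyVmomi sketch: pin the HA heartbeat datastores to a user-selected
# list and silence the "insufficient heartbeat datastores" warning.
# Assumptions: the vCenter, cluster and datastore names below are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='administrator',
                  pwd='password')  # certificate handling omitted for brevity
content = si.RetrieveContent()

# Find the cluster object by name
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Cluster-01')
view.Destroy()

# Resolve the datastores vSphere HA should use for heartbeating
wanted = {'Datastore-01', 'Datastore-02'}
heartbeat_ds = [ds for ds in cluster.datastore if ds.name in wanted]

das_config = vim.cluster.DasConfigInfo(
    # Corresponds to "Use datastores only from the specified list"
    hBDatastoreCandidatePolicy='userSelectedDs',
    heartbeatDatastore=heartbeat_ds,
    # Advanced option that suppresses the per-host warning
    option=[vim.option.OptionValue(
        key='das.ignoreInsufficientHbDatastore', value='true')])

cluster.ReconfigureComputeResource_Task(
    vim.cluster.ConfigSpecEx(dasConfig=das_config), modify=True)

Disconnect(si)
```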
Problem description
When using the described setup we can end up in a situation where the ESXi host is not able to run virtual machines without the vSphere HA feature kicking in. This happens if the ESXi host CNAs lose their network connectivity but the on-board NICs do not. Even though the ESXi host cannot connect to the storage array, it can still communicate with the other ESXi hosts over the management network, meaning the datastore heartbeat functionality won’t be consulted and vSphere HA will not kick in.
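Since vSphere HA will not react to this failure, you might want to monitor for it yourself. Below is a minimal pyVmomi sketch that reports ESXi hosts whose CNA uplinks (vmnic2 and vmnic3 in my setup) have lost link; the vCenter name, credentials and vmnic names are placeholders based on the design described above.

```python
# Minimal pyVmomi sketch: report ESXi hosts whose CNA uplinks have lost link,
# i.e. exactly the failure vSphere HA will not detect in this design.
# Assumptions: vCenter name, credentials and vmnic names are placeholders.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

CNA_NICS = {'vmnic2', 'vmnic3'}  # uplinks backing FCoE and VM traffic

si = SmartConnect(host='vcenter.example.com', user='administrator',
                  pwd='password')  # certificate handling omitted for brevity
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

for host in view.view:
    for pnic in host.config.network.pnic:
        # linkSpeed is None when the physical NIC has no link
        if pnic.device in CNA_NICS and pnic.linkSpeed is None:
            print('%s: %s has no link - VMs on this host have lost storage'
                  % (host.name, pnic.device))

view.Destroy()
Disconnect(si)
```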
If you need (for whatever reason) to implement a design like this, I suggest you add at least two things to the design document:
- One constraint describing why this is present in the design. In my case, it was based on the customer’s physical network separation requirements and the existing physical network infrastructure.
- One risk describing what can happen. In my case: if/when the ESXi host CNAs lose network connectivity, the virtual machines will not be able to deliver their applications and can/will crash without vSphere HA taking the action specified in the “Isolation response” configuration.
Solution
You can fix the potential problem, but it requires running the vSphere HA communication, classified as ESXi management traffic, over the CNAs (in my case). This was not an option based on the customer requirements, but below you’ll find the necessary steps to avoid the potential problem:
- Create a VDS port group on the VDS backed by the CNAs. In my case “1900-vSphere-HA”.
- Create a VMkernel Network Adapter on each ESXi host pointing to the VDS port group created in the above task.
- Enable “Management Traffic” for the above created VMkernel Network Adapter. In my case “vmk1”.
- Disable “Management Traffic” for the original VMkernel Network Adapter. In my case “vmk0”.
- Optional – Use the vSphere HA advanced configuration “das.useDefaultIsolationAddress = false” to make sure the default gateway specified during the ESXi host installation is not used as the vSphere HA isolation address.
- Optional – Use the vSphere HA advanced configuration “das.isolationAddress0 = IP-to-use-as-the-ESXi-host-isolation-address”. In my case 192.168.20.1, which is a pingable address.
The two optional configuration settings are used to change the default vSphere HA isolation address from the ESXi host default gateway to the gateway address on the new vSphere HA network.
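For reference, the sketch below shows how the management traffic switch-over and the two optional isolation address settings could be scripted with pyVmomi. It assumes the VDS port group and the vmk1 adapter have already been created; the vCenter name, cluster name and the 192.168.20.1 address are the values from my example and should be treated as placeholders.

```python
# Minimal pyVmomi sketch: move the "Management Traffic" tag from vmk0 to the
# new vmk1 adapter and point vSphere HA at an isolation address on the new
# network. Assumes the VDS port group and vmk1 already exist; all names and
# the 192.168.20.1 address are placeholders taken from the example above.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.com', user='administrator',
                  pwd='password')  # certificate handling omitted for brevity
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Cluster-01')
view.Destroy()

# Per host: enable Management Traffic on vmk1 and disable it on vmk0
for host in cluster.host:
    nic_mgr = host.configManager.virtualNicManager
    nic_mgr.SelectVnicForNicType('management', 'vmk1')
    nic_mgr.DeselectVnicForNicType('management', 'vmk0')

# Cluster-wide: use a pingable address on the new vSphere HA network as the
# isolation address instead of the default gateway from the ESXi install.
# Note: this may replace the existing HA advanced option list, so include any
# options you still need (e.g. das.ignoreInsufficientHbDatastore).
das_config = vim.cluster.DasConfigInfo(option=[
    vim.option.OptionValue(key='das.useDefaultIsolationAddress', value='false'),
    vim.option.OptionValue(key='das.isolationAddress0', value='192.168.20.1')])

cluster.ReconfigureComputeResource_Task(
    vim.cluster.ConfigSpecEx(dasConfig=das_config), modify=True)

Disconnect(si)
```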
By following the suggested configuration steps you’ll end up with an ESXi host virtual adapter configuration (per ESXi host) according to the below figure, where Management traffic is enabled for the newly created device vmk1 only:
The vSphere HA advanced configuration is presented in the below figure:
Summary
This article applies to situations when you run FCoE, iSCSI and/or NFS connections from the ESXi host to the storage array.
When using a separate physical network infrastructure for ESXi host management and another for the ESXi host storage connection & virtual machine traffic, you might end up in situations where vSphere HA does not kick in even though the ESXi host is not able to run virtual machines.
To mitigate that risk, take advantage of the per virtual adapter (vmk) management traffic configuration on the ESXi host and the advanced vSphere HA configuration options to create a highly available solution that meets the customer demands.