A few days back I had an interesting chat with one of my customers. I’ll provide some background information before going into the actual problem statement.
Background information
The vSphere setup includes:
- Rack servers
- 2 x 8 core CPUs per ESXi host.
- 192 GB RAM per ESXi host.
- 2 x 1 port Fibre Channel (FC) host bus adapters (HBA) per ESXi host.
- Round robin is used as the multipath configuration.
- Storage system not using auto-tiering.
- Virtual machines are placed on datastores based on Service Level Agreement (SLA) classification, meaning all virtual machines placed on the same datastore have the same priority.
- vSphere Enterprise Plus is used, meaning we can use Storage I/O Control (SIOC) to prioritize storage access among virtual machines.
- Every virtual machine has one virtual disk in this example and uses the default disk shares configuration, meaning 1000 shares each.
- Datastore extents are not in use.
The Storage I/O Control feature is used to give priority to virtual machines running on the same datastore, no matter if the virtual machines run on the same ESXi host or on different ESXi hosts. SIOC kicks in when a specified threshold, set either in milliseconds (ms) or as a percentage of maximum utilization, is reached.
With the virtual machine to ESXi host placement, the virtual machine to datastore placement and the SIOC configuration described here, we can ensure that all virtual machines placed on:
- Datastore DS-SLA01 get the same priority to the datastore when the SIOC threshold is reached.
- Datastore DS-SLA02 get the same priority to the datastore when the SIOC threshold is reached.
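To make the shares mechanism a bit more concrete, here is a toy model of how equal disk shares translate into equal access to one datastore once the threshold is reached. It is a simplified sketch and not how SIOC is actually implemented: the number of queue slots, the host names and VM5 are assumptions made up for the illustration.

```python
def sioc_slots(vm_shares, vm_host, total_slots):
    """Split one datastore's queue slots between VMs and hosts by disk shares."""
    total_shares = sum(vm_shares.values())
    # Each VM gets a slice proportional to its shares on this datastore.
    per_vm = {vm: total_slots * s / total_shares for vm, s in vm_shares.items()}
    # A host's slice is the sum of the slices of the VMs it runs.
    per_host = {}
    for vm, slots in per_vm.items():
        per_host[vm_host[vm]] = per_host.get(vm_host[vm], 0) + slots
    return per_vm, per_host


# Hypothetical placement on DS-SLA01: default 1000 shares per VM.
shares = {"VM1": 1000, "VM2": 1000, "VM5": 1000}
hosts = {"VM1": "esxi01", "VM2": "esxi01", "VM5": "esxi02"}

per_vm, per_host = sioc_slots(shares, hosts, total_slots=48)
print(per_vm)    # every VM gets the same slice of the datastore
print(per_host)  # the host running two of the VMs gets twice the slice
```

Note that the calculation only ever looks at one datastore at a time; nothing in it compares DS-SLA01 with DS-SLA02.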
Problem statement
My customer asked me how we can make sure that the SLA1 based virtual machines get higher priority on the ESXi hosts' HBAs than the SLA2 based virtual machines during resource contention.
If I apply the question to my scenario, only including 1 ESXi host, it would be:
- How can we ensure that the SLA1 based virtual machines VM1 and VM2 get more priority on the ESXi host HBAs than the SLA2 based virtual machines VM3 and VM4 during resource contention?
The answer is that it is not possible, since the queue throttling works at a per ESXi host, per device/datastore (LUN) level and not at the HBA level.
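A small sketch of what this means with round robin in place. It is a deliberately simplified model (one I/O per path switch, made-up queue numbers, hypothetical vmhba names) and only shows that both datastores end up sharing every HBA port, with no SLA weighting applied at that level.

```python
from collections import Counter
from itertools import cycle


def round_robin_mix(outstanding_per_datastore, hba_ports):
    """Spread each datastore's outstanding I/Os over all ports, one I/O at a time."""
    mix = {port: Counter() for port in hba_ports}
    for datastore, outstanding in outstanding_per_datastore.items():
        ports = cycle(hba_ports)  # every datastore rotates over all ports
        for _ in range(outstanding):
            mix[next(ports)][datastore] += 1
    return mix


# Hypothetical numbers: both datastores have 32 I/Os in flight on one host.
mix = round_robin_mix({"DS-SLA01": 32, "DS-SLA02": 32}, ["vmhba1", "vmhba2"])
for port, counts in mix.items():
    print(port, dict(counts))
```

In this model each port carries the same amount of SLA1 and SLA2 traffic, so throttling one datastore's queue does not give the other datastore any priority on the HBA.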
If anyone has a different understanding, please provide me with the information and include VMware based references.
Solution
In many cases this is not a problem, but if it is, it must be taken care of during the design. Since I have not been able to find a solution that works out of the box, we need to work around the potential problem. I can think of a couple of scenarios to solve this and hopefully one of them is a match for you:
- Place the most critical virtual machines on their own vSphere cluster. Depending on the importance, e.g. a mission critical workload, this might be a good choice.
- Add HBAs (4 ports in total) per ESXi host and make sure the SLA1 based datastores use 2 HBA ports and the other datastores use the other 2 HBA ports.
- Do not separate the virtual machines on different datastores based on SLA/importance. This requires some intelligence, perhaps auto-tiering, on the storage array level to guarantee correct virtual machine performance at the storage level.
- Use the multipath policy Fixed or Most Recently Used (MRU) instead of the round robin multipathing policy. Two things before considering this option:
  - Create an automatic routine to ensure all SLA1 based datastores are accessed through HBA port 1 and that all SLA2 based datastores are accessed through HBA port 2.
  - Make sure the selected multipath policy is supported by the storage vendor.
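As a starting point for the automatic routine mentioned in the last option, here is a minimal sketch that only prints the commands it would run. The device IDs, LUN numbers, vmhba names and the C0:T0 part of the path names are placeholders, and the esxcli syntax is written as I remember it, so verify it against your ESXi version and your storage vendor's recommendations before using anything like this.

```python
# Hypothetical inventory: device IDs, LUN numbers and SLA tags are placeholders.
DATASTORES = {
    "DS-SLA01": {"device": "naa.0000000000000001", "lun": 1, "sla": "SLA1"},
    "DS-SLA02": {"device": "naa.0000000000000002", "lun": 2, "sla": "SLA2"},
}

# Assumed mapping: SLA1 on HBA port 1, SLA2 on HBA port 2.
SLA_TO_HBA = {"SLA1": "vmhba1", "SLA2": "vmhba2"}


def fixed_path_commands(datastores, sla_to_hba):
    """Build esxcli commands that pin each datastore to the HBA of its SLA."""
    commands = []
    for name, info in datastores.items():
        hba = sla_to_hba[info["sla"]]
        # Channel and target numbers (C0:T0) are assumptions for the example.
        path = f"{hba}:C0:T0:L{info['lun']}"
        commands.append(
            f"esxcli storage nmp device set --device {info['device']} --psp VMW_PSP_FIXED"
        )
        commands.append(
            "esxcli storage nmp psp fixed deviceconfig set "
            f"--device {info['device']} --path {path}"
        )
    return commands


commands = fixed_path_commands(DATASTORES, SLA_TO_HBA)
for cmd in commands:
    print(cmd)
```

Printing instead of executing keeps the routine easy to review; the same list could later be fed to whatever tool you use to run commands on the ESXi hosts.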
Feel free to add your implementation examples and thoughts.