ESXi 5.1 U1 Purple Screen of Death

A few days ago I was contacted by a customer running vSphere 5.1 U1, a release that is about a year old. For various reasons they have not yet upgraded to a later version of vSphere 5.1 or to vSphere 5.5. It has been quite some time since I last saw a purple screen of death (PSOD), but that was the reason my customer called and emailed me. Two of their ESXi hosts crashed within 20 minutes with the exact same PSOD:

[Image: screenshot of the PSOD, "Screen Shot 2014-04-04 at 21.02.04"]

I know it is not the best screenshot, but a closer look at the PSOD message actually gives you an idea of the root cause of the problem.
Both E1000PollRxRing and E1000DevRx point to the virtual machine (VM) E1000 and/or E1000e driver, and this made me check two things:

  • Are any VMs using the E1000 and/or E1000e vNIC driver?
  • Are there any VMware KB articles reporting issues related to the PSOD message? I remembered quite a few discussions about this a few months back, so I figured this was a known problem.
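
The first check can be sketched as a small script. This is only a minimal illustration, not the method used at the customer: it assumes you have the text of a VM's .vmx configuration file, where the adapter type is recorded in lines such as ethernet0.virtualDev = "e1000e". The function name find_e1000_adapters and the sample content are made up for the example, and adapters without a virtualDev line (which fall back to a default type) are not covered.

```python
import re

# Matches lines such as: ethernet0.virtualDev = "e1000e"
_VDEV_RE = re.compile(
    r'^\s*(ethernet\d+)\.virtualDev\s*=\s*"([^"]*)"\s*$', re.IGNORECASE
)

# Adapter types named in the KB article as affected.
EMULATED = {"e1000", "e1000e"}


def find_e1000_adapters(vmx_text):
    """Return {adapter_name: device_type} for E1000/E1000e vNICs in a .vmx text."""
    found = {}
    for line in vmx_text.splitlines():
        match = _VDEV_RE.match(line)
        if match and match.group(2).lower() in EMULATED:
            found[match.group(1).lower()] = match.group(2).lower()
    return found


# Hypothetical .vmx excerpt, for illustration only.
sample_vmx = '''
displayName = "perf-test-vm"
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000e"
ethernet1.present = "TRUE"
ethernet1.virtualDev = "vmxnet3"
'''

print(find_e1000_adapters(sample_vmx))  # {'ethernet0': 'e1000e'}
```

Running this over every VM's configuration would give a list of candidates to move to VMXNET3. In a real environment you would more likely query vCenter for the same information.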

My first investigation showed that around 150 VMs used the E1000 and/or E1000e vNIC driver, and I also found VMware KB article 2059053, which describes the following:

  • ESXi 5.x host fails with a purple diagnostic screen. Pay attention to the "ESXi 5.x" in the description: it means that ESXi 5.0, ESXi 5.1 and ESXi 5.5 are all affected.
  • This is a known issue affecting ESXi 5.0, 5.1, and 5.5 hosts and virtual machines using the E1000 and E1000e virtual network adapters.

The issue is resolved in ESXi 5.1 Update 2 and in ESXi 5.5 Update 1; for ESXi 5.0 it is resolved in patch ESXi500-201401001.
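
The fix levels above can be captured in a small lookup, which is handy if you want to check a list of hosts. This is only a sketch: the function name e1000_fix_status and its return strings are made up for the example, and it assumes you already know each host's version and update level. ESXi 5.0 is fixed by a patch rather than an update level, so the sketch simply points you to the patch name.

```python
# Update level that contains the fix, per the versions listed above.
FIXED_UPDATE = {"5.1": 2, "5.5": 1}


def e1000_fix_status(version, update_level):
    """Classify a host as 'fixed' or 'affected' for the E1000/E1000e PSOD issue."""
    if version == "5.0":
        # 5.0 is fixed by a patch, which an update level alone cannot tell us.
        return "check for patch ESXi500-201401001"
    if version not in FIXED_UPDATE:
        return "unknown"  # versions not covered in this post
    if update_level >= FIXED_UPDATE[version]:
        return "fixed"
    return "affected"


print(e1000_fix_status("5.1", 1))  # affected
print(e1000_fix_status("5.1", 2))  # fixed
```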

I identified one VM, running Windows Server 2012 R2, that was running on both ESXi hosts before they hit the PSOD and that, at the same time, used considerably more resources than on previous days. That very same VM also used the E1000e vNIC driver, so we contacted the VM sysadmin and asked him if he:

  • was running any specific tasks on the VM, since it was using more resources than on previous days.
  • could change the vNIC driver to VMXNET3.

The reason for the increased VM resource utilization was that the VM was going through performance testing, and the sysadmin told us that the VM had crashed twice that day, about 10 minutes after the performance test was started. This correlates perfectly with the ESXi host PSODs.
The sysadmin changed the vNIC driver and, after the change, completed the 30-minute performance test successfully (no ESXi host PSOD).

To be on the safe side, we also started the process of updating the ESXi hosts to 5.1 Update 2, which will happen in a few days. The customer will also start changing the vNIC driver from E1000/E1000e to VMXNET3 where possible.

I guess this is just another good reason to use the paravirtualized vNIC driver, VMXNET3, instead of the E1000 and/or E1000e.
