During VMware vCloud Director (vCD) vApp deployments using the latest versions of both vSphere (ESXi 5.1 and VMware vCenter Server 5.1.0b 947939) and VMware vCloud Director (5.1.1 868405) I got the below vDC error message:
The vCD cell logs, located in the /opt/vmware/vcloud-director/logs directory, contained messages such as “vCenter Server is unresponsive”.
These errors made me start investigating the vCenter Server itself, and I made a couple of interesting findings, some related to the VMware software and some not. The findings and the actions taken for the problems not related to the VMware software are presented below.
- The C: drive of the virtual machine running the vCenter Server had 32 MB of free space. Solved by cleaning up 800 MB of temp files and increasing the hard disk from 30 GB to 50 GB.
- The antivirus engine used 50% CPU, that is, 1 of the 2 vCPUs assigned to the vCenter Server. The temporary solution was to disable a few of the antivirus services; the permanent solution will be to create a better antivirus configuration for the virtual machine.
- The virtual machine took a long time to reboot, and remote desktop, for example, was unresponsive for 7-10 minutes. Even with all the antivirus and vCenter Server related services set to manual startup, the remote desktop service was unresponsive for 5 minutes after a reboot.
Reinstalling VMware Tools made the remote desktop service responsive approximately 1-1.5 minutes after a virtual machine reboot.
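A low-disk condition like the 32 MB case above is easy to catch early with a small check. The following is only a minimal sketch using Python's standard library; the function name `free_space_report` and the 1024 MB threshold are assumptions chosen for illustration, not something that was used during this troubleshooting.

```python
import shutil


def free_space_report(path, min_free_mb=1024):
    """Return (free_mb, ok) for the volume that holds `path`.

    `min_free_mb` is an arbitrary example threshold, not a VMware recommendation.
    """
    usage = shutil.disk_usage(path)
    free_mb = usage.free // (1024 * 1024)
    return free_mb, free_mb >= min_free_mb


if __name__ == "__main__":
    # On the Windows vCenter Server the path would be "C:\\";
    # "." keeps the sketch runnable on any system.
    free_mb, ok = free_space_report(".")
    print(f"free: {free_mb} MB, above threshold: {ok}")
```

Run on a schedule, a check like this would have flagged the volume long before it dropped to 32 MB.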
I also found a few VMware software related problems:
- The vpxd.exe process used 50% CPU, that is, 1 of the 2 vCPUs assigned to the vCenter Server.
- The vpxd.log was rotated every other minute because of its size; log files of approximately 50 MB were created every other minute.
There are quite a few VMware KB articles, e.g. 1034309, 1006257, 2007600 and 2034127, describing what to investigate and what action to take when vpxd.exe uses a lot of CPU.
None of the symptoms described in the KBs applied to my case. The below message is just a section of the entire message in the vpxd.log file, see “Entire vpxd.log entry” further down in the blog post to review the entire message.
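To put the rotation rate in perspective, here is a small worked calculation. It assumes the figures above (roughly 50 MB per file, one new file every 2 minutes) hold steadily; the function name `log_volume_mb` is made up for this sketch.

```python
LOG_SIZE_MB = 50            # approximate size of each rotated vpxd.log
ROTATION_INTERVAL_MIN = 2   # "every other minute"


def log_volume_mb(minutes, size_mb=LOG_SIZE_MB, interval_min=ROTATION_INTERVAL_MIN):
    """Approximate amount of log data written in `minutes`, in MB."""
    return size_mb * minutes / interval_min


per_hour = log_volume_mb(60)        # 1500.0 MB
per_day = log_volume_mb(24 * 60)    # 36000.0 MB
print(f"per hour: {per_hour:.0f} MB, per day: {per_day:.0f} MB")
```

This is write volume only; how much of it stays on disk depends on the log retention settings.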
2013-04-05T04:14:46.798+02:00 [00908 error 'Default' opID=task-internal-2-89479eca-9e] Alert:total >= 0@ d:\build\ob\bora-947673\bora\vpx\drs\algo\drmRebalanceInt.h:2210
I had never seen this error message before, and trying to “google” drmRebalanceInt gave me nothing. I have seen other vpxd.log messages including the word “drm”, and all of them have been related to the vSphere DRS feature.
I continued my investigation by comparing the vCD Organization virtual datacenter (Org vDC) configuration (allocation model configuration), number of vApps, number of virtual machines and number of vApp templates with the corresponding vCenter Server resource pools and vCenter Server virtual machines.
During the investigation I found 2 virtual machines that were not managed by vCD in the vCenter Server cluster to which the vCD Provider Virtual Datacenter (PvDC) was pointing.
Another interesting thing I found was that the Storage DRS configuration had been changed from the “No automation” level configured during the implementation to “Fully Automated”.
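To see how often an entry like this shows up, the log can be summarized per minute. The sketch below is only an illustration: the function name `count_matches_per_minute` is hypothetical, and the sample lines are made up to resemble the entry above (with the source path shortened), not copied from the real log.

```python
import re
from collections import Counter

# Matches the leading timestamp down to the minute, e.g. 2013-04-05T04:14
MINUTE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")


def count_matches_per_minute(lines, needle="drmRebalanceInt"):
    """Count lines containing `needle`, grouped by the minute of their timestamp."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        match = MINUTE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines shaped like vpxd.log entries.
sample = [
    "2013-04-05T04:14:46.798+02:00 [00908 error 'Default'] Alert:total >= 0@ drmRebalanceInt.h:2210",
    "2013-04-05T04:14:47.101+02:00 [00908 error 'Default'] Alert:total >= 0@ drmRebalanceInt.h:2210",
    "2013-04-05T04:14:48.000+02:00 [01234 info 'Default'] unrelated message",
    "2013-04-05T04:15:02.310+02:00 [00908 error 'Default'] Alert:total >= 0@ drmRebalanceInt.h:2210",
]

for minute, count in sorted(count_matches_per_minute(sample).items()):
    print(minute, count)
```

For a real file, pass an open file handle instead of `sample`, since the function only needs an iterable of lines.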
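The comparison itself boils down to a set difference between what vCenter Server reports for the cluster and what vCD knows about. The following is a minimal sketch under the assumption that both inventories have already been exported as lists of virtual machine names; the names and the function `unmanaged_vms` are hypothetical.

```python
def unmanaged_vms(vcenter_vms, vcd_vms):
    """Return names present in the vCenter cluster but unknown to vCD, sorted."""
    return sorted(set(vcenter_vms) - set(vcd_vms))


# Hypothetical inventories, for illustration only.
vcenter_cluster_vms = ["app01", "db01", "web01", "test-vm-a", "test-vm-b"]
vcd_managed_vms = ["app01", "db01", "web01"]

print(unmanaged_vms(vcenter_cluster_vms, vcd_managed_vms))
```

In practice the names on the two sides may not be formatted identically, so some normalization may be needed before comparing.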
Three main actions were taken to solve the problem, each followed by the same 5 sub-actions:
- Removed the 2 non-vCD virtual machines
  - Stopped the vCloud Director service
  - Shut down the vCNS Manager
  - Rebooted the virtual machine running the vCenter Server
  - Started the vCNS Manager
  - Started the vCloud Director service
- Changed the Storage DRS configuration to “No automation”
  - Stopped the vCloud Director service
  - Shut down the vCNS Manager
  - Rebooted the virtual machine running the vCenter Server
  - Started the vCNS Manager
  - Started the vCloud Director service
- Removed the datastore cluster
  - Stopped the vCloud Director service
  - Shut down the vCNS Manager
  - Rebooted the virtual machine running the vCenter Server
  - Started the vCNS Manager
  - Started the vCloud Director service
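Because the same stop, reboot and start cycle follows every change, the sequence above can be written down once and reused as a checklist. This is just a sketch that prints the steps in order; it does not talk to vCD, vCNS Manager or vCenter Server, and the names `restart_cycle` and `build_plan` are made up for illustration.

```python
MAIN_ACTIONS = [
    "Remove the 2 non-vCD virtual machines",
    'Change the Storage DRS configuration to "No automation"',
    "Remove the datastore cluster",
]


def restart_cycle(change):
    """Return one main action followed by its 5 sub-actions, in order."""
    return [
        change,
        "Stop the vCloud Director service",
        "Shut down the vCNS Manager",
        "Reboot the virtual machine running the vCenter Server",
        "Start the vCNS Manager",
        "Start the vCloud Director service",
    ]


def build_plan(actions=MAIN_ACTIONS):
    """Flatten all main actions and their sub-actions into one ordered list."""
    plan = []
    for action in actions:
        plan.extend(restart_cycle(action))
    return plan


for number, step in enumerate(build_plan(), start=1):
    print(f"{number:2d}. {step}")
```

The ordering keeps the dependent services (vCD, then vCNS Manager) down while vCenter Server reboots, and brings them back in reverse order.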
After each of the above described main actions we saw the following:
- The vpxd.log errors including the “drmRebalanceInt” string immediately stopped.
- The vpxd.exe process CPU utilization decreased to an acceptable level.
- I could deploy new vApps without any error.
After 2 of the 3 main actions described, the error came back after a few hours, but since we removed the datastore cluster we have not seen the error.
Entire vpxd.log entry
The vpxd.log file was filled with the below message.
2013-04-05T04:14:46.798+02:00 [00908 error 'Default' opID=task-internal-2-89479eca-9e] Alert:total >= 0@ d:\build\ob\bora-947673\bora\vpx\drs\algo\drmRebalanceInt.h:2210
--> Backtrace:
--> backtrace[00] rip 000000018018a8ca
--> backtrace[01] rip 0000000180102f28
--> backtrace[02] rip 000000018010423e
--> backtrace[03] rip 000000018009a09a
--> backtrace[04] rip 00000001403109b2
--> backtrace[05] rip 000000014032e595
--> backtrace[06] rip 00000001403a16d6
--> backtrace[07] rip 00000001403b087b
--> backtrace[08] rip 00000001403b3f0f
--> backtrace[09] rip 00000001403c3c64
--> backtrace[10] rip 00000001403671e6
--> backtrace[11] rip 000000014036817e
--> backtrace[12] rip 00000001402f14c8
--> backtrace[13] rip 000000014020b513
--> backtrace[14] rip 000000013f778fff
--> backtrace[15] rip 000000013f77cb74
--> backtrace[16] rip 000000013f78acd4
--> backtrace[17] rip 000000013f793655
--> backtrace[18] rip 00000001801a8ee6
--> backtrace[19] rip 00000001801abfc4
--> backtrace[20] rip 000000018019c72a
--> backtrace[21] rip 0000000074252fdf
--> backtrace[22] rip 0000000074253080
--> backtrace[23] rip 0000000076bc652d
--> backtrace[24] rip 0000000076dfc521
-->
I wrote another blog post about the vCloud Director “Internal server error” message, which can be found here.
Thanks to Frank Denneman and Henry Persson for supporting me during the troubleshooting.