A couple of weeks ago I asked on Twitter whether anyone could recommend a free vSphere monitoring tool. I received quite a lot of answers, but not many of them met my customer's requirements:
- Free of charge.
- Must be able to monitor virtual machine virtual disk utilization, e.g. alert when only 10% free space is left on a disk.
- Must be licensed for commercial use.
- E-mail notification must be included.
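To make the disk requirement concrete, here is a minimal Python sketch of the check I have in mind. It is only an illustration and not how Veeam One or any other tool implements it; the function names and the 10% default are my own assumptions taken from the example in the list above.

```python
def free_space_percent(total_bytes: int, free_bytes: int) -> float:
    """Return the free space of a disk as a percentage of its total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def is_low_on_space(total_bytes: int, free_bytes: int,
                    threshold_percent: float = 10.0) -> bool:
    """True when free space is at or below the threshold (hypothetical 10% default)."""
    return free_space_percent(total_bytes, free_bytes) <= threshold_percent
```

Inside a guest, the two input values could come from something like `shutil.disk_usage(path)`, which returns `total`, `used` and `free` in bytes.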
My customer runs VMware vSphere 5.1, and that is the only information about them I'll include in this blog post. For example, I won't discuss the monitoring solution they used before the new one was implemented.
I decided to try Veeam One Free Edition 6.5, which can be downloaded here. This post covers only the alarm definitions in the monitoring part of Veeam One; the reporting part is not covered at all.
Installing Veeam One Free Edition is really simple: just follow the Veeam deployment guide, which can be downloaded here. We decided to use a Microsoft SQL Server 2008 database hosted on a separate system instead of the SQL Server 2008 R2 Express instance included in the installation package.
Veeam One ships with a lot of predefined alarms, but we did not leave all of them enabled; we disabled quite a few. The reason is that I think an alarm, whether it comes from a VMware vSphere environment or any other environment, should require an action by the group receiving it.
Some useful Veeam One alarms may therefore be excluded from our monitoring configuration, simply because no process exists today to act on them.
Things happening within the vSphere environment that do not require immediate action but are still useful to know, e.g. for system administrators or system owners, can and should be covered by a reporting process rather than the monitoring process.
The table below lists the Veeam One monitoring objects included in our initial monitoring configuration. It does not include the customer action associated with each alarm.
Whenever a Veeam object is added, removed or reconfigured, or the associated customer action changes, the table will be updated.
Veeam Object | Name | Information | Warning | Error |
Any Object | VM Instance UUID Conflict | X – ignore after 1 | ||
Any Object | Host failure detected | 1 | ||
Any Object | Host Isolation in HA cluster | 1 | ||
Cluster | Host cluster capacity overcommitted | X – ignore after 1 | ||
Cluster | Admission control disabled | X – ignore after 1 | ||
Cluster | HA disabled for cluster | X – ignore after 1 | ||
Cluster | DRS invocation failure | X – ignore after 1 | ||
Datastore | Datastore free space | 10% | 5% | |
Host | vCenter Server lost connection to host | X – ignore after 1 | ||
Host | Host CPU Usage | 95% – 30 min | 99% – 30 min | |
Host | Host available memory | 90% – 30 min | 96% – 30 min | |
Host | Host disk SCSI aborts | 2 – 15 min | 2 – 15 min | |
Host | Host disk bus resets | 2 – 15 min | 2 – 15 min | |
Host | Host not compliant | X – ignore after 1 | ||
Host | Host short name inconsistent | X – ignore after 1 | ||
Host | Host short name IP resolve failed | X – ignore after 1 | ||
Host | DVS host configuration out of sync | X – ignore after 1 | ||
Host | Storage connection failure | X – ignore after 1 | ||
Host | Storage connection redundancy failure | X – ignore after 1 | ||
Host | Host hardware status | X | X | |
Host | ESX(i) host network uplink problems | X | ||
Host | ESX(i) host network uplink failure | X | ||
Host | vSphere Distributed Switch MTU mismatch | X – ignore after 1 | ||
Host | Network rollback detected | X – ignore after 1 | ||
Host | Teaming mismatch error | X – ignore after 1 | ||
Host | Uplink port MTU error | X – ignore after 1 | ||
Host | Uplink port VLAN error | X – ignore after 1 | ||
Host | ESX(i) host storage error | X – ignore after 1 | ||
Host | ESX(i) host storage failure | X – ignore after 1 | ||
Host | ESX(i) host storage warning | X – ignore after 1 | ||
VM | VM Consolidation needed status | X | ||
VM | VM CPU Usage | 95% – 30 min | 99% – 30 min | |
VM | VM CPU Ready | 10 – 15 min | 20 – 15 min | |
VM | VM HA reset failure | X – ignore after 1 | ||
VM | VM HA reset | X – ignore after 1 | ||
VM | Snapshot age | 96h | ||
VM | High Memory Usage | 90% – 15 min | 95% – 15 min | |
VM | Guest disk space | 5% | 2% | |
VM | VM disk SCSI connection resets | 2 – 15 min | 2 – 15 min | |
VM | VM disk SCSI connection failures | 2 – 15 min | 2 – 15 min | |
VM | VM memory swap usage | 64 MB – 30 min | 128 MB – 30 min | |
VM | High balloon memory utilization | 10 MB – 30 min | 50 MB – 30 min | |
vCenter | Insufficient user access permissions | X – ignore after 1 | ||
vCenter | Non VI workload detected | X – ignore after 1 | ||
vCenter | Storage ATS support failure | X – ignore after 1 | ||
vCenter | vCenter storage availability error | X – ignore after 1 | ||
vCenter | vCenter storage locking error | X – ignore after 1 | ||
vCenter | vSphere cluster HA error | X – ignore after 1 | ||
vCenter | vSphere cluster HA warning | X – ignore after 1 |
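Several rows in the table combine a threshold with a duration, e.g. "95% – 30 min" and "99% – 30 min" for CPU usage. One way to read such a rule is that the metric has to stay at or above the threshold for the whole period before the alarm fires. The Python sketch below shows that reading. It is my own interpretation and not how Veeam One evaluates alarms internally; the names `SustainedRule` and `classify` are hypothetical, and I assume the first value in a row is the warning level and the second the error level.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class SustainedRule:
    """Hypothetical rule: value must stay at/above a level for window_min minutes."""
    warning: float
    error: float
    window_min: int


def classify(samples: Sequence[Tuple[int, float]], rule: SustainedRule) -> str:
    """Classify (minute, value) samples, sorted by time, as 'ok', 'warning' or 'error'."""
    if not samples:
        return "ok"
    latest_minute = samples[-1][0]
    window_start = latest_minute - rule.window_min
    # Without history covering the whole window we cannot call it sustained.
    if samples[0][0] > window_start:
        return "ok"
    window = [value for minute, value in samples if minute >= window_start]
    lowest = min(window)  # the rule holds only if even the lowest sample exceeds it
    if lowest >= rule.error:
        return "error"
    if lowest >= rule.warning:
        return "warning"
    return "ok"


# Assumed reading of the "Host CPU Usage" row: warning 95%, error 99%, 30 minutes.
cpu_rule = SustainedRule(warning=95.0, error=99.0, window_min=30)
```

With samples every five minutes, a host sitting at 96% for half an hour would be classified as a warning, while a single dip below 95% inside the window resets it to ok. That matches the idea that an alarm should only fire when somebody actually needs to do something.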
Eight Veeam alarms triggered immediately, meaning my customer resolved eight things that could have caused them issues in the near future. Over the past days we have seen a few more alarms that let us address a potential issue before it became a real one.
Both my customer and I are really satisfied with the value Veeam One Free Edition provides for their VMware vSphere 5.1 environment.