Many of you might be averse to letting an automated process reboot your guests if it thinks there’s something wrong. And that’s reasonable. However, I wanted to share a neat feature that might cause you to take a second look.
Last week we had some environmental issues that caused VM monitoring to reboot a number of guests. I was sifting through Log Insight to research what happened and I stumbled upon a log entry similar to this…
2014-12-09T00:11:15.308Z [FFCF2B70 verbose 'Policy' opID=task-internal-13-557681f2] [VmOperationsManager::VmResetOperation::ScreenshotResultCallback] The screenshot path returned is [SSD1] Desktop1/Desktop1-1.png
Ok VMware, that’s cool!
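If you want to dig these entries out of your own logs, here is a minimal Python sketch that pulls the timestamp, datastore, and file path out of a line shaped like the one above. The regular expression and the `parse_screenshot_entry` helper are my own illustration, based only on that single sample entry; the log format may differ between versions, so treat it as a starting point.

```python
import re

# Pattern assumed from the single sample entry above; not an official format.
SCREENSHOT_RE = re.compile(
    r"^(?P<timestamp>\S+) .*ScreenshotResultCallback\] "
    r"The screenshot path returned is \[(?P<datastore>[^\]]+)\] (?P<path>.+)$"
)


def parse_screenshot_entry(line):
    """Return timestamp, datastore and path as a dict, or None if no match."""
    match = SCREENSHOT_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()
```

Feeding it the sample entry should give you the datastore name (`SSD1`) and the relative path (`Desktop1/Desktop1-1.png`), which is all you need to go find the file in the datastore browser.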
Some of you might be thinking “Duh! Everybody knows that!” It might even have been part of the vSphere 5.5 Install, Configure, Manage course; there was probably a whole lab on it. But that was a while ago and I had since forgotten this gem existed. It’s still cool nonetheless!
Basically, VM monitoring is a feature of your cluster’s high availability (HA). VMware Tools sends a heartbeat back to the host to indicate that everything is running. If the heartbeats stop, the host looks for storage and network I/O from the guest. If there’s been no I/O for 120 seconds (this is configurable), VM monitoring takes a screenshot and restarts the guest.
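To make the two checks concrete, here is a rough Python sketch of the decision as I understand it. This is not VMware's code; `GuestStatus`, `should_reset`, and the 30-second heartbeat window are placeholders I made up for illustration. Only the 120-second I/O window comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class GuestStatus:
    """Hypothetical snapshot of what the host knows about a guest."""
    seconds_since_heartbeat: float  # time since the last VMware Tools heartbeat
    seconds_since_io: float         # time since the last storage or network I/O


def should_reset(status, failure_interval=30.0, io_stats_interval=120.0):
    """Return True only when BOTH checks fail.

    failure_interval is an assumed placeholder value; io_stats_interval
    mirrors the configurable 120-second I/O window.
    """
    heartbeat_lost = status.seconds_since_heartbeat > failure_interval
    io_idle = status.seconds_since_io >= io_stats_interval
    return heartbeat_lost and io_idle
```

The important part is the `and`: losing heartbeats alone is not enough to trigger a reset as long as the guest is still doing I/O.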
So, does it work? You bet!
I took a VM in my home lab and crashed it good, and about two minutes later HA took over and restarted it.
vCenter was kind enough to let me know that HA was responsible for the restart. I browsed the datastore and there it was in all its pixelated glory!
That’s handy, but what happens if VMware Tools craps out while the VM is still running? Ideally, nothing! Two hours after I killed VMware Tools on my guest, nothing had happened. There was enough storage and network I/O on the idle guest to satisfy HA.
Of course, this means you’ll need to be more vigilant about making sure VMware Tools is running on your precious VMs. Without it you’re only one check, the I/O check, away from a potential reboot. VMware did a great job planning the mechanisms of this feature, and it’s solid. But there’s always an exception. So play around with it before deciding whether it’s right for you. In some cases you might not be comfortable with the idea, but it could save you from another 2 AM call to reboot a crashed system. Best of all, you can be an IT hero when you show your sweet screenshot to the server team!