I have identified an issue in Log Insight 2.5 where alerts sent via email or to vROps contain the following text in the message:
“Notification event – The worker node sending this alert was unable to contact the standalone node. You may receive duplicate notifications for this alert.”
I confirmed that forward DNS resolution and reverse lookups are working as expected. I was also able to reproduce the issue in a lab environment where DNS was working correctly.
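For anyone who wants to repeat the DNS check, here is a minimal sketch of a forward and reverse lookup comparison in Python. It is not the exact procedure I followed, just one way to do it with the standard library. The function name `check_dns` and the node name in the usage note are hypothetical placeholders, so substitute the FQDNs of your own Log Insight nodes.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns (ip, reverse_name, consistent). The forward and reverse
    parameters default to the standard library resolvers and can be
    swapped out for testing.
    """
    ip = forward(hostname)
    try:
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        reverse_name = reverse(ip)[0]
    except (socket.herror, socket.gaierror):
        # No PTR record, or the reverse lookup failed
        reverse_name = None

    def normalise(name):
        # Compare case-insensitively and ignore a trailing dot
        return name.lower().rstrip(".")

    consistent = (
        reverse_name is not None
        and normalise(reverse_name) == normalise(hostname)
    )
    return ip, reverse_name, consistent
```

Calling `check_dns("loginsight-node1.example.local")` (a made-up name) for each node in the cluster returns the resolved IP, the name the PTR record points back to, and whether the two agree. A `False` in the last position would suggest a forward and reverse mismatch worth investigating.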
While VMware vRealize Operations Manager makes use of a GemFire database and vRealize Hyperic makes use of vPostgres, VMware vRealize Log Insight makes use of Cassandra. You might wonder why knowing that even matters. Well, as I’ve seen again this week, the database engine that drives each of these products essentially dictates how their environments must be designed and deployed, and what their limitations are.
This week, we had a situation where our newly deployed Log Insight cluster wasn’t performing. In fact, it was so bad that it took 20 to 30 minutes simply to log into the admin interface. Yet the CPU and memory usage counters for each of the appliances weren’t even being tickled. It was a strange issue for sure, and by 5pm on Monday 31st of August, we were in the process of logging a P1 call with VMware support.