I have identified an issue in Log Insight 2.5 where alerts sent via email or forwarded to vROPS contain the following text in the message:
“Notification event – The worker node sending this alert was unable to contact the standalone node. You may receive duplicate notifications for this alert.”
I confirmed that both forward and reverse DNS lookups were working as expected, and I was able to reproduce the issue in a lab environment where DNS was also functioning correctly.
While VMware vRealize Operations Manager makes use of a GemFire database and vRealize Hyperic makes use of vPostgres, VMware vRealize Log Insight makes use of Cassandra. You might wonder why knowing that even matters. Well, as I've seen again this week, the database engine behind each of these products essentially dictates how their environments are designed and deployed, and what their limitations are.
This week, we had a situation where our newly deployed Log Insight cluster wasn't performing. In fact, it was so bad that it took 20–30 minutes simply to log into the admin interface. Yet the CPU and memory usage counters for each of the appliances were barely being tickled. It was a strange issue for sure, and by 5pm on Monday the 31st of August, we were in the process of logging a P1 call with VMware support.
Following on from my previous blog post, where I mentioned that we'd discovered a bug in the Hyperic 5.8.4 client (on both Windows and Linux), I think it's only fair that I share our findings. It's a bug we discovered whilst deploying a very large vRealize Suite (two maximum-sized global clusters of vROPS, vRLI, Hyperic and vRA/vRO).
Whilst carrying out some testing in my lab on the impact of replacing SSL certificates in Hyperic, I noticed that if, for whatever reason, authentication between the Hyperic agent and the Hyperic server fails, the agent drives CPU utilisation on the machine it's running on up to between 85% and 100%. At first I thought it was an anomaly, but I was then able to reproduce the symptoms a further three times whilst proving to VMware GSS that the issue really does exist. To cut a long story short…
It's been a long time since I last posted any new content on here. The truth is, I've got a few blog articles drafted, but I've just not had the time to finish them properly. Since March, I've been very busy on two customer projects, one of which came to a successful end in late May. I'm still in the middle of the second project, where we are deploying the VMware vRealize Suite across the Americas, EMEA and APAC regions for a global customer. For the vRealize Operations part of this project, we are pushing the product well beyond its officially supported limits in terms of objects monitored. However, as we are working with VMware on this particular deployment, we have a custom support statement that will see this huge environment supported regardless of the official limits on the number of monitored objects.
Anyway, during the course of the current project, we have encountered many stumbling blocks with the vRealize Operations and Hyperic products. Hyperic in particular has a rather problematic defect that I'm surprised nobody has picked up until now. It lies in the default Hyperic agent configuration used on both Windows and Linux, which can cause major performance problems on monitored endpoints (up to a constant 100% CPU utilisation). However, working with VMware GSS, we have now been able to raise an official bug ticket for the issue.
I'm working on two particular posts regarding Hyperic. In the first post, I cover the bug we have found and how to work around it, or rather pre-empt it. The second post is focused on replacing SSL certificates in Hyperic 5.8.4. In that post I'll generate SSL certificates with OpenSSL and package them into the Java keystore used by the Hyperic server, with the root and subordinate (intermediate) certificate authority certificates included. I'll go through the process of replacing the self-signed keystore with our own custom keystore, as well as performing the database queries and updates required to replace the certificates properly.
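Ahead of the full post, the OpenSSL-to-Java-keystore part can be sketched roughly as below. This is only an outline, not the final procedure: all filenames, subjects and the 'hyperic' password are made up, and a throwaway root CA stands in for the enterprise root/subordinate CA that would sign the request in a real deployment.

```shell
# Sketch only: a throwaway root CA stands in for an enterprise CA.
# All filenames, subjects and the 'hyperic' password are hypothetical.

# Create a demo root CA certificate and key
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout root.key -out root.crt -subj "/CN=Demo Root CA"

# Generate the Hyperic server's private key and certificate signing request
openssl req -new -newkey rsa:2048 -nodes \
  -keyout hyperic.key -out hyperic.csr -subj "/CN=hyperic.example.com"

# Sign the server CSR with the demo root CA
openssl x509 -req -in hyperic.csr -CA root.crt -CAkey root.key \
  -CAcreateserial -days 365 -out hyperic.crt

# Bundle the key, server certificate and CA chain into a PKCS#12 file
openssl pkcs12 -export -in hyperic.crt -inkey hyperic.key \
  -certfile root.crt -name hq -passout pass:hyperic -out hyperic.p12

# Convert the PKCS#12 bundle into a Java keystore (needs a JDK for keytool),
# then import the CA certificate as a trusted entry
if command -v keytool >/dev/null; then
  keytool -importkeystore -srckeystore hyperic.p12 -srcstoretype PKCS12 \
    -srcstorepass hyperic -destkeystore hyperic.keystore -deststorepass hyperic
  keytool -importcert -noprompt -alias rootca -file root.crt \
    -keystore hyperic.keystore -storepass hyperic
fi
```

The full post will cover pointing the Hyperic server configuration at the new keystore and the database updates that go with it.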
I'm hoping to get at least one of these posts completed before the end of the weekend.
vSphere 6 makes managing SSL certificates a lot easier than previous releases did. It ships with its own certificate authority, the VMware Certificate Authority (VMCA), which issues certificates for all components on your behalf, rather than you having to replace each service certificate manually or rely on self-signed certificates. The VMCA comes with the Platform Services Controller (PSC), which can be installed as a separate appliance or embedded within the vCenter Server installation or appliance.
By default, the VMCA self-signs its own certificate and uses it as the CA certificate that signs all certificate requests. This self-signed CA certificate can be replaced with one signed by a third-party root CA or by your own root CA, which makes the VMCA an intermediate CA subordinate to that root. Any certificate signed by the VMCA can then be validated by clients that have the root CA and VMCA certificates installed.
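That chain-of-trust behaviour can be checked offline with plain OpenSSL. The sketch below builds a stand-in chain (root CA, then an intermediate playing the role of the VMCA, then a machine certificate; every filename and subject is hypothetical) and verifies the leaf against the root with the intermediate supplied as untrusted, which is exactly the validation a client performs:

```shell
# Hypothetical stand-in chain: root -> "VMCA" intermediate -> machine cert.
# All filenames and subjects are made up for illustration.

# Root CA
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout root.key -out root.crt -subj "/CN=Demo Root CA"

# Intermediate "VMCA": a CSR signed by the root with the CA basic constraint set
openssl req -new -newkey rsa:2048 -nodes \
  -keyout vmca.key -out vmca.csr -subj "/CN=Demo VMCA"
printf 'basicConstraints=critical,CA:TRUE\n' > ca.ext
openssl x509 -req -in vmca.csr -CA root.crt -CAkey root.key \
  -CAcreateserial -days 365 -extfile ca.ext -out vmca.crt

# Machine certificate issued by the intermediate, as the VMCA would issue one
openssl req -new -newkey rsa:2048 -nodes \
  -keyout machine.key -out machine.csr -subj "/CN=vcenter.example.com"
openssl x509 -req -in machine.csr -CA vmca.crt -CAkey vmca.key \
  -CAcreateserial -days 365 -out machine.crt

# A client holding the root (with the intermediate presented) validates the leaf
openssl verify -CAfile root.crt -untrusted vmca.crt machine.crt
```

If the chain is intact, the final command reports the machine certificate as OK; remove `-untrusted vmca.crt` and validation fails, which is why the VMCA certificate has to be distributed alongside the root.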
I've decided to create this dedicated page where I'll place "one-line scripts". I sometimes use these one-line commands to run reports against vSphere or SCVMM inventories when I'm not permitted, or not able, to run full-length scripts in an environment.
It's been almost a year since I last tried it, but is it time to convert my mindset from CentOS/RHEL 4, 5 and 6 to 7? Initial impressions weren't good.