So, you've done all the hard work to change your Hyperic Server certificate (or not). Now you browse to your Hyperic server's management page via HTTPS on port 7443 and you're presented with this uninspiring message from your browser:
I've been working intensively with the VMware vRealize product suite over the past 4 months, including Hyperic. One of the things we have to do on our current project is to replace the Hyperic server certificate whenever a new Hyperic instance is introduced into the environment. This is a relatively straightforward task, but one that consists of quite a few steps. In this blog post, I've documented exactly how to go about replacing Hyperic server certificates.
I have identified an issue in Log Insight 2.5 where alerts passed via email or to vROPS contain the following text in the message:
“Notification event – The worker node sending this alert was unable to contact the standalone node. You may receive duplicate notifications for this alert.”
I also confirmed that DNS resolution and reverse lookup functions are working as expected. I was also able to reproduce this issue successfully in a lab environment, with DNS working correctly.
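For anyone wanting to repeat the DNS check, here is a minimal sketch in Python of a forward and reverse lookup comparison. It is not the exact test I ran; the function name `check_dns` and the injectable `forward`/`reverse` parameters are my own illustrative choices, and the hostname in the usage note is hypothetical.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, reverse-resolve that IP, and compare names.

    Returns a tuple (ip, reverse_name, matches). The forward and reverse
    resolvers are parameters (an assumption of this sketch) so they can be
    swapped out when testing without a live DNS server.
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, ip_list)
    reverse_name, _aliases, _addresses = reverse(ip)
    # Compare case-insensitively and ignore any trailing dot on FQDNs
    matches = reverse_name.lower().rstrip(".") == hostname.lower().rstrip(".")
    return ip, reverse_name, matches
```

Calling `check_dns("loginsight-01.lab.local")` against each node's name (a made-up hostname here) would show whether the forward and reverse records agree for that node.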
While VMware vRealize Operations Manager makes use of a GemFire database and vRealize Hyperic makes use of vPostgres, VMware vRealize Log Insight makes use of Cassandra. You might wonder why knowing that even matters. Well, as I’ve seen again this week, the database engine that drives each of these products essentially dictates the design and deployment of their environments and their limitations.
This week, we had a situation where our newly deployed Log Insight cluster wasn’t performing. In fact, it was so bad that it took 20 to 30 minutes simply to log into the admin interface. Yet the CPU and memory usage counters for each of the appliances weren’t even being tickled. It was a strange issue for sure, and by 5pm on Monday 31st of August, we were in the process of logging a P1 call with VMware support.
Following on from my previous blog post where I mentioned that we’ve discovered a bug in the Hyperic 5.8.4 client (on both Windows and Linux), I think it’s only fair that I share our findings. It’s a bug that we discovered whilst deploying a very large vRealize Suite (two maximum sized global clusters of vROPS, vRLI, Hyperic and vRA/vRO).
Whilst carrying out some testing in my lab surrounding the impact of replacing SSL certificates in Hyperic, I noticed that if, for whatever reason, authentication between the Hyperic agent and Hyperic server fails, the Hyperic agent drives CPU utilisation on the client machine it’s running on up to between 85% and 100%. At first I thought that it was an anomaly, but I was then able to reproduce the symptoms a further 3 times while proving to VMware GSS that the issue really does exist. A long story short
It's been a long time since I last posted any new content on here. The truth is, I've got a few blog articles drafted, but I've just not had the time to post them properly. Since March, I've been very busy on two customer projects, one of which came to an end successfully at the end of May. I'm still in the middle of the second customer project, where we are deploying the VMware vRealize Suite across the Americas, EMEA and APAC regions for a global customer. For the vRealize Operations part of this project, we are really pushing the product beyond its officially supported limits in terms of objects monitored. However, as we are working with VMware on this particular deployment, we have a custom support statement that will see this huge environment supported regardless of what the official limitations of the products may be (in terms of the number of monitored objects).
Anyway, during the course of the current project, we have encountered many stumbling blocks with the vRealize Operations and Hyperic products. Hyperic in particular has a rather problematic "defect" that I am surprised has not been picked up by anyone until now. It lies in the way that the default Hyperic agent configuration is used on both Windows and Linux distributions, which could cause major performance problems on monitored endpoints (up to a constant 100% CPU utilisation). However, working with VMware GSS, we have now been able to raise an official bug ticket for the issue.
I'm working on two particular posts regarding Hyperic. In the first post, I cover the bug that we have found and how to work around it, or rather preempt it. The second post is focused on how to replace Hyperic 5.8.4 SSL certificates. In that post I will generate SSL certificates with OpenSSL and format them in a Java keystore that is used by the Hyperic server, with the root and subordinate root certificate authority certificates included. I'll go through the process of replacing the self-signed keystore with our own custom keystore, as well as performing the necessary database queries and updates required to replace the certificates properly.
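To give a flavour of the keystore part, here is a rough sketch in Python that only builds the two command lines commonly used to bundle a signed certificate, its private key and the CA chain into a PKCS#12 file with `openssl pkcs12`, and then convert that into a Java keystore with `keytool -importkeystore`. This is a generic illustration and not the Hyperic-specific procedure: the function name, file names, alias and password are all hypothetical, and the Hyperic database updates are not covered.

```python
def build_keystore_commands(cert, key, chain, alias, p12, jks, password):
    """Return two argument lists (openssl, then keytool) without running them.

    All file names, the alias and the password are placeholders supplied by
    the caller; nothing here is specific to Hyperic.
    """
    # Step 1: bundle server certificate, private key and CA chain into PKCS#12
    to_pkcs12 = [
        "openssl", "pkcs12", "-export",
        "-in", cert,
        "-inkey", key,
        "-certfile", chain,
        "-name", alias,
        "-out", p12,
        "-passout", f"pass:{password}",
    ]
    # Step 2: convert the PKCS#12 bundle into a JKS keystore
    to_jks = [
        "keytool", "-importkeystore",
        "-srckeystore", p12, "-srcstoretype", "PKCS12",
        "-srcstorepass", password,
        "-destkeystore", jks, "-deststoretype", "JKS",
        "-deststorepass", password,
    ]
    return to_pkcs12, to_jks
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)` on a machine that has both OpenSSL and a JDK installed. Passing passwords on the command line exposes them to other local users, so this is only suitable for a lab.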
I'm hoping to get at least one of these posts completed before the end of the weekend.