Today I was planning on testing the 16 new patches released by VMware for vSphere 4. I wanted to apply these to my second ESX host. I normally place my ESX hosts in maintenance mode before I remediate updates. As I placed esx2 in maintenance mode, the VMs, as expected, started to migrate over to the other hosts in the cluster with VMotion. The VMotion migration of two of my VMs running Windows XP failed with the following error message:
A general system error occurred: Failed to write checkpoint data (offset 33558328, size 16384): Limit exceeded
It turns out that a VM must have less than 30MB of video RAM (VRAM) assigned in order to be compatible with VMotion. As I normally run these two VMs at 1680 x 1050 resolution, I went all out and assigned the maximum amount of VRAM allowed, which is 128MB, hence the VMotion failure.
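For my own reference, here is a rough sketch of the arithmetic. It assumes a 32-bit colour depth and counts only a single uncompressed framebuffer, which is my simplification and not a VMware sizing rule; the function names are made up for illustration. The 30MB limit is the figure quoted above.

```python
# Rough sizing sketch. The 32-bit colour depth and the single-framebuffer
# model are assumptions for illustration, not VMware sizing guidance.
VMOTION_VRAM_LIMIT_MB = 30  # limit quoted in the post above


def framebuffer_mb(width, height, bits_per_pixel=32):
    """Approximate size of one uncompressed framebuffer in MB."""
    return width * height * (bits_per_pixel // 8) / (1024 * 1024)


def vmotion_compatible(vram_mb):
    """True if the assigned VRAM is under the limit quoted above."""
    return vram_mb < VMOTION_VRAM_LIMIT_MB


needed = framebuffer_mb(1680, 1050)
print(f"One 1680x1050 framebuffer: {needed:.1f} MB")
print("128 MB VMotion compatible:", vmotion_compatible(128))
print("16 MB VMotion compatible:", vmotion_compatible(16))
```

Under those assumptions a single 1680 x 1050 framebuffer comes to well under 30MB, so there should be room to lower the VRAM setting without giving up the resolution.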
Now this may be old news, but I decided to blog it anyway for my own reference.
Simon Long has written a good article on how to improve VMotion performance when performing mass migrations. This is very handy when you are putting a host into maintenance mode. Thanks to Jason Boche for blogging it first!
I'll set the scene a little...
I’m working late, I’ve just installed Update Manager and I’m going to run my first updates. As with all new systems, I’m not always confident, so I decided “out of hours” would be the best time to try.
I hit “Remediate” on my first Host, then sat back, cup of tea in hand, and watched to see what happened… The Host’s VMs were slowly migrated off, 2 at a time, onto other Hosts.
“It’s gonna be a long night,” I thought to myself. So whilst I was going through my Hosts one at a time, I also fired up Google and tried to find out if there was any way I could speed up the VMotion process. There didn’t seem to be any articles or blog posts (that I could find) about improving VMotion performance, so I created a new Servicedesk job for myself to investigate this further.
3 months later, whilst at a product review at VMware UK, I was chatting to their Inside Systems Engineer, Chris Dye, and I asked him if there was a way of increasing the number of simultaneous VMotions from 2 to something more. He was unsure, so he did a little digging, managed to find some info that might be helpful and fired it across for me to test.
After a few hours of basic testing over the quiet Christmas period, I was able to increase the number of simultaneous VMotions… Happy Days!!
But after some further testing, it seemed as though the number of simultaneous VMotions is actually set per Host. This means that if I set my vCenter server to allow 6 VMotions and then place 2 Hosts into maintenance mode at the same time, there would actually be 12 VMotions running simultaneously. This is certainly something you should consider when deciding how many VMotions you would like running at once.
Here are the steps to increase the number of simultaneous VMotion migrations per Host.
1. RDP to your vCenter Server.
2. Locate the vpxd.cfg file (default location “C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter”).
3. Make a backup of vpxd.cfg before making any changes.
4. Edit the file using WordPad and insert the following lines between the <vpxd></vpxd> tags:
5. Now you need to decide what value to give “maxCostPerHost”.
A Cold Migration has a cost of 1 and a Hot Migration (aka VMotion) has a cost of 4. I first set mine to 12, as I wanted to see if it would now allow 3 VMotions at once. I now permanently have mine set to 24, which gives me 6 simultaneous VMotions per Host (6×4 = 24).
I am unsure of the maximum value that you can use here; the largest I tested was 24.
6. Save your changes and exit WordPad.
7. Restart “VMware VirtualCenter Server” Service to apply the changes.
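The cost arithmetic behind step 5 can be sketched as below. The cost of 4 per VMotion and the per-Host behaviour come from the text above; the function names are just illustrative helpers, not anything vCenter provides.

```python
VMOTION_COST = 4  # cost of one hot migration (VMotion), as stated above


def max_cost_for(vmotions):
    """maxCostPerHost value needed to allow this many VMotions per Host."""
    return vmotions * VMOTION_COST


def vmotions_allowed(max_cost_per_host):
    """Simultaneous VMotions per Host that a given maxCostPerHost permits."""
    return max_cost_per_host // VMOTION_COST


def cluster_wide_vmotions(max_cost_per_host, hosts_in_maintenance):
    """Total VMotions in flight, since the limit applies per Host."""
    return vmotions_allowed(max_cost_per_host) * hosts_in_maintenance


print(max_cost_for(6))               # value for 6 VMotions per Host
print(vmotions_allowed(12))          # VMotions per Host with a value of 12
print(cluster_wide_vmotions(24, 2))  # 2 Hosts evacuating at the same time
```

The last call is the one worth keeping in mind: the limit is per Host, so the total load on the VMotion network grows with every Host you place into maintenance mode at once.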
Now that I knew how to change the number of simultaneous VMotions per Host, I decided to run some tests to see if it actually made any difference to the overall VMotion performance.
I had 2 Hosts with 16 almost identical VMs. I created a job to migrate my 16 VMs from Host 1 to Host 2.
On both Hosts, the VMotion vmnic was a single 1Gbit NIC connected to a Cisco switch which also carried other network traffic.
The Network Performance graph above was recorded during my testing and displays the “Network Data Transmit” measurement on the VMotion vmnic. The 3 highlighted sections represent the following:
Section 1 - 16 VMs VMotioned from Host 1 to Host 2 using a maximum of 6 simultaneous VMotions.
Time taken = 3 min 30 sec
Section 2 - This was not a test; I was simply migrating the VMs back onto the Host for the 2nd test (Section 3).
Section 3 - 16 VMs VMotioned from Host 1 to Host 2 using a maximum of 2 simultaneous VMotions.
Time taken = 6 min 36 sec
Time difference = 3 min 6 sec
3 Mins!! I wasn’t expecting it to be that much. Imagine if you had a 50 Host cluster…how much time would it save you?
I tried the same test again, but migrating only 6 VMs instead of 16.
Migrating off 6 VMs with only 2 simultaneous VMotions allowed.
Time taken = 2 min 24 sec
Migrating off 6 VMs with 6 simultaneous VMotions allowed.
Time taken = 1 min 54 sec
Time difference = 30 sec
It’s still an improvement, albeit not such a big one.
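To put the two results side by side, here is a small sketch that converts the timings above to seconds and works out the time saved and the relative speed-up. The timings are the ones I recorded; the helper names are only for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds timing to total seconds."""
    return minutes * 60 + seconds


def compare(slow, fast):
    """Return (seconds saved, speed-up factor) for two (min, sec) timings."""
    slow_s = to_seconds(*slow)
    fast_s = to_seconds(*fast)
    return slow_s - fast_s, slow_s / fast_s


# (label, time with 2 simultaneous VMotions, time with 6 simultaneous VMotions)
results = [
    ("16 VMs", (6, 36), (3, 30)),
    ("6 VMs", (2, 24), (1, 54)),
]

for label, slow, fast in results:
    saved, speedup = compare(slow, fast)
    print(f"{label}: saved {saved} s, {speedup:.2f}x faster")
```

The gain is clearly larger for the 16 VM run than for the 6 VM run, which fits with the longer queue of VMs having more to gain from extra concurrency.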
Now don’t get me wrong, these tests are hardly scientific and would never be deemed a completely fair test, but I think you get the general idea of what I was trying to get at.
I’m hoping to explore VMotion performance further by looking at maybe using multiple physical NICs for VMotion and teaming them using EtherChannel, or maybe even using 10Gbit Ethernet. Right now I don’t have the spare hardware to do that, but this is definitely something I will try when the opportunity arises.
There is a problem with HA in VMware ESX 3.5 Update 3. Virtual Machines may reboot unexpectedly when migrated with VMotion or after a Power On operation. This only happens when the Virtual Machine is running on an ESX 3.5 Update 3 Host and the ESX Host has VMware HA enabled with the "Virtual Machine Monitoring" option active. To work around this problem:
Option 1: Disable Virtual Machine Monitoring
1. Select the VMware HA cluster and choose Edit Settings from the right-click menu.
2. In the Cluster Settings dialog box, select VMware HA in the left column.
3. Uncheck the Enable virtual machine monitoring check box.
4. Click OK.
Option 2: Set hostd heartbeat delay to 0
1. Disconnect the host from VC (Right click on host in VI Client and select "Disconnect" )
2. Login as root to the ESX Server with SSH.
3. Using a text editor such as nano or vi, edit the file /etc/vmware/hostd/config.xml
4. Set the "heartbeatDelayInSecs" tag under "vmsvc" to 0 seconds as shown here:
5. Restart the management agents for this change to take effect.
service mgmt-vmware restart
6. Reconnect the host in VC ( Right click on host in VI Client and select "Connect" )
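For anyone who prefers to script the edit in step 4 rather than do it by hand, the change can be sketched as below. This is only a sketch: the sample XML is a hypothetical, cut-down stand-in for /etc/vmware/hostd/config.xml (the real file has far more in it, and the starting value of 30 is made up), and the function name is mine. Try it on a copy of the file first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down stand-in for /etc/vmware/hostd/config.xml.
# The surrounding layout and the starting value of 30 are assumptions.
SAMPLE = """<config>
  <vmsvc>
    <heartbeatDelayInSecs>30</heartbeatDelayInSecs>
  </vmsvc>
</config>"""


def set_heartbeat_delay(xml_text, seconds=0):
    """Return xml_text with heartbeatDelayInSecs under vmsvc set to seconds."""
    root = ET.fromstring(xml_text)
    # Look for the first <vmsvc> element anywhere below the root.
    vmsvc = next(root.iter("vmsvc"), None)
    if vmsvc is None:
        raise ValueError("no <vmsvc> element found")
    tag = vmsvc.find("heartbeatDelayInSecs")
    if tag is None:
        # Create the tag if the file does not already contain it.
        tag = ET.SubElement(vmsvc, "heartbeatDelayInSecs")
    tag.text = str(seconds)
    return ET.tostring(root, encoding="unicode")


print(set_heartbeat_delay(SAMPLE))
```

The management agents still need restarting afterwards, as in step 5, for the new value to take effect.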
As of VMware VirtualCenter 2.5 Update 1, ESX Server 3i systems can only be added to an HA cluster if the system has swap enabled.
This article applies to:
- VMware ESXi 3.5.x Embedded
- VMware ESXi 3.5.x Installable
- VMware VirtualCenter 2.5.x
ESX 3i Servers with swap not enabled will show the following message(s):
An error occured during configuration of the HA agent of the host.
HA Agent has an error : Host in HA cluster must have userworld swap enabled.
To enable swap on the ESX 3i Server: