I don't know if anyone else has come across this issue, but since upgrading to ESX 3.5 Update 2 we've been having strange problems with VCB snapshots. I've not had much time for troubleshooting in the last 3 or so weeks, but I found a workaround. Before I get ahead of myself, let me just first explain the issue we are having.
When backing up our VMs with VCB, the snapshot delta files created by VCB need to be merged back into the main VMDK. However, for the past few weeks I've seen cases on some of our ESX hosts where the snapshots never get merged back, and the delta files just keep stacking up every time a snapshot is created. In other words, here’s what happens:
1. VCB creates a snapshot of a VM. This creates delta files such as VMNAME-000001.vmdk.
2. When the backup process completes, the snapshot delta VMDK is supposed to merge back into its parent VMDK file but fails to do so. This is not normally a problem, as you can just go and "delete" the snapshot using the snapshot manager in the VI Client.
3. However, when you go to the snapshot manager in the VI Client, there are no VCB snapshots, but there may be a "consolidate helper" snapshot. Even if I delete this snapshot, the process fails to merge the VMDK files back.
4. If I then create another snapshot manually using the snapshot manager, this creates the second set of delta files such as VMNAME-000002.vmdk.
5. When I then try to delete the snapshot, the VI Client reports the Virtual Machine as having no snapshots. However, when browsing the data store, I can still see all the delta files. Also, when I log onto the ESX server where the VM is running and issue vmware-cmd /vmfs/volumes/<DATASTORE>/<VMNAME>/<VMNAME>.vmx hassnapshot, the ESX server returns no snapshots for that VM.
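To spot affected VMs from the service console, something like the following sketch could be used. It is not part of my original troubleshooting: the function names are made up for illustration, and it assumes the delta disk descriptors follow the VMNAME-00000N.vmdk naming shown in step 1.

```shell
#!/bin/sh
# Sketch: list snapshot delta descriptors left behind in a VM directory.
# Assumption: delta disks are named <VMNAME>-NNNNNN.vmdk (six digits),
# as in VMNAME-000001.vmdk above. Function names are hypothetical.

# Print the name of every file in directory $1 ending in -NNNNNN.vmdk.
list_delta_files() {
    vm_dir=$1
    for f in "$vm_dir"/*-[0-9][0-9][0-9][0-9][0-9][0-9].vmdk; do
        # When nothing matches, the glob is left as a literal string; skip it.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
    return 0
}

# Print how many such files exist in directory $1.
count_delta_files() {
    list_delta_files "$1" | wc -l | tr -d ' '
}
```

Running count_delta_files /vmfs/volumes/<DATASTORE>/<VMNAME> next to the vmware-cmd ... hassnapshot call from step 5 shows the mismatch: a non-zero count of delta files while the host reports no snapshots.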
This is a strange problem. The Virtual Machine clearly has snapshot delta files in its data store; however, the ESX host is unaware of any snapshots for that VM. Now I did find a workaround for this problem, but I've been unable to find the root cause as I've been way too busy the last few weeks to have a good look at it.
The workaround is:
1. Log onto the console of the ESX host where the VM with snapshot problems is running.
2. Restart the management agent on that server with service mgmt-vmware restart.
3. In the VI Client, go to the snapshot manager and manually create a snapshot for the VM (without a memory snapshot).
4. Now, "Delete" all snapshots. This should merge all delta files back into the main VMDK file.
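For anyone who prefers the service console, the same steps could probably be scripted along these lines. This is only a sketch under assumptions: I did the snapshot steps in the VI Client, not like this. It uses the createsnapshot (name, description, quiesce, memory) and removesnapshots operations of vmware-cmd in place of the snapshot manager, and the snapshot name consolidate-tmp, the helper names and the DRY_RUN switch are all made up for illustration.

```shell
#!/bin/sh
# Sketch of the workaround, driven from the ESX service console.
# Assumption: vmware-cmd createsnapshot/removesnapshots stand in for the
# VI Client snapshot manager. Set DRY_RUN=1 to print the commands only.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = full path to the VM's .vmx file.
consolidate_vm() {
    vmx=$1
    # Step 2: restart the management agent on this host.
    run service mgmt-vmware restart || return 1
    # Step 3: create a snapshot (quiesce=0, memory=0, i.e. no memory snapshot).
    run vmware-cmd "$vmx" createsnapshot consolidate-tmp workaround 0 0 || return 1
    # Step 4: remove all snapshots, which should merge the delta files back.
    run vmware-cmd "$vmx" removesnapshots
}
```

A dry run such as DRY_RUN=1; consolidate_vm /vmfs/volumes/<DATASTORE>/<VMNAME>/<VMNAME>.vmx prints the three commands without touching the host. On a real run, the management agent may need a little while to come back up before vmware-cmd responds, so a pause after the restart may be needed.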
The workaround suggests that there is a problem with the management agent on the ESX hosts. If so, it has to be something in the Update 2 release, as my cluster has 16 hosts and the problem seems to be popping up on random hosts daily. I am now planning to upgrade to Update 3 to see if that will clear the problem.
If anyone else has come across a similar issue, please drop me an email or a comment.