Thursday, October 15, 2009

NetApp mbrscan and mbralign for Virtual Machine Alignment In-Depth

Alignment of VMware virtual machines has been an issue for quite some time. This issue exists no matter who the storage vendor is, but I will use NetApp because it is what I know. Here are some links to get you up to speed in case you don't fully understand the situation:

Link to VMware document on alignment
Link to NetApp document on alignment

What has NetApp done about the situation? I hear that NetApp will be releasing a tool that plugs into vCenter specifically for vSphere 4.0 and ESXi 4.0 shortly. In the meantime, if you are still on ESX 3.5 (or vSphere with a Service Console), there is another answer that you can use today. This will not work on ESXi, since you need a service console for the tools.

Eric Forgette at NetApp created a set of tools about a year ago that has matured and found its way into the NetApp VMware Host Utilities Kit version 5.1. If you do not have this loaded on your ESX/vSphere servers and you are connecting to NetApp, go load it NOW! (A reboot is required for the settings to take effect.) Most of the following is research I have conducted in conjunction with conversations directly with Eric over at this thread on NetApp Communities.

Included in this tool are two executables, mbrscan (scans a vmdk for alignment) and mbralign (performs the alignment). The default installation location is /opt/netapp/santools.

While the readme does a good job of going over the basics, there are a number of caveats to running the tools correctly. I will go into each executable in depth, but before I go down into the weeds, you need to know when NOT to run them!

You cannot, or will not want to, use the alignment tool in the following cases:
  • Windows Server 2008 is already aligned if the machine was created as a Windows Server 2008 machine. If the machine was upgraded from Windows Server 2003, it will not be aligned.
  • Citrix Servers are not supported because they remap the C:\ drive
  • Windows Dynamic Disks are not supported and will be corrupted if an alignment is performed (but mbrscan will detect them - see below)
  • Linux LVM volumes are not supported (mbrscan may NOT detect all LVM partitions)
  • Windows Server 2003 non-boot disks that have been added (d:, e:, etc) will need to be remapped in Computer Management. The drive letter will be lost on alignment.
  • GRUB booted Linux and Solaris will need to have GRUB reinstalled after alignment
With all that out of the way, it is a basic two-step process: 1. run mbrscan, 2. run mbralign on machines as needed.

Step #1 - mbrscan

  • In order for the mbrscan to give reliable results, the machine must either be powered off or have a VMware snapshot!
  • I have a simple script that I put together that just takes a VMware snapshot on all machines on an ESX/vSphere host.
  • I then execute mbrscan using the scan all virtual machines parameter: mbrscan --all
  • Once I have the scan results I need, I execute another script to remove the VMware snapshots I just created for all machines on the host
  • NOTE: Windows Dynamic Disks will report a partition type of: unknown - 0x42. Do Not Align These Partitions!
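
The scan workflow above could be scripted roughly as follows. This is a minimal sketch, not the script mentioned in this post: it only builds the command lines and runs nothing. It assumes that `vmware-cmd -l` prints one .vmx path per line, and that `vmware-cmd <vmx> createsnapshot <name> <description> <quiesce> <memory>` and `vmware-cmd <vmx> removesnapshots` behave as they do on an ESX 3.5 service console. The snapshot name `mbrscan-temp` and all function names are made up for illustration. As far as I know, `removesnapshots` removes every snapshot on the VM, not only the temporary one, so check each command against your own host first.

```python
SNAPSHOT_NAME = "mbrscan-temp"  # hypothetical name, not from the original post
MBRSCAN = "/opt/netapp/santools/mbrscan"  # default install location


def parse_vm_list(listing):
    """Return the .vmx paths from `vmware-cmd -l` output (assumed one per line)."""
    lines = [line.strip() for line in listing.splitlines()]
    return [line for line in lines if line.endswith(".vmx")]


def build_scan_plan(vmx_paths):
    """Build the commands in order: snapshot every VM, scan, remove snapshots."""
    plan = []
    for vmx in vmx_paths:
        # Assumed argument order: name, description, quiesce (0), memory (0).
        plan.append(["vmware-cmd", vmx, "createsnapshot", SNAPSHOT_NAME,
                     "temporary snapshot for mbrscan", "0", "0"])
    plan.append([MBRSCAN, "--all"])
    for vmx in vmx_paths:
        # Assumption: this removes ALL snapshots of the VM.
        plan.append(["vmware-cmd", vmx, "removesnapshots"])
    return plan


def dynamic_disk_lines(scan_output):
    """Return scan output lines mentioning partition type 0x42 (Dynamic Disk)."""
    return [line for line in scan_output.splitlines() if "0x42" in line]
```

Each entry in the plan is an argument list that could be handed to `subprocess.call`. Printing the plan and reading it over before running anything is a cheap safety check, and any line returned by `dynamic_disk_lines` marks a disk that must not be aligned.
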
Step #2 - mbralign
  • In order to perform an alignment, ALL VMware snapshots MUST be removed!
  • In order to perform the alignment, you NEED an amount of free space equal to the size of the vmdk. mbralign will make a backup of the file as the first step. This file has the extension: -mbralign-backup.
  • In order to perform the alignment, the virtual machine MUST be powered off!
  • If the virtual machine has multiple vmdks, only one can be aligned at a time!
  • Execute mbralign against the vmdk - I usually get about 1-2GB per minute speed
  • Boot up the virtual machine. If it works, delete the -mbralign-backup file
  • On Windows systems it will ask you to reboot one more time because it detects the hard disks as new hard disks
  • If it doesn't work, run mbralign again and it will detect the -mbralign-backup and ask you if you would like to restore the file. Very Nice!
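
The space and time requirements above reduce to simple arithmetic. The sketch below is illustrative only: the function names are made up, and the 1-2 GB per minute range is just the throughput I quoted above, so real numbers will vary with your storage and host load.

```python
GIB = 1024 ** 3


def has_room_for_backup(vmdk_bytes, free_bytes):
    """mbralign copies the vmdk first, so free space must be at least its size."""
    return free_bytes >= vmdk_bytes


def estimate_minutes(vmdk_bytes, slow_gb_per_min=1.0, fast_gb_per_min=2.0):
    """Return (best_case, worst_case) minutes for one alignment run."""
    size_gb = vmdk_bytes / float(GIB)
    return size_gb / fast_gb_per_min, size_gb / slow_gb_per_min


# Example: a 40 GB vmdk needs 40 GB free and roughly 20 to 40 minutes.
best, worst = estimate_minutes(40 * GIB)
```

On the service console, the two inputs could come from `os.path.getsize` on the -flat.vmdk and `os.statvfs` on the datastore path, though I have not verified how VMFS reports free space through that call.
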

25 comments:

Eric Forgette said...

Hi Aaron,
Thanks for putting that post together. I think it's a great resource. While I can't comment on specifics, 'shortly' is probably overly optimistic with regard to a plugin. You can, however, use mbralign and mbrscan on ESX 4.0. In fact, the NetApp VMware Host Utilities Kit is fully supported on ESX 4.0.
Thanks again for all your work on this!
Cheers,
-Eric

Aaron Delp said...

Thanks for the comment Eric!

The new tool is out:

http://blogs.netapp.com/storage_nuts_n_bolts/2009/10/netapp-virtual-storage-console-vsc-for-esx-ready-for-download.html

I am hoping to play with it soon.

bhavani said...

You can do Citrix servers as well,
but there is a procedure: first change the drive letters in the registry, then change the userinit.exe path.

Anonymous said...

Is there a PowerCLI script to shut down the VM and kick off the mbralign process on the VM?

NinjaDad said...

Hi, Aaron,

I am getting a "Device busy" message after taking the snapshot for a VM for mbrscan. Does the VM have to be shut down after taking a snapshot? I took the snapshot from the vSphere client with and without "Snapshot VM's memory" and "Quiesce guest file system", and it didn't seem to help. Once the snapshot is taken, you still run mbrscan against the *-flat.vmdk file, correct?

This "Device busy" message also comes up with version 5.2 of the mbralign tool.

Could I also see the contents of the simple script to take snapshots you are talking about too?

Thanks!

Aaron Delp said...

@anon - There might be a Powercli script out there but I haven't seen one. (I really need to play with PowerCLI someday, just no time right now)

@masa That is interesting. Yes, you either need the vm powered off or you need a VMware snapshot for the virtual machine or you will get the resource busy error. Are you sure that the VMware snapshot isn't timing out?

You are correct, you scan the flat file.

If you installed the tool with the NetApp VSC or the HUK version 5.1 or greater NetApp will provide support on the tool now as well. You may give them a shot or also try the NetApp Communities link listed in the article. I haven't tried in awhile but maybe Eric (original author of the tool) may be able to help you out a little more. Good Luck!

Oh, about the scripts. I'll see if I can dig them up and I'll post them in a separate post later this week.

Thanks!

NinjaDad said...

Hi, Aaron,

I finally figured out what I was doing wrong. Our environment has many ESX hosts with several shared storage volumes accessible by all of them, and mbrscan (or mbralign --scan) needs to be run against the *-flat.vmdk file on the ESX host that is hosting the VM's configuration file (.vmx). I don't remember reading this in the documentation, but I probably missed it somewhere. I was initially thinking mbrscan could be run on any host against any shared *-flat.vmdk file, but that doesn't appear to be the case. Once I determined the correct ESX hosts with vmware-cmd -l, I was able to get mbrscan running for VMs that were currently running with snapshots. I plan on scripting the rest tomorrow.

Thank you again for your quick response!!

NinjaDad said...

Hi, Aaron!

I recently encountered an issue where alignment "appears" to have failed on a W2K8 VM. I hear W2K8 should be aligned automatically, so I assume this VM may have been upgraded from W2K3.
So when I run the mbralign command, it goes through the whole process and says it completed OK. I don't get any error messages. However, when I power on the VM, the console says "Error loading operating system". Somebody was talking about a drive letter change, so I removed the data disk and it should only have C:. Still the same error. The documentation says "Windows VM SHOULD load", but in case it doesn't, it doesn't give you anything to try the way it does for the Linux GRUB ones. The question then is "How do you recover an aligned Windows VM that does not boot?". Any info would be appreciated. Thanks!

Aaron Delp said...

Hey Masa - Is the disk a dynamic disk? Dynamic Disks don't align and will become unusable in the way you describe if you try to align them. That is the only thing I can think of. Good Luck!

Anonymous said...

Tried mbrscan and mbralign on a RH5 VM. A good tool; it did the alignment smoothly. But I caught one issue:
the disk partitions overlap at their start/end cylinders, which is a potential file system crash issue.

any comment?

thanks,

> Disk /dev/sda: 23.6 GB, 23622348288 bytes
> 255 heads, 63 sectors/track, 2871 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1   *           1        1045     8385898+  83  Linux
> /dev/sda2            1045        2089     8385930   82  Linux swap
> /dev/sda3            2089        2350     2096482+  83  Linux
> /dev/sda4            2350        2872     4192975    5  Extended
> Partition 4 does not end on cylinder boundary.
> /dev/sda5            2350        2611     2096451   83  Linux
> /dev/sda6            2611        2872     2096451   83  Linux

Aaron Delp said...

Sorry, nothing I can add on that one. My experience aligning Linux is very limited.

NinjaDad said...

Hi, Aaron,

I was still having some issues aligning Windows VMs (basic disk), and found the following:

http://communities.vmware.com/message/1638108

""Error loading operating system" after running mbralign from EHU 5.2 on a VMDK"

Duh!!

"DESCRIPTION:
Under some circumstances, the mbralign utility that is packaged with the EHU 5.2
and VSC 2.0 may incorrectly calculate the Cylinder/Head/Sector information used
by the bootloader. When this occurs, the resulting GOS will be left in a state
that is not bootable."

The ones I was having issues with in particular were Windows XP VMs. EHU 5.2 seemed to work on a Windows 2003 VM when I tried it.

So, the current recommendation may be to just use mbralign from EHU 5.1...

Hope this helps somebody...

Aaron Delp said...

Masa - Good catch and thank you for posting! I haven't had a chance to use the 5.2 version so any information you have is very useful. Thank you again!

Satinder Sharma said...

@Masa The problem with mbralign bundled with VSC2.0 has been fixed in the latest release of mbralign bundled with VSC 2.0.1

vmexplorer said...

"Windows 2008 Server is aligned if the machine was created as a Windows 2008 server. If the machine is upgraded from Windows Server 2003, it will not be aligned."

OR if your build process is Windows 2008 Server being pushed from an image file to your VM, there is a chance you're not aligned...

Never assume it's aligned... check it always :)

Bogdan said...

Hi Aaron!

Is it possible to send me those tools (mbrscan and mbralign) to check my infrastructure?

I'm running vSphere 4.1 with an IBM DS4700 storage system and I don't have a client account at NetApp to download those tools.

Best regards,
Bogdan Leonte

Aaron Delp said...

Sorry, the only way I know to get them is from NetApp as they were written by NetApp employees as a side project initially.

Do you have a local IBM Storage technical resource you could call to help you out?

Good Luck!

Anonymous said...

Hey Aaron,

Hoping you could help. I am testing out the restore feature of the mbralign tool but it doesn't look like it's working for me. I can successfully align a disk (on ESX 3.5) and also create the backup file. However, when I run the align tool again (with the backup file in place) it doesn't ask me whether I want to restore. It just ends with the message "Please remove or rename all files ending in -mbralign-backup".

any reason why it is not prompting me with a restore option?

Thanks a ton

Aaron Delp said...

Where did you get the tool and what version is it? I know early versions of the tool didn't have the restore option.

I also know later it was bundled in with the plug ins and that version had (I thought) the restore option.

Thanks!

Anonymous said...

Hey Aaron,

I believe it's version 4.. I did not install it but a colleague did and he's out of office :-|..

Thanks for your reply.

Aaron Delp said...

Ah - So that is a relatively new version I believe. I'm sorry but I haven't worked with the tool in a long time now (I no longer have access to NetApp gear). I'm wondering if they didn't pull it out for some reason.

All the tool did was take the vmdk file, make a copy, and stick a backup extension on it. When you executed the tool, it would check for the presence of this file and ask whether you wanted to restore or not.

What I would suggest is to make a copy manually yourself before the alignment. Then you have it if you need it, and you can delete it if all goes well.

Of course, this means you need 100% disk space overhead, which may cause issues, but better safe than sorry!

It was VERY rare but I did have one or two that did corrupt so I would always recommend a backup in some way.

Thanks!

Anonymous said...

Hey Aaron,

Thanks for your reply. Here's how I made it work: I manually renamed the backup vmdk to the original vmdk name from an ssh session to my host. As you said, all the tool does is take a copy and slap a backup extension on it, so manually copying it back to the original file name just worked. :-)

Thanks for your time to respond to my queries.

Cheers

Aaron Delp said...

Awesome!! I'm glad it worked! Thanks for coming by!

cmcrci said...

In our environment, we have a VM named abc and it has 3 vmdks, namely abc-flat.vmdk, disk1.vmdk and disk2.vmdk.

While performing mbralign for disk1 and disk2 I am getting the message below and am not able to run the alignment. Please help me out.
Message:
The same vmdk was found in multiple vmx files.
Please give explicit path to vmdk file location

Aaron Delp said...

Sorry, I personally haven't seen that error message before so I can't tell you which way to go with that one.

I would check the directory that contains the vmdks to see if you have multiple .vmx files for some reason. The only reason I could think of is if the machine was renamed or copied and for some reason you have more than one .vmx file. That usually isn't the case.

The .vmx file that matches the virtual machine name currently in use should be the one you are looking for but again, no guarantees! Good Luck!