Thursday, June 17, 2010

My #1 Issue with VMware ESXi Today

It is no secret that VMware has anointed ESXi the future hypervisor of choice.  I am often asked what I think of ESXi and whether I think it is functionally equivalent to ESX.  My short answer is NO, not today.  My opinion comes down to one key reason: realignment of vmdk's on VMFS datastores.  If you don't know what I mean, you need to understand the concepts below.

What is the difference between vmdk alignment and realignment and why should you care?

The alignment of vmdk's is the big white (or pink if you prefer) elephant in the room from a performance standpoint.  It is ignored by many customers, and misalignment can lead to as much as a 30% degradation in performance!!  To their credit, both NetApp and EMC have recognized this and released documentation to address the issue.  Here are links to VMware & NetApp papers on the subject.  EMC documents H2197, H2529 & H5536 on PowerLink also provide information on the subject.  EVERY storage vendor has this problem; EMC and NetApp are the only two I know of that have spoken about it.  If you aren't aligning/realigning your virtual machines as part of standard best practices, you should be!


What is the difference between alignment and realignment?

As pointed out to me by Duncan Epping in a Twitter conversation, there is a big difference between alignment and realignment.  Alignment happens when the partitions are created but before the OS is installed.  Let's be honest, it is a pain in the butt and many don't learn about alignment issues until they start to have problems.  Because it is a pain, many choose realignment to remedy the problem.  Realignment is the concept of using tools (vOptimizer and NetApp's mbrscan/mbralign come to mind) to align an existing vmdk that is misaligned.  This process requires taking the machine offline and then rewriting the entire vmdk in an aligned format.  I'm speaking about realignment in this article.

Are all virtual machine operating systems affected?

NO - Windows 2003 and earlier are misaligned by default, but Microsoft changed the alignment in Windows 2008 so it is now aligned by default.  Be careful though: Windows Dynamic Disks, Citrix servers, and many other special cases often won't work with the realignment tools.  See my article on the NetApp tools for more information.  I'm not an expert on Linux VMs, but if you have experience, please leave a comment!  Lastly, if you align your template, all deployments from that template will also be aligned.  This is really nice IF you remember to align your template before installation of the OS and before you deploy all your machines!
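To put numbers on what "misaligned" means, here is a minimal Python sketch (an illustration only, not part of any vendor tool). It reads the primary partition table out of a 512-byte MBR and checks whether each partition starts on a 4K boundary. The function names are made up for this example, and it assumes 512-byte sectors and a plain MBR disk (no GPT, dynamic disks, or LVM). The start sectors used in the demo (63 for Windows 2003 and earlier, 2048 for Windows 2008) are the typical defaults, not values read from a real VM.

```python
import struct

SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def partition_starts(mbr):
    """Return the starting LBA of each non-empty primary partition in an MBR."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    starts = []
    for i in range(4):
        # The partition table begins at byte 446; each entry is 16 bytes.
        base = 446 + 16 * i
        partition_type = mbr[base + 4]
        start_lba = struct.unpack_from("<I", mbr, base + 8)[0]
        if partition_type != 0:  # type 0 marks an unused slot
            starts.append(start_lba)
    return starts


def is_aligned(start_lba, boundary_bytes=4096):
    """True if the partition's byte offset is a multiple of boundary_bytes."""
    return (start_lba * SECTOR_BYTES) % boundary_bytes == 0


if __name__ == "__main__":
    for label, lba in (("Windows 2003 default", 63), ("Windows 2008 default", 2048)):
        offset = lba * SECTOR_BYTES
        print(label, "starts at byte", offset, "4K aligned:", is_aligned(lba))
```

For a flat (non-sparse) virtual disk, the first 512 bytes of the data file are the guest's MBR, so something like `partition_starts(open(path, "rb").read(512))` should work from any machine that can read the file, where `path` is whatever your flat vmdk is called. Treat that as a read-only check; it does nothing to fix the alignment.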

What does this have to do with ESXi?

Both vOptimizer and NetApp mbrscan/mbralign utilize the service console so ESX works just fine.  Due to the lack of a service console, ESXi is a different story.  With NFS datastores, you can mount the datastore on a Linux host and perform the alignments using the NetApp mbrscan/mbralign.  This process is documented by Nick Triantos here.  I'm not sure if you can use vOptimizer in this way.  VMFS is another story.  Currently, there doesn't appear to be a clear method to realign vmdk's.  There is no service console to run the utility and no way to access the VMFS outside of ESXi.  What do you do if your storage vendor doesn't support NFS?  What would you do if your storage vendor is VMFS (LUN based) only?  You have problems, my friend.

I did a poll on Twitter yesterday and the results confirmed what I see in my customer base.  Many still use Windows 2003 based servers and most P2V's are Windows 2003 Server and earlier.  In addition, ESXi is arriving in the market faster than Windows 2008 virtual machines, yet many have no idea about the concept of vmdk alignment/realignment.

What are the work-arounds?

  • Create a Linux host, attach it to an NFS datastore, and run NetApp mbrscan/mbralign from there (not sure if that will work with vOptimizer)
  • Set up a single ESX host to perform the alignments on VMFS datastores
  • Align your Windows 2003 template and deploy aligned machines on VMFS and P2V all machines to NFS

When/How could this be fixed (I'm guessing here, I have no inside knowledge of any vendor products)?

  • ESXi includes the ability to detect and realign vmdk's
  • Storage Vendors include the ability to detect and realign vmdk's
  • Incoming P2V's of older OS's are aligned in transit
  • Windows Server 2008 becomes the standard

Bottom Line - Can you get around the issue?  Yes you can.  Is it a pain?  Yes it is.  Until this is fixed I will continue to recommend ESXi with NFS environments but I will continue to resist ESXi with VMFS unless the customer fully understands the ramifications of this issue.  I look forward to using ESXi as my standard and we get closer and closer every day, but we also need to be careful.

39 comments:

William Lessard said...

Great article but one question. Last paragraph is the recommendation of esxi for nfs a typo?

Riley said...

Great post Aaron! VMware says:

"The best practice for adding VMware VMFS storage to ESX Server is to use the VI Client, because it automatically aligns VMware VMFS partitions when it creates them."

Am I understanding correctly that the above does not apply to ESXi?

Anonymous said...

Very good post to summarize the point.

No way to get around this using vMA ?

Aaron Delp said...

@William - I made the last statement a little more clear. What I meant to say was today I use ESXi with NFS with no issues but if a customer wants to use VMFS then I usually make sure they understand all of this before proceeding.

@Riley - You actually have two sets of alignment to worry about. The first one is the datastore level. This is handled by what you are describing. If you use vCenter, your DATASTORE is aligned, but the vmdk's within the datastore may still be misaligned. You need to align both.

@anon - Very good question. I don't know the answer. I don't have much experience with vMA at this point but I hope to in the future. If anybody out there has experience, I would love to know!

Daniel Hernandez said...

William - I believe he's recommending NFS because at least there is a work around to mount the NFS datastore and perform the necessary realignment against the vmdk.

Dave Convery said...

Great post! @Stevie_Chambers and I posted an article about ALIGNMENT that includes how to align Linux ->
http://communities.vmware.com/docs/DOC-11458

It doesn't include REALIGNMENT, but it's a start.

I have not really seen performance issues when people follow best practices of putting the data on an aligned partition. After a P2V, you can just create a new aligned partition and copy everything over (services stopped, re-letter drives, etc.) The OS may get a little bit of value on an aligned partition, but the work may not be worth the effort.

That being said, I always recommend aligned partitions, even for the OS for templates. ESPECIALLY for desktops, where you need to eke out every last bit.

Aaron Delp said...

@Daniel - You are correct.

@Dave - Thank you for all the information and the link!

Anonymous said...

Agreed. My other issue is Host Utilities for multipathing: I haven't yet seen a version from my SAN vendor that doesn't require the SC. So that's another issue.

Aaron Delp said...

@Anon - Who is your SAN vendor? I know NetApp has the VSC (Virtual Storage Console) that will perform the host kit tweaks for ESXi without the service console. It appears that vCenter plug-ins will be the way to solve that issue going forward.

Anonymous said...

IBM NSeries, they still don't have the VSC avail.

Aaron Delp said...

VSC will work on N-Series, just not technically supported. I've run it before. Might want to check it out but no guarantee on my end! :)

Vijay Swami said...

Nice article.

How about this when using ESXi

http://www.virtualizationadmin.com/articles-tutorials/vmware-esx-articles/general/how-to-access-the-vmware-esxi-hidden-console.html

Andrew Storrs said...

Great post Aaron and an issue dear to my heart these days. As part of my work with clients this past year, I've realigned over 500 VMs. Scary, I know. ;)

A couple comments I'd like to add:

- "Set up a single ESX host to perform the alignments on VMFS datastores" this is my preferred choice where the environment is either large enough or not yet 100% virtual; I use this ESX host as a landing zone for future P2Vs.

- Even if you've deployed pre-aligned templates if you add an additional virtual disk to the deployed VM it will NOT be aligned by default so you need to make sure you use the O/S native partitioning tools to align it before putting data on it (as per the article Dave linked to: http://communities.vmware.com/docs/DOC-11458).

- While VMware Converter does not align X2V's as a part of the conversion process, PlateSpin Migrate (PowerConvert) does. This can save you an immense amount of time on a large P2V project. With that said, I've noticed the caveat that it only auto-aligns the first partition on any disk, so ALWAYS break out your partitions into separate VMDKs when using PlateSpin.

- Aside from Windows dynamic disks, Linux LVM (logical volume manager) disks also cannot be realigned using mbralign. The same of course is true for GPT formatted disks.

Aaron Delp said...

@Vijay - That might work, I haven't tried it. I may have to try that in my lab soon.

@Andrew - All awesome points! Thank you!!!

Anonymous said...

- There is an IBM branded VSC I believe.

Well stated!

Rob Quast said...

@Vijay CC: Aaron

mbrscan/mbralign is a no go in ESXi tech support mode; I tried it. The install for the utilities fails, and when I tried to reverse engineer it, even after getting the binaries extracted, the libraries they reference aren't in ESXi. I stopped there, but even if I could get it working I would question how much stress it would generate; mbralign is pretty memory intensive on a normal service console.

Andrew Storrs said...

Rob hints at something I wanted to add. If you build a dedicated landing zone host running ESX (like I mentioned in my previous comment) make sure you set the service console memory to the max of 800MB (http://is.gd/cSVg6) so as to improve the speed of multiple mbralignment tasks. It can make a huge difference.

Chad Sakac said...

Disclosure - EMCer here...

Aaron, great post - and great to bring attention to something people should know.

Alignment (on both physical and virtualized environments) is always a good idea (it decreases extraneous "heavy lifting" arrays need to do internally, allowing them to do more with less).

Consider alignment "good hygiene" (no matter what array you use) which is fundamentally about efficiency - doing more with less.

Be smart, make it part of your templates (BTW - this applies to Linux VMs too if that wasn't clear).

Gonna get cracking on an ESXi workaround...

Aaron Delp said...

@Chad - Thanks for the comment! Yes, it exists in the Linux world and I'm not sure when (if ever) it will go away like it has on Windows with 2008. The big caveat with Linux (at least with the NetApp tools) is that GRUB needs to be reloaded. Painful!

I would love to see a workaround ;)

Nick Triantos said...

Aaron,

this is an excellent post. While mbrscan/mbralign work well, there are some caveats (i.e., no dynamic disks, no LVM, no ESXi w/ VMFS).

We have been looking at ways to see if we can align dynamically and address ESXi with VMFS.

I see this becoming a larger issue as ESXi goes mainstream but also as Linux implementations start to increase. In fact, from a growth perspective Linux is projected to grow faster than windows in the next 3-4 years.

BTW...VSC works for N-Series. Identical code. We, NetApp, built 2 versions of our tools...One NetApp Branded and one IBM N-series + docs. Only difference is the logo.

Apparently IBM does not distribute them due to some IBM legal issues with the tools being distributed for free. Sounds like a rather bizarre issue.

I have a proposal on the table to put these tools on the NetApp N-Series community site so N-series customers can download them, with a support disclaimer. We'll see if this flies. I hope it does.

simonious said...

Just wondering if there is a tool to report on alignment status across a cluster or data center of VMs, without having to check each one individually?

Aaron Delp said...

@simon - I answered this on Twitter but I'll answer here as well for the record. You can use the NetApp mbrscan utility with the --all option and it will scan and report all the vm's on a datastore, no matter the cluster. I don't know of a way to cross datastores with the tool though. You can use the tool on storage other than NetApp as well although you are unsupported at this point.

Andrew Storrs said...

@simon I've created a script in PowerShell to run the tool across all datastores by using plink to SSH into a host and run the command with the --all switch as Aaron described then pull all the results back into a report. I can see if I can clean it up and share.

drakpzone said...

Hi Aaron.

First: mbralign (NetApp tools). They do work for NetApp arrays, indeed. But since they align at 32k, they shouldn't be used for other arrays, such as EMC (which stands at 64k).

Second: linux. Our solution for new vms is to create a kickstart file (RHEL, but other distros apply) that ditches the standard partitioning toolset, and uses instead parted to:
- align first partition to 1MiB
- align following partitions (even swap!) to 64K boundaries (we're running on EMC arrays; see Note below)
Works like a charm, verified that the alignment is ok.

For existing misaligned Linux VMs, we use this technique: PXE boot the VM with GParted (BEWARE: use the latest Beta, which added the 1MiB align feature), drop the swap (if the disk is full), move/realign the partitions, and recreate the swap.
Indeed it requires downtime and takes some time, but it works just fine.

Note: the automagic math we're using to align each partition can be summarized this way:
Say we need two partitions (root, swap):

D=DISK SIZE (in sectors)
R=ROOT SIZE (in sectors)
S=SWAP SIZE (in sectors)
K = (D-S)\64*64 = starting sector at 64k boundary for swap
R=D-S (in case we want to fill the disk)
M=1MiB (2048 sectors)
Partitioning then should go this way (units=sector):
Root: from M to K-1
Swap: from K to D

There's some rounding in this (we lose 1MB in front and a few KBs at the end) but the partitions are surely aligned.

Hope that this helps :-)
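The arithmetic in the note above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the commenter's actual kickstart logic: the function name and parameters are made up, sectors are assumed to be 512 bytes, and the alignment is passed in bytes and converted to sectors. With 512-byte sectors a 64K boundary is 128 sectors, so this sketch rounds to multiples of 128 rather than the 64 written in the note. End sectors are treated as inclusive, so the last usable sector is D-1.

```python
SECTOR_BYTES = 512          # assumption: 512-byte sectors
MIB = 1024 * 1024


def two_partition_layout(disk_sectors, swap_sectors, align_bytes=64 * 1024):
    """Return inclusive (start, end) sector ranges for root and swap.

    Root starts at 1 MiB (M); swap starts at the highest align_bytes
    boundary (K) that still leaves room for swap_sectors before the
    end of the disk.
    """
    align_sectors = align_bytes // SECTOR_BYTES       # 128 sectors for 64 KiB
    root_start = MIB // SECTOR_BYTES                  # M = 2048 sectors
    # Round (D - S) down to the alignment boundary to get K.
    swap_start = (disk_sectors - swap_sectors) // align_sectors * align_sectors
    if swap_start <= root_start:
        raise ValueError("disk too small for this layout")
    return {
        "root": (root_start, swap_start - 1),
        "swap": (swap_start, disk_sectors - 1),
    }


if __name__ == "__main__":
    # Hypothetical sizes: a disk slightly over 10 GiB with 1 GiB of swap.
    layout = two_partition_layout(disk_sectors=20971600, swap_sectors=2097152)
    for name, (start, end) in layout.items():
        print(name, "from sector", start, "to sector", end)
```

The resulting start and end sectors could then be fed to a partitioning tool working in sector units; which tool and what syntax is left to the reader's environment.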

Aaron Delp said...

@Andrew - Considering how many you have done, that would be awesome if you can share it!!

@Drak - Wow! Great information, thank you! My question to you (and everyone else) is what happens if you use mbralign on EMC? I have been told by many that it works just fine. Can you explain the difference between the 32k and 64k and why we shouldn't do it?

Anonymous said...

IBM distributes the Host Utilities so what's the difference between that and the VSC? Doesn't make sense to me and I have been waiting forever to use it.

drakpzone said...

Simply put, NetApp filers use 32k blocks, and that's why mbralign does align to 32k boundaries.

EMC instead uses 64k blocks, thus you can't use mbralign (which can't be told to align to anything different from 32k).

Note 1: mbralign has the "--bs" option, but it changes the block size it moves while doing the alignment, not the starting block, which is fixed to 32K.

Note 2: if you align to 64K, it's ok also for Netapp, since (according to recent studies) 64=32*2 ;-)

Note 3: AFAIK, EMC doesn't have (yet) an equivalent of the mbralign tool for 64K blocks (EMC, please DO!!!).

drakpzone said...

oops, just to clarify. Netapp stays at 4k blocks, but chose to align to 32k for a number of reasons (afair it was about 32k being the first common multiple for windows partitions).

Chad Sakac said...

Disclosure - I'm an EMC employee

@Aaron, @drakpzone

I'm not going to claim to be an expert on NetApp, so take anything implied there with a grain of salt...

The purpose of alignment is to minimize extraneous internal array operations. All arrays have internal constructs that are generally a function of the RAID model (and also the filesystem alignment, and in some cases logical page table constructs in virtually provisioned models).

You want to maximize full-stripe operations, and minimize stripe crossings (where an IO which should land within a stripe spans stripes).

With all the funky stuff arrays do now (thin, snap, dedupe, compress, auto-tier), you also want to decrease the number of metadata operations caused by these "unnecessary" spanned objects. All the funky goodness is done via either a filesystem or another abstraction (commonly pages) on TOP of the RAID abstraction. Think of a 4K NTFS IO operation in a Guest making its way down to the array. Once it gets there, let's say the array has a 64K stripe, but a 1MB "page" used for these fancy features. Falling into two 1MB logical memory pages as an example - where statistically it's much more likely to land on a boundary if the volume is aligned on a 4K boundary.

So older Windows revs (W2K3, Windows XP, etc) and Linux - like all OSes, label their volumes (just like VMware does). This volume header offsets the beginning of the data volume - the amount of which varies.

In NetApp's case, the natural "alignment values" (I'd have to assume) would be the 4K WAFL allocation size (a filesystem attribute) and the underlying RAID stripe under the aggregate (perhaps this is the 32K value).

In EMC's case, the natural "alignment values" are the 8K UxFS allocation size (a filesystem attribute) and the underlying RAID stripe (64K).

If you understand this, you can understand why in VMware (or Hyper-V's or Xen's) case, you need to align the "container" (in VMware land the datastore), AND the VMDK. If the datastore is aligned on an even multiple of 4K boundary (and ideally an even multiple of the array RAID stripe) - that means the virtual disk starts aligned. BUT, then in the GOS, it also signs the volume, starting on an offset.

The reality is that the biggest benefit starts by just aligning on a nice boundary (multiple of 4K) rather than the messy start of "right after the volume identifier". This "multiple of 4K" is the default across a lot of arrays, but the closer you get to these "natural values", the better.
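The boundary-crossing effect described above can be put into numbers with a small Python sketch. It is a simplified model, not a description of how any particular array behaves: it lays out back-to-back 4K guest IOs starting at a partition's byte offset and counts how many straddle a boundary of a given size. The function name and parameters are made up for illustration. The 64K stripe and 4K allocation sizes come from the discussion above; the 63-sector (32,256-byte) starting offset is the typical default for older Windows versions and is an assumption here.

```python
def crossing_fraction(start_offset_bytes, io_bytes=4096,
                      boundary_bytes=64 * 1024, io_count=4096):
    """Fraction of consecutive IOs that span two boundary_bytes-sized units."""
    crossings = 0
    for i in range(io_count):
        first_byte = start_offset_bytes + i * io_bytes
        last_byte = first_byte + io_bytes - 1
        # An IO crosses a boundary when its first and last byte
        # fall into different boundary-sized units.
        if first_byte // boundary_bytes != last_byte // boundary_bytes:
            crossings += 1
    return crossings / io_count


if __name__ == "__main__":
    misaligned = 63 * 512       # assumed older-Windows default start
    aligned = 2048 * 512        # 1 MiB start
    for label, offset in (("misaligned", misaligned), ("aligned", aligned)):
        print(label,
              "64K stripe crossings:", crossing_fraction(offset),
              "4K block crossings:", crossing_fraction(offset, boundary_bytes=4096))
```

In this model, a guest starting at sector 63 has one 4K IO in sixteen crossing a 64K stripe, and every 4K IO touching two 4K array blocks, while a guest starting at 1 MiB crosses neither. Real workloads are not uniform back-to-back 4K IOs, so treat these as illustrative ratios rather than a performance prediction.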

I have a tendency to explain things the long way (it's not because I'm trying to be complicated, but rather that the way I learn personally is by understanding it at the low level and working my way up), so let me make this simple:

1) Alignment is good. Not being aligned is not fatal. Larger arrays (more brains and cache) tend to offset the impact of misalignment a bit - BUT let's start back at the basics: Alignment is good.
2) Follow your array vendor's best practice.
3) If you choose to use mbralign on an EMC array (or another), it won't **hurt** you (32K aligned better than not aligned - though 64K is better on an EMC array).
4) How to do it "upfront"? Personally, I use diskpart for older Windows hosts and template up front. I use GParted for Linux.
5) How to do it if it's not "upfront"? If I have a customer who is running into performance problems, has a bunch of mis-aligned VMs, and I KNOW it's about misalignment (BTW, you can actually "see this" on an EMC array - use Analyzer and look at the "stripe crossings" value) - I'm a fan of vOptimizer for doing it en-masse.

This whole post has re-energized me to pursue a broader solution along with my buddies at VMware.

Chad Sakac said...

Aaron - sorry, one addition.

(and FYI - I linked to you here http://virtualgeek.typepad.com/virtual_geek/2010/06/so-why-all-the-fuss-about-alignment.html)

If anyone wants the EMC techbooks, they are all publicly available on the EMC Resource Library.

You can find the Resource Library here:
http://www.emc.com/resource-library/resource-library.esp

You can search for the document numbers ("H____")

Here they are:

CLARiiON: http://www.emc.com/collateral/software/solution-overview/h2197-vmware-esx-clariion-stor-syst-ldv.pdf

Celerra:
http://www.emc.com/collateral/hardware/technical-documentation/h5536-vmware-esx-srvr-using-celerra-stor-sys-wp.pdf

Symmetrix:
http://www.emc.com/collateral/hardware/solution-overview/h2529-vmware-esx-svr-w-symmetrix-wp-ldv.pdf

IF you love EMC :-), you can get them in physical book form here:

http://store.vervante.com/c/v/category.html?base_cat=EMC%3a%20EMC%20TechBooks&pard=emc&id=HNirXcfI

jnerdyboy said...

Just a picky thing that made me think twice when I read your article. It's not VMDK's that need aligning but the OS partitions within the VMDK.

If a datastore is created with vClient then the VMDK file is aligned on the VMFS. The guest partition within the VMDK also requires an alignment via diskpart, gparted etc.

Leif said...

Great post! It's funny, I was just thinking about this around a month ago pondering whether I should introduce ESXi into our environment with our next host. Initially, I thought, I'm good to go, since I use the NetApp VSC. Then I remembered how I use those wonderful mbrscan/mbralign tools. Fortunately, the workaround to keep 1 ESX host available should work fine.

Today, I came across this blog post stating that mbralign runs significantly faster on a Linux box (10MB/s on ESX vs 80MB/s on Linux). For those with NFS environments, you may want to look into this, or at least verify the service console has 800MB of memory as Andrew stated:
http://blogs.netapp.com/dropzone/2010/04/mbralign-on-steroids.html

Andrew said...

I keep intending to go with ESXi but just haven't yet....having this item as something "out there" is one of the reasons.

Although...has anyone played around with vmfs-fuse? That could potentially be an option for the ESXi + VMFS cases....

http://glandium.org/projects/vmfs-tools/

mtneagle said...

I recommend the fast and easy supergrubdisk boot iso - the easiest way to fix linux grub after using mbralign. Won't fix if you have LVM; remove LVM first via a V2V conversion or dd.

http://download.linux-live-cd.org/Super_Grub_Disk/download/binaries/sgd/cdrom/

Anonymous said...

Thanks for the article and info and links!

But you CAN get to an ESXi console. They say it doesn't exist or that it isn't supported. But I was shown how to do it on an official VMware course and it's well documented online. google "esxi console".

Aaron Delp said...

Anon - The big difference here is between CAN and SHOULD. While the console is now supported by VMware as of 4.1, I doubt the tool is supported by NetApp running in 4.1. I know their long term vision is to approach this a different way so I'm not sure how much work they are putting into the product anymore.

If anybody from NetApp knows the future direction and if the tool is supported, I would LOVE to know.

If I remember right, you can get it to work, but no guarantees. If you do try it, please come back here and let us know!

Thank you!

Anonymous said...

I just moved from ESX 3.5 to ESXi 4.1 and am missing the NetApp mbrtools badly. We have to realign VMs on a regular basis. It seems that there is still no solution for those that use ESXi and VMFS.

Joe Spencer said...

Great Post Aaron!

This topic is also very near and dear to my heart as I have been helping my customers with VM realignments. I work for Double-Take Software, our "real time" replication solution provides V2V realignments without taking the VM's offline.

Aaron Delp said...

Joe - Thanks for the heads up! I'll have to check it out some time and do a post on it! Thank you!