Wednesday, June 9, 2010

Comparing Vblocks

I believe one of the most interesting concepts to come along in our industry recently is Cisco/EMC/VMware's Vblock.  My best definition of a Vblock is a reference architecture that you can purchase.  Think about that for a second.  Many vendors publish reference architectures as guidelines for you to build to their specifications.  Vblock is different because it is a reference architecture you can purchase.  This concept is a fundamental shift in our market toward simplifying solutions as we consolidate Data Center technologies.  We are no longer purchasing pieces and parts; we are purchasing solutions.
Anybody who knows me knows I love the term "cookie cutter".  I use it all the time because it very simply conveys the idea of mass replication in a predefined way.  Vblock is a "cookie cutter" Data Center.  As long as you stay within the guidelines presented in the reference architecture, the product is guaranteed to work.
I took some time this week to compare and contrast all of the various Vblock configurations.  Take particular notice of the items highlighted in yellow below.  Vblock 0 is very different from Vblock 1 and 2 because it is based on IP storage rather than FC storage and uses ESXi rather than ESX.  Here are my findings (please excuse the use of a graphic for the table; Blogger sucks at tables):

UPDATE (I forgot this part) - A Few Notes on Vblock 0
Because Vblock 0 boots from local disks instead of FC boot from SAN, you lose the stateless ability of Cisco UCS.  This isn't a big deal in a smaller environment, but statelessness becomes increasingly important as the solution scales up.  The Vblock 0 reference document doesn't list the disk characteristics at all; in fact, it makes no mention of the disk configuration or the IOPS Vblock 0 will generate.  I hope the document is updated to include this information in the near future.

Now, the million-dollar question: how many virtual machines can I fit on each solution?  The reference materials I used didn't really provide numbers I would trust, so I have decided to use my own.  My assumptions are presented below, as well as my work, in case you want to change the math or call me an idiot.
The Vblock 1 and Vblock 2 Reference Architecture document lists an estimate of 4 VMs per core using a 2 GB virtual machine size, so I thought I would start there.  Vblock 0 and Vblock 1 contain blades with 48 GB of memory; Vblock 1 also contains a few blades with 96 GB, and Vblock 2 is all 96 GB blades.  Here is the math I used to figure out the proper ratio of VMs per core per GB of memory:

4 VMs per core w/ 48 GB:
4 VMs per core * 8 cores per blade = 32 VMs per blade; 32 VMs * 2 GB per VM = 64 GB needed / 48 GB of blade memory = ~1.33 (roughly 33% memory oversubscription)

Using the numbers presented in the reference architecture works out pretty well.  I wouldn't be comfortable with a higher density than that for the 48 GB solution.

8 VMs per core w/ 96 GB:
8 VMs per core * 8 cores per blade = 64 VMs per blade; 64 VMs * 2 GB per VM = 128 GB needed / 96 GB of blade memory = ~1.33 (roughly 33% memory oversubscription)

Since the blade has double the memory, I doubled the VMs per core to achieve the same oversubscription ratio.  I don't mind telling you that 64 VMs on a blade takes me to the edge of my comfort zone, but I'll accept that density level for this calculation.
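If you want to rerun or tweak this math yourself, here is a small sketch of the calculation above.  The 8-cores-per-blade and 2 GB-per-VM figures come from my assumptions; the function name and structure are just mine, not anything from the reference architecture.

```python
def blade_density(vms_per_core, blade_memory_gb, cores_per_blade=8, gb_per_vm=2):
    """Return (VMs per blade, memory oversubscription ratio) for one blade."""
    vms_per_blade = vms_per_core * cores_per_blade
    memory_needed_gb = vms_per_blade * gb_per_vm
    oversubscription = memory_needed_gb / blade_memory_gb
    return vms_per_blade, oversubscription

# 48 GB blade at 4 VMs per core -> 32 VMs, ~1.33x memory oversubscription
print(blade_density(4, blade_memory_gb=48))

# 96 GB blade at 8 VMs per core -> 64 VMs, same ~1.33x oversubscription
print(blade_density(8, blade_memory_gb=96))
```

Change the per-core ratio or blade memory and you can see how far you can push the density before the oversubscription number gets uncomfortable.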

Using the values of 32 VMs per 48 GB blade and 64 VMs per 96 GB blade, you arrive at the following minimum and maximum ranges for each of the Vblocks.  Remember that Vblock 1 contains BOTH 48 GB and 96 GB blades, so the math gets a little harder there.

Aaron's Fuzzy Math Vblock Minimums and Maximums:

There you have it.  What do you think?  Am I even close?  Please leave a comment!


Jeremy Waldrop said...

Aaron, great post. One question: if you put an FC expansion module in the Nexus 5010 and connect the CX4-120 and the FC uplinks from the 6120 to the Nexus 5010, couldn't you still do boot from SAN on the UCS blades? All the FC zoning would happen on the Nexus 5010. This would also require a Storage Services license on the Nexus 5010.

Anonymous said...

Nice post. Those numbers fall in line with what I am expecting to see. I have 2 chassis, each with 4x B250 w/ 96 GB. In planning this I went with an estimate of 50-60 guests per host.

Aaron Delp said...

@Jeremy - Hey! I believe (I'll check for you to be sure) that this may invalidate the Vblock 0. There is a fine line where a Vblock ceases to be a Vblock through modification and just becomes a fancy VCE package. I'm in meetings with some Vblock people tomorrow and I'll ask.

@Rod - Thank you very much! Always good to have extra eyes on my calculations.

Dan Libonati said...

Aaron, good post. Two comments. First, shouldn't architects and engineers be worried about how to evacuate the ESX hosts? Does it make sense to have that number of virtual machines on a single host when you can only do one vMotion at a time? While I understand that the number of concurrent vMotions will be increasing (I believe to 8), this still would require a considerable amount of time to evacuate the host in the event of a hardware failure (assuming the failed device was one component of a redundant pair, e.g. redundant NICs, or the hardware is reporting an imminent failure via the predictive agents monitoring the host).

Second, should architects be designing solutions that boot from the SAN? Since the direction from VMware is ESXi, wouldn't it be best to have that image stored on local disks (and use local disks for swap)? There is an open issue here: customers booting from SAN in ESX 3.5 could replicate their boot disks and they worked fine, but ESX 4.x boot disks cannot be replicated and expected to boot correctly without manual intervention.


Aaron Delp said...

Dan - Great points!

Question #1 on virtual machine density: yes, I tend to agree with you. I don't have many customers running more than 30 VMs per server at this time. That seems to be the magic "comfort level" given the technology today. I am seeing interest in higher VM density, and the ability to evacuate a server needs to be a design consideration for the solution.

Boot from SAN - again, great point. FC booting ESX is the only thing supported today to get stateless Cisco UCS. I would predict that ESXi will gain some form of remote booting before too much longer (maybe PXE, FC, iSCSI, etc.). Replication of the VMs vs. the host OS is something that needs to be considered in the design. My initial reaction is to have two sites, both with boot from SAN, but NOT replicated. This would have a similar design to local boot but provide UCS statelessness (is that a word?) without the complications of replicating the boot drives.

Anonymous said...

Great Post.

Dan, excellent points. Do you happen to have a reference for the vMotion limitation? I would like to discuss it with my Cisco contacts.