Sunday, February 7, 2010

Why VMware's VMmark Scores Have Become Useless

Well, it was bound to happen.  Every time an industry benchmark standard comes out, the manufacturers eventually figure out ways to "cook the books".  I've seen a LOT of FUD flying around from both HP and Cisco lately about VMmark scores, and I have been asked a lot of questions about both platforms.  After taking a close look at the scores, I'm ready to throw in the towel.

Before I go further, take a look at the VMmark 8-core scores posted here.  You will see the Cisco B200 blade is on top right now (of the major vendors; I don't count Fujitsu, sorry Fujitsu) with 25.06, and the HP BL490 is next with 24.54.  A couple of points:

What is the difference between really really fast and really really really fast?

What is the difference between 25.06 and 24.54?  About 2%.  Honestly, not much if they both meet your needs and you won't be pushing them to their limits.  I'm sorry, but that is within a margin of error, and/or either vendor could reconfigure the test to match the other's score.  At the end of the day both of them will meet your needs very well and the title of "fastest blade" means nothing!
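For anyone who wants the exact figure rather than my eyeball estimate, here is a minimal sketch of the arithmetic.  The two scores are the ones quoted above; the function name `percent_lead` is just something I made up for illustration.

```python
def percent_lead(top: float, runner_up: float) -> float:
    """Return how far `top` leads `runner_up`, as a percentage of the runner-up score."""
    if runner_up <= 0:
        raise ValueError("runner_up score must be positive")
    return (top - runner_up) / runner_up * 100.0


# VMmark 8-core scores quoted in this post.
cisco_b200 = 25.06
hp_bl490 = 24.54

lead = percent_lead(cisco_b200, hp_bl490)
print(f"Cisco B200 leads HP BL490 by {lead:.2f}%")
```

The gap is measured relative to the lower score, which is the more generous way to state the leader's advantage; dividing by the higher score instead gives a slightly smaller number.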

Both Cisco and HP sell "big memory solutions" but they are nowhere to be seen!

Take a look at the memory in the details for both of them.  Both the HP BL490 and the B200 use 96GB of memory.  Where is the B250 with the larger memory footprint?  Where is the BL490 with either 144GB or 192GB of memory?  You will also notice that the HP BL490 memory is running at 1333 MHz and the B200 is running at 1066 MHz.  Since there is no big jump in the performance numbers, the VMmark score evidently isn't memory-bandwidth bound, or HP would have had an advantage.  I suspect (although I don't have proof) that the VMmark score is now CPU bound and any memory above 96GB doesn't help the scores.  I further think (again, no proof) that the test isn't pushing the maximum memory bandwidth, because the scores barely change between 1333 MHz and 1066 MHz.  It would be interesting to see whether a drop to 800 MHz by HP would be noticed in the scores.

Cisco is using an EMC SAN with SSDs on the back end!

Take a look at the EMC Storage section on the Cisco benchmark.  They are using an EMC CX-240 with SSD drives!  There is NOTHING wrong with this, and SSDs are coming down in price, but they provide a clear, known advantage in IOPS that could easily be the sole reason for the 1%-2% increase.  I'm willing to bet that if HP used the same storage configuration, they would produce similar scores.

Why didn't Cisco use the Palo card?

Cisco is using the QLogic CNA for the tests.  Why didn't they use the Palo card?  I suspect because it isn't "technically" released yet, but that is the benchmark everyone wants to know about.

What am I trying to say here?

What I'm saying is that both HP and Cisco make great products and they will go to great lengths to make the other look bad.  They are so close to each other from a VMmark score perspective that any clear difference can't be shown with the current test.  Don't make a purchase based on a score!

7 comments:

Brad Hedlund said...

Aaron,

I couldn't agree with you more about VMmark scores being largely irrelevant. If you make a server decision based on a 1-2% difference in a VMmark score, you officially have your head buried in the sand. Most customers want to buy a data center architecture, not just a server.

As for Fujitsu, they are using the Xeon 5590, which is not supported by Intel for server use.

Cheers,
Brad

Omar Sultan said...

Aaron:

I have to admit, it's nice to be able to claim top spot, but I also know tomorrow, or next week, or next month, things will change again.

While synthetic testing can provide an interesting data point that may or may not be useful and relevant, I agree with you that it certainly should not be the sole buying criterion.

Regards,

Omar Sultan
Cisco

virtaulTodd said...

I agree with you that using VMmark to make a decision between two very similar servers isn't very useful. But VMmark scores can be very useful in lots of other ways. Like showing the difference between generations of servers or between 2-socket and 4-socket or Intel and AMD.

I think that most of your comments are really more generic about benchmarks in general if you think about it. Benchmarks that are able to combine a cost component with the performance metric can be better, but getting everybody to agree on how to price things is really really tough.

I really like your blog because of the great deal of analysis that you did in looking at the underlying config details such as memory speed and spindle type.

Todd

Aaron Delp said...

Hey Todd - Thank you very much for the comment. I agree with you about comparing platforms. I was recently asked how an IBM HS22 Blade would perform against an IBM 3850 M2 4 socket box. That is a perfect example of a situation where the VMmark score would be very helpful!

My beef isn't with VMware, it is with the vendors. Too many times vendors (all of them, not picking on anybody) bring out their lab queens to show off. By this I mean a system that isn't very real world due to cost or configuration. This happens ALL the time in storage in particular.

Now it seems the server vendors are doing it as well.

I would like to see VMmark test the things that make the vendors DIFFERENT rather than what makes them the same. I would like an HP BL490 with 144GB or 192GB. I would like a Cisco B250 with 384GB or a Palo card. These tests that are currently posted are all the same with subtle tweaks.

Thanks for the comment!

BladeGuy said...

Todd, I think you were on the right track to add a cost component to make VMmark more useful. The article stated that there wasn't a problem with using SSDs to gain an advantage. But those SSDs add $220,000 to the cost for hardware to support the benchmark. I believe VMware should add a $/VMmark score to the benchmark as Todd suggested. Then the score will become relevant.

cedb said...

We are in the process of evaluating new blade server platforms, mainly for our VMware platform...
The benchmark dilemma does hit us pretty heavily as we try to be "agnostic" in every way (hardware vendors as well as processors...).
Not convinced that SPEC (CINT, CFP) is the right answer, we investigated VMmark...
The benchmark is certainly a valid one, though limited in availability of results compared to results from specs.org.
Still, care must clearly be taken when reading results, as the test certainly depends on storage setup as well... getting more juice from the fibers seems very interesting to vendors indeed... Aaron is right about SSDs; there are also setups differentiating storage for frontend and for databases, I/O queue tweaks, etc.
Not that it wouldn't be a logical step in an "industry" setup, but it does show that storage is a key factor in getting a good score...
Thus we keep in mind that VMmark benchmarks an infrastructure setup, depending mainly on the server but also on other factors.
I then can't agree more that the purchase process can't be based only on benchmarks but also on a technical evaluation and, naturally, price. These can't be included in a benchmark... a company's infrastructure and knowledge as well as its negotiation capacity can't be standardized.

BladeGuy said...

CEDB, I agree you can't standardize a company's negotiating prowess, but there needs to be some method to ensure you're comparing apples to apples. Cisco used $200,000 worth of solid state disks to gain a 2% advantage in their 8 core VMmark. That fact is lost on the casual observer. But if a $/VMmark was added to the scoring, it would create a disincentive for this type of gamesmanship, and level the playing field.
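
The $/VMmark idea from the comments above can be sketched in a few lines. This is only an illustration under loud assumptions: the scores are the ones quoted in the post and the $200,000 SSD figure is the one quoted in the comment above, but `HYPOTHETICAL_BASE_COST_USD` is a made-up placeholder (no full system prices appear anywhere in this thread), and `dollars_per_vmmark` is not part of any official VMmark metric.

```python
def dollars_per_vmmark(total_cost_usd: float, vmmark_score: float) -> float:
    """Return hardware cost in US dollars per VMmark point (lower is better)."""
    if vmmark_score <= 0:
        raise ValueError("vmmark_score must be positive")
    return total_cost_usd / vmmark_score


# ASSUMPTION: placeholder price for server plus conventional storage.
# It is NOT a real quote; substitute your own numbers.
HYPOTHETICAL_BASE_COST_USD = 100_000.0

# SSD cost as quoted in the comment above (an earlier comment says $220,000).
SSD_COST_USD = 200_000.0

# Scores from the post: Cisco B200 (SSD-backed storage) and HP BL490.
cisco = dollars_per_vmmark(HYPOTHETICAL_BASE_COST_USD + SSD_COST_USD, 25.06)
hp = dollars_per_vmmark(HYPOTHETICAL_BASE_COST_USD, 24.54)

print(f"Cisco B200: ${cisco:,.0f} per VMmark point")
print(f"HP BL490:   ${hp:,.0f} per VMmark point")
```

Whatever base price you plug in, a large storage bill that buys only a couple of percent in score shows up immediately in this ratio, which is the disincentive the comment is arguing for.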