Thursday, October 8, 2009

It's Now All About the IOPS!

I have seen a shift in my VMware customers recently that I wanted to share. As the industry makes the move to vSphere and Intel Nehalem (5500 series) processors, the bottlenecks are shifting. The bottleneck used to be memory capacity, and it has now shifted to disk IOPS (Input/Output Operations Per Second, pronounced eye-ops).

Before I go into details, here is a quick introduction to performance and bottlenecks. Performance of a system has four major areas: memory (speed or capacity), processing power, network bandwidth, and disk (IOPS and capacity). Every system is always limited in one of these areas at any given time; you just hope that your system isn't hitting its current bottleneck! Performance tuning is simply the art of pushing each bottleneck back so it isn't in the way.

In the days of the Intel 5400 and ESX 3.5, my customers were almost always bound by memory capacity. If I had more memory in the system, I could put more virtual machines in the cluster. Also, many customers were "new" to virtualization and were getting their feet wet by virtualizing the "low hanging fruit". These virtual machines often didn't consume a lot of resources in any area. Because memory was the smallest pool, it was typically exhausted first.

Fast forward to today: the memory limitation has been removed, and we are now comfortable putting high-I/O, critical boxes on the virtualization infrastructure. What is the next bottleneck I am seeing? It is disk IOPS. You may have enough capacity, but you don't have the disk performance you need. Since IOPS isn't really surfaced in vCenter and we haven't had to consider it until now, many people forget all about it!

How many of you actually calculate the impact of disk IOPS before adding a virtual machine? Do you know how to do it? Why do you need to do it?

To start, every disk drive in a SAN array can deliver a certain number of IOPS. This is how many reads or writes the disk can do per second. I list the common drive types and their average IOPS below. To get the total IOPS for a datastore, you simply multiply the number of drives in it by the average IOPS per drive. Have you ever heard the phrase "You need more spindles for that"? It means you may have enough disk to support the capacity you need, but you have to add more drives (spindles) to get the performance.

Let me demonstrate with an example. This is oversimplified, but hopefully it will prove the point. You have four 1TB SATA drives in a RAID group. This means the pool has 4TB of capacity and generates a total average of 300 IOPS (75 x 4). You now decide you want to create fifty server virtual machines, each with a 20GB virtual hard disk. You have consumed only 1TB of the 4TB of space, but are you really going to boot and run fifty virtual machines on four SATA disks? As one of my managers used to say to me, you won't get there from here.

Something to think about...

Average IOPs per drive
  • 7200 RPM SATA = 75-100
  • 10k RPM SAS/FC = 100-130
  • 15k RPM SAS/FC = 150-190
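The arithmetic above can be sketched in a few lines of Python. This is a rough estimate only, using midpoint-ish values from the table; the dictionary keys and function name are my own illustrative choices, and real per-drive IOPS vary with workload (random vs. sequential, block size, cache).

```python
# Rough aggregate-IOPS estimate from drive count and type.
# Per-drive numbers are averages from the table above; real
# drives vary with workload, block size, and caching.

AVG_IOPS = {
    "7200_sata": 75,
    "10k_sas_fc": 115,
    "15k_sas_fc": 170,
}

def raw_pool_iops(drive_count, drive_type):
    """Total raw IOPS a group of identical drives can deliver."""
    return drive_count * AVG_IOPS[drive_type]

# The 4 x 1TB SATA example from the post: 4 * 75 = 300 IOPS total.
total = raw_pool_iops(4, "7200_sata")
per_vm = total / 50  # fifty VMs sharing the pool
print(total, per_vm)  # 300 IOPS total, only 6 IOPS per VM
```

Six IOPS per VM is the point of the example: the pool has plenty of capacity but nowhere near enough spindles.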


Anonymous said...

Using the approximate SAS disk IOPs numbers you posted, how is it possible to approximate the number of VMs I can have on a SAN with 6 300GB SAS disks in RAID 1??

Thank you, Tom

Karan Bhagat said...

You are right on!! Always size a SAN for Performance and not space.

It is also good to know that IOPS are directly tied to latency.

E.g. a SATA drive delivers 40 to 60 IOPS at < 20ms latency; a 15K FC drive delivers 200 to 300 IOPS at < 20ms latency.

Because one can claim 100 IOPS, but it might be at a latency of 100ms.

PiroNet said...

Hi Aaron, I came up with the same conclusions: for the next two or three years, the bottleneck will be at the storage level... Then SSD disks will take over. They are already up to 30x faster than a regular spindle... If you're looking at simple formulas to calculate IOPS, read these blog posts:


Aaron Delp said...

Karan - Thank you, that is great information!

Tom - I'm not an expert here by any stretch. I could be completely wrong, so don't take this as fact. This is what I'm seeing, and I'm oversimplifying the calculations. As I understand it, for RAID-1 read performance and write performance are calculated differently. For reads, IOPS are calculated as N*IOPS, where N is the number of drives and IOPS is the number of IOPS the drive will generate. For writes, IOPS are calculated as N*IOPS/2, because every write has to hit both mirrors.

In your example you would take the 6 drives and multiply by the IOPS based on RPM (sorry, I didn't see what speed your 6 drives were). This gives you a total.

Once you have that total, you need to calculate the number of IOPS each VM will generate on average for both reads and writes. You would then subtract each VM's IOPS from the total to determine whether there is room in the pool.
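Aaron's back-of-the-envelope answer for Tom can be sketched as follows. Since Tom didn't give the drive speed, the 130 IOPS per drive (10k SAS) and the 25-IOPS-per-VM average are both assumptions of mine, purely for illustration; the function names are not from any vendor tool.

```python
# Sketch of the RAID-1 estimate for Tom's 6-disk SAS group.
# PER_DRIVE_IOPS assumes 10k SAS drives (see the table in the post).

PER_DRIVE_IOPS = 130  # assumption: Tom didn't state the drive speed

def raid1_read_iops(n, iops=PER_DRIVE_IOPS):
    return n * iops          # reads can be served from either mirror

def raid1_write_iops(n, iops=PER_DRIVE_IOPS):
    return n * iops // 2     # every write hits both mirrors

reads, writes = raid1_read_iops(6), raid1_write_iops(6)

# Budget check: divide the pool total by each VM's average IOPS.
vm_avg_iops = 25             # assumed per-VM average, illustrative only
max_vms = writes // vm_avg_iops   # size to the worse (write) number
print(reads, writes, max_vms)     # 780 read, 390 write, ~15 VMs
```

Sizing to the write number is the conservative choice; a read-heavy workload could support more VMs on the same spindles.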

Make sense?

Andrew Storrs said...

Your numbers are correct Aaron. For RAID5 the write calculation is N*IOPS/4. RAID6 is harder to calculate as the algorithm used can vary greatly from one vendor/controller to the next, but the write penalty should be somewhere between 3-20% greater than RAID5.
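The write penalties in this thread (2 for RAID-1, 4 for RAID5) can be folded into one helper. Since Andrew notes RAID6 varies by vendor, the penalty of 6 below is an assumption of mine for illustration, not a vendor figure.

```python
# Effective write IOPS per RAID level, using the write penalties
# from the thread: RAID-1 = 2, RAID5 = 4. The RAID6 penalty varies
# by vendor/controller, so 6 here is only an assumed placeholder.

WRITE_PENALTY = {"raid1": 2, "raid5": 4, "raid6": 6}

def effective_write_iops(n_drives, per_drive_iops, level):
    return n_drives * per_drive_iops // WRITE_PENALTY[level]

# Andrew's point: 4 SATA drives (75 IOPS each) in RAID5 write no
# faster than a single drive: 4 * 75 / 4 = 75 IOPS.
print(effective_write_iops(4, 75, "raid5"))  # 75
```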

In your example of 4 x SATA disks, if it were RAID5 (I've seen this many times before in SMBs), the write performance would be no better than a single disk. :)

As Karan said, always size storage for performance first, then space. Don't pick SATA vs. SAS/Fibre Channel disks or a RAID level up front; figure out your required IOPS first and work out the cost to meet those performance and capacity numbers using different combinations of disk and RAID.

Aaron Delp said...

Andrew - Thank you for your comment!

Unknown said...

With the latest Seagate SAS drives and WD SATA drives having random read/write numbers that are not far apart, what would the driving points be toward staying with the far more expensive (double?) drives?

I don't think recent testing backs those numbers up. You're overspeccing the FC and underselling the SATA [look at WD Raptor drives, which are within 20% of Seagate's fastest SAS drives]... at less than a third of the price.

Anonymous said...

Curious, Aaron, where did you get your 75 from in the example?

Andrew Storrs said...

80 (SATA), 120 (10K SCSI), 180 (15K SCSI) IOPS are pretty commonly used/safe examples.

Aaron Delp said...

Thanks for the comment, Andrew! I admit I don't have a source I can point you to other than "experience". By that I mean I have run across those numbers from many different manufacturers (NetApp and EMC, for instance), but I would have a hard time digging up a source because I have been throwing them around for some time now.

Computer memory and performance said...

As you pointed out, there will always be something that limits performance on an individual device or on a network. No matter how many improvements are made, there will always be something that takes priority because of need.

When one thing is enhanced, then the shortfalls in another part of the system become more obvious. It may also be that at times the level of innovation related to memory is higher than that for other aspects of digital devices.

However, I think that what drives development is need. "Necessity is the mother of invention." This is true in IT as well as in other industries that, more and more nowadays, rely on IT.

Unknown said...

What is an IOP?

Input/Output Per..... Input/Output/Processor?

From the context it's pretty clear you meant IOPS, but five years after writing, the typo still confuses.

Half a decade and continued linkage essentially turn your blog post into journalism. Journalism requires defining otherwise obvious acronyms, so as to retain the reader and answer nagging questions that might cloud the reader's mind. You know, the "Bill DiMaggio (no relation...)" thing.

This is especially important since you changed the spelling to IOP's, which seems to imply that the "P" did not refer to "Per" but rather to a noun.

Please correct. Thank you.

Aaron Delp said...

I added my definition of IOPS as well as a link to the Wikipedia article for clarification.