Monday, March 15, 2010

Cisco UCS QoS vs. HP Flex-10 vNICs in VMware

This post will be more conceptual than technical. I was recently asked how Cisco's UCS and HP's Flex-10 network design approaches affect vSphere designs. Even though the industry is moving toward a unified 10Gb fabric, there are different ways to move data through this big "pipe" and still ensure or prioritize delivery. As you would guess, Cisco and HP approach this problem very differently: Cisco takes a network-centric approach, while HP takes a server-centric approach.

HP's Flex-10

HP Flex-10 takes a 10Gb connection and carves it up into multiple virtual NICs. The size of these "pipes" can be turned up or down to match the amount of bandwidth each NIC needs. Think of it as placing smaller pipes inside the big 10Gb pipe. This approach is great for vSphere admins because the virtual switches in vSphere can be configured to look just like they did with a bunch of 1Gb links into the server. The transition to this technology is seamless for the vSphere administrator. I'll borrow a diagram from Barry's awesome article on Flex-10. If you haven't read it, please do!


What is the downside to this method?
The downside to this approach is that by placing multiple pipes within the larger pipe, you have placed a CEILING on how much data can pass through each smaller pipe. Let's say you present a 1Gb vNIC to vMotion, and during a vMotion it would be to your advantage to have access to more bandwidth. Too bad, 1Gb is all you will ever get.
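To make the ceiling idea concrete, here is a quick back-of-the-napkin sketch (plain Python, with made-up FlexNIC names and sizes, not an HP spec): the vMotion vNIC is clamped to its carved size even when most of the 10Gb link is sitting idle.

```python
# Toy model of Flex-10 style carving: each FlexNIC gets a fixed slice of the
# 10Gb link, and traffic on that FlexNIC can never exceed its slice, even
# when the rest of the link is idle. Names and numbers are made up.

LINK_GBPS = 10.0

# Hypothetical carve-up of the 10Gb link into FlexNICs (ceilings, in Gbps)
flexnic_ceiling = {"console": 0.5, "vmotion": 1.0, "vm_traffic": 6.5, "ip_storage": 2.0}

def flex10_throughput(nic, demand_gbps):
    """A FlexNIC delivers at most its configured ceiling, no matter how idle the link is."""
    return min(demand_gbps, flexnic_ceiling[nic])

# A vMotion that could push 4Gbps is still clamped to its 1Gb slice:
print(flex10_throughput("vmotion", 4.0))   # -> 1.0, even with ~8.5Gbps of the link unused
```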

Cisco UCS's QoS

Cisco UCS uses a method known as Quality of Service (QoS). Most of us "server guys (and gals)" have no idea what this is. Here is how I have come to understand it; if this is wrong, please correct me. Network traffic is given a priority, and this priority kicks in WHEN THERE IS CONTENTION on the network. So, instead of smaller pipes inside a large pipe, you have a priority system in place to guarantee certain levels of service. Think of this as a FLOOR model. You can have as much as you want as long as everyone else gets their minimums (they get their quality/guarantee of service). If something needs to spike and there is room, it can spike and then return to normal. Here is a diagram of our Cisco UCS with traditional switches. This isn't the 1000v, but you get the idea. As you can see, there are two big 10Gb pipes into the virtual switches instead of smaller pipes into multiple virtual switches.
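For contrast, here is the same kind of toy sketch for the floor model (again plain Python with made-up classes and numbers, not how UCS actually enforces it): every class is guaranteed its minimum when the link is contended, and anything left over can be borrowed by whatever needs to spike.

```python
# Toy model of a "floor" (guarantee) approach: each traffic class has a minimum
# it is guaranteed under contention, and unused headroom can be borrowed.
# Classes, floors, and demands are made up for illustration.

LINK_GBPS = 10.0
floors = {"vmotion": 1.0, "vm_traffic": 6.0, "ip_storage": 3.0}   # guaranteed minimums, in Gbps

def floor_allocate(demand_gbps):
    """Give every class its floor (or its demand, if smaller), then hand spare
    capacity to classes that still want more."""
    alloc = {c: min(demand_gbps[c], floors[c]) for c in demand_gbps}
    spare = LINK_GBPS - sum(alloc.values())
    for c in demand_gbps:
        extra = min(demand_gbps[c] - alloc[c], spare)
        alloc[c] += extra
        spare -= extra
    return alloc

# Quiet link: vMotion wants 4Gbps and gets it, because nobody else needs the headroom.
print(floor_allocate({"vmotion": 4.0, "vm_traffic": 2.0, "ip_storage": 1.0}))
# Contended link: everyone still gets at least their guaranteed floor.
print(floor_allocate({"vmotion": 4.0, "vm_traffic": 7.0, "ip_storage": 3.0}))
```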


To the vSphere administrator, this looks very different from the old setup of multiple 1Gb links into multiple virtual switches!

What is the downside to this method?

At this time, QoS for Cisco UCS appears complex to configure and represents a shift in thinking for the vSphere administrator. 

How is QoS implemented for Cisco UCS and VMware?

That is a very good question. I can't seem to find any documentation on how to actually do this yet. I'm sure there is a Cisco internal doc somewhere, but I haven't found anything public that lays out the hardware that is needed (do I need the 1000v or Palo for this, or can I use a CNA and the standard switches?), nor have I found a "cookbook" that documents how to properly make QoS happen in a vSphere environment. I'm sure this will happen in time, and if you have a link, please leave a comment!

Which is better?

It depends on your point of view and the comfort level of your team. I can easily see advantages to both approaches. One is easier to implement; the other appears to be a more elegant (but more complex) solution. Cisco has once again brought a disruptive technology to the table that can't be ignored. What are your thoughts?

9 comments:

Steve Chambers said...

Another great post, Aaron. Think of Cisco's approach as being exactly the same as VMware ESX: you now have minimum guarantees for the network, just like for CPU and RAM. The rate limiting available on the VIC/Palo will be like the Limit feature.

Easy!

Mark Vaughn said...

In vCenter, you can set Traffic Shaping policies at the vSwitch and the vNetwork Distributed Switch (I'm sure the 1000V does the same). You can set Ingress and Egress Traffic Shaping to control Average Bandwidth, Peak Bandwidth, and Burst Size. You can also override the vSwitch/vDS traffic shaping settings at the port group level for more fine-grained control.

http://www.vmware.com/pdf/vsphere4/r40_u1/vsp_40_u1_esx_server_config.pdf

This is not quite the same as most QoS solutions, in that it does not guarantee minimums/reservations, but it is a step in the right direction when you have one large pipe.
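If it helps to picture how those three knobs interact, here's a rough toy model (plain Python, made-up numbers; not VMware's actual shaper): traffic can run above the average, up to the peak, until the burst allowance is spent, and is then held back to the average.

```python
# Toy model of how Average Bandwidth, Peak Bandwidth, and Burst Size relate.
# This is just the general token-bucket idea with made-up numbers, not VMware's
# actual implementation (which also defines specific units for each setting).

AVG   = 1_000_000   # average bandwidth the port is normally held to
PEAK  = 2_000_000   # peak bandwidth it may reach while it still has burst credit
BURST = 1_000_000   # burst allowance: how much traffic above the average is tolerated

def shape(offered):
    """For each second of offered load, return the rate the shaper would allow."""
    credit, allowed = BURST, []
    for want in offered:
        rate = min(want, PEAK)        # never exceed the peak
        over = max(rate - AVG, 0)     # anything above the average spends burst credit
        if over > credit:             # out of credit: fall back toward the average
            rate = AVG + credit
            over = credit
        credit -= over
        allowed.append(rate)
    return allowed

print(shape([2_000_000] * 3))   # -> [2000000, 1000000, 1000000]: one second at peak, then averaged
```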

Aaron Delp said...

Steve - Thanks for the quick reply. It sounds easy, but I guess I'm not seeing where/how to get it done. I admit I may not have dug into everything enough. Let's talk more when you can and we'll figure out how to get it done.

Mark - Nice, thanks for the link! I'll take a look!

Vijay Swami said...

In a Cisco UCS/vSphere deployment I can think of 3 methods of "QoS".

1) You have the QoS capability of the CNAs themselves. This basically controls the shape of the DCE/FCoE traffic; for example, the percentage of the 10Gb link dedicated to FC versus Ethernet traffic. You can apply different priority levels (Gold, Bronze, Silver, etc.) to a vNIC or vHBA via service profiles, and those will be enforced inside that particular chassis.

1a) QoS with Palo. I call this 1a because it's related to 1 above. Similar DCE-style CoS (as defined by the standards) is possible here, but instead of only being able to apply it to a pair of vNICs or vHBAs, you can apply it to ALL vNICs and all vHBAs. Again, set at the UCS level.

2) vDS/vSwitch, which would be vSphere-style QoS, which is more rate limiting than anything else. This would apply to vNIC traffic ONLY. You cannot control the FC/Ethernet split here; that must be done at the UCS level.

3) Nexus 1000v in lieu of the vSphere vDS. In this method, you can apply traditional "Cisco style" policy maps for QoS. This is something network admins will be VERY familiar with.

So I think a full end-to-end QoS solution would require both UCS QoS settings AND vDS or 1000v QoS settings. The UCS QoS settings would control the DCE lanes (the split between vNIC and vHBA weights, for example), and the pure Ethernet/IP QoS can happen at the vDS or 1000v level.
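As a rough picture of that CNA-level split (toy Python with made-up class weights, not actual UCS defaults): under contention the link is divided in proportion to the class weights, and an idle class's share is free for the others to use.

```python
# Toy model of weight-based sharing at the CNA/UCS level: classes carry relative
# weights, the 10Gb link is split in proportion to the weights of whoever is
# active, and idle classes give their share back. Weights and names are made up.

LINK_GBPS = 10.0
weights = {"fc": 4, "gold": 3, "silver": 2, "bronze": 1}

def split(active_classes):
    """Share the link among the currently active classes, by weight."""
    total = sum(weights[c] for c in active_classes)
    return {c: round(LINK_GBPS * weights[c] / total, 2) for c in active_classes}

print(split(["fc", "gold", "silver", "bronze"]))  # full contention: 4.0 / 3.0 / 2.0 / 1.0 Gbps
print(split(["fc", "gold"]))                      # idle classes free up bandwidth: ~5.71 / 4.29
```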

A consolidated document describing all of this would be very useful. The above is just what I have deduced from reading various sources and from experimenting in my own UCS/vSphere lab.

Aaron Delp said...

Vijay - Great information! Thank you!

Unknown said...

Aaron, another thing to keep in mind is that with Palo you can have the same type of environment with multiple NICs, AND apply the QoS and rate limiting. It's the best of both worlds.

Vikash Kumar Roy said...

Do you really need to dedicate an uplink for vMotion? I have used an internal link for vMotion so that the traffic is confined to the chassis itself rather than leaving it.

Anonymous said...

Cisco's approach sounds good, but QoS is not new, and HP has sold over 3 million Flex Fabrics in around 14 months. I think disruption from Cisco in this case is a marketing idea. HP's solution is elegant and works for Ethernet, iSCSI, FCoE.
Oh, and Microsoft, VMware, Xen, ........
:)

Aaron Delp said...

Anon - Sorry, but you're drinking WAY too much of the HP Kool-Aid. Yes, Flex-10/FlexFabric has a large install base, but the only thing it has going for it is simplicity. The idea of HP Flex-10 being elegant is a matter of your opinion, not mine.

With the introduction of vSphere 4.1, vMotion (and other forms of traffic, but vMotion is the best example) can now operate at speeds greater than 2.5Gbps. If you are using Flex-10 and split it up into four NICs of 2.5Gb each, you will never be able to go faster. On the other hand, your Cisco UCS system can burst as high as needed (i.e., faster vMotions) while still keeping everything running. That is elegant.