Monday, February 22, 2010

How Cisco UCS Deals with Split Brains

This will be a short post this morning. I wanted to pass along how Cisco UCS deals with a split-brain scenario. I'll start by explaining how you would get into one. In normal operations, one of the 6100s is the active brain and the other is the stand-by. A split brain in UCS happens when both of the cluster links between the 6100 Fabric Interconnects (the L1 and L2 ports) are severed. The active brain still thinks he is active, and the stand-by no longer sees the active, so he tries to take over. You now have a potential power struggle because both brains think they are in charge.
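To make the failure condition concrete, here is a tiny role model written for this post. It is a hypothetical illustration, not anything from UCS Manager. It assumes that one surviving cluster link is enough for the peers to keep seeing each other, and the names `roles_after_link_loss` and `is_split_brain` are made up.

```python
def roles_after_link_loss(l1_up: bool, l2_up: bool) -> tuple:
    """Return the (fi_a, fi_b) roles after a cluster link change.

    Hypothetical model: fi_a starts as active and fi_b as stand-by.
    """
    if l1_up or l2_up:
        # At least one cluster link survives, so the peers still see each
        # other and keep their normal roles.
        return ("active", "standby")
    # Both L1 and L2 are down: fi_a still believes it is active, and fi_b,
    # no longer seeing fi_a, promotes itself.
    return ("active", "active")


def is_split_brain(roles: tuple) -> bool:
    # A split brain means both sides claim the active role at the same time.
    return roles.count("active") == 2


print(is_split_brain(roles_after_link_loss(l1_up=False, l2_up=True)))   # False
print(is_split_brain(roles_after_link_loss(l1_up=False, l2_up=False)))  # True
```

Only losing both links produces two "active" brains, which is the power struggle described above.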

Luckily, the Cisco UCS folks are way ahead of this scenario. They added logic to the serial EEPROM (SEEPROM) in the UCS chassis to resolve the situation. An odd number of the chassis in a UCS domain act as judges during a split brain. A marker is added to the SEEPROM on these chassis to make them quorum resources. If there is an odd number of chassis, all of them are used. If there is an even number of chassis, the last one is dropped (n-1), so the number of quorum chassis is always odd. For example, with four chassis, three act as judges.
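The selection rule above fits in a few lines. This is a minimal sketch of the n/n-1 rule as described in this post, not Cisco's code. The function name `quorum_chassis` is hypothetical, and it assumes "the last one" means the last chassis in the order passed in.

```python
def quorum_chassis(chassis_ids: list) -> list:
    """Pick the chassis that get the quorum marker (hypothetical helper)."""
    n = len(chassis_ids)
    # Odd count: every chassis is a judge. Even count: drop the last one (n-1).
    count = n if n % 2 == 1 else max(n - 1, 0)
    return chassis_ids[:count]


print(quorum_chassis([1, 2, 3, 4]))            # [1, 2, 3]
print(len(quorum_chassis(list(range(1, 11)))))  # 9
```

Whatever the chassis count, the result always has an odd length (or is empty when there are no chassis).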

When the split brain is detected, both 6100s immediately demote themselves and then claim as many of the quorum resources as possible. Whoever claims the most quorum chassis wins and promotes himself back to the active manager. The scenario would look something like the following. Pretty slick!
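The demote, claim, and promote sequence can be sketched as a small majority vote. This is a hypothetical simulation, not the actual UCS implementation. It assumes each quorum chassis ends up claimed by exactly one fabric interconnect, and it labels them "A" and "B" purely for illustration. With an odd number of claimed chassis split between two sides, one side always holds a strict majority.

```python
from collections import Counter
from typing import Dict


def resolve_split_brain(claims: Dict[int, str]) -> Dict[str, str]:
    """claims maps quorum chassis id -> the FI ("A" or "B") that claimed it."""
    # Assumption: every quorum chassis was claimed, and there is an odd number of them.
    assert len(claims) % 2 == 1, "expected an odd number of quorum chassis"
    # Step 1: both fabric interconnects demote themselves.
    roles = {"A": "standby", "B": "standby"}
    # Step 2: count how many quorum chassis each side managed to claim.
    tally = Counter(claims.values())
    # Step 3: the side holding the most quorum chassis promotes itself.
    winner, _ = tally.most_common(1)[0]
    roles[winner] = "active"
    return roles


print(resolve_split_brain({1: "A", 2: "B", 3: "A"}))
```

Here "A" holds two of the three judges, so it comes back as the active manager and "B" stays in stand-by.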

3 comments:

Anonymous said...

Thanks for the nice post Aaron. I have a few questions though:

1) What exactly do you mean when you say that the odd-numbered chassis are quorum chassis? Does that mean that the even-numbered chassis are non-quorum chassis?

2) So what if we have 10 chassis, and 5 are quorum while 5 are non-quorum chassis? How would the contention be solved?

Thanks again for the nice post.

Aaron Delp said...

Sorry for the confusion. I'll see if I can clarify the point. What I mean is that if there is an even number of chassis, it will drop the last one as a quorum node. So, if you had 10 chassis, 9 would be quorum nodes. This way you will always have an odd number of quorum nodes.

Make sense?

Anonymous said...

Perfect, I see you also clarified it on the blog, makes perfect sense now :)

Thanks...