
That one Tuesday our core switch died at 2 PM in Chicago

We had a full network outage for 4 hours: a single power supply failure cascaded into a total crash, and our backup just sat there doing nothing. Anyone else deal with a failover that just refuses to fail over?
2 comments

stellam89 · 4d ago
Three years of uptime on that 3750 stack at our old data center, and we never had a single failover issue when we actually tested it quarterly. We ran those tests like clockwork on the first Saturday of each quarter, and the standby took over in under 30 seconds every time. Sounds like the real problem wasn't the hardware itself but never testing the failover under load, with real traffic hitting it. You probably had a config mismatch, or a routing protocol that didn't reconverge properly when the primary dropped offline.
6
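To put an actual number on "the standby took over in under 30 seconds", one option is to probe a host on the far side of the switch about once a second while you fail the primary, then measure the longest run of failed probes. Here's a minimal Python sketch, assuming a plain TCP port is reachable through the switch. The host, port, and function names are hypothetical placeholders, not anything from the thread:

```python
import socket
import time
from typing import List, Optional, Tuple

# Each probe result is (monotonic_timestamp_seconds, succeeded).
Probe = Tuple[float, bool]


def probe_once(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_probes(host: str, port: int, duration_s: float,
               interval_s: float = 1.0) -> List[Probe]:
    """Probe host:port roughly every interval_s seconds for duration_s seconds."""
    results: List[Probe] = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        results.append((time.monotonic(), probe_once(host, port)))
        time.sleep(interval_s)
    return results


def longest_outage(results: List[Probe]) -> float:
    """Longest gap from the first failed probe of a failure run to the next
    successful probe. If the last run never recovered, it is measured up to
    the final probe instead."""
    longest = 0.0
    outage_start: Optional[float] = None
    for ts, ok in results:
        if not ok and outage_start is None:
            outage_start = ts  # a new failure run begins
        elif ok and outage_start is not None:
            longest = max(longest, ts - outage_start)
            outage_start = None  # traffic is flowing again
    if outage_start is not None and results:
        longest = max(longest, results[-1][0] - outage_start)
    return longest


# Hypothetical usage (address and port are placeholders); pull the
# primary's power while this runs:
# results = run_probes("10.0.0.10", 22, duration_s=120)
# print(f"longest outage: {longest_outage(results):.1f} s")
```

The result is only accurate to about one probe interval (plus the connect timeout), and a single TCP connect exercises one path, so treat it as a rough figure rather than a precise convergence time.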
jessel35 · 4d ago
I remember reading somewhere that a lot of these failover issues come down to the convergence time of your routing protocol, not the switch hardware itself. Your point about testing under real traffic makes a lot of sense: in my experience, even a minor config mismatch in HSRP or VRRP can cause a split-brain scenario (both boxes think they're the active router) when the primary drops. Don't take this as gospel, but I've seen people swear by running a small traffic generator during their quarterly tests to catch those tricky issues.
9
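To make the split-brain point concrete, here's a toy model of VRRP-style election in Python. It follows the rule from RFC 5798 as I understand it: a router stays backup only while it hears advertisements for its own VRID from a peer that beats it on priority (ties go to the higher primary IP). If the two boxes disagree on the VRID, each hears nobody and both claim to be master. It also computes the Master_Down_Interval, roughly how long a backup waits in silence before taking over. Router names, addresses, and field names are made up, and the model ignores preemption, authentication, link failures, and HSRP's own rules:

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class VrrpRouter:
    name: str         # hypothetical label for the box
    vrid: int         # virtual router ID configured on this box
    priority: int     # 1-254 here (255 is reserved for the address owner)
    primary_ip: str   # this router's own interface address


def election_key(r: VrrpRouter):
    # Higher priority wins; on a tie, the higher primary IP wins.
    return (r.priority, ipaddress.ip_address(r.primary_ip))


def masters(routers: List[VrrpRouter]) -> List[str]:
    """Names of routers that would consider themselves master.

    Toy model: a router only 'hears' peers configured with the same VRID,
    and it backs off only if one of those peers beats it on
    (priority, primary IP). Assumes all routers share one working LAN.
    """
    result = []
    for r in routers:
        peers = [p for p in routers if p is not r and p.vrid == r.vrid]
        if not any(election_key(p) > election_key(r) for p in peers):
            result.append(r.name)
    return result


def master_down_interval_s(priority: int, adver_interval_s: float = 1.0) -> float:
    """Seconds a backup waits without advertisements before taking over:
    3 * Advertisement_Interval + Skew_Time, where
    Skew_Time = ((256 - priority) * Advertisement_Interval) / 256."""
    skew = ((256 - priority) * adver_interval_s) / 256
    return 3 * adver_interval_s + skew
```

With matching VRIDs, `masters` returns only the higher-priority box; change the VRID on one of them and both names come back, which is the dual-active case described above. `master_down_interval_s(100)` gives about 3.6 s with 1 s advertisements.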