The BigBoxoCo Disco Party: Why Segmentation is Good

As the freshly brewed coffee enters my mouth, I experience my first glimpse of consciousness for the day. “Where am I?” I mutter, in broken English. The gray walls around me slowly come into focus, lit by the flickering of a long-in-the-tooth fluorescent bulb. The top half of a man’s face appears over the top of my cubicle wall.

“How’s the wonderful world of iNetConjoinApp?”

The caffeine must have made it past my blood-brain barrier, as I recognize at once that I’m at EnergyModCo, where I am one of a handful of employees. The half-head belongs to Freyr, EnergyModCo’s COO, lead customer service rep, and deployment technician.

“Umm, it’s, well, I just started working on the –”

“Great, that sounds good. You remember BigBoxoCo?”

“You mean, as in our biggest cust–”

“There’s a problem at one of their warehouses. Something to do with our lighting controller.”

“Oh?”

“I just got off the phone with the warehouse manager. All the lights went out for a few minutes, but they’re back on now.”

“Uh, that’s bad. Thank goodness they have skylights.”

“Nope. This is their first two-story warehouse. The only light the first-floor customers had was from the emergency floodlights.”

My throat tightens. “Well, I’m on it. We can’t let that happen again.”

The weight of the situation slams into me like an over-packed pallet of giant mayonnaise jars. After being awake for only 25 seconds, I’m not ready to douse this kind of blaze. I don’t have a choice though, so I lean back in my chair and gaze at the craquelure on the ceiling tiles. How could this have happened? I recall that at one time we did have problems with the smart-breakers that switched the lights. They would sometimes mysteriously ignore the commands sent to them by EnergyModCo’s software. But I fixed that by adding a watchdog that would retry the switch commands if they did not take effect. After a brief palpitation subsides, I admit to myself that the lighting control watchdog must contain a nasty bug.

After some brief email digging, face palming, and silent cursing, I manage to get a VPN connection set up so that I can SSH into EnergyModCo’s on-site lighting controller. I bump up the logging verbosity, which requires that I restart the system, and start looking for clues. After a few minutes, I see the periscope that is Freyr’s forehead rise above my cubicle wall.

“The lights are off again! I’ve got the BigBoxoCo manager on hold, and he’s about to lose it!”

My lip quivers as I struggle to suppress my fight-or-flight instinct. It can’t be a coincidence that the lights went off right when I restarted the software. What have I done? In a panic, I force our software to turn the lights back on, and thankfully it works. At this point, I am paralyzed with fear. I want to disable our software entirely, but what if stopping it is what made the lights go off just now? I pull my hands away from the keyboard, fearing that anything I do might cause the BigBoxoCo manager to enter sudden cardiac arrest, or worse.

Without being able to touch the on-site software, I dive into the source code, hoping to track down the bug analytically. I pore over the entire stack, following the data flow and logic for the relatively simple lighting control subsystem. The scheduling code makes sense. So does the timer code. The trickiest code, for the watchdog system, looks entirely correct. I rack my brain; what am I missing? I am startled by the crack of thunder, but there doesn’t seem to be a storm outside. As my nerves resonate with the imagined sound, Freyr’s forehead crests my cubicle wall.

“The BigBoxoCo manager is flipping out. He says, and I quote, that ‘There’s a God damned disco party going on’ in his warehouse. They are going to have to stop accepting customers.”

Content with having surpassed even my worst expectations, Freyr jogs back to his office. I follow him briskly.

“Freyr, can’t the manager hit the manual lighting override? I think it might take me a while to figure out the problem.”

“What are you doing away from your desk? No! Only BigBoxoCo’s maintenance engineer has a key to the enclosure, and he’s AWOL. Go!”

I save three seconds by running back to my desk. At this point, I’m bouncing ideas off our other programmer, Nate. No good; he has never worked on this system, and can only offer limited advice. I go back to staring at the code. I may have been unconscious an hour ago, but now the fire of my mind is burning with the focused intensity of a TIG welder. The coffee is gone. I begin questioning all of my assumptions. Compiler bug? Memory corruption? Cosmic rays? Nate complains about the sound of my forehead slamming against the desk. In my heightened state of awareness, I perceive the ghostly sound of footsteps come to a stop outside my cube. Moments pass before the shrunken form of Freyr emerges from the hallway. I am calmed by his lack of speed as well as the fact that he is not using his periscope.

“I’m sorry,” he says.

“Uh, hey Freyr, what’s up…?”

“I fixed the lights. It was my fault.”

Until this point, it had not crossed my mind that the problems might have been caused by the lighting controller being configured incorrectly. “Wha... what the hell happened?”

“I had configured the lighting controller at a different site with the IP address of the smart-breaker at the disco warehouse. The other site was in a different timezone, and its schedule said that the lights should be off.”

The problem was too simple. Why didn’t I think of this? One controller thought the lights should be on, and the other thought they should be off. Thus, the lighting watchdogs at each site were fighting over control of the lights. Neither controller knew about the other one; they just thought that the smart-breakers were disobeying them and retried their commands. Over and over. I subdue my first instinct to tackle Freyr on the spot, and murmur, “Okay. Thanks for letting me know.”
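
For the curious, the tug-of-war is easy to reproduce in a few lines. What follows is a minimal Python sketch, not EnergyModCo’s actual code: the names `SmartBreaker`, `LightingController`, `watchdog_tick`, and `simulate` are all invented for illustration, and it assumes the breaker simply obeys whichever command arrived last.

```python
# Hypothetical simulation; every name here is invented for illustration.

class SmartBreaker:
    """A breaker that obeys whichever command arrived last."""

    def __init__(self):
        self.lights_on = True

    def switch(self, on):
        self.lights_on = on


class LightingController:
    """A controller whose watchdog re-sends its command until the breaker matches."""

    def __init__(self, name, wants_on):
        self.name = name
        self.wants_on = wants_on  # what this site's schedule says right now

    def watchdog_tick(self, breaker):
        # The watchdog only sees a breaker in the "wrong" state. It cannot
        # tell a disobedient breaker apart from a rival controller.
        if breaker.lights_on != self.wants_on:
            breaker.switch(self.wants_on)
            return True  # a retry was issued
        return False


def simulate(ticks):
    """Let two controllers with opposite schedules share one breaker."""
    breaker = SmartBreaker()
    local = LightingController("disco warehouse", wants_on=True)
    remote = LightingController("misconfigured site", wants_on=False)
    history = []
    for _ in range(ticks):
        remote.watchdog_tick(breaker)   # remote schedule says "off"
        history.append(breaker.lights_on)
        local.watchdog_tick(breaker)    # local schedule says "on"
        history.append(breaker.lights_on)
    return history


print(simulate(3))  # [False, True, False, True, False, True]
```

Each watchdog is behaving correctly by its own lights, so to speak. The flapping only appears once two of them point at the same breaker, which is why reading either controller’s code in isolation turned up nothing.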

I sit still for a few minutes, allowing the turbulence of my rage to subside. My initial response is to be angry at Freyr for wasting my time and terrifying me. However, as I calm down and regain clarity, I realize that he did nothing wrong. Who hasn’t mistyped an IP address before? I know I certainly have, many times. Freyr made a simple and understandable mistake. The problem was that the BigBoxoCo network was set up in such a way as to allow a simple mistake to wreak utter chaos.


As it turned out, BigBoxoCo had all of their hundreds of warehouses on the same virtual network. Not only could BigBoxoCo’s corporate headquarters reach machines at every single warehouse, but so could any individual warehouse. A PC at a BigBoxoCo in New York could ping a PC at a BigBoxoCo in Oregon with no problem. Even ignoring the security repercussions of such a setup, there are good reasons to avoid it. Had the network been set up with a star topology, with only the corporate headquarters having access to every single warehouse, the disco party fiasco could have been easily avoided: the misconfigured controller would not have been able to reach a smart-breaker at another warehouse, so the mistyped IP address would have produced a harmless connection failure. In other words, segmentation is good.
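
To make the star-topology rule concrete, here is a small Python sketch of the reachability policy it implies. The address plan is entirely made up (I do not know BigBoxoCo’s real subnets), and `site_of` and `star_allows` are hypothetical helper names; in practice a rule like this would live in routers or firewalls rather than in application code.

```python
import ipaddress

# Hypothetical address plan: one subnet per site. These ranges are
# assumptions for illustration, not BigBoxoCo's real network.
SITES = {
    "hq": ipaddress.ip_network("10.0.0.0/24"),
    "new_york": ipaddress.ip_network("10.1.0.0/24"),
    "oregon": ipaddress.ip_network("10.2.0.0/24"),
}


def site_of(address):
    """Return the name of the site whose subnet contains the address, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SITES.items():
        if ip in network:
            return name
    return None


def star_allows(src, dst):
    """Star topology: traffic stays within a site or has headquarters at one end."""
    src_site, dst_site = site_of(src), site_of(dst)
    if src_site is None or dst_site is None:
        return False  # unknown addresses are not routed at all
    return src_site == dst_site or "hq" in (src_site, dst_site)


# A controller in one warehouse mistakenly pointed at a breaker in another:
print(star_allows("10.2.0.5", "10.1.0.20"))  # False, the typo goes nowhere
# The same breaker addressed by its own site's controller, or by headquarters:
print(star_allows("10.1.0.5", "10.1.0.20"))  # True
print(star_allows("10.0.0.2", "10.1.0.20"))  # True
```

On the flat network described above, the equivalent check would return `True` for every pair of addresses, which is exactly what let one mistyped IP address reach across the country.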
