This is a guest post from our partners at LaunchDarkly.
We software engineers like to think ourselves unflappable. Consider that we spend most of our days staring at glowing pages of eldritch horror that no mortal was meant to witness. We whisper and type our otherworldly incantations, all the while feeling the hungry gaze of a lurking cross-site scripting bug, or a shadowy use-after-free, or an accidental summoning of ZALGO. (H̨eĚ Ěc͢om͢es, you know.)
But no. Truthfully, we're far more fragile than that. Living our lives on a tightrope over an ocean of chaos (or "unspecified behaviour"), we're only one bad deploy away from a manic screaming fit, followed by a move to the countryside and banishment of any technology invented after 1947. So we consume horror novels by the truckload in an attempt to persuade ourselves that... well, things could be worse, you know? When you see that a senior engineer dresses all in black, listens to Sisters Of Mercy and Dimmu Borgir, and has a line of Melanie Tem novels above the O'Reilly manuals, remember that she uses them to calm down. Because she's seen things.
As, likely, have you.
We know every developer has at least one horror story that still haunts them to this day. Likely, they have more than they'd care to remember.
For All Hallows' Eve, we decided to share some of the most dreadful stories we've come across over the years. We hope that some will be educational to the innocents in our industry. Maybe some will be entertaining to the more experienced among you. And if none of them quickens your pulse, then we'd like to hear your stories on Twitter (just tag @Rollbar and @LaunchDarkly with #errorhorrorstory so we can share around the virtual campfire).
Now, are you ready? Roll your Herman Miller chair up by the fireplace, pour a glass of Knocker's port, and prepare to be chilled...
1. The "Crashing Every Single Computer in the Data Center" Error
When working on social network ads at Google (remember Myspace?), I wrote some C++ code that looked something like this:
for (int i = 0; i < user->interests->length(); i++) {
  for (int j = 0; j < user->interests(i)->keywords.length(); j++) {
    keywords->add(user->interests(i)->keywords(i));
  }
}
Readers who are programmers probably see the mistake: The last argument should be j, not i. My unit tests didn't catch the mistake, nor did my reviewer. After going through the launch process, my code was pushed late one night, and promptly crashed all the computers in a data center.
- Ellen Spertus, Professor of Computer Science, Mills College
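One way to make this kind of slip harder to write is to loop over the elements themselves, so there are no index variables to confuse. The sketch below is in Python rather than the original C++, and the `user` dictionary and its field names are hypothetical stand-ins for the structures in the story.

```python
def collect_keywords(user):
    """Gather every keyword from every interest without any index variables."""
    keywords = []
    for interest in user["interests"]:
        for keyword in interest["keywords"]:
            keywords.append(keyword)
    return keywords


# Hypothetical data, for illustration only.
user = {
    "interests": [
        {"keywords": ["guitar"]},
        {"keywords": ["hiking", "camping", "kayaking"]},
    ]
}
print(collect_keywords(user))
```

C++ offers the same option through range-based `for` loops, where the container supports them.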
2. The "I Simply Forgot to Save All My Work" Error
My simple answer might be the same as many who read this. The worst is forgetting to save. The only thing that can be worst [sic] is to save another file over another file which there is no backup. Now I press Ctrl + S so often it became a habit.
- Maynas Eric Chua, CEO and founder, R.BZ
3. The "Break That Actually Broke Everything" Error
On January 15, 1990, AT&T's long-distance telephone switching system crashed. This was a strange, dire, huge event. Sixty thousand people lost their telephone service completely. During the nine long hours of frantic effort that it took to restore service, some seventy million telephone calls went uncompleted.
An obscure software fault in an aging switching system in New York was to lead to a chain reaction of legal and constitutional trouble all across the country. As it happened, the problem itself, the problem per se, took this form. A piece of telco software had been written in C language, a standard language of the telco field. Within the C software was a long "do ... while" construct. The "do ... while" construct contained a "switch" statement. The "switch" statement contained an "if" clause. The "if" clause contained a "break." The "break" was SUPPOSED to "break" the "if clause." Instead, the "break" broke the "switch" statement.
- Bruce Sterling, Author of The Hacker Crackdown
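The rule behind this story is that `break` never leaves an `if`. It leaves the nearest enclosing loop or, in C, the nearest enclosing `switch`. Python has no C-style `switch`, so what follows is only an analogy with made-up message names, not a reconstruction of the telco code. It shows how a `break` placed inside an `if` abandons the whole enclosing construct.

```python
def process(messages):
    """Handle messages in order and return the ones that were fully handled."""
    handled = []
    for message in messages:
        if message == "status-update":
            # Someone expecting this to leave only the `if` would still
            # expect "call-2" to be handled. It is not: `break` exits the loop.
            break
        handled.append(message)
    return handled


print(process(["call-1", "status-update", "call-2"]))  # prints ['call-1']
```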
4. The "I Was Experimenting and Made a Mistake I Couldn't Reverse" Error
A long time ago, when it was still a fairly common and feasible practice to put an entire app's database on a few floppy disks, I made the mistake of fiddling with the .DBF files without first making a backup. Needless to say, I screwed something up and had to spend the rest of my weekend fixing the files using a C program I cobbled together to gather up all the old data into new tables. Luckily, I had enough information from reference materials on hand to be able to figure out the file format and where all the data was on the disks (this pre-Internet times). Still wasn't fun and my supervisor rightfully chewed me out for not taking proper precautions.
- Junilu Lacar, Agile Transformation Coach & Software Developer
5. The "It's Taking Me Longer to Fix Than to Build" Error
Yoz here. I'd like to note that I hope this is one that's in the past for many of you LaunchDarkly and Rollbar users...
"It'll only take me a few hours to implement the feature," we sometimes say. But after finishing, we find that every few weeks, we're either fixing a bug with the feature, explaining it to another engineer, or helping answer a question from customer support about how it works. The total investment of time to maintain the feature far exceeds the initial few hours of development.
When code is too complex, it becomes harder to ramp up, harder to reason about it, harder to fix bugs. It's difficult to untangle the dependencies and data flows to track down the source of errors. Engineers may actively avoid the most complex parts of the codebase, opting to work around it even if it's the most logical place to make a certain change. Or they may avoid working in those areas altogether, even if the work can be high-impact.
- Edmond Lau, Co-founder at Co Leadership, Author of The Effective Engineer
6. The "User Input Is Only Input by the User" Error
A junior sent me his work for code review and was very proud about the exhaustive validation he put on the user input. When I asked him why he didn't validate the User-Agent header, he argued that it is not entered by the user and it cannot contain special characters or HTML anyway. The best way to convince him was to inject a nice JavaScript alert in the admin panel through the User-Agent header.
- Ilyes Kooli, Lead Software Engineer
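A minimal sketch of the lesson, assuming a hypothetical admin panel that renders request headers into HTML: anything that arrives in a request, headers included, is escaped before display. `render_user_agent_cell` is an invented helper for illustration; `html.escape` comes from the Python standard library.

```python
import html


def render_user_agent_cell(user_agent):
    """Return a table cell that is safe to embed in an HTML page."""
    # Headers are attacker-controlled, so escape them like any other input.
    return "<td>" + html.escape(user_agent) + "</td>"


malicious = "<script>alert('pwned')</script>"
print(render_user_agent_cell(malicious))
```

Escaping at the point of output covers values that nobody thought to validate on the way in.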
7. The "Division of Integers Isn't Adding Up" Error
A couple of friends and I were at a hackathon last year, building this cool tool that could play the piano along with a user in real-time. Basically, it would play out notes on the piano which would complement any tune the user was attempting.
We were terribly stuck and we had no idea why. The tool was playing a diarrhea of notes all at the same time, and it was "unpleasant" to put it politely.
It was about 3:30 AM, we'd been awake for about 18 hours, and we were exhausted. Like zombie level exhausted. No amount of coffee was keeping us fresh and awake anymore.
I slapped myself, sat in front of my screen and squinted long and hard. Mentally going through every line, every expression, every calculation. We'd spent close to 3 hours in this pit, and I wanted to dig us out of it. (Perhaps "dig" isn't a good verb, as that would only put one deeper into the pit. But yeah, you get the idea.)
I'm looking at this one snippet, and then it hits me. I wanted to stab myself. Repeatedly. I look up to the heavens in despair and motion my friends to come over. I very slowly demonstrate what the bug is, and their reactions are quite poetic. One walks away in disgust and stares at the wall for a good 5 minutes, while the other hurled an expletive and went to bed. I fix the error, and everything from there worked like a charm.
You see, Python 2.x does this funky thing where an expression like "1/2" evaluates to 0, and not 0.5 as humans would expect. Integer division, for those who know their jargon.
>>> x = 1/2    # what we were doing
>>> y = 1/2.0  # what we should have done
>>> x, y
(0, 0.5)
All along, the value for the time delay between playing consecutive notes was being calculated this way resulting in a delay of zero seconds. In other words, we were essentially telling the tool to go batshit crazy and spit out every note at the exact same instant, thus explaining the aforementioned diarrhea.
Three hours, my friends. Three hours. That's how long it took to figure this one out. I know that I come off very heroic in this tale I tell, but perhaps that image of me will be altered when you learn that the bug was introduced by yours truly.
- Chittaranjan Velambur, Senior Software Engineer at Nasdaq
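For readers on Python 3, where `/` always performs true division, the same trap can be reproduced with the floor-division operator `//`. The names below (`beats_per_bar`, `notes_per_bar`) are hypothetical; the story does not show the hackathon code itself.

```python
beats_per_bar = 1
notes_per_bar = 2

# Floor division mirrors what Python 2 did for `/` with two ints.
delay_floor = beats_per_bar // notes_per_bar

# True division is what `/` means in Python 3.
delay_true = beats_per_bar / notes_per_bar

print(delay_floor, delay_true)  # prints: 0 0.5
```

In Python 2 code, putting `from __future__ import division` at the top of a module gives `/` the Python 3 behaviour.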
8. The "I Know What I'm Doing is Completely Backwards" Error
Yesterday, I had a smugness-related disaster.
I'd very confidently checked in a new feature, declaring to the world around me how confident I was it was working.
"I've got loads of unit tests", I said.
"You'll hardly have to manual test it at all", I said.
Failed the first manual test when a checkbox worked exactly backwards.
Turned out I had got those unit tests exactly backwards, then wrote code to make them pass.
D'Oh!
So the repeating fault here would be my hubristic tendency to feel a bit smug from time to time.
- Alan Mellor, Senior Software Engineer at BJSS
9. The "Error That Should Not Have Worked, But It Did" Error
So... Many moons ago, I ended up writing a fairly short Perl program - maybe 150 lines. Its purpose was to automate several functions of our backup system that the software package didn't provide in a reasonable manner.
So it goes into production, and runs every morning at 8:03AM. Works perfectly for some 6-7 years. And then we have a procedure change, and I have to take a line of code that said "if A is equal to B1, B2, or B3, do this", and change it to "if A is equal to B1, B2, B3, or B4, do this".
And while I'm there, I notice that another if statement a few lines down is backwards. And it's the most crucial if statement in the whole 150 lines, and the code can't possibly work - the program would obviously crash in a very specific way.
So I carefully quit out of the editor, and check... Yes, the datestamp on the production code is over 6 years back, and it's run correctly at 8:03AM every morning for 6 years. And the if statement is backwards.
I get 4 co-workers to look at it individually, and they all agree the if statement is backwards, and it should be crashing in a very specific way, and nobody understands how this worked for 6 years.
The next morning, I get an email from our monitoring software at 8:04AM saying that the program crashed. And sure enough, the datestamp is 6 years back, the if statement is backwards, and it crashed exactly the way you'd expect it to crash if it was backwards.
I went in, reversed the if statement to be correct, added B4 to the other if statement, and the program continued to work properly until we decommissioned that backup system 5 years later.
I'm still mystified how that line of code worked until somebody looked at it.
- Valdis Kletnieks, Former Computer System Senior Engineer
10. The "Fear of Making a Mistake and Being Hunted Down" Error
To end, one of the best programming adages I've received...
Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
- John Woods, Programmer and former UK based computer game producer
So, did they chill you to your very core? Did you feel your soul leave your body in fright? What, not even when you read the Perl one? (Have you seen Perl?) Oh, you've experienced worse? Really? You poor soul. Please, we need to hear about it.
Post it on Twitter and tag it with #errorhorrorstory and tag us @Rollbar and @LaunchDarkly.
Remember: Rollbar makes it much easier to catch them when they start rampaging, and LaunchDarkly can speed their return to the nether dimensions with the flick of a switch. Join us for a live chat on November 12. RSVP now.