Why do computers crash? What can you do about it? It’s very rare for your computer to physically break. Most of the routine glitches you experience happen at the software level, as different running programs compete for your computer’s finite memory resources. To understand and hopefully avoid crashes, it first helps to know a little something more about how memory works, and how it differs from storage. Imagine your computer as the Dunder Mifflin paper company. Think of memory as the office, and storage devices like the hard disk as the filing cabinets, storage closets and warehouse.
The techies use the word RAM, short for Random-Access Memory, one of the rare computer terms that has meaningful descriptive value. ‘Random-access’ means that the computer can read or write to any location in memory as fast as any other, on the order of a few nanoseconds per byte. Imagine that you have a bunch of papers spread out on your desk, one layer deep, so every paper is within arm’s length and can be read at a glance. There’s only so much space on your desk, however. You can store a lot more documents in your drawers and filing cabinets, and vastly more in the warehouse downstairs, but then it’ll take a lot longer to access them. Also, if you want to bring up more documents from storage, you’re probably going to have to move the papers on your desk out of the way to make room. A further complication is that memory is volatile, needing to be powered to store its ones and zeros. Imagine that if you turn off the office lights, all the documents on your desk immediately cease to exist. Better make sure you have copies of everything you care about in storage.
In a perfect world, you’d have all the memory you wanted and all of your computer’s operations would be almost instantaneous. Unfortunately, memory is expensive and thus limited. Disk storage is cheap and getting cheaper all the time, but to access data the disk has to spin to the right place and the read-write head has to maneuver into position. Finding the right spot on a hard disk takes on the order of a few milliseconds, roughly a million times longer than a memory access. This may not sound slow, but wait times add up quickly if large quantities of data are involved. Some newer digital devices like the skinniest little iPods use flash memory for their storage, sort of like RAM but non-volatile. Flash memory is way faster, smaller and more durable than magnetic and optical disks, but so far it’s also a lot more expensive.
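To see how quickly those wait times add up, here is a back-of-the-envelope sketch in Python. The latency figures and the number of reads are round numbers assumed for illustration, not measurements from any particular machine.

```python
# Rough, assumed latencies; real hardware varies widely.
RAM_ACCESS_SECONDS = 10e-9    # about ten nanoseconds per memory access
DISK_ACCESS_SECONDS = 10e-3   # about ten milliseconds per hard disk seek


def total_wait(accesses, seconds_per_access):
    """Total time spent waiting if every access pays the full latency."""
    return accesses * seconds_per_access


accesses = 100_000  # hypothetical number of scattered reads
ram_wait = total_wait(accesses, RAM_ACCESS_SECONDS)
disk_wait = total_wait(accesses, DISK_ACCESS_SECONDS)

print(f"RAM:  {ram_wait:.4f} seconds")
print(f"Disk: {disk_wait:.0f} seconds")
```

Under these assumptions, the same hundred thousand reads cost about a thousandth of a second from memory and about a thousand seconds, or nearly seventeen minutes, from a spinning disk. Real disks do better when the data sits in one contiguous stretch, but the gap is still enormous.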
We like to have many programs running at once, all competing for scarce memory resources. Imagine that each program or process is a different employee. Further imagine that nobody has their own desk. Instead, everyone shares a big round table in the middle of all the cubicles. If only one person is working, there’s ample space on the table for files, notes, phone messages and so on. But if a bunch of people are going to work at the same time, divvying up space on the table takes some crafty organizing. Think of the operating system as Michael Scott, the manager who allocates space on the table. It falls to each employee not to clutter up the table as they come and go, but accidents happen. Imagine that Jim leaves some files behind, and Pam accidentally picks them up. When Jim comes back to pick up where he left off, he can’t find the information he needs. Meanwhile, Oscar is waiting for some data from Jim, and he can’t do anything further until he gets it. Kelly and Stanley are waiting for a response from Oscar, and Angela, Dwight and Toby are in turn waiting for responses from Kelly and Stanley, and so the entire company grinds to a halt.
How do you fix these kinds of conflicts and conundrums? Most of the time, you can’t. The best way to deal with them is to reboot the computer. Recall that memory needs to be powered to retain its information, so when you turn off the computer, it gets blanked instantly. Rebooting works for all digital devices, not just desktop computers. For printers, scanners, network hardware, mp3 players and cell phones, a quick power cycle is most often the cure for whatever is logically amiss. If a particular application or document consistently makes the computer fail, some of its component bits may have been scrambled on the hard disk. Imagine that Jim is following garbled instructions, so every time he tries to perform a particular task, he gets stuck. In these cases, you need to find the offending stored data and remove or replace it. This can take some detective work. Sometimes the scrambled data is part of the operating system itself. If this happens, everything your computer does is agonizingly slow, or programs spontaneously quit, or the machine won’t boot, or every single thing you do produces a string of error messages. This situation is not as dire as you might think. Operating systems are just software, big complex wads of software. You can reinstall the operating system from your original system disks without harming your applications or files. Imagine replacing Michael with a different manager, leaving all the other employees intact. You might want to ask your neighborhood geek for help, and you should definitely back up your files first.
A common cause of computer failure is when two processes get into a wait loop. Say Creed has to send a signal to Meredith and then wait until he gets a response. If Meredith is busy at that moment, Creed has to wait until she becomes un-busy. But let’s say that Meredith, at that moment, is in the process of sending a message to Creed. He’s busy waiting for a response to his message and can’t answer, and Meredith is busy waiting for a response to her message and can’t answer, and around and around they go until everything gets rebooted. Wait loops are like the sidewalk dance. You go left while the other person goes right, so you stop and go right, so the other person goes left, so you stop and go left, so the other person goes right… Humans break these loops quickly, but computers aren’t that smart and will happily run around and around them forever if you let them. Fun fail fact: the 2003 Northeast blackout was made worse by a software bug that froze a control room’s alarm system.
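The Creed and Meredith standoff can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any real operating system is written: the two locks stand in for each person’s attention, the names `creed_line`, `meredith_line` and `worker` are made up for the example, and each side only waits half a second for an answer so that the demonstration ends instead of hanging forever.

```python
import threading

creed_line = threading.Lock()      # hypothetical: Creed's attention
meredith_line = threading.Lock()   # hypothetical: Meredith's attention
in_step = threading.Barrier(2)     # keeps the two workers in lockstep
gave_up = []                       # who ran out of patience


def worker(name, mine, theirs):
    with mine:                     # become busy sending my own message
        in_step.wait()             # wait until the other one is busy too
        # Ask for the other's attention, but only for half a second.
        if theirs.acquire(timeout=0.5):
            theirs.release()
        else:
            gave_up.append(name)   # with no timeout, we would be stuck here
        in_step.wait()             # stay busy until both have finished trying


threads = [
    threading.Thread(target=worker, args=("Creed", creed_line, meredith_line)),
    threading.Thread(target=worker, args=("Meredith", meredith_line, creed_line)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(gave_up))
```

Both names end up in `gave_up`, because each worker was holding the one thing the other needed. Take away the timeout and neither ever gives up, which is the around-and-around described above.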
A variation on the wait loop involves error messages. Imagine that Kelly sends Dwight a request, but in the wrong format, so he replies with an error message. Imagine that Kelly can’t understand the error message, so she replies with an error message of her own. Dwight can’t understand her error message, so he replies with an error message, and again, around and around they go.
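Here is a minimal Python sketch of that exchange, assuming a made-up rule that both sides answer anything unreadable with an error. The `reply` and `run_exchange` functions and the message text are hypothetical. The only thing that stops the ping-pong is an arbitrary cap on the number of rounds.

```python
def reply(message):
    """Hypothetical rule both sides follow: anything unreadable gets an error back."""
    return "ERROR: could not understand " + repr(message[:20])


def run_exchange(first_message, max_rounds):
    """Bounce replies back and forth until the round cap is reached."""
    message = first_message
    rounds = 0
    while rounds < max_rounds:     # without this cap the loop never ends
        message = reply(message)
        rounds += 1
    return rounds, message


rounds, last_message = run_exchange("request in the wrong format", max_rounds=5)
print(rounds, last_message)
```

However high you set the cap, the last message is still an error. Nothing in the exchange ever resolves itself, so something outside the loop has to call a halt.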
Some errors happen when a program is asked to find an answer to a question that has no answer, or that has multiple answers, as in divide-by-zero errors. Sometimes a transistor will get poised exactly halfway between its one and zero values, a condition known as metastable failure. In the tiniest and most delicate components, a single cosmic ray impact can flip a one to a zero or vice versa. As computers get ever smaller, the random quantum jumping of electrons will probably cause ever more unpredictable failures.
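Two of these failures are easy to imitate in Python. The sketch below uses made-up helper names, `safe_divide` and `flip_bit`, and is only meant as an illustration. The first function shows a program declining to answer a question that has no answer. The second shows how much damage a single flipped bit can do.

```python
def safe_divide(numerator, denominator):
    """Return the quotient, or None when the question has no answer."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        return None


def flip_bit(value, position):
    """Flip one bit of an integer, as a stray cosmic ray might."""
    return value ^ (1 << position)


print(safe_divide(10, 2))          # 5.0
print(safe_divide(10, 0))          # None
print(chr(flip_bit(ord("A"), 5)))  # a
```

Flipping a single bit turns a capital A into a lowercase a. In a text file that is a typo. In a program’s instructions or a bank balance it could be a good deal worse.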
Your computer has many layers of error-checking in place to make sure that most minor glitches go totally unnoticed. Still, no error-checking system is perfect. Software is written by humans and humans make mistakes. A single missing semicolon in millions of lines of code can cause all kinds of unpredictable behavior that’s fiendishly hard to detect. It doesn’t help that bug-checking is tedious, time-consuming and not very profitable. Microsoft achieved its present market dominance in part by discovering that if they released buggy products while the competition were still fine-tuning, their customers would willingly perform their bug-checking for them. Better code crashes less often, but there’s no predicting all electronic glitches or software interactions or malicious viruses and worms. The best strategy is to back up your files and try to keep a relaxed attitude when the crash inevitably arrives.

3 Comments
Cool. Good analogy – makes things very clear.
I’m guessing that semiconductor memory is the RAM you discuss earlier?
And, on my laptop screen the dashed arrow pointing to the hard disk wasn’t very obvious – I had to spend a bit of time deciphering the figure.
Semiconductor memory is indeed RAM. Thanks for the keen-eyed copyediting. As for the arrows, they were originally black, but then someone on Flickr urged me to make them pale grey for aesthetic reasons. I agree, it’s more attractive this way, but also harder to read if your screen isn’t at exactly the right angle. I guess a darker grey is called for.
Okay, fixed the semiconductor thing and made the pertinent arrow black.