Jonathan David Page talks about whatever he happens to be thinking about. Sometimes other people join in.
At this point, a lot of you have probably heard about the Heartbleed OpenSSL bug, which has come into the public eye to the point where the BBC reported on it. The implications have been fairly well-covered in the media: disclosure of passwords, disclosure of private keys, etc. In other words, Bad Stuff. This article aims to cover the technical details of the bug in a manner suitable for non-technical people. It may get dry. However, no previous knowledge of computers should be required.
The Heartbleed bug is probably the only software bug that I’ve met which actually scared me. SQL injection, mentioned in the BBC article, was old hat by the time I came along. People still write code that’s vulnerable to it, but almost all of the SQL-related documentation you come across these days explains what it is and how to avoid it.
This bug—officially named CVE-2014-0160 in the Common Vulnerabilities and Exposures database—allows an attacker to read chunks of an affected computer’s memory without leaving a trace. This includes sensitive data that may be stored in memory, such as passwords. It does so using a trick called a buffer overrun. I’m going to break down both the bug and how it could have been mitigated. I’ll start from the high level and work down towards the metal.
The Internet has four layers. The lowest is the link layer, which deals with bits of metal and antennas and wires and getting actual bits and bytes from one computer connected directly to another, with nothing fancy like making sure that they all arrived, or they arrived in the right order, or where they should go next. If you want an analogy, the link layer is like the part of the mail system which deals with moving pieces of paper from one location to another, without regard to what’s actually written on them.
The next lowest is the Internet layer, which deals with shipping those bits and bytes to their intended destination via a series of hops over the link layer. It does this by wrapping up the information to be sent with some extra information about where it’s supposed to go. In the mail system analogy, this is like putting a letter in an envelope with an address on it so that it can be sorted.
The transport layer deals with ensuring that the data arrived (if you care about such things), that it arrived in the right order (if you care about such things), that messages addressed to the same place but different things end up in the right hands, that no one part of the network gets swamped by too much data, and other niceties. This is the layer where the Heartbleed bug lives. In the mail analogy, the transport layer is like the guy who decides when to put stuff on the plane, and also whoever picks up the mail in your house and gives the various messages to the intended people (this may be you).
The application layer deals with the formatting of the actual data that’s being sent. In the mail analogy, it corresponds to me writing you a letter in English, or French, or Esperanto, and then you reading it (or not reading it, if it’s written in Esperanto). This is where the fun stuff like the World Wide Web, email, instant messaging, cat pictures, and multiplayer games live.
At the transport layer, by default, data is sent in the clear, without any kind of encryption. This means that anybody who can intercept the message can read it. It’s kind of like a postcard—there’s nothing to prevent the mail delivery person from making a copy of it for themselves, but we trust them not to, and even if they do it isn’t a huge deal because nobody (I hope) puts sensitive information on postcards.
While acceptable for cat pictures and the works of Shakespeare, for sending sensitive data like passwords, credit card numbers, blackmail material, or your Hunger Games self-insert fanfic, something less transparent is required. This is where TLS, or Transport Layer Security, comes in. TLS is a pretty wizard system which uses fancy mathematics to prevent people from looking at your stuff. (The maths itself is pretty cool, but also irrelevant to this article, so we’ll save that for another time.) It’s like an envelope which can only be opened by the intended recipient. It can also tell you if the message was sent by the person you think sent it (using something which is unsurprisingly called a signature, and involves more fancy mathematics). I’ll give you a brief, math-free overview of how this shakes down some other time, but for now that’s all you need to know. TLS is magic safety envelopes which you can send your credit card information in without worrying about the mail delivery person (or any other third parties, for that matter) reading it.
There’s an older protocol called SSL, the Secure Sockets Layer, and the names are sometimes used interchangeably. A socket is how you talk to the transport layer, so the names pretty much mean the same thing anyway. The SSL protocol itself is vulnerable to certain attacks, regardless of how bug-free the implementation is. As far as we know, the latest version of TLS, TLS 1.2, is not vulnerable to any attacks if implemented correctly. The Heartbleed bug exploits TLS 1.2 being implemented wrong.
Okay, not TLS 1.2 specifically. It actually relies on an extension to TLS, called the Heartbeat Extension, being implemented wrong. The purpose of the extension is to keep TLS sessions alive. If data isn’t being sent across them, they can die, which is irritating because the fancy handshakes involved in setting up a TLS session are actually rather time-consuming. So what the Heartbeat Extension does is provide a standard way for one end of the connection to say “are you still there?” and the other end to reply “yeah, I’m still here.” By sending such messages back and forth every so often, the session can be kept alive. Here’s the important part: the end sending the original message gets to include a little chunk of data, which the other end is supposed to return verbatim to ensure that they really did get the message.
To explain how the bug works, I need to explain some stuff about how computers work. You’ve probably heard that computers see everything as ones and zeroes, which produces some nice visuals on NCIS but doesn’t really give you a good mental model of what’s going on.
We’re going to forget about ones and zeroes for a minute. The important thing to know is that, when a computer is working with data—not just storing it, but performing calculations—it keeps the data in RAM, which we usually refer to as “memory”, presumably not for short. Memory is divided up into bytes. You can think of RAM as a big sheet of paper covered in little boxes. Those little boxes are the bytes. In each box, you can write a number. That number must be in the range of zero through two hundred and fifty-five. If you want a bigger number, you have to use multiple boxes. This is how all data is represented on a computer: text, music, cat photos, everything.
The first concern is “how does the computer know what’s what?” The answer is, in general, that it doesn’t. Computers are pretty stupid, it turns out, and they basically operate on a principle of “go to such and such boxes and do such and such a thing to them” (like adding one to each box, or erasing each of them and filling them in with zeroes). In lower-level languages like C, it is the responsibility of the programmer to remember what’s what, and if they get it wrong, Bad Things can happen. In higher-level languages like Java and Python, there are some extra numbers stored with the other numbers saying what kind of data is here and how much of it there is. So if you wanted to store the word “hello”, you could store numbers saying that it’s, say, type #3 (we’re going to say that “text” is type #3), and that there are five letters. Then the language runs some extra code to look at that information and make sure that what you think is text really is text, and not a number or a cat picture, and that it only has five letters, not twenty or four thousand.
You are supposed to do this kind of bookkeeping in C as well, but the language won’t do it for you. You have to do it all by yourself. If you think that this sounds incredibly tedious and error-prone, you are correct. Have a cookie. Why would programmers inflict this torture on themselves? Some combination of masochism, machismo, desire for speed (C is faster than Java or Python), and the fact that for better or worse, C is the lingua franca of the computing world. Which is why OpenSSL, the software with the bug in it, is written in C.
So here’s how the Heartbleed bug works, based on the OpenSSL patch fixing the exploit. The attacker sends a heartbeat message, with some data to return. They’re also supposed to say how much data they sent. The attack works by sending some data, and then claiming that you sent more data than you did. In this case, OpenSSL should look at how much data was actually sent, realize that the numbers don’t match, and ignore the message. What actually happened is that OpenSSL would look at the number, say “sure, that’s good”, find that much space that it wasn’t using for anything else, and start copying boxes from the message it received into the message it was sending. The problem is that because the received message isn’t as long as it claims to be, OpenSSL copies all of the boxes from it, and then starts on whichever boxes happened to come next. These boxes are being used by other parts of the program to store various things, such as passwords, secret keys, and other tasty data. So it copies all of that into the reply and sends it back. Hence the name: the exploit uses the heartbeat to bleed out information.
There are some finer details on how the problem could have been mitigated, but explaining them would require a crash-course in basic operating systems theory, and this article is long and confusing enough already. The whole “secret key” thing, and why it’s so bad that they were leaked, is also another story. (I mean, you can tell that it’s bad that they escaped just by looking at the name—there is no way that a thing called a “secret key” protecting your data becoming not-secret could be a good thing. There are some details though about why, exactly, the keys need to be so secret.) I might publish info on those over the next couple days. Or I might go back to radio silence, since I have an exam and a paper due before this weekend. We’ll see. Until then, ladies and gentlemen.