Buffer Overflows & Fuzzing

Adapted, with permission, from the Data Security and Ethics lecture materials by Martin Lester (University of Reading).

Buffers

A buffer is a block of memory (or other storage) used for storing data temporarily.

Example 1

A calculator program allows the user to type sums.

As the user types, input is stored in a buffer.

When the user presses return, the program calculates the result.

A buffer is a block of memory (or other storage) used for storing data temporarily.

Example 2

A program counts the number of lines in a text file on disk.

It would be inefficient to load the whole file into memory...

...or to read it one byte at a time.

Instead, the program loads a block at a time into memory, then processes the file a block at a time.

A buffer is a block of memory (or other storage) used for storing data temporarily.

Example 3

A webserver logs names of pages accessed to disk.

The log message is a string that includes a timestamp and the name of the page accessed.

The program concatenates the timestamp string and page name string together in memory.

Then it prints the whole message with a single operating system call.

Buffer overflows

A buffer overflow, also called a buffer overrun, is when a program is meant to write to memory in a buffer, but instead writes to memory after the buffer.

Main cause: * Programmer allocated fixed size buffer and thought that would be OK.

Other causes: * Programmer incorrectly calculated length of variable size buffer. * Incorrect assumption about how much data a library call could write.

Exploiting a buffer overflow

What can happen if memory after a buffer is written to?

Nothing!
Program crashes.
Variable value changed.
Return address changed.
Arbitrary code execution.

Causing a buffer overflow

Overflowing a buffer is sometimes very simple:

Just send the program more data than it is expecting (for example, over network connection).

But sometimes overflows only occur in very specific circumstances:

Maybe the data includes a field specifying the length of another field. What if the field is longer than the claimed length?

Or could rely on several rare events occurring simultaneously.

Exploiting a buffer overflow

In order to exploit a buffer overflow, you usually need to do more than just trigger it.

Most likely outcomes are nothing or random crash.

An attacker want to run code of his choosing: Arbitrary Code Exection.

He often wants to run shellcode: a small program that allows the attacker to make a network connection and run any command.

How do you get shellcode in memory on the target?
How do you run shellcode on the target?

Sending shellcode

Maybe you just send shellcode into the buffer. But:

Buffer might be small.
Shellcode needs to be small to fit in buffer.
Optimise for size, or hand-code.
Program reading into buffer might be expecting text or some other format.
Some bytes of code (like 0) might be interpreted as string termination!
Use alternate encodings of some instructions, or hand-code.
Buffer might not always be at same address in memory.
Absolute jumps might not work.
Compile Position Independent Code (PIC).

Running shellcode

Now, how to make the shellcode run?

Carefully examine memory layout of compiled program.
Then write a value into the buffer that will jump to the right address.
Can vary between builds of a program.
Can vary between runs of a program!
Heap-spraying: Fill memory with copies of shellcode and "ramps".
Maximise probability that random address leads to shellcode.
Usually needs another way of getting shellcode into memory.
Example: JavaScript + buggy browser plugin.

Countermeasures

Don't write buggy code.
Hard! Can use testing, auditing...
Use language with automated memory management.
Fine for applications, not high-performance or systems code. Or language runtimes.
Address Space Layout Randomisation (ASLR).
Operating system/runtime pick different memory address every run.
Hardware protection.
Memory segmentation/paging and no-execute flag.
Malware detection and Web application firewalls.
Nozzle/Zozzle detect heap-spraying JavaScript.
Install updates regularly.
Make sure security patches applied.

Fuzz testing

Basic idea:

Run program with random input.
Does it crash?
- If yes, log the input for later analysis.
- If no, try again.

Can try billions of possible inputs!

May need to run the program in a special environment (for example, using Valgrind) to force a crash on memory management errors.

Problems with purely random fuzz testing

Many programs look something like this:

try {
    f = read_input_file();
    x = parse(f);
}
catch (ParseError e) {
    println("Error: malformed input.");
}
// do something interesting with x

The interesting behaviour with x rarely gets considered.

Many programs look something like this:

if (complex_test(x) && complex_test(y)) {
    // do something interesting with x and y
}
else {
    // do something boring with x and y
}

The interesting behavoiur with x and y rarely gets considered.

Improving fuzz testing: generation vs mutation

A generation-based fuzzer creates a new random input each time.

A mutation-based fuzzer uses genetic algorithms to create new inputs: * Flip random bits or insert random bytes. * Concatenate, split or merge existing inputs.

Mutation can be better at creating interesting test inputs.

But often relies on good initial corpus of test inputs.

Imagine fuzz-testing a program that parses HTML fragments.

Initial corpus: * Hello. * Yes.

Example mutations: * Hello.Hello. * HeYes. * <en>Yes. * s.

Improving fuzz testing: unstructured vs structured

An unstructured/dumb fuzzer does not know expected input format.

A structured fuzzer has a grammar or other way of recognising/generating syntactically valid inputs.

Advantages: * More likely to test interesting program paths... * ...especially if input format has checksums. Disadvantages: * Misses bugs in parser. * Requires accurate grammar to pass to fuzzer. * Slower rate of test case generation.

Improving fuzz testing: black/grey/white box

A black box fuzzer has no knowledge of program structure. This is rare in practice.

A grey box fuzzer uses instrumentation (via special compiler or virtual machine) to track trace of execution.

If a test case generates a new trace, focus on it in later mutations.

A white box fuzzer uses symbolic execution and/or constraint solvers to generate test cases that specifically target new execution paths.

Trade-off between number and quality of test cases.

Example: American Fuzzy Lop

American Fuzzy Lop (AFL) is a mutation-based, unstructured, grey box fuzzer.

It is not the most technically advanced fuzzer, yet it has been used to detect many bugs in real applications.

Lessons learnt:

Having sensible defaults and being easy to use is more important for adoption than flexibility.
Reliability and speed are also important.
In practice, volume of test cases is often more important than quality.

Ethical issues: buffer overflows

Buffer overflows raise mostly the same ethical issues as other software vulnerabilities:

As a developer, what obligations do you have to test for or minimise the likelihood of buffer overflows?
As a system administrator, what obligations do you have to minimise the likelihood of a buffer overflow being exploited?
As a penetration tester, if you discover a buffer overflow, when and how should you disclose it?
As a software developer, if someone informs you of a buffer overflow, what should you do and when?
Is it ethical to introduce buffer overflows in software projects?
Is it ethical to exploit buffer overflows?

Ethical issues: fuzz testing

Fuzz testing raises mostly the same ethical issues as other security tools with "offensive" uses. Is it ethical...

...to write a fuzz tester?
...to distribute a fuzz tester?
...to restrict distribution of fuzz testers by law?
...to use fuzz testers over the network to test someone else's system?