Crafting an exploit for a (very basic) browser

Date written: 2021.10.27

Let's start with a puzzle:

By changing up to three characters in the procedure called simple below, make the program print c is: 1.

// bo.c
#include <stdio.h>

void simple(char a) {
    char buffer[4];
    *buffer = a;
    *(buffer + 1) = a;
    *(buffer + 1) += 1;
}

void main() {
	char c = '1';
	simple(c);
	c = '2';
	printf("c is: %c\n", c);
}

At first this seems impossible: after simple(c); is called the line c = '2'; sets c to 2... so no matter how we change simple the program will always end up printing c is: 2, right?

No. It is absolutely possible to change three characters of bo.c to instead print out c is: 1. We can do this because programs do not run in a vacuum but on hardware, and it is sometimes possible to utilize this knowledge to make them do unexpected things.

This blog assumes that the program above was compiled to a 64-bit Linux executable using gcc version 9.3.0. Since different compilers on different operating systems with different hardware will produce different executables, the exact solution to the puzzle above depends on these factors. For example, my solution to this puzzle printed c is: 1 on Linux, but when the same code was compiled on 64-bit Windows with Clang, the executable printed c is: 2 instead.

Let's begin. Like any good programming language, C is a compiled language, so let's compile bo.c with debug flags.

$ gcc -g -o bo bo.c
$ ./bo 
c is: 2
$

For reference, here are the gcc details.

$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$

We'll use the gdb debugger to dive into the internals of how the program works. It provides basic information and is fairly simple to use.

$ gdb bo 
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...
(gdb)

Let's examine what the gcc compiler ultimately did to our code.

(gdb) disass main
Dump of assembler code for function main:
   0x00005555555551b8 <+0>:	endbr64 
   0x00005555555551bc <+4>:	push   %rbp
   0x00005555555551bd <+5>:	mov    %rsp,%rbp
   0x00005555555551c0 <+8>:	sub    $0x10,%rsp
   0x00005555555551c4 <+12>:	movb   $0x31,-0x1(%rbp)
   0x00005555555551c8 <+16>:	movsbl -0x1(%rbp),%eax
   0x00005555555551cc <+20>:	mov    %eax,%edi
   0x00005555555551ce <+22>:	callq  0x555555555169 <simple>
   0x00005555555551d3 <+27>:	movb   $0x32,-0x1(%rbp)
   0x00005555555551d7 <+31>:	movsbl -0x1(%rbp),%eax
   0x00005555555551db <+35>:	mov    %eax,%esi
   0x00005555555551dd <+37>:	lea    0xe20(%rip),%rdi        # 0x555555556004
   0x00005555555551e4 <+44>:	mov    $0x0,%eax
   0x00005555555551e9 <+49>:	callq  0x555555555070 <printf@plt>
   0x00005555555551ee <+54>:	nop
   0x00005555555551ef <+55>:	leaveq 
   0x00005555555551f0 <+56>:	retq   
End of assembler dump.
(gdb)

Here the debugger is showing us the x86-64 assembly instructions gcc turned our code into. We can see the program uses callq to jump to our simple procedure. Once the instruction pointer (IP) has jumped to the location of the simple function and executed the instructions there, it jumps back to the instruction following callq. This is movb, and it and the subsequent instructions (movsbl, mov) copy the hex number 0x32 (the ASCII printable character 2) in a roundabout way to the esi register, which carries printf's second argument, so it can be printed. Note that there was no reason why the compiler had to use three move instructions instead of one. We simply compiled without any optimization flags, and at its default level (-O0) gcc stores every variable to the stack and reloads it rather than optimizing the moves away.

But how did the program know to jump back to movb once it was done with simple? It knows because the location of this instruction is stored on the stack. We can find this location with the debugger.
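As an aside, a program can also ask for its own return address without a debugger. The following is a minimal sketch, assuming GCC or Clang (both provide the __builtin_return_address builtin). The function names are made up for illustration and are not part of bo.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, not part of bo.c. noinline forces a real call,
   so a return address is actually pushed for this function. */
__attribute__((noinline)) uintptr_t my_return_address(void) {
    /* Level 0 asks for the return address of the current function. */
    return (uintptr_t)__builtin_return_address(0);
}

/* Prints the address at which my_return_address() resumes in its caller. */
void show_return_address(void) {
    uintptr_t ra = my_return_address();
    assert(ra != 0); /* a called function always has somewhere to return to */
    printf("my_return_address returns to 0x%llx\n", (unsigned long long)ra);
}
```

Calling show_return_address() from main should print an address inside show_return_address, just past its call instruction. The exact value depends on the compiler and on where the executable is loaded.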

We'll run the program and stop it when inside simple. This is easy to do by setting a breakpoint.

(gdb) list
1	// bo.c
2	#include <stdio.h>
3	
4	void simple(char a) {
5	    char buffer[4];
6	    *buffer = a;
7	    *(buffer + 1) = a;
8	    *(buffer + 1) += 1;
9	}
10	
(gdb) break 8
Breakpoint 1 at 0x1197: file bo.c, line 8.
(gdb) run
Starting program: ~/Documents/bo 

Breakpoint 1, simple (a=49 '1') at bo.c:8
8	    *(buffer + 1) += 1;
(gdb)

The memory address of the movb instruction that the IP jumps to after simple can be read off the disassembly above (0x00005555555551d3). That value will be stored on the stack, somewhere above the address held in the stack pointer.

(gdb) i r
rax            0x31                49
rbx            0x555555555200      93824992236032
rcx            0x555555555200      93824992236032
rdx            0x7fffffffe0d8      140737488347352
rsi            0x7fffffffe0c8      140737488347336
rdi            0x31                49
rbp            0x7fffffffdfb0      0x7fffffffdfb0
rsp            0x7fffffffdf90      0x7fffffffdf90
r8             0x0                 0
r9             0x7ffff7fe0d50      140737354009936
r10            0x0                 0
r11            0x0                 0
r12            0x555555555080      93824992235648
r13            0x7fffffffe0c0      140737488347328
r14            0x0                 0
r15            0x0                 0
rip            0x555555555197      0x555555555197 <simple+46>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
(gdb)

The stack pointer (SP, the rsp register) holds 0x7fffffffdf90. Since the stack grows downward on x86-64, this is the lowest address the stack currently uses, so we can use the debugger to directly examine the memory above it.

(gdb) x/8gx 0x7fffffffdf90
0x7fffffffdf90:	0x00007fffffffdfb6	0x000055315555524d
0x7fffffffdfa0:	0x00003131f7fb6fc8	0x51e4bd6119d63f00
0x7fffffffdfb0:	0x00007fffffffdfd0	0x00005555555551d3
0x7fffffffdfc0:	0x00007fffffffe0c0	0x3100000000000000
(gdb)

And there is the return address: 0x00005555555551d3, stored at 0x7fffffffdfb8! Now comes the hacking part. Notice that 20 bytes before the return address are the bytes 0x3131, starting at 0x7fffffffdfa4. These are the values the simple procedure wrote into the variable buffer. So if we change the C code *(buffer + 1) += 1 to *(buffer +20) += 1, we will be modifying the lowest byte of the return address instead.
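The figure of 20 can be double-checked from the dump itself. Here is a small sketch, assuming (as holds on x86-64) that memory is little-endian, so the lowest-addressed byte of each 8-byte word appears as the rightmost two hex digits. The helper names are invented for this illustration, and the constants are copied from the x/8gx output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Index (0..7) of the first byte equal to `value` inside a 64-bit word
   as it is laid out in little-endian memory, or -1 if it is absent. */
int first_byte_index(uint64_t word, uint8_t value) {
    for (int i = 0; i < 8; i++) {
        /* In little-endian memory, byte i holds bits 8*i .. 8*i+7. */
        if (((word >> (8 * i)) & 0xff) == value) {
            return i;
        }
    }
    return -1;
}

/* Distance in bytes from the start of buffer to the return-address slot. */
ptrdiff_t buffer_to_return_slot(void) {
    const uint64_t word_addr = 0x7fffffffdfa0ULL;      /* row with the 0x3131 bytes */
    const uint64_t word      = 0x00003131f7fb6fc8ULL;  /* first word on that row */
    const uint64_t ret_slot  = 0x7fffffffdfb0ULL + 8;  /* second word on row ...dfb0 */

    int idx = first_byte_index(word, 0x31);            /* '1' is 0x31 */
    assert(idx >= 0);
    uint64_t buffer_addr = word_addr + (uint64_t)idx;  /* where buffer[0] lives */
    return (ptrdiff_t)(ret_slot - buffer_addr);
}
```

With these values first_byte_index returns 4, so buffer starts at 0x7fffffffdfa4, and buffer_to_return_slot() returns 0x7fffffffdfb8 - 0x7fffffffdfa4 = 20.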

But if we can modify the return address, then we can make the program skip the part where c gets set to '2'. Looking at the disassembly from earlier, the movsbl instruction occurs 4 bytes after the movb instruction (31 - 27 = 4). Therefore, if we replace *(buffer + 1) += 1 with *(buffer +20) += 4, we should skip past the part where c gets set to '2'.

// bo.c
#include <stdio.h>

void simple(char a) {
    char buffer[4];
    *buffer = a;
    *(buffer + 1) = a;
    *(buffer +20) += 4;
}

void main() {
	char c = '1';
	simple(c);
	c = '2';
	printf("c is: %c\n", c);
}

With this change, when we compile and run the modified program...

$ gcc -o bo-sol bo.c
$ ./bo-sol 
c is: 1
$

we get c is: 1 like we wanted! (And we only had to change 3 characters in our program to do so!)

(I should note that I was somewhat lucky: if the last 8 bits of the return address had held a value above 251, then adding four would have wrapped around and produced an incorrect return address. But the odds were in my favour.)
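The wrap-around is easy to see in a few lines of C. This is only a sketch: it uses unsigned arithmetic to stay well-defined, the function names are invented, and the address ending in 0xfe is a hypothetical unlucky case rather than anything observed in the debugger.

```c
#include <assert.h>
#include <stdint.h>

/* Add delta to only the lowest byte of addr, the way a one-byte += on the
   stack does: the sum wraps modulo 256 and never carries into the next byte. */
uint64_t add_to_low_byte(uint64_t addr, uint8_t delta) {
    uint8_t low = (uint8_t)(addr & 0xff);
    low = (uint8_t)(low + delta);              /* wraps past 0xff */
    return (addr & ~(uint64_t)0xff) | low;
}

/* Nonzero when the one-byte patch lands on the same address that a
   full-width addition would have produced. */
int low_byte_patch_is_safe(uint64_t addr, uint8_t delta) {
    return add_to_low_byte(addr, delta) == addr + delta;
}
```

For the real return address, add_to_low_byte(0x5555555551d3, 4) gives 0x5555555551d7, which is main+31 as intended. For the hypothetical 0x5555555551fe it gives 0x555555555102 rather than 0x555555555202, so the program would return to the wrong place.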

So how is all of this relevant to crafting a browser exploit? We just changed a return address on the stack by altering the source code, but with a buffer overflow it may be possible to change a return address on the stack by simply overwriting it. Once you can modify the return address, you may be able to make the program execute instructions of your own choosing, which is where the fun begins.
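To make the idea concrete without relying on undefined behaviour, here is a toy simulation. It assumes a made-up frame layout (an 8-byte buffer immediately followed by an 8-byte return-address slot) held in an ordinary byte array. Real frames differ by compiler and architecture, and all names and sizes here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE   8                 /* toy buffer size (assumption) */
#define FRAME_SIZE (BUF_SIZE + 8)    /* buffer + 8-byte return-address slot */

/* Simulates copying len bytes of input into a toy frame without checking
   them against BUF_SIZE, then reports the return address left in the frame. */
uint64_t return_address_after_copy(uint64_t original,
                                   const unsigned char *input, size_t len) {
    unsigned char frame[FRAME_SIZE] = {0};

    /* Lay out the frame: buffer first, saved return address right after it. */
    memcpy(frame + BUF_SIZE, &original, sizeof original);

    /* The "bug": the copy is bounded by the whole frame (to keep this sketch
       well-defined C), not by the buffer it is supposed to fill. */
    assert(len <= FRAME_SIZE);
    memcpy(frame, input, len);

    uint64_t after;
    memcpy(&after, frame + BUF_SIZE, sizeof after);
    return after;
}
```

A 4-byte input such as "AAAA" leaves the return address untouched, while the 16-byte input "AAAAAAAABBBBBBBB" replaces it with 0x4242424242424242, since 'B' is 0x42. An attacker who controls the input therefore controls where the function returns.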

I had to do all of the above and more for a project from a course I'm taking this semester (COMP3410), which tasked us with the following:

Develop an exploit for a RISC-V binary admitting a buffer overflow vulnerability.

The course provided us with a specific and very minimal browser to exploit. The binary was not made up of x86 instructions, which meant there was less documentation available on crafting an exploit from a buffer overflow vulnerability, but the RISC-V ISA is actually comprehensible. (Recall that x86 is a gargantuan mess.) The project's course page did give us a massive hint by linking to a 1996 article from Phrack magazine which described how to develop an exploit from a buffer overflow vulnerability in an x86 binary on Linux.

Although the article was almost 25 years old, the C programming language is older, so much of the article held up. In fact, my puzzle at the beginning of this post was inspired by the article. Our course project was made significantly easier by the machine the RISC-V browser was running on not randomizing the location of the browser's stack. (I.e. it did not have ASLR turned on.) And in fact to make the assignment even easier, the browser was run on a simulator which assigned the same stack location to the browser each time. As such, my exploit was very small and had the following structure in bytes.


AAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBB
BBBCCCCDDDDDDDDD
DDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDD
DDDDDDDDEEEEEEEE
EEEEEEEEEEEEEEEE
EEEEEEEEEEEEEE

In case the assignment is used when the course is taught again, I don't want to post my actual exploit, but here is what each of its components did:

A: These bytes made the browser take a branch to a return address which could be changed with a buffer overflow.
B: Filler bytes to get to the return address.
C: The return address. I replaced it with the memory address of the word following it in the stack.
D: RISC-V instructions to print the string in the next component, and to replace specific characters in the next component with newline characters. Newlines had to be replaced otherwise the full payload would not have been written to the stack.
E: A string to demonstrate we exploited the program.
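The size and starting offset of each component can be read off the diagram mechanically. The sketch below assumes that each letter in the diagram stands for exactly one payload byte. The helper names are invented, and nothing here reproduces the actual exploit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The layout diagram from above, one letter per payload byte (assumption). */
static const char *const LAYOUT[] = {
    "AAAAAAAAAAAAAAAA",
    "BBBBBBBBBBBBBBBB",
    "BBBBBBBBBBBBBBBB",
    "BBBBBBBBBBBBBBBB",
    "BBBCCCCDDDDDDDDD",
    "DDDDDDDDDDDDDDDD",
    "DDDDDDDDDDDDDDDD",
    "DDDDDDDDEEEEEEEE",
    "EEEEEEEEEEEEEEEE",
    "EEEEEEEEEEEEEE",
};
#define LAYOUT_ROWS (sizeof LAYOUT / sizeof LAYOUT[0])

/* Number of bytes belonging to one component (tag is 'A'..'E'). */
size_t component_size(char tag) {
    size_t count = 0;
    for (size_t r = 0; r < LAYOUT_ROWS; r++) {
        for (const char *p = LAYOUT[r]; *p != '\0'; p++) {
            if (*p == tag) {
                count++;
            }
        }
    }
    return count;
}

/* Offset of a component's first byte from the start of the payload,
   or (size_t)-1 if the component does not appear. */
size_t component_offset(char tag) {
    size_t offset = 0;
    for (size_t r = 0; r < LAYOUT_ROWS; r++) {
        const char *hit = strchr(LAYOUT[r], tag);
        if (hit != NULL) {
            return offset + (size_t)(hit - LAYOUT[r]);
        }
        offset += strlen(LAYOUT[r]);
    }
    return (size_t)-1;
}
```

For example, component_size('C') is 4 and component_offset('C') is 67: the replacement return address occupies four bytes, preceded by 16 bytes of A and 51 bytes of filler.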

Overall I had a great time with the project. Everyone's heard of buffer overflow vulnerabilities, but actually turning a buffer overflow into an exploit required non-trivial knowledge. Knowing how to program in Python or JavaScript is certainly not enough to craft an exploit; doing so requires an understanding of how computers actually work. As I discovered, putting that understanding to use was very fun.