Crafting an exploit for a (very basic) browser
Date written: 2021.10.27
Let's start with a puzzle:
// bo.c
#include <stdio.h>

void simple(char a) {
    char buffer[4];
    *buffer = a;
    *(buffer + 1) = a;
    *(buffer + 1) += 1;
}

void main() {
    char c = '1';
    simple(c);
    c = '2';
    printf("c is: %c\n", c);
}
At first this seems impossible: after simple(c); is called, the line c = '2'; sets c to '2', so no matter how we change simple, the program will always end up printing c is: 2, right?
No. It is absolutely possible to change three characters of bo.c to instead print out c is: 1. We can do this because programs do not run in a vacuum but on hardware, and it is sometimes possible to utilize this knowledge to make them do unexpected things.
This post assumes that the program above was compiled to a 64-bit Linux executable using gcc version 9.3.0. Since different compilers on different operating systems with different hardware will produce different executables, the exact solution to the puzzle depends on these factors. For example, my solution printed c is: 1 on Linux, but when the same code was compiled on 64-bit Windows with Clang, the executable printed c is: 2 instead.
Let's begin. Like any good programming language, C is a compiled language, so let's compile bo.c with debug flags.
$ gcc -g -o bo bo.c
$ ./bo
c is: 2
$
For reference, here are the gcc details.
$ gcc --version
gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$
We'll use the gdb debugger to dive into the internals of how the program works. It provides basic information and is fairly simple to use.
$ gdb bo
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
...
(gdb)
Let's examine what the gcc compiler ultimately did to our code.
(gdb) disass main
Dump of assembler code for function main:
   0x00005555555551b8 <+0>:     endbr64
   0x00005555555551bc <+4>:     push   %rbp
   0x00005555555551bd <+5>:     mov    %rsp,%rbp
   0x00005555555551c0 <+8>:     sub    $0x10,%rsp
   0x00005555555551c4 <+12>:    movb   $0x31,-0x1(%rbp)
   0x00005555555551c8 <+16>:    movsbl -0x1(%rbp),%eax
   0x00005555555551cc <+20>:    mov    %eax,%edi
   0x00005555555551ce <+22>:    callq  0x555555555169 <simple>
   0x00005555555551d3 <+27>:    movb   $0x32,-0x1(%rbp)
   0x00005555555551d7 <+31>:    movsbl -0x1(%rbp),%eax
   0x00005555555551db <+35>:    mov    %eax,%esi
   0x00005555555551dd <+37>:    lea    0xe20(%rip),%rdi        # 0x555555556004
   0x00005555555551e4 <+44>:    mov    $0x0,%eax
   0x00005555555551e9 <+49>:    callq  0x555555555070 <printf@plt>
   0x00005555555551ee <+54>:    nop
   0x00005555555551ef <+55>:    leaveq
   0x00005555555551f0 <+56>:    retq
End of assembler dump.
(gdb)
Here the debugger is showing us the x86-64 assembly instructions gcc turned our code into. We can see the program uses callq to jump to our simple procedure. Once the instruction pointer (IP) has jumped to the location of the simple function and executed the instructions there, it jumps back to the instruction following callq. This is movb, and this and the subsequent instructions (movsbl, mov) copy the hex number 0x32 (the ASCII printable character 2) in a roundabout way to the esi register so it can be printed. Note that there was no reason why the compiler had to use three move instructions instead of one. We compiled without any optimization flags, so gcc emitted a literal, unoptimized translation of the source.
But how did the program know to jump back to movb once it was done with simple? It knows because the location of this instruction is stored on the stack. We can find this location with the debugger.
We'll run the program and stop it when inside simple. This is easy to do by setting a breakpoint.
(gdb) list
1       // bo.c
2       #include <stdio.h>
3
4       void simple(char a) {
5           char buffer[4];
6           *buffer = a;
7           *(buffer + 1) = a;
8           *(buffer + 1) += 1;
9       }
10
(gdb) break 8
Breakpoint 1 at 0x1197: file bo.c, line 8.
(gdb) run
Starting program: ~/Documents/bo

Breakpoint 1, simple (a=49 '1') at bo.c:8
8           *(buffer + 1) += 1;
(gdb)
The memory address of the movb instruction the IP jumps to after simple can be read off from the disassembly above (0x00005555555551d3). It will be stored on the stack, at an address higher than the one the stack pointer holds.
(gdb) i r
rax 0x31 49
rbx 0x555555555200 93824992236032
rcx 0x555555555200 93824992236032
rdx 0x7fffffffe0d8 140737488347352
rsi 0x7fffffffe0c8 140737488347336
rdi 0x31 49
rbp 0x7fffffffdfb0 0x7fffffffdfb0
rsp 0x7fffffffdf90 0x7fffffffdf90
r8 0x0 0
r9 0x7ffff7fe0d50 140737354009936
r10 0x0 0
r11 0x0 0
r12 0x555555555080 93824992235648
r13 0x7fffffffe0c0 140737488347328
r14 0x0 0
r15 0x0 0
rip 0x555555555197 0x555555555197 <simple+46>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
(gdb)
The stack pointer (SP, the rsp register) holds the address of the top of the stack, which is its lowest address in use since the stack grows downward on x86-64: 0x7fffffffdf90. We can then use the debugger to directly examine the memory above it.
(gdb) x/8gx 0x7fffffffdf90
0x7fffffffdf90: 0x00007fffffffdfb6      0x000055315555524d
0x7fffffffdfa0: 0x00003131f7fb6fc8      0x51e4bd6119d63f00
0x7fffffffdfb0: 0x00007fffffffdfd0      0x00005555555551d3
0x7fffffffdfc0: 0x00007fffffffe0c0      0x3100000000000000
(gdb)
And there is the return address, stored at 0x7fffffffdfb8! Now comes the hacking part. Notice that 20 bytes before the return address, at 0x7fffffffdfa4, are the bytes 0x3131: the values in the variable buffer which the simple procedure changes. So if we modify the C code *(buffer + 1) += 1 to *(buffer +20) += 1 we will now be modifying the return address.
But if we can modify the return address, then we can make the program skip the part where c gets set to '2'. Looking at the disassembly from earlier, the instruction movsbl occurs 4 bytes after the movb instruction (31 - 27 = 4). Because x86-64 is little-endian, buffer + 20 is the least significant byte of the return address (0xd3), so adding 4 to it turns 0x00005555555551d3 into 0x00005555555551d7. Therefore if we replace *(buffer + 1) += 1 with *(buffer +20) += 4, we should skip past the part where c gets set to '2'.
// bo.c
#include <stdio.h>

void simple(char a) {
    char buffer[4];
    *buffer = a;
    *(buffer + 1) = a;
    *(buffer +20) += 4;
}

void main() {
    char c = '1';
    simple(c);
    c = '2';
    printf("c is: %c\n", c);
}
With this change, when we compile and run the modified program...
$ gcc -o bo-sol bo.c
$ ./bo-sol
c is: 1
$
we get c is: 1 like we wanted! (And we only had to change 3 characters in our program to do so!)
(I should note that I was somewhat lucky: the change only adds to the lowest byte of the return address, so if that byte had held a value above 251, adding four would have wrapped around without carrying into the next byte and produced an incorrect return address. The odds were in my favour.)
So how is all of this relevant to crafting a browser exploit? We just changed a return address on the stack by altering the source code, but with a buffer overflow it may be possible to change a return address on the stack simply by overwriting it with input data. Once you can modify the return address, you may be able to make the program execute instructions of your own choosing, whereby the fun begins.
I had to do all of the above and more for a project from a course I'm taking this semester (COMP3410), which tasked us with the following:
The course provided us with a specific and very minimal browser to exploit. The binary was made up of RISC-V instructions rather than x86 ones, which meant there was not as much documentation available on crafting an exploit from a buffer overflow vulnerability, but the RISC-V ISA is actually comprehensible. (Recall that x86 is a gargantuan mess.) The project's course page did give us a massive hint by linking to a 1996 article from Phrack magazine which described how to develop an exploit from a buffer overflow vulnerability in an x86 binary on Linux.
Although the article was almost 25 years old, the C programming language is older, so much of the article held up. In fact, my puzzle at the beginning of this post was inspired by the article. Our course project was made significantly easier because the machine running the RISC-V browser did not randomize the location of the browser's stack. (I.e. it did not have ASLR turned on.) And in fact, to make the assignment even easier, the browser was run on a simulator which assigned the same stack location to the browser each time. As such, my exploit was very small and had the following structure in bytes.
AAAAAAAAAAAAAAAA
BBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBB
BBBBBBBBBBBBBBBB
BBBCCCCDDDDDDDDD
DDDDDDDDDDDDDDDD
DDDDDDDDDDDDDDDD
DDDDDDDDEEEEEEEE
EEEEEEEEEEEEEEEE
EEEEEEEEEEEEEE
In case the assignment is used when the course is taught again I don't want to post my actual exploit, but here is what each of its components did:
- A: These bytes made the browser take a branch to a return address which could be changed with a buffer overflow.
- B: Filler bytes to get to the return address.
- C: The return address. I replaced it with the memory address of the word following it in the stack.
- D: RISC-V instructions to print the string in the next component, and to replace specific characters in the next component with newline characters. The newlines had to be patched in this way because a payload containing literal newlines would not have been written to the stack in full.
- E: A string to demonstrate we exploited the program.
Overall I had a great time with the project. Everyone's heard of buffer overflow vulnerabilities, but actually turning a buffer overflow into an exploit required non-trivial knowledge. Knowing how to program in Python or JavaScript is certainly not enough to craft an exploit; doing so requires knowledge of how computers actually work. As I discovered, utilizing that knowledge to craft an exploit was very fun.