Exploring ELF Binary Dynamics: Relocations and Sections in Depth

March 18, 2023

Introduction: My Journey Through ELF Binaries in the Linux Binary Exploitation Series

Welcome to another chapter in my ongoing series on Linux binary exploitation, where I delve into the intricate world of Executable and Linkable Format (ELF) binaries. As I continue to explore the various aspects of binary exploitation on Linux, this installment is particularly special to me. It's here that I unravel the complexities and nuances of ELF binaries, a cornerstone of Linux and Unix systems.

In my journey through the realms of software development and cybersecurity, I've encountered ELF binaries as more than just files; they are the essential gears that drive software’s interaction with the Linux operating system. This chapter builds on what I've covered in previous parts of the series, taking a deep dive into the heart of ELF binaries. From the basic principles of disassembly and decompilation to the advanced realms of dynamic linking and vulnerability analysis, my aim is to demystify each element, offering a clear and comprehensive understanding.

Through this chapter, I will guide you through the anatomy of ELF binaries, exploring their sections, segments, memory management, and how they seamlessly integrate with dynamic libraries. Whether you are a seasoned programmer, an aspiring cybersecurity expert, or a newcomer to this field, my insights are crafted to enhance your understanding of how Linux programs operate and how to secure them effectively.

Join me as I continue this fascinating journey, connecting the knowledge from previous articles to provide a richer, more integrated understanding of Linux binary exploitation. This exploration is not just an academic exercise; it's a practical guide filled with the knowledge essential for navigating the modern landscape of computing. Let's dive in and uncover the secrets of ELF binaries together, as I share my learnings and discoveries in this captivating chapter of the Linux Binary Exploitation series.

Unveiling the Secrets of Binaries: Disassembly and Decompilation

As we navigate through the realm of ELF binaries, it's essential to familiarize ourselves with the concepts of disassembly and the captivating world of decompilation. While we won't delve too deeply into the technicalities of disassembly just yet, let's explore its essence.

Picture disassembly as a kind of magic that transforms cryptic machine code—those baffling strings of 1s and 0s—into a more understandable form of assembly language. It's akin to peeling back the layers of a binary file, offering a glimpse into its core, almost like peering into the soul of the binary. This understanding is crucial, especially when tackling exploits and vulnerabilities.

Now, you might wonder, "Is it possible to reverse-engineer machine code back into high-level languages like C or C++?" It's an intriguing thought, but the reality is a bit more complex. The original code morphs significantly during compilation as it's optimized for performance and efficiency. Therefore, a perfect reverse-engineering to its original state is often unfeasible.

However, there's a ray of hope: decompilers. These are the unsung heroes of reverse engineering, capable of translating machine code into pseudocode that bears a strong resemblance to C/C++. While we're not diving into the deep end with these tools just yet, it's important to know they're part of our arsenal. So, get ready for an exciting exploration into the world of ELF binaries!

Disassembly and decompilation sample

Object File vs. Executable File: Unveiling Their Distinctions

In this section, we embark on an enlightening journey to discern the critical differences between an ELF binary's object file, created post-compilation, and the executable file, born from the linking phase. This exploration is key to understanding the complexities of function and variable relocation during linking.

Object Files: A Closer Look

We begin by exploring object files, utilizing the capabilities of radare2. Starting with the command r2 test.o, we enter the radare2 environment. Here, we execute a detailed analysis using the aaa command, effectively identifying functions and key elements in the file.

[0x08000040]> aaa
				...
[0x08000040]> afl
0x08000040    1     32 sym.main
[0x08000040]> 

This initial analysis identifies only the main function. Library functions such as puts are not yet recognized as functions, because the linking phase has not resolved them.

To delve into the main function's code, we employ the pdf command. Notably, radare2 indicates that the string "Hello, world!" resides in the .rodata section, highlighting its need for relocation—a direct consequence of the yet-to-be-performed linking phase. The iz command can extract strings from the object or executable:

[0x08000040]> iz
[Strings]
nth paddr      vaddr      len size section type  string
―――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00000060 0x08000060 13  14   .rodata ascii Hello, world!
Disassembly of object code

In our analysis, we also observe the puts function being called—an imported function identified within the file. To confirm this, the is command lists the symbols in the file, showing puts as imp.puts (import puts):

[0x08000040]> is
[Symbols]

nth paddr      vaddr      bind   type   size lib name
―――――――――――――――――――――――――――――――――――――――――――――――――――――
1   0x00000000 0x08000000 LOCAL  FILE   0        test.c
2   0x00000040 0x08000040 LOCAL  SECT   0        .text
3   0x00000060 0x08000060 LOCAL  SECT   0        .rodata
4   0x00000040 0x08000040 GLOBAL FUNC   32       main
5   0x00000000 0x08000000 GLOBAL NOTYPE 16       imp.puts
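
What marks puts as an import is visible in the symbol table itself: an undefined symbol carries the section index SHN_UNDEF (0), while main points at a real section. The sketch below is only an illustration of that idea, assuming a little-endian 64-bit ELF; the two packed symbols use made-up values rather than bytes read from test.o, and describe_symbol is a hypothetical helper name.

```python
import struct

# Little-endian Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM64 = struct.Struct("<IBBHQQ")  # 24 bytes
SHN_UNDEF = 0
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def describe_symbol(raw: bytes) -> dict:
    """Decode one symbol table entry (hypothetical helper for illustration)."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = SYM64.unpack(raw)
    return {
        "bind": BINDINGS.get(st_info >> 4, "OTHER"),    # high 4 bits of st_info
        "type": TYPES.get(st_info & 0xF, "OTHER"),      # low 4 bits of st_info
        "size": st_size,
        # Simplification: the reserved null symbol at index 0 also has shndx 0.
        "imported": st_shndx == SHN_UNDEF,
    }


# Made-up entries shaped like main (defined in section 1) and puts (undefined).
main_sym = SYM64.pack(1, (1 << 4) | 2, 0, 1, 0, 32)
puts_sym = SYM64.pack(6, (1 << 4) | 0, 0, SHN_UNDEF, 0, 0)

if __name__ == "__main__":
    print("main:", describe_symbol(main_sym))
    print("puts:", describe_symbol(puts_sym))
```

After linking, puts stays undefined in the executable's dynamic symbol table, which is why radare2 keeps labeling it as an import.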

Executable Files: A Detailed Analysis Post-Linking

Having explored object files, we now shift our focus to executable files, particularly those generated after the linking phase. This step provides us with a more complete and intricate understanding of the ELF binary.

Insights into the Executable File

Upon examining an executable, we immediately notice a significant increase in the number of recognized functions compared to the object file. Among these, three functions are of particular interest: main, sym.imp.puts, and entry0. Let's start with main.

Running the afl command in radare2, we see a list of functions that includes main:

[0x00401040]> afl
0x00401040    1     37 entry0
0x00401080    4     31 sym.deregister_tm_clones
0x004010b0    4     49 sym.register_tm_clones
0x004010f0    3     32 sym.__do_global_dtors_aux
0x00401120    1      6 sym.frame_dummy
0x00401148    1     13 sym._fini
0x00401070    1      5 loc..annobin_static_reloc.c
0x00401126    1     32 main
0x00401030    1      6 sym.imp.puts
0x00401000    3     27 sym._init

At this stage, the analysis reveals more information:

  • Relocation of Strings: The string no longer needs relocation, as it did in the object file. radare2 now reports the address of "Hello, world!" directly: 0x00402010.
[0x00401126]> iz
[Strings]
nth paddr      vaddr      len size section type  string
―――――――――――――――――――――――――――――――――――――――――――――――――――――――
0   0x00002010 0x00402010 13  14   .rodata ascii Hello, world!
  • Function Definitions: The executable now lists puts as sym.imp.puts, which marks it as a symbol (sym) referring to an imported function (imp). radare2 also shows a brief prototype for the function, such as int puts(const char *s), which aids in analysis.
  • The entry0 Function: Commonly known as _start in other tools, entry0 is the standard entry point of ELF binaries compiled with gcc. Its primary role is to set up the command-line arguments and the environment before main runs. The assembly code for entry0 typically shows it calling __libc_start_main, which in turn calls main with the appropriate arguments.
entry0 code

Sections of a Binary: Foundations for Analyzing ELF Binaries

Before delving into various exploitation techniques, it's crucial to understand the last piece of foundational theory relevant to ELF binary analysis: the sections of a binary. Although more theoretical aspects will be introduced in future articles, understanding binary sections is essential for a comprehensive grasp of ELF files and their exploitation.

Understanding Binary Sections

  • What Are Sections?: Sections in a binary are essentially logical divisions of the code and data. They don't adhere to a specific structure; rather, their structure is determined by their content.
  • Section Headers: Each section is described by what is known as a section header. These headers collectively form the section header table. Although we won't delve deeply into each header part, it's important to note that their definitions can be found in /usr/include/elf.h.
Structure of an executable of type ELF
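
To make the idea of a section header concrete, here is a minimal sketch that unpacks one entry laid out like the Elf64_Shdr structure from elf.h. It assumes a little-endian 64-bit binary, and the bytes it parses are synthetic, made-up values shaped like a .text entry rather than data read from the test executable. The names SectionHeader and parse_section_header are my own.

```python
import struct
from collections import namedtuple

# Field order follows Elf64_Shdr in /usr/include/elf.h.
SectionHeader = namedtuple(
    "SectionHeader",
    "sh_name sh_type sh_flags sh_addr sh_offset sh_size "
    "sh_link sh_info sh_addralign sh_entsize",
)

# Little-endian layout: two 32-bit words, four 64-bit, two 32-bit, two 64-bit.
SHDR64 = struct.Struct("<IIQQQQIIQQ")  # 64 bytes in total


def parse_section_header(raw: bytes) -> SectionHeader:
    """Decode one section header table entry (hypothetical helper)."""
    return SectionHeader(*SHDR64.unpack(raw[:SHDR64.size]))


# Synthetic entry with made-up values: type 1 is SHT_PROGBITS, flags 0x6 is ALLOC|EXECINSTR.
raw = SHDR64.pack(27, 1, 0x6, 0x401040, 0x1040, 0x115, 0, 0, 16, 0)
text = parse_section_header(raw)

if __name__ == "__main__":
    print(f"addr={text.sh_addr:#x} offset={text.sh_offset:#x} size={text.sh_size}")
```

On a real file, the ELF header fields e_shoff, e_shentsize, and e_shnum say where the table starts, how large each entry is, and how many entries to read.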

Role of Sections in a Binary

  • Linker Assistance: Sections are primarily designed to aid the linker. This means not all sections are needed to execute the binary in memory. For instance, some symbol and relocation information serves linking and debugging rather than runtime.
  • Segments and Execution: When a binary is executed, its code and data are organized differently, known as segments. While we won't cover this concept in detail here, it's an important aspect to keep in mind.

Exploring Sections in ELF Files on GNU/Linux

  • Using radare2 for Section Analysis: To examine the sections of ELF files, tools like radare2 can be very useful. Commands such as rabin2 -S test or iS within radare2 can provide detailed information about these sections.
  • Permissions in Sections: When analyzing sections, you'll encounter various permissions:
    • Read (r): Allows reading the contents of the section.
    • Write (w): Indicates whether writing in the section is permissible.
    • Execute (x): Determines if the section's code can be executed.
Sections of the test executable
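
The r, w, and x letters come from the sh_flags field of each section header. The following sketch shows one way to derive them, under an assumption worth stating loudly: ELF sections have no dedicated read flag, so I treat SHF_ALLOC (the section is loaded into memory) as readable. The flag values are the standard ones from elf.h; section_perms is a hypothetical helper name.

```python
# Standard section flag bits from elf.h
SHF_WRITE = 0x1      # section is writable at run time
SHF_ALLOC = 0x2      # section occupies memory during execution
SHF_EXECINSTR = 0x4  # section contains executable instructions


def section_perms(sh_flags: int) -> str:
    """Render sh_flags as an rwx string (assumes ALLOC implies readable)."""
    r = "r" if sh_flags & SHF_ALLOC else "-"
    w = "w" if sh_flags & SHF_WRITE else "-"
    x = "x" if sh_flags & SHF_EXECINSTR else "-"
    return r + w + x


if __name__ == "__main__":
    # Typical flag combinations for three well-known sections
    for name, flags in [(".text", 0x6), (".data", 0x3), (".rodata", 0x2)]:
        print(f"{name:8} {section_perms(flags)}")
```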

Diving into the .init, .fini, and .text Sections of ELF Binaries

Venturing into the world of ELF binaries, it's essential to understand the unique roles and characteristics of specific sections like .init, .fini, and .text. These sections are more than just parts of a binary; they are the keystones in understanding how a program functions from start to finish.

The .init and .fini Sections: The Bookends of Program Execution

  • The Role of .init: Think of the .init section as the warm-up act before the main performance. Its code runs before the program's main function, akin to an object constructor in object-oriented programming. The x in its permissions tells us that this section holds code meant to be executed.
  • Understanding .fini: On the flip side, the .fini section is like the final bow after a show. It runs after the main program, wrapping things up in a manner similar to an object's destructor. It's where the program does its final clean-ups.

The .text Section: Where the Main Action Happens

  • A Focus on Main Code: The .text section is where the heart of the program beats. It's the main stage where all the primary actions and operations of the program are performed.
  • Security in Permissions: Noticeably, this section is typically marked with r (read) and x (execute) permissions, but pointedly lacks the w (write) permission. This isn't an oversight; it's a security measure. Allowing both execute and write permissions would be like leaving the door wide open for attackers.
  • Analyzing the .text Section: To get under the hood of the .text section, we use radare2's iS command to find its memory address and size. Then the pD command, which disassembles a given number of bytes, reveals the program's code when we point it at that address with @:
iS
pD <size in bytes> @ <address of .text>
.text section

The .bss, .data, and .rodata Sections in ELF Binaries

When dissecting the structure of ELF binaries, three crucial sections emerge for organizing different types of variables: .bss, .data, and .rodata. Each of these sections plays a distinct role in how variables are stored and managed within an executable.

Understanding the Different Sections

  • The .bss Section: This is where uninitialized global and static variables reside. Variables of this kind that are declared but not assigned a value find their home here. The section takes up no space in the file; its memory is reserved and zero-filled when the program is loaded. It's like a blank canvas waiting for data to be painted on it during runtime.
  • The .data Section: In contrast, the .data section houses initialized global and static variables. These are the variables that are not only declared but also assigned a value. It's akin to a pre-filled canvas, where certain elements are already defined and set.
  • The .rodata Section: Standing for "Read-Only Data", the .rodata section is reserved for constants, including string literals. These are values that are set once and don't change throughout the execution. They are the immutable truths of the program.

Permissions and Security Implications

  • Write Permissions in .data and .bss: Both the .data and .bss sections are given write permissions, aligning with their roles in storing variables that might change or be initialized during the program's execution.
  • Read-Only Nature of .rodata: In contrast, the .rodata section is read-only. This makes sense as it contains constants - values that should remain unchanged and protected from modification.
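
As a quick way to keep the three sections apart, the rules above can be written down as a tiny decision function. This is a deliberately simplified sketch of where a compiler such as gcc typically places a C global or static variable; real toolchains have more cases. The function name and the sample declarations are my own.

```python
def typical_section(is_const: bool, initializer) -> str:
    """Guess the section of a C global/static variable (simplified rules).

    initializer is None when the declaration assigns no value.
    """
    if is_const:
        return ".rodata"          # constants and string literals
    if initializer is None or initializer == 0:
        return ".bss"             # uninitialized, or zero-initialized in practice
    return ".data"                # initialized with a non-zero value


# (C declaration, is_const, initializer) triples used purely as examples
EXAMPLES = [
    ("int counter;", False, None),
    ("int limit = 10;", False, 10),
    ('const char msg[] = "Hello, world!";', True, "Hello, world!"),
]

if __name__ == "__main__":
    for decl, is_const, init in EXAMPLES:
        print(f"{decl:40} -> {typical_section(is_const, init)}")
```

Local variables are not covered here on purpose: they live on the stack, not in any of these sections.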

Practical Example: "Hello, world!"

In the context of our ongoing ELF binary analysis, the string "Hello, world!" is a constant. Therefore, we find it in the .rodata section. It's a classic example of how constant data, like strings displayed to the user, is stored in a protected, read-only section to ensure it remains unaltered throughout the program's operation.

.rodata section

Navigating the World of Lazy Binding, PLT, and GOT in ELF Binaries

Welcome to the intriguing world of ELF binaries, where the integration of dynamic libraries during a program's run time is a ballet of efficiency and optimization. Here, we're going to unravel the mysteries of lazy binding, the Procedure Linkage Table (PLT), and the Global Offset Table (GOT) - three protagonists in this fascinating process.

Lazy Binding: An Overview

  • Dynamic Linking with Lazy Binding: References to a dynamic library are relocated when an executable is loaded into memory, but function references are not fully resolved at that point. Instead, each one is resolved "lazily", the first time the function is called. This approach, known as lazy binding, saves work by never resolving functions that are not called. It is the traditional default of the dynamic linker, although it can be turned off, for example by setting the LD_BIND_NOW environment variable or by linking with -z now.
  • Utilizing PLT and GOT: Lazy binding is facilitated by two main sections - the Procedure Linkage Table (.plt) and the Global Offset Table (.got).

Understanding PLT and GOT

  • Procedure Linkage Table (PLT): This section contains an entry for each function that requires dynamic linking. An entry in the PLT typically includes:
    1. An indirect jump through the corresponding entry in the GOT.
    2. A push of the function's identifier onto the stack.
    3. A jump to the dynamic linker's default stub.
  • Analyzing PLT: To view the .plt section, use iS to list the sections and pD <size in bytes> @ <address> to disassemble its content, just as with the .text section.
  • Global Offset Table (GOT): The GOT holds the memory addresses of dynamically linked functions. Initially, each of these addresses points back into the PLT, because lazy binding has not yet resolved the function.
.plt section
Jump address in .got.plt

The Lazy Binding Ballet

Dynamic linking process
  1. The Function Call: Let's say our program calls puts. This triggers the sequence in the PLT.
  2. PLT-GOT Tango: The PLT then gracefully jumps to the GOT entry, which for now, loops back to the PLT, ensuring the function identifier is noted.
  3. The Dynamic Linker's Cue: Next, we leap to the default stub, a preparatory step before the main performance by the dynamic linker.
  4. The Final Performance: The dynamic linker takes center stage, modifying the GOT to directly point to puts, streamlining all future calls.
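
The four steps above can be mimicked with a toy model. To be clear, this is a simulation of the idea and not how the dynamic linker is implemented: the GOT is a dictionary, each slot starts out pointing at a resolver stub, and the first call patches the slot with the real function. LazyBinder, fake_puts, and the other names are hypothetical.

```python
class LazyBinder:
    """Toy model of lazy binding through a PLT and a GOT."""

    def __init__(self, library: dict):
        self.library = library   # stands in for the shared library's functions
        self.resolutions = 0     # how many times the "dynamic linker" ran
        # Every GOT slot initially points at a stub, not at the real function.
        self.got = {name: self._make_stub(name) for name in library}

    def _make_stub(self, name: str):
        def stub(*args):
            # Steps 2 to 4: note the identifier, resolve it, patch the GOT.
            self.resolutions += 1
            self.got[name] = self.library[name]
            return self.got[name](*args)
        return stub

    def call_via_plt(self, name: str, *args):
        # Step 1: the PLT entry jumps through whatever the GOT slot holds.
        return self.got[name](*args)


def fake_puts(s: str) -> int:
    """Stand-in for puts; returns the number of characters it would write."""
    return len(s) + 1  # +1 for the trailing newline


if __name__ == "__main__":
    binder = LazyBinder({"puts": fake_puts})
    binder.call_via_plt("puts", "Hello, world!")  # first call resolves the symbol
    binder.call_via_plt("puts", "Hello, world!")  # second call goes straight through
    print("resolutions:", binder.resolutions)
```

The detail to notice is that the resolver runs once per function, no matter how many times that function is called afterwards.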

GOT vs. GOT.plt: The Two Arenas

  • GOT.plt for Functions: The .got.plt is where the magic happens for function references. It's dedicated to making sure function calls from shared libraries hit their mark.
  • GOT for Variables: The .got, on the other hand, is like a storage unit for variables or constants from shared libraries, bypassing the more complex dance steps needed for functions.

Conclusion: Navigating the Depths of ELF Binaries

As we conclude our exploration of ELF binaries, we find ourselves having journeyed through a landscape rich in complexity and sophistication. From dissecting the very essence of disassembly and decompilation to demystifying the intricacies of sections like .text, .init, .fini, and others, we've unraveled the fundamental components that constitute these binaries. We've seen how they are meticulously structured, how they cleverly manage memory, and how dynamic libraries intertwine seamlessly with the program's execution through mechanisms like lazy binding, PLT, and GOT.

This excursion into the world of ELF binaries isn't just an academic exercise; it's a deep dive into the underpinnings of how software operates at its core. By understanding these elements, we're not just reading code; we're interpreting the language of the machine. We gain insights into the subtleties of how programs are executed, how they interact with the operating system, and how vulnerabilities can emerge and be exploited.

The knowledge of ELF binaries is invaluable for developers, security researchers, and anyone fascinated by the inner workings of software. It empowers us to write more efficient and secure code, to analyze and understand existing software at a granular level, and to think creatively about problem-solving in the realm of computing.

In essence, the journey through ELF binaries is a journey through the heart of computing, offering a foundational understanding that is both powerful and indispensable in the rapidly evolving landscape of technology. As we continue to build and secure the digital world, the insights gained here will undoubtedly serve as a guiding light, illuminating the path forward in the ever-expanding domain of software development and cybersecurity.
