Rest of the RE posts can be viewed here.

Reverse Engineering Journey

Reverse engineering is a deep topic I’ve been interested in for a while, and now I feel ready to dive in. I’m starting with the book Reverse Engineering for Beginners by Dennis Yurichev, and I’ll move to other resources as needed.

Chapter 1: Code Patterns

The book begins with an introduction to the author’s background, his tech journey, and a discussion on compilers and computer architectures. It also touches on modern compilers and how they excel at optimizing code. It’s a fun and informative read.

1.2.1 A Short Introduction to the CPU

This section explains the CPU, its function, and key terms. Here are some important definitions:

Instruction: A primitive CPU command. Examples include moving data between registers, working with memory, and basic arithmetic operations. Each CPU has its own Instruction Set Architecture (ISA).
Machine Code: Code that the CPU directly processes. Each instruction is typically encoded with several bytes.
Assembly Language: A human-readable representation of machine code, often with extensions like macros to simplify programming.
CPU Register: A fixed set of general-purpose registers (GPRs) within a CPU. For example, x86 has around 8 registers, x86-64 has about 16, and ARM has around 16. A register is essentially an untyped temporary variable, which is a powerful tool in assembly programming.

This introduction highlights the importance of assembly and how it relates to machine code and higher-level languages like C, C++, and Java.

1.3 An Empty Function

The book explores the assembly dump of an empty C function:

void f() {
    return;
};

This function does nothing and returns nothing. Using GCC and MSVC on Godbolt, we get the following assembly:

f       PROC
        ret     0
f       ENDP

In my study, I’ll focus on x86-64 and ARM architectures, which are the areas I’m most interested in exploring. The code above differs slightly from the book’s output, even with the /O7 optimization flag:

f:
    ret

After changing the compiler to x86-64 GCC with the -O3 flag, I got the same result. So, it’s important to always check the compiler version and optimization settings.

1.4.2 ARM

For ARM, the assembly output is a bit different:

f   PROC
    BX      lr  # For branching
    ENDP

I found this cheatsheet for ARM assembly. The BX instruction is used for branching, returning to the caller, whose address is stored in the lr (link register).

Hello World in Assembly

Here’s the assembly code to print “Hello, World!” in x86-64 assembly:

.global _start
.intel_syntax noprefix

.section .text
_start:    

    // sys_write call    
    mov rax, 1
    mov rdi, 1    
    lea rsi, [hello_world]
    mov rdx, 14 
    syscall
    
    // sys_exit call to exit the program
    mov rax, 60
    mov rdi, 0
    syscall

.section .data
hello_world:
    .asciz "Hello, World!\n"

.section .bss

The program contains 4 main sections:

Global section: Identifies the entry point.
Text section: Contains the actual code instructions.
Data section: Holds initialized variables.
BSS section: Holds uninitialized variables.

In the .text section, two system calls are made:

sys_write: To print out the “Hello, World!” string. The syscall table specifies the arguments:

%rax	System call	%rdi	%rsi	%rdx	%r10	%r8	%r9
1	sys_write	unsigned int fd	const char *buf	size_t count

The assembly code corresponding to the sys_write syscall is:

mov rax, 1               // 1 = sys_write (this tells Linux we want to write)
mov rdi, 1               // File descriptor: 1 (stdout)
lea rsi, [hello_world]   // Address of the string (pointer)
mov rdx, 14              // Number of bytes to write
syscall                  // Perform the syscall

Running the Assembly Code

To assemble and run the code, follow these steps:

Assemble the source file (hello-world.s) into an object file (hello-world.o):
```
as hello-world.s -o hello-world.o
```
Link the object file to create an executable (hello-world):
- The -nostdlib flag prevents linking with the standard C libraries.
- The -static flag ensures the executable is statically linked.
```
gcc -o hello-world hello-world.o -nostdlib -static
```
Run the executable to see the output:
```
./hello-world
```

This will output “Hello, World!” to the terminal. The second syscall is for sys_exit which enables us to exit from the code:

%rax	System call	%rdi	%rsi	%rdx	%r10	%r8	%r9
60	sys_exit	int error_code

So only two registers are needed, RAX to hold the syscall number, RDI to hold the return value then syscall to excute the call.

Final Thoughts

While I’ve covered x86-64 and ARM architectures, other compilers and architectures (like MIPS) are also mentioned in the book, though I’m not covering them in my study for now.

Reverse engineering for beginners walkthrough - 1