Rest of the RE posts can be viewed here.
Today we’ll talk about the sections of assembly program and we’ll use it to write a simple “Hello, World!” application. We’ll focus on x86-64 architecture but generally same applies to ARM.
The General Structure of Assembly program
Generally, a x86-64 assembly program consists of these sections:
.global _start
.intel_syntax noprefix
.section .text
_start:
.section .data
.section .bss
The program contains 4 main sections:
- Global section: Identifies the entry point, _start in our case.
- Text section: Contains the actual code instructions.
- Data section: Holds initialized variables.
- BSS section: Holds uninitialized variables.
The Global Section
To execute a program we need an entrypoint to start with (similar to main in C), which is what the first line for. The general syntax is:
.global <entry point label>
You don’t have to use start and “” isn’t mandatory, however the convention is to use an underscore for system-related or special symbols.
As to the second line, assembly follows two main syntax styles, intel syntax and AT&T Syntax with some differences between each other, ref, I’ll be using the intel syntax by adding:
.intel_syntax noprefix
The noprefix keyword is so we don’t use the prefix ‘%’ which makes writing the code easier, the abscence of the above directive changes the syntax style to AT&T style, so instead of:
.intel_syntax noprefix
mov eax, 1 ; Intel style were destination before the source and % isn't used
We get:
mov $1, %eax // notice the prefix % is added to the register name and the source preceeds the destination
The TEXT Section
In the .text section, we modify registers and write the actual instructions and syscalls which we will talk about later. To print something to the standard output we can take advantage of the sys_write syscall, which is easy to translate to assembly by following the syscall table, let’s take a look at it’s elements:
| %rax | System call | %rdi | %rsi | %rdx | %r10 | %r8 | %r9 |
|---|---|---|---|---|---|---|---|
| 1 | sys_write | unsigned int fd | const char *buf | size_t count |
The assembly code corresponding to the sys_write syscall is:
mov rax, 1 // 1 = sys_write (this tells Linux we want to write)
mov rdi, 1 // File descriptor: 1 (stdout)
lea rsi, [INITIALIZED VARIABLE NAME] // Address of the string (pointer)
mov rdx, 14 // Number of bytes to write
syscall // Perform the syscall
We use the opcode mov to copy (not actually move) the data into the registers, meanwhile if we want to copy and address of a variable which holds our string we use lea, in the sys_write table entry we see that rsi should have a pointer to a buffer which is how we know it requires an address to such buffer not the actual value.
We end the syscall with the Opcode syscall to execute it. Just like in other programming languages like C, we need to exit from our code somehow, so we can use sys_exit syscall which only requires two registers!
mov rax, 60
mov rdi, 0
syscall
The DATA Section
Here we declare the variables and initialize them, for example to initialize a null terminated string:
.section .data
hello_world:
.asciz "Hello, World!"
It’s important that the variables are initialized, and to use them we refer to their address with the instruction lea (load effective address).
The BSS Section
This section is also to save variables but unlike the DATA section they’re not giving an initial value.
Putting It All Together
Now let’s use this information to print “Hello, World!” to the screen, for that we will use the syscall “sys_write” and “sys_exit” which we discussed above:
- Let’s write the sections first:
.global _start
.intel_syntax noprefix
.section .text
_start:
.section .data
.section .bss
- Then let’s create our variable to store the string “Hello, World!”
.global _start
.intel_syntax noprefix
.section .text
_start:
.section .data
hello_world:
.asciz "Hello, World!\n"
- Now we will invoke sys_write syscall to print the Hello, world! string:
.global _start
.intel_syntax noprefix
.section .text
_start:
// sys_write call
mov rax, 1
mov rdi, 1
lea rsi, [hello_world]
mov rdx, 14
syscall
// sys_exit call to exit the program
mov rax, 60
mov rdi, 0
syscall
.section .data
hello_world:
.asciz "Hello, World!\n"
.section .bss
Running the Assembly Code
To assemble and run the code, follow these steps:
-
Assemble the source file (
hello-world.s) into an object file (hello-world.o):as hello-world.s -o hello-world.o -
Link the object file to create an executable (
hello-world):- The
-nostdlibflag prevents linking with the standard C libraries. - The
-staticflag ensures the executable is statically linked.
gcc -o hello-world hello-world.o -nostdlib -static - The
-
Run the executable to see the output:
./hello-world
It’s ok if you don’t understand yet how these commands work, we will explain them in a separate post.
Final Thoughts
Now you have wrote your first assembly program, we are ready to move deeper into other topics and understand this language better.