Introduction
We all know that writing proper Assembly is hard.
While constructing even a simple Assembly file, we are constantly tormented
by weird compiler errors, segmentation faults, undefined behaviour
and more.
During my first few weeks at a university that forces us to write Assembly
by hand, I have come across many fellow students who went ape about
some weird error they weren't able to solve.
It's not their fault though. Expecting students (some of whom have no
previous programming experience) to write god-tier GAS Assembly in the
first three weeks of the programme simply isn't realistic when all they
get is a tedious and incomplete self-study guide.
Resources about GAS Assembly on the internet are scarce, because hardly
anyone writes Assembly by hand these days, and those who do mostly use the
Intel syntax rather than the AT&T syntax that GAS uses by default.
Yet Assembly becomes ten times easier once you know what you're doing.
It's time to write a proper guide.
Hello world: not that simple
Let's start with a proper simple hello world program. I will explain in detail what every line is doing.
 1  .data
 2
 3  hello_world_str:
 4      .string "Hello, World!"
 5
 6
 7  .text
 8
 9  .global main
10  main:
11
12      # prologue
13
14      pushq %rbp                      # save rbp
15      movq %rsp, %rbp                 # create a new stackframe
16
17      # output "Hello, World!" and a newline
18
19      movq $hello_world_str, %rdi
20      call puts
21
22      # epilogue
23
24      movq %rbp, %rsp                 # end the stackframe
25      popq %rbp                       # restore the previous stackframe
26      movq $0, %rax                   # exit successfully
27      ret                             # returning from the main function exits the program
Let's save the file as app.s and compile it with the following command:
    gcc app.s -o app -no-pie -g
This command calls the GNU Compiler Collection to assemble the file app.s
and link it into an executable.
With the -o flag we specify that we want the output executable to
be called app.
The -no-pie flag tells gcc not to produce a Position Independent
Executable.
PIE is a security feature that allows the program to be loaded at a
different memory address on every run; we disable it to make our
lives easier when writing Assembly.
Finally, the -g flag denotes that we want
to add debugging symbols to our executable, which will be very helpful
when we have to debug our program.
Let's go through the source code line by line.
1: We start by defining the data section.
In this section we put the string we want to print.
The data section can also be used to store any other data our program
might need, such as global variables.
3: This is a label which we can use to
address the string containing our hello world message.
4: The .string directive denotes that we want
to store a string of characters. The assembler will append a null byte to
the string, which is used to identify the end of the string.
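To see what the appended null byte buys us, here is a small sketch that stores the same text twice: once with .string, and once with the .ascii directive, which stores the characters exactly as written, so the null byte has to be added by hand. The label names with_string and with_ascii are made up for this example. Built with the same gcc command as above, it should print Hello on two separate lines.

```asm
.data

with_string:
    .string "Hello"             # the assembler appends the null byte for us

with_ascii:
    .ascii "Hello\0"            # .ascii appends nothing, so we write \0 ourselves

.text

.global main
main:
    pushq %rbp
    movq %rsp, %rbp

    movq $with_string, %rdi
    call puts                   # prints "Hello"

    movq $with_ascii, %rdi
    call puts                   # prints "Hello" again: both strings end in a null byte

    movq %rbp, %rsp
    popq %rbp
    movq $0, %rax
    ret
```

Leaving out the \0 in the .ascii line would make puts keep reading past the end of the text, because nothing tells it where the string stops.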
7: Next we define the text section.
This section will hold the actual code of our program.
9: We declare main as a global symbol.
Without this, the linker cannot see the main function and the build
would fail.
10: Here we put the label for the main
function. This function is special, because the C runtime startup code
that gcc links into our executable calls main once the program starts.
12: Most functions we will write
require a prologue.
In the prologue we create a new stack frame.
A stack frame is a place for a function to keep its local variables.
14: We have to ensure the stack frame of the
caller is left untouched, so we save the caller's stack frame pointer
(also known as the base pointer, %rbp) by pushing it onto the stack.
15: This line copies the stack pointer (%rsp), which points to the top
of the stack, into the base pointer. This is where the new stack frame
starts.
19: In order to print the hello world string,
we move its address into the %rdi
register, which holds the value of the first argument for a function.
The dollar sign in front of the hello world string label indicates that we
take the address of the label, instead of the first bytes stored at the
label.
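The difference the dollar sign makes can be made visible with a small experiment. This is only a sketch: it assumes the same -no-pie build as above, and the first_bytes buffer is a name invented for this example. Without the dollar sign, movq reads the 8 bytes stored at the label (a 64-bit register holds exactly 8 bytes). We copy those bytes into the buffer and print them, then print the whole string the usual way.

```asm
.data

hello_world_str:
    .string "Hello, World!"

first_bytes:
    .zero 9                         # 8 bytes for the copy, plus a null byte to end the string

.text

.global main
main:
    pushq %rbp
    movq %rsp, %rbp

    # without $: load the first 8 bytes stored at the label
    movq hello_world_str, %rax
    movq %rax, first_bytes          # store those 8 bytes in our buffer
    movq $first_bytes, %rdi         # with $: the address of the buffer
    call puts                       # prints "Hello, W"

    # with $: load the address of the label
    movq $hello_world_str, %rdi
    call puts                       # prints "Hello, World!"

    movq %rbp, %rsp
    popq %rbp
    movq $0, %rax
    ret
```

Forgetting the dollar sign in front of a label that is passed to puts is a classic source of segmentation faults: puts would then treat the characters of the string as if they were an address.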
20: Now we call the puts function. It is a function from the C
standard library. As its name (short for "put string") implies, it puts
a string onto our screen.
puts expects the address of a string as its
first parameter, and it will print all characters one by one until it
finds a null byte, indicating the end of the string.
Finally it will end the line by printing the newline character.
22: We conclude the function with an
epilogue, which does the opposite of the prologue.
It will clean up the current stack frame, so we can safely pass control
back to the caller.
24: This brings the stack pointer back to
where it was at the end of the prologue, discarding anything the function
placed on the stack in the meantime.
25: Next we restore the base pointer to
where it was before the function was called.
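The hello world program never actually stores anything in its stack frame, so here is a minimal sketch of a function that does. It assumes the same -no-pie build as above. The choice of 16 bytes and the offset -8(%rbp) are illustrative: reserving a multiple of 16 keeps the stack aligned the way the calling convention expects when we call puts.

```asm
.data

hello_world_str:
    .string "Hello, World!"

.text

.global main
main:
    # prologue
    pushq %rbp
    movq %rsp, %rbp
    subq $16, %rsp                      # reserve 16 bytes for local variables

    movq $hello_world_str, -8(%rbp)     # local variable: the address of our string
    movq -8(%rbp), %rdi                 # read the local variable back as first argument
    call puts                           # prints "Hello, World!"

    # epilogue
    movq %rbp, %rsp                     # gives back the 16 bytes we reserved
    popq %rbp                           # restore the caller's base pointer
    movq $0, %rax
    ret
```

Because local variables are addressed relative to %rbp, they stay reachable at the same offset no matter what else is pushed onto the stack, and a single movq %rbp, %rsp in the epilogue is enough to discard all of them.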
26: We move the value 0 into the return
register (%rax).
The return value of the main function becomes the exit code of the program.
Zero means success, and anything else implies that an error occurred.
27: Lastly, we return from the main function.
This will exit the program.
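To watch the exit code in action, here is a small sketch of a program that reports failure. The label error_str and its message are invented for this example; the build command is assumed to be the same as above.

```asm
.data

error_str:
    .string "Something went wrong"

.text

.global main
main:
    pushq %rbp
    movq %rsp, %rbp

    movq $error_str, %rdi
    call puts                   # prints "Something went wrong"

    movq %rbp, %rsp
    popq %rbp
    movq $1, %rax               # a non-zero exit code signals failure
    ret
```

Running ./app followed by echo $? in a shell should show the message first and then 1, the value we left in %rax. Note that %rax is set only after the call to puts, since puts places its own return value in that register.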
Hopefully you have now learnt what a basic Assembly file looks like.
It's time to dive deeper into some Assembly fundamentals.