Who even does this any more? Embedded stuff maybe. And of course, there’s these guys.
-
A nice textbook by Paul Carter.
-
Intel 80386 Programmer’s Reference Manual circa 1986.
-
Some 80386 architecture notes.
-
Understanding Assembly Language, an excellent free book in a gigantic 1082 page PDF.
-
A compiler explorer. A brilliant tool for checking C to assembly in real time.
The best way to write some assembly language is to let GCC do it for you. The advantages are many.
-
Obviously it does all the work.
-
A variety of instruction sets are possible.
#include <stdio.h>
int main(){
    printf("xed.ch\n");
    return 0;
}
To create an assembly language program from it:
$ gcc -O0 -S xed.c
This produces a file named xed.s.
    .file "xed.c"
    .section .rodata
.LC0:
    .string "xed.ch"
    .text
    .globl main
    .type main, @function
main:
.LFB0:
    .cfi_startproc
    pushq %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq %rsp, %rbp
    .cfi_def_cfa_register 6
    movl $.LC0, %edi
    call puts
    movl $0, %eax
    popq %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size main, .-main
    .ident "GCC: (Debian 4.9.2-10) 4.9.2"
    .section .note.GNU-stack,"",@progbits
For example, the line movl $0, %eax is setting up the return code: the 0 returned by main is handed back to the caller in %eax.
Assembly To Executable
If you have an assembly (dot s) file that you want to turn into an executable binary use GCC like this.
$ gcc -o xed xed.s
$ ./xed
xed.ch
nasm
There is also nasm (Netwide Assembler) which I think works like this in its simplest form.
$ nasm -f elf someassembly.asm
Other formats (for the -f option) include bin (e.g. DOS .COM, .SYS), aout (Linux a.out), elf32 (same as elf), elf64, obj (some DOS thing), win32 (same as win), and win64.
You might also need a -d ELF_TYPE option, which defines the macro ELF_TYPE for source files that check for it.
Executable To Assembly
What if you have an executable and want to understand how it works? This is very hard, but if the program is sufficiently simple you may be able to convert it back to assembly and look at it in human-readable terms. The program objdump is the key.
$ objdump -d ./xed | sed -n '/<main/,/^$/p'
0000000000400506 <main>:
400506: 55 push %rbp
400507: 48 89 e5 mov %rsp,%rbp
40050a: bf a4 05 40 00 mov $0x4005a4,%edi
40050f: e8 cc fe ff ff callq 4003e0 <puts@plt>
400514: b8 00 00 00 00 mov $0x0,%eax
400519: 5d pop %rbp
40051a: c3 retq
40051b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
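The callq line can be checked by hand. The e8 opcode is followed by a 32-bit little-endian displacement that counts from the address of the next instruction. Here is a minimal C sketch of that arithmetic; the function name call_target is made up for illustration and is not part of objdump or any library.

```c
#include <stdint.h>

/* Compute where an x86 "call rel32" (opcode e8) lands.
 * insn_addr: address of the e8 byte.
 * rel: the four displacement bytes in the order objdump prints them. */
uint64_t call_target(uint64_t insn_addr, const uint8_t rel[4])
{
    /* Reassemble the little-endian displacement. */
    uint32_t raw = (uint32_t)rel[0]
                 | ((uint32_t)rel[1] << 8)
                 | ((uint32_t)rel[2] << 16)
                 | ((uint32_t)rel[3] << 24);

    /* Interpret it as a signed 32-bit number. */
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL
                                       : (int64_t)raw;

    /* The displacement is relative to the next instruction, which starts
     * 5 bytes later (1 opcode byte + 4 displacement bytes). */
    return insn_addr + 5 + (uint64_t)disp;
}
```

With the values from the listing (address 0x40050f, bytes cc fe ff ff) this gives 0x400514 - 0x134 = 0x4003e0, which is the puts@plt address objdump printed.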
Directives
These are like C’s preprocessor but for the assembler. For example, this definition
%define MYCONSTANT 64
will allow for something like this.
mov eax, MYCONSTANT
Note that these (nasm) directives begin with %.
A common one is this.
%include "asm_io.inc"
Structure
A line of assembly code can take the following form.
Label: [; Comment]
[Label:] Opcode [Operand][, Operand] [; Comment]
Obviously the labels are niceties provided by the assembler to get to the desired locations. Operand order depends on the assembler: nasm puts the destination first, while GCC’s output (AT&T syntax) puts the source first. Generally assembly closely adheres to the way the CPU really sees the world, i.e. an instruction code followed by the number of operands it requires.
Addressing
It’s complicated. In nasm syntax, square brackets indicate that a register is to be used like a pointer. In other words, the value in the register is the address of the data required. Something sort of like this.
mov ax, [ebx] ; ax= *ebx
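To make the pointer analogy concrete, here is a rough C equivalent of that line. It is only a sketch; load16 is a name invented for this note, and the parameter is called ebx purely to mirror the assembly.

```c
#include <stdint.h>
#include <string.h>

/* Rough C equivalent of "mov ax, [ebx]": treat the value in ebx as an
 * address and load the 16 bits stored there into ax. */
uint16_t load16(const void *ebx)
{
    uint16_t ax;
    memcpy(&ax, ebx, sizeof ax);  /* 2-byte load, no alignment assumptions */
    return ax;
}
```

Because the destination is the 16-bit ax, only two bytes are read, no matter how large the object at that address is.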
Registers
-
32-bit: EAX, EBX, ECX, EDX, ESI, EDI, ESP, EBP
-
16-bit: AX, BX, CX, DX, SI, DI, SP, BP
-
8-bit: AH, AL, BH, BL, CH, CL, DH, DL
-
Flags: EFLAGS (bit mask signalling stuff like interrupts, overflow, privilege level).
-
Instruction pointer: EIP (offset into the current code segment)
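The smaller names are not separate registers. AX is the low 16 bits of EAX, and AH and AL are the high and low bytes of AX (likewise for B, C and D). A small C sketch of those views, with helper names made up for illustration:

```c
#include <stdint.h>

/* Views of a 32-bit register value, using EAX as the example. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* bits 0-15 */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* bits 0-7  */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* bits 8-15 */

/* Writing AL replaces only the low byte; the rest of EAX is untouched. */
uint32_t set_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}
```

So with EAX holding 0x12345678, AX reads as 0x5678, AH as 0x56 and AL as 0x78.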
Operands
Among the fastest operands are immediate values; for example constant numbers like $0x10 or $0 as in the sample program’s exit code above. In GCC’s output the prefix is $. These are stored in the code segment itself, right where they’re used, so no data fetching or addressing is needed.
Registers can be used for operands (basically an opcode’s parameters). Prefix is %.
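Also memory addresses can be operands, though this is slower. In GCC’s AT&T syntax these have no prefix: a bare symbol or number is taken as an address, and a register in parentheses, like (%ebx), means the memory that register points to.
Also I/O ports can be used as operands (on x86 through the in and out instructions), with the speed being potentially even slower yet.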
Ops
-
mov <dest>, <src> - Operands must be the same size and both cannot be memory. Destination first is the nasm (Intel syntax) order; GCC’s AT&T syntax, as in the listing above, puts the source first.
-
add <X>, <amount> - Increments X by amount where X is a register. Amount can be a register too, but it’s not changed.
-
sub <X>, <amount> - Like add but subtracting.
-
inc <X> - Like add with an amount of 1.
-
dec <X> - Like sub with an amount of 1.
-
shr <X>, <amount> - Shift bits right. Also shl. An amount of 1 divides an unsigned value by 2.
-
ror <X>, <amount> - Rotate bits right, reinserting the bits shifted off at the other end. Also rol.
-
xor <X>, <X> - Bitwise exclusive or. With both operands the same, as shown, it zeroes X in place.
-
neg <X> - Negates (two’s complement).
-
not <X> - Bitwise not.
-
and <X>, <Y> - Bitwise and.
-
or <X>, <Y> - Bitwise or.
-
cmp <X>, <Y> - Compare. Sets the flags as sub would, but leaves X unchanged.
-
ret - Return. See the Stack section.
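A few of these are easier to see in C. The sketch below models the 32-bit forms of shr, ror, neg, not and the xor-with-itself trick; the op_ function names are invented for this note, and the shift counts are masked to 5 bits on the assumption that this matches what the CPU does for 32-bit operands.

```c
#include <stdint.h>

/* shr: logical shift right; vacated high bits become 0. */
uint32_t op_shr(uint32_t x, unsigned n) { return x >> (n & 31); }

/* ror: rotate right; bits that fall off the low end re-enter at the top. */
uint32_t op_ror(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << ((32 - n) & 31));
}

/* neg: two's complement negation. */
uint32_t op_neg(uint32_t x) { return 0u - x; }

/* not: flip every bit. */
uint32_t op_not(uint32_t x) { return ~x; }

/* xor of a value with itself is always 0, whatever the value was. */
uint32_t op_xor_self(uint32_t x) { return x ^ x; }
```

For instance op_shr(10, 1) is 5, while op_ror(0x12345678, 8) is 0x78123456 because the low byte wraps around to the top.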
Stack
Many CPUs have a LIFO stack that can be directly used.
-
push dword <value> - Pushes value onto the stack. Here dword is a nasm size specifier making explicit that the value is a double word (32 bits); it is required when the size is otherwise ambiguous, as with a memory operand.
-
pop <X> - Pops the top of the stack into X.
-
call <subprog> - Pushes the return address (the address of the next instruction) on the stack and heads off to execute the subprogram.
-
ret - Pops an address from the stack (unless you mangled it) and resumes execution there.
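How these four cooperate can be sketched with a toy model in C. Everything here (struct toy_cpu, the cpu_ functions, the 64-entry stack) is invented for illustration, and there are no bounds checks; a real CPU uses ESP and memory rather than an array index.

```c
#include <stdint.h>

#define STACK_WORDS 64          /* arbitrary size for this toy model */

struct toy_cpu {
    uint32_t stack[STACK_WORDS];
    unsigned sp;                /* index of the top entry; STACK_WORDS means empty */
    uint32_t eip;               /* address of the instruction being executed */
};

void cpu_init(struct toy_cpu *c, uint32_t start)
{
    c->sp = STACK_WORDS;
    c->eip = start;
}

/* push: move the stack pointer down, then store (the x86 stack grows downward). */
void cpu_push(struct toy_cpu *c, uint32_t v) { c->stack[--c->sp] = v; }

/* pop: read the top entry, then move the stack pointer back up. */
uint32_t cpu_pop(struct toy_cpu *c) { return c->stack[c->sp++]; }

/* call: push the return address (the instruction after the call), then jump. */
void cpu_call(struct toy_cpu *c, uint32_t target, uint32_t insn_len)
{
    cpu_push(c, c->eip + insn_len);
    c->eip = target;
}

/* ret: pop whatever is on top of the stack and resume there. */
void cpu_ret(struct toy_cpu *c) { c->eip = cpu_pop(c); }
```

Starting at 0x40050f and calling 0x4003e0 with a 5-byte instruction pushes 0x400514, and cpu_ret then resumes there. If the subprogram leaves an extra value on the stack before returning, cpu_ret pops that value instead, which is the mangling the last item warns about.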