May 3 / Sebastian Helmut

How the GNU Linker Works: Symbols, Relocation, and ELF Binaries Explained




1. Introduction

Every embedded developer working with STM32, NXP, or any other Cortex-M device eventually runs a command like:

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    startup.o main.o utils.o \
    -T linker_script.ld -o firmware.elf

The compiler's job is well understood: it turns C into Thumb-2 machine code. But there is a second tool in that pipeline whose work is almost entirely invisible until something goes wrong: the linker, invoked here through the GCC driver.

Undefined reference errors, a HardFault on boot because the vector table landed at the wrong flash address, a .bss region that never gets zeroed because _sbss and _ebss are missing from the map: all of these are linker problems. Yet most embedded developers treat the linker as a black box and interact with it only through a handful of -T and -Wl flags.

This article opens that black box. We follow a small C program from the moment arm-none-eabi-gcc -c finishes to the moment a flashable ELF exists on disk, examining every decision the linker makes: how it reads ARM ELF object files, how it matches undefined BL targets to definitions, how it assigns real Cortex-M flash and RAM addresses to code and data, and how a linker script controls the final memory map.

2. From Source to Object File

2.1 What the compiler produces

Given a C source file, the GCC driver runs four phases: preprocessing, parsing and semantic analysis, Thumb-2 code generation, and assembly. The final phase hands off to the assembler, arm-none-eabi-as, which encodes the instructions and writes an ELF relocatable object: the .o file.

The object file is not a complete firmware image. It is a fragment: Thumb-2 machine code with holes where branch targets and data addresses will eventually go, plus tables describing those holes so the linker knows how to fill them.

Compile only, stop before linking (-c flag)

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -c main.c -o main.o

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -c utils.c -o utils.o

$ file main.o

main.o: ELF 32-bit LSB relocatable, ARM, EABI5 version 1 (SYSV)

The key words are ELF 32-bit LSB relocatable. This is a fragment, not an executable. No flash addresses have been assigned yet.

2.2 The anatomy of an object file: sections and symbols

An ARM ELF object file is organized into sections. Each section holds a distinct category of content:

Section            Contents                                              Permission
.text              Compiled Thumb-2 machine instructions                 Read + Execute
.data              Initialized global and static variables               Read + Write
.bss               Zero-initialized and uninitialized globals/statics    Read + Write
.rodata            String literals, const globals                        Read only
.symtab            Symbol table: names, binding, sections, sizes         Metadata
.rel.text          REL relocation entries for .text                      Metadata
.ARM.attributes    CPU arch, Thumb ISA version, FPU, ABI flags           Metadata
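To see the table in practice, here is a small hypothetical file with one definition of each kind. All names are illustrative. The section comments describe what arm-none-eabi-gcc typically does with default options. GCC 10 and later default to -fno-common, so uninitialized globals go to .bss rather than COMMON. Confirm the placement on your own toolchain with arm-none-eabi-objdump -t sections_demo.o.

```c
/* sections_demo.c: one definition of each kind (hypothetical names). */
#include <assert.h>
#include <stdint.h>

uint32_t boot_count = 42;          /* initialized, non-zero         -> .data   */
uint32_t error_flags = 0;          /* explicitly zero               -> .bss    */
uint32_t rx_bytes;                 /* uninitialized (-fno-common)   -> .bss    */
static uint8_t scratch[64];        /* static, uninitialized         -> .bss    */
const uint16_t crc_seed = 0xFFFF;  /* const global                  -> .rodata */
const char banner[] = "fw v1";     /* const char array              -> .rodata */

/* Functions are machine code                                       -> .text   */
void scratch_put(unsigned index, uint8_t value)
{
    assert(index < sizeof scratch);   /* keep writes inside the buffer */
    scratch[index] = value;
}

uint8_t scratch_get(unsigned index)
{
    assert(index < sizeof scratch);
    return scratch[index];
}
```

Only .data costs flash and RAM at the same time. Its initial values are stored in flash and copied to RAM at startup. The .bss entries cost RAM only.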

Use arm-none-eabi-nm and arm-none-eabi-readelf to inspect these structures directly:

List symbols in main.o

$ arm-none-eabi-nm -n main.o
         U add_values          Undefined: defined in utils.o
         U HAL_UART_Transmit   Undefined: defined in HAL library
00000000 T main                Defined: global, in .text (T = text section)

Show relocation entries (ARM uses REL not RELA)

$ arm-none-eabi-readelf -r main.o

Relocation section '.rel.text':
 Offset     Info    Type             Sym. Name
0000000c  0000020a R_ARM_THM_CALL    add_values
00000018  0000030a R_ARM_THM_CALL    HAL_UART_Transmit

The U symbols are undefined. The compiler saw a call to add_values, emitted a BL instruction with a placeholder branch offset, and recorded an R_ARM_THM_CALL relocation saying "patch this slot when you know the address."

Fig 1. Internal structure of an ARM ELF relocatable object file.
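The Info column in the readelf listing is the raw r_info word. The ELF specification packs two values into it: the symbol-table index in the upper 24 bits and the relocation type in the low 8 bits. glibc's <elf.h> exposes this as Elf32_Rel with the ELF32_R_SYM and ELF32_R_TYPE macros. Here is a minimal sketch that uses its own hypothetical names, so it does not depend on a host header:

```c
#include <assert.h>
#include <stdint.h>

/* Layout of one REL entry (ARM uses REL, so there is no r_addend field). */
typedef struct {
    uint32_t r_offset;   /* offset of the instruction to patch within the section */
    uint32_t r_info;     /* symbol index (upper 24 bits) | type (low 8 bits) */
} elf32_rel_t;

#define R_ARM_THM_CALL 10u

static inline uint32_t rel_sym(uint32_t info)  { return info >> 8; }
static inline uint32_t rel_type(uint32_t info) { return info & 0xFFu; }

static inline uint32_t make_info(uint32_t sym, uint32_t type)
{
    assert(type <= 0xFFu && sym <= 0xFFFFFFu);   /* each field must fit */
    return (sym << 8) | type;
}
```

With this packing, an R_ARM_THM_CALL against symbol 2 has the Info value 0x0000020a.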


3. Symbol Resolution

3.1 Defined and undefined symbols

When the linker opens all object files, it builds a global symbol table by merging every individual symbol table. Each symbol is either:

- Defined: the object contains the actual Thumb-2 code or data for this name.

- Undefined: the object references the name but does not define it.

If any undefined symbol cannot be satisfied by the time all inputs are scanned, the linker halts:

arm-none-eabi-ld: main.o: in function `main':
main.c:(.text+0xc): undefined reference to `add_values'
collect2: error: ld returned 1 exit status

3.2 How the linker resolves references across object files

Consider a minimal two-file Cortex-M program:

/* main.c */
extern int add_values(int a, int b);

int main(void) {
    volatile int r = add_values(3, 4);
    return r;
}

/* utils.c */
int add_values(int a, int b) {
    return a + b;
}


The compiler turns the add_values(3, 4) call into a Thumb-2 BL whose offset field holds only a placeholder:

Disassemble main.o BEFORE linking

$ arm-none-eabi-objdump -d main.o



00000000 <main>:
   0:  b580        push   {r7, lr}
   2:  b082        sub    sp, #8
   4:  2003        movs   r0, #3
   6:  2104        movs   r1, #4
   8:  f7ff fffe   bl     0x00000000     <- placeholder, R_ARM_THM_CALL add_values
   c:  9001        str    r0, [sp, #4]
   e:  9801        ldr    r0, [sp, #4]
  10:  b002        add    sp, #8
  12:  bd80        pop    {r7, pc}


The linker scans main.o, records add_values as unresolved, then scans utils.o, finds the strong definition, and marks the reference satisfied. The symbol now has a section and an offset within that section.

Fig 2. The linker matches the undefined add_values reference in main.o to the strong definition in utils.o and records the binding in its global symbol table.

3.3 Strong and weak symbols

Not all definitions carry equal weight. GNU ld distinguishes strong (STB_GLOBAL) and weak (STB_WEAK) symbols. The rules are:

- Two strong definitions of the same name are a hard error (multiple definition).

- A strong definition always wins over a weak one.

- If only weak definitions exist, the linker uses one of them; GNU ld keeps the first one it encounters.
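The three rules can be written as a merge step that runs each time the linker meets a symbol in a new input. This is a simplified model for illustration only: the types and functions are hypothetical, and it ignores COMMON symbols and undefined weak references. It is not GNU ld's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered so that a "stronger" binding compares greater. */
typedef enum { SYM_UNDEF, SYM_WEAK, SYM_STRONG } binding_t;

typedef struct {
    const char *name;
    binding_t   binding;      /* best binding seen so far */
    const char *defined_in;   /* object that supplied the chosen definition */
} global_sym_t;

typedef enum { MERGE_OK, MERGE_MULTIPLE_DEFINITION } merge_result_t;

merge_result_t merge_symbol(global_sym_t *g, binding_t incoming, const char *obj)
{
    assert(g != NULL && obj != NULL);
    if (incoming == SYM_UNDEF)
        return MERGE_OK;                    /* a reference never replaces a definition */
    if (incoming == SYM_STRONG && g->binding == SYM_STRONG)
        return MERGE_MULTIPLE_DEFINITION;   /* rule 1: two strong definitions */
    if (incoming > g->binding) {            /* rule 2: strong beats weak; any definition beats undefined */
        g->binding    = incoming;
        g->defined_in = obj;
    }
    return MERGE_OK;                        /* rule 3: weak after weak keeps the first */
}

/* After all inputs are scanned, a symbol still undefined is an error in this model. */
int is_unresolved(const global_sym_t *g)
{
    return g->binding == SYM_UNDEF;
}
```

Feeding it the weak TIM2_IRQHandler from the startup file and then your strong one selects your definition. A second strong definition is reported as an error.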

This is exactly how Cortex-M interrupt handlers work. CMSIS-style startup files, such as the ones ST ships for STM32, define weak default handlers in assembly:

/* startup_stm32f407xx.s (excerpt) */
    .weak      TIM2_IRQHandler
    .thumb_set TIM2_IRQHandler, Default_Handler

    .weak      USART1_IRQHandler
    .thumb_set USART1_IRQHandler, Default_Handler

Default_Handler:
    b Default_Handler          @ infinite loop catches unhandled IRQs

Any handler you define in your application is a strong symbol and silently overrides the weak alias:

/* Your application: a strong symbol that overrides the weak alias */
void TIM2_IRQHandler(void) {
    TIM2->SR &= ~TIM_SR_UIF;   /* clear update interrupt flag */
    counter++;                 /* application-defined volatile counter */
}


Common pitfall: If you define a handler but its source file is not compiled into the build (wrong Makefile rule, missing source), the weak default is silently used. The CPU enters the interrupt and loops forever in Default_Handler. Always verify with:

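$ arm-none-eabi-nm -n firmware.elf | grep IRQHandler

Handlers you implemented show up with type T. A handler still listed as W (weak), at the same address as Default_Handler, was never overridden.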


4. Relocation

4.1 Why addresses are not known at compile time

When the compiler emits a Thumb-2 BL add_values instruction, the encoding requires a signed PC-relative offset. The compiler cannot know that offset, because it depends on:

- Where the linker places utils.o's .text in flash.

- Where within that section add_values starts.

- How many alignment-padding bytes the linker inserts between sections.

None of this is known at compile time. The assembler therefore writes a placeholder BL (f7ff fffe, which encodes a branch offset of -4) and records a relocation entry for the linker to fix up later.
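Decoding f7ff fffe shows why it is a placeholder rather than a zero. The sketch below unpacks the 32-bit Thumb-2 BL encoding T1 as described in the ARMv7-M Architecture Reference Manual; the function name is hypothetical. The branch target is the instruction address plus 4 plus this offset. An offset of -4 therefore means "branch to myself". That -4 is the implicit addend the linker later reads back out of the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Return the signed byte offset encoded in a Thumb-2 BL (encoding T1).
 * hw1/hw2 are the two halfwords in the order objdump prints them. */
int32_t thumb_bl_offset(uint16_t hw1, uint16_t hw2)
{
    assert((hw1 & 0xF800u) == 0xF000u);   /* first halfword: 11110 S imm10 */
    assert((hw2 & 0xD000u) == 0xD000u);   /* second halfword: 11 J1 1 J2 imm11 */

    uint32_t s     = (hw1 >> 10) & 1u;
    uint32_t imm10 = hw1 & 0x3FFu;
    uint32_t j1    = (hw2 >> 13) & 1u;
    uint32_t j2    = (hw2 >> 11) & 1u;
    uint32_t imm11 = hw2 & 0x7FFu;
    uint32_t i1    = (j1 ^ s) ^ 1u;       /* I1 = NOT(J1 XOR S) */
    uint32_t i2    = (j2 ^ s) ^ 1u;       /* I2 = NOT(J2 XOR S) */

    /* Value is S:I1:I2:imm10:imm11:'0'; bit 0 is always zero. */
    uint32_t low24 = (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1);
    /* Sign-extend: S is bit 24, so subtract 2^24 when it is set. */
    return (int32_t)low24 - (int32_t)(s << 24);
}
```

Applied to the patched instruction from the final binary (f000 f812), the same decoder yields +0x24.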

4.2 Relocation entries

ARM ELF uses the REL format: the addend is implicit in the instruction encoding, not stored separately. Each entry in .rel.text contains:

Field            Meaning
r_offset         Byte offset within .text of the instruction to patch (here, the BL)
r_info (sym)     Symbol table index: which symbol to look up
r_info (type)    Relocation type, e.g. R_ARM_THM_CALL (type 10): which formula and encoding to apply

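R_ARM_THM_CALL applies to Thumb-2 BL and BLX instructions. It instructs the linker to compute ((S + A) | T) - P and encode the result as a Thumb-2 branch offset split across two 16-bit halfwords. The terms are:

- S is the symbol's final flash address.
- A is the addend extracted from the instruction encoding: -4 for the f7ff fffe placeholder, which compensates for the PC reading 4 bytes ahead of a Thumb instruction.
- P is the address of the instruction being patched.
- T is 1 when the target is a Thumb function.

Because BL offsets are always even, bit 0 of the result is dropped during encoding. T tells the linker the target is Thumb code, so the BL can stay a BL instead of being rewritten as a BLX to an ARM-state target.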
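The formula and the split encoding can be sketched in C. This is an illustrative model with a hypothetical function name. It follows the BL encoding T1 from the ARMv7-M Architecture Reference Manual and the R_ARM_THM_CALL formula from the ELF for the Arm Architecture specification. It ignores things a real linker also handles, such as BL-to-BLX conversion and range-extension veneers. With the addresses used in the walk-through that follows, it produces f000 f812.

```c
#include <assert.h>
#include <stdint.h>

/* Apply R_ARM_THM_CALL to a BL: value = ((S + A) | T) - P, then pack the
 * halfword-aligned offset into encoding T1 (hw1 first, as objdump prints it). */
void patch_thm_call(uint32_t S, int32_t A, uint32_t T, uint32_t P,
                    uint16_t *hw1, uint16_t *hw2)
{
    uint32_t value  = ((S + (uint32_t)A) | T) - P;      /* address arithmetic, modulo 2^32 */
    int32_t  offset = (int32_t)(value & ~1u);            /* BL cannot encode bit 0: drop it */
    assert(offset >= -(1 << 24) && offset < (1 << 24));  /* BL reach is about +/-16 MB */

    uint32_t u  = (uint32_t)offset;
    uint32_t s  = (u >> 24) & 1u;
    uint32_t j1 = ((u >> 23) & 1u) ^ 1u ^ s;             /* J1 = NOT(I1) XOR S */
    uint32_t j2 = ((u >> 22) & 1u) ^ 1u ^ s;             /* J2 = NOT(I2) XOR S */

    *hw1 = (uint16_t)(0xF000u | (s << 10) | ((u >> 12) & 0x3FFu));
    *hw2 = (uint16_t)(0xD000u | (j1 << 13) | (j2 << 11) | ((u >> 1) & 0x7FFu));
}
```

Patching a BL whose target is its own address reproduces f7ff fffe, the placeholder the assembler emitted. That makes a handy self-check.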

4.3 How the linker patches addresses

Once all symbols have final flash addresses assigned, the linker iterates over every relocation entry and applies the formula. Here is a concrete walk-through with real Thumb-2 encoding:

BEFORE linking: BL with placeholder offset (objdump -d main.o)
   8:  f7ff fffe   bl     0x00000000     <- placeholder

Linker assigns flash addresses (from linker script):
   main()        at 0x08000100
   add_values()  at 0x08000130

Instruction is at    P = 0x08000108
PC during execution    = 0x0800010c   (Thumb: PC = instruction address + 4)
S | T (add_values)     = 0x08000131   (bit 0 set = Thumb)
offset = (S | T) - PC  = 0x08000131 - 0x0800010c = 0x25
encoded offset         = 0x24         (BL drops bit 0: 0x0800010c + 0x24 = 0x08000130)

AFTER linking (objdump -d firmware.elf)
08000100 <main>:
 8000100:  b580        push   {r7, lr}
 8000102:  b082        sub    sp, #8
 8000104:  2003        movs   r0, #3
 8000106:  2104        movs   r1, #4
 8000108:  f000 f812   bl     0x08000130 <add_values>   <- patched!
 800010c:  9001        str    r0, [sp, #4]
 800010e:  9801        ldr    r0, [sp, #4]
 8000110:  b002        add    sp, #8
 8000112:  bd80        pop    {r7, pc}

08000130 <add_values>:
 8000130:  b082        sub    sp, #8
 8000132:  4408        add    r0, r1
 8000134:  b002        add    sp, #8
 8000136:  4770        bx     lr


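The Thumb bit: arm-none-eabi-nm shows add_values at 0x08000130, while the raw ELF symbol value (shown by arm-none-eabi-readelf -s) is 0x08000131. Bit 0 marks the symbol as Thumb code. It is an instruction-set flag, not part of the address, and the CPU clears it before fetching.

For a direct BL the flag does not survive into the encoding. BL offsets are halfword-aligned and BL never changes instruction set, so the computed 0x25 is encoded as 0x24 (f000 f812).

The bit does matter for indirect branches (BX or BLX with a register) and for vector table entries. Cortex-M executes only Thumb code. Loading the PC with an address whose bit 0 is clear therefore triggers an INVSTATE UsageFault, escalated to HardFault if UsageFault is not enabled. This is why function pointers and vector entries for Thumb code must always have bit 0 set.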

Fig 3. The R_ARM_THM_CALL relocation entry drives the linker to replace the placeholder BL encoding with the correct Thumb-2 branch offset.
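The same Thumb flag shows up in absolute relocations. R_ARM_ABS32 is the kind used for vector-table words and initialized function pointers. The ELF for the Arm Architecture specification gives its value as (S + A) | T. Below is a minimal sketch of that rule and of the check the CPU effectively performs. The function names are hypothetical, and treating S as the even code address is a modelling assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* R_ARM_ABS32: word = (S + A) | T, with T = 1 when the target is a Thumb function. */
uint32_t reloc_abs32(uint32_t S, uint32_t A, bool target_is_thumb)
{
    assert((S & 1u) == 0u);  /* model assumption: S is the even code address, T carries the flag */
    return (S + A) | (target_is_thumb ? 1u : 0u);
}

/* Cortex-M runs only Thumb code: loading PC with bit 0 clear (BX/BLX Rm,
 * exception entry through a vector) raises an INVSTATE UsageFault. */
bool valid_cortex_m_branch_target(uint32_t addr)
{
    return (addr & 1u) != 0u;
}

/* Bit 0 is an instruction-set flag; the CPU fetches from the even address. */
uint32_t fetch_address(uint32_t addr)
{
    return addr & ~1u;
}
```

For add_values at 0x08000130, a vector entry or function pointer holds 0x08000131. The CPU fetches from 0x08000130.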


5. The Final Binary

5.1 Merging sections

After symbol resolution, the linker lays out the output ELF. Input sections from all objects are concatenated into output sections as the linker script directs, normally matched by name and taken in command-line order. The .text from startup_stm32f407xx.o is followed by main.o's .text, then utils.o's .text, and so on. The same happens for .data, .rodata, and .bss.

Each output section gets two addresses:

- VMA (Virtual Memory Address): the address code uses at runtime. For .text and .rodata this is a flash address; for .data and .bss it is a RAM address.

- LMA (Load Memory Address): where the section's initial bytes are stored in the programmed image, which on Cortex-M means flash. For .data, initial values are stored in flash (LMA) and copied to RAM (VMA) by the startup code before main() runs.

Fig 4. The linker concatenates input sections into output sections, assigns flash VMAs to code and RAM VMAs to data, and groups everything into two PT_LOAD segments.

NB: On Cortex-M, VMA is simply the runtime address used by the CPU; there is no virtual memory or MMU.

5.2 The linker script: MEMORY and SECTIONS

The default script built into arm-none-eabi-ld knows nothing about your part's flash and RAM, so it is unsuitable for Cortex-M firmware. An embedded linker script takes control of the entire memory map. Here is a realistic script for an STM32F407 with 1 MB flash and 128 KB of contiguous SRAM at 0x20000000 (SRAM1 plus SRAM2):

/* stm32f407.ld */
ENTRY(Reset_Handler)

MEMORY {
    FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 1024K
    RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 128K
}

SECTIONS {
    /* ---- Flash ---- */
    .isr_vector : {
        KEEP(*(.isr_vector))      /* Vector table must be first */
    } > FLASH

    .text : {
        *(.text*)                 /* All Thumb-2 code */
        *(.rodata*)               /* String literals, const data */
        . = ALIGN(4);
        _etext = .;               /* End of flash content */
    } > FLASH

    /* ---- RAM ---- */
    .data : {
        _sdata = .;
        *(.data*)
        . = ALIGN(4);
        _edata = .;
    } > RAM AT > FLASH            /* VMA = RAM, LMA = FLASH */

    .bss : {
        _sbss = .;
        __bss_start__ = _sbss;
        *(.bss*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = .;
        __bss_end__ = _ebss;
    } > RAM

    /* Initial stack pointer: top of RAM */
    _estack = ORIGIN(RAM) + LENGTH(RAM);
}
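Because .data is placed > RAM AT > FLASH, its initial values take space in both regions. Here is a rough budget check in the spirit of the MEMORY limits. The types, functions, and any example sizes are hypothetical, and alignment padding and the stack are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint32_t origin, length; } region_t;            /* one MEMORY entry */
typedef struct { uint32_t isr_vector, text, data, bss; } section_sizes_t;

/* .isr_vector, .text (including .rodata) and the load image of .data live in FLASH. */
uint32_t flash_used(const section_sizes_t *s)
{
    return s->isr_vector + s->text + s->data;
}

/* The runtime copy of .data and the zeroed .bss live in RAM. */
uint32_t ram_used(const section_sizes_t *s)
{
    return s->data + s->bss;
}

bool fits(uint32_t used, region_t r)
{
    return used <= r.length;
}

uint32_t headroom(uint32_t used, region_t r)
{
    assert(fits(used, r));   /* ld itself would already have failed the link */
    return r.length - used;
}
```

ld performs the overflow check itself and fails the link when a region is exceeded. A sketch like this is mainly useful for tracking headroom over time, for example from the text, data, and bss columns printed by arm-none-eabi-size.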


The symbols _sdata, _edata, _sbss, _ebss, _etext, and _estack are consumed by the startup assembly:

/* startup_stm32f407xx.s: Reset_Handler (excerpt) */
    .syntax unified
    .thumb
    .global Reset_Handler
    .type   Reset_Handler, %function   @ STT_FUNC: the linker sets bit 0 in its vector entry

Reset_Handler:
    ldr   sp, =_estack          @ set initial stack pointer

    @ Copy .data initializers from flash (LMA) to SRAM (VMA)
    ldr   r0, =_sdata           @ destination start (RAM)
    ldr   r1, =_edata           @ destination end (RAM)
    ldr   r2, =_etext           @ source start (FLASH LMA)
copy_data:
    cmp   r0, r1
    ittt  lo
    ldrlo r3, [r2], #4
    strlo r3, [r0], #4
    blo   copy_data

    @ Zero .bss in SRAM
    ldr   r0, =_sbss
    ldr   r1, =_ebss
    movs  r2, #0
zero_bss:
    cmp   r0, r1
    it    lo
    strlo r2, [r0], #4
    blo   zero_bss

    bl    SystemInit
    b     main
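Many projects write these two loops in C instead of assembly. Below is a minimal sketch of the same logic. The boundaries are passed as parameters so it can be exercised on a host. The function name is hypothetical; on the target, the arguments would be the linker-script symbols, as the trailing comment shows.

```c
#include <stdint.h>

/* Copy .data initializers from their flash LMA to the RAM VMA, then zero .bss.
 * Word-at-a-time, matching the 4-byte ALIGN() in the linker script. */
void init_ram(uint32_t *data_vma, const uint32_t *data_end,
              const uint32_t *data_lma,
              uint32_t *bss_start, const uint32_t *bss_end)
{
    while (data_vma < data_end)
        *data_vma++ = *data_lma++;
    while (bss_start < bss_end)
        *bss_start++ = 0u;
}

/* Target wiring (sketch): only the addresses of these symbols are meaningful.
 *
 *   extern uint32_t _sdata, _edata, _etext, _sbss, _ebss;
 *   init_ram(&_sdata, &_edata, &_etext, &_sbss, &_ebss);
 */
```

One caveat: with optimization enabled, GCC may recognize these loops and replace them with memcpy and memset calls. In very early startup code, either make sure those functions are available, or build this file with -fno-tree-loop-distribute-patterns.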


Why .isr_vector must come first: On reset, a Cortex-M core reads the initial stack pointer from address 0x00000000 and the Reset_Handler address from 0x00000004. When an STM32 boots from main flash, that region is aliased to 0x08000000, so the vector table must start exactly at the flash origin. If anything lands before .isr_vector, the CPU loads garbage into SP and PC and faults or locks up before a single instruction of your code executes. KEEP also prevents --gc-sections from eliminating the vector table, which nothing references from the linker's perspective.
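When a board locks up before reaching main, it is worth checking the first two words of the image directly. The helper below is hypothetical. It assumes a raw binary produced with arm-none-eabi-objcopy -O binary firmware.elf firmware.bin, whose first byte corresponds to the flash origin.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian 32-bit read (Cortex-M images are little-endian). */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check the first two words of a raw flash image:
 * word 0 = initial SP, must lie in (ram_origin, ram_origin + ram_len];
 * word 1 = Reset_Handler, must be in flash with the Thumb bit set. */
bool vector_table_looks_sane(const uint8_t *img, size_t len,
                             uint32_t flash_origin, uint32_t flash_len,
                             uint32_t ram_origin, uint32_t ram_len)
{
    if (len < 8)
        return false;
    uint32_t sp    = rd32(img);
    uint32_t reset = rd32(img + 4);
    uint32_t pc    = reset & ~1u;                  /* address the CPU will fetch */

    bool sp_ok    = sp > ram_origin && sp - ram_origin <= ram_len;
    bool reset_ok = (reset & 1u) != 0u &&
                    pc >= flash_origin && pc - flash_origin < flash_len;
    return sp_ok && reset_ok;
}
```

For the script above, a healthy image starts with 0x20020000 (_estack) followed by an odd address inside flash.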


6. Why Linking Order Matters

Object files passed directly on the command line are always included. Static libraries (.a archives such as libm.a or the HAL library) are handled differently: the linker pulls a member out of an archive only if that member defines a currently unresolved symbol. Each archive is searched only at its position on the command line, left to right. Once the linker moves past an archive, it never goes back to it.

Wrong: -lm scanned before main.o has recorded its undefined sqrtf

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    -lm startup.o main.o -T stm32f407.ld -o fw.elf
ld: main.o: undefined reference to `sqrtf'

Correct: object files first, libraries after

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    startup.o main.o -lm -T stm32f407.ld -o fw.elf

When libraries depend on each other circularly, which is common with -lc, -lm, and -lgcc on bare-metal ARM, a single left-to-right pass cannot satisfy all references. The linker options --start-group and --end-group, passed through the GCC driver with -Wl, fix this. They tell the linker to scan the grouped libraries repeatedly until no new symbols are resolved.

# For circular dependencies between libc, libm and libgcc
$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    startup.o main.o \
    -Wl,--start-group -lc -lm -lgcc -Wl,--end-group \
    -T stm32f407.ld -o firmware.elf
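The scanning rules in this section can be modelled in a few lines. In this toy sketch everything is hypothetical, including the symbol names and the circular libc/libm dependency, and each symbol is one bit in a mask. The model follows four rules:

- An object is always linked.
- An archive member is pulled only if it defines a currently undefined symbol.
- An archive is searched only at its own position.
- A group is rescanned until nothing changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical symbol IDs: one bit per symbol. */
enum {
    SYM_MAIN    = 1 << 0,
    SYM_SQRTF   = 1 << 1,
    SYM_ERRNO   = 1 << 2,
    SYM_FPCLASS = 1 << 3
};

typedef struct { uint32_t defines, needs; } member_t;   /* object file or archive member */
typedef struct { uint32_t defined, undefined; } link_state_t;
typedef struct { const member_t *members; int n; bool *used; } archive_t;

static void add_member(link_state_t *st, const member_t *m)
{
    st->defined  |= m->defines;
    st->undefined = (st->undefined | m->needs) & ~st->defined;
}

/* Objects named on the command line are always linked. */
void link_object(link_state_t *st, const member_t *obj)
{
    add_member(st, obj);
}

/* Search one archive at its command-line position: keep pulling members that
 * define a currently undefined symbol until a full pass pulls nothing.
 * Returns true if any member was pulled. */
bool link_archive(link_state_t *st, const archive_t *ar)
{
    bool any = false, progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < ar->n; i++) {
            if (!ar->used[i] && (ar->members[i].defines & st->undefined)) {
                add_member(st, &ar->members[i]);
                ar->used[i] = true;
                progress = any = true;
            }
        }
    }
    return any;
}

/* --start-group ... --end-group: rescan every archive until nothing changes. */
void link_group(link_state_t *st, const archive_t *libs, int n_libs)
{
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n_libs; i++)
            if (link_archive(st, &libs[i]))
                changed = true;
    }
}
```

With -lm placed before main.o, the model pulls nothing from libm, because nothing is undefined yet when libm is searched. This is the sqrtf failure shown above.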

Dead-code elimination matters a lot on microcontrollers with limited flash. Place each function in its own section at compile time and let the linker garbage-collect the unreferenced ones:

Each function gets its own .text.funcname section (compile utils.c the same way)

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    -ffunction-sections -fdata-sections \
    -c main.c -o main.o

Linker removes unreferenced sections from the final binary

$ arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb \
    startup.o main.o utils.o \
    -Wl,--gc-sections -Wl,--print-gc-sections -Wl,-Map=firmware.map \
    -lc -lm -lgcc -T stm32f407.ld -o firmware.elf

Check what was removed: --print-gc-sections reports each discarded section, and the map file lists the same sections under "Discarded input sections"

ld: removing unused section '.text.unused_sensor_driver' in file 'utils.o'

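KEEP and --gc-sections: --gc-sections starts from a root set: the entry point, symbols forced with -u, and any section wrapped in KEEP() in the linker script. It marks every section reachable from those roots through relocations and discards the rest. The vector table must be protected with KEEP because nothing references it. Without KEEP the linker eliminates it, producing a binary that does nothing on power-up. Reset_Handler survives as long as ENTRY(Reset_Handler) names it.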
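The mark-and-sweep behaviour described above can be sketched as a reachability pass over input sections. This is a simplified, hypothetical model; real ld derives the edges from relocations and has additional roots. It is meant only to show why an unreferenced section such as the vector table disappears unless it is a root.

```c
#include <stdbool.h>

#define MAX_REFS 4

/* One input section; refs[] lists the sections its relocations point into. */
typedef struct {
    const char *name;
    bool keep;               /* wrapped in KEEP() in the linker script */
    bool is_entry;           /* holds the ENTRY() symbol */
    int  refs[MAX_REFS];     /* indices of referenced sections, -1 = unused slot */
    bool live;               /* result: survives garbage collection */
} input_section_t;

static void mark(input_section_t *secs, int i)
{
    if (secs[i].live)
        return;                              /* already visited */
    secs[i].live = true;
    for (int r = 0; r < MAX_REFS; r++)
        if (secs[i].refs[r] >= 0)
            mark(secs, secs[i].refs[r]);
}

/* Mark everything reachable from the roots; unmarked sections are discarded. */
void gc_sections(input_section_t *secs, int n)
{
    for (int i = 0; i < n; i++)
        secs[i].live = false;
    for (int i = 0; i < n; i++)
        if (secs[i].keep || secs[i].is_entry)
            mark(secs, i);
}
```

In this model, removing keep from .isr_vector makes it dead, while .text.Reset_Handler stays alive through the ENTRY root.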


7. Conclusion

The linker performs three conceptually distinct operations on every firmware build:

- Symbol resolution: matching every undefined BL target (and every external data reference) to exactly one definition, using the weak/strong binding rules that power the CMSIS interrupt-handler override mechanism.

- Layout: assigning flash and RAM addresses to every output section according to the MEMORY and SECTIONS directives in the linker script, and exporting the boundary symbols (_sdata, _edata, _sbss, _ebss, _estack) that the startup assembly depends on.

- Relocation: iterating over every R_ARM_THM_CALL entry (and the other relocation types) and rewriting each placeholder with its final value. For a BL this is a PC-relative Thumb-2 branch offset. For function pointers and vector table entries it is an absolute address with the Thumb bit set, so the CPU never faults on an interworking branch.

With this model in place, any linker error becomes diagnosable:

- An undefined reference means a symbol was never defined, or the defining archive came before the object that needed it.
- A HardFault on boot usually means the vector table missed the flash origin, a vector entry lacks the Thumb bit, or _estack points outside valid RAM.
- Initialized variables that hold wrong values at runtime point to a .data copy loop whose source address (_etext here) does not match the real LMA of .data.
- Data that is always zero means the copy loop never ran or ran with wrong boundary symbols.

