May 14 / Sebastian Helmut

How Linux Loads Shared Libraries: Dynamic Linking, GOT, and PLT

1. Introduction

When the linker builds a standalone executable it assigns a concrete address to every function and every global variable. The CPU jumps to those addresses directly. That model is simple and efficient but it only works when you know at build time exactly where in memory the code will run.

Shared libraries break that assumption entirely. On Linux, a single copy of libc.so is mapped into dozens of processes simultaneously, and each process sees it at a different virtual address. If the library contained absolute addresses baked in at build time, it would work in exactly one process and crash in every other. The kernel cannot rearrange memory just to satisfy a hardcoded pointer inside a .so file.

The solution is Position Independent Code (PIC): machine code that produces correct results regardless of where in memory it is loaded. PIC does not rely on any symbol landing at a fixed address. Instead it uses PC-relative branches for calls within the library, and a runtime-patched pointer table for anything that lives outside it.

This article follows a shared library from compilation through dynamic loading, examining the two data structures that make PIC possible: the Global Offset Table (GOT), which holds the runtime addresses of external symbols, and the Procedure Linkage Table (PLT), which enables lazy resolution of external function calls. We also look at what the dynamic linker does before main() ever runs.

2. The Problem with Absolute Addresses

2.1 Why fixed addresses break shared libraries

When the linker builds a standalone executable it patches every address placeholder with a concrete value. The OS loads the binary at a known base address and everything lands where the linker expected.

A shared library is different. Consider two processes, A and B, both linking against libfoo.so:

- Process A has already loaded several other libraries. The first free virtual address slot is 0x7f3a00000000. The OS maps libfoo.so there.

- Process B has a different memory layout. Its first free slot is 0x7f8b00000000. The OS maps the same libfoo.so there.

If libfoo.so contained an instruction that reads a global variable from the absolute address 0x7f3a00001234 (the equivalent of ARM's ldr r0, =address with the address fixed at link time), it would work in process A and read the wrong memory or segfault in process B. The library cannot know at build time which address it will land at.
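The failure can be checked with plain arithmetic. The sketch below is a toy model, not loader code: the symbol offset 0x1234 and the helper name runtime_address are assumptions chosen to match the addresses in the example above.

```python
# Toy model: one symbol inside libfoo.so, mapped at two different bases.
BASE_A = 0x7F3A00000000  # where process A maps libfoo.so
BASE_B = 0x7F8B00000000  # where process B maps libfoo.so
SYMBOL_OFFSET = 0x1234   # assumed offset of the global inside the library


def runtime_address(base: int, offset: int) -> int:
    """Address of a symbol once the library is mapped at `base`."""
    return base + offset


# An absolute address baked in at build time, computed for process A's layout.
BAKED_IN = runtime_address(BASE_A, SYMBOL_OFFSET)

if __name__ == "__main__":
    for name, base in (("A", BASE_A), ("B", BASE_B)):
        actual = runtime_address(base, SYMBOL_OFFSET)
        status = "ok" if actual == BAKED_IN else "wrong address"
        print(f"process {name}: symbol at {actual:#x}, baked-in {BAKED_IN:#x} -> {status}")
```

Only the base changes between processes; the offset inside the library stays the same, which is the property PIC builds on.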

2.2 Why PIC is the solution

The solution is to never emit absolute addresses for anything that might move. Instead:

- Code references other code using PC-relative branches. A bl or b instruction encodes a signed offset from the current PC. As long as the caller and callee stay at the same relative distance, the branch works at any base address.

- Data (global variables, external function addresses) is accessed through a table of pointers, the GOT, whose entries are filled in at load time by the dynamic linker with the correct runtime addresses.

Fig 1. Two processes load the same shared library at different virtual addresses. Absolute addresses embedded at link time would work in one process and crash in the other.

3. Position Independent Code

3.1 PC-relative addressing

On ARM Thumb-2, almost all branch instructions are already PC-relative. A bl foo encodes the signed offset from the current PC to foo, not an absolute address. If the whole library shifts by 1 MB, both the caller and foo shift by 1 MB, so the offset stays the same and the branch still works.
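The invariance is easy to verify numerically. The following is a minimal sketch under simplifying assumptions: the offsets 0x0400 and 0x0A00 are made up, the function names are hypothetical, and the model ignores the pipeline adjustment a real ARM core applies when it reads the PC.

```python
# Toy model of a PC-relative branch: the instruction stores (target - pc),
# and the CPU adds that stored offset back to the current pc at run time.
CALLER_OFFSET = 0x0400   # assumed offset of the `bl foo` instruction in the library
FOO_OFFSET = 0x0A00      # assumed offset of foo in the library


def encode_branch(pc: int, target: int) -> int:
    """Offset the assembler would store in the branch instruction."""
    return target - pc


def branch_target(pc: int, stored_offset: int) -> int:
    """Address the CPU jumps to when executing the branch at `pc`."""
    return pc + stored_offset


# The offset is fixed at build time using library-relative addresses only.
STORED = encode_branch(CALLER_OFFSET, FOO_OFFSET)

if __name__ == "__main__":
    for base in (0x10000000, 0x10000000 + 0x100000):  # second base is shifted by 1 MB
        pc = base + CALLER_OFFSET
        print(f"base {base:#x}: bl lands at {branch_target(pc, STORED):#x}, "
              f"foo is at {base + FOO_OFFSET:#x}")
```

The stored offset never mentions the base address, so the same encoded instruction is correct in every process.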

The same applies to data accessed within the same translation unit using adr or PC-relative load instructions. The problem only arises when code needs to reach a symbol defined in a different shared library, or a global variable whose address is not known until the dynamic linker runs.

3.2 How the compiler generates PIC

Compile with -fPIC to tell the compiler to generate position-independent code:

$ arm-linux-gnueabihf-gcc -mthumb -fPIC -c libfoo.c -o libfoo.o

$ arm-linux-gnueabihf-gcc -shared libfoo.o -o libfoo.so

With -fPIC enabled, the compiler never emits absolute addresses for globals or external calls. Instead it emits GOT-relative or PC-relative sequences. You can see the difference with and without the flag:

@ WITHOUT -fPIC: absolute address of global_var baked in

ldr r0, =0x20001234 @ absolute: breaks if the library moves

@ WITH -fPIC: address fetched through the GOT at runtime (simplified)

ldr r0, [pc, offset] @ load the PC-relative offset of the GOT slot from the literal pool

add r0, pc @ r0 = address of the GOT slot for global_var

ldr r0, [r0] @ read the GOT slot: runtime address of global_var

ldr r0, [r0] @ load the value

The literal-pool load and the add use only PC-relative offsets, so they work at any load address. The next load reads a pointer that the dynamic linker wrote into the GOT at load time. The compiler does not know the final address; it only knows that the GOT will contain a slot for it.

4. The Global Offset Table

4.1 Purpose and structure

The GOT is a section (.got) in the shared library ELF file. It is an array of pointer-sized slots. In the file on disk each slot holds zero or a placeholder. At load time the dynamic linker fills each slot with the correct runtime address of the symbol it represents.

The GOT itself lives at a fixed offset from the code, so PC-relative loads can always reach it. But the values inside the GOT slots are absolute addresses that only make sense at runtime.

The first three entries of the GOT are reserved:

GOT[0]	Address of the .dynamic section
GOT[1]	Pointer to the link map (set by the dynamic linker)
GOT[2]	Address of the dynamic linker's resolver function

Each subsequent entry corresponds to one external symbol that the library references.
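The load-time filling step can be pictured with a small model. This is a sketch under stated assumptions, not how ld.so is implemented: fill_got, the EXPORTS table, and every offset are hypothetical, and the three reserved entries are left out to keep the example short.

```python
# Toy model of load-time GOT filling. Nothing here touches a real ELF file.
from typing import Dict, List


def fill_got(got_symbols: List[str],
             load_bases: Dict[str, int],
             exports: Dict[str, Dict[str, int]]) -> List[int]:
    """Return the GOT contents after the loader has resolved every symbol.

    got_symbols: symbol name for each GOT slot, in slot order.
    load_bases:  library name -> base address chosen at load time.
    exports:     library name -> {symbol: offset inside that library}.
    """
    got = [0] * len(got_symbols)          # on disk every slot is a placeholder
    for slot, symbol in enumerate(got_symbols):
        for lib, symbols in exports.items():
            if symbol in symbols:
                got[slot] = load_bases[lib] + symbols[symbol]
                break
        else:
            raise LookupError(f"undefined symbol: {symbol}")
    return got


# Assumed exports of a hypothetical libbar.so.
EXPORTS = {"libbar.so": {"counter": 0x2010, "bar_init": 0x0450}}

if __name__ == "__main__":
    for base in (0x40000000, 0x50000000):
        got = fill_got(["counter", "bar_init"], {"libbar.so": base}, EXPORTS)
        print([hex(value) for value in got])
```

The slot layout is identical in both runs; only the values written into the slots differ with the load address.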

4.2 How PIC code accesses globals through the GOT

When PIC code in libfoo.so wants to read a global variable counter defined in libbar.so, the compiled sequence is:

@ Step 1: compute the address of the GOT slot for counter (PC-relative, always valid)

ldr r3, [pc, got_offset] @ r3 = offset from PC to GOT[counter], read from the literal pool

add r3, pc @ r3 = &GOT[counter]

@ Step 2: load the runtime address of counter from the GOT slot

ldr r3, [r3] @ r3 = GOT[counter] = actual address of counter

@ Step 3: load the value

ldr r0, [r3] @ r0 = counter

Step 1 is PC-relative, so it works regardless of where the library is mapped. Step 2 reads a pointer that the dynamic linker wrote into the GOT at load time. Step 3 is a normal indirect load.
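The three steps can be replayed against a toy flat memory. The sketch below is illustrative only: load_counter, map_process, and all addresses and offsets are assumptions, and memory is modelled as a dictionary from address to word.

```python
# Toy flat memory: address -> word. All addresses below are made up.
def load_counter(memory: dict, pc: int, got_slot_offset: int) -> int:
    """Mimic the three-step PIC sequence for reading `counter`."""
    got_slot_addr = pc + got_slot_offset     # step 1: PC-relative, no absolute address
    counter_addr = memory[got_slot_addr]     # step 2: pointer written by the dynamic linker
    return memory[counter_addr]              # step 3: ordinary indirect load


def map_process(libfoo_base: int, libbar_base: int, value: int):
    """Build memory for one process and return (memory, pc of the access)."""
    pc = libfoo_base + 0x100                 # where the access code sits in libfoo.so
    got_slot = libfoo_base + 0x8000          # GOT slot for counter, fixed distance from code
    counter = libbar_base + 0x2010           # counter lives in libbar.so
    memory = {got_slot: counter, counter: value}
    return memory, pc


GOT_SLOT_OFFSET = 0x8000 - 0x100             # known at link time, same in every process

if __name__ == "__main__":
    for foo_base, bar_base in ((0x40000000, 0x41000000), (0x60000000, 0x52000000)):
        memory, pc = map_process(foo_base, bar_base, value=42)
        print(hex(foo_base), load_counter(memory, pc, GOT_SLOT_OFFSET))
```

The code path and GOT_SLOT_OFFSET are the same for both layouts; the only per-process state is the pointer stored in the GOT slot.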

Fig 2. PIC code reaches a global variable through the GOT. The PC-relative load in step 1 is position-independent. The GOT slot filled by the dynamic linker in step 2 provides the absolute runtime address.

5. The Procedure Linkage Table

5.1 Purpose and structure

The GOT handles data. External function calls need a similar mechanism, but with one important optimization: lazy binding. Most programs call only a fraction of the functions available in a shared library. Resolving every symbol at startup would add measurable latency. The PLT defers resolution until the first actual call.

The PLT is a section (.plt) containing small stubs, one per external function. Each stub is a few instructions long. On the first call the stub triggers symbol resolution. On every subsequent call the stub jumps directly to the resolved function.

5.2 Lazy binding: how the first call resolves a symbol

Each PLT stub has a corresponding GOT entry (in .got.plt). Before the first call, that GOT entry points back into the PLT rather than at the function: on ARM it points to the shared resolver trampoline at the start of .plt, while on x86 it points to the instruction that follows the stub's initial jump. This creates the lazy resolution loop:

1. Caller executes bl printf@plt, which jumps to the PLT stub for printf.

2. PLT stub loads the GOT entry for printf and branches to it.

3. Before resolution, the GOT entry points back to the PLT's resolver trampoline.

4. The resolver trampoline calls the dynamic linker's _dl_runtime_resolve.

5. The dynamic linker looks up printf in libc.so, finds its address, and overwrites the GOT entry with the real address.

6. The dynamic linker jumps to printf. The function executes normally.

On every call after the first:

1. Caller executes bl printf@plt.

2. PLT stub loads the GOT entry, which now holds the real address of printf.

3. PLT stub branches directly to printf. No dynamic linker involved.
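The two call paths above can be modelled with a dictionary standing in for .got.plt. This is a sketch of the idea, not of glibc's implementation: real_printf, make_trampoline, printf_plt, and the LIBC table are hypothetical stand-ins.

```python
# Toy model of lazy binding. `got` plays the role of .got.plt; the PLT stub is
# a fixed function that always jumps through its GOT slot.
resolver_calls = []                      # records every trip into the resolver


def real_printf(text: str) -> str:
    """Stand-in for printf in libc.so."""
    return f"printed: {text}"


LIBC = {"printf": real_printf}           # hypothetical symbol table of libc.so


def make_trampoline(name: str):
    """Initial GOT value: resolve `name`, patch the GOT, then run the target."""
    def trampoline(*args):
        resolver_calls.append(name)
        target = LIBC[name]              # symbol lookup, done by the dynamic linker
        got[name] = target               # patch the GOT slot, not the stub
        return target(*args)             # finish the original call
    return trampoline


got = {"printf": make_trampoline("printf")}


def printf_plt(*args):
    """PLT stub: identical code on every call, just jump through the GOT."""
    return got["printf"](*args)


if __name__ == "__main__":
    print(printf_plt("first"))           # goes through the resolver
    print(printf_plt("second"))          # jumps straight to real_printf
    print("resolver ran", len(resolver_calls), "time(s)")
```

The stub function is never modified; after the first call the only thing that has changed is the value stored in got["printf"].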

5.3 Step by step: what happens on the first and subsequent calls

The PLT stub itself never changes. In simplified form it is always the same two instructions (real ARM stubs split the offset across an extra add):

@ PLT stub for printf -- code is identical on every call

printf@plt:

add ip, pc, #got_printf_offset @ ip = address of GOT slot (PC-relative)

ldr pc, [ip] @ jump to whatever the GOT slot contains

What changes is the content of the GOT slot, not the code. These are the two states it moves through:

Before the first call -- GOT slot points to the resolver trampoline:

@ GOT_printf = address of PLT resolver trampoline (set by ld.so at load time)

@ Result: PLT stub jumps into trampoline -> _dl_runtime_resolve -> finds printf

@ ld.so then writes real address of printf into GOT_printf

@ ld.so jumps to printf -- function executes normally

After the first call -- GOT slot points directly to printf:

@ GOT_printf = 0x7f3a001234 (real address of printf in libc.so)

@ Result: PLT stub jumps directly to printf -- dynamic linker never involved

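The key insight is that the GOT entry is writable (it lives in a read-write segment) while the PLT stub is in the read-only code segment. The dynamic linker patches the GOT, not the code, so lazy binding never writes to an executable page and stays compatible with W^X (write-xor-execute) policies. The cost is that .got.plt must remain writable while the program runs. Hardened builds that make the GOT read-only after startup (full RELRO) therefore have to give up lazy binding and resolve every symbol at load time.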

Fig 3. PLT lazy binding sequence. On the first call the dynamic linker resolves the symbol and patches the GOT entry. On every subsequent call the PLT stub jumps directly to the resolved function, bypassing the dynamic linker entirely.

6. The Dynamic Linker

6.1 What ld.so does at startup

When the OS executes a dynamically linked program, control does not go to main() first. The ELF PT_INTERP segment names the dynamic linker (for example /lib/ld-linux-armhf.so.3 on 32-bit hard-float ARM, or /lib64/ld-linux-x86-64.so.2 on x86-64). The kernel maps the dynamic linker into the process and transfers control to it.

The dynamic linker then:

1. Maps all required shared libraries (.so files) listed in the binary's DT_NEEDED entries.

2. Performs load-time relocations: patches GOT entries for symbols that cannot be lazily bound (e.g. data symbols, or when LD_BIND_NOW=1 is set).

3. Runs each library's initialization functions (.init and .init_array sections).

4. Transfers control to the program's entry point (_start), which sets up the C runtime and eventually calls main().
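The PT_INTERP lookup that starts this sequence can be reproduced with a few struct reads. The sketch below is a minimal reader, assuming a well-formed ELF image held in memory; read_interp is a hypothetical helper name, and the field offsets are those of the standard ELF32 and ELF64 headers.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program header type that names the dynamic linker


def read_interp(image: bytes) -> Optional[str]:
    """Return the PT_INTERP path of an ELF image, or None if it has none."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64 = image[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if image[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    if is_64:
        e_phoff, = struct.unpack_from(endian + "Q", image, 0x20)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", image, 0x36)
    else:
        e_phoff, = struct.unpack_from(endian + "I", image, 0x1C)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", image, 0x2A)
    for index in range(e_phnum):
        ph = e_phoff + index * e_phentsize
        p_type, = struct.unpack_from(endian + "I", image, ph)
        if p_type != PT_INTERP:
            continue
        if is_64:   # Elf64_Phdr: p_offset at +8, p_filesz at +32
            p_offset, = struct.unpack_from(endian + "Q", image, ph + 8)
            p_filesz, = struct.unpack_from(endian + "Q", image, ph + 32)
        else:       # Elf32_Phdr: p_offset at +4, p_filesz at +16
            p_offset, = struct.unpack_from(endian + "I", image, ph + 4)
            p_filesz, = struct.unpack_from(endian + "I", image, ph + 16)
        raw = image[p_offset:p_offset + p_filesz]
        return raw.rstrip(b"\0").decode()
    return None
```

Passing the bytes of a dynamically linked binary to read_interp should return the interpreter path the kernel will load; a statically linked binary has no PT_INTERP segment and yields None.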

6.2 Symbol lookup and binding

When the dynamic linker resolves a symbol, either at load time or lazily through the PLT, it searches the loaded objects in a defined order:

1. The executable itself

2. LD_PRELOAD libraries

3. Libraries listed in DT_NEEDED, in order

The first object in this order that defines the symbol wins. Unlike the static linker's strong/weak rule, glibc's dynamic linker by default takes the first definition it finds, weak or strong. LD_PRELOAD exploits the search order: a preloaded library is searched before every DT_NEEDED library, so it can override any library symbol (malloc, for example) without recompiling. It cannot override a symbol that the executable defines itself.
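The first-match rule can be sketched as a linear search over an ordered scope. This is an illustrative model only: lookup, the object tuples, and every address are hypothetical, and the search order is passed in by the caller instead of being derived from a real process.

```python
from typing import Dict, List, Optional, Tuple

# Each loaded object is (name, {symbol: address}); addresses are made up.
LoadedObject = Tuple[str, Dict[str, int]]


def lookup(symbol: str, scope: List[LoadedObject]) -> Optional[Tuple[str, int]]:
    """Return (defining object, address) for the first definition in `scope`."""
    for name, symbols in scope:
        if symbol in symbols:
            return name, symbols[symbol]
    return None


LIBFOO = ("libfoo.so.1", {"foo_init": 0x40001000})
LIBC = ("libc.so.6", {"malloc": 0x41002000, "printf": 0x41003000})
MY_MALLOC = ("my_malloc.so", {"malloc": 0x50000100})

if __name__ == "__main__":
    normal = [LIBFOO, LIBC]
    preloaded = [MY_MALLOC] + normal      # preloaded library sits ahead of libc
    print(lookup("malloc", normal))
    print(lookup("malloc", preloaded))
    print(lookup("printf", preloaded))
```

With my_malloc.so ahead of libc.so.6 in the scope, malloc resolves to the preloaded definition while printf still comes from libc.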

# Inspect what the dynamic linker will load for a binary

$ readelf -d firmware_app | grep NEEDED

0x00000001 (NEEDED) Shared library: [libfoo.so.1]

0x00000001 (NEEDED) Shared library: [libc.so.6]

# Force early symbol resolution (no lazy binding)

$ LD_BIND_NOW=1 ./firmware_app

# Override a symbol with a custom implementation

$ LD_PRELOAD=./my_malloc.so ./firmware_app

Fig 4. Dynamic linker startup flow: the kernel hands control to ld.so, which maps libraries, patches the GOT, runs initializers, then transfers control to main(). Lazy PLT resolution happens later on first call.

7. Conclusion

Position Independent Code rests on two complementary mechanisms. PC-relative branches let code call other code within the same library without knowing its load address. The GOT and PLT let code reach symbols in other libraries through a layer of indirection that the dynamic linker fills in at runtime.

The three things to remember are:

- The GOT holds the runtime addresses of data symbols, written by the dynamic linker at load time.

- The PLT provides per-function stubs that trigger lazy resolution on the first call and jump straight through the patched GOT entry thereafter.

- The dynamic linker owns the startup sequence: it maps libraries, patches the GOT, runs initializers, and hands control to main().
