May 10 / Sebastian Helmut

Linux Memory Management Explained: MMU, Page Tables, and Page Faults





1. Introduction

Every program you write uses memory addresses that do not correspond to real locations in physical RAM. When your code reads from address 0x400000, that address means nothing to the hardware until something translates it. That something is the Memory Management Unit, or MMU.

The MMU is one of the most misunderstood pieces of the hardware-software boundary. Many developers assume it is part of the kernel, or that the kernel performs address translation in software. Neither is true. The MMU is a hardware component integrated into the CPU, and it performs address translation autonomously on every single memory access. The kernel's role is not to translate addresses. It is to configure the structures the MMU uses to do so.

Understanding how these two work together explains some of the most fundamental behaviors of Linux: process isolation, demand paging, page faults, and copy-on-write.


2. Virtual vs Physical Addresses: The Core Problem

When a process runs, it operates entirely in virtual address space. It sees a large, flat, contiguous range of addresses starting from zero. This is an illusion. Physical RAM is finite, shared between all running processes, and almost certainly fragmented.

The gap between what a process sees and what actually exists in hardware is bridged by address translation. Every memory access a process makes uses a virtual address. The MMU translates that virtual address into a physical address before the access reaches RAM.

This separation gives Linux three critical properties. First, each process gets its own private address space so process A cannot read or write process B's memory even if they use the same virtual address. Second, the kernel can place physical pages anywhere in RAM regardless of where the process expects them to be. Third, the kernel can defer allocating physical memory until a process actually touches a page, a technique called demand paging.

Fig 1. Two processes share physical RAM through separate virtual address spaces.



3. The MMU: What It Is and What It Does

3.1 The MMU Is Hardware, Not Software

The MMU is a dedicated hardware block integrated directly into the CPU alongside the cache controllers and execution units. It operates at hardware speed because it has to. Every load, every store, every instruction fetch goes through address translation. On a modern system running at several gigahertz, that means billions of translations per second. No software could keep up.

The kernel does not perform translations. It sets up data structures in RAM that the MMU reads autonomously. Once those structures are in place, the MMU does its job without kernel involvement on every normal memory access.

3.2 Address Translation Mechanics

With 48-bit virtual addresses, the common configuration on 64-bit systems, the virtual address space spans 256 terabytes. A flat table mapping every possible 4 KB virtual page to a physical page would need 2^36 entries; at 8 bytes per entry that is 512 gigabytes per process just for the table itself. The solution is a tree-structured page table that only allocates entries for regions the process actually uses.
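The cost of a flat table can be checked with a few lines of arithmetic. The sketch below assumes 48-bit virtual addresses, 4 KB pages, and 8-byte entries; the function name `flat_table_bytes` is made up for this illustration.

```c
#include <stdint.h>

/* Bytes needed for a single-level (flat) page table.
 * va_bits:    width of the virtual address (assumed 48 here)
 * page_shift: log2 of the page size (12 for 4 KB pages)
 * entry_size: bytes per table entry (assumed 8) */
uint64_t flat_table_bytes(unsigned va_bits, unsigned page_shift,
                          uint64_t entry_size)
{
    uint64_t entries = 1ULL << (va_bits - page_shift); /* one per virtual page */
    return entries * entry_size;
}
```

Under these assumptions `flat_table_bytes(48, 12, 8)` evaluates to 2^39 bytes, which is 512 GB for every process before a single byte of data is stored.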

The virtual address is split into several fields, each acting as an index into one level of the tree. The MMU starts at the root of the tree, uses the first field to find the next level, uses the second field to go one level deeper, and so on until it reaches the final entry which contains the physical page number. The remaining bits of the virtual address are an offset within that physical page and pass through unchanged.
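To make the walk concrete, here is a toy software model of such a tree. It is only a sketch: real hardware walks physical memory and real entries carry flag bits, whereas this model stores plain pointers. Four levels of 9-bit indexes, 4 KB pages, and a 64-bit host are assumed, and all names (`toy_map`, `toy_walk`, `node_t`) are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define LEVELS     4
#define IDX_BITS   9
#define PAGE_SHIFT 12
#define ENTRIES    (1u << IDX_BITS)

/* One table: 512 slots. Inner levels hold child pointers; the last
 * level holds (frame number + 1) so that 0 can mean "not mapped". */
typedef struct node { uintptr_t slot[ENTRIES]; } node_t;

static unsigned level_index(uint64_t va, int level)
{
    /* level 0 is the root and uses the highest 9-bit field */
    unsigned shift = PAGE_SHIFT + IDX_BITS * (LEVELS - 1 - level);
    return (unsigned)((va >> shift) & (ENTRIES - 1));
}

node_t *toy_new_table(void) { return calloc(1, sizeof(node_t)); }

/* Map the page containing va to frame pfn, creating tables on demand. */
int toy_map(node_t *root, uint64_t va, uint64_t pfn)
{
    node_t *n = root;
    for (int level = 0; level < LEVELS - 1; level++) {
        unsigned i = level_index(va, level);
        if (n->slot[i] == 0) {               /* subtree not allocated yet */
            node_t *child = toy_new_table();
            if (!child)
                return -1;
            n->slot[i] = (uintptr_t)child;
        }
        n = (node_t *)n->slot[i];
    }
    n->slot[level_index(va, LEVELS - 1)] = (uintptr_t)(pfn + 1);
    return 0;
}

/* Walk the tree. Returns 0 and fills *pa, or -1 if va is not mapped. */
int toy_walk(const node_t *root, uint64_t va, uint64_t *pa)
{
    const node_t *n = root;
    for (int level = 0; level < LEVELS - 1; level++) {
        uintptr_t next = n->slot[level_index(va, level)];
        if (next == 0)
            return -1;                       /* walk stops at a missing level */
        n = (const node_t *)next;
    }
    uintptr_t leaf = n->slot[level_index(va, LEVELS - 1)];
    if (leaf == 0)
        return -1;
    /* frame number from the table, offset bits copied unchanged */
    *pa = ((uint64_t)(leaf - 1) << PAGE_SHIFT)
        | (va & ((1ULL << PAGE_SHIFT) - 1));
    return 0;
}
```

After `toy_map(root, 0x400000, 0x1234)`, a call to `toy_walk(root, 0x400123, &pa)` yields `pa == 0x1234123`, while the neighbouring page at `0x401000` still reports "not mapped". Only three extra tables were allocated for that one mapping, which is the point of the tree layout. The model never frees its tables; that is left out to keep it short.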

On a 64-bit Linux system with 48-bit virtual addresses this tree has four levels, named PGD, PUD, PMD and PTE in the kernel. Linux also defines a P4D level between PGD and PUD, but it is folded away unless five-level paging is in use. The CPU holds the address of the root table in a dedicated register. On x86 this is CR3. On ARM this is TTBR0 for user space and TTBR1 for kernel space. When the kernel switches processes it updates this register, which switches the entire address space at once.

[47:39] PGD index (9 bits) : indexes into top level table

[38:30] PUD index (9 bits) : indexes into second level

[29:21] PMD index (9 bits) : indexes into third level

[20:12] PTE index (9 bits) : indexes into final level

[11:0] page offset (12 bits) : byte offset within the physical page

Fig 2. 4-level page table tree: each field of the virtual address indexes one level until the physical page is found.
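The field layout above translates directly into shifts and masks. The following sketch assumes exactly that layout (48-bit address, 9-bit indexes, 12-bit offset); the type and function names are invented for this example.

```c
#include <stdint.h>

typedef struct {
    unsigned pgd, pud, pmd, pte;  /* 9-bit table indexes     */
    unsigned offset;              /* 12-bit byte offset      */
} va_fields_t;

/* Split a virtual address into its page table indexes and page offset. */
va_fields_t split_va(uint64_t va)
{
    va_fields_t f;
    f.pgd    = (unsigned)((va >> 39) & 0x1ff);  /* bits [47:39] */
    f.pud    = (unsigned)((va >> 30) & 0x1ff);  /* bits [38:30] */
    f.pmd    = (unsigned)((va >> 21) & 0x1ff);  /* bits [29:21] */
    f.pte    = (unsigned)((va >> 12) & 0x1ff);  /* bits [20:12] */
    f.offset = (unsigned)(va & 0xfff);          /* bits [11:0]  */
    return f;
}
```

For the address 0x400123 this gives PGD index 0, PUD index 0, PMD index 2, PTE index 0, and offset 0x123.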

3.3 The TLB: Translation Cache

Walking four levels of page tables on every memory access would be prohibitively slow. The TLB, Translation Lookaside Buffer, solves this by caching recent translations. When the MMU needs to translate a virtual address it checks the TLB first. If the translation is cached, the physical address is available immediately. If not, the MMU performs the full page table walk and caches the result.

A TLB miss is expensive because it requires multiple RAM accesses to walk the page table. Applications that access memory with good spatial locality keep the TLB warm and pay very little translation overhead. Applications that jump randomly across large memory regions suffer frequent TLB misses.
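The effect of locality can be illustrated with a toy model of a TLB. This is not how any particular CPU organizes its TLB: a direct-mapped cache with 64 entries and 4 KB pages is assumed purely for illustration, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64   /* assumed size; real TLBs vary */
#define PAGE_SHIFT  12

typedef struct {
    uint64_t      vpn[TLB_ENTRIES];    /* cached virtual page numbers */
    int           valid[TLB_ENTRIES];
    unsigned long hits, misses;
} toy_tlb_t;

/* Look up one address; on a miss, "walk" and cache the translation. */
void tlb_access(toy_tlb_t *t, uint64_t va)
{
    uint64_t vpn  = va >> PAGE_SHIFT;
    unsigned slot = (unsigned)(vpn % TLB_ENTRIES);

    if (t->valid[slot] && t->vpn[slot] == vpn) {
        t->hits++;
    } else {
        t->misses++;             /* stands in for a full page table walk */
        t->valid[slot] = 1;
        t->vpn[slot]   = vpn;
    }
}

/* Access `count` addresses starting at base, `stride` bytes apart. */
void tlb_scan(toy_tlb_t *t, uint64_t base, size_t count, uint64_t stride)
{
    for (size_t i = 0; i < count; i++)
        tlb_access(t, base + i * stride);
}
```

Starting from a zeroed `toy_tlb_t`, 4096 accesses with an 8-byte stride touch only 8 pages and produce 8 misses and 4088 hits. The same number of accesses with a 4096-byte stride lands on a new page every time and produces 4096 misses and no hits.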

4. Page Tables: The Kernel's Side of the Deal

Page tables are the shared data structure that connects the kernel and the MMU. The kernel allocates and manages them. The MMU reads them on every translation.

Each entry in the final level of a page table contains the physical page number that backs a given virtual page, along with a set of permission bits. These bits control whether the page is readable, writable, and executable, and whether it is accessible from user space or kernel space only. The MMU enforces these permissions in hardware on every access. Violating them generates a fault immediately.
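As one concrete case, the sketch below decodes a final-level entry using the x86-64 bit positions: bit 0 is Present, bit 1 is Read/Write, bit 2 is User/Supervisor, bit 63 is No-Execute, and bits 51:12 hold the physical frame number. Other architectures use different layouts, and the struct and function names here are made up for this example. Note that x86 has no separate "readable" bit; a present page is readable.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT   (1ULL << 0)
#define PTE_WRITABLE  (1ULL << 1)
#define PTE_USER      (1ULL << 2)
#define PTE_NX        (1ULL << 63)
#define PTE_PFN_MASK  0x000ffffffffff000ULL   /* bits 51:12 */

typedef struct {
    bool     present;
    bool     writable;
    bool     user;        /* accessible from user space */
    bool     executable;  /* true when the NX bit is clear */
    uint64_t pfn;         /* physical frame number */
} pte_info_t;

/* Decode an x86-64 style final-level page table entry. */
pte_info_t decode_pte(uint64_t pte)
{
    pte_info_t info;
    info.present    = (pte & PTE_PRESENT)  != 0;
    info.writable   = (pte & PTE_WRITABLE) != 0;
    info.user       = (pte & PTE_USER)     != 0;
    info.executable = (pte & PTE_NX)       == 0;
    info.pfn        = (pte & PTE_PFN_MASK) >> 12;
    return info;
}
```

An entry built as `(0x1234ULL << 12) | PTE_PRESENT | PTE_USER | PTE_NX` decodes to a present, read-only, user-accessible, non-executable page backed by frame 0x1234, which is what a typical read-only data page would look like.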

When a process is created, the kernel allocates a fresh set of page tables for it and loads the top-level address into the CPU register. From that point, the MMU uses those tables for every translation the process makes. When the kernel switches between processes, it updates that register to point to the new process's page tables. This is how each process gets its own isolated view of memory.

The kernel does not fill in page table entries immediately when a process requests memory. It creates a record of the mapping, called a virtual memory area (VMA), but leaves the page table entry empty. Physical memory is allocated only when the process actually accesses the page. This is demand paging.
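On Linux this behavior can be observed from user space with `mmap()` and `mincore()`, which reports whether pages of a mapping are resident in RAM. The sketch below is a minimal probe, assuming a Linux system with default GCC or Clang settings; the function name `demand_paging_probe` is invented for this example, and the exact residency reported before the first touch may differ in sandboxed or emulated environments.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for mincore() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map 16 anonymous pages, then record whether the first page is
 * resident before and after it is written for the first time.
 * Returns 0 on success, -1 if mmap() or mincore() fails. */
int demand_paging_probe(int *resident_before, int *resident_after)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len = 16 * (size_t)page;
    unsigned char vec[16];       /* one status byte per page */

    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    if (mincore(buf, len, vec) != 0) {
        munmap(buf, len);
        return -1;
    }
    *resident_before = vec[0] & 1;

    buf[0] = 'A';                /* first touch of the page */

    if (mincore(buf, len, vec) != 0) {
        munmap(buf, len);
        return -1;
    }
    *resident_after = vec[0] & 1;

    munmap(buf, len);
    return 0;
}
```

On a typical Linux system the first page is expected to be reported as not resident before the write and resident after it, while the untouched pages of the mapping stay non-resident.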

5. Page Faults: How the MMU and Kernel Communicate

5.1 What Triggers a Page Fault

When the MMU cannot complete a translation, it raises a page fault exception. This happens in three situations: the page table entry is not present, meaning the physical page has not been allocated yet or has been swapped to disk; the access violates the permission bits in the entry; or the virtual address falls outside any mapped region entirely. To the MMU the third case looks the same as the first, since both are simply a missing entry. The kernel tells them apart by checking the process's memory map.

5.2 The Page Fault Handler Flow

A page fault transfers control from the MMU to the kernel's page fault handler. The handler receives the faulting virtual address and the reason for the fault. It looks up the process's memory map to decide what to do.
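The same two pieces of information, the faulting address and the reason, reach user space when the kernel decides the fault is invalid and delivers SIGSEGV. The sketch below provokes a permission fault by writing to a read-only page and captures `si_addr` and `si_code` in a signal handler. It assumes Linux with default compiler settings; `probe_write_fault` and the static variable names are invented for this example, and jumping out of a SIGSEGV handler is done here only for demonstration.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS, sigaction, sigsetjmp */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;
static void *volatile fault_addr;
static volatile sig_atomic_t fault_code;

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    fault_addr = info->si_addr;  /* faulting virtual address */
    fault_code = info->si_code;  /* reason, e.g. SEGV_ACCERR or SEGV_MAPERR */
    siglongjmp(fault_jmp, 1);    /* skip the faulting store */
}

/* Write to a read-only page and report what the kernel delivered.
 * Returns 1 if a fault was caught at the expected address,
 * 0 otherwise, and -1 on setup failure. */
int probe_write_fault(int *si_code_out)
{
    struct sigaction sa = {0}, old;
    long page = sysconf(_SC_PAGESIZE);
    int result = 0;

    char *p = mmap(NULL, (size_t)page, PROT_READ,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap(p, (size_t)page);
        return -1;
    }

    if (sigsetjmp(fault_jmp, 1) == 0) {
        *(volatile char *)p = 'A';          /* violates the read-only mapping */
    } else {
        *si_code_out = fault_code;
        result = (fault_addr == (void *)p) ? 1 : 0;
    }

    sigaction(SIGSEGV, &old, NULL);         /* restore previous handler */
    munmap(p, (size_t)page);
    return result;
}
```

For a write to a mapped but read-only page, `si_code` is expected to be `SEGV_ACCERR`; an access to an address with no mapping at all would normally report `SEGV_MAPERR` instead.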

If the fault is a valid demand paging fault, the handler allocates a physical page, fills it with the appropriate content, updates the page table entry with the new physical address, invalidates any stale TLB entry for that address where the architecture requires it, and returns control to the process. The faulting instruction is retried and succeeds.

char *buf = malloc(1024 * 1024); // kernel creates VMA, no physical pages yet
buf[0] = 'A';                    // first access to an untouched page triggers a page fault;
                                 // kernel allocates a page and updates the PTE
buf[1] = 'B';                    // same page: TLB hit, no fault, direct access

Fig 3. Page fault sequence: MMU raises exception, kernel allocates page, updates page table, returns control.
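The decision the handler makes in section 5.2 can be modeled in a few lines. This is a deliberately simplified sketch, not the kernel's actual code path: the real handler also deals with copy-on-write, swapped-out pages, file-backed mappings, and stack growth. The types and names (`toy_vma_t`, `toy_handle_fault`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One mapped region of the process: addresses in [start, end). */
typedef struct {
    uint64_t start, end;
    bool     writable;
} toy_vma_t;

typedef enum {
    FAULT_ALLOCATE_PAGE,   /* valid demand-paging fault: back it and retry */
    FAULT_SEGV_ACCESS,     /* mapped, but the access is not permitted */
    FAULT_SEGV_UNMAPPED    /* address is outside every mapped region */
} fault_action_t;

/* Decide how to respond to a fault at addr, given the process's regions. */
fault_action_t toy_handle_fault(const toy_vma_t *vmas, size_t n,
                                uint64_t addr, bool is_write)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= vmas[i].start && addr < vmas[i].end) {
            if (is_write && !vmas[i].writable)
                return FAULT_SEGV_ACCESS;
            return FAULT_ALLOCATE_PAGE;
        }
    }
    return FAULT_SEGV_UNMAPPED;
}
```

With one writable region and one read-only region, a first write inside the writable region yields `FAULT_ALLOCATE_PAGE`, a write inside the read-only region yields `FAULT_SEGV_ACCESS`, and any access between or beyond the regions yields `FAULT_SEGV_UNMAPPED`.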

5.3 Demand Paging and Copy-on-Write

Demand paging means physical memory is only allocated when a page is first touched. This is why a process that allocates a large buffer does not immediately consume RAM. Pages are faulted in one at a time as the process accesses them.

Copy-on-write takes this further. When a process calls `fork()`, the kernel does not copy the parent's physical pages. Instead it maps the same physical pages into the child and marks the writable private pages as read-only in both processes' page tables. The first time either process writes to such a page, the MMU raises a fault. The kernel then allocates a new physical page, copies the content, and updates the faulting process's page table entry to point to the copy with write permission. The write proceeds on the private copy.

pid_t pid = fork(); // child shares parent pages, all marked read-only
if (pid == 0)
    buf[0] = 'X';   // write triggers fault, kernel copies page

Fig 4. Copy-on-write: parent and child share pages after fork, split on first write.
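The fault itself is invisible to the program, but its effect is easy to observe: after the child writes, the parent still sees the old value at the same virtual address. The sketch below is a minimal, self-contained version of the snippet above, assuming a POSIX system; `cow_probe` is a name invented for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork, let the child overwrite buf[0], and return the value the
 * parent still sees afterwards. *child_ok is set to 1 if the child
 * observed its own write. Returns -1 on error. */
int cow_probe(int *child_ok)
{
    char *buf = malloc(4096);
    if (!buf)
        return -1;
    buf[0] = 'A';

    pid_t pid = fork();
    if (pid < 0) {
        free(buf);
        return -1;
    }
    if (pid == 0) {
        buf[0] = 'X';                      /* child writes to its own copy */
        _exit(buf[0] == 'X' ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        free(buf);
        return -1;
    }
    *child_ok = WIFEXITED(status) && WEXITSTATUS(status) == 0;

    int parent_view = buf[0];              /* unchanged by the child's write */
    free(buf);
    return parent_view;
}
```

The function returns 'A' in the parent even though the child stored 'X' through the same pointer value, because by then the two processes' page tables map that address to different physical pages.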


6. Process Isolation: Why Each Process Sees Its Own Memory

Process isolation is a direct consequence of each process having its own page tables. Two processes can have the same virtual address mapped to completely different physical pages. When the kernel schedules a process, it loads that process's top-level page table address into the CPU register. Every translation the MMU performs from that point uses that process's mappings.

User space has no way to access physical addresses directly. It only ever sees virtual addresses. This means a bug in one process cannot corrupt another process's memory, unless the two have deliberately set up a shared mapping. The MMU enforces the boundary in hardware, not in software.


7. Conclusion

The MMU and the Linux kernel divide the work of memory management along a clear boundary. The MMU handles the speed-critical job of translating every virtual address to a physical address using page tables configured by the kernel. The kernel handles the policy: deciding when to allocate physical pages, how to set permissions, and how to respond when translations fail.

Page faults are the primary communication channel between the two. Every demand-paging allocation, every copy-on-write split, and every access violation passes through the page fault handler. Understanding this flow explains behaviors that would otherwise seem mysterious: why a freshly allocated buffer does not consume RAM, why `fork()` is fast, and why a segfault happens instantly when you access an unmapped address.

