Break, Remap, Debug: Using the FPB Unit on ARM Cortex-M
1. Introduction
When you debug code that lives in Flash, software breakpoints are often not an option: they work by overwriting an instruction with BKPT, and Flash cannot be rewritten casually. The Flash Patch and Breakpoint (FPB) unit is a small hardware block in the ARM debug fabric that lets you do two things without touching your binary: set true hardware breakpoints on instruction fetches and, optionally, replace the fetched word with one you choose. In practice, that means you can halt exactly at a function in Flash, or redirect its first instruction to a small veneer to tweak behavior for tests and hotfixes.
2. FPB Features
What is FPB?
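The Flash Patch & Breakpoint (FPB) unit is a CoreSight debug component in ARMv7-M/ARMv7E-M cores that watches instruction and literal fetches from the Code region and can either raise a debug event or substitute the fetched word, enabling non-intrusive breakpoints and lightweight patches in code that runs from Flash. The register layout described here is FPB version 1, as found on Cortex-M3 and Cortex-M4. Cortex-M7 parts such as the STM32H7 report version 2 in FP_CTRL.REV; that revision uses a different comparator format and, according to the Cortex-M7 documentation, does not support remap, so check the revision before relying on the remap feature.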
2.1 Hardware Breakpoints
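The FPB's instruction comparators watch fetches in the Code region (addresses below 0x20000000). When a comparator matches, the core raises a debug event, either halting in place (halt mode, with a debugger attached) or entering the DebugMonitor exception (when DEMCR.MON_EN is set), so you can stop in Flash without changing your binary. The literal comparators only take part in remapping and cannot generate breakpoints.
For a comparator to fire, both enables must be set: the global FP_CTRL.ENABLE (written together with FP_CTRL.KEY, otherwise the write is ignored) and the per-comparator FP_COMP[n].ENABLE. On FPB version 1 the comparator stores a word address in bits [28:2], and the REPLACE field in bits [31:30] selects which halfword of that word triggers the breakpoint: 0b01 for the lower halfword, 0b10 for the upper, 0b11 for both. A function whose address has bit 1 set therefore needs REPLACE=0b10, not bit 1 written into the register. This is the simplest, most reliable way to break on code in Flash, and it is what most debuggers use under the hood.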

Figure 1: FPB Hardware Breakpoint
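As a minimal sketch of how a breakpoint comparator value is put together on an FPB version 1 part (Cortex-M3/M4), the helper below takes a function address, drops the Thumb bit, and picks the REPLACE encoding from bit 1 of the address. The helper name fpb_v1_bkpt_comp is hypothetical and not part of CMSIS or any vendor library; it only computes the register value and touches no hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: FP_COMPn value for a version 1 FPB breakpoint.
   Layout assumed: REPLACE in [31:30], word address in [28:2], ENABLE in bit 0. */
static uint32_t fpb_v1_bkpt_comp(uint32_t fn_addr)
{
    uint32_t a = fn_addr & ~1u;                 /* drop the Thumb bit */
    assert((a & 0xE0000000u) == 0u);            /* v1 only matches the Code region */
    uint32_t replace = (a & 0x2u) ? (2u << 30)  /* instruction in the upper halfword */
                                  : (1u << 30); /* instruction in the lower halfword */
    return replace | (a & 0x1FFFFFFCu) | 1u;    /* REPLACE | COMP | ENABLE */
}
```

For a function pointer of 0x08000101 this yields 0x48000101; for 0x08000103, where the first instruction sits in the upper halfword of its word, it yields 0x88000101.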
2.2 Flash Patch (Remap) — Lightweight Instruction Replacement
Flash Patch (remap) lets the FPB swap one fetched word with a word you pre-load in RAM. You point FP_REMAP at a 32-byte table (8×32-bit) in SRAM; it must be 32-byte aligned, and it has to live in the SRAM region because hardware fixes FP_REMAP[31:29] to 0b001. Arm comparator n on the target word address with REPLACE=0b00; when the CPU fetches from there, the FPB substitutes table[n] for the original 32-bit word.
In Thumb code, each 32-bit word holds two 16-bit halfwords: the lower at address A and the upper at A+2. Remap always replaces the whole aligned word, so a 32-bit Thumb-2 instruction such as B.W can be patched directly only if it starts on a 4-byte boundary, and to change a single 16-bit instruction you copy its neighbour from Flash into the other half of the table entry. The non-zero REPLACE values (LOWER, UPPER, BOTH) do not select a partial remap; they turn the comparator into a breakpoint. Typical uses are swapping a single instruction, tweaking a literal, or replacing the first instruction with a branch to a veneer in Flash (needed because a single B.W only reaches ±16 MB, so a long jump to RAM has to go through a veneer).

Figure 2: FPB Remap Flow
Credit: The Definitive Guide to ARM Cortex-M3 and Cortex-M4
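Two details of remap are easy to get wrong: where the table may live, and what goes into a table entry when only one 16-bit instruction should change. The sketch below illustrates both under the FPB version 1 layout. The function names fpb_remap_base_ok and fpb_patch_halfword are hypothetical helpers introduced for illustration, and the instruction values in the usage note are arbitrary examples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr can serve as the remap table base:
   32-byte aligned and inside the SRAM region 0x20000000-0x3FFFFFFF. */
static bool fpb_remap_base_ok(uint32_t addr)
{
    return (addr & 0x1Fu) == 0u && (addr >> 29) == 1u;
}

/* Build a table entry that changes one 16-bit instruction and keeps its
   neighbour. flash_word is the original aligned word read from Flash;
   insn_addr is the halfword-aligned address of the instruction to change. */
static uint32_t fpb_patch_halfword(uint32_t flash_word, uint16_t new_insn,
                                   uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0u);             /* Thumb bit already cleared */
    if (insn_addr & 0x2u)                       /* instruction is the upper half */
        return (flash_word & 0x0000FFFFu) | ((uint32_t)new_insn << 16);
    return (flash_word & 0xFFFF0000u) | new_insn;   /* instruction is the lower half */
}
```

With an original word of 0x4770BF00, replacing the halfword at 0x08000100 with 0x2001 gives 0x47702001, while replacing the one at 0x08000102 gives 0x2001BF00.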
3. How to Use FPB
3.1 Hardware Breakpoint
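To halt when test() is fetched from Flash without touching the app, turn on the FPB by writing FP_CTRL with both ENABLE and KEY set; a write without KEY is ignored. (DEMCR.TRCENA is the global enable for the DWT and ITM and is not documented as gating the FPB, but the example sets it as part of the usual debug setup.) Then program one comparator (e.g., FP_COMP[0]) with the first-instruction address of test(): clear the Thumb bit (bit 0), write the word address into bits [28:2], pick REPLACE from bit 1 of the address (lower or upper halfword), and set the comparator's ENABLE bit. On an FPB version 2 part such as a Cortex-M7, the comparator instead takes the halfword address in bits [31:1] plus the enable bit. Issue DSB and ISB barriers, and call test(). On the next fetch, the FPB match raises a debug event: your debugger halts immediately (halt mode) or the DebugMonitor exception fires, giving you a true hardware breakpoint in Flash with zero code changes.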
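/* Assumes the CMSIS device header is included (CoreDebug, __NOP, __DSB, __ISB). */
typedef struct {
    volatile uint32_t FP_CTRL, FP_REMAP, FP_COMP[8];
} FPB_Type;
#define FPB ((FPB_Type*)0xE0002000UL)
#define FPB_CTRL_ENABLE (1u<<0)
#define FPB_CTRL_KEY (1u<<1) /* writes to FP_CTRL are ignored unless KEY is 1 */
#define FPB_COMP_ENABLE (1u<<0)
#define FPB_COMP_BKPT_LOWER (1u<<30) /* v1 REPLACE=01: break on halfword at A */
#define FPB_COMP_BKPT_UPPER (2u<<30) /* v1 REPLACE=10: break on halfword at A+2 */
/* noinline so the call below really fetches from test()'s address */
__attribute__((noinline)) void test(void) {
    /* debugger halts here */
    __NOP();
}
static inline void fpb_enable(void) {
    /* TRCENA is the DWT/ITM global enable; kept as part of the usual debug setup */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    FPB->FP_CTRL = FPB_CTRL_KEY | FPB_CTRL_ENABLE;
    __DSB();
    __ISB();
}
void demo_breakpoint(void) {
    fpb_enable();
    uint32_t a = ((uint32_t)(uintptr_t)test) & ~1u; /* clear Thumb bit */
    uint32_t comp;
    if ((FPB->FP_CTRL >> 28) != 0u) {
        /* FPB version 2 (e.g. Cortex-M7): halfword address in [31:1] plus enable */
        comp = a | FPB_COMP_ENABLE;
    } else {
        /* FPB version 1 (Cortex-M3/M4): word address in [28:2], halfword via REPLACE */
        comp = (a & 0x1FFFFFFCu)
             | ((a & 0x2u) ? FPB_COMP_BKPT_UPPER : FPB_COMP_BKPT_LOWER)
             | FPB_COMP_ENABLE;
    }
    FPB->FP_COMP[0] = comp;
    __DSB();
    __ISB();
    test();
}
int main(void) {
    HAL_Init();
    SystemClock_Config();
    demo_breakpoint();
    while (1) {}
}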
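The listing above simply uses FP_COMP[0]. Before choosing a comparator index on an unfamiliar part, it can help to decode FP_CTRL, which reports the FPB revision and how many instruction and literal comparators are implemented. The following is a sketch based on the ARMv7-M field layout (REV in [31:28], NUM_CODE split across [14:12] and [7:4], NUM_LIT in [11:8]); the function names are hypothetical, and the decoder works on a raw register value so it can be tried off-target.

```c
#include <assert.h>
#include <stdint.h>

/* FPB revision: 0 = version 1, 1 = version 2. */
static unsigned fpb_rev(uint32_t fp_ctrl)
{
    return (fp_ctrl >> 28) & 0xFu;
}

/* Number of instruction comparators: NUM_CODE[6:4] in [14:12], NUM_CODE[3:0] in [7:4]. */
static unsigned fpb_num_code(uint32_t fp_ctrl)
{
    return (((fp_ctrl >> 12) & 0x7u) << 4) | ((fp_ctrl >> 4) & 0xFu);
}

/* Number of literal comparators: NUM_LIT in [11:8]. */
static unsigned fpb_num_lit(uint32_t fp_ctrl)
{
    return (fp_ctrl >> 8) & 0xFu;
}

/* Debug-build guard to call before programming FP_COMP[n] as a breakpoint. */
static void fpb_require_code_comp(uint32_t fp_ctrl, unsigned n)
{
    assert(n < fpb_num_code(fp_ctrl));
}
```

For instance, a raw FP_CTRL value of 0x260 decodes to revision 0 with six instruction comparators and two literal comparators, so indices 0 to 5 are usable for breakpoints. On target, pass FPB->FP_CTRL as the argument.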
3.2 Flash Patch (Remap)
To change behavior at runtime without touching Flash, use the FPB's remap path: instead of executing the word fetched from Flash, the core executes a word you preload in a small table in SRAM. Point FP_REMAP at a 32-byte (8×32-bit) table that is 32-byte aligned, then arm a comparator on the word that holds the first instruction of the function you want to intercept (clear the T-bit and use address bits [28:2]; the function should start on a 4-byte boundary so the whole word belongs to it).
For remap, leave REPLACE at 0b00; the non-zero REPLACE values configure a breakpoint instead. After a DSB and ISB, the next fetch at that address reads your replacement word from the table, not from Flash. A practical pattern is to inject a 32-bit branch to a small veneer in Flash (which stays within the ±16 MB range of B.W), do your tweak there, and return to the original caller with BX LR.
/* 3.2 Flash patch (remap): replace the first word of test() with a 32-bit B.W to a veneer */
/* FPB version 1 REPLACE field, FP_COMP[31:30] */
#define FPB_CTRL_KEY (1u<<1) /* writes to FP_CTRL are ignored unless KEY is 1 */
#define FPB_COMP_REPLACE_REMAP (0u<<30) /* substitute the word from the remap table */
#define FPB_COMP_REPLACE_LOWER (1u<<30) /* breakpoint on lower halfword */
#define FPB_COMP_REPLACE_UPPER (2u<<30) /* breakpoint on upper halfword */
#define FPB_COMP_REPLACE_BOTH (3u<<30) /* breakpoint on both halfwords */
__attribute__((aligned(32))) static volatile uint32_t fpb_table[8];
__attribute__((noinline, used)) static void veneer_dec(void) {
    /* example: adjust state here; the compiler-generated return (BX LR)
       goes straight back to the caller of test() */
    __NOP();
}
/* Encode Thumb-2 unconditional B.W (T4) as the 32-bit word seen in memory.
   pc is the branch's own address + 4; the target must be within +/-16 MB. */
static inline uint32_t encode_bw(uint32_t pc, uint32_t target) {
    int32_t off = (int32_t)((target & ~1u) - pc); /* byte offset, Thumb bit dropped */
    uint32_t imm = (uint32_t)off >> 1; /* halfword units; fields are masked below */
    uint32_t S = (imm >> 23) & 1u, I1 = (imm >> 22) & 1u, I2 = (imm >> 21) & 1u;
    uint32_t imm10 = (imm >> 11) & 0x3FFu, imm11 = imm & 0x7FFu;
    uint32_t J1 = (~(I1 ^ S)) & 1u, J2 = (~(I2 ^ S)) & 1u;
    uint16_t hw1 = (uint16_t)(0xF000u | (S << 10) | imm10);
    uint16_t hw2 = (uint16_t)(0x9000u | (J1 << 13) | (J2 << 11) | imm11);
    return ((uint32_t)hw2 << 16) | hw1; /* little-endian: hw1 sits at the lower address */
}
void demo_patch(void) {
    uint32_t entry = ((uint32_t)(uintptr_t)test) & ~1u; /* clear T-bit */
    /* This sketch needs the version 1 layout and test() on a 4-byte boundary,
       because remap replaces the whole aligned word */
    if ((FPB->FP_CTRL >> 28) != 0u || (entry & 0x2u))
        return;
    /* 1) Enable the FPB (KEY must accompany the write) */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    FPB->FP_CTRL = FPB_CTRL_KEY | FPB_CTRL_ENABLE;
    __DSB();
    __ISB();
    /* 2) Program FP_REMAP to our 32-byte-aligned table in SRAM */
    FPB->FP_REMAP = (uint32_t)(uintptr_t)fpb_table;
    /* 3) Build the replacement word: a 32-bit branch to a small veneer in FLASH */
    uint32_t bw = encode_bw(entry + 4u, (uint32_t)(uintptr_t)veneer_dec);
    /* 4) Slot n feeds comparator n; filling all 8 keeps it simple, only slot 0 is used */
    for (uint32_t s = 0; s < 8; ++s)
        fpb_table[s] = bw;
    __DSB();
    /* 5) Arm comparator 0 on the entry word; REPLACE=00 selects remap */
    FPB->FP_COMP[0] = (entry & 0x1FFFFFFCu) | FPB_COMP_REPLACE_REMAP | FPB_COMP_ENABLE;
    __DSB();
    __ISB();
    /* Call: the first fetch of test() is replaced by our B.W, the veneer runs, then returns */
    test();
}
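Because a wrong branch encoding in the remap table sends the core to an arbitrary address, it is worth checking the encoder off-target first. The sketch below is a host-testable variant of a B.W (encoding T4) encoder that also rejects targets outside the ±16 MB reach. The name thumb2_encode_bw and its signature are assumptions made for this example; the result is the 32-bit word as it appears in little-endian memory, with the first halfword in the low 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Encode B.W (T4) located at insn_addr and branching to target.
   Returns false when the target is outside the +/-16 MB reach. */
static bool thumb2_encode_bw(uint32_t insn_addr, uint32_t target, uint32_t *word_out)
{
    assert((insn_addr & 1u) == 0u);            /* instruction address, no Thumb bit */
    /* Offset is relative to PC, which reads as insn_addr + 4 in Thumb state. */
    int32_t off = (int32_t)((target & ~1u) - (insn_addr + 4u));
    if (off < -16777216 || off > 16777214)
        return false;
    uint32_t imm = (uint32_t)off >> 1;         /* halfword units */
    uint32_t S  = (imm >> 23) & 1u;
    uint32_t I1 = (imm >> 22) & 1u;
    uint32_t I2 = (imm >> 21) & 1u;
    uint32_t J1 = (~(I1 ^ S)) & 1u;            /* I1 = NOT(J1 XOR S) */
    uint32_t J2 = (~(I2 ^ S)) & 1u;            /* I2 = NOT(J2 XOR S) */
    uint32_t hw1 = 0xF000u | (S << 10) | ((imm >> 11) & 0x3FFu);
    uint32_t hw2 = 0x9000u | (J1 << 13) | (J2 << 11) | (imm & 0x7FFu);
    *word_out = (hw2 << 16) | hw1;             /* hw1 is fetched first */
    return true;
}
```

Two well-known encodings make handy checks: a branch to the next instruction (offset 0) is F000 B800, and a branch to itself (offset -4) is F7FF BFFE. A request to jump from Flash at 0x08000100 to SRAM at 0x20000000 is rejected, which is the case that calls for a veneer.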
4. Conclusion
FPB shines when you need true hardware breakpoints in Flash without touching the binary and when you want fast, reversible hot patches—swap a single instruction, tweak a literal, or redirect a function prologue to a small veneer. It's especially useful for production debugging and in-field diagnostics where reflashing is risky.