ARM Calling Convention on Cortex-M: How Arguments, Registers and the Stack Really Work
1. Introduction
Every time your C code calls a function, the compiler silently follows a set of rules that governs how arguments travel to the callee, how the return value comes back, and which registers each side is responsible for saving. These rules are defined by the ARM Architecture Procedure Call Standard (AAPCS), the ABI that every compiler, every library, and every startup file on Cortex-M must agree on.
When you write C exclusively, the compiler handles all of this automatically. You never see it. But the moment you write assembly that calls a C function, or a C function that calls assembly, you are responsible for following these rules yourself. Get them wrong and the program silently corrupts data, crashes, or produces wrong results with no helpful error message.
This article makes the calling convention concrete. We start with the register map, work through argument passing, and finish with a real context switch in assembly that calls a C scheduler function: a case where the compiler cannot help you and you must apply the ABI rules by hand.
2. Registers and Their Roles
The Cortex-M4 has sixteen 32-bit core registers, r0 through r15. The AAPCS assigns a specific role to each one:
Register | Alias | Role | Who saves it
R0 | a1 | Argument 1 / return value | Caller
R1 | a2 | Argument 2 | Caller
R2 | a3 | Argument 3 | Caller
R3 | a4 | Argument 4 | Caller
R4 | v1 | Callee-saved variable register | Callee
R5 | v2 | Callee-saved variable register | Callee
R6 | v3 | Callee-saved variable register | Callee
R7 | v4 | Callee-saved variable register | Callee
R8 | v5 | Callee-saved variable register | Callee
R9 | v6 | Callee-saved variable register | Callee
R10 | v7 | Callee-saved variable register | Callee
R11 | v8 | Callee-saved variable register | Callee
R12 | ip | Intra-procedure scratch | Caller
R13 | sp | Stack pointer | Special
R14 | lr | Link register (return address) | Caller
R15 | pc | Program counter | Special
The critical split is between caller-saved and callee-saved registers:
- Caller-saved (r0-r3, r12, lr): if the caller needs these values after a bl instruction, it must save them before the call. The callee is free to overwrite them.
- Callee-saved (r4-r11): if the callee uses these registers, it must push them on entry and pop them on exit. The caller can rely on them being unchanged across any function call.
This is not arbitrary. It means a leaf function that only uses r0-r3 needs no prologue and no epilogue: zero overhead. Only functions that need persistent state across calls pay the cost of saving and restoring registers.
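The split can be seen from the C side with a pair of small functions. This is a minimal sketch: the function names are hypothetical, and the register choices described in the comments are what a compiler typically does at moderate optimization when scale() is not inlined, not something the C language guarantees.

```c
#include <stdint.h>

/* Leaf function: it calls nothing and needs only its two arguments,
 * which arrive in r0 and r1 under the AAPCS. No callee-saved register
 * is touched, so no push/pop is required. */
int32_t scale(int32_t x, int32_t k)
{
    return x * k;
}

/* Non-leaf function: x must survive the call to scale(), but r0-r3 may
 * be overwritten by the callee. A compiler will typically copy x into a
 * callee-saved register such as r4 and save r4 and lr on entry, which is
 * exactly the prologue/epilogue cost described above. */
int32_t scale_and_add(int32_t x, int32_t k)
{
    int32_t y = scale(x, k);
    return x + y;
}
```

Compiling this for a Cortex-M target and disassembling it is a quick way to check which registers your toolchain actually picks.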
3. Passing Arguments and Returning Values
The four-register rule
The first four integer or pointer arguments go in r0, r1, r2, r3 in left-to-right order. The compiler generates this:
    /* C source */
    int add4(int a, int b, int c, int d);

    /* Call site */
    int result = add4(1, 2, 3, 4);

    @ Compiler output (arm-none-eabi-objdump -d)
    mov  r0, #1      @ arg1
    mov  r1, #2      @ arg2
    mov  r2, #3      @ arg3
    mov  r3, #4      @ arg4
    bl   add4        @ call; return value arrives in r0
Clean and zero overhead. All four arguments fit in registers, no stack touched.
When arguments spill to the stack
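The fifth argument and beyond go on the stack before the bl, laid out so that the fifth argument sits at the lowest address (equivalent to pushing them right to left). The caller reserves the space and releases it after the call returns. Because sp must stay 8-byte aligned at the call (see Stack alignment below), the caller reserves 8 bytes here even though the argument itself needs only 4:

    int add5(int a, int b, int c, int d, int e);
    int result = add5(1, 2, 3, 4, 5);

    sub  sp, #8      @ reserve stack space, keeping sp 8-byte aligned
    mov  r3, #5
    str  r3, [sp]    @ arg5 stored at [sp]       < spill
    mov  r0, #1      @ arg1
    mov  r1, #2      @ arg2
    mov  r2, #3      @ arg3
    mov  r3, #4      @ arg4
    bl   add5
    add  sp, #8      @ caller cleans the stack   < cleanup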
On the callee side, add5 finds the fifth argument by reading [sp] on entry. The bl instruction itself does not move the stack pointer, so the fifth argument sits exactly at [sp] when the function body begins, before any prologue push.
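The register-then-stack rule can be written down as a small lookup. The sketch below is a simplified model, not part of any toolchain: arg_location_t and arg_location() are hypothetical names, and the model only covers 32-bit integer and pointer arguments (64-bit and floating-point arguments follow additional rules that it ignores).

```c
#include <assert.h>
#include <stdint.h>

/* Where a 32-bit integer or pointer argument lives at function entry. */
typedef struct {
    int      in_register;  /* 1: passed in a core register, 0: on the stack */
    int      reg;          /* register number 0..3, or -1 for stack args    */
    uint32_t sp_offset;    /* byte offset from sp at entry (stack args)     */
} arg_location_t;

/* index is zero-based: 0 is the first argument. */
arg_location_t arg_location(int index)
{
    arg_location_t loc;

    assert(index >= 0);
    if (index < 4) {
        /* First four arguments: r0, r1, r2, r3 in left-to-right order. */
        loc.in_register = 1;
        loc.reg = index;
        loc.sp_offset = 0;
    } else {
        /* Fifth argument onward: consecutive words starting at [sp]. */
        loc.in_register = 0;
        loc.reg = -1;
        loc.sp_offset = (uint32_t)(index - 4) * 4u;
    }
    return loc;
}
```

For add5, arg_location(4) reports a stack argument at offset 0, which matches the [sp] read described above.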
Returning values
- Single 32-bit value (int, pointer, uint32_t): returned in r0.
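- 64-bit value (int64_t, double): returned in the r0:r1 pair, low word in r0. This applies to the base (soft-float) calling convention; under the hard-float variant, float and double results come back in s0 and d0 instead.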
- Struct up to 4 bytes: returned in r0.
- Struct larger than 4 bytes: the caller allocates space and passes a hidden pointer in r0; the callee writes the result there. The visible arguments then start at r1.
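The return rules above map onto ordinary C declarations. The following sketch uses hypothetical types and function names; the comments describe where an AAPCS-conforming compiler would place each result under the base calling convention, which you can confirm by disassembling the output for your target.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t;   /* 4 bytes  */
typedef struct { int32_t x, y, z; } vec3_t;      /* 12 bytes */

_Static_assert(sizeof(rgba_t) == 4, "rgba_t must fit in one register");
_Static_assert(sizeof(vec3_t) > 4, "vec3_t must exceed one register");

/* Struct of 4 bytes: returned packed in r0. */
rgba_t make_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    rgba_t c = { r, g, b, a };
    return c;
}

/* Struct larger than 4 bytes: the caller passes a hidden result pointer
 * in r0, so x, y and z arrive in r1, r2 and r3. */
vec3_t make_vec3(int32_t x, int32_t y, int32_t z)
{
    vec3_t v = { x, y, z };
    return v;
}

/* 64-bit integer: returned in r0 (low word) and r1 (high word). */
int64_t widen_mul(int32_t a, int32_t b)
{
    return (int64_t)a * (int64_t)b;
}
```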
Stack alignment
The AAPCS requires the stack pointer to be 8-byte aligned at every public function call boundary. Cortex-M hardware realigns the stack automatically on exception entry when the STKALIGN bit in the Configuration and Control Register is set, which is the default on most current parts. In hand-written assembly you must enforce it yourself: pushing an odd number of registers and then calling a C function misaligns the stack and can break any code that assumes 8-byte alignment, such as variadic functions reading double or int64_t arguments (a printf of a 64-bit value that prints garbage is the classic symptom).
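The arithmetic behind the odd-number-of-registers warning is easy to check. This is a small sketch with hypothetical helper names; it models a full-descending stack where each pushed register takes 4 bytes.

```c
#include <stdint.h>

/* Value of sp after pushing nregs 32-bit registers. */
uint32_t sp_after_push(uint32_t sp, unsigned nregs)
{
    return sp - 4u * nregs;
}

/* Non-zero if sp satisfies the AAPCS 8-byte alignment requirement. */
int is_aapcs_aligned(uint32_t sp)
{
    return (sp & 7u) == 0u;
}
```

Starting from an aligned sp such as 0x20001000, pushing two registers leaves it aligned (0x20000FF8), while pushing three leaves it at 0x20000FF4, which is only 4-byte aligned. The usual fix is to push one extra register as padding.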
Fig 1. The function call stack frame lifecycle.
Fig 2. Argument passing side by side: four arguments fit entirely in r0-r3 with the stack untouched. A fifth argument spills to [sp] and the caller releases the stack space after the call.
4. Assembly Calling C: the Context Switch
A context switch is the canonical example of assembly that must call a C function while manually following every ABI rule. The compiler cannot write a context switch for you because it does not know about the Process Stack Pointer (PSP), the Task Control Block (TCB), or the need to swap the entire register context, including registers that an ordinary function would simply preserve.
The PendSV exception is the standard Cortex-M mechanism for triggering a context switch at the lowest interrupt priority. Here is a complete, minimal implementation:
The C side: scheduler
    /* scheduler.h */
    #include <stdint.h>

    typedef struct {
        uint32_t *stack_pointer;
        /* other TCB fields */
    } TCB_t;

    extern TCB_t *current_tcb;
    extern TCB_t *next_tcb;

    /* Called from assembly with r0 = current task's updated SP */
    uint32_t *vTaskSwitchContext(uint32_t *sp);

    /* scheduler.c */
    #include "scheduler.h"

    uint32_t *vTaskSwitchContext(uint32_t *sp)
    {
        current_tcb->stack_pointer = sp;    /* save current SP into TCB */
        current_tcb = next_tcb;             /* switch to next task */
        return current_tcb->stack_pointer;  /* return new task SP */
    }
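Because the scheduler is plain C, its half of the contract can be exercised on a host machine before any assembly exists. The sketch below repeats the scheduler and adds demo_switch_once(), a hypothetical check that plays the role of the handler for a single switch from task A to task B; the stack arrays and the chosen offsets are arbitrary illustration values.

```c
#include <stdint.h>

typedef struct {
    uint32_t *stack_pointer;
} TCB_t;

TCB_t *current_tcb;
TCB_t *next_tcb;

uint32_t *vTaskSwitchContext(uint32_t *sp)
{
    current_tcb->stack_pointer = sp;    /* save current SP into TCB */
    current_tcb = next_tcb;             /* switch to next task */
    return current_tcb->stack_pointer;  /* return new task SP */
}

/* Hypothetical host-side check. Returns 1 if one switch from A to B
 * stored A's SP, made B current, and handed back B's saved SP. */
int demo_switch_once(void)
{
    static uint32_t stack_a[32], stack_b[32];
    static TCB_t tcb_a, tcb_b;
    uint32_t *sp_in, *sp_out;

    tcb_a.stack_pointer = &stack_a[31];
    tcb_b.stack_pointer = &stack_b[16];  /* B was switched out earlier */
    current_tcb = &tcb_a;
    next_tcb = &tcb_b;

    sp_in = &stack_a[20];                /* value the handler passes in r0 */
    sp_out = vTaskSwitchContext(sp_in);  /* value the handler gets in r0 */

    return tcb_a.stack_pointer == sp_in
        && current_tcb == &tcb_b
        && sp_out == &stack_b[16];
}
```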
The assembly side: PendSV handler

    /* pendsv.s */
    .syntax unified
    .thumb

    .global PendSV_Handler
    .thumb_func
    .type   PendSV_Handler, %function

    PendSV_Handler:
        /* Step 1: save the rest of the current task's context.
           The CPU already stacked r0-r3, r12, lr, pc, xPSR on the PSP
           stack on exception entry. r4-r11 are not stacked by hardware
           and still hold the outgoing task's values, so we store them
           manually. (This sketch assumes tasks do not use the FPU.) */
        mrs     r0, psp             @ r0 = current task PSP
        stmdb   r0!, {r4-r11}       @ push r4-r11 onto task stack
                                    @ r0 now points to top of saved frame

        /* Step 2: call the C scheduler.
           ABI: argument in r0 (current SP), return value in r0 (new SP).
           bl overwrites lr, which holds EXC_RETURN, so preserve it on the
           main stack. Two registers keep the stack 8-byte aligned. */
        push    {r3, lr}            @ r3 is alignment padding
        bl      vTaskSwitchContext  @ r0 = new task stack pointer
        pop     {r3, lr}            @ lr = EXC_RETURN again

        /* Step 3: restore callee-saved regs of the next task.
           r0 holds the new task SP returned by vTaskSwitchContext. */
        ldmia   r0!, {r4-r11}       @ pop r4-r11 from new task stack
        msr     psp, r0             @ update PSP to new task

        /* Step 4: return from exception.
           EXC_RETURN in lr tells the CPU to restore r0-r3, r12, lr, pc,
           xPSR from the PSP stack, resuming the new task. */
        bx      lr

Why each line is forced by the ABI
Instruction | ABI reason
mrs r0, psp | Argument must be in r0 per AAPCS
stmdb r0!, {r4-r11} | Hardware does not stack r4-r11; they hold the outgoing task's state and must go into its saved frame
push {r3, lr} | lr is not preserved across bl; pushing two registers keeps sp 8-byte aligned at the call
bl vTaskSwitchContext | Standard call; return value arrives in r0
pop {r3, lr} | Recover EXC_RETURN for the exception return
ldmia r0!, {r4-r11} | Restore next task's callee-saved registers
msr psp, r0 | Update PSP after ldmia adjusted r0 past the saved frame
bx lr | EXC_RETURN value; CPU unwinds exception frame from PSP
The C function vTaskSwitchContext can use r0-r3 and r12 freely, and it must save and restore any of r4-r11 it uses internally, which the compiler guarantees automatically via the function prologue/epilogue it generates. The handler does not store r4-r11 to protect them from the C call; it stores them because they belong to the outgoing task and the hardware does not stack them. This is the ABI working as designed: both sides follow the same contract and neither corrupts the other.
Fig 3. PendSV context switch register save and restore.
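The handler only works if every task's saved stack already has the layout it expects: eight software-saved words (r4-r11) sitting directly below the eight words the CPU stacks on exception entry. A brand-new task has never been switched out, so that frame has to be built by hand. The sketch below shows one way to do it; task_stack_init() and its parameters are hypothetical, the entry address is passed as a plain 32-bit value, and it assumes no FPU context and a task function that never returns.

```c
#include <stdint.h>

#define HW_FRAME_WORDS 8u           /* r0-r3, r12, lr, pc, xPSR */
#define SW_FRAME_WORDS 8u           /* r4-r11                   */
#define XPSR_THUMB     0x01000000u  /* T bit must be set on Cortex-M */

/* Builds an initial frame below stack_top and returns the value to store
 * in the TCB's stack_pointer field. */
uint32_t *task_stack_init(uint32_t *stack_top, uint32_t entry_pc, uint32_t arg)
{
    /* Round the top down so the frame starts 8-byte aligned. */
    uint32_t *sp = (uint32_t *)((uintptr_t)stack_top & ~(uintptr_t)7u);
    unsigned i;

    /* Frame the CPU will unstack on exception return. */
    sp -= HW_FRAME_WORDS;
    sp[0] = arg;             /* r0: first argument of the task function */
    sp[1] = 0;               /* r1  */
    sp[2] = 0;               /* r2  */
    sp[3] = 0;               /* r3  */
    sp[4] = 0;               /* r12 */
    sp[5] = 0;               /* lr: unused, the task must not return */
    sp[6] = entry_pc & ~1u;  /* pc: task entry point, bit 0 cleared */
    sp[7] = XPSR_THUMB;      /* xPSR */

    /* Words the handler will load into r4-r11 with ldmia. */
    sp -= SW_FRAME_WORDS;
    for (i = 0; i < SW_FRAME_WORDS; i++) {
        sp[i] = 0;
    }
    return sp;
}
```

After the handler's ldmia consumes the lower eight words, r0 points at the hardware frame, which is the value written to PSP before the exception return.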
5. Conclusion
The ARM calling convention rests on three rules that every line of assembly interacting with C must respect:
- Arguments go in r0-r3. The fifth argument and beyond spill to the stack and the caller cleans them up.
- r0-r3, r12 and lr are caller-saved. If you need them after a bl, save them yourself before the call.
- r4-r11 are callee-saved. If your assembly uses them, push them on entry and pop them on exit. Any C function you call will preserve them in the same way, so values held in r4-r11 survive a bl.
The context switch example shows why these rules matter in practice. The compiler cannot write PendSV_Handler because the ABI rules must be applied with knowledge of two separate stacks, the PSP/MSP distinction, and the CPU's automatic exception frame, none of which the compiler knows about. Understanding the ABI is what lets you write this code correctly and confidently.
