May 12 / Sebastian Helmut

ARM Calling Convention on Cortex-M: How Arguments, Registers and the Stack Really Work


1. Introduction

Every time your C code calls a function, the compiler silently follows a set of rules that governs how arguments travel to the callee, how the return value comes back, and which registers each side is responsible for saving. These rules are defined by the Procedure Call Standard for the Arm Architecture (AAPCS), the ABI that every compiler, every library, and every startup file on Cortex-M must agree on.

When you write C exclusively, the compiler handles all of this automatically. You never see it. But the moment you write assembly that calls a C function, or a C function that calls assembly, you are responsible for following these rules yourself. Get them wrong and the program silently corrupts data, crashes, or produces wrong results with no helpful error message.

This article makes the calling convention concrete. We start with the register map, work through argument passing, and finish with a real context switch in assembly that calls a C scheduler function: a case where the compiler cannot help you and you must apply the ABI rules by hand.


2. Registers and Their Roles

The Cortex-M4 has sixteen 32-bit registers. The AAPCS assigns a specific role to each one:

| Register | Alias | Role                           | Who saves it |
|----------|-------|--------------------------------|--------------|
| R0       | a1    | Argument 1 / return value      | Caller       |
| R1       | a2    | Argument 2                     | Caller       |
| R2       | a3    | Argument 3                     | Caller       |
| R3       | a4    | Argument 4                     | Caller       |
| R4       | v1    | Callee-saved variable register | Callee       |
| R5       | v2    | Callee-saved variable register | Callee       |
| R6       | v3    | Callee-saved variable register | Callee       |
| R7       | v4    | Callee-saved variable register | Callee       |
| R8       | v5    | Callee-saved variable register | Callee       |
| R9       | v6    | Callee-saved variable register | Callee       |
| R10      | v7    | Callee-saved variable register | Callee       |
| R11      | v8    | Callee-saved variable register | Callee       |
| R12      | ip    | Intra-procedure scratch        | Caller       |
| R13      | sp    | Stack pointer                  | Special      |
| R14      | lr    | Link register (return address) | Caller       |
| R15      | pc    | Program counter                | Special      |

The critical split is between caller-saved and callee-saved registers:

- Caller-saved (r0-r3, r12, lr): if the caller needs these values after a bl instruction, it must save them before the call. The callee is free to overwrite them.

- Callee-saved (r4-r11): if the callee uses these registers, it must push them on entry and pop them on exit. The caller can rely on them being unchanged across any function call.

This is not arbitrary. It means a leaf function that only uses r0-r3 needs no prologue and no epilogue: zero overhead. Only functions that need persistent state across calls pay the cost of saving and restoring registers.
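
To see that cost model from the C side, consider the two functions below. This is a minimal sketch with hypothetical names (leaf_add, add_then_scale). The comments describe what a compiler typically emits for Cortex-M when the call is not inlined; the exact instructions depend on the compiler and the optimization level.

```c
#include <stdint.h>

/* Leaf function: it calls nothing, so its arguments and its result can
   live entirely in r0-r3. A compiler typically emits no push/pop here. */
int32_t leaf_add(int32_t a, int32_t b)
{
    return a + b;
}

/* Non-leaf function: 'a' is needed after the call, but the call is
   allowed to clobber r0-r3, r12 and lr. A compiler typically keeps 'a'
   in a callee-saved register such as r4, which forces a prologue that
   pushes r4 and lr and a matching epilogue that pops them. */
int32_t add_then_scale(int32_t a, int32_t b)
{
    int32_t sum = leaf_add(a, b);   /* caller-saved registers die here */
    return sum * a;                 /* 'a' had to survive the call     */
}
```

Disassembling both functions with inlining prevented (for example by placing them in separate source files) is a quick way to check this on your own toolchain.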



3. Passing Arguments and Returning Values

The four-register rule

The first four integer or pointer arguments go in r0, r1, r2, r3 in left-to-right order. For a call like this, the compiler generates code along these lines:

    /* C source */
    int add4(int a, int b, int c, int d);

    /* Call site */
    int result = add4(1, 2, 3, 4);

    @ Compiler output (arm-none-eabi-objdump -d)
    mov r0, #1      @ arg1
    mov r1, #2      @ arg2
    mov r2, #3      @ arg3
    mov r3, #4      @ arg4
    bl  add4        @ call; return value arrives in r0

Clean and zero overhead: all four arguments fit in registers and the stack is not touched.

When arguments spill to the stack

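The fifth argument and beyond are passed on the stack. The caller stores them before the bl so that the fifth argument ends up at the lowest address, which is the layout that pushing in right-to-left order produces. The caller is also responsible for releasing that stack space after the call returns:

    int add5(int a, int b, int c, int d, int e);
    int result = add5(1, 2, 3, 4, 5);

    sub sp, #8      @ reserve 8 bytes: 4 for arg5 plus 4 of padding so sp stays 8-byte aligned
    mov r3, #5      @ r3 is still free, so use it as scratch for arg5
    str r3, [sp]    @ arg5 stored at [sp]                <- spill
    mov r0, #1      @ arg1
    mov r1, #2      @ arg2
    mov r2, #3      @ arg3
    mov r3, #4      @ arg4
    bl  add5
    add sp, #8      @ caller releases the stack space    <- cleanup

On the callee side, add5 finds the fifth argument by reading [sp] on entry. Because the bl instruction itself does not move the stack pointer, the fifth argument sits exactly at [sp] when the function begins, before its own prologue pushes anything.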
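
The placement rule can be written down as a small lookup. The sketch below is a simplified model, not part of any toolchain: the names arg_register and arg_stack_offset are hypothetical, and it assumes every argument is a 32-bit integer or pointer (64-bit and floating-point arguments follow additional rules that this model ignores).

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ARG_REGS 4u   /* r0-r3 carry the first four arguments */
#define WORD_BYTES   4u   /* each modeled argument is one 32-bit word */

/* Register number (0-3) that carries argument `index` (0-based),
   or -1 if that argument is passed on the stack. */
int arg_register(size_t index)
{
    return (index < NUM_ARG_REGS) ? (int)index : -1;
}

/* Offset from sp, as seen at callee entry, of a stack-passed argument. */
size_t arg_stack_offset(size_t index)
{
    assert(index >= NUM_ARG_REGS);   /* only stack-passed arguments have an offset */
    return (index - NUM_ARG_REGS) * WORD_BYTES;
}
```

For add5, arg_register(3) gives r3 for the fourth argument and arg_stack_offset(4) gives 0 for the fifth, matching the [sp] access described above.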

Returning values

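- Single 32-bit value (int, pointer, uint32_t): returned in r0.

- 64-bit value (int64_t, or double under the soft-float ABI): returned in the r0:r1 pair, low word in r0. With the hard-float ABI (-mfloat-abi=hard), float and double results are returned in s0 and d0 instead.

- Struct up to 4 bytes: returned in r0.

- Struct larger than 4 bytes: the caller allocates space and passes a hidden pointer in r0; the callee writes the result there. The explicit arguments then start at r1.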
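
The C declarations below line up with three of those cases. This is an illustrative sketch: the types Rgba and Vec3 and the function names are hypothetical, and the register comments assume the base (soft-float) AAPCS on a little-endian Cortex-M target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Rgba;   /* 4 bytes  */
typedef struct { int32_t x, y, z; } Vec3;      /* 12 bytes */

_Static_assert(sizeof(Rgba) == 4, "Rgba is expected to fit in one register");
_Static_assert(sizeof(Vec3) > 4, "Vec3 is expected to need a result buffer");

/* 64-bit result: r0 holds the low word, r1 the high word. */
uint64_t widen_mul(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* 4-byte struct: the whole value comes back in r0. */
Rgba make_opaque(uint8_t r, uint8_t g, uint8_t b)
{
    Rgba color = { r, g, b, 255u };
    return color;
}

/* Larger struct: the caller passes a hidden result pointer in r0,
   so 'v' arrives in r1 and 'k' in r2. */
Vec3 vec3_scale(const Vec3 *v, int32_t k)
{
    assert(v != NULL);
    Vec3 out = { v->x * k, v->y * k, v->z * k };
    return out;
}
```

The hidden pointer is the detail that most often surprises assembly callers: a function that looks like it takes two arguments actually consumes three argument registers.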

Stack alignment

The AAPCS requires the stack pointer to be 8-byte aligned at every public function call boundary. On exception entry the Cortex-M hardware realigns the stack automatically (on Cortex-M4 this is controlled by the CCR.STKALIGN bit, which is set out of reset). In hand-written assembly you must enforce it yourself: pushing an odd number of registers from an aligned stack and then calling a C function leaves the stack misaligned, which can break any code that assumes an 8-byte-aligned stack, for example code that places double or int64_t values there.
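
The arithmetic behind the "odd number of registers" warning is simple enough to check mechanically. The helpers below are a hypothetical sketch (the names are not from any library) that can be used when reviewing hand-written push lists; they assume each pushed register is a 32-bit core register and that sp was 8-byte aligned before the push.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AAPCS_STACK_ALIGN 8u

/* True if a stack pointer value satisfies the AAPCS call-boundary rule. */
bool sp_is_aapcs_aligned(uintptr_t sp)
{
    return (sp % AAPCS_STACK_ALIGN) == 0u;
}

/* Padding bytes needed after pushing `regs_pushed` 32-bit registers
   from an aligned stack, so that sp is aligned again before a call. */
unsigned padding_after_push(unsigned regs_pushed)
{
    assert(regs_pushed <= 16u);   /* there are only 16 core registers */
    unsigned bytes = regs_pushed * 4u;
    return (AAPCS_STACK_ALIGN - bytes % AAPCS_STACK_ALIGN) % AAPCS_STACK_ALIGN;
}
```

In practice the padding is usually obtained by pushing one extra scratch register rather than by adjusting sp separately, which is why push lists in hand-written handlers so often contain an even number of registers.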

Fig 1. The function call stack frame lifecycle.


Fig 2. Argument passing side by side: four arguments fit entirely in r0-r3 with the stack untouched. A fifth argument spills to [sp] and the caller releases the stack space after the call.


4. Assembly Calling C: the Context Switch

A context switch is the canonical example of assembly that must call a C function while manually following every ABI rule. The compiler cannot write a context switch for you because it does not know about the Process Stack Pointer (PSP), the Task Control Block (TCB), or the need to save one task's registers and restore another task's registers in their place.

The PendSV exception is the standard Cortex-M mechanism for triggering a context switch at the lowest interrupt priority. Here is a complete, minimal implementation:

The C side: scheduler

    /* scheduler.h */
    #include <stdint.h>

    typedef struct {
        uint32_t *stack_pointer;
        /* other TCB fields */
    } TCB_t;

    extern TCB_t *current_tcb;
    extern TCB_t *next_tcb;

    /* Called from assembly with r0 = current task's updated SP */
    uint32_t *vTaskSwitchContext(uint32_t *sp);


    /* scheduler.c */
    #include "scheduler.h"

    uint32_t *vTaskSwitchContext(uint32_t *sp) {
        current_tcb->stack_pointer = sp;     /* save current SP into TCB */
        current_tcb = next_tcb;              /* switch to next task */
        return current_tcb->stack_pointer;   /* return new task SP */
    }
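
The scheduler function above relies on next_tcb having been chosen before PendSV runs, which the article does not show. As one possible way to fill that gap, the sketch below repeats the TCB definitions so it compiles on its own and adds a hypothetical round-robin helper, select_next_round_robin. The helper name and the assumption that all tasks live in one array are illustrative choices, not part of the original design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *stack_pointer;
} TCB_t;

TCB_t *current_tcb;
TCB_t *next_tcb;

/* Same logic as the scheduler function in the article. */
uint32_t *vTaskSwitchContext(uint32_t *sp)
{
    current_tcb->stack_pointer = sp;     /* save outgoing task's SP */
    current_tcb = next_tcb;              /* switch to the chosen task */
    return current_tcb->stack_pointer;   /* hand back incoming task's SP */
}

/* Hypothetical policy: pick the task that follows current_tcb in the
   array, wrapping around at the end. */
void select_next_round_robin(TCB_t *tasks, size_t count)
{
    assert(tasks != NULL && count > 0u);
    assert(current_tcb >= tasks && current_tcb < tasks + count);
    size_t current_index = (size_t)(current_tcb - tasks);
    next_tcb = &tasks[(current_index + 1u) % count];
}
```

Because the function takes and returns a single pointer, the whole interface between assembly and C is one value in r0 in each direction, which keeps the hand-written side as small as possible.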

The assembly side: PendSV handler

    /* pendsv.s */
    .syntax unified
    .thumb

    .global PendSV_Handler
    .type PendSV_Handler, %function

    PendSV_Handler:
        /* Step 1: save the remaining registers of the current task.
           The CPU already saved r0-r3, r12, lr, pc, xPSR
           onto the PSP stack automatically on exception entry.
           r4-r11 still hold the outgoing task's values and the
           hardware does not stack them, so we save them manually.
           (Tasks are assumed not to use the FPU.) */
        mrs   r0, psp              @ r0 = current task PSP
        stmdb r0!, {r4-r11}        @ push r4-r11 onto task stack
                                   @ r0 now points to top of saved frame

        /* Step 2: call the C scheduler.
           ABI: argument in r0 (current SP), return value in r0 (new SP).
           bl overwrites lr, which currently holds EXC_RETURN, so save it
           on the main stack first. r3 is pushed only as padding to keep
           the stack 8-byte aligned at the call. */
        push  {r3, lr}
        bl    vTaskSwitchContext   @ r0 = new task stack pointer
        pop   {r3, lr}             @ lr = EXC_RETURN again

        /* Step 3: restore callee-saved regs of the next task.
           r0 holds the new task SP returned by vTaskSwitchContext. */
        ldmia r0!, {r4-r11}        @ pop r4-r11 from new task stack
        msr   psp, r0              @ update PSP to new task

        /* Step 4: return from exception.
           EXC_RETURN in LR tells the CPU to restore r0-r3, r12,
           lr, pc, xPSR from the PSP stack, resuming the new task. */
        bx    lr

Why each line is forced by the ABI

| Instruction           | ABI reason                                                                                     |
|-----------------------|------------------------------------------------------------------------------------------------|
| mrs r0, psp           | Argument must be in r0 per AAPCS                                                               |
| stmdb r0!, {r4-r11}   | r4-r11 belong to the outgoing task and are not stacked by hardware; save them into its frame   |
| push {r3, lr}         | lr is caller-saved and bl overwrites it; pushing two registers keeps the stack 8-byte aligned  |
| bl vTaskSwitchContext | Standard call; return value arrives in r0                                                      |
| pop {r3, lr}          | Recover EXC_RETURN into lr after the call                                                      |
| ldmia r0!, {r4-r11}   | Restore next task's callee-saved registers                                                     |
| msr psp, r0           | Update PSP after ldmia adjusted r0 past the saved frame                                        |
| bx lr                 | EXC_RETURN value: CPU unwinds exception frame from PSP                                         |

The handler stores r4-r11 because they are part of the outgoing task's state, not to protect them from the C function. vTaskSwitchContext may use r4-r11 internally, but as callee-saved registers it must restore them before returning, which the compiler guarantees through the prologue and epilogue it generates. What the handler does have to protect is lr: it is caller-saved, the bl overwrites it, and without the push and pop around the call the final bx lr would no longer see EXC_RETURN. This is the ABI working as designed: both sides follow the same contract and neither corrupts the other.

Fig 3. PendSV context switch register save and restore.
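
A handler of this shape can only switch to a task whose stack already looks like a saved context, so a brand-new task needs a hand-built frame. The sketch below shows one way to lay that frame out so that ldmia r0!, {r4-r11} followed by the exception return starts the task at its entry point. The function name init_task_stack and its parameters are hypothetical; addresses are passed as plain uint32_t values so the layout can also be checked on a host machine, and no FPU context is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SW_FRAME_WORDS 8u            /* r4-r11, popped by the handler's ldmia */
#define HW_FRAME_WORDS 8u            /* r0-r3, r12, lr, pc, xPSR, popped by hardware */
#define XPSR_THUMB_BIT 0x01000000u   /* T bit must be set on Cortex-M */

/* Builds an initial context below `stack_top` and returns the value to
   store in the task's TCB stack_pointer field. */
uint32_t *init_task_stack(uint32_t *stack_top, uint32_t entry_addr, uint32_t exit_addr)
{
    assert(((uintptr_t)stack_top % 8u) == 0u);   /* AAPCS stack alignment */

    uint32_t *sp = stack_top - (SW_FRAME_WORDS + HW_FRAME_WORDS);
    for (unsigned i = 0u; i < SW_FRAME_WORDS + HW_FRAME_WORDS; ++i) {
        sp[i] = 0u;                  /* r4-r11, r0-r3 and r12 start as zero */
    }

    /* Word indices: 0-7 = r4-r11, 8-11 = r0-r3, 12 = r12,
       13 = lr, 14 = pc, 15 = xPSR. */
    sp[13] = exit_addr;              /* where the task lands if it ever returns */
    sp[14] = entry_addr & ~1u;       /* stacked pc is halfword aligned */
    sp[15] = XPSR_THUMB_BIT;
    return sp;
}
```

The returned pointer sits 16 words below the top of the stack: eight words that the handler pops in software and eight that the hardware pops on exception return, mirroring the two halves of the save sequence.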


5. Conclusion

The ARM calling convention rests on three rules that every line of assembly interacting with C must respect:

- Arguments go in r0-r3. The fifth argument and beyond spill to the stack and the caller cleans them up.

- r0-r3, r12, and lr are caller-saved. If you need them after a bl, save them yourself before the call.

- r4-r11 are callee-saved. If your assembly uses them, push them on entry and pop them on exit. If you call a C function, you can rely on the compiler-generated code to hand them back unchanged.

The context switch example shows why these rules matter in practice. The compiler cannot write PendSV_Handler because the ABI rules must be applied with knowledge of two separate stacks, the PSP/MSP distinction, and the CPU's automatic exception frame, none of which the compiler knows about. Understanding the ABI is what lets you write this code correctly and confidently.

