Embedded Systems Software Is Much More Than Ordinary Software Code
Understanding Software and Hardware Behavior in Embedded Systems

Introduction
Writing software for low-level embedded systems is different from high-level programming (web development, data analytics, or user applications). Embedded systems programming requires consideration of hardware restrictions and low-level control over software behavior, especially regarding memory management, execution order, and C code translation into machine code.
This article explores practical examples of how hardware restrictions impact software execution, using an ARM Cortex-M7 processor platform with a 6-stage pipeline and an AXI4 bus interconnect as an example. The Cortex-M7 issues instructions in order, but it is superscalar (dual-issue) and its memory accesses can be buffered and completed out of order, both of which can affect program behavior.
How to Spot Divergence Between Software and Hardware Behavior
Divergence between written software and actual hardware execution can cause unexpected results, poor performance, or faults. Two major ways to identify this divergence are:
- Fault Exceptions
- Execution Time Analysis
Fault Exceptions
Fault exceptions help detect mismatches between software and hardware. For example, an asynchronous (imprecise) bus fault on Cortex-M7 may indicate that an access to an invalid memory address reached the bus late because it was buffered (in the cache or the store buffer). As a result, the reported fault location does not match the line of code that issued the access.
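To make the distinction concrete, here is a minimal sketch of how a fault handler could tell a precise bus fault from an imprecise one. It assumes the ARMv7-M layout of the Configurable Fault Status Register (CFSR), which CMSIS exposes as SCB->CFSR. The helper names classify_bus_fault and fault_address_is_valid are hypothetical, chosen for illustration only.

```c
#include <stdint.h>
#include <stdbool.h>

/* BusFault status bits inside CFSR (ARMv7-M). */
#define CFSR_PRECISERR    (1u << 9)   /* fault address belongs to the faulting instruction */
#define CFSR_IMPRECISERR  (1u << 10)  /* fault reported after the instruction completed    */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds the faulting address                   */

typedef enum {
    BUS_FAULT_NONE,
    BUS_FAULT_PRECISE,
    BUS_FAULT_IMPRECISE
} bus_fault_kind_t;

/* Classify a raw CFSR value (hypothetical helper, not a CMSIS function). */
bus_fault_kind_t classify_bus_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_IMPRECISERR) {
        return BUS_FAULT_IMPRECISE;
    }
    if (cfsr & CFSR_PRECISERR) {
        return BUS_FAULT_PRECISE;
    }
    return BUS_FAULT_NONE;
}

/* True when the Bus Fault Address Register can be trusted. */
bool fault_address_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}
```

On the target, the argument would typically be the value read from SCB->CFSR inside the fault handler. An imprecise result is the hint that the stacked program counter does not point at the instruction that caused the access.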
Execution Time Analysis
Execution time can reveal hardware effects like superscalar execution. For instance, executing multiple NOP instructions may take fewer cycles than expected due to dual-issue optimization in Cortex-M7 processors.
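The sketch below shows one way such a measurement could be evaluated. It assumes the start and end values come from a free-running 32-bit cycle counter (on Cortex-M7 this is commonly DWT->CYCCNT, although the article does not name a specific timer). The function names are hypothetical.

```c
#include <stdint.h>

/* Elapsed cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wrap-around. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Average cycles per instruction, scaled by 100 to avoid floating point.
 * A result below 100 means more than one instruction completed per cycle
 * on average, which is consistent with dual-issue execution. */
uint32_t cpi_x100(uint32_t cycles, uint32_t instructions)
{
    if (instructions == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)cycles * 100u) / instructions);
}
```

For a block of N single-cycle instructions, a naive estimate is N cycles. If the measured value is noticeably lower, the difference comes from the hardware rather than from the code.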

Embedded systems: intersection between software and hardware
Example 1: Memory Access Re-Ordering
Cortex-M7 assigns attributes to memory areas based on their purpose:
- Strongly Ordered Memory: Enforces strict access order; rarely used due to performance penalties.
- Device Memory: Used for peripheral registers; enforces access order with less overhead.
- Normal Memory: Used for code and data; allows buffering, caching, and re-ordering of memory accesses for better performance.
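On ARMv7-M, these memory types are selected per MPU region through the TEX, C, and B fields of the MPU_RASR register. The following sketch maps each type to those bits. The enum and function names are hypothetical, and only a few common encodings are covered.

```c
#include <stdint.h>

/* Attribute field positions in MPU_RASR (ARMv7-M). */
#define RASR_TEX_POS 19u
#define RASR_C_POS   17u
#define RASR_B_POS   16u

typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE_SHAREABLE,
    MEM_NORMAL_NONCACHEABLE,
    MEM_NORMAL_WRITE_BACK      /* write-back, read- and write-allocate */
} mem_type_t;

/* Return only the TEX/C/B bits for the requested memory type. */
uint32_t rasr_attr_bits(mem_type_t type)
{
    switch (type) {
    case MEM_STRONGLY_ORDERED:     /* TEX=000 C=0 B=0 */
        return 0u;
    case MEM_DEVICE_SHAREABLE:     /* TEX=000 C=0 B=1 */
        return 1u << RASR_B_POS;
    case MEM_NORMAL_NONCACHEABLE:  /* TEX=001 C=0 B=0 */
        return 1u << RASR_TEX_POS;
    case MEM_NORMAL_WRITE_BACK:    /* TEX=001 C=1 B=1 */
        return (1u << RASR_TEX_POS) | (1u << RASR_C_POS) | (1u << RASR_B_POS);
    }
    return 0u;
}
```

The returned value covers the memory type only. A complete region setup also needs the size, access permission, and enable fields of MPU_RASR, which are left out here.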
Pseudo-code illustrating access re-ordering:
LDR r0, =0x20000000
LDR r1, =0x20040000
LDR r2, =0x20000004
STR r3, [r0]
STR r3, [r1]
STR r3, [r2]
NOP
Due to store buffering and AXI4 bus behavior, actual memory access order may differ:
STR r3, [r0]
STR r3, [r2]
STR r3, [r1]
If one of these addresses is protected and the access triggers a bus fault, the fault may be raised at a different point than the program order suggests, highlighting the difference between software expectations and hardware behavior.
Solutions:
- Change the memory attribute to Strongly Ordered
- Use memory barriers such as DMB and DSB
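As an illustration of the barrier approach, the sketch below places a DMB between the three stores from the example so that each one is observed before the next. The wrapper data_memory_barrier and the function ordered_stores are hypothetical names. The non-Arm branch is an assumption added only so the code also builds on a host compiler (GCC or Clang).

```c
#include <stdint.h>

/* Data memory barrier: memory accesses before it are observed
 * before memory accesses after it. */
static inline void data_memory_barrier(void)
{
#if defined(__arm__) || defined(__thumb__)
    __asm__ volatile ("dmb" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);  /* host fallback, GCC/Clang builtin */
#endif
}

/* Perform the three stores from the example in program order. */
void ordered_stores(volatile uint32_t *a, volatile uint32_t *b,
                    volatile uint32_t *c, uint32_t value)
{
    *a = value;
    data_memory_barrier();
    *b = value;
    data_memory_barrier();
    *c = value;
}
```

DMB only constrains the order of the accesses. When the goal is to make a pending fault surface before execution continues, DSB is the stronger choice, because it waits until the earlier accesses have completed.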
Example 2: Cache Memory Influence
Cache memory stores frequently accessed instructions and data to minimize latency. Its effect is noticeable in performance measurements.
Pseudo-code:
for (int i = 0; i < 10; i++) {
    timer_start;
    code_line_1;
    code_line_2;
    code_line_3;
    code_line_4;
    code_line_5;
    timer_stop;
}
Performance results (clock cycles):
iteration_0 : 2000
iteration_1 : 800
iteration_2 : 800
...
iteration_9 : 800
The first iteration takes longer because its instructions miss the cache and must be fetched from slower memory. Subsequent iterations run from the cache, which is why cache effects must be taken into account when measuring performance.
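A simple way to quantify this effect is to separate the cold first iteration from the warm ones. The sketch below does that for an array of per-iteration cycle counts. The function names are hypothetical, and the approach assumes that only the first iteration runs with a cold cache.

```c
#include <stdint.h>
#include <stddef.h>

/* Mean of samples[1..count-1], i.e. the iterations that run from a warm cache. */
uint32_t warm_average(const uint32_t *samples, size_t count)
{
    if (count < 2u) {
        return 0u;
    }
    uint64_t sum = 0u;
    for (size_t i = 1u; i < count; i++) {
        sum += samples[i];
    }
    return (uint32_t)(sum / (count - 1u));
}

/* Extra cycles spent in the first (cold) iteration compared with the warm mean. */
uint32_t cold_start_penalty(const uint32_t *samples, size_t count)
{
    if (count < 2u) {
        return 0u;
    }
    uint32_t warm = warm_average(samples, count);
    return (samples[0] > warm) ? (samples[0] - warm) : 0u;
}
```

With the figures above (2000 cycles cold, 800 cycles warm), the cold-start penalty is 1200 cycles. Reporting the warm average and the penalty separately avoids one outlier distorting the overall result.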
Conclusion
Embedded systems programming requires understanding both hardware and software constraints. Developers must anticipate potential issues, understand the causes of unexpected behavior, and stay motivated despite the limitations. Writing software for hardware means learning to predict how the hardware and software will interact and optimizing accordingly.