How to Write a Bare Metal Ethernet Driver for STM32 in C
1. Introduction
Configuring Ethernet on STM32 without a HAL means touching every layer directly: enabling the right clocks, routing the RMII signals through GPIO alternate functions, initializing the MAC registers for speed and duplex, building the DMA descriptor rings, and writing the transmit and receive logic that moves frames between your buffers and the wire.
This article implements a complete bare metal Ethernet driver for STM32 with an external LAN8720 PHY over RMII. No HAL, no middleware, no generated code. Every register write is explicit and every design decision is explained.
2. STM32 Ethernet Hardware Overview
2.1 MAC and DMA Peripheral Blocks
The STM32 Ethernet peripheral consists of two main blocks. The MAC block handles frame transmission and reception, address filtering, flow control, and the MDIO interface to the PHY. The DMA block manages data transfers between the MAC FIFOs and system memory using the descriptor ring mechanism.
The MAC and DMA are configured through memory-mapped registers. The MAC registers start at offset 0x0000 from the Ethernet peripheral base address. The DMA registers start at offset 0x1000.
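As a quick check on those offsets, the sketch below computes absolute register addresses. It assumes the STM32F4/F7 Ethernet base address 0x40028000 and a few register offsets taken from the reference manual register map (MACCR at 0x00 and MACA0HR at 0x40 in the MAC block, DMABMR at 0x00 and DMAOMR at 0x18 in the DMA block). The macro and function names are hypothetical. In a real driver, the CMSIS ETH structure performs this arithmetic for you.

```c
#include <stdint.h>

/* assumed STM32F4/F7 Ethernet base address: check your device's memory map */
#define ETH_BASE_ADDR      0x40028000UL
#define ETH_MAC_BASE_ADDR  (ETH_BASE_ADDR + 0x0000UL) /* MAC block */
#define ETH_DMA_BASE_ADDR  (ETH_BASE_ADDR + 0x1000UL) /* DMA block */

/* register offsets within each block */
#define MACCR_OFFSET    0x00UL
#define MACA0HR_OFFSET  0x40UL
#define DMABMR_OFFSET   0x00UL
#define DMAOMR_OFFSET   0x18UL

/* absolute address of a register from its block base and offset */
static inline uint32_t eth_reg_addr(uint32_t block_base, uint32_t offset)
{
    return block_base + offset;
}
```

With these assumptions, DMABMR is at 0x40029000 and DMAOMR is at 0x40029018.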
2.2 Clock Configuration
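The Ethernet peripheral requires three clocks to be enabled before any register access. All three are gated through the AHB1 peripheral clock enable register. The MAC interface type must be selected before those clocks are turned on. The MAC defaults to MII, and RMII is selected with the MII_RMII_SEL bit in the SYSCFG_PMC register. The reference manual requires this bit to be set before the MAC clocks are enabled. SYSCFG needs its own APB2 clock for that write.
```c
/* select RMII before the MAC clocks are enabled (SYSCFG needs its own clock) */
RCC->APB2ENR |= RCC_APB2ENR_SYSCFGEN;
SYSCFG->PMC  |= SYSCFG_PMC_MII_RMII_SEL;

/* enable Ethernet MAC, TX, and RX clocks */
RCC->AHB1ENR |= RCC_AHB1ENR_ETHMACEN
              | RCC_AHB1ENR_ETHMACTXEN
              | RCC_AHB1ENR_ETHMACRXEN;
```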
The 50 MHz RMII reference clock must also be present before the MAC can operate. On boards that use a LAN8720, the PHY usually generates this clock from its own 25 MHz crystal and drives it to the MAC over the RMII REF_CLK line. Other designs feed a single external 50 MHz oscillator to both the PHY and the MAC.
2.3 RMII Pin Mapping
The interface uses nine pins, all configured as alternate function AF11. Seven of them are RMII signals: REF_CLK, CRS_DV, RXD0, RXD1, TX_EN, TXD0, and TXD1. The other two are the MDIO management lines, MDC and MDIO. The exact pins depend on your board schematic. Define them as macros at the top of your driver, so that porting the initialization code to another board only requires editing these definitions.
```c
#include "stm32f4xx.h" /* CMSIS device header: assumed STM32F4, use your family's header */

/* adapt these to match your board schematic */
#define ETH_RMII_REF_CLK_PORT  GPIOA
#define ETH_RMII_REF_CLK_PIN   1U
#define ETH_MDIO_PORT          GPIOA
#define ETH_MDIO_PIN           2U
#define ETH_RMII_CRS_DV_PORT   GPIOA
#define ETH_RMII_CRS_DV_PIN    7U
#define ETH_RMII_TX_EN_PORT    GPIOB
#define ETH_RMII_TX_EN_PIN     11U
#define ETH_RMII_TXD0_PORT     GPIOB
#define ETH_RMII_TXD0_PIN      12U
#define ETH_RMII_TXD1_PORT     GPIOB
#define ETH_RMII_TXD1_PIN      13U
#define ETH_MDC_PORT           GPIOC
#define ETH_MDC_PIN            1U
#define ETH_RMII_RXD0_PORT     GPIOC
#define ETH_RMII_RXD0_PIN      4U
#define ETH_RMII_RXD1_PORT     GPIOC
#define ETH_RMII_RXD1_PIN      5U
#define ETH_GPIO_AF            11U
```
Each pin must be configured as alternate function AF11, with high-speed output and no pull-up or pull-down. A small helper function keeps the GPIO initialization readable:
```c
static void gpio_set_af_eth(GPIO_TypeDef *port, uint32_t pin)
{
    /* alternate function mode (MODER = 10) */
    port->MODER &= ~(3U << (pin * 2U));
    port->MODER |=  (2U << (pin * 2U));

    /* high speed (OSPEEDR = 11) */
    port->OSPEEDR |= (3U << (pin * 2U));

    /* no pull-up, no pull-down (PUPDR = 00) */
    port->PUPDR &= ~(3U << (pin * 2U));

    /* AF11: clear the 4-bit field first so a stale value cannot mix in */
    uint32_t idx   = pin >> 3;          /* AFR[0] for pins 0-7, AFR[1] for pins 8-15 */
    uint32_t shift = (pin & 7U) * 4U;
    port->AFR[idx] &= ~(0xFUL << shift);
    port->AFR[idx] |=  (ETH_GPIO_AF << shift);
}

void eth_gpio_init(void)
{
    /* enable GPIO clocks */
    RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN | RCC_AHB1ENR_GPIOBEN | RCC_AHB1ENR_GPIOCEN;

    gpio_set_af_eth(ETH_RMII_REF_CLK_PORT, ETH_RMII_REF_CLK_PIN);
    gpio_set_af_eth(ETH_MDIO_PORT, ETH_MDIO_PIN);
    gpio_set_af_eth(ETH_RMII_CRS_DV_PORT, ETH_RMII_CRS_DV_PIN);
    gpio_set_af_eth(ETH_RMII_TX_EN_PORT, ETH_RMII_TX_EN_PIN);
    gpio_set_af_eth(ETH_RMII_TXD0_PORT, ETH_RMII_TXD0_PIN);
    gpio_set_af_eth(ETH_RMII_TXD1_PORT, ETH_RMII_TXD1_PIN);
    gpio_set_af_eth(ETH_MDC_PORT, ETH_MDC_PIN);
    gpio_set_af_eth(ETH_RMII_RXD0_PORT, ETH_RMII_RXD0_PIN);
    gpio_set_af_eth(ETH_RMII_RXD1_PORT, ETH_RMII_RXD1_PIN);
}
```
Fig 1. RMII signals routed through GPIO alternate function AF11 to the external PHY.
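Each of these GPIO register fields should be cleared before the new value is OR-ed in. Otherwise, a field that already holds a nonzero value ends up with a mix of old and new bits. The hypothetical helper below isolates that read-modify-write arithmetic as a pure function, so it can be checked on a host. On the target, you could apply it as port->AFR[pin >> 3] = reg_field_set(port->AFR[pin >> 3], pin & 7U, 4U, ETH_GPIO_AF).

```c
#include <stdint.h>

/* Return `reg` with field number `index` replaced by `value`.
   Fields are `width` bits wide and packed from bit 0 upward:
   width 2 for MODER/OSPEEDR/PUPDR, width 4 for AFR[0]/AFR[1]. */
uint32_t reg_field_set(uint32_t reg, uint32_t index, uint32_t width, uint32_t value)
{
    uint32_t shift = index * width;
    uint32_t mask  = ((1UL << width) - 1UL) << shift;

    /* clear the old field, then insert the new value */
    return (reg & ~mask) | ((value << shift) & mask);
}
```

Suppose PB11's AFR[1] field still holds AF7. In that case, a plain OR would leave AF15 (0x7 | 0xB = 0xF) instead of AF11.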
3. Initializing the Ethernet MAC
3.1 Reset Sequence
Before configuring any MAC or DMA register, the DMA must be software reset. This resets both the MAC and the DMA to their default state and clears any pending operations.
```c
void eth_dma_reset(void)
{
    /* set software reset bit */
    ETH->DMABMR |= ETH_DMABMR_SR;

    /* the hardware clears SR when the reset has completed */
    while (ETH->DMABMR & ETH_DMABMR_SR) {
    }
}
```
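The SR bit only clears after every active clock domain has come out of reset. This loop therefore never exits if the PHY is not supplying the 50 MHz REF_CLK. A missing RMII selection (the MII_RMII_SEL bit in SYSCFG_PMC) typically causes the same hang. After the reset completes, wait at least 4 AHB clock cycles before accessing any other register. A short delay loop is sufficient in practice.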
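The reset loop above spins forever if SR never clears, so a bounded wait makes that failure visible. The sketch below is a hypothetical helper, not part of CMSIS. It polls a register until the given bits read as zero or an iteration budget runs out. Because it takes a plain pointer, it can be exercised on a host, with an ordinary variable standing in for the register.

```c
#include <stdint.h>

/* Poll *reg until every bit in `mask` reads as zero.
   Returns 0 on success, or -1 if the bits are still set after max_iter reads. */
int eth_wait_bits_clear(volatile const uint32_t *reg, uint32_t mask, uint32_t max_iter)
{
    for (uint32_t i = 0; i < max_iter; i++) {
        if ((*reg & mask) == 0U)
            return 0;
    }
    return -1;
}
```

On the target, the call would be eth_wait_bits_clear(&ETH->DMABMR, ETH_DMABMR_SR, 1000000U). A return value of -1 points at the reference clock or the RMII selection. The iteration budget here is an arbitrary assumption, and it should be sized to your core clock.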
3.2 MAC Configuration Registers
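The main MAC configuration register, MACCR, controls duplex mode, speed, checksum offload, and the transmitter and receiver enable bits. The speed and duplex settings must match what the PHY negotiated. The example below hard-codes a 100 Mbps full-duplex link with checksum offload disabled:
```c
void eth_mac_init(void)
{
    /* 100 Mbps, full duplex; IPCO left clear, so checksum offload is disabled */
    ETH->MACCR = ETH_MACCR_FES   /* fast Ethernet speed: 100 Mbps */
               | ETH_MACCR_DM;   /* full duplex mode */

    /* perfect address filtering: accept unicast frames for this station and
       broadcasts; promiscuous mode and pass-all-multicast (PAM) stay disabled */
    ETH->MACFFR = 0U;

    /* MAC address 02:00:00:00:00:01 (byte 0 = 0x02 is transmitted first) */
    ETH->MACA0HR = 0x00000100UL; /* bytes 5 and 4: 0x01, 0x00 */
    ETH->MACA0LR = 0x00000002UL; /* bytes 3..0: 0x00, 0x00, 0x00, 0x02 */
}
```
The MAC address registers MACA0HR and MACA0LR store the 6-byte hardware address split across two 32-bit registers. Number the bytes in transmission order, so that byte 0 is the first byte on the wire. Then MACA0HR bits 15:0 hold bytes 5 and 4, and MACA0LR holds bytes 3 through 0, with byte 0 in the least significant position. For 02:00:00:00:00:01, this gives MACA0HR = 0x0100 and MACA0LR = 0x00000002.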
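Rather than hand-encoding the two register values, a driver can compute them from the address bytes. The hypothetical helper below packs a 6-byte address into MACA0HR and MACA0LR values, following the byte layout described above, with mac[0] as the first byte on the wire. On the target, the results would be written to ETH->MACA0HR and ETH->MACA0LR.

```c
#include <stdint.h>

/* Pack a MAC address into MACA0HR/MACA0LR values.
   mac[0] is the first byte transmitted and lands in MACA0LR bits 7:0;
   mac[5] lands in MACA0HR bits 15:8. */
void eth_mac_addr_to_regs(const uint8_t mac[6], uint32_t *hr, uint32_t *lr)
{
    *hr = ((uint32_t)mac[5] << 8) | (uint32_t)mac[4];
    *lr = ((uint32_t)mac[3] << 24) | ((uint32_t)mac[2] << 16)
        | ((uint32_t)mac[1] << 8)  |  (uint32_t)mac[0];
}
```

For 02:00:00:00:00:01, it yields MACA0HR = 0x00000100 and MACA0LR = 0x00000002.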
3.3 Enabling the Transmitter and Receiver
The transmitter and receiver are not enabled during initial configuration. They are enabled only after the DMA descriptor rings are fully set up and the PHY link is confirmed up. Enabling them too early can cause the DMA to fetch uninitialized descriptors.
```c
void eth_mac_start(void)
{
    /* enable transmitter and receiver */
    ETH->MACCR |= ETH_MACCR_TE | ETH_MACCR_RE;

    /* enable DMA transmit and receive */
    ETH->DMAOMR |= ETH_DMAOMR_ST | ETH_DMAOMR_SR;
}
```
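Confirming that the PHY link is up means reading PHY registers over MDIO, through the MACMIIAR and MACMIIDR registers. The sketch below keeps only the bit manipulation, as pure functions, and all names in it are hypothetical. It builds a MACMIIAR command word using the STM32F4 field layout:

- PHY address in bits 15:11
- register in bits 10:6
- clock range in bits 4:2
- MW in bit 1
- MB in bit 0

It also decodes the link bit of the standard basic status register (register 1) and the LAN8720 speed indication field (register 31, bits 4:2). The clock range value 4 (MDC = HCLK/102) assumes HCLK between 150 and 168 MHz.

```c
#include <stdint.h>

#define MIIAR_MB         (1UL << 0)  /* busy: set to start, hardware clears when done */
#define MIIAR_MW         (1UL << 1)  /* write (0 = read) */
#define MIIAR_CR_DIV102  (4UL << 2)  /* MDC = HCLK/102, assumed HCLK 150 to 168 MHz */

#define PHY_REG_BSR      1U          /* IEEE basic status register */
#define PHY_BSR_LINK_UP  (1U << 2)   /* link status (latches low) */
#define LAN8720_REG_SCSR 31U         /* LAN8720 special control/status register */

/* Build the MACMIIAR value that starts a read or write of a PHY register. */
uint32_t mdio_command(uint32_t phy_addr, uint32_t reg, int write)
{
    return ((phy_addr & 0x1FUL) << 11)
         | ((reg & 0x1FUL) << 6)
         | MIIAR_CR_DIV102
         | (write ? MIIAR_MW : 0UL)
         | MIIAR_MB;
}

/* Nonzero if the basic status register reports link up. */
int phy_link_up(uint16_t bsr)
{
    return (bsr & PHY_BSR_LINK_UP) != 0U;
}

/* Decode LAN8720 SCSR bits 4:2 into speed (10 or 100 Mbps) and duplex.
   Returns 0 on success, -1 for any other code (e.g. autonegotiation not done). */
int lan8720_speed(uint16_t scsr, uint32_t *mbps, int *full_duplex)
{
    switch ((scsr >> 2) & 0x7U) {
    case 0x1: *mbps = 10U;  *full_duplex = 0; return 0;
    case 0x5: *mbps = 10U;  *full_duplex = 1; return 0;
    case 0x2: *mbps = 100U; *full_duplex = 0; return 0;
    case 0x6: *mbps = 100U; *full_duplex = 1; return 0;
    default:  return -1;
    }
}
```

On the target, follow these steps:

1. Write the command word to ETH->MACMIIAR.
2. Wait for MB to clear.
3. Read the 16-bit result from ETH->MACMIIDR. For a write, load MACMIIDR before issuing the command.

The decoded speed and duplex would then select ETH_MACCR_FES and ETH_MACCR_DM, in place of the hard-coded values. The LAN8720 PHY address (0 or 1) depends on its PHYAD0 strap. The BSR link bit latches low, so read it twice to get the current state.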
4. Setting Up DMA Descriptors
4.1 TX Descriptor Ring
Each TX descriptor tells the DMA the address and size of one frame buffer to transmit. For a minimal driver, a ring of 4 descriptors is sufficient. Each descriptor maps to a dedicated 1524-byte buffer. That covers a maximum-size frame (1518 bytes including the FCS, or 1522 with a VLAN tag), rounded up to a multiple of 4.
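```c
#define ETH_TX_DESC_COUNT 4U
#define ETH_TX_BUF_SIZE   1524U

/* normal 4-word DMA descriptor (enhanced descriptors disabled, the reset
   default), shared by the TX and RX rings */
typedef struct {
    volatile uint32_t status;    /* TDES0 / RDES0 */
    volatile uint32_t ctrl;      /* TDES1 / RDES1 */
    volatile uint32_t buf1_addr; /* TDES2 / RDES2: frame buffer */
    volatile uint32_t buf2_addr; /* TDES3 / RDES3: next descriptor (chained mode) */
} eth_dma_desc_t;

/* DMA TX descriptor control bits */
#define ETH_TDES0_OWN     (1UL << 31) /* owned by DMA */
#define ETH_TDES0_LS      (1UL << 29) /* last segment */
#define ETH_TDES0_FS      (1UL << 28) /* first segment */
#define ETH_TDES0_TCH     (1UL << 20) /* second addr is next desc */
#define ETH_TDES1_TBS1(n) ((n) & 0x1FFFUL)

static eth_dma_desc_t tx_desc[ETH_TX_DESC_COUNT];
/* word-aligned frame buffers (GCC/Clang attribute syntax) */
static uint8_t tx_buf[ETH_TX_DESC_COUNT][ETH_TX_BUF_SIZE] __attribute__((aligned(4)));

void eth_tx_desc_init(void)
{
    for (uint32_t i = 0; i < ETH_TX_DESC_COUNT; i++) {
        tx_desc[i].status    = ETH_TDES0_TCH;
        tx_desc[i].ctrl      = 0U;
        tx_desc[i].buf1_addr = (uint32_t)tx_buf[i];
        tx_desc[i].buf2_addr = (uint32_t)&tx_desc[(i + 1U) % ETH_TX_DESC_COUNT];
    }

    /* tell the DMA where the TX descriptor ring starts */
    ETH->DMATDLAR = (uint32_t)&tx_desc[0];
}
```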
4.2 RX Descriptor Ring
The RX descriptor ring works the same way. Each descriptor is pre-loaded with a buffer address and ownership is given to the DMA at initialization so it can store incoming frames immediately.
```c
#define ETH_RX_DESC_COUNT 4U
#define ETH_RX_BUF_SIZE   1524U

/* DMA RX descriptor control bits */
#define ETH_RDES0_OWN      (1UL << 31) /* owned by DMA */
#define ETH_RDES0_FL_SHIFT 16U         /* frame length shift */
#define ETH_RDES0_FL_MASK  0x3FFFUL    /* frame length mask */
#define ETH_RDES1_RCH      (1UL << 14) /* second addr is next desc */
#define ETH_RDES1_RBS1(n)  ((n) & 0x1FFFUL)

static eth_dma_desc_t rx_desc[ETH_RX_DESC_COUNT];
/* word-aligned frame buffers (GCC/Clang attribute syntax) */
static uint8_t rx_buf[ETH_RX_DESC_COUNT][ETH_RX_BUF_SIZE] __attribute__((aligned(4)));

void eth_rx_desc_init(void)
{
    for (uint32_t i = 0; i < ETH_RX_DESC_COUNT; i++) {
        rx_desc[i].status    = ETH_RDES0_OWN;
        rx_desc[i].ctrl      = ETH_RDES1_RCH | ETH_RDES1_RBS1(ETH_RX_BUF_SIZE);
        rx_desc[i].buf1_addr = (uint32_t)rx_buf[i];
        rx_desc[i].buf2_addr = (uint32_t)&rx_desc[(i + 1U) % ETH_RX_DESC_COUNT];
    }

    /* tell the DMA where the RX descriptor ring starts */
    ETH->DMARDLAR = (uint32_t)&rx_desc[0];
}
```
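The RX buffer size written into RDES1 is constrained in two ways. RBS1 is a 13-bit field, and the STM32F4 reference manual asks for receive buffer sizes that are a multiple of 4 on the 32-bit bus; 1524 satisfies both. The hypothetical helper below encodes RDES1 for a chained descriptor and rejects sizes that break those rules. That gives a cheap guard if the buffer size is ever changed.

```c
#include <stdint.h>

#define RDES1_RCH       (1UL << 14)  /* second address is the next descriptor */
#define RDES1_RBS1_MAX  0x1FFFUL     /* 13-bit buffer size field */

/* Encode RDES1 for a chained RX descriptor with one buffer of `size` bytes.
   Returns 0 and writes *rdes1 on success; returns -1 if `size` is zero,
   too large for RBS1, or not a multiple of 4. */
int eth_rdes1_encode(uint32_t size, uint32_t *rdes1)
{
    if (size == 0U || size > RDES1_RBS1_MAX || (size % 4U) != 0U)
        return -1;

    *rdes1 = RDES1_RCH | size;
    return 0;
}
```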
4.3 Linking Descriptors to Buffers
The buf2_addr field of each descriptor points to the next descriptor in the ring, and ETH_TDES0_TCH / ETH_RDES1_RCH tells the DMA to treat buf2_addr as a chained descriptor pointer rather than a second data buffer. This is the chained descriptor mode and is the simplest way to implement the ring without requiring contiguous descriptor memory.
Fig 2. TX and RX descriptor rings initialized and linked to their frame buffers.
5. Sending a Raw Ethernet Frame
5.1 Building the Frame in a Buffer
A raw Ethernet frame has four parts, in this order:

- the 6-byte destination MAC address
- the 6-byte source MAC address
- the 2-byte EtherType
- the payload

The MAC hardware computes and appends the 4-byte FCS automatically. By default, it also pads frames shorter than the 64-byte minimum.
```c
static uint32_t tx_index = 0U;

int eth_tx_frame(const uint8_t *dst_mac, uint16_t ethertype,
                 const uint8_t *payload, uint16_t length)
{
    eth_dma_desc_t *desc = &tx_desc[tx_index];

    /* check descriptor is available */
    if (desc->status & ETH_TDES0_OWN)
        return -1; /* DMA still owns this descriptor */

    /* payload must fit the standard 1500-byte MTU (and therefore the buffer) */
    if (length > 1500U)
        return -2;

    uint8_t *buf = tx_buf[tx_index];
    uint16_t total = 14U + length; /* 6 dst + 6 src + 2 type + payload */

    /* destination MAC */
    buf[0] = dst_mac[0]; buf[1] = dst_mac[1]; buf[2] = dst_mac[2];
    buf[3] = dst_mac[3]; buf[4] = dst_mac[4]; buf[5] = dst_mac[5];

    /* source MAC (02:00:00:00:00:01) */
    buf[6] = 0x02U; buf[7]  = 0x00U; buf[8]  = 0x00U;
    buf[9] = 0x00U; buf[10] = 0x00U; buf[11] = 0x01U;

    /* EtherType, big-endian on the wire */
    buf[12] = (uint8_t)(ethertype >> 8);
    buf[13] = (uint8_t)(ethertype & 0xFFU);

    /* payload */
    for (uint16_t i = 0; i < length; i++)
        buf[14U + i] = payload[i];

    /* frame size in TDES1 */
    desc->ctrl = ETH_TDES1_TBS1(total);

    /* make sure the buffer and TDES1 writes complete before the DMA can see OWN */
    __DSB();

    /* first and last segment, chained, ownership handed to the DMA */
    desc->status = ETH_TDES0_OWN | ETH_TDES0_FS | ETH_TDES0_LS | ETH_TDES0_TCH;

    /* advance index */
    tx_index = (tx_index + 1U) % ETH_TX_DESC_COUNT;

    return 0;
}
```
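The header layout is easy to check off-target if frame assembly is pulled out into a pure function. The sketch below is a hypothetical variant of the copy step in eth_tx_frame(), not a drop-in replacement. It takes the source MAC as a parameter instead of hard-coding it, and it bounds-checks against the buffer capacity. It returns the frame length to program into TBS1. That length excludes the FCS, which the MAC appends.

```c
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN 14U

/* Write dst | src | EtherType | payload into buf.
   Returns the frame length (header + payload, no FCS), or -1 if it exceeds cap. */
int eth_build_frame(uint8_t *buf, uint32_t cap,
                    const uint8_t dst[6], const uint8_t src[6],
                    uint16_t ethertype, const uint8_t *payload, uint16_t len)
{
    uint32_t total = ETH_HDR_LEN + (uint32_t)len;

    if (total > cap)
        return -1;

    memcpy(&buf[0], dst, 6);
    memcpy(&buf[6], src, 6);
    buf[12] = (uint8_t)(ethertype >> 8);    /* EtherType is big-endian on the wire */
    buf[13] = (uint8_t)(ethertype & 0xFFU);
    if (len > 0U)
        memcpy(&buf[ETH_HDR_LEN], payload, len);

    return (int)total;
}
```

For example, a 3-byte payload with EtherType 0x0806 produces a 17-byte frame whose bytes 12 and 13 are 0x08 and 0x06.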
5.2 Handing the Frame to the DMA
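Setting the OWN bit in the descriptor status field hands the descriptor to the DMA. However, when the transmit DMA reaches a descriptor it does not own, it enters the suspended state and stops fetching descriptors. In a polled driver, the transmit process is therefore usually suspended by the time a new frame is queued. Writing any value to the transmit poll demand register makes the DMA re-read the current descriptor and resume: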
```c
void eth_tx_poll(void)
{
    /* wake DMA transmit process if suspended */
    ETH->DMATPDR = 0U;
}
```
5.3 Polling for Completion
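After calling eth_tx_poll(), the CPU can poll the descriptor's OWN bit to wait for the DMA to finish transmission. When the OWN bit clears, the frame has been sent and the descriptor is available for reuse. Because eth_tx_frame() has already advanced tx_index, the descriptor that was just queued is (tx_index + ETH_TX_DESC_COUNT - 1U) % ETH_TX_DESC_COUNT.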
```c
void eth_tx_wait(uint32_t index)
{
    /* the DMA clears OWN once it has finished with the descriptor */
    while (tx_desc[index].status & ETH_TDES0_OWN) {
    }
}
```
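The ownership rule can be exercised without hardware by modelling the ring on a host. The sketch below is purely illustrative and is not driver code. model_cpu_submit() plays the role of eth_tx_frame(), and model_dma_step() stands in for the DMA completing one descriptor. The model shows why the CPU gets -1 once all four descriptors are queued, and how completions free them in ring order.

```c
#include <stdint.h>

#define MODEL_RING_SIZE 4U
#define MODEL_OWN       (1UL << 31)

/* host-side model of a descriptor ring: only the OWN bit is tracked */
typedef struct {
    uint32_t status[MODEL_RING_SIZE];
    uint32_t cpu_index;  /* next descriptor the CPU will fill */
    uint32_t dma_index;  /* next descriptor the "DMA" will complete */
} ring_model_t;

/* CPU side: hand the next descriptor to the DMA, or -1 if the DMA still owns it */
int model_cpu_submit(ring_model_t *r)
{
    if (r->status[r->cpu_index] & MODEL_OWN)
        return -1;                          /* ring full */
    r->status[r->cpu_index] = MODEL_OWN;
    r->cpu_index = (r->cpu_index + 1U) % MODEL_RING_SIZE;
    return 0;
}

/* DMA side: complete one descriptor by clearing OWN, or -1 if none is queued */
int model_dma_step(ring_model_t *r)
{
    if (!(r->status[r->dma_index] & MODEL_OWN))
        return -1;                          /* the real DMA would suspend here */
    r->status[r->dma_index] &= ~MODEL_OWN;
    r->dma_index = (r->dma_index + 1U) % MODEL_RING_SIZE;
    return 0;
}
```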
Fig 3. TX path: CPU fills buffer, sets OWN bit, writes the poll demand register, waits for OWN to clear.
6. Receiving a Raw Ethernet Frame
6.1 Polling the RX Descriptor
On reception, the DMA writes the incoming frame into the next available RX buffer and clears the OWN bit. The CPU polls the OWN bit to detect a completed reception.
```c
static uint32_t rx_index = 0U;

int eth_rx_frame_ready(void)
{
    /* return 1 if the current descriptor has been filled by DMA */
    return !(rx_desc[rx_index].status & ETH_RDES0_OWN);
}
```
6.2 Reading the Frame from the Buffer
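When a frame is ready, its length is extracted from the FL field of the descriptor status word (RDES0), and the frame data is read directly from the buffer. FL counts the 4-byte FCS at the end of the frame. Check the error summary bit (ES, bit 15) before trusting the contents.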
```c
uint16_t eth_rx_frame_length(void)
{
    /* FL field of RDES0; includes the 4-byte FCS */
    return (uint16_t)((rx_desc[rx_index].status >> ETH_RDES0_FL_SHIFT) & ETH_RDES0_FL_MASK);
}

const uint8_t *eth_rx_frame_buffer(void)
{
    return rx_buf[rx_index];
}
```
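The FL value needs two adjustments before the frame is handed up the stack. First, it includes the 4-byte FCS. Second, it is only meaningful if the frame was received without errors and fits in a single buffer, with both FS and LS set. The hypothetical helper below applies those checks to a raw RDES0 value, using the STM32F4 reference manual bit positions: ES at bit 15, FS at bit 9, and LS at bit 8.

```c
#include <stdint.h>

#define RDES0_OWN       (1UL << 31)  /* still owned by the DMA */
#define RDES0_FL_SHIFT  16U
#define RDES0_FL_MASK   0x3FFFUL     /* frame length, FCS included */
#define RDES0_ES        (1UL << 15)  /* error summary */
#define RDES0_FS        (1UL << 9)   /* first descriptor of the frame */
#define RDES0_LS        (1UL << 8)   /* last descriptor of the frame */
#define ETH_FCS_LEN     4U

/* Return the usable frame length (header + payload, FCS removed), or -1 if the
   descriptor is not ready, reports an error, or the frame spans several buffers. */
int eth_rx_usable_length(uint32_t rdes0)
{
    if (rdes0 & RDES0_OWN)
        return -1;
    if (rdes0 & RDES0_ES)
        return -1;
    if ((rdes0 & (RDES0_FS | RDES0_LS)) != (RDES0_FS | RDES0_LS))
        return -1;

    uint32_t fl = (rdes0 >> RDES0_FL_SHIFT) & RDES0_FL_MASK;
    if (fl < ETH_FCS_LEN)
        return -1;
    return (int)(fl - ETH_FCS_LEN);
}
```

In the polled driver, this would be called as eth_rx_usable_length(rx_desc[rx_index].status). A frame that returns -1 should still be released with eth_rx_release(), so the DMA gets the descriptor back.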
6.3 Releasing the Descriptor Back to DMA
After the CPU has processed the frame, the descriptor must be returned to the DMA by setting the OWN bit again and advancing the index. The DMA receive poll demand register is written to wake the DMA if it was suspended waiting for a free descriptor.
```c
void eth_rx_release(void)
{
    /* return descriptor to DMA */
    rx_desc[rx_index].status = ETH_RDES0_OWN;

    /* advance index */
    rx_index = (rx_index + 1U) % ETH_RX_DESC_COUNT;

    /* wake DMA receive process if suspended */
    ETH->DMARPDR = 0U;
}
```
Fig 4. RX path: DMA fills buffer, clears OWN bit, CPU reads frame, releases descriptor.
7. Conclusion
A bare metal STM32 Ethernet driver requires four things to work correctly: GPIO and clock configuration for the RMII interface, MAC register initialization with the correct speed and duplex settings, DMA descriptor rings that link descriptors to frame buffers in a chained ring, and the transmit and receive polling logic that coordinates ownership between the CPU and the DMA.
With this driver in place, the STM32 can send and receive raw Ethernet frames without any HAL or middleware. The next article builds on this foundation by implementing ARP, IPv4, and UDP from scratch directly on top of this bare metal driver, turning raw frame I/O into a functional UDP/IP stack.
