Same Name, Different Data: How Thread-Local Storage Works
1. Introduction
When writing multithreaded software, one of the first challenges you
face is shared data. A global variable might seem harmless until
two threads try to update it at the same time. Suddenly you've got race
conditions, corrupted values, and a debugging headache that keeps you up
at night.
The usual advice is to protect shared variables with **locks or
mutexes**, but that comes at a cost: extra complexity, performance hits,
and the constant risk of deadlocks. But what if some variables don't
actually need to be shared at all? What if every thread could have its
own private copy?
That's exactly what Thread-Local Storage (TLS) provides. Instead of
forcing threads to fight over a single global variable, TLS gives each
thread its own instance. Same name, same code access, but under the hood
each thread works with its own memory slot.
2. What is Thread-Local Storage?
Thread-Local Storage is a mechanism that allows each thread to maintain
a unique copy of a variable. Conceptually, it feels like a global
variable: you can access it from anywhere in your code. In
reality, though, the value you see belongs only to the current thread.
This is different from stack variables, which are temporary and vanish
after a function returns. TLS variables persist for the lifetime of the
thread, making them perfect for storing per-thread state like counters,
buffers, or error codes.

Figure 1 – Global vs Thread-Local Variable in Two Threads
In short: **globals are shared, stacks are temporary, TLS is private and
persistent.** It's a simple concept, but one that solves a big problem
in concurrent programming.
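To make the three kinds of storage concrete, here is a minimal sketch, assuming POSIX threads and a C11 compiler (for example gcc -std=c11 -pthread). The names shared_total, calls_in_thread, touch_all, three_calls, and run_scope_demo are hypothetical, chosen only for this illustration. Each thread calls the same function three times; the stack variable is reset on every call, the thread-local one keeps counting within its thread, and the global accumulates across both threads.

```c
#include <pthread.h>

/* Hypothetical demo names: one variable of each kind. */
int shared_total = 0;                  /* global: one copy for the whole program */
_Thread_local int calls_in_thread = 0; /* TLS: one copy per thread, kept until the thread exits */

struct seen { int stack_value; int tls_value; };

/* Bumps all three variables and returns the stack variable's value. */
static int touch_all(void)
{
    int on_stack = 0;   /* stack: created fresh (and reset) on every call */
    on_stack++;
    shared_total++;
    calls_in_thread++;
    return on_stack;
}

/* Thread body: calls touch_all() three times, then records what it saw. */
static void *three_calls(void *arg)
{
    struct seen *out = arg;
    for (int i = 0; i < 3; i++)
        out->stack_value = touch_all();
    out->tls_value = calls_in_thread;
    return NULL;
}

/* Runs two threads one after the other (join before the next create).
   Error checking on the pthread calls is omitted for brevity. */
void run_scope_demo(struct seen *first, struct seen *second)
{
    pthread_t t;
    pthread_create(&t, NULL, three_calls, first);
    pthread_join(t, NULL);
    pthread_create(&t, NULL, three_calls, second);
    pthread_join(t, NULL);
}
```

After run_scope_demo returns, each thread reports a stack value of 1 and a thread-local value of 3, while shared_total is 6 because both threads added to the same global.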
3. Why Do We Need TLS?
The need for TLS shows up the moment threads start stepping on each
other's toes. Imagine a logging system where multiple threads write to
the same global buffer. Without careful locking, the messages get
scrambled together like two people talking over each other on the same
phone line. With TLS, each thread can write to its own buffer, and
later those logs can be merged cleanly.
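The logging idea can be sketched as follows, assuming POSIX threads and C11. The fixed 256-byte buffer and the names log_line, logging_worker, and run_logging_demo are hypothetical design choices, not an API from any library. Each worker appends to its own thread-local buffer without taking a lock, hands the finished text back through its own job slot, and the caller merges the logs after joining.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define LOG_CAP 256   /* hypothetical per-thread buffer size */

/* Each thread appends here without locking; nobody else can see this copy. */
static _Thread_local char log_buf[LOG_CAP];
static _Thread_local size_t log_len = 0;

/* Appends one line to the calling thread's private buffer, truncating if full. */
static void log_line(const char *msg)
{
    size_t room = LOG_CAP - log_len;               /* always at least 1 */
    int n = snprintf(log_buf + log_len, room, "%s\n", msg);
    if (n < 0)
        return;
    log_len += ((size_t)n < room) ? (size_t)n : room - 1;   /* clamp on truncation */
}

struct job { int id; char out[LOG_CAP]; };

static void *logging_worker(void *arg)
{
    struct job *job = arg;
    char msg[64];
    for (int step = 1; step <= 2; step++) {
        snprintf(msg, sizeof msg, "worker %d: step %d", job->id, step);
        log_line(msg);
    }
    /* Only this thread writes job->out, so the copy needs no lock either. */
    memcpy(job->out, log_buf, log_len + 1);        /* includes the terminating '\0' */
    return NULL;
}

/* Runs two workers concurrently, then merges their logs in a fixed order. */
void run_logging_demo(char *merged, size_t cap)
{
    struct job jobs[2] = { { .id = 1 }, { .id = 2 } };
    pthread_t tids[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&tids[i], NULL, logging_worker, &jobs[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(tids[i], NULL);
    merged[0] = '\0';
    for (int i = 0; i < 2; i++)
        strncat(merged, jobs[i].out, cap - strlen(merged) - 1);
}
```

The two workers may run at the same time, yet their lines never interleave mid-message, because the only shared step (the merge) happens after pthread_join, when no thread is writing anymore.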
A classic real-world example is the errno variable in C. When a system
call or one of many standard library functions fails, it sets errno to
indicate the error. But in a multithreaded program, each thread might be
dealing with different errors at the same time. If errno were a single
global, one thread could overwrite another's value. C11 avoids this by
requiring errno to have thread storage duration:
each thread gets its own error code, safely isolated from the rest.
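A small sketch can confirm this behaviour, assuming POSIX threads and C11; errno_report, overflow_in_thread, and run_errno_demo are hypothetical names. A helper thread triggers ERANGE with an out-of-range strtol call and then checks that its errno is a different object from the creating thread's. C11 also leaves the initial errno of a newly created thread indeterminate, so the helper clears it first.

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

struct errno_report {
    int *caller_errno;   /* address of the creating thread's errno */
    int child_value;     /* errno as seen by the helper thread */
    int distinct;        /* 1 if the two threads' errno objects differ */
};

static void *overflow_in_thread(void *arg)
{
    struct errno_report *r = arg;
    errno = 0;   /* a new thread's errno starts with an indeterminate value */
    (void)strtol("99999999999999999999999999", NULL, 10);   /* out of range for long: ERANGE */
    r->child_value = errno;
    /* The caller is blocked in pthread_join, so its errno object still exists. */
    r->distinct = (&errno != r->caller_errno);
    return NULL;
}

void run_errno_demo(struct errno_report *r)
{
    r->caller_errno = &errno;
    pthread_t t;
    pthread_create(&t, NULL, overflow_in_thread, r);
    pthread_join(t, NULL);
}
```

The comparison is done inside the helper thread on purpose: once a thread exits, its thread-local objects no longer exist, so their addresses should not be compared afterwards.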
TLS is the quiet workhorse that keeps multithreaded software sane: for
data that each thread only needs for itself, there are no locks to take,
no synchronization to get wrong, and no accidental overwrites.
Just simple, per-thread data isolation.
4. How TLS Works in C11
C11 introduced the _Thread_local storage-class specifier, giving C
programmers a standard way to declare variables that are local to each
thread. Every thread has its own independent copy of the
variable, even though the name is the same across the program.
For example:
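```c
#include <stdio.h>
#include <pthread.h>

/* Every thread gets its own copy of counter, initialized to 0 when the thread starts. */
_Thread_local int counter = 0;

void *worker(void *arg)
{
    int id = *(int *)arg;
    counter++;   /* increments this thread's copy only */
    printf("Thread %d counter: %d\n", id, counter);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    int a = 1, b = 2;

    /* Each thread is joined before the next one starts, so the output order is fixed. */
    pthread_create(&t1, NULL, worker, &a);
    pthread_join(t1, NULL);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t2, NULL);
    return 0;
}
```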
If we compile and run this (for example with gcc -std=c11 -pthread), the output is:

    Thread 1 counter: 1
    Thread 2 counter: 1
Each thread starts with its own counter at zero, so when it
increments, it prints 1. The two threads don't interfere with each other
because each has its own private copy of the variable.
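One detail worth spelling out: a new thread's copy starts from the declaration's initializer, not from the creating thread's current value. The sketch below assumes POSIX threads and C11, and tls_value, read_fresh_copy, and value_seen_by_new_thread are hypothetical names. The calling thread changes its own copy to 99, then starts a thread that reads its own copy.

```c
#include <pthread.h>

_Thread_local int tls_value = 10;   /* initializer applies to every thread's copy */

static void *read_fresh_copy(void *arg)
{
    *(int *)arg = tls_value;   /* this thread's own, freshly initialized copy */
    return NULL;
}

/* Changes the caller's copy, then reports what a newly started thread sees. */
int value_seen_by_new_thread(void)
{
    tls_value = 99;            /* modifies only the calling thread's copy */
    int seen = 0;
    pthread_t t;
    pthread_create(&t, NULL, read_fresh_copy, &seen);
    pthread_join(t, NULL);
    return seen;
}
```

The new thread reports 10, while the calling thread still sees 99 in its own copy afterwards.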
Now let's see what happens without _Thread_local, when counter is
declared as a normal global variable (int counter = 0;) and the rest of
the program is unchanged. Running it gives:

    Thread 1 counter: 1
    Thread 2 counter: 2
Here, both threads are incrementing the same global counter. The
first thread sets it to 1, and the second thread sees that new value and
increments it to 2.
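Both behaviours can be observed side by side in a single program. This is a sketch assuming POSIX threads and C11; plain_counter, tls_counter, bump_both, and run_side_by_side are hypothetical names. Like the program above, it starts one thread, joins it, and only then starts the next.

```c
#include <pthread.h>

int plain_counter = 0;               /* one object shared by every thread */
_Thread_local int tls_counter = 0;   /* one object per thread */

struct snapshot { int plain; int tls; };

/* Increments both counters and records what this thread sees afterwards. */
static void *bump_both(void *arg)
{
    struct snapshot *s = arg;
    plain_counter++;
    tls_counter++;
    s->plain = plain_counter;
    s->tls = tls_counter;
    return NULL;
}

/* Mirrors the program above: create a thread, join it, then create the next. */
void run_side_by_side(struct snapshot out[2])
{
    for (int i = 0; i < 2; i++) {
        pthread_t t;
        pthread_create(&t, NULL, bump_both, &out[i]);
        pthread_join(t, NULL);
    }
}
```

The second thread sees plain_counter at 2 but tls_counter at 1. Note that the joins are what keep plain_counter race-free here; if both threads ran at once, plain_counter++ would need a lock or an atomic, while tls_counter++ would not.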
To prove that _Thread_local actually gives **separate per-thread
instances**, let's look at the generated assembly: globals compile to
accesses at one fixed address, while thread-locals compile to
accesses relative to the thread pointer.
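When we declare int counter; as a plain global, the compiler addresses
it at one fixed location. On x86-64 with GCC, the load and store for
counter++ use an operand of the form counter(%rip), an address computed
relative to the instruction pointer that is the same no matter which
thread executes it.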
Now compare that with the code generated for _Thread_local int counter:
the same load and store use an operand of the form %fs:counter@tpoff.
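Instead of %rip, the compiler uses the %fs segment register with
an offset (@tpoff). On x86-64 Linux, the kernel and the pthreads runtime
give each thread its own %fs base, pointing at that thread's control
block, and the thread's thread-local variables sit at fixed offsets from
it; @tpoff is counter's offset. That means the same symbol counter resolves to a
different memory location depending on which thread is running.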
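The same effect is visible from C without reading any assembly, by comparing addresses. This sketch assumes POSIX threads and C11; global_counter, tls_counter, addr_check, compare_addresses, and run_address_check are hypothetical names. The creating thread records the addresses it sees, and a second thread compares them with its own view of the same two names while the creator is still alive.

```c
#include <pthread.h>

int global_counter = 0;              /* one fixed address for the whole program */
_Thread_local int tls_counter = 0;   /* one address per thread */

struct addr_check {
    int *caller_global;   /* &global_counter as seen by the creating thread */
    int *caller_tls;      /* &tls_counter as seen by the creating thread */
    int same_global;
    int same_tls;
};

static void *compare_addresses(void *arg)
{
    struct addr_check *c = arg;
    /* The creating thread is blocked in pthread_join, so its copy still exists. */
    c->same_global = (&global_counter == c->caller_global);
    c->same_tls = (&tls_counter == c->caller_tls);
    return NULL;
}

void run_address_check(struct addr_check *c)
{
    c->caller_global = &global_counter;
    c->caller_tls = &tls_counter;    /* the caller's own copy */
    pthread_t t;
    pthread_create(&t, NULL, compare_addresses, c);
    pthread_join(t, NULL);
}
```

The global's address matches in both threads, while the same expression &tls_counter yields a different address in each thread, which is exactly what the %fs-relative addressing produces.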
At the C source level, both versions look almost identical, but the
assembly reveals what's really happening under the hood.
5. Conclusion
Thread-local storage in C11 gives each thread its own private "copy" of
a variable, avoiding the conflicts of shared globals. At the C level the
two declarations look almost identical, but the assembly reveals the
truth: globals map to one fixed memory address, while _Thread_local
variables resolve through the thread pointer, giving every thread its
own instance. A tiny keyword, but a powerful tool for writing safer,
cleaner multithreaded code.