
Why Counter++ Isn't Thread-Safe: Understanding Atomic Operations

An explanation of why the increment/decrement operators are not atomic and how this affects concurrent systems.

programming c go concurrency

While learning about concurrency, I observed an interesting behavior with increment/decrement operations. Consider the following C code snippet:

#include <stdio.h>
volatile int counter = 0;
int main(){
	printf("%d\n", counter++); // prints 0: post-increment returns the old value
	printf("%d\n", counter);   // prints 1
	return 0;
}

Note: The volatile keyword prevents certain compiler optimizations, but it does not make an operation thread-safe. It ensures each counter++ actually reads from and writes to memory, but the underlying three-step process (load-increment-store) is still non-atomic at the CPU level.

Programmers are familiar with the ++ and -- operators, which increment or decrement numeric variables by one. They are simple and efficient to use.

Let’s take this a step further by running the operation in two separate threads. Each thread increments the shared counter N times, so for a given input N, the final result should be 2*N.

A thread is an independent flow of execution inside a process. A process can create many threads to perform tasks concurrently, and they all share the process's memory, including global variables like counter.

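// common.h and common_threads.h are helper headers from the OSTEP code
// repository; Pthread_create and Pthread_join are its error-checking
// wrappers around pthread_create and pthread_join.
#include "common.h"
#include "common_threads.h"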
#include <stdio.h>
#include <stdlib.h>

volatile int counter = 0;
int loops;
void *worker(void *arg) {
  int i;
  for (i = 0; i < loops; i++) {
    counter++;
  }
  return NULL;
}
int main(int argc, char *argv[]) {
  if (argc != 2) {
    fprintf(stderr, "usage: threads <loops>\n");
    exit(1);
  }
  loops = atoi(argv[1]);
  pthread_t p1, p2;
  printf("Initial value : %d\n", counter);
  Pthread_create(&p1, NULL, worker, NULL);
  Pthread_create(&p2, NULL, worker, NULL);
  Pthread_join(p1, NULL);
  Pthread_join(p2, NULL);
  printf("Final value   : %d\n", counter);
  return 0;
}

You might expect that, even with two threads updating the same variable at the same time, the output would be deterministic. For an input of 1000, the output should be 2000, which is exactly what we see in this case.

 ./threads 1000
Initial value : 0
Final value   : 2000

But what happens if we try a higher value?

 ./threads 10000
Initial value : 0
Final value   : 14900

That’s not right; it should be 20000. Let’s try running it again.

 ./threads 10000
Initial value : 0
Final value   : 14026

Woah! Not only is the output wrong, but it’s also different each time we run the program. This behavior is due to the non-atomic nature of increment and decrement operations.

What is Atomicity?

Atomicity is a property of an operation that guarantees it will execute as a single, indivisible unit. From the perspective of other threads, an atomic operation has either happened completely or not at all.

In the case of an increment or decrement operation, let’s break down what happens under the hood:

  1. Load: Read the current value from the memory location.
  2. Increment: Add one to the value that was just read.
  3. Store: Write the new value back to the same memory location.

These are three distinct operations. We can confirm this by inspecting the machine instructions generated by the compiler. This will produce assembly code specific to your CPU architecture (the code will differ between machines, but the underlying logic remains the same).

# Using the -S flag, we can generate the assembly code
gcc -S threads.c

A simplified view of the assembly might look like this:

mov counter(%rip), %eax  # Load:  Read the value from 'counter' into a register
add $1, %eax             # Increment: Add 1 to the register's value
mov %eax, counter(%rip)  # Store: Write the new value from the register back to 'counter'

The Race Condition

Because the simple counter++ operation is not atomic, a “race condition” can occur. Whether the threads are context-switched on a single core or run in parallel on separate cores, their instructions can interleave in a destructive way.

Imagine the following sequence:

Time →
Thread 1: [Read=5]     [Inc=6]   [Write=6]
Thread 2:       [Read=5][Inc=6]        [Write=6]
                    ↑ Both read 5!              ↑ Both write 6!

Here, both threads read the value 5 before either one has a chance to write its updated value back. They both calculate 6 and write it back. The result is that one of the increment operations is completely lost. The final value is 6 instead of the correct value of 7.
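The interleaving above can be replayed deterministically. The following is a minimal sketch, not a real concurrent program: it runs on a single goroutine and uses two local variables (reg1 and reg2, names made up for this example) to stand in for each thread's CPU register.

```go
package main

import "fmt"

// lostUpdate replays the interleaving from the diagram step by step on a
// single goroutine, so the outcome is the same on every run. reg1 and reg2
// are hypothetical stand-ins for each thread's register.
func lostUpdate(start int) int {
	counter := start

	reg1 := counter // Thread 1: load (reads 5)
	reg2 := counter // Thread 2: load (also reads 5, Thread 1 has not stored yet)
	reg1++          // Thread 1: increment (6)
	reg2++          // Thread 2: increment (6)
	counter = reg1  // Thread 1: store (counter = 6)
	counter = reg2  // Thread 2: store (counter = 6 again)

	return counter
}

func main() {
	fmt.Println(lostUpdate(5)) // prints 6, not 7: one increment is lost
}
```

In the real program the scheduler and the hardware decide the ordering, which is why the final value changes from run to run.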

When multiple threads (or goroutines in Go) update the same memory location concurrently without proper synchronization, the result is a data race. Updates get lost as shown above, and in C a data race is formally undefined behavior.

How to Fix It

To tackle this challenge, programming languages provide various solutions for synchronization and atomic operations.

C: stdatomic.h

In modern C, you can use the <stdatomic.h> header and its atomic types and functions.

#include <stdatomic.h>

atomic_int counter = 0;

void increment(void) {
  // This performs the increment atomically
  atomic_fetch_add(&counter, 1);
}

Go: Mutexes or sync/atomic

In Go, you can use a mutex (mutual exclusion), which locks a piece of data to ensure only one goroutine can access it at a time.
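As a rough sketch of the mutex approach, the program below guards the counter with a sync.Mutex. The safeCounter type and the runWithMutex function are illustrative names, not part of the original program, and the loop count is hard-coded instead of read from the command line.

```go
package main

import (
	"fmt"
	"sync"
)

// safeCounter pairs the shared value with the mutex that guards it.
// (Illustrative type, not from the original program.)
type safeCounter struct {
	mu sync.Mutex
	n  int
}

// inc locks, increments, and unlocks, so one goroutine's
// load-increment-store cannot interleave with another's.
func (c *safeCounter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// runWithMutex starts two workers that each increment the counter
// loops times and returns the final value.
func runWithMutex(loops int) int {
	var c safeCounter
	var wg sync.WaitGroup
	wg.Add(2)
	for w := 0; w < 2; w++ {
		go func() {
			defer wg.Done()
			for i := 0; i < loops; i++ {
				c.inc()
			}
		}()
	}
	wg.Wait() // all increments are finished and visible after this returns
	return c.n
}

func main() {
	fmt.Println("Final value:", runWithMutex(10000)) // prints Final value: 20000
}
```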

Alternatively, for simple operations like this, you can use the sync/atomic package, which is often more efficient.

import "sync/atomic"

var counter int32
atomic.AddInt32(&counter, 1)      // Atomically add to the counter
atomic.LoadInt32(&counter)       // Atomically read the value
atomic.StoreInt32(&counter, 42)  // Atomically write a value
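Put together, a complete version of the two-worker counter using these calls might look like the sketch below. The function name runWithAtomic is made up for this example, and the loop count is hard-coded for brevity.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWithAtomic starts two workers that each add 1 to the counter
// loops times using an atomic read-modify-write, then returns the total.
// (Illustrative function, not from the original program.)
func runWithAtomic(loops int) int32 {
	var counter int32
	var wg sync.WaitGroup
	wg.Add(2)
	for w := 0; w < 2; w++ {
		go func() {
			defer wg.Done()
			for i := 0; i < loops; i++ {
				atomic.AddInt32(&counter, 1) // no lock needed
			}
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&counter)
}

func main() {
	fmt.Println("Final value:", runWithAtomic(10000)) // prints Final value: 20000
}
```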

The machine code for a true atomic operation is different. On x86, it uses a special instruction (often prefixed with LOCK) to make the load-increment-store sequence indivisible.

# An atomic increment uses a special CPU instruction:
LOCK INCQ 0x618dd8    # This is a single, atomic instruction

# In contrast, a mutex involves multiple instructions and, when the lock is contended, OS calls:
# 1. Try to acquire lock (might block and cause a context switch)
# 2. Perform the increment
# 3. Release the lock (which might wake up other waiting threads)
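To get a feel for the difference on your own machine, the sketch below times both approaches with two goroutines each. The helper names timeIt and compare are made up for this example, and the timings depend heavily on hardware and load, so treat it as an experiment rather than a benchmark.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// timeIt runs two goroutines that each call inc loops times and
// reports how long the whole run took. (Hypothetical helper.)
func timeIt(loops int, inc func()) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	wg.Add(2)
	for w := 0; w < 2; w++ {
		go func() {
			defer wg.Done()
			for i := 0; i < loops; i++ {
				inc()
			}
		}()
	}
	wg.Wait()
	return time.Since(start)
}

// compare counts to 2*loops once with a mutex and once with sync/atomic,
// returning both final counts and both elapsed times.
func compare(loops int) (int64, int64, time.Duration, time.Duration) {
	var mu sync.Mutex
	var locked int64
	mutexTime := timeIt(loops, func() {
		mu.Lock()
		locked++
		mu.Unlock()
	})

	var lockFree int64
	atomicTime := timeIt(loops, func() { atomic.AddInt64(&lockFree, 1) })

	return locked, atomic.LoadInt64(&lockFree), mutexTime, atomicTime
}

func main() {
	m, a, mt, at := compare(1000000)
	fmt.Printf("mutex:  count=%d time=%v\n", m, mt)
	fmt.Printf("atomic: count=%d time=%v\n", a, at)
}
```

Both counts should come out to 2000000; only the elapsed times differ between the two approaches.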

Concurrency is essential for performance in modern distributed and backend systems. Understanding why things are designed the way they are is valuable, and digging a layer deeper is always helpful.

That’s it, hope you understood!


Go Example

Here is the Go equivalent of the C example, which uses goroutines.

package main
import (
	"fmt"
	"os"
	"strconv"
	"sync"
)
var counter int
var loops int
func worker(wg *sync.WaitGroup) {
	defer wg.Done()
	for i := 0; i < loops; i++ {
		counter++
	}
}
func main() {
	if len(os.Args) != 2 {
		fmt.Fprintf(os.Stderr, "Usage: threads <loops>\n")
		os.Exit(1)
	}
	var err error
	loops, err = strconv.Atoi(os.Args[1])
	if err != nil {
		fmt.Fprintf(os.Stderr, "invalid loop count %q: %v\n", os.Args[1], err)
		os.Exit(1)
	}
	var wg sync.WaitGroup
	fmt.Printf("Initial counter value: %d\n", counter)
	wg.Add(2)
	go worker(&wg)
	go worker(&wg)
	wg.Wait()
	fmt.Printf("Final counter value: %d\n", counter)

}

If you run this program with the -race flag, Go's race detector will report where the data race occurs:

 go run -race main.go 100000
Initial counter value: 0
==================
WARNING: DATA RACE
Read at 0x000000618dd8 by goroutine 9:
  main.worker()
      /home/fyzanshaik/workspace/os/ostep-code/intro/main.go:16 +0x88
  main.main.gowrap2()
      /home/fyzanshaik/workspace/os/ostep-code/intro/main.go:37 +0x33

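To inspect the generated machine code, build the binary first and then disassemble it with the Go toolchain (objdump works on a compiled binary, not on a source file):

go build -o threads main.go
go tool objdump threads > tmp.txt # Stores the disassembly inside a temporary file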

References