Memory Allocation and Deallocation

TL;DR #


Detailed explanation of memory Allocation and Deallocation in glibc #

This post explains how modern glibc (on Linux) handles dynamic memory: what happens on malloc(), calloc(), realloc(), and free(). It’s written for experienced Elixir developers with light-to-intermediate C familiarity—no historical detours, just what matters today.

Key mental model: glibc tries hard to satisfy allocations from memory it already has—via per-thread caches and per-arena free lists. It asks the kernel for more only when it must, and it returns memory to the kernel only when it’s practical (large mmap()d blocks, or heap “trimming” at the top).

The basic building block: chunks #

Every allocation is a chunk: a small header (size + flags; extra pointers when free) followed by your user data. Glibc rounds your request up to alignment and internal size classes. Freed chunks aren’t immediately returned to the OS—they’re linked into internal lists/caches for reuse.

If there’s insufficient free space, glibc either splits the “top chunk” (free space at the end of a heap) or grows memory (via brk() for the main heap or mmap() for others). Conversely, if the top of the heap becomes big enough, glibc can trim it back to the OS.

Arenas: concurrency without pain #

To reduce lock contention, glibc uses multiple arenas—independent heaps with their own free lists. Threads are assigned arenas; work proceeds mostly in parallel. The hard limit of arenas is implementation-defined and generally proportional to CPU cores (commonly 8× cores on 64-bit systems). You can cap it with M_ARENA_MAX or MALLOC_ARENA_MAX.

Implication: memory freed in one arena isn’t instantly usable by another; long-lived multi-threaded apps can appear to “hold on” to memory across threads. Tuning the arena limit can reduce per-thread overhead (at some contention cost).

Free lists and caches (fast to slower) #

Glibc organizes freed chunks into several layers, searched roughly in this order:

  1. Tcache (Thread-local cache) — per-thread, no locks, default: 7 chunks per size class; covers small allocations up to tcache_max (default often ~1032 bytes on 64-bit). This is the hottest path for small, frequent (de)allocations.
  2. Fastbins — per-arena singly-linked lists for very small sizes; fast insertion/removal; no immediate coalescing on free() to keep the path hot.
  3. Unsorted bin — freshly freed (non-fastbin) chunks land here; next malloc can grab from it quickly, otherwise chunks get moved into size-segregated bins.
  4. Small / Large bins — sorted free lists for precise size classes (small) or ranged classes (large). Glibc may split a too-large chunk and return the remainder to a bin.

This layered design keeps the common path lock-free and O(1) while still managing fragmentation for the general case.

What happens on malloc(size) #

  1. Normalize size: round up for alignment + metadata. If size == 0, glibc returns a unique pointer you can safely free().
  2. Large request? If the (adjusted) size exceeds the mmap threshold (default ~128 KiB, dynamically adjusted), glibc mmap()s a dedicated region. These chunks skip the bins and go straight back to the OS on free() via munmap().
  3. Pick an arena for this thread; lock if needed. Multiple arenas reduce contention.
  4. Try tcache (exact size class). If present, pop and return—no locks.
  5. Try fastbins (if in fastbin range). Pop and return.
  6. Try unsorted bin; if a big enough chunk is found, possibly split it. Otherwise, the allocator promotes unsorted entries into small/large bins and continues.
  7. Search small/large bins for a best fit (splitting if needed).
  8. No fit? Grow memory by carving from the top chunk, extending the heap with brk() (or mapping a new heap region) when necessary.

What happens on free(ptr) #

  1. NULL? Do nothing.
  2. Identify the chunk (metadata sits just before your pointer).
  3. If it was mmap()ed, glibc munmap()s it immediately—memory returns to the OS right away.
  4. Otherwise, fast path: put small chunks into tcache (if not full) — O(1), no locks. If tcache is full or size is out of range, consider fastbins.
  5. For non-fastbin sizes, glibc may coalesce with adjacent free neighbors to reduce fragmentation, then insert into the unsorted bin (later migrating to small/large bins).
  6. Top-of-heap trimming: if coalescing grows the top chunk beyond the trim threshold (default ~128 KiB; often set to the current mmap threshold), glibc shrinks the heap with brk() to give memory back to the OS. Trimming only applies to contiguous free space at the end of the heap.

Note: free() does not zero memory. That’s by design for speed; calloc() is the zero-initializing API.

How calloc() and realloc() fit in #

Returning memory to the OS #

Tuning knobs you might care about #

Use mallopt() (C) or environment variables / GLIBC tunables to adjust behavior. Common ones:

Observability tip: call malloc_info() to dump allocator state (arenas, bins) as XML; handy for debugging/tuning in production.


Practical examples #

1) Reuse of a freed small chunk (tcache/fast path) #
#include <stdio.h>
#include <stdlib.h>

int main(void) {
void *p1 = malloc(100);
void *p2 = malloc(100);
free(p1); // likely goes to tcache
void *p3 = malloc(100); // likely pops the same chunk
printf("p1=%p\np2=%p\np3=%p\n", p1, p2, p3);
return 0;
}

Typical output shows p3 == p1: glibc reused the recently freed chunk from tcache/bins—no heap growth required.

2) Large allocations use mmap() and return fast #
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

int main(void) {
size_t big = 512 * 1024; // 512 KiB > default mmap threshold
void *p = malloc(big); // likely via mmap
// ... use memory ...
free(p); // likely munmap -> back to OS now
printf("freed large block\n");
return 0;
}

Because this crosses the default mmap_threshold, freeing returns the mapping to the OS immediately.

3) calloc() zero-init and realloc() growth #
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
int *a = calloc(4, sizeof(int)); // zero-initialized
a = realloc(a, 1024 * sizeof(int)); // may move, may not
printf("a[0]=%d\n", a[0]); // still 0
free(a);
return 0;
}

calloc() returns zeroed memory; realloc() may resize in place or allocate+copy+free, depending on fit.


Tips for Elixir developers #


Sources #

All details refer to current glibc behavior and official documentation/manpages at the time of writing; specifics can vary slightly by distribution and glibc version.


Since you've made it this far, sharing this article on your favorite social media network would be highly appreciated 💖! For feedback, please ping me on Twitter.

Published