## TL;DR
- Glibc’s allocator (`malloc`/`free`) manages memory as chunks and reuses freed chunks from per-thread caches and per-arena bins before asking the kernel for more. That’s why a process often doesn’t shrink immediately after `free()`.
- Fast path: a per-thread tcache (default: up to 7 chunks per size class) serves small allocations with zero locking; then come fastbins, the unsorted bin, and the size-segregated small/large bins.
- Large allocations (by default, >~128 KiB) use `mmap()` and are returned to the OS immediately with `munmap()` when freed. Regular heap memory goes back to the OS only when the top of the heap can be trimmed past a threshold (default ~128 KiB, often twice the current mmap threshold).
- Multi-threading: glibc uses multiple arenas (heaps) to reduce contention. On 64-bit systems the default hard limit is typically 8 × #CPU cores (tunable).
- You can tune behavior with `mallopt()` and glibc tunables (e.g., `MALLOC_ARENA_MAX`, `glibc.malloc.tcache_count`, `trim_threshold`, `mmap_threshold`). Use sparingly; the defaults are good for most apps.
## Detailed explanation of memory allocation and deallocation in glibc
This post explains how modern glibc (on Linux) handles dynamic memory: what happens on `malloc()`, `calloc()`, `realloc()`, and `free()`. It’s written for experienced Elixir developers with light-to-intermediate C familiarity—no historical detours, just what matters today.
Key mental model: glibc tries hard to satisfy allocations from memory it already has—via per-thread caches and per-arena free lists. It asks the kernel for more only when it must, and it returns memory to the kernel only when it’s practical (large `mmap()`ed blocks, or heap “trimming” at the top).
## The basic building block: chunks
Every allocation is a chunk: a small header (size + flags; extra pointers when free) followed by your user data. Glibc rounds your request up to alignment and internal size classes. Freed chunks aren’t immediately returned to the OS—they’re linked into internal lists/caches for reuse.
If there’s insufficient free space, glibc either splits the “top chunk” (free space at the end of a heap) or grows memory (via `brk()` for the main heap or `mmap()` for others). Conversely, if the top of the heap becomes big enough, glibc can trim it back to the OS.
## Arenas: concurrency without pain
To reduce lock contention, glibc uses multiple arenas—independent heaps with their own free lists. Threads are assigned arenas; work proceeds mostly in parallel. The hard limit of arenas is implementation-defined and generally proportional to CPU cores (commonly 8× cores on 64-bit systems). You can cap it with `M_ARENA_MAX` or `MALLOC_ARENA_MAX`.
Implication: memory freed in one arena isn’t instantly usable by another; long-lived multi-threaded apps can appear to “hold on” to memory across threads. Tuning the arena limit can reduce per-thread overhead (at some contention cost).
## Free lists and caches (fast to slower)
Glibc organizes freed chunks into several layers, searched roughly in this order:
- Tcache (thread-local cache) — per-thread, no locks, default: 7 chunks per size class; covers small allocations up to `tcache_max` (default often ~1032 bytes on 64-bit). This is the hottest path for small, frequent (de)allocations.
- Fastbins — per-arena singly-linked lists for very small sizes; fast insertion/removal; no immediate coalescing on `free()` to keep the path hot.
- Unsorted bin — freshly freed (non-fastbin) chunks land here; the next `malloc` can grab from it quickly, otherwise chunks get moved into size-segregated bins.
- Small / large bins — sorted free lists for precise size classes (small) or ranged classes (large). Glibc may split a too-large chunk and return the remainder to a bin.
This layered design keeps the common path lock-free and O(1) while still managing fragmentation for the general case.
## What happens on `malloc(size)`
- Normalize size: round up for alignment + metadata. If `size == 0`, glibc returns a unique pointer you can safely `free()`.
- Large request? If the (adjusted) size exceeds the mmap threshold (default ~128 KiB, dynamically adjusted), glibc `mmap()`s a dedicated region. These chunks skip the bins and go straight back to the OS on `free()` via `munmap()`.
- Pick an arena for this thread; lock if needed. Multiple arenas reduce contention.
- Try tcache (exact size class). If present, pop and return—no locks.
- Try fastbins (if in fastbin range). Pop and return.
- Try the unsorted bin; if a big enough chunk is found, possibly split it. Otherwise, the allocator promotes unsorted entries into small/large bins and continues.
- Search small/large bins for a best fit (splitting if needed).
- No fit? Grow memory by carving from the top chunk, extending the heap with `brk()` (or mapping a new heap region) when necessary.
## What happens on `free(ptr)`
- NULL? Do nothing.
- Identify the chunk (metadata sits just before your pointer).
- If it was `mmap()`ed, glibc `munmap()`s it immediately—memory returns to the OS right away.
- Otherwise, fast path: put small chunks into tcache (if not full) — O(1), no locks. If tcache is full or the size is out of range, consider fastbins.
- For non-fastbin sizes, glibc may coalesce with adjacent free neighbors to reduce fragmentation, then insert into the unsorted bin (later migrating to small/large bins).
- Top-of-heap trimming: if coalescing grows the top chunk beyond the trim threshold (default ~128 KiB; often set to 2× the current mmap threshold), glibc shrinks the heap with `brk()` to give memory back to the OS. Trimming only applies to contiguous free space at the end of the heap.
Note: `free()` does not zero memory. That’s by design for speed; `calloc()` is the zero-initializing API.
## How `calloc()` and `realloc()` fit in
- `calloc(n, sz)` allocates `n*sz` bytes and zero-fills the user area. Under the hood, it still draws from the same caches/bins; glibc may use optimizations (like `mmap` + kernel-zeroed pages) for larger blocks.
- `realloc(p, new)` tries to resize in place when possible; otherwise it allocates a new chunk (possibly from tcache/bins), copies data, and frees the old chunk (which then follows the normal free path).
## Returning memory to the OS
- Always returned: standalone `mmap()` allocations are freed with `munmap()` and leave the process immediately. Great for big, bursty allocations.
- Sometimes returned: the heap top can be trimmed when large contiguous free space accumulates at the end (over `trim_threshold`). Otherwise, freed memory stays in the process for future reuse—which is usually good for performance.
## Tuning knobs you might care about
Use `mallopt()` (C) or environment variables / glibc tunables to adjust behavior. Common ones:

- `MALLOC_ARENA_MAX` / `M_ARENA_MAX` — cap the number of arenas. Helpful for memory-sensitive, highly threaded services (e.g., set to a small number to reduce per-thread heap overhead). The default hard limit is proportional to CPU count (often 8× cores on 64-bit).
- `glibc.malloc.tcache_count` — max cached chunks per size class in a thread (default 7, upper limit 65535). Set to 0 to disable tcache.
- `glibc.malloc.tcache_max` — largest tcache-eligible allocation (default ~1032 bytes on 64-bit).
- `M_TRIM_THRESHOLD` / `trim_threshold` — size beyond which the top chunk is trimmed. Defaults to ~128 KiB and is often updated dynamically to 2× the current `mmap_threshold`. Lower it to return memory more aggressively.
- `M_MMAP_THRESHOLD` / `mmap_threshold` — sizes above this use `mmap()` directly (~128 KiB by default; dynamically adjusted upward after very large requests).
Observability tip: call `malloc_info()` to dump allocator state (arenas, bins) as XML; handy for debugging/tuning in production.
## Practical examples
### 1) Reuse of a freed small chunk (tcache/fast path)
```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    void *p1 = malloc(100);
    void *p2 = malloc(100);
    free(p1);                       /* p1's chunk lands in the tcache */
    void *p3 = malloc(100);         /* same size class: reuses p1's chunk */
    printf("p1=%p\np2=%p\np3=%p\n", p1, p2, p3);
    return 0;
}
```
Typical output shows `p3 == p1`: glibc reused the recently freed chunk from tcache/bins—no heap growth required.
### 2) Large allocations use `mmap()` and return fast
```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    size_t big = 512 * 1024;        /* above the default mmap threshold */
    void *p = malloc(big);
    free(p);                        /* munmap()ed immediately */
    printf("freed large block\n");
    return 0;
}
```
Because this crosses the default `mmap_threshold`, freeing returns the mapping to the OS immediately.
### 3) `calloc()` zero-init and `realloc()` growth
```c
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *a = calloc(4, sizeof(int));            /* zero-initialized */
    if (a == NULL) return 1;
    int *tmp = realloc(a, 1024 * sizeof(int));  /* may move the block */
    if (tmp == NULL) { free(a); return 1; }     /* keep a valid on failure */
    a = tmp;
    printf("a[0]=%d\n", a[0]);
    free(a);
    return 0;
}
```
`calloc()` returns zeroed memory; `realloc()` may resize in place or allocate+copy+free, depending on fit.
## Tips for Elixir developers
- FFI/native code (NIFs/Ports/C-Nodes): if your NIFs or linked C libraries allocate via glibc, your BEAM process memory profile follows these rules. Expect plateaus after frees unless you free big `mmap` chunks or the heap top can be trimmed. Consider explicit `malloc_trim(0)` at quiescent points if memory return matters.
- Threading model: the BEAM runs schedulers and potentially many threads; unbounded arenas can waste memory. If you see high but flat RSS, try lowering `MALLOC_ARENA_MAX` cautiously and measure contention.
- Allocation patterns: batched allocations/frees of similar sizes benefit most from tcache. On memory-tight systems, consider reducing `tcache_count` or disabling tcache to shrink per-thread caches (trade-off: slower alloc/free).
- Measure before tuning: use `malloc_info()` and OS tools (`/proc/self/smaps`, `pmap`, `perf`) to confirm where memory sits—arenas vs. mmaps—before changing thresholds.
## Sources
All details refer to current glibc behavior and official documentation/manpages at the time of writing; specifics can vary slightly by distribution and glibc version.