10 Ways Go Optimizes Performance with Stack Allocation
Stack allocation is a game-changer for Go performance. By moving allocations from the heap to the stack, Go reduces garbage collector pressure and speeds up memory management. This article dives into ten key insights from the Go team's work on stack allocation, especially for constant-sized slices.
1. Heap Allocations Are Expensive
Each heap allocation triggers a complex code path. Go’s runtime must find a suitable memory block, update bookkeeping, and later track it for garbage collection. This overhead can dominate execution time in hot code paths. Stack allocations avoid most of this—they simply adjust the stack pointer, often at zero cost.

2. Garbage Collection Adds Hidden Costs
Heap allocations feed the garbage collector. Even with modern algorithms like Green Tea, GC pauses and mark/sweep cycles consume CPU. Stack allocations produce no GC workload because they’re reclaimed automatically when the function returns. This reduces GC frequency and makes programs more predictable.
3. Stack vs. Heap: A Speed Comparison
Stack allocations can be orders of magnitude faster. They require no allocator bookkeeping, no GC scanning, and the data often stays in CPU cache. Heap allocations go through the runtime allocator, add write-barrier and GC tracking work, and carry eventual deallocation overhead. For short-lived objects, the stack is the clear winner.
4. Dynamic Slice Growth Causes Repeated Allocations
When appending to a small slice, Go doubles its capacity each time it runs out of room. Starting from an empty slice, this means frequent heap allocations: capacity 1, then 2, then 4, and so on. Each growth step creates garbage (the old backing array) and triggers GC work. This startup phase is especially wasteful.
5. The Startup Phase Is Often the Only Phase
In many real-world programs, slices never grow large. The repeated allocations at sizes 1, 2, 4 may be all the slice ever experiences. This means a disproportionate amount of time is spent in the allocator and GC, even though the final slice is tiny.
6. Waste from Transient Backing Arrays
During growth, each old backing array becomes garbage. For a slice that ends at size 8, you discard arrays of sizes 1, 2, and 4—seven elements worth of memory—and allocate a new array of 8. This inefficiency compounds if many slices are created in a tight loop.
7. Constant-Sized Slices Can Be Stack Allocated
When the Go compiler can determine the maximum size of a slice at compile time (e.g., a fixed limit), it can allocate the backing array on the stack instead of the heap. This eliminates all allocation overhead and GC pressure for that slice.
8. How the Compiler Recognizes Constant-Sized Slices
The compiler analyzes the code to see whether a slice’s capacity is bounded by a constant — for example, a slice that only receives appends up to a known count, or a slice literal with a fixed size. This analysis happens during the escape analysis and inlining passes.
9. Stack Allocation Improves Cache Locality
Stack-allocated data is contiguous and often in L1 cache. Heap allocations are scattered across memory, causing cache misses. By keeping slice backing arrays on the stack, Go improves data locality and reduces memory latency.
10. Future Directions: More Aggressive Stack Allocation
The Go team continues to extend stack allocation to more patterns. Future releases may allocate variable-sized slices on the stack if the total size is bounded, or move entire structs with heap pointers to the stack using value copying.
Understanding stack allocation helps you write faster Go code. While the compiler does the heavy lifting, being aware of these optimizations lets you structure your code to take advantage of them—especially when dealing with small, temporary slices.