How to Reduce CPU Run Time Spent in `runtime.morestack`
Go implements dynamic stack allocation for goroutines, and stack-capacity checking on most function calls. If a goroutine ever needs more stack space than its current allocation, `runtime.morestack` is the routine that arranges this. If you see large amounts of cumulative run time spent in `runtime.morestack` in a CPU profile, the cause may well be the small amount of stack space that Go gives goroutines when they start, which is currently 2 KiB as of Go 1.6.
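The sketch below is one minimal way to see the symptom in isolation (it is not part of the busywork benchmark, and the file and function names are illustrative): it launches many short-lived goroutines, each of which needs far more than the 2 KiB starting stack, and writes a CPU profile that can be inspected with `go tool pprof`, where `runtime.morestack` shows up with noticeable cumulative time.

```go
// stackgrowth.go -- illustrative only: many fresh goroutines, each forced to
// grow its stack well past the 2 KiB starting allocation.
package main

import (
	"os"
	"runtime/pprof"
	"sync"
)

// deep consumes roughly 256 bytes of stack per frame, so deep(64) needs far
// more than 2 KiB and forces the runtime to grow the goroutine's stack.
func deep(n int) int {
	var pad [256]byte
	if n == 0 {
		return int(pad[0])
	}
	return deep(n-1) + int(pad[n%len(pad)])
}

func main() {
	f, err := os.Create("cpu.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	pprof.StartCPUProfile(f)
	defer pprof.StopCPUProfile()

	var wg sync.WaitGroup
	for i := 0; i < 200000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			deep(64) // every new goroutine pays the stack-growth cost
		}()
	}
	wg.Wait()
	// Inspect with:  go tool pprof <binary> cpu.prof   (look at cumulative time)
}
```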
It is easy to change the default goroutine stack allocation as long as you are willing to rebuild Go, or at least the Go `runtime` package, from a source installation. Go does not currently provide a command, switch, or environment variable to alter this setting.
To change the default, look in your Go source installation in the file `src/runtime/stack.go`, around line 71, for a line that looks like `_StackMin = 2048`. Change 2048 to a larger power of 2, then rebuild and reinstall the `runtime` package by executing `go install -a` from that directory.
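For example, to try the 8 KiB minimum used for the measurements below, the edited declaration would look roughly like this (an excerpt only, not a standalone program; the exact line number and surrounding constants vary between Go releases, Go 1.6 layout assumed):

```go
// Excerpt from src/runtime/stack.go after the edit; in the real file this
// constant sits inside a larger const block.
const (
	_StackMin = 8192 // default is 2048; use a larger power of 2
)
```

Programs must be rebuilt against the reinstalled `runtime` package before they pick up the new minimum.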
The chart below shows the benefit of going to an 8 KiB minimum stack size for both an IBM POWER8 (P8) and an Intel Xeon (X86) standalone server running a busywork benchmark. Note that this is not a competitive comparison; it simply shows that for this benchmark, a throughput improvement of more than 10% is possible for both systems simply by increasing the default goroutine stack allocation.
In the chart above, the shape of the curve for the POWER8 (P8) system is more-or-less expected. The shape of the X86 curve may or may not be real, as there is a moderate amount of run-to-run variability in the throughput measured by this benchmark.
This is the busywork make target used to generate the above plots, executed from `fabric/tools/busywork/counters`:
# Sweep over increasing client counts, bringing up a fresh network for each
# point and scaling -transactions as 1024/clients.
.PHONY: sweepNoops
sweepNoops:
	for clients in 1 2 4 8 16 32 64; do \
	    userModeNetwork -noops 4; \
	    ./driver \
	        -clients $$clients \
	        -transactions $$((1024 / $$clients)) \
	        -arrays 64 \
	        -peerBurst 16 \
	        ; \
	done