This repository was archived by the owner on Mar 30, 2018. It is now read-only.

How to Reduce CPU Run Time Spent in `runtime.morestack`

Bishop Brock edited this page Jul 22, 2016 · 2 revisions

Go allocates goroutine stacks dynamically and checks stack capacity on most function calls. When a goroutine needs more stack space than its current allocation, runtime.morestack is the routine that arranges the growth. If a CPU profile shows a large amount of cumulative run time in runtime.morestack, a likely cause is the small initial stack that Go gives each goroutine when it starts, which is 2 KiB as of Go 1.6.
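As a rough way to see this effect in a profile, the sketch below starts many short-lived goroutines that each recurse deeply enough to outgrow a small initial stack, and records a CPU profile while they run. The names descend and runAll, the 512-byte frame padding, the recursion depth and the goroutine counts are all illustrative assumptions, not part of the busywork benchmark.

```go
package main

import (
	"fmt"
	"os"
	"runtime/pprof"
	"sync"
)

// frameBytes pads each call frame so that a few nested calls outgrow a
// small initial stack. The value is an arbitrary choice for illustration.
const frameBytes = 512

// descend recurses depth times, keeping a frameBytes-sized array live in
// every frame. For depth >= 0 it returns depth+1.
//
//go:noinline
func descend(depth int) int {
	var pad [frameBytes]byte
	pad[depth%frameBytes] = 1
	if depth == 0 {
		return int(pad[0])
	}
	return descend(depth-1) + int(pad[depth%frameBytes])
}

// runAll starts n fresh goroutines that each call descend(depth) and
// returns the sum of their results.
func runAll(n, depth int) int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = descend(depth)
		}(i)
	}
	wg.Wait()
	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	// Write the profile to a temporary file (hypothetical name pattern).
	f, err := os.CreateTemp("", "morestack-*.prof")
	if err != nil {
		fmt.Println("cannot create profile file:", err)
		return
	}
	defer f.Close()
	if err := pprof.StartCPUProfile(f); err != nil {
		fmt.Println("cannot start CPU profile:", err)
		return
	}
	total := 0
	for round := 0; round < 200; round++ {
		total += runAll(1000, 64)
	}
	pprof.StopCPUProfile()
	fmt.Println("total:", total, "profile:", f.Name())
}
```

The resulting file can be opened with go tool pprof and sorted by cumulative time (top -cum). How prominently runtime.morestack and its callees appear will depend on the Go version and the machine.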

It is easy to change the default goroutine stack allocation as long as you are willing to rebuild Go, or at least rebuild the Go runtime package, from a source installation. Go does not currently provide a command, switch or environment variable to alter this setting.

To change the default, look in your Go source installation directory in the file src/runtime/stack.go around line 71 for a line that looks like

_StackMin = 2048

Change 2048 to a larger power of 2 (for example, 8192 for the 8 KiB setting measured below), then rebuild and reinstall the runtime package by executing

go install -a

from that directory.
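Because the line number can drift between releases, it may help to locate the constant programmatically before and after editing. The following is a minimal sketch, assuming the Go source tree is present under the GOROOT reported by the runtime; findConst is a hypothetical helper, and other Go releases may name or place the constant differently.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"strings"
)

// findConst scans Go source text for a line of the form "name = value"
// and returns its 1-based line number and trimmed text.
func findConst(src, name string) (line int, text string, ok bool) {
	sc := bufio.NewScanner(strings.NewReader(src))
	n := 0
	for sc.Scan() {
		n++
		fields := strings.Fields(sc.Text())
		if len(fields) >= 3 && fields[0] == name && fields[1] == "=" {
			return n, strings.TrimSpace(sc.Text()), true
		}
	}
	return 0, "", false
}

func main() {
	// Assumption: the source installation lives under runtime.GOROOT().
	path := filepath.Join(runtime.GOROOT(), "src", "runtime", "stack.go")
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("cannot read runtime source:", err)
		return
	}
	if n, text, ok := findConst(string(data), "_StackMin"); ok {
		fmt.Printf("%s:%d: %s\n", path, n, text)
	} else {
		fmt.Println("_StackMin not found in", path)
	}
}
```

Running this once before the edit and once after gives a quick confirmation that the change landed in the source tree that the toolchain actually uses.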

The chart below shows the benefit of going to an 8 KiB minimum stack size for both an IBM POWER8 (P8) and an Intel Xeon (X86) standalone server running a busywork benchmark. Note that this is not a competitive comparison; it simply shows that, for this benchmark, a throughput improvement of more than 10% is possible on both systems just by increasing the default goroutine stack allocation.

Figure: Benefit of increasing Go runtime._StackMin for the Hyperledger Fabric

In the chart above, the shape of the curve for the POWER8 (P8) system is more-or-less expected. The shape of the X86 curve may or may not be real, as there is a moderate amount of run-to-run variability in the throughput measured by this benchmark.

This is the busywork make target used to generate the above plots, executed from fabric/tools/busywork/counters:

# Sweep the client count; each run requests 1024 / clients transactions.
.PHONY: sweepNoops
sweepNoops:
	for clients in 1 2 4 8 16 32 64; do \
	    userModeNetwork -noops 4; \
	    ./driver \
	        -clients $$clients \
	        -transactions $$((1024 / $$clients)) \
	        -arrays 64 \
	        -peerBurst 16 \
	        ; \
	done
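
The arithmetic in the sweep can be sketched in Go as follows. This assumes that -transactions is a per-client count, so that the product of clients and transactions stays at 1024 for every run; the run type and sweepPlan function are hypothetical names used only for illustration.

```go
package main

import "fmt"

// totalTransactions mirrors the 1024 used in the make target.
const totalTransactions = 1024

// run holds the two driver parameters that vary across the sweep.
type run struct {
	Clients      int
	Transactions int
}

// sweepPlan lists the client counts 1, 2, 4, ..., 64 together with the
// transaction count that the make target passes for each of them.
func sweepPlan() []run {
	var plan []run
	for clients := 1; clients <= 64; clients *= 2 {
		plan = append(plan, run{clients, totalTransactions / clients})
	}
	return plan
}

func main() {
	for _, r := range sweepPlan() {
		fmt.Printf("clients=%-3d transactions=%-5d product=%d\n",
			r.Clients, r.Transactions, r.Clients*r.Transactions)
	}
}
```

Under that assumption, each point on the chart represents the same total amount of work, spread across a different number of clients.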