
CPU consumer (cpu_hog) is really really bad at consuming CPU #11


Description

@marcusmueller

The problem

Since the purpose of stress -c is to put real load on the specified number of CPU cores, the pipelines of those cores shouldn't be stalled most of the time. Yet:

perf stat stress -c 16 -t 5

tells us that the CPU, while occupied, is mostly idle internally:

stress: info: [526148] dispatching hogs: 16 cpu, 0 io, 0 vm, 0 hdd
stress: info: [526148] successful run completed in 5s

 Performance counter stats for 'stress -c 16 -t 5':

         79,580.45 msec task-clock:u                     #   15.910 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               309      page-faults:u                    #    3.883 /sec
   418,716,815,425      cycles:u                         #    5.262 GHz
   262,176,845,042      stalled-cycles-frontend:u        #   62.61% frontend cycles idle
   617,055,840,870      instructions:u                   #    1.47  insn per cycle
                                                  #    0.42  stalled cycles per insn
   175,186,890,751      branches:u                       #    2.201 G/sec
       269,450,686      branch-misses:u                  #    0.15% of all branches

       5.001799550 seconds time elapsed

      79.463002000 seconds user
       0.007854000 seconds sys

I'd like to draw attention to the 62.61% frontend cycles idle: the CPU cores' frontends couldn't proceed with processing instructions most of the time. On hyperthreaded cores, for example, that means we're not really using the "half-core" fully, and thus not slowing down the other half of the hyperthreaded core as intended (which is what I needed stress for).

Why does that happen?

Simple. The code is

while(1){sqrt(rand());}

which, unlike what the man page claims, isn't actually "spinning on sqrt()". In fact, a compiler with floating-point exceptions disabled might notice that the result of sqrt() is never used and simply not execute it at all. (That's not what's happening in a default build, however.)
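
For what it's worth, that half of the problem is easy to avoid: let the result escape, e.g. through a volatile store, so the compiler can't treat the sqrt() as dead code. A minimal sketch of that pattern (the volatile sink and the function name are mine, not anything in the stress source):

    #include <math.h>
    #include <stdlib.h>

    /* Illustrative only: the volatile store forces the compiler to keep the
       sqrt() result even if it could otherwise prove the value is unused. */
    static volatile double sink;

    static void hog_sqrt(void)
    {
        while (1)
            sink = sqrt((double) rand());
    }

That only addresses the dead-code half, though; the loop above still hammers rand().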

Instead, the bottleneck is rand(), which isn't even reentrant and should never have been used from multiple threads. In a perf record stress -c 16 -t 5 on a modern x86_64, you'll notice that the CPU spends ca. 99.8% of its time in __random(). Little surprise: rand() modifies global state, and that's the problem here, because it heavily depends on memory views being kept consistent between CPU cores.

So: a serious bug to use rand() here, and a slight bug to call sqrt() without doing anything with the result.
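
One way to fix both problems at once would be to give each worker its own tiny PRNG and keep the result live. A rough sketch under those assumptions (xorshift64 chosen arbitrarily, the names hogcpu_local and sink are mine, this is not a patch against the stress source):

    #include <math.h>
    #include <stdint.h>

    /* Illustrative sketch: a per-worker xorshift64 generator keeps all its
       state in registers (no shared globals, no libc calls), and the volatile
       store keeps the sqrt() result from being optimized away. */
    static volatile double sink;

    static int hogcpu_local(uint64_t seed)
    {
        uint64_t x = seed ? seed : 0x9e3779b97f4a7c15ULL;
        while (1)
        {
            x ^= x << 13;   /* xorshift64 step */
            x ^= x >> 7;
            x ^= x << 17;
            sink = sqrt((double) (x >> 11));
        }
        return 0;   /* not reached */
    }

With everything in registers and no calls into libc, such a loop should actually keep the frontend and the FP unit busy instead of waiting in __random().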
