CPU consumer (cpu_hog) is really really bad at consuming CPU #11

# The problem

Since the purpose of `stress -c` is to put real load on the specified number of CPU cores, the pipelines of those cores should not be stalled most of the time. Yet:

```shell
perf stat stress -c 16 -t 5
```
shows that the cores, although fully occupied, spend most of their cycles stalled:

```
stress: info: [526148] dispatching hogs: 16 cpu, 0 io, 0 vm, 0 hdd
stress: info: [526148] successful run completed in 5s

 Performance counter stats for 'stress -c 16 -t 5':

         79,580.45 msec task-clock:u                     #   15.910 CPUs utilized
                 0      context-switches:u               #    0.000 /sec
                 0      cpu-migrations:u                 #    0.000 /sec
               309      page-faults:u                    #    3.883 /sec
   418,716,815,425      cycles:u                         #    5.262 GHz
   262,176,845,042      stalled-cycles-frontend:u        #   62.61% frontend cycles idle
   617,055,840,870      instructions:u                   #    1.47  insn per cycle
                                                  #    0.42  stalled cycles per insn
   175,186,890,751      branches:u                       #    2.201 G/sec
       269,450,686      branch-misses:u                  #    0.15% of all branches

       5.001799550 seconds time elapsed

      79.463002000 seconds user
       0.007854000 seconds sys
```

I'd like to draw attention to **62.61% frontend cycles idle**: in most cycles, the cores' frontends could not deliver instructions for execution. On cores with hyperthreading, for example, this means a hog doesn't make full use of its logical core and therefore doesn't slow down the sibling thread on the same physical core as much as intended (which is what I needed `stress` for).
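For reference, the highlighted percentage and the per-instruction ratios next to it follow directly from the raw counters in the perf output above:

```math
\begin{aligned}
\frac{\text{stalled-cycles-frontend}}{\text{cycles}} &= \frac{262{,}176{,}845{,}042}{418{,}716{,}815{,}425} \approx 0.6261 = 62.61\,\% \\
\text{IPC} = \frac{\text{instructions}}{\text{cycles}} &= \frac{617{,}055{,}840{,}870}{418{,}716{,}815{,}425} \approx 1.47 \\
\frac{\text{stalled-cycles-frontend}}{\text{instructions}} &= \frac{262{,}176{,}845{,}042}{617{,}055{,}840{,}870} \approx 0.42
\end{aligned}
```

In other words, the cores retire about 1.47 instructions per cycle while the frontend sits idle in nearly two out of every three cycles.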

# Why does that happen? 

Simple. The code is

```C
while(1){sqrt(rand());}
```

which, contrary to what the man page claims, isn't really "spinning on sqrt()". In fact, a compiler that doesn't have to preserve `sqrt()`'s side effects (setting `errno`, raising floating-point exceptions) may notice that the result is never used and drop the call altogether. (That's not what happens in a default build, however.)

The actual bottleneck, however, is `rand()`, which isn't even reentrant: it modifies hidden *global* state, which glibc guards with a lock taken on every call. Running `perf record stress -c 16 -t 5` on a modern x86_64, you'll notice that the CPU spends ca. 99.8% of its time in `__random()`. That is little surprise: each iteration pays for this locking and state bookkeeping rather than for the `sqrt()` the hog is supposed to spin on.

So using `rand()` here is a serious bug, and calling `sqrt()` without doing anything with the result is a minor one.
