In this PR, rust-timer found a 4% regression in instruction counts on the match-stress benchmark. When measuring locally, I consistently see a 14% improvement on that same benchmark instead (even after rebasing on master). This is pretty annoying, because it means I can't investigate the source of the regression locally at all.
Do you know what could cause such a difference? Could a difference in architecture explain it? Is there anything I could do to make my results closer to what CI reports, maybe flags or environment variables I could set?