My experience with slowness is that slowness is most often caused by Coq doing work it doesn't need to do. Usually, this work comes in the form of retypechecking terms it shouldn't need to retypecheck, and occasionally it comes in the form of (re)doing typeclass search that can be known to be useless. I don't want to have to micro-optimize my tactic scripts to get 1000x speed-ups. In general, I have the sense that I can't trust Coq to be fast, and I can't trust scripts that are fast to remain fast in future versions of Coq unless I'm proactive about reporting bugs and performance regressions.
Some examples of Coq being really slow:
-
Bug #5148 - Ltac reification of moderate sized terms can take over 90 hours - see
slow_fiat_crypto_reify/README.mdfor more details. -
Bug #5146 -
nsatzcan sometimes spend 10 minutes or more in reification - seeslow_nsatz.v. -
Bug #3441 -
pose proof H as kis sometimes an order of magnitude slower thanpose H as k; clearbody k- seeslow_pose_proof.vfor an example of a 4x slowdown. -
Bug #3280
match goal with |- ?f ?x = ?g ?y => idtac endcan be arbitrarily slow - seeevar-normalization-slowness/very_silly_slow_tactic.vandevar-normalization-slowness/exercise-tactics/exercise-tactics.sh. Also, see this graph of the time of tactics vs the size of goal -
Bug #4819 - interactive time is impacted by large terms that don't exist anymore in the goal -
interactive_proof_state_slowness.v(interactive_proof_state_slowness.shfor Coq 8.5, unless you want to usecoqtop -emacs -time < interactive_proof_state_slowness.v) -
Bug #4662 -
unfold ... in ...should insert a cast annotation, elseDefinedcan take over 6 minutes when it doesn't need to - seeslow_unfold.v -
Bug #4637 -
vm_compute in ...should insert vm_casts - seemissing_vm_casts.v -
Bug #4776 - there should be a way to terminate typeclass resolution early - see
slow_failing_setoid_rewrite.v -
Bug #4636 -
set (x := y)can be 100x slower thanpose y as x; change y with x- seeslow_set.v. (The reverse can also happen, wherechangeis orders of magnitude slower thanset. See bug #4779 in the next bullet.) -
Bug #4639 - running
simpl @sndcan take over 158 hours - seeslow_fiat_simpl_example/README.md -
Bug #4657 -
omegais slow onx - y - (x - y - x) = x - y(takes about 1 s, when it solves an equivalent equation in 0.06 s) - seeslow_omega.v -
Bug #4810 - if the type of the field is a projection from a record, then the performance of
field_simplify_eqis linear in the number of record fields of that record, which is absurd, given that the fields of the record shouldn't matter. -
Bug #3448 -
Hint Extern (foo _) => ...is significantly slower thanHint Extern foo => ...; typeclass resolution is slow atintro- seeslow_typeclasses_intro.v -
Bug #3425 - unification can be very slow - see
slow_unification.v. (Matthieu's explanation in the file.) -
Bug #5038 -
Definitiondoes head-zeta-reduction. Seeslow_lets.vfor an example of exponential behavior.
-
Bug #5096 -
vm_computeis exponentially slower thannative_compute,lazy, andcompute. Seeslow_fiat_crypto_vm_compute/README.mdfor more details and instructions on running. -
Bug #3450 -
End foo.is slower in trunk in some cases; it's also slower in batch mode than in interactive mode. The original slowdown was due to hashconsing of universes (then to substituting universes) on leaving a section. But now the example has gotten orders of magnitude slower, in earlier places. Seesuper_slow_end_section_and_record.v(Coq 8.5 only). ThePseudofunctorrecord takes 999 s to be defined, and you can watch the<projection> is defintedmessages appear one by one, very slowly. -
Bug #4187 -
admitis slow on a goal of the formG' -> Propwhen it's fast on a goal of the formG'- seeslow_admit.v -
Bug #4821 -
simple eapplycan take 2 hours - seeslow_fiat_eapply_example/src/Parsers/Refinement/SharpenedJavaScriptAssignmentExpression.v -
Bug #4777 - unless
Set Silentis on, the printing time is impacted by large terms that don't print - seeinteractive_hidden_slowness.v(interactive_hidden_slowness.shfor Coq 8.5, unless you want to usecoqtop -emacs -time < interactive_hidden_slowness.v) -
Bug #4537 - Coq 8.5 is slower (sometimes by as much as 5x-6x) than Coq 8.5beta2 in typeclass resolution with identical traces. Pierre-Marie Pédrot said:
All right, I think I'm the responsible here. This is probably because of commit a895b2c0ca that introduced a performance enhancement of auto hints in the non-polymorphic case. The polymorphic case has a nice comment
(** FIXME: We're being inefficient here because we substitute the whole
evar map instead of just its metas, which are the only ones
mentioning the old universes. *)which is exactly the hotspot I'm seeing right there. I'll write a quick fix.
and then, after fixing that, said:
Note that there is still a hotspot that looks suspicious to me. Indeed, the code for Evarsolve.noccur_evar whose goal is essentially to ensure that an evar does not appear in a term repeatedly calls the following snippet of code
let c =
try Retyping.expand_projection env evd p c []
with Retyping.RetypeError _ ->
(* Can happen when called from w_unify which doesn't assign evars/metas
eagerly enough *) c
in occur_rec acc cwhich is responsible for ~50% of the time passed in one of your tactics for no good reason.
- Bug #3447 - Some
Defineds are 30x slower in trunk - most of the time was spent hashconsing universes.
- Bug #4669 -
printing of dependent evars in ProofGeneral can slow emacs
interaction to a crawl (because printing of dependent evars in the
goal does not respect
Set Printing Width) - seeslow_dependent_evars_in_pg.v, and make sure to let it setcoq-end-goals-regexp-show-subgoalstonilappropriately.
#4640, Has to do with slow [End Section] in special case of no section variables.
- Bug #4625 -
checking
Defined/Qedcauses coqtop to drop the most recent proof state. This makes it a pain to debug things like:
Definition foo : bar.
Proof.
some_slow_tactic.
some_tactic_that_makes_Defined_slow.
Defined.- Bugs #4643,
#4642, and
#4779:
Defined.sometimes takes 2 minutes;End Sectioncan take 30 seconds, even though there are no section variables, no tactics, no notations, noLets, and only one or twoDefinitions;cbv [some identifiers]can run through 64 GB of RAM in 15 minutes; andpose y as x; change y with x in Hcan be 1300x slower thanset (x := y) in H; respectively. Seeslow_fiat_examples/README.mdfor more details and instructions on running. (Be warned, some of the examples of slowness themselves take 20 minutes to compile.)