This is the artifact for HotOS 2025 paper: Can Large Language Models Verify System Software? A Case Study Using FSCQ as a Benchmark.
All the model-generated proofs used for evaluation are located in generated directory.
-
Pull FSCQ by
git submodule update --init --recursive -
Use OPAM, run
opam install . --deps-only opam install coq opam install coq-serapi -
Install Rust, make sure
cargois in your$PATH.
-
Set environment variable
OPENAI_API_KEYorGOOGLE_API_KEYfor OpenAI/Gemini API token. -
Run
./run.sh <MODEL> <HINT_RATE> <SAMPLE_RATE>For example,./run.sh gpt-4o-2024-11-20 0.5 0.05