PARALLELPROMPT is the first benchmark for measuring intra-query semantic parallelism in real-world LLM prompts. Our benchmark enables both method and system evaluation by providing 37,000+ naturally occurring prompts with structured schemas that reveal parallelizable structure within individual user queries.
# Clone the repository
git clone https://github.com/stevenkolawole/parallelprompt.git
cd parallelprompt
# Set OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
# Compile the execution engine
make
# Run a quick test (10 samples by default, schema-driven)
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output test_results.json
# Run end-to-end evaluation (with schema extraction)
./bin/alphabits --queries datasets/wildchat_parallelizable_queries.csv --output e2e_results.json --end-to-end
# View results
cat test_results.json
- 10.3% of real user prompts contain parallelizable structure
- Up to 7x speedups across different task categories
- >90% quality preservation on factual tasks
- 37,000+ prompts across 11+ languages with structured schemas
PARALLELPROMPT supports both schema-driven and end-to-end execution modes:
Schema-driven mode:

Pre-extracted Schemas (CSV) ──▶ Parallel Execution Engine (C++)
                                 ├── Serial Execution
                                 ├── Parallel Execution
                                 └── Performance Analysis

End-to-end mode:

Raw User Prompt ──▶ Schema Extraction ──▶ Parallel Execution Engine
                    (GPT-4o)              ├── Serial Execution
                                          ├── Parallel Execution
                                          └── Performance Analysis
parallelprompt/
├── src/                                     # Execution engine (C++)
│   ├── serial_vs_parallel.cpp               # Main benchmarking suite:
│   │                                        #  - Schema-driven execution
│   │                                        #  - End-to-end evaluation
│   │                                        #  - Post-processing support
│   ├── parallel_vary_n.cpp                  # Scalability analysis
│   └── Makefile                             # Build system
├── datasets/                                # Benchmark data (documented in detail on the HuggingFace dataset page)
│   ├── lmsys_parallelizable_queries.csv     # LMSYS subset (963 prompts)
│   └── wildchat_parallelizable_queries.csv  # WildChat subset
├── data_curation/                           # Schema extraction tools (legacy)
│   ├── find_parallelprompts.py              # Original Claude 3.5 extraction
│   └── system_prompt.txt                    # Extraction prompt template
├── evaluation/                              # Quality assessment tools
│   ├── openai_eval/                         # LLM judge evaluation
│   └── README.md                            # Evaluation documentation
├── utils/                                   # Schema conversion utilities
└── include/                                 # OpenAI API headers
- C++ Compiler: GCC 9+ or Clang with C++20 support
- Libraries: `libcurl`, `nlohmann-json`
- OpenAI API Key: required for execution, and additionally for schema extraction in end-to-end mode
# Install dependencies (Ubuntu/Debian)
sudo apt-get install build-essential libcurl4-openssl-dev
# Set OpenAI API key
export OPENAI_API_KEY="your-api-key-here"
# Compile the execution engine
make
# Verify installation
./bin/alphabits --help
Schema-driven mode uses the pre-extracted schemas in the CSV files for fast evaluation:
# Basic execution (10 samples)
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output results.json
# Custom sample size
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output results.json --sample-size 50
# Full dataset
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output full_results.json --sample-size all
# With post-processing cleanup
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output clean_results.json --post-process
End-to-end mode extracts schemas from raw prompts using GPT-4o, then executes them in parallel:
# End-to-end with schema extraction
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output e2e_results.json --end-to-end
# End-to-end with post-processing
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output e2e_clean.json --end-to-end --post-process
# Small sample for testing
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output e2e_test.json --sample-size 5 --end-to-end
The benchmark supports any OpenAI-compatible API server (SGLang, vLLM, LocalAI, etc.):
# SGLang server
export OPENAI_API_BASE=http://localhost:30000/v1
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output results.json
# vLLM server
export OPENAI_API_BASE=http://localhost:8000/v1
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output results.json
# One-liner approach
OPENAI_API_BASE=http://localhost:8000/v1 ./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output results.json
| Option | Description | Default |
|---|---|---|
| `--queries` | Path to CSV file with prompts | Required |
| `--output` | Output JSON file path | Required |
| `--sample-size` | Number of prompts to process (`<num>` or `all`) | `10` |
| `--post-process` | Enable output cleanup using GPT-4o-mini | Disabled |
| `--end-to-end` | Extract schemas from raw prompts instead of using the CSV | Disabled |
{
"prompt": "Generate 10 room descriptions...",
"category": "Repeated Generation",
"serial_output": "...",
"parallel_output": ["...", "...", "..."],
"speedup": 3.41,
"normalized_speedup": 4.22,
"serial_duration_ms": 5420,
"total_parallel_duration_ms": 1590,
"post_processed_output": "..." // if --post-process enabled
}
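A quick way to aggregate these records, assuming the output file is a JSON array of per-prompt objects with the fields above (adjust the loading step if your results are wrapped in a top-level object):

```python
import json
from statistics import mean, median

with open("results.json") as f:
    records = json.load(f)

# Summarize raw and normalized speedups across all prompts.
speedups = [r["speedup"] for r in records if "speedup" in r]
normalized = [r["normalized_speedup"] for r in records if "normalized_speedup" in r]
print(f"n={len(speedups)}  mean={mean(speedups):.2f}x  median={median(speedups):.2f}x")
print(f"mean normalized speedup={mean(normalized):.2f}x")
```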
{
"prompt": "Generate 10 room descriptions...",
"category": "Repeated Generation",
"extracted_category": "Repeated Generation",
"extracted_template": "Generate a detailed description of {data}...",
"schema_extraction_duration_ms": 1200,
"e2e_parallel_duration_ms": 2790,
"e2e_speedup": 1.94,
"extraction_successful": true,
// ... plus all schema-driven fields
}
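To see how much the GPT-4o extraction step eats into the end-to-end gains, you can compare the two speedup figures per record; a sketch under the same assumption that the file is a JSON array:

```python
import json

with open("e2e_results.json") as f:
    records = json.load(f)

# Compare schema-driven vs. end-to-end speedup for successful extractions.
for r in records:
    if not r.get("extraction_successful"):
        continue
    print(f"{r['category']}: {r['speedup']:.2f}x schema-driven vs. "
          f"{r['e2e_speedup']:.2f}x end-to-end "
          f"(extraction: {r['schema_extraction_duration_ms']} ms)")
```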
To benchmark your own decomposition method against the provided schemas, run both through the same engine and compare the resulting JSON:

# Baseline (PARALLELPROMPT schemas)
./bin/alphabits --queries datasets/lmsys_parallelizable_queries.csv --output baseline.json
# Your decomposition method
./bin/alphabits --queries your_schemas.csv --output your_results.json
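The engine expects your schemas in the same CSV layout as the shipped datasets. One way to guarantee a matching layout without hard-coding column names (this sketch deliberately avoids them; consult the dataset documentation for their meaning) is to copy the header from a benchmark file:

```python
import csv

# Reuse the benchmark CSV's header so your_schemas.csv matches the layout
# the engine expects, then append your own decomposed prompts.
with open("datasets/lmsys_parallelizable_queries.csv", newline="", encoding="utf-8") as f:
    fieldnames = csv.DictReader(f).fieldnames

with open("your_schemas.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    # writer.writerow({...})  # one row per decomposed prompt
```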
To use different models, update the model strings in these calls:
call_openai(..., model="gpt-4o-mini", ...)
call_gpt_schema_extraction(..., model="gpt-4o", ...)
To extend schema extraction to new cases:

- Update `extract_schema_from_prompt()`
- Extend `get_system_prompt()` for the new cases
- Evaluate with `--end-to-end`
Edit the `post_process_outputs()` function in `src/serial_vs_parallel.cpp`:
string post_process_prompt = "Your custom post-processing instructions...";
If you use this benchmark or find it relevant, please cite:
@inproceedings{parallelprompt2025,
  title={ParallelPrompt: Extracting Parallelism from Large Language Model Queries},
  author={Kolawole, Steven and Santhanam, Keshav and Smith, Virginia and Thaker, Pratiksha},
  booktitle={Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025) Datasets and Benchmarks Track},
  year={2025}
}
- GitHub Issues
- Contact