Model eval gets caught in a loop if the endpoint returns HTTP errors. #18

@valerie-autumn-skye

I ran metacoder without having OpenAI configured, and the evaluation process got stuck in a failure loop, repeating the same request after each HTTP 429 response (see the log below).
This should be detected so that the process can exit cleanly.

(metacoder) PS C:\Users\CTParker\PycharmProjects\metacoder> uv run metacoder eval .\tests\input\literature_mcp_eval_config.yaml
🔬 Running evaluations from: tests\input\literature_mcp_eval_config.yaml
📊 Loaded dataset: pubmed tools evals
   Models: claude-sonnet
   Coders: goose, dummy (all available)
   Cases: 1
   Total evaluations: 2

🚀 Starting evaluations...
Progress: 1/1 - goose/claude-sonnet/PMID_28027860_Full_Text with servers: mcp-simple-pubmed
Running goose with claude-sonnet on case 'PMID_28027860_Full_Text'
📁 Preparing workdir: eval_workdir\claude-sonnet_goose_PMID_28027860_Full_Text_mcp-simple-pubmed\claude-sonnet_goose_PMID_28027860_Full_Text
🔒 Obtaining lock for eval_workdir\claude-sonnet_goose_PMID_28027860_Full_Text_mcp-simple-pubmed\claude-sonnet_goose_PMID_28027860_Full_Text; current_dir=C:\Users\CTParker\PycharmProjects\metacoder      
🔧 Writing config object: .config/goose/config.yaml type=yaml
🔓 Releasing lock for eval_workdir\claude-sonnet_goose_PMID_28027860_Full_Text_mcp-simple-pubmed\claude-sonnet_goose_PMID_28027860_Full_Text; current_dir=C:\Users\CTParker\PycharmProjects\metacoder      
🔒 Obtaining lock for eval_workdir\claude-sonnet_goose_PMID_28027860_Full_Text_mcp-simple-pubmed\claude-sonnet_goose_PMID_28027860_Full_Text; current_dir=C:\Users\CTParker\PycharmProjects\metacoder      
🦆 Running command: goose run -t What is the first sentence of section 2 in PMID: 28027860?
🦆 Command took 25.10282039642334 seconds
🔓 Releasing lock for eval_workdir\claude-sonnet_goose_PMID_28027860_Full_Text_mcp-simple-pubmed\claude-sonnet_goose_PMID_28027860_Full_Text; current_dir=C:\Users\CTParker\PycharmProjects\metacoder      
Evaluating with CorrectnessMetric
✨ You're running DeepEval's latest Correctness [GEval] Metric! (using gpt-4.1, strict=False, async_mode=True)...
Evaluating 1 test case(s) in parallel ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:00
    🎯 Evaluating test case #0        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:00HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
Evaluating 1 test case(s) in parallel ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:01
    🎯 Evaluating test case #0        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:01HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
Evaluating 1 test case(s) in parallel ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:02
    🎯 Evaluating test case #0        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:02HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 429 Too Many Requests"
OpenAI Error: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.ope

Aborted!                                                                                                                                                                                                   
Evaluating 1 test case(s) in parallel ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:05
    🎯 Evaluating test case #0        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0:00:05
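
One way the requested detection could work is sketched below. This is only an illustration, not metacoder's or DeepEval's actual code: `EndpointError`, `EvaluationAborted`, `is_fatal`, and `call_with_bounded_retries` are all hypothetical names, and the policy (treat 401/403 and quota-exhausted 429 responses as fatal, cap everything else at a few attempts) is an assumption.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class EndpointError(Exception):
    """Hypothetical stand-in for an HTTP error raised by a judge-model client."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code
        self.message = message


class EvaluationAborted(Exception):
    """Raised when the evaluation should stop instead of retrying again."""


def is_fatal(err: EndpointError) -> bool:
    """Assumed policy: auth failures and exhausted quota will not fix themselves."""
    if err.status_code in (401, 403):
        return True
    return err.status_code == 429 and "quota" in err.message.lower()


def call_with_bounded_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient errors at most `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except EndpointError as err:
            # Fatal errors stop immediately; no point in hammering the endpoint.
            if is_fatal(err):
                raise EvaluationAborted(f"not retrying: {err}") from err
            # Transient errors are retried, but only up to the cap.
            if attempt == max_attempts:
                raise EvaluationAborted(
                    f"gave up after {max_attempts} attempts: {err}"
                ) from err
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff
    raise AssertionError("unreachable")
```

With something like this wrapped around the metric call, the CLI could catch `EvaluationAborted`, print the underlying error once, and exit with a non-zero status instead of looping.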

Labels

bug: Something isn't working
