Commit c436e7f

Author: Charles Parker
Modified save_results to append to existing output file rather than overwrite. Enforced UTF-8 encoding, switched to safe_dump and added document delimiter between records. Also simplified document generation. Fixes issue #24. Added second test case to literature_mcp_encoding_test.yaml for testing.
1 parent 6b64a79 commit c436e7f

2 files changed: +19 -8 lines changed

src/metacoder/evals/runner.py

Lines changed: 11 additions & 8 deletions
@@ -518,18 +518,21 @@ def run_all_evals(
 
     def save_results(self, results: List[EvalResult], output_path: Path):
         """Save evaluation results to file."""
-        # Convert to list of dicts
-        results_data = []
-        for result in results:
-            results_data.append(result.model_dump())
+        # output_path.parent.mkdir(parents=True, exist_ok=True)  # Not sure if the folder should be created here
+        data = {
+            "results": [r.model_dump() for r in results],
+            "summary": self.generate_summary(results),
+        }
 
-        # Save as YAML
-        with open(output_path, "w") as f:
-            yaml.dump(
-                {"results": results_data, "summary": self.generate_summary(results)},
+        # Append a new YAML document to the output file.
+        with open(output_path, "a", encoding="utf-8", newline="") as f:
+            yaml.safe_dump(
+                data,
                 f,
+                explicit_start=True,  # writes '---' to mark a new document
                 default_flow_style=False,
                 sort_keys=False,
+                allow_unicode=True,
             )
 
     def generate_summary(self, results: List[EvalResult]) -> Dict[str, Any]:
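
Because save_results now appends a '---'-delimited document on every call, an output file may hold several YAML documents, so a single-document load (yaml.safe_load) would no longer be enough to read it back. The following is a minimal sketch of the round trip, assuming PyYAML is installed. The helper names append_document and load_all_documents are hypothetical and not part of metacoder; only the safe_dump options are taken from the diff above.

```python
from pathlib import Path
import tempfile

import yaml  # PyYAML


def append_document(data: dict, output_path: Path) -> None:
    """Append one YAML document using the same options as the diff above.

    Hypothetical stand-in for save_results, for illustration only.
    """
    with open(output_path, "a", encoding="utf-8", newline="") as f:
        yaml.safe_dump(
            data,
            f,
            explicit_start=True,  # emit '---' before each document
            default_flow_style=False,
            sort_keys=False,
            allow_unicode=True,  # write non-ASCII text as-is instead of escaping it
        )


def load_all_documents(output_path: Path) -> list:
    """Read every '---'-delimited document back from the file."""
    with open(output_path, "r", encoding="utf-8") as f:
        return list(yaml.safe_load_all(f))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "results.yaml"
        # Two separate runs append two separate documents.
        append_document({"results": [{"name": "run-1"}], "summary": {"total": 1}}, path)
        append_document({"results": [{"name": "run-2"}], "summary": {"total": 1}}, path)
        for doc in load_all_documents(path):
            print(doc["results"][0]["name"])
```

One consequence of append mode is that rerunning an evaluation against the same output path accumulates documents rather than replacing them, so callers that want a fresh file would need to delete it first.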

tests/input/literature_mcp_encoding_test.yaml

Lines changed: 8 additions & 0 deletions
@@ -27,3 +27,11 @@ cases:
     input: Based on PMID 33926573 do microbes from alkaline sulphidic tailings show oxidative stresses?
     expected_output: 'The paper says No but it is retracted so the results should not be trusted.'
     threshold: 0.9
+  - name: "disease"
+    metrics: [CorrectnessMetric]
+    input: "According to PMID:35743164, What 3 diseases are associated with ITPR1 mutations? Give me disease names and MONDO IDs"
+    expected_output: |
+      MONDO:0011694 (spinocerebellar ataxia type 15/16, aka SCA15)
+      MONDO:0007298 (spinocerebellar ataxia type 29, aka SCA29)
+      MONDO:0008795 (aniridia-cerebellar ataxia-intellectual disability syndrome; aka Gillespie syndrome)
+    threshold: 0.7
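
The new case uses a YAML literal block scalar (`|`) for expected_output, which keeps each MONDO entry on its own line. As a rough sketch of how that case parses, assuming PyYAML and assuming the indentation shown here (the page scrape did not preserve the file's original indentation):

```python
import yaml  # PyYAML

# The added test case, embedded as text. Indentation is an assumption.
CASE_YAML = """\
cases:
  - name: "disease"
    metrics: [CorrectnessMetric]
    input: "According to PMID:35743164, What 3 diseases are associated with ITPR1 mutations? Give me disease names and MONDO IDs"
    expected_output: |
      MONDO:0011694 (spinocerebellar ataxia type 15/16, aka SCA15)
      MONDO:0007298 (spinocerebellar ataxia type 29, aka SCA29)
      MONDO:0008795 (aniridia-cerebellar ataxia-intellectual disability syndrome; aka Gillespie syndrome)
    threshold: 0.7
"""

case = yaml.safe_load(CASE_YAML)["cases"][0]

# The literal block scalar preserves line breaks, so each disease is one line.
expected_lines = case["expected_output"].splitlines()

if __name__ == "__main__":
    print(case["name"], case["threshold"])
    for line in expected_lines:
        print(line)
```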
