feat: introduce benchmarking framework #73

tgilon · 2025-07-17T10:18:36Z

Closes #31.

Changes proposed in this Pull Request

This PR introduces a benchmarking framework for continuous and systematic validation of Open TYNDP model outputs against TYNDP 2024 scenarios. This framework provides flexible and scalable validation across multiple metrics and benchmarking methods.

The following metrics from the TYNDP 2024 Scenarios report are considered relevant for benchmarking:

Exogenous Inputs:
- Fig. 5-10
  - Benchmark Final Energy demand by fuel, EU27 (TWh), (Fig 5, p24 and Fig 51, p63)
  - Benchmark Electricity demand per sector, EU27 (TWh), (Fig 6, p25 and Fig 52, p63)
  - Benchmark Methane demand by sector, EU27 (TWh), (Fig 8, p27 and Fig 53, p64)
  - Benchmark Hydrogen demand by sector, EU27 (TWh), (Fig 10, p28 and Fig 54, p64)
Investment and dispatch modelling outputs:
- Fig. 13-40
  - Benchmark of net installed capacity for electricity generation, EU27 (GW), (Fig 25, p39 and Fig 55, p65)
  - Benchmark of electricity generation, EU27 (TWh), (Fig 26, p39 and Fig 56, p65)
  - Benchmark methane supply, EU27 (TWh), (Fig 32, p45 and Fig 57, p66)
  - Benchmark hydrogen supply, EU27 (TWh), (Fig 33, p46 and Fig 58, p67)
  - Benchmark biomass supply, EU27 (TWh), (Fig 59, p67)
  - Benchmark energy imports, EU27 (TWh), (Fig 40, p51 and Fig 60, p68)
  - Hourly generation profile of power generation, Fig 30, p35

The data is published in the Scenarios package.

This PR is based on the methodology proposed by Wen et al. (2022). This methodology provides a multi-criteria approach to ensure:

the diversity (each indicator have its own added value),
the effectiveness (each indicator provides essential and correct information),
the robustness (against diverse units and orders of magnitude), and
the compatibility (can be used to compare across countries) of the selected set of indicators.

This methodology defines the following indicators:

Missing: Count of carriers / sectors dropped due to missing values
sMPE (Symmetric Mean Percentage Error): Indicates the direction of the deviation between modeled scenarios and TYNDP 2024 outcomes, showing if the output is overall overestimated or underestimated.
sMAPE (Symmetric Mean Absolute Percentage Error): Indicates the absolute magnitude of the deviations, avoiding the cancellation of negative and positive errors.
sMdAPE (Symmetric Median Absolute Percentage Error): Provides skewness information to complement sMAPE.
RMSLE (Root Mean Square Logarithmic Error): Complements the percentage errors since it shows the logarithmic deviation values.
Growth error: Show the error on temporal scale. This indicator is ignored for dynamic time series (i.e. hourly generation profiles).

Hourly time series from the TYNDP 2024 will be aggregated to match the temporal resolution of Open-TYNDP.

Summary tables are computed for both the overall and per-carrier results.

Tasks

Workflow

New configuration files config/benchmarking.default.yaml.
retrieve_additional_tyndp_data: Retrieve the TYNDP 2024 Scenarios Report Data Figures package for benchmarking purposes. This rule will be deprecated once the data bundle has been updated (Update TYNDP 2024 data bundle on Zenodo #87).
(new) clean_tyndp_benchmark: Read and process the raw TYNDP 2024 Scenarios Report data. The output data structure is a long-format table.
(new) build_statistics: Compute the benchmark statistics from the optimised network. Run for every planning horizon. The output data structure is a long-format table.
- This rule takes loss factors into account for the electricity demand. Loss factors from the Supply Tool are assumed to be the correct one.
(new) make_benchmark: Compute accuracy indicators for comparing model results against reference data from TYNDP 2024.
(new) make_benchmarks to collect make_benchmark outputs
(new) plot_benchmark: Generate visualisation outputs for model validation
(new) plot_benchmarks to collect plot_benchmarks outputs
The full set of files produced for the benchmarking are stored in the results/validation/ folder. This includes:
- results/validation/resources/ for processed inputs information from both Open-TYNDP and TYNDP 2024.
- results/validation/csvs_s_{clusters}_{opts}_{sector_opts}_all_years/ for quantitive information for each table
- results/validation/graphics_s_{clusters}_{opts}_{sector_opts}_all_years/ for figures of each table
- results/validation/kpis_eu27_s_{clusters}_{opts}_{sector_opts}_all_years.csv as summary table
- results/validation/kpis_eu27_s_{clusters}_{opts}_{sector_opts}_all_years.pdf as summary figure
- the structure of these outputs can be validated in the artifacts of the GitHub CI (e.g. artifacts section here)

Open Issues

The planning year of the generation profiles is not documented. For the time being, this PR assumes 2040.
Two sources for loss factors have been identified : the Supply Tool (sheet Other data and Conversions, starting at line 215) and the Annex VI of the Scenarios Methodology Report, p. 117. While most of the values are identical, two significant discrepancies have been observed for both DE00 and EE00. From the available information, it is also unclear whether the same loss factor is used for all the nodes in countries with multiple nodes (such as Luxembourg).
- DE00: 0.05 (report) and 0.03 (supply tool) in 2030
- EE00: 0.00 (report) and 0.07 (supply tool) in 2030

Notes

The statistics extracted from the network represent a first estimation of all metrics to benchmark. It is accepted that depending on the actual implementation of remaining features, the statistics extraction may require revision. Notes are kept in the code for later review and further improvement.

The preliminary observations of the DE scenario, using a temporal resolution of 720SEG, are summarised below.

Table	Preliminary observations
Final energy demand	• Incomplete demand coverage identified • Demand requires validation • Heat and biofuels require mapping • Climate year mismatch: model uses 2013 data, available benchmark uses 2009
Electricity demand	• Only aggregated value can be compared • NT are close but the match is not perfect • Transport and prosumer demand not yet incorporated for DE and GA • ~~Climate year mismatch: model uses 2013 data, available benchmark uses 2009~~ solved with #109
Methane demand	• Sectoral mapping incomplete • Energy and non-energy industrial uses require disaggregation
Hydrogen demand	• Coverage gaps in multiple sectors • Energy and non-energy industrial uses require disaggregation • Aviation hydrogen demand not modelled
Power capacity	• Renewables capacities matches with #115 • "Small scale res" category requires specification • Demand shedding not yet implemented
Power generation	• Offshore wind generation exceeds expected values • Additional generation sources require improvements
Methane supply	• Supply coverage incomplete • Domestic production and import sources require distinction
Hydrogen supply	• "Low-carbon" and "renewable import" categories need clarification • Supply modelling incomplete
Biomass supply	• Supply appears underestimated • Mapping complete
Energy imports	• Methane import disaggregation limited by data aggregation • No biomass import assumed • Import coverage incomplete
Generation profiles	• Climate year mismatch: model uses 2013 data, available benchmark uses 2009

Example of indicators extracted from kpis_eu27_s_all__all_years.csv for NT scenario with 45SEG:

Table	sMPE	sMAPE	sMdAPE	RMSLE	Growth Error	Missing	version
Final energy demand	-0.20	0.33	0.23	0.45	0.01	6	v0.2+gb167cb1
Electricity demand	0.02	0.02	0.02	0.03	0.00	0	v0.2+gb167cb1
Methane demand	—	—	—	—	—	NA	v0.2+gb167cb1
Hydrogen demand	-0.53	0.53	0.52	0.72	—	10	v0.2+gb167cb1
Power capacity	-0.53	0.68	0.36	5.61	-0.01	3	v0.2+gb167cb1
Power generation	-0.13	0.82	0.67	3.97	-0.01	2	v0.2+gb167cb1
Methane supply	—	—	—	—	—	NA	v0.2+gb167cb1
Hydrogen supply	-0.76	1.13	1.01	9.60	-0.00	5	v0.2+gb167cb1
Biomass supply	-1.48	1.48	1.48	4.43	0.51	1	v0.2+gb167cb1
Energy imports	-1.34	1.36	2.00	27.07	0.14	2	v0.2+gb167cb1
Generation profiles	—	—	—	—	—	NA	v0.2+gb167cb1
Total (excl. time series)	-0.62	0.98	0.82	11.06	0.02	31	v0.2+gb167cb1

Example of indicators extracted from power_generation_s_all__all_years.csv for NT scenario with 45SEG:

Carrier	sMPE	sMAPE	sMdAPE	RMSLE	Growth Error	version
Battery	-2.00	2.00	2.00	13.72	0.49	v0.2+gb167cb17f
CHP and small thermal	-1.67	1.67	1.67	2.83	-0.17	v0.2+gb167cb17f
Coal + other fossil	0.56	0.56	0.56	0.59	-0.03	v0.2+gb167cb17f
Hydro and pumped storage	-0.16	0.16	0.16	0.17	-0.01	v0.2+gb167cb17f
Hydrogen	-1.71	1.71	1.71	12.21	1.54	v0.2+gb167cb17f
Methane	0.20	0.20	0.20	0.27	0.03	v0.2+gb167cb17f
Nuclear	-0.81	0.81	0.81	0.97	-0.07	v0.2+gb167cb17f
Oil	-0.08	0.27	0.27	0.29	0.05	v0.2+gb167cb17f
Solar	-0.05	0.05	0.05	0.05	-0.00	v0.2+gb167cb17f
Wind offshore	-0.00	0.00	0.00	0.00	0.00	v0.2+gb167cb17f
Wind onshore	-0.05	0.05	0.05	0.06	-0.00	v0.2+gb167cb17f
Demand shedding	—	—	—	—	—	v0.2+gb167cb17f
Small scale res	—	—	—	—	—	v0.2+gb167cb17f
Biofuels	—	—	—	—	—	v0.2+gb167cb17f

Example of figure created for the final energy demand for NT scenario in 2040 with 45SEG:

Example of figure created for the generation profiles for DE scenario in 2040 with 720SEG:

Example of summary figure created for NT scenario

Checklist

Conflicts: Snakefile rules/collect.smk

Conflicts: rules/collect.smk

tgilon · 2025-08-14T08:54:20Z

@daniel-rdt This PR is not ready yet. I still have a bunch of todos. Nevertheless, I'm already happy to receive an early feedback from you.

scripts/retrieve_tyndp_benchmark.py

daniel-rdt

Thanks @tgilon for this inital implementation. The architecture follows a very sensible logic and thanks for documenting everything so thorougly up to here.
I like the idea of using the Wen et al (2022) methodology to assess the backcasting. One idea for the plotting here might be to reproduce a similar graph to what they introduced for their graphical abstract which gives a more visual overview of the overall performance wrt those set of indicators.

I also left some comments and small suggestions (it looks like more than it is, no worries, because it is a lot of related small code suggestions). Some additional high level comments I have:

the difference between build_benchmark and make_benchmark is difficult to grasp from the rule names. Maybe we can find a clearer name for one or both of them? Maybe something like build_benchmark_statistics (since this is mainly computing outputs using the statistics module) and make_benchmark_indicators or compare_benchmark_metrics?
add documentation / overview of the output files that include the benchmark results to the PR description

config/benchmarking.tyndp.yaml

scripts/clean_tyndp_benchmark.py

scripts/make_benchmark.py

scripts/retrieve_tyndp_benchmark.py

Co-authored-by: Daniel Rüdt <[email protected]>

daniel-rdt

Thanks @tgilon. The benchmarking architecture looks good overall. Great work! :)
As this is my second round of review, I appreciate the renaming of the rules and cleaned up configuration logic, it is now much clearer to understand the flow.

I do have a few comments that need to be addressed as I found an issue with the temporal aggregation and the add_loss_factors calculation. The major points that I have are:

Fix temporal aggregation
Fix get_loss_factors function
Add KPI summary figure for all KPIs and / or optionally add new summary figure that combines all KPIs into one
Consolidate unit conversion with vercorized version from #97
Improve logging in a few places

doc/img/tyndp/benchmarking_fed_DE_2040.png

doc/benchmarking.rst

doc/configuration.rst

scripts/plot_benchmark.py

scripts/make_benchmark.py

coroa

Ok, i don't have a final review yet, but to speed up the process let me already add the questions i have instead of bunching them.

rules/collect.smk

scripts/build_statistics.py

scripts/clean_tyndp_h2_imports.py

coroa

Ok, nothing else jumped out at me.

scripts/make_benchmark.py

Co-authored-by: Daniel Rüdt <[email protected]> Co-authored-by: Jonas Hörsch <[email protected]>

for more information, see https://pre-commit.ci

Co-authored-by: Jonas Hörsch <[email protected]>

…ense

…cted

daniel-rdt · 2025-10-03T10:06:19Z

LGTM!

tgilon added 2 commits July 17, 2025 12:14

feat: introduce benchmarking configuration files

892a695

feat: introduce benchmarking rules

811384e

tgilon added this to the Release v0.3 milestone Jul 17, 2025

tgilon self-assigned this Jul 17, 2025

tgilon linked an issue Jul 17, 2025 that may be closed by this pull request

Integration of automated tests and benchmarks #31

Closed

2 tasks

daniel-rdt self-assigned this Jul 17, 2025

refactor: indent to keep all the configurations under benchmarking

f79a926

tgilon added the major feature Major feature for the Open TYNDP. label Jul 23, 2025

tgilon added 10 commits July 25, 2025 13:42

Merge branch 'master' into feat/31-benchmarks

00bc467

Conflicts: Snakefile rules/collect.smk

fix: resolve merge conflicts

eb36ecb

refactor: rename files according to naming convention

2ee5a28

feat: implement retrieve_tyndp_benchmark rule

9c2b91c

feat: implement clean_tyndp_benchmark rule

0f9e1c5

Merge branch 'master' into feat/31-benchmarks

1626eed

doc: use new SPDX license header

4f07f4c

feat: implement build_benchmark rule

215e297

feat: implement make_benchmark rule

1837838

Merge branch 'master' into feat/31-benchmarks

12e4ec8

Conflicts: rules/collect.smk

tgilon unassigned daniel-rdt Aug 12, 2025

tgilon added 5 commits August 13, 2025 09:46

feat: add growth error

415480d

fix: validate climate year, improve config, documentation and style

285f73d

fix: improve missing indicator

02acd7b

refactor: remove method options

26e8ee3

doc: improve type hints

6bcab93

tgilon commented Aug 14, 2025

View reviewed changes

scripts/retrieve_tyndp_benchmark.py Outdated Show resolved Hide resolved

fix: remove snakemake reference inside function

498ad63

daniel-rdt requested changes Aug 21, 2025

View reviewed changes

tgilon and others added 2 commits September 2, 2025 10:35

Merge branch 'master' into feat/31-benchmarks

15f2948

Apply suggestions from code review

ab3a284

Co-authored-by: Daniel Rüdt <[email protected]>

doc: improve documentation

5fdc63e

daniel-rdt requested changes Sep 30, 2025

View reviewed changes

coroa reviewed Oct 1, 2025

View reviewed changes

rules/collect.smk Show resolved Hide resolved

scripts/build_statistics.py Show resolved Hide resolved

coroa reviewed Oct 1, 2025

View reviewed changes

scripts/clean_tyndp_h2_imports.py Outdated Show resolved Hide resolved

coroa reviewed Oct 1, 2025

View reviewed changes

scripts/make_benchmark.py Outdated Show resolved Hide resolved

scripts/make_benchmark.py Show resolved Hide resolved

tgilon and others added 16 commits October 1, 2025 17:04

Apply suggestions from code review

9e6d7e1

Co-authored-by: Daniel Rüdt <[email protected]> Co-authored-by: Jonas Hörsch <[email protected]>

doc: update figures and focus on NT scenario

2d09d38

feat: make version unkown error more explicit

6b5e0d3

doc: increase logging level of missing benchmarking table

dc90559

refactor: make use of safe_year util function in build_statistics

3532de5

[pre-commit.ci] auto fixes from pre-commit.com hooks

412708a

for more information, see https://pre-commit.ci

Apply suggestions from code review

05d5654

Co-authored-by: Jonas Hörsch <[email protected]>

feat: provide a better first estimation of final energy demand

2fcf609

feat: add profiles in gitignore to locally limit number of solver_lic…

e4e7f7e

…ense

feat: improve logging for make_benchmark

0f3fb2c

Merge branch 'master' into feat/31-benchmarks

62b1d99

fix: update remaining references to retrieve_tyndp_pecd_data rule

9f43d88

refactor: simplify the unit conversion facility

e34bacc

doc: improve documentation about available planning years

1cc7462

feat: improve logging quality when missing benchmarking is to be expe…

68f4f86

…cted

feat: improve match_temporal_resolution for incomplete years

ff643e6

tgilon requested review from coroa and daniel-rdt October 2, 2025 21:09

doc: improve logging to support debugging

512ed8b

daniel-rdt approved these changes Oct 3, 2025

View reviewed changes

tgilon merged commit 6609a53 into master Oct 3, 2025
5 checks passed

tgilon deleted the feat/31-benchmarks branch October 3, 2025 10:07

tgilon mentioned this pull request Oct 22, 2025

Add radar plots to complement the benchmarking framework #160

Open

tgilon mentioned this pull request Nov 3, 2025

feat: improve elec demand for NT #192

Open

12 tasks

feat: introduce benchmarking framework #73

feat: introduce benchmarking framework #73

Uh oh!

Conversation

tgilon commented Jul 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes proposed in this Pull Request

Tasks

Workflow

Open Issues

Notes

Checklist

Uh oh!

tgilon commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

daniel-rdt left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

daniel-rdt left a comment • edited by tgilon Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coroa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coroa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

daniel-rdt commented Oct 3, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tgilon commented Jul 17, 2025 •

edited

Loading

tgilon commented Aug 14, 2025 •

edited

Loading

daniel-rdt left a comment •

edited

Loading

daniel-rdt left a comment •

edited by tgilon

Loading