Conversation

@tgilon
Member

@tgilon tgilon commented Jul 17, 2025

Closes #31.

Changes proposed in this Pull Request

This PR introduces a benchmarking framework for continuous and systematic validation of Open-TYNDP model outputs against TYNDP 2024 scenarios. This framework provides flexible and scalable validation across multiple metrics and benchmarking methods.

The following metrics from the TYNDP 2024 Scenarios report are considered relevant for benchmarking:

  • Exogenous Inputs:
    • Fig. 5-10
      • Benchmark Final Energy demand by fuel, EU27 (TWh), (Fig 5, p24 and Fig 51, p63)
      • Benchmark Electricity demand per sector, EU27 (TWh), (Fig 6, p25 and Fig 52, p63)
      • Benchmark Methane demand by sector, EU27 (TWh), (Fig 8, p27 and Fig 53, p64)
      • Benchmark Hydrogen demand by sector, EU27 (TWh), (Fig 10, p28 and Fig 54, p64)
  • Investment and dispatch modelling outputs:
    • Fig. 13-40
      • Benchmark of net installed capacity for electricity generation, EU27 (GW), (Fig 25, p39 and Fig 55, p65)
      • Benchmark of electricity generation, EU27 (TWh), (Fig 26, p39 and Fig 56, p65)
      • Benchmark methane supply, EU27 (TWh), (Fig 32, p45 and Fig 57, p66)
      • Benchmark hydrogen supply, EU27 (TWh), (Fig 33, p46 and Fig 58, p67)
      • Benchmark biomass supply, EU27 (TWh), (Fig 59, p67)
      • Benchmark energy imports, EU27 (TWh), (Fig 40, p51 and Fig 60, p68)
      • Hourly generation profile of power generation (Fig 30, p35)

The data is published in the Scenarios package.

This PR is based on the methodology proposed by Wen et al. (2022). This methodology provides a multi-criteria approach to ensure, for the selected set of indicators:

  • diversity (each indicator has its own added value),
  • effectiveness (each indicator provides essential and correct information),
  • robustness (against diverse units and orders of magnitude), and
  • compatibility (the indicators can be used to compare across countries).

This methodology defines the following indicators:

  • Missing: Count of carriers / sectors dropped due to missing values.
  • sMPE (Symmetric Mean Percentage Error): Indicates the direction of the deviation between modelled scenarios and TYNDP 2024 outcomes, showing whether the output is overall overestimated or underestimated.
  • sMAPE (Symmetric Mean Absolute Percentage Error): Indicates the absolute magnitude of the deviations, avoiding the cancellation of negative and positive errors.
  • sMdAPE (Symmetric Median Absolute Percentage Error): Provides skewness information to complement sMAPE.
  • RMSLE (Root Mean Square Logarithmic Error): Complements the percentage errors by showing deviations on a logarithmic scale.
  • Growth error: Shows the error on the temporal scale. This indicator is ignored for dynamic time series (i.e. hourly generation profiles).
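
For reference, a minimal pandas sketch of these indicators, assuming the standard symmetric-error definitions (the exact formulas used by the workflow live in the make_benchmark rule; the growth error, which tracks deviations between planning horizons, is omitted for brevity):

```python
import numpy as np
import pandas as pd

def indicators(model: pd.Series, ref: pd.Series) -> dict:
    """Accuracy indicators for one benchmark table, indexed by carrier/sector."""
    both = pd.concat({"model": model, "ref": ref}, axis=1)
    missing = int(both.isna().any(axis=1).sum())  # carriers/sectors dropped
    both = both.dropna()
    # Symmetric percentage error: bounded in [-2, 2] and robust to the
    # diverse orders of magnitude across carriers.
    spe = 2 * (both["model"] - both["ref"]) / (both["model"].abs() + both["ref"].abs())
    return {
        "sMPE": spe.mean(),            # signed: over- vs. underestimation
        "sMAPE": spe.abs().mean(),     # magnitude, no cancellation of errors
        "sMdAPE": spe.abs().median(),  # median complements sMAPE (skewness)
        "RMSLE": float(np.sqrt(((np.log1p(both["model"]) - np.log1p(both["ref"])) ** 2).mean())),
        "Missing": missing,
    }
```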

Hourly time series from TYNDP 2024 will be aggregated to match the temporal resolution of Open-TYNDP, for example as sketched below.
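
One possible aggregation, assuming each segmented snapshot represents the interval up to the next snapshot (a hypothetical convention; the actual segmentation handling in the workflow may differ):

```python
import pandas as pd

def aggregate_to_snapshots(hourly: pd.Series, snapshots: pd.DatetimeIndex) -> pd.Series:
    """Average an hourly TYNDP 2024 profile onto segmented model snapshots."""
    # Close the last segment one hour after the final reference timestamp.
    edges = snapshots.append(pd.DatetimeIndex([hourly.index[-1] + pd.Timedelta("1h")]))
    # Label every hour with the snapshot whose segment contains it.
    segments = pd.cut(hourly.index, bins=edges, labels=snapshots, right=False)
    # Mean keeps power units (MW) comparable; use .sum() for energy (MWh).
    return hourly.groupby(segments, observed=True).mean()
```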

Summary tables are computed for both the overall and per-carrier results.
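
Reusing the indicators helper from the sketch above, both summaries follow from a groupby over a long-format comparison table (the column names here are hypothetical):

```python
# long: long-format table with columns ["table", "carrier", "model", "ref"].
per_carrier = long.groupby(["table", "carrier"]).apply(
    lambda g: pd.Series(indicators(g["model"], g["ref"]))
)
overall = long.groupby("table").apply(
    lambda g: pd.Series(indicators(g["model"], g["ref"]))
)
```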

Tasks

  • Validate architecture
  • Implement demonstrator
  • Decide if method configuration is needed
  • Use standard retrieve method instead of custom
  • Simplify configuration structure
  • Distinguish build and make benchmark more clearly
  • Address the discrepancies in the profiles when using time aggregation
  • Add plotting rule
  • Overall indicators misleading when substantial missing data
  • Test for all the scenarios
  • Assess if default values from the configuration are still relevant
  • Document the new configurations (incl. overview of outputs files and release note)
  • Add version to outputs
  • Clean log
  • Filter out EU27 statistics

Workflow

  1. New configuration file config/benchmarking.default.yaml.
  2. retrieve_additional_tyndp_data: Retrieve the TYNDP 2024 Scenarios Report Data Figures package for benchmarking purposes. This rule will be deprecated once the data bundle has been updated (Update TYNDP 2024 data bundle on Zenodo #87).
  3. (new) clean_tyndp_benchmark: Read and process the raw TYNDP 2024 Scenarios Report data. The output data structure is a long-format table.
  4. (new) build_statistics: Compute the benchmark statistics from the optimised network. Run for every planning horizon. The output data structure is a long-format table.
    • This rule takes loss factors into account for the electricity demand. Loss factors from the Supply Tool are assumed to be the correct ones (see the sketch after this list).
  5. (new) make_benchmark: Compute accuracy indicators for comparing model results against reference data from TYNDP 2024.
  6. (new) make_benchmarks: Collect the make_benchmark outputs.
  7. (new) plot_benchmark: Generate visualisation outputs for model validation.
  8. (new) plot_benchmarks: Collect the plot_benchmark outputs.
  9. The full set of files produced for the benchmarking is stored in the results/validation/ folder. This includes:
    • results/validation/resources/ for processed inputs information from both Open-TYNDP and TYNDP 2024.
    • results/validation/csvs_s_{clusters}_{opts}_{sector_opts}_all_years/ for quantitative information for each table
    • results/validation/graphics_s_{clusters}_{opts}_{sector_opts}_all_years/ for figures of each table
    • results/validation/kpis_eu27_s_{clusters}_{opts}_{sector_opts}_all_years.csv as summary table
    • results/validation/kpis_eu27_s_{clusters}_{opts}_{sector_opts}_all_years.pdf as summary figure
    • the structure of these outputs can be validated in the artifacts of the GitHub CI (e.g. artifacts section here)
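
As a sketch of the loss-factor handling in step 4, assuming losses are expressed as a fraction of net demand (an assumption; the Supply Tool's convention is not restated here, and the function name mirrors the add_loss_factors helper discussed in review):

```python
import pandas as pd

def add_loss_factors(net_demand: pd.Series, loss_factors: pd.Series) -> pd.Series:
    """Scale net electricity demand per node up to demand including grid losses.

    Sketch only: assumes gross = net * (1 + loss_factor), and that a country's
    single Supply Tool factor applies uniformly to all of its nodes (an open
    question for e.g. Luxembourg, see Open Issues).
    """
    # Align the per-node factors with the demand index; nodes without a
    # factor are left unscaled.
    factors = loss_factors.reindex(net_demand.index).fillna(0.0)
    return net_demand * (1.0 + factors)

# Hypothetical usage with the 2030 Supply Tool values quoted below:
losses_2030 = pd.Series({"DE00": 0.03, "EE00": 0.07})
demand = pd.Series({"DE00": 500.0, "EE00": 9.0})  # TWh, illustrative numbers
gross = add_loss_factors(demand, losses_2030)
```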

Open Issues

  • The planning year of the generation profiles is not documented. For the time being, this PR assumes 2040.
  • Two sources for loss factors have been identified: the Supply Tool (sheet Other data and Conversions, starting at line 215) and Annex VI of the Scenarios Methodology Report, p. 117. While most of the values are identical, two significant discrepancies have been observed, for DE00 and EE00. From the available information, it is also unclear whether the same loss factor is used for all the nodes in countries with multiple nodes (such as Luxembourg).
    • DE00: 0.05 (report) and 0.03 (supply tool) in 2030
    • EE00: 0.00 (report) and 0.07 (supply tool) in 2030

Notes

The statistics extracted from the network represent a first estimation of all metrics to benchmark. Depending on the actual implementation of the remaining features, the statistics extraction may require revision. Notes are kept in the code for later review and further improvement.

The preliminary observations of the DE scenario, using a temporal resolution of 720SEG, are summarised below.

| Table | Preliminary observations |
| --- | --- |
| Final energy demand | Incomplete demand coverage identified; demand requires validation; heat and biofuels require mapping; climate year mismatch: model uses 2013 data, available benchmark uses 2009 |
| Electricity demand | Only the aggregated value can be compared; NT values are close but the match is not perfect; transport and prosumer demand not yet incorporated for DE and GA; climate year mismatch (model uses 2013 data, available benchmark uses 2009) solved with #109 |
| Methane demand | Sectoral mapping incomplete; energy and non-energy industrial uses require disaggregation |
| Hydrogen demand | Coverage gaps in multiple sectors; energy and non-energy industrial uses require disaggregation; aviation hydrogen demand not modelled |
| Power capacity | Renewable capacities match with #115; "Small scale res" category requires specification; demand shedding not yet implemented |
| Power generation | Offshore wind generation exceeds expected values; additional generation sources require improvements |
| Methane supply | Supply coverage incomplete; domestic production and import sources require distinction |
| Hydrogen supply | "Low-carbon" and "renewable import" categories need clarification; supply modelling incomplete |
| Biomass supply | Supply appears underestimated; mapping complete |
| Energy imports | Methane import disaggregation limited by data aggregation; no biomass import assumed; import coverage incomplete |
| Generation profiles | Climate year mismatch: model uses 2013 data, available benchmark uses 2009 |

Example of indicators extracted from kpis_eu27_s_all__all_years.csv for NT scenario with 45SEG:

| Table | sMPE | sMAPE | sMdAPE | RMSLE | Growth Error | Missing | version |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Final energy demand | -0.20 | 0.33 | 0.23 | 0.45 | 0.01 | 6 | v0.2+gb167cb1 |
| Electricity demand | 0.02 | 0.02 | 0.02 | 0.03 | 0.00 | 0 | v0.2+gb167cb1 |
| Methane demand | NA | | | | | | v0.2+gb167cb1 |
| Hydrogen demand | -0.53 | 0.53 | 0.52 | 0.72 | | 10 | v0.2+gb167cb1 |
| Power capacity | -0.53 | 0.68 | 0.36 | 5.61 | -0.01 | 3 | v0.2+gb167cb1 |
| Power generation | -0.13 | 0.82 | 0.67 | 3.97 | -0.01 | 2 | v0.2+gb167cb1 |
| Methane supply | NA | | | | | | v0.2+gb167cb1 |
| Hydrogen supply | -0.76 | 1.13 | 1.01 | 9.60 | -0.00 | 5 | v0.2+gb167cb1 |
| Biomass supply | -1.48 | 1.48 | 1.48 | 4.43 | 0.51 | 1 | v0.2+gb167cb1 |
| Energy imports | -1.34 | 1.36 | 2.00 | 27.07 | 0.14 | 2 | v0.2+gb167cb1 |
| Generation profiles | NA | | | | | | v0.2+gb167cb1 |
| Total (excl. time series) | -0.62 | 0.98 | 0.82 | 11.06 | 0.02 | 31 | v0.2+gb167cb1 |

Example of indicators extracted from power_generation_s_all__all_years.csv for NT scenario with 45SEG:

| Carrier | sMPE | sMAPE | sMdAPE | RMSLE | Growth Error | version |
| --- | --- | --- | --- | --- | --- | --- |
| Battery | -2.00 | 2.00 | 2.00 | 13.72 | 0.49 | v0.2+gb167cb17f |
| CHP and small thermal | -1.67 | 1.67 | 1.67 | 2.83 | -0.17 | v0.2+gb167cb17f |
| Coal + other fossil | 0.56 | 0.56 | 0.56 | 0.59 | -0.03 | v0.2+gb167cb17f |
| Hydro and pumped storage | -0.16 | 0.16 | 0.16 | 0.17 | -0.01 | v0.2+gb167cb17f |
| Hydrogen | -1.71 | 1.71 | 1.71 | 12.21 | 1.54 | v0.2+gb167cb17f |
| Methane | 0.20 | 0.20 | 0.20 | 0.27 | 0.03 | v0.2+gb167cb17f |
| Nuclear | -0.81 | 0.81 | 0.81 | 0.97 | -0.07 | v0.2+gb167cb17f |
| Oil | -0.08 | 0.27 | 0.27 | 0.29 | 0.05 | v0.2+gb167cb17f |
| Solar | -0.05 | 0.05 | 0.05 | 0.05 | -0.00 | v0.2+gb167cb17f |
| Wind offshore | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | v0.2+gb167cb17f |
| Wind onshore | -0.05 | 0.05 | 0.05 | 0.06 | -0.00 | v0.2+gb167cb17f |
| Demand shedding | | | | | | v0.2+gb167cb17f |
| Small scale res | | | | | | v0.2+gb167cb17f |
| Biofuels | | | | | | v0.2+gb167cb17f |

Example of figure created for the final energy demand for NT scenario in 2040 with 45SEG:
(figure: benchmarking_fed_NT_2030)

Example of figure created for the generation profiles for DE scenario in 2040 with 720SEG:
(figure: benchmarking_gen_profiles_DE_2040)

Example of summary figure created for NT scenario
(figure: benchmarking_overview_NT)

Checklist

  • I tested my contribution locally and it works as intended.
  • Code and workflow changes are sufficiently documented.
  • Changed dependencies are added to envs/environment.yaml.
  • Changes in configuration options are added in config/config.default.yaml.
  • Changes in configuration options are documented in doc/configtables/*.csv.
  • Changes in configuration options are added in config/test/*.yaml.
  • OET license identifier is added to all edited or newly created code files.
  • Sources of newly added data are documented in doc/data_sources.rst.
  • A release note doc/release_notes.rst is added.
  • Major features are listed in README and doc/index.rst.

@tgilon tgilon added this to the Release v0.3 milestone Jul 17, 2025
@tgilon tgilon self-assigned this Jul 17, 2025
@tgilon tgilon linked an issue Jul 17, 2025 that may be closed by this pull request
@daniel-rdt daniel-rdt self-assigned this Jul 17, 2025
@tgilon tgilon added the major feature Major feature for the Open TYNDP. label Jul 23, 2025
@tgilon
Member Author

tgilon commented Aug 14, 2025

@daniel-rdt This PR is not ready yet. I still have a bunch of todos. Nevertheless, I'm already happy to receive early feedback from you.

Member

@daniel-rdt daniel-rdt left a comment


Thanks @tgilon for this initial implementation. The architecture follows a very sensible logic, and thanks for documenting everything so thoroughly up to here.
I like the idea of using the Wen et al. (2022) methodology to assess the backcasting. One idea for the plotting here might be to reproduce a graph similar to the one they introduced for their graphical abstract, which gives a more visual overview of the overall performance with respect to that set of indicators.
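
For instance, something along these lines (a rough matplotlib sketch only; it reads the KPI summary table shown in the PR description and assumes the column names given there):

```python
import matplotlib.pyplot as plt
import pandas as pd

# KPI summary table produced by the benchmarking workflow.
kpis = pd.read_csv("kpis_eu27_s_all__all_years.csv", index_col=0)
metrics = ["sMPE", "sMAPE", "sMdAPE", "RMSLE", "Growth Error"]

fig, ax = plt.subplots(figsize=(7, 4))
# Diverging colormap centred on zero: green = close to TYNDP 2024, red = off.
im = ax.imshow(kpis[metrics], cmap="RdYlGn_r", vmin=-2, vmax=2, aspect="auto")
ax.set_xticks(range(len(metrics)), labels=metrics, rotation=45, ha="right")
ax.set_yticks(range(len(kpis)), labels=kpis.index)
fig.colorbar(im, ax=ax, label="indicator value")
fig.tight_layout()
fig.savefig("benchmark_overview.png", dpi=150)
```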

I also left some comments and small suggestions (it looks like more than it is, no worries, because it is a lot of related small code suggestions). Some additional high level comments I have:

  • the difference between build_benchmark and make_benchmark is difficult to grasp from the rule names. Maybe we can find a clearer name for one or both of them? Maybe something like build_benchmark_statistics (since this is mainly computing outputs using the statistics module) and make_benchmark_indicators or compare_benchmark_metrics?
  • add documentation / overview of the output files that include the benchmark results to the PR description

Member

@daniel-rdt daniel-rdt left a comment


Thanks @tgilon. The benchmarking architecture looks good overall. Great work! :)
As this is my second round of review, I appreciate the renaming of the rules and the cleaned-up configuration logic; the flow is now much clearer to understand.

I do have a few comments that need to be addressed as I found an issue with the temporal aggregation and the add_loss_factors calculation. The major points that I have are:

  • Fix temporal aggregation
  • Fix get_loss_factors function
  • Add KPI summary figure for all KPIs and / or optionally add new summary figure that combines all KPIs into one
  • Consolidate unit conversion with the vectorized version from #97
  • Improve logging in a few places

Member

@coroa coroa left a comment


Ok, I don't have a final review yet, but to speed up the process let me already add the questions I have instead of bunching them.

Member

@coroa coroa left a comment


Ok, nothing else jumped out at me.

@tgilon tgilon requested review from coroa and daniel-rdt October 2, 2025 21:09
@daniel-rdt
Member

LGTM!

@tgilon tgilon merged commit 6609a53 into master Oct 3, 2025
5 checks passed
@tgilon tgilon deleted the feat/31-benchmarks branch October 3, 2025 10:07
@tgilon tgilon mentioned this pull request Nov 3, 2025