feat: introduce benchmarking framework #73
Conversation
Conflicts: Snakefile rules/collect.smk
Conflicts: rules/collect.smk
@daniel-rdt This PR is not ready yet. I still have a bunch of todos. Nevertheless, I'm already happy to receive early feedback from you.
Thanks @tgilon for this initial implementation. The architecture follows a very sensible logic, and thanks for documenting everything so thoroughly up to here.
I like the idea of using the Wen et al. (2022) methodology to assess the backcasting. One idea for the plotting might be to reproduce a graph similar to the one in their graphical abstract, which gives a more visual overview of the overall performance with respect to that set of indicators.
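To make that idea concrete, here is a purely illustrative sketch (not part of this PR) of such an overview figure, rendered as a heatmap with one indicator value per benchmarked metric; the indicator names and numbers are invented placeholders:

```python
# Illustrative only: a compact overview figure showing one accuracy indicator
# value per (metric, indicator) cell. All names and values below are invented.
import matplotlib.pyplot as plt
import numpy as np

indicators = ["MAE", "RMSE", "sMAPE"]  # placeholder indicator set
metrics = ["Power generation", "Final demand", "Installed capacity"]
scores = np.array([[0.10, 0.20, 0.15],
                   [0.30, 0.40, 0.25],
                   [0.05, 0.10, 0.08]])  # lower = better match

fig, ax = plt.subplots(figsize=(5, 3))
im = ax.imshow(scores, cmap="RdYlGn_r", vmin=0, vmax=0.5)
ax.set_xticks(range(len(indicators)), indicators)
ax.set_yticks(range(len(metrics)), metrics)
for i in range(len(metrics)):
    for j in range(len(indicators)):
        ax.text(j, i, f"{scores[i, j]:.2f}", ha="center", va="center")
fig.colorbar(im, ax=ax, label="indicator value")
fig.tight_layout()
fig.savefig("benchmark_overview.pdf")
```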
I also left some comments and small suggestions (it looks like more than it is, no worries, because it is a lot of related small code suggestions). Some additional high level comments I have:
- the difference between `build_benchmark` and `make_benchmark` is difficult to grasp from the rule names. Maybe we can find a clearer name for one or both of them? Maybe something like `build_benchmark_statistics` (since this is mainly computing outputs using the statistics module) and `make_benchmark_indicators` or `compare_benchmark_metrics`?
- add documentation / overview of the output files that include the benchmark results to the PR description
Co-authored-by: Daniel Rüdt <[email protected]>
Thanks @tgilon. The benchmarking architecture looks good overall. Great work! :)
As this is my second round of review, I appreciate the renaming of the rules and the cleaned-up configuration logic; the flow is now much clearer to understand.
I do have a few comments that need to be addressed, as I found an issue with the temporal aggregation and the `add_loss_factors` calculation. The major points I have are:
- Fix temporal aggregation
- Fix `get_loss_factors` function
- Add KPI summary figure for all KPIs and / or optionally add new summary figure that combines all KPIs into one
- Consolidate unit conversion with the vectorized version from #97
- Improve logging in a few places
Ok, I don't have a final review yet, but to speed up the process, let me already add the questions I have instead of bunching them.
Ok, nothing else jumped out at me.
Co-authored-by: Daniel Rüdt <[email protected]> Co-authored-by: Jonas Hörsch <[email protected]>
for more information, see https://pre-commit.ci
Co-authored-by: Jonas Hörsch <[email protected]>
LGTM!
Closes #31.
Changes proposed in this Pull Request
This PR introduces a benchmarking framework for continuous and systematic validation of Open TYNDP model outputs against TYNDP 2024 scenarios. This framework provides flexible and scalable validation across multiple metrics and benchmarking methods.
The following metrics from the TYNDP 2024 Scenarios report are considered relevant for benchmarking:
The data is published in the Scenarios package.
This PR is based on the methodology proposed by Wen et al. (2022). This methodology provides a multi-criteria approach to ensure:
This methodology defines the following indicators:
Hourly time series from the TYNDP 2024 will be aggregated to match the temporal resolution of Open-TYNDP.
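As a rough illustration of what this aggregation could look like (assuming the model snapshots mark the start of contiguous segments; this is a sketch, not the PR's actual implementation):

```python
# Sketch: sum hourly TYNDP 2024 reference values into the segment that starts
# at each model snapshot. Assumes snapshots are sorted and cover the same year;
# use .mean() instead of .sum() for intensive quantities such as prices.
import pandas as pd


def aggregate_to_snapshots(hourly: pd.Series, snapshots: pd.DatetimeIndex) -> pd.Series:
    # Map every hour to the most recent snapshot start, then sum per segment.
    labels = snapshots[snapshots.searchsorted(hourly.index, side="right") - 1]
    return hourly.groupby(labels).sum()


# Dummy usage: 8760 hourly values aggregated onto 4-hourly segments.
hourly = pd.Series(1.0, index=pd.date_range("2013-01-01", periods=8760, freq="h"))
snapshots = pd.date_range("2013-01-01", periods=2190, freq="4h")
print(aggregate_to_snapshots(hourly, snapshots).head())
```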
Summary tables are computed for both the overall and per-carrier results.
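For illustration only, a per-carrier and overall summary could be assembled roughly as below; the column names and the simple relative-error metric are placeholders, not the actual indicator set from Wen et al. (2022) used in this PR:

```python
# Sketch: compare long-format model results against reference values per carrier.
# Column names and numbers are invented for demonstration.
import pandas as pd

model = pd.DataFrame({"carrier": ["wind", "solar", "gas"], "value": [105.0, 48.0, 22.0]})
reference = pd.DataFrame({"carrier": ["wind", "solar", "gas"], "value": [100.0, 50.0, 20.0]})

merged = model.merge(reference, on="carrier", suffixes=("_model", "_ref"))
merged["rel_error"] = (merged["value_model"] - merged["value_ref"]) / merged["value_ref"]

per_carrier = merged.set_index("carrier")["rel_error"]  # per-carrier summary
overall = merged["rel_error"].abs().mean()              # overall summary
print(per_carrier, f"overall: {overall:.3f}", sep="\n")
```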
Tasks
- `method` configuration is needed
Workflow
- Benchmarking configuration is added in `config/benchmarking.default.yaml`.
- `retrieve_additional_tyndp_data`: Retrieve the TYNDP 2024 Scenarios Report Data Figures package for benchmarking purposes. This rule will be deprecated once the data bundle has been updated (Update TYNDP 2024 data bundle on Zenodo #87).
- `clean_tyndp_benchmark`: Read and process the raw TYNDP 2024 Scenarios Report data. The output data structure is a long-format table.
- `build_statistics`: Compute the benchmark statistics from the optimised network. Run for every planning horizon. The output data structure is a long-format table.
- `make_benchmark`: Compute accuracy indicators for comparing model results against reference data from TYNDP 2024.
- `make_benchmarks`: Collect `make_benchmark` outputs.
- `plot_benchmark`: Generate visualisation outputs for model validation.
- `plot_benchmarks`: Collect `plot_benchmark` outputs.
- Outputs are stored in the `results/validation/` folder. This includes:
  - `results/validation/resources/` for processed input information from both Open-TYNDP and TYNDP 2024
  - `results/validation/csvs_s_{clusters}_{opts}_{sector_opts}_all_years/` for quantitative information for each table
  - `results/validation/graphics_s_{clusters}_{opts}_{sector_opts}_all_years/` for figures of each table
  - `results/validation/kpis_eu27_s_{clusters}_{opts}_{sector_opts}_all_years.csv` as summary table
  - `results/validation/kpis_eu27_s_{clusters}_{opts}_{sector_opts}_all_years.pdf` as summary figure
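As a small usage sketch (the wildcard values below are the placeholders from the examples further down, and the exact column layout of the summary table is an assumption):

```python
# Sketch: inspect the benchmark KPI summary table produced by the workflow.
import pandas as pd

kpis = pd.read_csv("results/validation/kpis_eu27_s_all__all_years.csv")
print(kpis.head())
```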
Open Issues
The loss factors from the "Other data and Conversions" data (starting at line 215) and from Annex VI of the Scenarios Methodology Report (p. 117) have been compared. While most of the values are identical, two significant discrepancies have been observed for both DE00 and EE00. From the available information, it is also unclear whether the same loss factor is used for all the nodes in countries with multiple nodes (such as Luxembourg).
Notes
The statistics extracted from the network represent a first estimation of all metrics to benchmark. It is accepted that depending on the actual implementation of remaining features, the statistics extraction may require revision. Notes are kept in the code for later review and further improvement.
The preliminary observations of the DE scenario, using a temporal resolution of 720SEG, are summarised below.
• Demand requires validation
• Heat and biofuels require mapping
• Climate year mismatch: model uses 2013 data, available benchmark uses 2009
• NT values are close, but the match is not perfect
• Transport and prosumer demand not yet incorporated for DE and GA
• Climate year mismatch: model uses 2013 data, available benchmark uses 2009 (solved with #109)
• Energy and non-energy industrial uses require disaggregation
• Aviation hydrogen demand not modelled
• "Small scale res" category requires specification
• Demand shedding not yet implemented
• Additional generation sources require improvements
• Domestic production and import sources require distinction
• Supply modelling incomplete
• Mapping complete
• No biomass import assumed
• Import coverage incomplete
Example of indicators extracted from `kpis_eu27_s_all__all_years.csv` for NT scenario with 45SEG:

Example of indicators extracted from `power_generation_s_all__all_years.csv` for NT scenario with 45SEG:

Example of figure created for the final energy demand for NT scenario in 2040 with 45SEG:

Example of figure created for the generation profiles for DE scenario in 2040 with 720SEG:

Example of summary figure created for the NT scenario:

Checklist
- Changed dependencies are added to `envs/environment.yaml`.
- Changes in configuration options are added in `config/config.default.yaml`.
- Changes in configuration options are documented in `doc/configtables/*.csv`.
- Changes in configuration options are added in `config/test/*.yaml`.
- Newly used data sources are documented in `doc/data_sources.rst`.
- A release note `doc/release_notes.rst` is added.
- Relevant documentation updates are made in `README` and `doc/index.rst`.