
hrishikeshrt/LLM-Graph-Repair


LLMs for Graph Repair

Code for the research paper "Graph Repairs with Large Language Models: An Empirical Study" published in GRADES-NDA '25: Proceedings of the 8th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) co-located with SIGMOD 2025.

Components

  1. connect.py: Defines the Graph class, which manages the connection to the Neo4j graph database
  2. dataset.py: Defines the GraphDataset class, building on Graph, to provide functions for loading data
     • add_inconsistency_synthea.py: Adds controlled inconsistencies to the Synthea dataset
     • dataset_synthea.py: Queries for the Synthea dataset
     • load_synthea.py: Loads a dataset
  3. graph.py: Extends networkx.DiGraph to define the PropertyGraph class
  4. inconsistency.py: Finds inconsistencies and stores them in a pickle file in PropertyGraph format
  5. encoding.py: Provides functions for computing text representations of a property graph (PG)
  6. llm.py: Provides functions for connecting to LLMs, asking questions, and retrieving answers
  7. machine_repair.py: Asks an LLM to repair the graph
  8. response_statistics.py: Prepares response statistics
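
To illustrate what a "text representation" of a property graph might look like, here is a minimal sketch. It is not the code in encoding.py: the function names, the Cypher-like output format, and the sample patient and medication values are all hypothetical. It also uses plain Python sets and dicts instead of the repository's PropertyGraph class, so it runs without networkx.

```python
# Hypothetical sketch: NOT the repository's encoding.py.
# All function names and the output format are assumptions for illustration.


def format_properties(properties):
    """Render a property dict as 'key: value' pairs in sorted key order."""
    return ", ".join(f"{key}: {value!r}" for key, value in sorted(properties.items()))


def encode_node(node_id, labels, properties):
    """Render one node as a Cypher-like pattern, e.g. (p1:Patient {first: 'Ana'})."""
    label_text = ":".join(sorted(labels))
    return f"({node_id}:{label_text} {{{format_properties(properties)}}})"


def encode_edge(source, target, edge_type, properties):
    """Render one directed edge between two node ids."""
    return f"({source})-[:{edge_type} {{{format_properties(properties)}}}]->({target})"


def encode_graph(nodes, edges):
    """Render a whole graph as text, one node or edge per line."""
    lines = [encode_node(node_id, labels, props)
             for node_id, (labels, props) in sorted(nodes.items())]
    lines += [encode_edge(*edge) for edge in edges]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data, loosely in the style of a patient record.
    nodes = {
        "p1": ({"Patient"}, {"first": "Ana"}),
        "m1": ({"Medication"}, {"code": "860975"}),
    }
    edges = [("p1", "m1", "TAKES_MEDICATION", {"start": "2020-01-01"})]
    print(encode_graph(nodes, edges))
```

A text form like this could then be placed in an LLM prompt; the encodings actually used in the paper may differ.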

Pipeline

  • Load dataset using python3 load_synthea.py
  • Find inconsistencies using python3 inconsistency.py
  • Control repair parameters in machine_repair.py
  • Query LLMs for graph repair using python3 machine_repair.py
  • Prepare response statistics (generating tables, plots) using python3 response_statistics.py
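
The scripts above can also be chained from a small driver. The following is a sketch only: the script names come from the list above, while the helper names (`build_commands`, `run_pipeline`) and the `dry_run` flag are hypothetical and not part of the repository. It assumes the scripts take no command-line arguments and that repair parameters have already been set in machine_repair.py.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the Pipeline list; order matters.
PIPELINE_STEPS = [
    "load_synthea.py",
    "inconsistency.py",
    "machine_repair.py",
    "response_statistics.py",
]


def build_commands(repo_dir=".", python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    repo = Path(repo_dir)
    return [[python, str(repo / script)] for script in PIPELINE_STEPS]


def run_pipeline(repo_dir=".", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = build_commands(repo_dir)
    for command in commands:
        print("$", " ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only shows what would be executed.
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root would run the four steps in order, which presumably requires a reachable Neo4j instance and LLM access to be configured first.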

License

This project is licensed under the terms of the GNU General Public License v3.0. See the LICENSE file for details.

How to Cite?

@inproceedings{10.1145/3735546.3735859,
author = {Terdalkar, Hrishikesh and Bonifati, Angela and Mauri, Andrea},
title = {Graph Repairs with Large Language Models: An Empirical Study},
year = {2025},
isbn = {9798400719233},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3735546.3735859},
doi = {10.1145/3735546.3735859},
abstract = {Property graphs are widely used in domains such as healthcare, finance, and social networks, but they often contain errors due to inconsistencies, missing data, or schema violations. Traditional rule-based and heuristic-driven graph repair methods are limited in their adaptability as they need to be tailored for each dataset. On the other hand, interactive human-in-the-loop approaches may become infeasible when dealing with large graphs, as the cost-both in terms of time and effort-of involving users becomes too high. Recent advancements in Large Language Models (LLMs) present new opportunities for automated graph repair by leveraging contextual reasoning and their access to real-world knowledge. We evaluate the effectiveness of six open-source LLMs in repairing property graphs. We assess repair quality, computational cost, and model-specific performance. Our experiments show that LLMs have the potential to detect and correct errors, with varying degrees of accuracy and efficiency. We discuss the strengths, limitations, and challenges of LLM-driven graph repair and outline future research directions for improving scalability and interpretability.},
booktitle = {Proceedings of the 8th Joint Workshop on Graph Data Management Experiences \& Systems (GRADES) and Network Data Analytics (NDA)},
articleno = {9},
numpages = {10},
keywords = {Graph Repair, Large Language Models, Property Graphs},
location = {Berlin, Germany},
series = {GRADES-NDA '25}
}