Species Trait Data Compilation

This project automates the retrieval and compilation of species-specific biological trait data by integrating biodiversity APIs with large language models. It is designed to scale from focused case studies to generalized, cross-species analyses.

1. Frog Analysis

The first phase of this project is a focused case study of amphibians (frogs) that serves as a proof of concept.

- Uses multiple APIs to collect ecological and biological information:
  - AmphibiaWeb: morphology and reproductive traits (snout–vent length, clutch size, egg diameter)
  - IUCN Red List: elevation ranges and habitat categories
  - World Bank CCKP (Climate Change Knowledge Portal): temperature and rainfall statistics
- Automates retrieval of structured data (returned directly by the APIs) and semi-structured data (XML parsed via GPT-4o)
- Compiles the outputs into a clean CSV/Excel dataset covering morphology, reproduction, climate, and altitude
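
The compilation step can be pictured as merging per-source records, keyed by species, into one row per species. The following is a minimal sketch of that idea, not the project's actual code: it assumes each source has already been fetched into a dict of the form {species: {trait: value}}, and the function names `merge_trait_records` and `to_csv` and the column names are hypothetical.

```python
import csv
import io


def merge_trait_records(*sources):
    """Merge per-source {species: {trait: value}} dicts into one dict per species.

    Later sources overwrite earlier ones if they report the same trait
    (an assumption made for this sketch).
    """
    merged = {}
    for source in sources:
        for species, traits in source.items():
            merged.setdefault(species, {}).update(traits)
    return merged


def to_csv(merged, columns):
    """Render merged records as CSV text; traits a species lacks become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["species", *columns],
        restval="",             # value written for missing traits
        extrasaction="ignore",  # skip traits that are not requested columns
        lineterminator="\n",
    )
    writer.writeheader()
    for species in sorted(merged):
        writer.writerow({"species": species, **merged[species]})
    return buffer.getvalue()
```

Keeping the merge separate from the CSV rendering means a species missing from one source still gets a row, with blank cells for the traits that source would have supplied.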

2. Generalized Data Pipeline

The system then expands into a general-purpose trait extraction pipeline.

- Uses Europe PMC / PubMed Central (PMC) to query scientific literature
- Retrieves PDFs, parses them, and applies LLM-based extraction prompts to pull out traits such as diet, size, habitat, or environmental associations
- Works for any list of species and any set of traits, driven by an Excel file and a trait description mapping
- Provides a UI for easy use, supporting batch processing across taxa
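
As a rough illustration of the literature-query and prompting steps, the sketch below builds a Europe PMC search URL for one species/trait pair and assembles an extraction prompt. It is written under assumptions rather than taken from the project: the endpoint is the public Europe PMC REST search service, but the query composition (quoted phrases plus an open-access filter), the prompt wording, and the function names `build_search_url` and `build_extraction_prompt` are all hypothetical.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(species, trait, page_size=25):
    """Return a search URL for open-access papers mentioning a species and a trait.

    The query shape is an assumption for this sketch, not the project's query.
    """
    query = f'"{species}" AND "{trait}" AND OPEN_ACCESS:y'
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


def build_extraction_prompt(species, trait, description, paper_text):
    """Assemble a prompt asking an LLM to extract one trait from parsed paper text."""
    return (
        f"Species: {species}\n"
        f"Trait: {trait} ({description})\n"
        "Report the value of this trait for this species using only the text below. "
        "Answer 'not reported' if the text does not state it.\n\n"
        f"{paper_text}"
    )
```

Asking the model to answer 'not reported' when the text is silent is one way to reduce invented values; whether the project's prompts do this is not stated here.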

How to Use

Run the GUI script:

python3 Data-Compilation-Model/02_generic_data_compilation/scripts/gui.py

In the popup window:
- Upload your Excel file: first column = species, remaining columns = traits.
- Upload your trait descriptions text file: UTF-16 encoded; each line in the format trait: description.
Start extraction:
- Click Start Data Extraction.
- The system will query APIs, fetch papers, and extract trait data.
- Results will be saved to: Data-Compilation-Model/02_generic_data_compilation/results/
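
For reference, a trait descriptions file in the format described above (UTF-16, one trait: description pair per line) could be read as in the sketch below. This is a minimal illustration, not the project's loader: the function name `parse_trait_descriptions` is hypothetical, and skipping blank lines and rejecting lines without a colon are assumptions.

```python
from pathlib import Path


def parse_trait_descriptions(path):
    """Read a UTF-16 text file of 'trait: description' lines into a dict."""
    descriptions = {}
    # The "utf-16" codec uses the byte-order mark to detect endianness.
    text = Path(path).read_text(encoding="utf-16")
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are allowed and ignored
        # Split on the first colon only, so descriptions may contain colons.
        trait, separator, description = line.partition(":")
        if not separator:
            raise ValueError(f"line {line_number}: expected 'trait: description'")
        descriptions[trait.strip()] = description.strip()
    return descriptions
```

Checking the file with a helper like this before uploading it can catch encoding or formatting problems early, for example a file saved as UTF-8 instead of UTF-16.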

Repository: harpak-lab/Data-Compilation-Model
