For more on the project and its functional and technical architecture, see the technical and functional design.
This project automatically detects rooftop solar panels on Dutch homes using aerial imagery.
Key features:
- Automated Data Collection: Retrieves fresh, geo-referenced house-level images across the Netherlands
- YOLO-based Inference Service: Runs object detection per House ID and returns presence/absence plus confidence scores
- Continuous MLOps Pipeline: Retrains and redeploys the model whenever new labeled data arrive, keeping accuracy up to date
- Docker
- Docker Compose
- IDE (e.g., VSCode)
- Access the cloud environment at Portainer Dashboard
- Log in with credentials:
  - Username: admin
  - Password: 1qaz!QAZ123456
- View all services from the Portainer dashboard by clicking on the Containers tab in the left sidebar
Note: Data collection has known issues in the cloud environment. See the Known Issues section.
- Clone the repository:
  git clone https://gitlab.com/saxionnl/master-ict-se/dataops/2024-2025/group-02/02.git
  cd 02
Note: The .env file is included in the repository for ease of setup. In a production environment, this file should not be committed to version control.
- Start the services using Docker Compose:
  docker-compose up --build -d
- Access FastAPI Gateway:
  - URL: http://localhost:8000/docs
  - Use the Swagger UI to run the data collection endpoint
  - For collection via city code:
    - Find city codes here
    - Enter the Gemeentecode (municipality code) with its prefix
  - For collection via VIDs:
    - Transform the VID format from 153010000328605.0 to 0153010000328605 (add a leading 0, remove the decimal part)
- View collected data in MinIO:
  - URL: http://localhost:9001
  - Credentials: minioadmin:minioadmin
  - Navigate to the inference-data bucket to see scraped data
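The VID transformation described in the data collection steps (add a leading 0, remove the decimal part) can be sketched in Python as follows. This is a minimal illustration, not code from the repository: the function name normalize_vid is hypothetical, and it assumes the input is a digit string that may carry a decimal suffix and has not already been normalized.

```python
def normalize_vid(raw_vid: str) -> str:
    """Drop the decimal part of a VID and prepend a single leading zero."""
    # Keep only the text before the first decimal point.
    integer_part = raw_vid.strip().split(".", 1)[0]
    if not integer_part.isdigit():
        raise ValueError(f"unexpected VID value: {raw_vid!r}")
    # Assumption: the input has not been normalized yet, so one "0" is always added.
    return "0" + integer_part


print(normalize_vid("153010000328605.0"))  # 0153010000328605
```

Treating the VID as a string rather than a number avoids losing the leading zero again when the value is passed on to the endpoint.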
- IMPORTANT: Set the DEV_MODE variable to development or deployment mode
  - DEV_MODE: This variable controls the mode of operation.
    - Set DEV_MODE=True for development mode, which enables testing and debugging features.
    - Set DEV_MODE=False for deployment mode, which optimizes the system for production use.
- Access Airflow Webserver:
  - URL: http://localhost:8080
  - Credentials: admin:admin
- Dataset Preparation:
  - Upload your training dataset to MinIO:
    - Access MinIO at http://localhost:9001 (credentials: minioadmin:minioadmin)
    - Navigate to the training-data bucket
    - Create folders named images and labels if they don't exist
    - Upload images (.jpg/.png) to the images folder
    - Upload the corresponding YOLO-format labels (.txt) to the labels folder
  - The label files must follow YOLO format: one line per object with class x_center y_center width height
  - Each label file should have the same name as its corresponding image file (different extension)
  - A seeded sample dataset (images and labels) is available in the mlflow bucket under the /data/raw directory
- Execute the 1-split_traintest DAG at http://localhost:8080:
  - This DAG splits the dataset into training and testing sets
  - It creates train/val folders in MinIO with appropriate distribution of data
  - Verify the split in MinIO under the training-data/train and training-data/val folders
- Execute the 2-train_yolo DAG at http://localhost:8080:
  - This DAG trains the YOLO model using the prepared datasets
  - Training parameters are configured in the DAG file
  - Model checkpoints are saved during training
  - The best model is selected based on validation performance
  - Training metrics are logged and can be viewed in the Airflow logs
- Model Persistence:
  - The best model weights are saved to the models bucket in MinIO
  - The latest model is automatically used for inference
  - Previous model versions are retained for comparison and rollback if needed
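As a sketch of the label rules from the Dataset Preparation step, the following Python snippet checks one YOLO label line and derives the label file name for an image. The helper names are hypothetical, and the range check assumes the common YOLO convention that coordinates are normalized to the range 0 to 1, which this README does not state explicitly.

```python
from pathlib import Path


def parse_yolo_label_line(line: str) -> tuple:
    """Parse 'class x_center y_center width height' into typed values."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    box = tuple(float(value) for value in parts[1:])
    # Assumption: coordinates are normalized to [0, 1].
    if any(not 0.0 <= value <= 1.0 for value in box):
        raise ValueError(f"coordinates outside [0, 1]: {line!r}")
    return (class_id, *box)


def label_name_for(image_name: str) -> str:
    """Return the label file name: same base name as the image, .txt extension."""
    return Path(image_name).with_suffix(".txt").name


print(parse_yolo_label_line("0 0.5 0.5 0.2 0.1"))  # (0, 0.5, 0.5, 0.2, 0.1)
print(label_name_for("house_001.jpg"))             # house_001.txt
```

Running a check like this over the labels folder before triggering the split DAG may catch malformed files early, before a training run is started.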
- Access Airflow Webserver:
  - URL: http://localhost:8080
  - Credentials: admin:admin
  - Trigger the batch_detection DAG manually
- View results in MinIO:
  - URL: http://localhost:9001
  - Credentials: minioadmin:minioadmin
  - Navigate to the inference-data bucket
  - Result images are in the detection-results folder
  - Confidence results, house IDs, and image URLs are stored in house_id_results.csv
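Once house_id_results.csv has been downloaded from MinIO, it could be filtered by confidence along the following lines. This is only a sketch: the README does not document the column names, so house_id and confidence are assumptions passed in as parameters, the function name is hypothetical, and the sample rows are made up for illustration rather than taken from a real run.

```python
import csv
import io


def confident_detections(csv_file, threshold,
                         id_column="house_id", confidence_column="confidence"):
    """Return the IDs of rows whose confidence is at least `threshold`."""
    reader = csv.DictReader(csv_file)
    return [
        row[id_column]
        for row in reader
        if float(row[confidence_column]) >= threshold
    ]


# Made-up rows for illustration only; column names are assumptions.
sample = io.StringIO(
    "house_id,confidence,image_url\n"
    "A,0.91,http://example.invalid/a.jpg\n"
    "B,0.32,http://example.invalid/b.jpg\n"
)
print(confident_detections(sample, threshold=0.5))  # ['A']
```

For a real file, open it with open("house_id_results.csv", newline="") and adjust the column names to match the actual header row.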
- Cloud Data Collection: The FastAPI service in the cloud environment cannot send requests to the solarpanel_detection_service container. It is recommended to run data collection and inference processes locally.
- Portainer First-Time Access: On first local access, Portainer may fail to start. Restart the container through your Docker dashboard if needed.
The project is deployed on AWS with the following static IP configuration:
- Public IP: 3.88.102.215
- Private IP: 172.31.21.44
Monitor container status through Portainer:
- Cloud: https://3.88.102.215:9443 (admin:1qaz!QAZ123456)
- Local: http://localhost:9443
  - First-time local setup requires creating a new user and password
- MinIO: http://localhost:9001 (minioadmin:minioadmin)
- Airflow: http://localhost:8080 (admin:admin)