A deep learning solution for automated sclera segmentation in eye images using Mask R-CNN. Optimized for large datasets (1600+ images) with memory-efficient processing and training resume capabilities.
- Mask R-CNN architecture with a ResNet50 backbone
- On-demand image loading for efficient memory usage
- Memory-optimized training on datasets of 1600+ images
- Automatic mini-mask generation for reduced GPU memory requirements
- Resume training capability from any checkpoint
- Detailed progress tracking and error handling
- Dynamic configuration based on dataset size
- Python 3.7+
- TensorFlow 2.x
- CUDA and cuDNN (for GPU acceleration)
- 6GB+ GPU VRAM (tested on RTX 3060)
- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/sclera-segmentation.git
  cd sclera-segmentation
  ```

- Create and activate a virtual environment:

  ```bash
  # Using conda
  conda create -n sclera-gpu-final python=3.7
  conda activate sclera-gpu-final

  # OR using venv
  python -m venv sclera-env
  # On Windows
  .\sclera-env\Scripts\activate
  # On Linux/Mac
  source sclera-env/bin/activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the COCO pre-trained weights:

  ```bash
  # Option 1: Direct download
  wget https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5

  # Option 2: Manual download
  # Download from: https://github.com/matterport/Mask_RCNN/releases/download/v2.0/mask_rcnn_coco.h5
  # Place it in the root directory of the project
  ```
- OS: Windows 10/11, Ubuntu 20.04 LTS
- Python: 3.7 (recommended, tested)
- CUDA: 11.0
- cuDNN: 8.0
- GPU: NVIDIA RTX 3060 with 6GB VRAM
- TensorFlow: 2.4.1
- Keras: 2.4.3
- Minimum: NVIDIA GPU with 6GB VRAM (GTX 1060 or better)
- Recommended: NVIDIA GPU with 8GB+ VRAM (RTX 2070 or better)
- CPU-only: Possible but extremely slow (not recommended)
- RAM: 16GB minimum, 32GB recommended for full dataset
- These specific versions are thoroughly tested with Python 3.7
- Compatible with CUDA 11.0 and cuDNN 8.0
- Verified working with RTX 3060 (6GB) GPU
- For newer GPUs (RTX 30 series, RTX 40 series), you may need more recent TensorFlow versions
- Apple M1/M2 chips require TensorFlow-macos and different setup instructions
- Memory optimization is critical for GPUs with less than 8GB VRAM
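As a quick sanity check of the versions above, a minimal snippet like the following (illustrative only, not one of the repository scripts) confirms the installed TensorFlow version and whether a GPU is visible:

```python
# Quick environment check (illustrative; not part of the repository scripts)
import tensorflow as tf

print("TensorFlow:", tf.__version__)                       # expected: 2.4.1
print("Built with CUDA:", tf.test.is_built_with_cuda())    # expected: True
print("GPUs visible:", tf.config.list_physical_devices("GPU"))
```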
The code expects your dataset to be organized in the following structure:
```
sclera_dataset/
├── train/
│   ├── images/
│   │   ├── img1.jpg
│   │   ├── img2.jpg
│   │   └── ...
│   └── masks/
│       ├── img1.jpg
│       ├── img2.jpg
│       └── ...
└── val/
    ├── images/
    │   ├── img3.jpg
    │   ├── img4.jpg
    │   └── ...
    └── masks/
        ├── img3.jpg
        ├── img4.jpg
        └── ...
```
Notes on masks:
- Masks should be binary (white for sclera, black for background)
- Masks must have the same filename as their corresponding images
- Supported formats: JPG, PNG, JPEG (case insensitive)
- Image Dimensions: The model is configured for 512×512 pixel images. Resize your images if needed.
- Train/Val Split: Use approximately 80% for training and 20% for validation.
- Large Datasets: For 1600+ images, the script is already optimized with memory-efficient loading.
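Before training, a short helper like the sketch below (hypothetical; the function and paths are illustrative, and it assumes NumPy and Pillow are installed) can confirm that every image has a matching mask and that the masks are effectively binary:

```python
# Hypothetical pre-training check for the dataset layout described above
import os
import numpy as np
from PIL import Image

def check_split(split_dir):
    images = sorted(os.listdir(os.path.join(split_dir, "images")))
    masks = set(os.listdir(os.path.join(split_dir, "masks")))
    for name in images:
        if name not in masks:
            print(f"Missing mask for {name}")
            continue
        mask = np.array(Image.open(os.path.join(split_dir, "masks", name)).convert("L"))
        # Binary masks should contain only background (0) and sclera (255) values
        if not set(np.unique(mask).tolist()) <= {0, 255}:
            print(f"Mask {name} is not strictly binary")

for split in ("train", "val"):
    check_split(os.path.join("sclera_dataset", split))
```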
- For standard training with the recommended parameters (30 epochs for the heads, 100 for all layers):

  ```bash
  python train_mask_rcnn.py --epochs_heads 30 --epochs_all 100
  ```

- To train on a subset of the data (useful for testing):

  ```bash
  # Train on 30% of the dataset
  python train_mask_rcnn.py --subset 0.3 --epochs_heads 10 --epochs_all 20
  ```
If training was interrupted or you want to continue from a checkpoint, you have two options:

1. Use the resume_training.py script (adjust the parameters to your needs):

   ```bash
   python resume_training.py --log_dir logs/sclera_20250422_2324 --subset 0.3 --epochs_heads 30 --epochs_all 100
   ```

2. Use the train_mask_rcnn.py script (adjust the parameters to your needs) and specify the path to the checkpoint file you want to resume from:

   ```bash
   python train_mask_rcnn.py --resume logs/sclera_20250422_2324/checkpoint_15.h5 --subset 0.3 --epochs_heads 30 --epochs_all 100
   ```

In either case, the script will automatically detect which phase of training to resume and continue appropriately.
| Argument | Default | Description |
|---|---|---|
| `--epochs_heads` | 30 | Number of epochs to train the head layers |
| `--epochs_all` | 100 | Number of epochs to train all layers |
| `--subset` | 1.0 | Fraction of the dataset to use (0.0-1.0) |
| `--resume` | None | Path to a checkpoint file for resuming training |
| `--resume_epoch` | Auto | Epoch to resume from (optional; detected from the filename) |
The training proceeds in two phases:
- Heads Training: Trains only the network heads for `epochs_heads` epochs.
- All Layers: Trains the entire network for `epochs_all` epochs.
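For reference, a minimal sketch of this two-phase schedule using the standard Matterport Mask R-CNN API (assumed here; `config`, `dataset_train`, and `dataset_val` stand for the ScleraConfig and dataset objects the script prepares):

```python
# Minimal sketch of the two-phase schedule (Matterport Mask R-CNN API assumed)
from mrcnn import model as modellib

model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# Phase 1: train only the randomly initialized head layers
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=epochs_heads, layers="heads")

# Phase 2: fine-tune the entire network at a lower learning rate
model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE / 10,
            epochs=epochs_all, layers="all")
```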
During training, the script will:
- Save checkpoints every 5 epochs
- Log training metrics to a CSV file
- Display progress in the console
- Automatically handle large datasets with memory optimization
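A hedged sketch of how the periodic checkpointing and CSV logging could be wired up with standard tf.keras callbacks (the class, file names, and log directory below are illustrative; the actual script may implement this differently):

```python
# Illustrative callbacks: save weights every 5 epochs and log metrics to CSV
import os
import tensorflow as tf

class PeriodicCheckpoint(tf.keras.callbacks.Callback):
    """Save model weights every `period` epochs into the run's log directory."""
    def __init__(self, log_dir, period=5):
        super().__init__()
        self.log_dir, self.period = log_dir, period

    def on_epoch_end(self, epoch, logs=None):
        if (epoch + 1) % self.period == 0:
            self.model.save_weights(
                os.path.join(self.log_dir, f"checkpoint_{epoch + 1}.h5"))

log_dir = "logs/sclera_20250422_2324"  # example run directory
callbacks = [
    PeriodicCheckpoint(log_dir, period=5),
    tf.keras.callbacks.CSVLogger(os.path.join(log_dir, "training_log.csv")),
]
```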
The script includes several optimizations for large datasets:
- On-demand Loading: Images are only loaded when needed rather than all at once
- Mini-masks: Uses smaller in-memory masks (128×128) to reduce GPU memory usage
- Dynamic Steps: Automatically calculates optimal steps per epoch based on dataset size
- Progress Tracking: Shows progress when processing large batches of images
To further optimize memory usage for large datasets (a configuration sketch follows this list):
- Adjust Image Size: If needed, reduce `IMAGE_MIN_DIM` and `IMAGE_MAX_DIM` in `ScleraConfig`
- Reduce Mini-mask Size: For extreme memory constraints, reduce `MINI_MASK_SHAPE` to (64, 64)
- Use an Image Generator: If you experience OOM errors, modify the data loading to use `tf.data.Dataset`
- Clear Cache: Between training sessions, clear the CUDA cache and restart Python
- Monitor Memory: Use `nvidia-smi` during training to monitor VRAM usage
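These knobs live on the training configuration; below is a sketch of the relevant `ScleraConfig` fields, assuming the standard Matterport `Config` base class (the 512 image size and 128×128 mini-mask shape come from this README; `IMAGES_PER_GPU = 1` is an assumed typical value for 6GB cards):

```python
# Sketch of the memory-related configuration fields (Matterport Config assumed)
from mrcnn.config import Config

class ScleraConfig(Config):
    NAME = "sclera"
    NUM_CLASSES = 1 + 1            # background + sclera
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1             # lower the batch size first if you hit OOM
    IMAGE_MIN_DIM = 512            # reduce together with IMAGE_MAX_DIM to save VRAM
    IMAGE_MAX_DIM = 512
    USE_MINI_MASK = True           # keep masks at reduced resolution in memory
    MINI_MASK_SHAPE = (128, 128)   # drop to (64, 64) under extreme constraints
```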
To evaluate a trained model and perform inference, use the provided inference script:
python inference.py --weights logs/sclera_20250422_2324/checkpoint_130.h5 --image test_images/eye1.jpg
This will output the segmentation mask and overlay it on the original image.
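Under the hood, inference with the Matterport API typically follows the pattern sketched below (assumed, not copied from inference.py; the checkpoint and image paths are the ones used in the command above):

```python
# Illustrative inference flow (Matterport Mask R-CNN API assumed)
import skimage.io
from mrcnn import model as modellib

class InferenceConfig(ScleraConfig):   # ScleraConfig as defined for training
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(),
                          model_dir="logs")
model.load_weights("logs/sclera_20250422_2324/checkpoint_130.h5", by_name=True)

image = skimage.io.imread("test_images/eye1.jpg")
results = model.detect([image], verbose=0)
masks = results[0]["masks"]   # boolean array of shape (H, W, num_instances)
```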
This project includes a comprehensive model evaluation tool to determine which checkpoint performs best on your test data.
To find the best-performing model among all checkpoints, use the `evaluate_models.py` script:
python evaluate_models.py --checkpoint_dir logs/sclera_20250423_1100 --test_dir sclera_dataset/test
| Argument | Description |
|---|---|
| `--checkpoint_dir` | Directory containing model checkpoints (.h5 files) |
| `--test_dir` | Directory containing test images and masks |
| `--output_dir` | Directory to save evaluation results (default: `evaluation_results`) |
| `--top_n` | Number of top models to show in detail (default: 3) |
The script expects your test data to be organized either as:
```
sclera_dataset/test/
├── images/
│   ├── img1.jpg
│   ├── img2.jpg
│   └── ...
└── masks/
    ├── img1.jpg
    ├── img2.jpg
    └── ...
```
Or directly in the test folder with masks in a parallel directory:
```
sclera_dataset/
├── test/
│   ├── img1.jpg
│   ├── img2.jpg
│   └── ...
└── masks/
    ├── img1.jpg
    ├── img2.jpg
    └── ...
```
The evaluation calculates several metrics for each model:
- IoU (Intersection over Union): Measures overlap between predicted and ground truth masks
- Precision: Ratio of correctly predicted sclera pixels to all predicted sclera pixels
- Recall: Ratio of correctly predicted sclera pixels to all actual sclera pixels
- F1 Score: Harmonic mean of precision and recall
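All four metrics reduce to simple pixel counts over the binary masks; a minimal NumPy sketch (illustrative, not the evaluation script itself):

```python
# Pixel-wise metrics for a predicted vs. ground-truth binary mask (NumPy sketch)
import numpy as np

def mask_metrics(pred, gt, eps=1e-8):
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # correctly predicted sclera pixels
    fp = np.logical_and(pred, ~gt).sum()   # predicted sclera, actually background
    fn = np.logical_and(~pred, gt).sum()   # missed sclera pixels
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}
```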
The script generates the following outputs:
- CSV Report: Complete metrics for all models (`evaluation_results/model_evaluation_results.csv`)
- Performance Graphs: Visual comparison of models across epochs (`evaluation_results/model_comparison.png`)
- Visual Examples: Sample predictions for each model (`evaluation_results/visualizations/checkpoint_XX/`)
- Terminal Output: Summary of the best models with their metrics
The evaluation ranks models by IoU score, which measures how well the predicted sclera masks overlap with the ground truth. The model with the highest IoU score typically provides the most accurate segmentation.
Example terminal output:
```
=== BEST MODEL ===
Model: checkpoint_85.h5
Average IoU: 0.9278
Average F1 Score: 0.9624
Full path: logs/sclera_20250423_1100/checkpoint_85.h5
```
Once you've identified the best model, you can use it for inference on new images.
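For example, using the best checkpoint reported above with the inference script described earlier:

```bash
python inference.py --weights logs/sclera_20250423_1100/checkpoint_85.h5 --image test_images/eye1.jpg
```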
- Out of Memory Errors:
  - Try reducing the batch size by modifying `IMAGES_PER_GPU` in `ScleraConfig`
  - Use `--subset` to train on a portion of your data
  - Ensure GPU memory growth is enabled (already in the code; a minimal sketch follows this list)
- Training Interrupted:
  - Use `--resume` to continue from the last checkpoint
  - If the error persists, check the detailed error message and traceback
- Mask Issues:
  - Ensure masks are binary (not grayscale)
  - Verify that image and mask filenames match exactly
  - Check that mask dimensions match the corresponding images
- Slow Training:
  - Verify that the GPU is being utilized (`nvidia-smi`)
  - Check whether GPU memory is sufficient for your batch size
  - Consider using data augmentation for better results with fewer epochs
- TensorFlow/CUDA Compatibility:
  - For newer GPUs, ensure you're using compatible TensorFlow/CUDA versions
  - Consult NVIDIA's compatibility matrix for your specific GPU model
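The memory-growth setting mentioned under Out of Memory Errors corresponds to the standard TensorFlow call below (a minimal sketch of what the training script is described as already doing):

```python
# Enable GPU memory growth so TensorFlow allocates VRAM incrementally
import tensorflow as tf

for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```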
- Checkpoints are saved every 5 epochs in the `logs/sclera_timestamp/` directory
- Training logs are saved in CSV format for easy analysis
- When training is interrupted (Ctrl+C), the script will display the exact command to resume
This project includes several utility scripts to help with setup, monitoring, and training management:
- verify_gpu.py: Checks whether TensorFlow can detect and use your GPU properly

  ```bash
  python verify_gpu.py
  # Output: Shows detected GPUs, CUDA version, and TensorFlow configuration
  ```

- check_mrcnn.py: Verifies the Mask R-CNN installation and dependency compatibility

  ```bash
  python check_mrcnn.py
  # Output: Confirms whether all Mask R-CNN dependencies are correctly installed
  ```

- prepare_dataset.py: Prepares and organizes your sclera images for training

  ```bash
  # Create a train/validation split from raw images
  python prepare_dataset.py --input_dir raw_images --output_dir sclera_dataset --val_split 0.2

  # Resize images to the model dimensions
  python prepare_dataset.py --resize 512 --input_dir original_dataset --output_dir sclera_dataset
  ```

- train_mask_rcnn.py: Main training script for GPU training (detailed above)

- train_cpu.py: Alternative training script optimized for CPU-only environments

  ```bash
  python train_cpu.py --epochs_heads 10 --epochs_all 30
  # Note: Training on CPU is ~10-20x slower than on GPU
  ```

- resume_training.py: Helper script to easily resume training from the latest checkpoint

  ```bash
  # Automatically finds the latest checkpoint and resumes training
  python resume_training.py --log_dir logs/sclera_20250422_2324
  ```

- monitor_gpu.py: Real-time monitoring of GPU utilization during training (a monitoring sketch follows this list)

  ```bash
  # Run in a separate terminal while training is in progress
  python monitor_gpu.py

  # Save monitoring data to a CSV file
  python monitor_gpu.py --output gpu_stats.csv --interval 5
  ```
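A minimal sketch of the kind of polling such a monitor performs, built on the standard `nvidia-smi` query interface (illustrative; the actual monitor_gpu.py may differ):

```python
# Poll GPU utilization and memory via nvidia-smi (illustrative sketch)
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=timestamp,utilization.gpu,memory.used,memory.total",
         "--format=csv,noheader"]

while True:
    print(subprocess.check_output(QUERY, text=True).strip())
    time.sleep(5)   # comparable to the --interval option shown above
```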
These utility scripts make it easier to set up your environment, prepare your dataset, and manage the training process. They are especially helpful when working with large datasets or when troubleshooting GPU-related issues.
If you use this project in your research, please cite it as follows:
```bibtex
@software{sclera-maskrcnn,
  author = {Akbari Saeed, Taher},
  title  = {Sclera Segmentation with Mask R-CNN},
  year   = {2025},
  url    = {https://github.com/tayden1990/Sclera-Segmentation}
}
```
- Based on Matterport's Mask R-CNN implementation
- Optimized for large sclera image datasets
This project is licensed under the MIT License. See the LICENSE file for details.
Taher Akbari Saeed
Postgraduate Student in Hematology and Blood Transfusion
Department of Oncology, Hematology, and Radiotherapy
Institute of Postgraduate Education,
Pirogov Russian National Research Medical University (RNRMU), Russia
- Email: [email protected]
- GitHub: tayden1990
- Telegram: tayden2023
- ORCID: 0000-0002-9517-9773
Feel free to reach out for any questions or collaboration opportunities!
Last Updated: 2025-04-22 23:29:24 UTC
Author: tayden1990