A powerful command-line utility for updating lastmod timestamps in sitemap.xml files with intelligent logic based on page priority and change frequency.
- 🤖 Intelligent Updates: Respects <changefreq> and <priority> tags to update only when pages are actually due
- 📅 Basic Mode: Updates all lastmod timestamps to the current date (legacy behavior)
- 💾 Creates automatic backups before making any changes
- ✅ Validates XML structure before and after updates
- 🚀 Simple command-line interface
- 👀 Dry-run mode to preview changes
- 📊 Detailed logging and summary reports
- Clone this repository
- Install Python 3.8+
- Install dependencies:
pip install -r requirements.txt
# Basic mode (updates all lastmod to today)
python main.py sitemap.xml
# Intelligent mode (respects changefreq/priority)
python main.py sitemap.xml --intelligent
# Preview intelligent changes
python main.py sitemap.xml --intelligent --dry-run
# Skip backup creation
python main.py /var/www/sitemap.xml --no-backup
# Preview changes without modifying the file
python main.py sitemap.xml --dry-run
# Show help
python main.py --help
Updates all <lastmod> elements to today's date, regardless of when they were last updated or their change frequency.
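As a rough illustration of what basic mode does, the sketch below rewrites every <lastmod> element with a given date using only the standard library. The function name `set_all_lastmod` is hypothetical and is not taken from the tool's source; it also assumes the sitemap uses the standard sitemaps.org namespace.

```python
import datetime
import xml.etree.ElementTree as ET

# Standard sitemap namespace (assumed to be the one used by the input file).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def set_all_lastmod(xml_text: str, today: datetime.date) -> str:
    """Return the sitemap XML with every <lastmod> set to `today` (hypothetical helper)."""
    # Register the namespace as the default so the output has no "ns0:" prefixes.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    for lastmod in root.iter(f"{{{SITEMAP_NS}}}lastmod"):
        lastmod.text = today.isoformat()
    return ET.tostring(root, encoding="unicode")
```

Passing the date in as a parameter, rather than calling `datetime.date.today()` inside the function, keeps the sketch easy to test.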
Uses smart logic to determine which pages actually need updating based on:
- Priority 1.0: Updated daily
- Priority 0.9: Updated every 2-3 days (randomized)
- Priority 0.8: Updated every 3-4 days (randomized)
- always / hourly: Always updated
- daily: Updated every 1 day
- weekly: Updated every 4-7 days (randomized)
- monthly: Updated every 20-30 days (randomized)
- yearly: Updated every 340-365 days (randomized)
- never: Never updated
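The rules above can be sketched as a pair of lookup tables plus a random draw within each range. This is a minimal sketch, not the tool's actual code: the names `PRIORITY_RANGES`, `CHANGEFREQ_RANGES`, `update_interval_days`, and `is_due` are hypothetical, and treating an unknown or missing changefreq the same as never is an assumption.

```python
import random
from typing import Optional

# Day ranges (inclusive) taken from the rules listed above.
PRIORITY_RANGES = {1.0: (1, 1), 0.9: (2, 3), 0.8: (3, 4)}
CHANGEFREQ_RANGES = {
    "always": (0, 0),
    "hourly": (0, 0),
    "daily": (1, 1),
    "weekly": (4, 7),
    "monthly": (20, 30),
    "yearly": (340, 365),
}


def update_interval_days(priority: float, changefreq: str, rng=random) -> Optional[int]:
    """Return how many days must pass before an update, or None for 'never'."""
    key = round(priority, 1)
    if key in PRIORITY_RANGES:
        # Priority rules take precedence over changefreq.
        low, high = PRIORITY_RANGES[key]
    elif changefreq in CHANGEFREQ_RANGES:
        low, high = CHANGEFREQ_RANGES[changefreq]
    else:
        # "never", or an unrecognized value (assumption: treat both as never).
        return None
    return rng.randint(low, high)


def is_due(days_since_lastmod: int, priority: float, changefreq: str, rng=random) -> bool:
    """Return True when the page is overdue under the rules above."""
    interval = update_interval_days(priority, changefreq, rng)
    return interval is not None and days_since_lastmod >= interval
```

Because the interval is drawn at random, a page whose age falls inside a range (for example a 0.9-priority page last modified 2 days ago) may or may not be updated on a given run.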
<!-- This page will be updated daily due to priority=1.0 -->
<url>
<loc>https://example.com/</loc>
<lastmod>2025-09-27</lastmod>
<changefreq>weekly</changefreq>
<priority>1.0</priority>
</url>
<!-- This page updated 2 days ago with priority=0.9 will be updated (every 2-3 days) -->
<url>
<loc>https://example.com/services</loc>
<lastmod>2025-09-27</lastmod>
<changefreq>monthly</changefreq>
<priority>0.9</priority>
</url>
<!-- This page updated 2 days ago with priority=0.8 will NOT be updated yet (every 3-4 days) -->
<url>
<loc>https://example.com/about</loc>
<lastmod>2025-09-27</lastmod>
<changefreq>yearly</changefreq>
<priority>0.8</priority>
</url>
<!-- This page will use changefreq=daily rule (updated if last modified >1 day ago) -->
<url>
<loc>https://example.com/blog</loc>
<lastmod>2025-09-25</lastmod>
<changefreq>daily</changefreq>
<priority>0.5</priority>
</url>
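To make the example entries concrete, here is a small sketch of how such entries could be read and their age computed with the standard library. The helper `read_entries` is hypothetical. It assumes the <url> elements sit inside a <urlset> root with the standard sitemaps.org namespace, and that every entry carries a date-only <lastmod>.

```python
import datetime
import xml.etree.ElementTree as ET

# Prefix map for the standard sitemap namespace (assumed).
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_entries(xml_text: str, today: datetime.date) -> list:
    """Return (loc, priority, changefreq, days_since_lastmod) for each <url>."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        changefreq = url.findtext("sm:changefreq", default="", namespaces=NS)
        # 0.5 is the default priority defined by the sitemap protocol.
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=NS))
        lastmod = datetime.date.fromisoformat(url.findtext("sm:lastmod", namespaces=NS))
        entries.append((loc, priority, changefreq, (today - lastmod).days))
    return entries
```

With `today` set to 2025-09-29, the entries dated 2025-09-27 come out as 2 days old, which matches the "updated 2 days ago" wording in the comments.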
- Validation: Validates the input XML file structure
- Analysis: In intelligent mode, examines each <url> element's <priority> and <changefreq> tags
- Backup: Creates a timestamped backup of the original file
- Update Logic:
  - Calculates days since last modification from <lastmod>
  - Applies priority rules first (1.0, 0.9, 0.8)
  - Falls back to changefreq rules for other priorities
  - Only updates pages that are "overdue" based on their settings
- Update: Modifies <lastmod> elements that meet update criteria
- Verification: Validates the updated XML structure
- Rollback: If anything fails, restores from backup
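The validate, backup, update, verify, and rollback sequence above might look roughly like the following. This is a sketch under assumptions, not the tool's implementation: `update_with_rollback` and the backup naming scheme are hypothetical, and "validation" here only checks that the XML is well-formed.

```python
import datetime
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def update_with_rollback(path: Path, transform) -> Path:
    """Apply transform(text) -> text to the file; restore the backup on failure.

    Returns the path of the backup that was created.
    """
    # Validation: raises ET.ParseError if the input is not well-formed XML.
    ET.parse(str(path))

    # Backup: timestamped copy next to the original (naming is an assumption).
    stamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    shutil.copy2(path, backup)

    try:
        # Update: write the transformed content.
        updated = transform(path.read_text(encoding="utf-8"))
        path.write_text(updated, encoding="utf-8")
        # Verification: the result must still parse.
        ET.parse(str(path))
    except Exception:
        # Rollback: put the original content back, then re-raise.
        shutil.copy2(backup, path)
        raise
    return backup
```

Re-raising after the rollback lets the caller log the failure and exit with a non-zero status while the file on disk stays in its original state.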
sitemap_updater/
├── main.py # Entry point and CLI interface
├── sitemap_updater.py # Core SitemapUpdater class with intelligent logic
├── xml_validator.py # XML validation functionality
├── xml_utils.py # XML processing utilities
├── backup_manager.py # Backup creation and restoration
├── logger_config.py # Logging configuration
└── utils.py # Utility functions
The tool supports multiple <lastmod> date formats commonly found in sitemaps:
- 2023-12-25 (ISO date)
- 2023-12-25T10:30:00Z (ISO datetime with Z)
- 2023-12-25T10:30:00+00:00 (ISO datetime with timezone)
- 2023-12-25T10:30:00 (ISO datetime)
- 2023-12-25 10:30:00 (Space-separated)
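One straightforward way to accept all of the formats listed above is to try a series of `strptime` patterns in turn. The sketch below illustrates that approach. The name `parse_lastmod` and the pattern list are assumptions and may differ from how the tool parses dates internally.

```python
import datetime

# One strptime pattern per format listed above, tried in order.
LASTMOD_FORMATS = (
    "%Y-%m-%d",              # 2023-12-25
    "%Y-%m-%dT%H:%M:%SZ",    # 2023-12-25T10:30:00Z
    "%Y-%m-%dT%H:%M:%S%z",   # 2023-12-25T10:30:00+00:00
    "%Y-%m-%dT%H:%M:%S",     # 2023-12-25T10:30:00
    "%Y-%m-%d %H:%M:%S",     # 2023-12-25 10:30:00
)


def parse_lastmod(value: str) -> datetime.date:
    """Return the calendar date of a <lastmod> value, or raise ValueError."""
    text = value.strip()
    for fmt in LASTMOD_FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"Unsupported lastmod format: {value!r}")
```

Note that the returned date is the one written in the value itself. The sketch does not convert timestamps with a non-zero offset to UTC before dropping the time.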
- Python 3.8+
- No external dependencies (uses only standard library)
- SEO Friendly: Avoids signaling to search engines that all pages change daily
- Natural Update Patterns: Randomized intervals prevent predictable, bot-like update schedules
- Realistic Updates: Only updates pages when they're actually due based on their change patterns
- Resource Efficient: Reduces unnecessary processing for search engine crawlers
- Content Accuracy: Maintains realistic last-modification dates based on actual content update frequency
- Unpredictable Timing: Random intervals within ranges make updates appear more organic to search engines
Made with ❤️ by Tranquil Software