tranquilsoftware/SitemapXMLBuddy

Sitemap XML Buddy

A command-line utility that updates <lastmod> timestamps in sitemap.xml files, optionally deciding which entries to update from each page's <priority> and <changefreq>.

Features

  • 🤖 Intelligent Updates: Respects <changefreq> and <priority> tags to update only when pages are actually due
  • 📅 Basic Mode: Updates all lastmod timestamps to the current date (legacy behavior)
  • 💾 Creates automatic backups before making any changes
  • ✅ Validates XML structure before and after updates
  • 🚀 Simple command-line interface
  • 👀 Dry-run mode to preview changes
  • 📊 Detailed logging and summary reports

Installation

  1. Clone this repository
  2. Install Python 3.8+
  3. Install dependencies, if requirements.txt lists any (the tool itself uses only the standard library):
    pip install -r requirements.txt

Usage

# Basic mode (updates all lastmod to today)
python main.py sitemap.xml

# Intelligent mode (respects changefreq/priority)
python main.py sitemap.xml --intelligent

# Preview intelligent changes
python main.py sitemap.xml --intelligent --dry-run

# Skip backup creation
python main.py /var/www/sitemap.xml --no-backup

# Preview changes without modifying the file
python main.py sitemap.xml --dry-run

# Show help
python main.py --help
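
The flags shown above could be wired up with argparse roughly as follows. This is a hypothetical sketch rather than the actual contents of main.py: the function name build_parser and the help strings are assumptions, and only the flag names come from the examples above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Update <lastmod> timestamps in a sitemap.xml file.",
    )
    parser.add_argument("sitemap", help="path to the sitemap.xml file")
    parser.add_argument("--intelligent", action="store_true",
                        help="respect <changefreq> and <priority>")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview changes without modifying the file")
    parser.add_argument("--no-backup", action="store_true",
                        help="skip backup creation")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["sitemap.xml", "--intelligent", "--dry-run"])
print(args.sitemap, args.intelligent, args.dry_run, args.no_backup)
```

argparse exposes --dry-run and --no-backup as args.dry_run and args.no_backup, since hyphens in flag names become underscores.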

Update Modes

Basic Mode (Default)

Updates all <lastmod> elements to today's date, regardless of when they were last updated or their change frequency.
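
As a rough illustration of basic mode, the sketch below sets every <lastmod> to a given date using only the standard library. The function name set_all_lastmod is hypothetical, and the sketch assumes the sitemap uses the standard sitemaps.org namespace; the tool's own implementation may differ.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def set_all_lastmod(xml_text: str, today: date) -> str:
    """Return xml_text with every <lastmod> set to `today` (YYYY-MM-DD)."""
    # Keep the default namespace unprefixed when serializing.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.fromstring(xml_text)
    for lastmod in root.iter(f"{{{SITEMAP_NS}}}lastmod"):
        lastmod.text = today.isoformat()
    return ET.tostring(root, encoding="unicode")


sample = (
    f'<urlset xmlns="{SITEMAP_NS}">'
    "<url><loc>https://example.com/</loc><lastmod>2025-09-27</lastmod></url>"
    "</urlset>"
)
print(set_all_lastmod(sample, date(2025, 9, 29)))
```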

Intelligent Mode (--intelligent)

Uses smart logic to determine which pages actually need updating based on:

Priority-Based Rules (Take Precedence)

  • Priority 1.0: Updated daily
  • Priority 0.9: Updated every 2-3 days (randomized)
  • Priority 0.8: Updated every 3-4 days (randomized)

Changefreq-Based Rules (Fallback)

  • always/hourly: Always updated
  • daily: Updated every 1 day
  • weekly: Updated every 4-7 days (randomized)
  • monthly: Updated every 20-30 days (randomized)
  • yearly: Updated every 340-365 days (randomized)
  • never: Never updated
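
The two rule sets above can be sketched as a single decision function. This is a minimal sketch under stated assumptions, not the tool's actual code: the name is_due is hypothetical, the threshold is assumed to be drawn uniformly from the listed range on each check, "due" is assumed to mean days since last modification is at least the threshold, and pages matching no rule are assumed to be left untouched.

```python
import random
from typing import Optional

# Interval ranges in days, taken from the rules listed above.
PRIORITY_INTERVALS = {1.0: (1, 1), 0.9: (2, 3), 0.8: (3, 4)}
CHANGEFREQ_INTERVALS = {
    "daily": (1, 1),
    "weekly": (4, 7),
    "monthly": (20, 30),
    "yearly": (340, 365),
}


def is_due(priority: Optional[float], changefreq: Optional[str],
           days_since: int, rng: random.Random) -> bool:
    """Decide whether a page's <lastmod> is due for an update."""
    # Priority rules take precedence over changefreq rules.
    if priority is not None and round(priority, 1) in PRIORITY_INTERVALS:
        low, high = PRIORITY_INTERVALS[round(priority, 1)]
        return days_since >= rng.randint(low, high)
    if changefreq in ("always", "hourly"):
        return True
    if changefreq == "never":
        return False
    if changefreq in CHANGEFREQ_INTERVALS:
        low, high = CHANGEFREQ_INTERVALS[changefreq]
        return days_since >= rng.randint(low, high)
    # Assumption: pages with no matching rule are left untouched.
    return False


rng = random.Random()
print(is_due(1.0, "weekly", 1, rng))   # priority 1.0 is due after one day
print(is_due(0.8, "yearly", 2, rng))   # priority 0.8 needs at least 3 days
```

Passing the random.Random instance in explicitly keeps the function deterministic under test when a seeded generator is supplied.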

Examples

<!-- This page will be updated daily due to priority=1.0 -->
<url>
  <loc>https://example.com/</loc>
  <lastmod>2025-09-27</lastmod>
  <changefreq>weekly</changefreq>
  <priority>1.0</priority>
</url>

<!-- This page was last updated 2 days ago (assuming today is 2025-09-29) with priority=0.9, so it is eligible for an update (every 2-3 days, randomized) -->
<url>
  <loc>https://example.com/services</loc>
  <lastmod>2025-09-27</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.9</priority>
</url>

<!-- This page was last updated 2 days ago (assuming today is 2025-09-29) with priority=0.8, so it will NOT be updated yet (every 3-4 days) -->
<url>
  <loc>https://example.com/about</loc>
  <lastmod>2025-09-27</lastmod>
  <changefreq>yearly</changefreq>
  <priority>0.8</priority>
</url>

<!-- No priority rule matches priority=0.5, so this page falls back to the changefreq=daily rule (updated if last modified >1 day ago) -->
<url>
  <loc>https://example.com/blog</loc>
  <lastmod>2025-09-25</lastmod>
  <changefreq>daily</changefreq>
  <priority>0.5</priority>
</url>

How It Works

  1. Validation: Validates the input XML file structure
  2. Analysis: In intelligent mode, examines each <url> element's <priority> and <changefreq> tags
  3. Backup: Creates a timestamped backup of the original file
  4. Update Logic:
    • Calculates days since last modification from <lastmod>
    • Applies priority rules first (1.0, 0.9, 0.8)
    • Falls back to changefreq rules for other priorities
    • Only updates pages that are "overdue" based on their settings
  5. Update: Modifies <lastmod> elements that meet update criteria
  6. Verification: Validates the updated XML structure
  7. Rollback: If anything fails, restores from backup
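
The validate, back up, update, verify, and roll back sequence above might look like the following. This is a simplified sketch, not the tool's implementation: the names is_well_formed and safe_update are hypothetical, validation is reduced to a well-formedness check, and the backup naming scheme is an assumption.

```python
import shutil
import xml.etree.ElementTree as ET
from datetime import datetime
from pathlib import Path
from typing import Callable


def is_well_formed(path: Path) -> bool:
    """Minimal validation: the file can be read and parses as XML."""
    try:
        ET.parse(path)
        return True
    except (ET.ParseError, OSError):
        return False


def safe_update(path: Path, update: Callable[[Path], None]) -> bool:
    """Validate, back up, update, verify; restore the backup on failure."""
    if not is_well_formed(path):
        return False  # refuse to touch invalid input
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    # Hypothetical naming scheme for the timestamped backup.
    backup = path.with_name(f"{path.name}.{stamp}.bak")
    shutil.copy2(path, backup)
    try:
        update(path)
        if not is_well_formed(path):
            raise ValueError("updated file is not well-formed XML")
    except Exception:
        shutil.copy2(backup, path)  # rollback to the original content
        return False
    return True
```

The update step is passed in as a callable so that the same wrapper can guard either basic or intelligent mode.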

File Structure

sitemap_updater/
├── main.py              # Entry point and CLI interface
├── sitemap_updater.py   # Core SitemapUpdater class with intelligent logic
├── xml_validator.py     # XML validation functionality
├── xml_utils.py         # XML processing utilities
├── backup_manager.py    # Backup creation and restoration
├── logger_config.py     # Logging configuration
└── utils.py             # Utility functions

Date Format Support

The tool supports multiple <lastmod> date formats commonly found in sitemaps:

  • 2023-12-25 (ISO date)
  • 2023-12-25T10:30:00Z (ISO datetime with Z)
  • 2023-12-25T10:30:00+00:00 (ISO datetime with timezone)
  • 2023-12-25T10:30:00 (ISO datetime)
  • 2023-12-25 10:30:00 (Space-separated)
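
One way to accept all of the formats above is to try a list of strptime patterns in order. The sketch below is illustrative only: parse_lastmod and LASTMOD_FORMATS are hypothetical names, and the tool may parse dates differently. In this sketch the trailing "Z" is matched literally, so that case yields a naive datetime, while the "+00:00" case yields a timezone-aware one.

```python
from datetime import datetime
from typing import Optional

# One pattern per format listed above, tried in order.
LASTMOD_FORMATS = (
    "%Y-%m-%d",
    "%Y-%m-%dT%H:%M:%SZ",
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
)


def parse_lastmod(value: str) -> Optional[datetime]:
    """Try each known format in turn; return None if none match."""
    text = value.strip()
    for fmt in LASTMOD_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


print(parse_lastmod("2023-12-25T10:30:00+00:00"))
print(parse_lastmod("not a date"))
```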

Requirements

  • Python 3.8+
  • No external dependencies (uses only standard library)

Benefits of Intelligent Mode

  • SEO Friendly: Avoids signaling to search engines that all pages change daily
  • Natural Update Patterns: Randomized intervals within each range avoid predictable, bot-like update schedules
  • Realistic Updates: Only updates pages when they're actually due based on their change patterns
  • Resource Efficient: Reduces unnecessary processing for search engine crawlers
  • Content Accuracy: Keeps last-modification dates in line with each page's declared update frequency

License

Made with ❤️ by Tranquil Software
