| 1 | +# Log Parser Scripts |
| 2 | + |
| 3 | +This directory contains a script that parses GDCD log files and analyzes page changes. It distinguishes moved pages from truly new or removed pages and tracks new applied usage examples. |
| 4 | + |
| 5 | +## Files |
| 6 | + |
| 7 | +- `parse-log.go` - Main Go script that performs the log parsing and analysis |
| 8 | +- `README.md` - This documentation file |
| 9 | + |
| 10 | +## Purpose |
| 11 | + |
| 12 | +The script analyzes log files to distinguish between: |
| 13 | + |
| 14 | +1. **Moved Pages**: Pages that appear to be removed and created but are actually the same page moved to a new location within the same project |
| 15 | +2. **Maybe New Pages**: Pages that may be genuinely new additions |
| 16 | +3. **Maybe Removed Pages**: Pages that may be genuinely removed (not moved) |
| 17 | +4. **Applied Usage Examples**: New applied usage examples on maybe new pages only |
| 18 | + |
| 19 | +All results are reported with **project context** to clearly show which project each page belongs to. |
| 20 | + |
| 21 | +## Dependencies |
| 22 | + |
| 23 | +- Go |
| 24 | + |
| 25 | +## How It Works |
| 26 | + |
| 27 | +### Page Movement Detection |
| 28 | + |
| 29 | +A page is considered "moved" if **all three conditions** are met: |
| 30 | + |
| 31 | +1. **Same Project**: The removed page and created page are in the same project |
| 32 | +2. **Same Code Example Count**: The removed page and created page have the same number of code examples |
| 33 | +3. **Shared Segment**: At least one segment of the page ID (separated by `|`) is the same between the removed and created pages |
| 34 | + |
| 35 | +For example: |
| 36 | +- In project `ruby-driver`: `connect|tls` (removed, 6 code examples) → `security|tls` (created, 6 code examples) |
| 37 | +- Same project AND same code example count AND shared segment `tls` → **MOVED** |
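The three conditions can be sketched in Go roughly as follows. This is an illustrative sketch, not the code in `parse-log.go`: the `PageEvent` type and the `sharesSegment` and `isMoved` helpers are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// PageEvent is a hypothetical record for one removed or created page.
type PageEvent struct {
	Project      string
	PageID       string
	CodeExamples int
}

// sharesSegment reports whether two page IDs have at least one
// "|"-separated segment in common.
func sharesSegment(a, b string) bool {
	seen := make(map[string]bool)
	for _, seg := range strings.Split(a, "|") {
		seen[seg] = true
	}
	for _, seg := range strings.Split(b, "|") {
		if seen[seg] {
			return true
		}
	}
	return false
}

// isMoved applies the three conditions: same project, same code
// example count, and at least one shared page ID segment.
func isMoved(removed, created PageEvent) bool {
	return removed.Project == created.Project &&
		removed.CodeExamples == created.CodeExamples &&
		sharesSegment(removed.PageID, created.PageID)
}

func main() {
	removed := PageEvent{"ruby-driver", "connect|tls", 6}
	created := PageEvent{"ruby-driver", "security|tls", 6}
	fmt.Println(isMoved(removed, created)) // true
}
```

With 7 code examples on the created page, or with page IDs such as `crud|update` and `write|upsert`, `isMoved` returns false and both pages would fall into the "maybe" categories.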
| 38 | + |
| 39 | +### Applied Usage Examples Filtering |
| 40 | + |
| 41 | +Applied usage examples are only counted for pages classified as "maybe new", not for moved pages. This prevents double-counting when pages are reorganized. |
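A minimal sketch of this filtering, assuming the counts are keyed by project and page ID. The `pageKey` type and `countNewAppliedUsage` function are hypothetical names for illustration and may not match how `parse-log.go` stores its data.

```go
package main

import "fmt"

// pageKey identifies a page within a project (hypothetical type).
type pageKey struct {
	Project string
	PageID  string
}

// countNewAppliedUsage sums applied usage examples only for pages in
// the maybeNew set. Moved pages are not in that set, so their applied
// usage examples are skipped.
func countNewAppliedUsage(added map[pageKey]int, maybeNew map[pageKey]bool) int {
	total := 0
	for key, n := range added {
		if maybeNew[key] {
			total += n
		}
	}
	return total
}

func main() {
	added := map[pageKey]int{
		{"ruby-driver", "atlas-search"}: 1,
		{"ruby-driver", "security|tls"}: 2, // destination of a moved page
	}
	maybeNew := map[pageKey]bool{
		{"ruby-driver", "atlas-search"}: true,
	}
	fmt.Println(countNewAppliedUsage(added, maybeNew)) // 1
}
```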
| 42 | + |
| 43 | +### Maybe New and Maybe Removed Pages |
| 44 | + |
| 45 | +Some conditions may cause moved pages to not meet our criteria for "moved" pages: |
| 46 | + |
| 47 | +- Different number of code examples |
| 48 | + - Example: `connect|tls` is a "maybe removed" page and `security|tls` is a "maybe new" page but the removed page has |
| 49 | + 6 code examples and the created page has 7 code examples |
| 50 | +- No shared segments in page IDs |
| 51 | + - Example: `crud|update` is a "maybe removed" page and `write|upsert` is a "maybe new" page. Even if they have the same |
| 52 | + number of code examples, they share no segments in their page IDs so we can't programmatically detect that they're |
| 53 | + the same |
| 54 | + |
| 55 | +Because of these conditions, we can only say that a page is "maybe new" or "maybe removed" and not "moved". A human must |
| 56 | +manually review the "maybe new" and "maybe removed" results to determine if the page is truly new or removed. If it's |
| 57 | +moved, we must manually adjust the count of new applied usage examples to omit the applied usage examples from the |
| 58 | +"maybe new" but actually moved page. |
| 59 | + |
| 60 | +## Usage |
| 61 | + |
| 62 | +**Important**: You must be in the scripts directory to run the Go script directly: |
| 63 | + |
| 64 | +```bash |
| 65 | +# Navigate to the scripts directory first |
| 66 | +cd /Your/Local/Filepath/code-example-tooling/audit/gdcd/scripts |
| 67 | + |
| 68 | +# Then run the Go script |
| 69 | +go run parse-log.go ../logs/2025-09-24-18-01-30-app.log |
| 70 | +go run parse-log.go /absolute/path/to/your/log/file.log |
| 71 | +``` |
| 72 | + |
| 73 | +## Output Format |
| 74 | + |
| 75 | +The script produces four sections: |
| 76 | + |
| 77 | +### 1. MOVED PAGES |
| 78 | +``` |
| 79 | +=== MOVED PAGES === |
| 80 | +MOVED [ruby-driver]: connect|tls -> security|tls (6 code examples) |
| 81 | +MOVED [ruby-driver]: write|bulk-write -> crud|bulk-write (9 code examples) |
| 82 | +MOVED [database-tools]: installation|verify -> verify (0 code examples) |
| 83 | +``` |
| 84 | + |
| 85 | +### 2. MAYBE NEW PAGES |
| 86 | +``` |
| 87 | +=== MAYBE NEW PAGES === |
| 88 | +NEW [ruby-driver]: atlas-search (2 code examples) |
| 89 | +NEW [node]: integrations|prisma (4 code examples) |
| 90 | +NEW [atlas-architecture]: solutions-library|rag-technical-documents (6 code examples) |
| 91 | +``` |
| 92 | + |
| 93 | +### 3. MAYBE REMOVED PAGES |
| 94 | +``` |
| 95 | +=== MAYBE REMOVED PAGES === |
| 96 | +REMOVED [ruby-driver]: common-errors (4 code examples) |
| 97 | +REMOVED [cpp-driver]: indexes|work-with-indexes (4 code examples) |
| 98 | +REMOVED [docs]: tutorial|install-mongodb-on-windows-unattended (11 code examples) |
| 99 | +``` |
| 100 | + |
| 101 | +### 4. NEW APPLIED USAGE EXAMPLES |
| 102 | +``` |
| 103 | +=== NEW APPLIED USAGE EXAMPLES === |
| 104 | +APPLIED USAGE [ruby-driver]: atlas-search (1 applied usage examples) |
| 105 | +APPLIED USAGE [node]: integrations|prisma (1 applied usage examples) |
| 106 | +APPLIED USAGE [pymongo]: data-formats|custom-types|type-codecs (1 applied usage examples) |
| 107 | + |
| 108 | +Total new applied usage examples: 17 |
| 109 | +``` |
| 110 | + |
| 111 | +## Log Format Requirements |
| 112 | + |
| 113 | +The script expects log lines in the following formats: |
| 114 | + |
| 115 | +- Project context: `Project changes for <project-name>` |
| 116 | +- Page events: `Page removed: Page ID: <page-id>` or `Page created: Page ID: <page-id>` |
| 117 | +- Code examples: `Code example removed: Page ID: <page-id>, <count> code examples removed` |
| 118 | +- Applied usage: `Applied usage example added: Page ID: <page-id>, <count> new applied usage examples added` |
| 119 | + |
| 120 | +**Important**: The script tracks the current project context from "Project changes for" lines and associates all subsequent page events with that project until a new project context is encountered. |
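The project-context tracking described above can be sketched as follows. This is an assumption-laden illustration rather than the actual parser: the regular expressions, the `event` type, and `parseEvents` are hypothetical, and the sketch assumes project names and page IDs contain no whitespace and that any log prefix (such as a timestamp) can be ignored.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for two of the documented line formats. They are
// unanchored so that a leading timestamp or log level does not matter.
var (
	projectRe = regexp.MustCompile(`Project changes for (\S+)`)
	pageRe    = regexp.MustCompile(`Page (removed|created): Page ID: ([^\s,]+)`)
)

// event is one page event tagged with the project it belongs to.
type event struct {
	Project string
	Kind    string // "removed" or "created"
	PageID  string
}

// parseEvents walks the log line by line, remembers the most recent
// project context, and attaches it to every page event that follows.
func parseEvents(input string) []event {
	var events []event
	project := ""
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := scanner.Text()
		if m := projectRe.FindStringSubmatch(line); m != nil {
			project = m[1]
			continue
		}
		if m := pageRe.FindStringSubmatch(line); m != nil {
			events = append(events, event{project, m[1], m[2]})
		}
	}
	return events
}

func main() {
	input := "Project changes for ruby-driver\n" +
		"Page removed: Page ID: connect|tls\n" +
		"Project changes for node\n" +
		"Page created: Page ID: integrations|prisma\n"
	for _, e := range parseEvents(input) {
		fmt.Printf("%s [%s]: %s\n", e.Kind, e.Project, e.PageID)
	}
}
```

The code example and applied usage lines could be handled the same way with additional patterns that also capture `<count>`.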