Commit 1f8e850
Merge pull request #80 from dacharyc/add-log-parsing-script
Add a script to parse GDCD logs and try to identify moved pages
2 parents 1eca8da + 43405f5 commit 1f8e850

3 files changed: +416 -0 lines changed
audit/gdcd/README.md

Lines changed: 16 additions & 0 deletions
@@ -123,6 +123,22 @@ The progress bar should immediately output to console and continue to display pr
projects are parsed. Depending on your machine and the number of projects specified, this can be a
long-running program (~1-2 hours).

## Reviewing logs

GDCD outputs logs to the local device's `logs` directory. The logs contain information about project events, including:

- New pages
- Removed pages
- Updated pages (where updates refer to changes to code examples or page keywords)
- Code example count changes
- New applied usage examples
- Project summaries and any issues with the data

GDCD's handling of moved pages is currently very restrictive and often misses pages that have been moved, counting
them as separate removed and new page entries. As a stopgap until moved pages are handled more accurately, we provide
a script that parses the logs and summarizes moved/new/removed pages and their associated code examples. Refer to the
`scripts` directory for more details.

## Troubleshooting
### Permission Issues
```text

audit/gdcd/scripts/README.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Log Parser Scripts

This directory contains a script that parses GDCD log files and analyzes page changes. It distinguishes moved pages from truly new or removed pages and tracks applied usage examples.

## Files

- `parse-log.go` - Main Go script that performs the log parsing and analysis
- `README.md` - This documentation file

## Purpose

The script analyzes log files to distinguish between:

1. **Moved Pages**: Pages that appear to be removed and created but are actually the same page moved to a new location within the same project
2. **Maybe New Pages**: Pages that may be genuinely new additions
3. **Maybe Removed Pages**: Pages that may be genuinely removed (not moved)
4. **Applied Usage Examples**: New applied usage examples on maybe new pages only

All results are reported with **project context** to clearly show which project each page belongs to.

## Dependencies

- Go

## How It Works

### Page Movement Detection

A page is considered "moved" if **all three conditions** are met:

1. **Same Project**: The removed page and created page are in the same project
2. **Same Code Example Count**: The removed page and created page have the same number of code examples
3. **Shared Segment**: At least one segment of the page ID (separated by `|`) is the same between the removed and created pages

For example:
- In project `ruby-driver`: `connect|tls` (removed, 6 code examples) → `security|tls` (created, 6 code examples)
  - Same project AND same code example count AND shared segment `tls` → **MOVED**
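
The three conditions above can be sketched as a single predicate. This is a minimal illustration, not the code in `parse-log.go`: the `pageEvent` type and the `sharesSegment` and `isMoved` functions are hypothetical names introduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// pageEvent is a hypothetical record for one removed or created page.
type pageEvent struct {
	Project      string
	PageID       string
	CodeExamples int
}

// sharesSegment reports whether two page IDs have at least one
// "|"-separated segment in common.
func sharesSegment(a, b string) bool {
	seen := make(map[string]bool)
	for _, s := range strings.Split(a, "|") {
		seen[s] = true
	}
	for _, s := range strings.Split(b, "|") {
		if seen[s] {
			return true
		}
	}
	return false
}

// isMoved applies all three conditions: same project, same code
// example count, and at least one shared page ID segment.
func isMoved(removed, created pageEvent) bool {
	return removed.Project == created.Project &&
		removed.CodeExamples == created.CodeExamples &&
		sharesSegment(removed.PageID, created.PageID)
}

func main() {
	removed := pageEvent{"ruby-driver", "connect|tls", 6}
	created := pageEvent{"ruby-driver", "security|tls", 6}
	fmt.Println(isMoved(removed, created)) // prints "true"
}
```

Because all three checks are joined with `&&`, changing any one input (a different project, a count of 7, or no common segment) makes the predicate return `false`.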

### Applied Usage Examples Filtering

Applied usage examples are counted only for pages classified as "maybe new", not for pages detected as moved. This prevents double-counting when pages are reorganized.
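
As a rough sketch of this filtering, assuming applied usage counts are keyed by project and page ID: the `pageKey` type, the `countNewAppliedUsage` function, and the counts in `main` are all hypothetical and not taken from `parse-log.go`.

```go
package main

import "fmt"

// pageKey identifies a page within a project (hypothetical type).
type pageKey struct {
	Project string
	PageID  string
}

// countNewAppliedUsage sums applied usage examples, skipping any page
// that was classified as the destination of a move.
func countNewAppliedUsage(applied map[pageKey]int, movedTo map[pageKey]bool) int {
	total := 0
	for key, n := range applied {
		if movedTo[key] {
			continue // moved page: its examples were already counted at the old location
		}
		total += n
	}
	return total
}

func main() {
	// Illustrative counts only.
	applied := map[pageKey]int{
		{"ruby-driver", "atlas-search"}: 1,
		{"ruby-driver", "security|tls"}: 2,
	}
	movedTo := map[pageKey]bool{
		{"ruby-driver", "security|tls"}: true,
	}
	fmt.Println(countNewAppliedUsage(applied, movedTo)) // prints "1"
}
```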

### Maybe New and Maybe Removed Pages

A page that was actually moved can still fail the "moved" criteria above in two ways:

- Different number of code examples
  - Example: `connect|tls` is a "maybe removed" page and `security|tls` is a "maybe new" page, but the removed page has
    6 code examples and the created page has 7 code examples
- No shared segments in page IDs
  - Example: `crud|update` is a "maybe removed" page and `write|upsert` is a "maybe new" page. Even if they have the same
    number of code examples, they share no segments in their page IDs, so we can't programmatically detect that they're
    the same

Because of these conditions, we can only say that a page is "maybe new" or "maybe removed" and not "moved". A human must
manually review the "maybe new" and "maybe removed" results to determine whether each page is truly new or removed. If a
page was actually moved, we must manually adjust the count of new applied usage examples to omit the applied usage
examples from the "maybe new" but actually moved page.

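
To illustrate how pages end up in the "maybe" buckets, here is a hedged sketch that pairs removed and created pages and leaves the unmatched ones for manual review. The greedy first-match pairing is an assumption, not necessarily how `parse-log.go` pairs pages, and the `page`, `move`, and `classify` names and the count of 3 code examples are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// page is a hypothetical record for one removed or created page.
type page struct {
	Project      string
	ID           string
	CodeExamples int
}

// move pairs a removed page with the created page it was matched to.
type move struct{ From, To page }

// sharesSegment reports whether two page IDs share a "|"-separated segment.
func sharesSegment(a, b string) bool {
	seen := make(map[string]bool)
	for _, s := range strings.Split(a, "|") {
		seen[s] = true
	}
	for _, s := range strings.Split(b, "|") {
		if seen[s] {
			return true
		}
	}
	return false
}

// classify greedily matches each removed page to the first unused created
// page that meets all three "moved" conditions. Unmatched pages become
// "maybe removed" or "maybe new".
func classify(removed, created []page) (moved []move, maybeNew, maybeRemoved []page) {
	used := make([]bool, len(created))
	for _, r := range removed {
		matched := false
		for i, c := range created {
			if used[i] || r.Project != c.Project ||
				r.CodeExamples != c.CodeExamples || !sharesSegment(r.ID, c.ID) {
				continue
			}
			used[i] = true
			moved = append(moved, move{r, c})
			matched = true
			break
		}
		if !matched {
			maybeRemoved = append(maybeRemoved, r)
		}
	}
	for i, c := range created {
		if !used[i] {
			maybeNew = append(maybeNew, c)
		}
	}
	return moved, maybeNew, maybeRemoved
}

func main() {
	removed := []page{
		{"ruby-driver", "connect|tls", 6},
		{"ruby-driver", "crud|update", 3},
	}
	created := []page{
		{"ruby-driver", "security|tls", 6},
		{"ruby-driver", "write|upsert", 3},
	}
	moved, maybeNew, maybeRemoved := classify(removed, created)
	fmt.Println(len(moved), len(maybeNew), len(maybeRemoved)) // prints "1 1 1"
}
```

In this sample, `crud|update` and `write|upsert` have equal counts but no shared segment, so they land in the "maybe removed" and "maybe new" buckets and would need the manual review described above.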
## Usage

**Important**: You must be in the `scripts` directory to run the Go script directly:

```bash
# Navigate to the scripts directory first
cd /Your/Local/Filepath/code-example-tooling/audit/gdcd/scripts

# Then run the Go script, passing the log file as a relative path
go run parse-log.go ../logs/2025-09-24-18-01-30-app.log
# or as an absolute path
go run parse-log.go /absolute/path/to/your/log/file.log
```

## Output Format

The script produces four sections:

### 1. MOVED PAGES
```
=== MOVED PAGES ===
MOVED [ruby-driver]: connect|tls -> security|tls (6 code examples)
MOVED [ruby-driver]: write|bulk-write -> crud|bulk-write (9 code examples)
MOVED [database-tools]: installation|verify -> verify (0 code examples)
```

### 2. MAYBE NEW PAGES
```
=== MAYBE NEW PAGES ===
NEW [ruby-driver]: atlas-search (2 code examples)
NEW [node]: integrations|prisma (4 code examples)
NEW [atlas-architecture]: solutions-library|rag-technical-documents (6 code examples)
```

### 3. MAYBE REMOVED PAGES
```
=== MAYBE REMOVED PAGES ===
REMOVED [ruby-driver]: common-errors (4 code examples)
REMOVED [cpp-driver]: indexes|work-with-indexes (4 code examples)
REMOVED [docs]: tutorial|install-mongodb-on-windows-unattended (11 code examples)
```

### 4. NEW APPLIED USAGE EXAMPLES
```
=== NEW APPLIED USAGE EXAMPLES ===
APPLIED USAGE [ruby-driver]: atlas-search (1 applied usage examples)
APPLIED USAGE [node]: integrations|prisma (1 applied usage examples)
APPLIED USAGE [pymongo]: data-formats|custom-types|type-codecs (1 applied usage examples)

Total new applied usage examples: 17
```

## Log Format Requirements

The script expects log lines in the following formats:

- Project context: `Project changes for <project-name>`
- Page events: `Page removed: Page ID: <page-id>` or `Page created: Page ID: <page-id>`
- Code examples: `Code example removed: Page ID: <page-id>, <count> code examples removed`
- Applied usage: `Applied usage example added: Page ID: <page-id>, <count> new applied usage examples added`

**Important**: The script tracks the current project context from "Project changes for" lines and associates all subsequent page events with that project until a new project context is encountered.
