diff --git a/audit/gdcd/README.md b/audit/gdcd/README.md index c69435d..f7e62c9 100644 --- a/audit/gdcd/README.md +++ b/audit/gdcd/README.md @@ -123,6 +123,22 @@ The progress bar should immediately output to console and continue to display pr projects are parsed. Depending on your machine and the amount of projects specified, this can be a long-running program (~1-2hrs ). +## Reviewing logs + +GDCD outputs logs to the local device's `logs` directory. The logs contain information about project events, including: + +- New pages +- Removed pages +- Updated pages (where updates refer to changes to code examples or page keywords) +- Code example count changes +- New applied usage examples +- Project summaries and any issues with the data + +GDCD's handling for moved pages is currently very restrictive, and often misses pages that have been moved, counting +them as separate removed and new page entries. As a stopgap for more accurate moved page handling, we have provided +a script to parse the logs and summarize moved/new/removed pages and their associated code examples. Refer to the +`scripts` directory for more details. + ## Troubleshooting ### Permission Issues ```text diff --git a/audit/gdcd/scripts/README.md b/audit/gdcd/scripts/README.md new file mode 100644 index 0000000..b3413d3 --- /dev/null +++ b/audit/gdcd/scripts/README.md @@ -0,0 +1,120 @@ +# Log Parser Scripts + +This directory contains a script to parse GDCD log files and analyze page changes, specifically identifying moved pages vs truly new/removed pages and tracking applied usage examples. + +## Files + +- `parse-log.go` - Main Go script that performs the log parsing and analysis +- `README.md` - This documentation file + +## Purpose + +The script analyzes log files to distinguish between: + +1. **Moved Pages**: Pages that appear to be removed and created but are actually the same page moved to a new location within the same project +2. **Maybe New Pages**: Pages that may be genuinely new additions +3. 
**Maybe Removed Pages**: Pages that may be genuinely removed (not moved) +4. **Applied Usage Examples**: New applied usage examples on maybe new pages only + +All results are reported with **project context** to clearly show which project each page belongs to. + +## Dependencies + +- Go + +## How It Works + +### Page Movement Detection + +A page is considered "moved" if **all three conditions** are met: + +1. **Same Project**: The removed page and created page are in the same project +2. **Same Code Example Count**: The removed page and created page have the same number of code examples +3. **Shared Segment**: At least one segment of the page ID (separated by `|`) is the same between the removed and created pages + +For example: +- In project `ruby-driver`: `connect|tls` (removed, 6 code examples) → `security|tls` (created, 6 code examples) +- Same project AND same code examples AND shared segment `tls` → **MOVED** + +### Applied Usage Examples Filtering + +Applied usage examples are only counted for truly new pages, not for moved pages. This prevents double-counting when pages are reorganized. + +### Maybe New and Maybe Removed Pages + +Some conditions may cause moved pages to not meet our criteria for "moved" pages: + +- Different number of code examples + - Example: `connect|tls` is a "maybe removed" page and `security|tls` is a "maybe new" page but the removed page has + 6 code examples and the created page has 7 code examples +- No shared segments in page IDs + - Example: `crud|update` is a "maybe removed" page and `write|upsert` is a "maybe new" page. Even if they have the same + number of code examples, they share no segments in their page IDs so we can't programmatically detect that they're + the same + +Because of these conditions, we can only say that a page is "maybe new" or "maybe removed" and not "moved". A human must +manually review the "maybe new" and "maybe removed" results to determine if the page is truly new or removed. 
If it's +moved, we must manually adjust the count of new applied usage examples to omit the applied usage examples from the +"maybe new" but actually moved page. + +## Usage + +**Important**: You must be in the scripts directory to run the Go script directly: + +```bash +# Navigate to the scripts directory first +cd /Your/Local/Filepath/code-example-tooling/audit/gdcd/scripts + +# Then run the Go script +go run parse-log.go ../logs/2025-09-24-18-01-30-app.log +go run parse-log.go /absolute/path/to/your/log/file.log +``` + +## Output Format + +The script produces four sections: + +### 1. MOVED PAGES +``` +=== MOVED PAGES === +MOVED [ruby-driver]: connect|tls -> security|tls (6 code examples) +MOVED [ruby-driver]: write|bulk-write -> crud|bulk-write (9 code examples) +MOVED [database-tools]: installation|verify -> verify (0 code examples) +``` + +### 2. MAYBE NEW PAGES +``` +=== MAYBE NEW PAGES === +NEW [ruby-driver]: atlas-search (2 total code examples) +NEW [node]: integrations|prisma (4 total code examples) +NEW [atlas-architecture]: solutions-library|rag-technical-documents (6 total code examples) +``` + +### 3. MAYBE REMOVED PAGES +``` +=== MAYBE REMOVED PAGES === +REMOVED [ruby-driver]: common-errors (4 total code examples) +REMOVED [cpp-driver]: indexes|work-with-indexes (4 total code examples) +REMOVED [docs]: tutorial|install-mongodb-on-windows-unattended (11 total code examples) +``` + +### 4. 
NEW APPLIED USAGE EXAMPLES +``` +=== NEW APPLIED USAGE EXAMPLES === +APPLIED USAGE [ruby-driver]: atlas-search (1 applied usage examples) +APPLIED USAGE [node]: integrations|prisma (1 applied usage examples) +APPLIED USAGE [pymongo]: data-formats|custom-types|type-codecs (1 applied usage examples) + +Total new applied usage examples: 17 +``` + +## Log Format Requirements + +The scripts expect log lines in the following formats: + +- Project context: `Project changes for ` +- Page events: `Page removed: Page ID: ` or `Page created: Page ID: ` +- Code examples: `Code example removed: Page ID: , code examples removed` +- Applied usage: `Applied usage example added: Page ID: , new applied usage examples added` + +**Important**: The script tracks the current project context from "Project changes for" lines and associates all subsequent page events with that project until a new project context is encountered. diff --git a/audit/gdcd/scripts/parse-log.go b/audit/gdcd/scripts/parse-log.go new file mode 100644 index 0000000..ef51024 --- /dev/null +++ b/audit/gdcd/scripts/parse-log.go @@ -0,0 +1,280 @@ +package main + +import ( + "bufio" + "fmt" + "log" + "os" + "regexp" + "strconv" + "strings" +) + +// PageEvent represents a page creation or removal event +type PageEvent struct { + Action string // "removed" or "created" + PageID string + Project string + CodeExamples int + AppliedUsage int +} + +// MovedPage represents a page that was moved from one location to another +type MovedPage struct { + FromID string + ToID string + Project string + CodeExamples int +} + +// AppliedUsageExample represents new applied usage examples on truly new pages +type AppliedUsageExample struct { + PageID string + Project string + Count int +} + +func main() { + if len(os.Args) != 2 { + fmt.Println("Usage: go run parse-log.go ") + fmt.Println("Example: go run parse-log.go ../logs/2025-09-24-18-01-30-app.log") + os.Exit(1) + } + + logFile := os.Args[1] + + file, err := os.Open(logFile) + if 
err != nil { + log.Fatalf("Error opening file: %v", err) + } + defer file.Close() + + // Regular expressions for parsing log lines + projectChangesRegex := regexp.MustCompile(`Project changes for (.+)`) + pageRemovedRegex := regexp.MustCompile(`Page removed: Page ID: (.+)`) + pageCreatedRegex := regexp.MustCompile(`Page created: Page ID: (.+)`) + codeExampleRemovedRegex := regexp.MustCompile(`Code example removed: Page ID: (.+), (\d+) code examples removed`) + codeExampleCreatedRegex := regexp.MustCompile(`Code example created: Page ID: (.+), (\d+) new code examples added`) + appliedUsageRegex := regexp.MustCompile(`Applied usage example added: Page ID: (.+), (\d+) new applied usage examples added`) + + removedPages := make(map[string]PageEvent) + createdPages := make(map[string]PageEvent) + appliedUsageMap := make(map[string]AppliedUsageExample) + + currentProject := "" + + scanner := bufio.NewScanner(file) + for scanner.Scan() { + line := scanner.Text() + + // Parse project changes line to track current project + if matches := projectChangesRegex.FindStringSubmatch(line); matches != nil { + currentProject = matches[1] + continue + } + + // Skip processing if we don't have a current project + if currentProject == "" { + continue + } + + // Parse page removed events + if matches := pageRemovedRegex.FindStringSubmatch(line); matches != nil { + pageID := matches[1] + key := currentProject + "|" + pageID + removedPages[key] = PageEvent{ + Action: "removed", + PageID: pageID, + Project: currentProject, + } + } + + // Parse page created events + if matches := pageCreatedRegex.FindStringSubmatch(line); matches != nil { + pageID := matches[1] + key := currentProject + "|" + pageID + createdPages[key] = PageEvent{ + Action: "created", + PageID: pageID, + Project: currentProject, + } + } + + // Parse code example removed events + if matches := codeExampleRemovedRegex.FindStringSubmatch(line); matches != nil { + pageID := matches[1] + count, _ := strconv.Atoi(matches[2]) + 
key := currentProject + "|" + pageID + if page, exists := removedPages[key]; exists { + page.CodeExamples = count + removedPages[key] = page + } + } + + // Parse code example created events + if matches := codeExampleCreatedRegex.FindStringSubmatch(line); matches != nil { + pageID := matches[1] + count, _ := strconv.Atoi(matches[2]) + key := currentProject + "|" + pageID + if page, exists := createdPages[key]; exists { + page.CodeExamples = count + createdPages[key] = page + } + } + + // Parse applied usage example events + if matches := appliedUsageRegex.FindStringSubmatch(line); matches != nil { + pageID := matches[1] + count, _ := strconv.Atoi(matches[2]) + key := currentProject + "|" + pageID + appliedUsageMap[key] = AppliedUsageExample{ + PageID: pageID, + Project: currentProject, + Count: count, + } + } + } + + if err := scanner.Err(); err != nil { + log.Fatalf("Error reading file: %v", err) + } + + // Identify moved pages + movedPages := []MovedPage{} + trulyRemovedPages := []PageEvent{} + trulyCreatedPages := []PageEvent{} + + // Create a copy of removed pages to track which ones we've processed + unprocessedRemoved := make(map[string]PageEvent) + for k, v := range removedPages { + unprocessedRemoved[k] = v + } + + // Check each created page to see if it matches a removed page within the same project + for _, createdPage := range createdPages { + matchFound := false + + for removedKey, removedPage := range unprocessedRemoved { + // Only consider matches within the same project + if removedPage.Project == createdPage.Project && + isPageMoved(removedPage.PageID, createdPage.PageID, removedPage.CodeExamples, createdPage.CodeExamples) { + // This is a moved page + movedPages = append(movedPages, MovedPage{ + FromID: removedPage.PageID, + ToID: createdPage.PageID, + Project: createdPage.Project, + CodeExamples: createdPage.CodeExamples, + }) + + // Remove from unprocessed + delete(unprocessedRemoved, removedKey) + matchFound = true + break + } + } + + if 
!matchFound { + // This is a truly new page + trulyCreatedPages = append(trulyCreatedPages, createdPage) + } + } + + // Remaining unprocessed removed pages are truly removed + for _, removedPage := range unprocessedRemoved { + trulyRemovedPages = append(trulyRemovedPages, removedPage) + } + + // Print results + printResults(movedPages, trulyCreatedPages, trulyRemovedPages, appliedUsageMap) +} + +// isPageMoved checks if a removed page and created page represent the same page that was moved +func isPageMoved(removedID, createdID string, removedCodeExamples, createdCodeExamples int) bool { + // Both conditions must be true: + // 1. Same number of code examples + // 2. At least one segment of the page ID is the same + + if removedCodeExamples != createdCodeExamples { + return false + } + + removedSegments := strings.Split(removedID, "|") + createdSegments := strings.Split(createdID, "|") + + // Check if any segment matches + for _, removedSegment := range removedSegments { + for _, createdSegment := range createdSegments { + if removedSegment == createdSegment { + return true + } + } + } + + return false +} + +func printResults(movedPages []MovedPage, trulyCreatedPages []PageEvent, trulyRemovedPages []PageEvent, appliedUsageMap map[string]AppliedUsageExample) { + fmt.Println("=== MOVED PAGES ===") + if len(movedPages) == 0 { + fmt.Println("No moved pages found.") + } else { + for _, moved := range movedPages { + fmt.Printf("MOVED [%s]: %s -> %s (%d code examples)\n", moved.Project, moved.FromID, moved.ToID, moved.CodeExamples) + } + } + + /* If a page doesn't meet our criteria for being a "moved page", it's a "maybe new" or "maybe removed" page. It may + * be a completely renamed existing page where no segment of the ID matches i.e. `crud|update` being renamed + * `write|upsert` - or a moved page with some matching segment element but with a different number of code examples. + * If so, it would not meet our criteria for being a "moved" page. 
Compare these pages with the "maybe removed pages" + * to determine if they're truly new or removed. + */ + fmt.Println("\n=== MAYBE NEW PAGES ===") + if len(trulyCreatedPages) == 0 { + fmt.Println("No maybe new pages found.") + } else { + for _, created := range trulyCreatedPages { + fmt.Printf("NEW [%s]: %s (%d total code examples)\n", created.Project, created.PageID, created.CodeExamples) + } + } + + fmt.Println("\n=== MAYBE REMOVED PAGES ===") + if len(trulyRemovedPages) == 0 { + fmt.Println("No maybe removed pages found.") + } else { + for _, removed := range trulyRemovedPages { + fmt.Printf("REMOVED [%s]: %s (%d total code examples)\n", removed.Project, removed.PageID, removed.CodeExamples) + } + } + + // If a page is maybe new, we want to check if it has any new applied usage examples. If so, we want to report those. + fmt.Println("\n=== NEW APPLIED USAGE EXAMPLES ===") + + // Filter applied usage examples to only include maybe new pages + trulyNewAppliedUsage := []AppliedUsageExample{} + totalNewAppliedUsage := 0 + + // Create a set of moved page destination keys for quick lookup + movedPageDestinations := make(map[string]bool) + for _, moved := range movedPages { + key := moved.Project + "|" + moved.ToID + movedPageDestinations[key] = true + } + + for key, usage := range appliedUsageMap { + // Only include if this page is maybe new (not moved) + if !movedPageDestinations[key] { + trulyNewAppliedUsage = append(trulyNewAppliedUsage, usage) + totalNewAppliedUsage += usage.Count + } + } + + if len(trulyNewAppliedUsage) == 0 { + fmt.Println("No new applied usage examples on maybe new pages found.") + } else { + for _, usage := range trulyNewAppliedUsage { + fmt.Printf("APPLIED USAGE [%s]: %s (%d applied usage examples)\n", usage.Project, usage.PageID, usage.Count) + } + fmt.Printf("\nTotal new applied usage examples: %d\n", totalNewAppliedUsage) + } +}
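Review note on the matching loop in `parse-log.go`: `createdPages` and `unprocessedRemoved` are Go maps, and Go does not guarantee map iteration order. When a created page could pair with more than one removed page, the reported `MOVED` pairs (and the order of every output section) may differ between runs on the same log. The sketch below shows one possible way to make the pairing repeatable by sorting both sides first. It is an illustration under stated assumptions, not the script's actual implementation: the `page` type and the `sharesSegment`, `isMoved`, `sortPages`, and `pairMoved` helpers are hypothetical names, and the code example count of 3 for `crud|update` is invented sample data.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// page is a hypothetical, trimmed-down stand-in for the PR's PageEvent.
type page struct {
	Project      string
	PageID       string
	CodeExamples int
}

// sharesSegment reports whether two page IDs have at least one
// "|"-separated segment in common.
func sharesSegment(a, b string) bool {
	seen := make(map[string]bool)
	for _, s := range strings.Split(a, "|") {
		seen[s] = true
	}
	for _, s := range strings.Split(b, "|") {
		if seen[s] {
			return true
		}
	}
	return false
}

// isMoved applies the three README conditions: same project,
// same code example count, and at least one shared ID segment.
func isMoved(removed, created page) bool {
	return removed.Project == created.Project &&
		removed.CodeExamples == created.CodeExamples &&
		sharesSegment(removed.PageID, created.PageID)
}

// sortPages orders pages by project, then page ID, in place.
func sortPages(pages []page) {
	sort.Slice(pages, func(i, j int) bool {
		if pages[i].Project != pages[j].Project {
			return pages[i].Project < pages[j].Project
		}
		return pages[i].PageID < pages[j].PageID
	})
}

// pairMoved greedily pairs each created page with the first unused
// removed page that satisfies isMoved. Both input slices are sorted
// in place first, so the result is the same on every run.
func pairMoved(removed, created []page) []string {
	sortPages(removed)
	sortPages(created)
	used := make([]bool, len(removed))
	var moves []string
	for _, c := range created {
		for i, r := range removed {
			if !used[i] && isMoved(r, c) {
				used[i] = true
				moves = append(moves, fmt.Sprintf("[%s] %s -> %s", c.Project, r.PageID, c.PageID))
				break
			}
		}
	}
	return moves
}

func main() {
	removed := []page{
		{"ruby-driver", "connect|tls", 6},
		{"ruby-driver", "crud|update", 3}, // count of 3 is made-up sample data
	}
	created := []page{
		{"ruby-driver", "security|tls", 6},
		{"ruby-driver", "write|upsert", 3},
	}
	for _, m := range pairMoved(removed, created) {
		fmt.Println(m)
	}
}
```

With this sample data, only `connect|tls` and `security|tls` are paired. `crud|update` and `write|upsert` share no segment, so they would still land in the "maybe removed" and "maybe new" buckets for manual review, as the README describes. Greedy first-match pairing is still a heuristic; sorting only makes its choice stable.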