Home
PhantomTrace is a Rust-based tool and library for detecting and obfuscating sensitive data in text, logs, and streams. It supports common PCI/PII patterns (e.g., credit cards, SSNs, emails), multiple obfuscation methods, and structured outputs for analysis and auditing. It can be used as a CLI for files and pipelines, or as a Rust library within applications.
Features:
- Pattern detection for credit card numbers, SSNs, email addresses, phone numbers, IPv4 addresses, API keys, AWS access keys, and generic passwords.
- Obfuscation methods:
  - Phantom (mask characters while preserving structure)
  - Vanish (remove entirely)
  - Mirror (deterministic token/hash)
  - Mask (replace with a fixed string)
  - Tokenize (stable token for the same input)
- Severity-aware processing (Critical, High, Medium, Low) to prioritize sensitive data.
- Outputs: text (default), JSON (with optional events and trace report), CSV (event rows), and trace report JSON.
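The obfuscation methods above can be illustrated with a minimal sketch that uses only the Rust standard library. It is not PhantomTrace's implementation: the Method enum, the apply function, the choice to keep the trailing characters, and the TOK_ prefix are all hypothetical, and Mirror and Tokenize are collapsed into one hash-based variant for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the obfuscation methods described above.
enum Method {
    /// Mask alphanumeric characters; keep separators and the last `preserve` characters.
    Phantom { preserve: usize },
    /// Remove the match entirely.
    Vanish,
    /// Replace the match with a deterministic, non-cryptographic hash token.
    Mirror,
    /// Replace the match with a fixed string.
    Mask(&'static str),
}

fn apply(method: &Method, value: &str) -> String {
    match method {
        Method::Phantom { preserve } => {
            let total = value.chars().count();
            // Index of the first character that is kept unmasked.
            let keep_from = total.saturating_sub(*preserve);
            value
                .chars()
                .enumerate()
                .map(|(i, c)| {
                    if i >= keep_from || !c.is_alphanumeric() { c } else { '*' }
                })
                .collect()
        }
        Method::Vanish => String::new(),
        Method::Mirror => {
            // DefaultHasher::new() uses fixed keys, so equal inputs give equal
            // tokens within one build. It is not a cryptographic hash.
            let mut hasher = DefaultHasher::new();
            value.hash(&mut hasher);
            format!("TOK_{:016x}", hasher.finish())
        }
        Method::Mask(fixed) => fixed.to_string(),
    }
}
```

Under these assumptions, apply(&Method::Phantom { preserve: 4 }, "4111-1111-1111-1111") keeps the dashes and the last four digits and masks the rest.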
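Installation:
- From source:
  - git clone https://github.com/vabhishek6/PhantomTrace
  - cd PhantomTrace
  - cargo build --release (the binary is written to target/release/)
- Binaries are provided via GitHub Releases when available.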
Basic file processing:
- phantomtrace -i input.log -o output.log
Generate default configuration:
- phantomtrace --generate-config phantom_config.json
Use a custom configuration:
- phantomtrace -i input.txt -o output.txt -c phantom_config.json
JSON output with trace report:
- phantomtrace -i logs.txt -o clean.txt --trace-report --format json
Create a trace map alongside output:
- phantomtrace -i app.log -o clean.log --create-trace-map
CLI flags of note:
- --format: text | json | csv | trace-report
- --trace-report: include a trace report in JSON outputs
- --log-phantoms: include event lists in JSON outputs
- --create-trace-map: produce a .tracemap JSON with counts and coverage
- --generate-config FILE: write a default JSON config to FILE
PhantomTrace reads a JSON configuration to define:
- Tracing rules (regex patterns, method, severity)
- Processing options (batch size, overlaps, performance mode)
- Output behavior (format, include reports, trace map)
Default rules include:
- Credit cards (Critical)
- SSNs (High)
- Emails (High)
- Phone numbers (Medium)
- IPv4 addresses (Medium)
- API keys (Critical)
- AWS Access Keys (Critical)
- Generic passwords (Critical)
Example custom rule (backslashes are doubled because the pattern is a JSON string):
{ "name": "custom_id", "pattern": "\\bCUST-\\d{6}\\b", "method": "Phantom", "preserve_chars": 4, "severity": "Medium" }
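A regex written for the Rust regex engine uses single backslashes (for example \d), but inside a JSON config each backslash must itself be escaped. The helper below is a small sketch of that conversion; pattern_to_json_string is a hypothetical name and is not part of PhantomTrace.

```rust
/// Render a regex pattern as a JSON string literal, quotes included.
/// Only backslashes and double quotes are handled, which covers typical
/// regex text; control characters would need additional escaping.
fn pattern_to_json_string(pattern: &str) -> String {
    let mut out = String::with_capacity(pattern.len() + 2);
    out.push('"');
    for c in pattern.chars() {
        match c {
            '\\' => out.push_str("\\\\"), // one backslash becomes two
            '"' => out.push_str("\\\""),  // a quote becomes backslash + quote
            other => out.push(other),
        }
    }
    out.push('"');
    out
}
```

Passing the raw pattern for a customer ID (CUST- followed by six digits, with word boundaries on both sides) yields the doubled-backslash form that belongs in the "pattern" field of a rule.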
Basic library example:
- Import with: use phantomtrace::{phantom_text, PhantomTraceConfig, PhantomTraceProcessor};
- Call phantom_text(&str) to get an obfuscated string using the default rules
- Create a PhantomTraceProcessor from PhantomTraceConfig::default() for advanced processing with access to events and stats
Outputs from processing include:
- phantomed_text: String
- phantom_events: Vec of events (rule, severity, original value, obfuscated value, position, trace_id)
- processing stats: lines processed, lines phantomed, event totals, duration
Output formats:
- Text: processed content only.
- JSON: phantomed_text plus optional events and trace_report when enabled.
- CSV: one row per event with rule name, severity, original/phantom values, start/end positions, trace_id.
- TraceReport (JSON): totals, severity breakdown, rule-level stats, generation time.
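To make the CSV layout concrete, the following sketch formats one event as one row. The PhantomEvent struct and both function names are hypothetical; the fields follow the column list above rather than the crate's actual types, and the quoting follows common CSV (RFC 4180 style) conventions.

```rust
/// Hypothetical event record mirroring the CSV columns described above.
struct PhantomEvent {
    rule: String,
    severity: String,
    original: String,
    phantom: String,
    start: usize,
    end: usize,
    trace_id: String,
}

/// Quote a field when it contains a comma, double quote, or line break,
/// doubling any embedded double quotes.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// One row per event: rule, severity, original, phantom, start, end, trace_id.
fn event_to_csv_row(e: &PhantomEvent) -> String {
    [
        csv_field(&e.rule),
        csv_field(&e.severity),
        csv_field(&e.original),
        csv_field(&e.phantom),
        e.start.to_string(),
        e.end.to_string(),
        csv_field(&e.trace_id),
    ]
    .join(",")
}
```

Because each row carries the original value next to the obfuscated one, a file in this shape is as sensitive as the unprocessed input.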
The optional trace map file (.tracemap) contains:
- Total events
- Events by severity and by rule
- Phantom coverage percentage (lines phantomed / lines processed)
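The trace map fields can be derived from a list of events plus two line counters. The sketch below shows one way to do that; TraceMap, Severity, and build_trace_map are hypothetical names, and the decision to report 0% when no lines were processed is an assumption.

```rust
use std::collections::BTreeMap;

/// Severity levels ordered from least to most sensitive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// Hypothetical in-memory shape of a trace map.
struct TraceMap {
    total_events: usize,
    by_severity: BTreeMap<Severity, usize>,
    by_rule: BTreeMap<String, usize>,
    coverage_percent: f64,
}

/// `events` holds (rule name, severity) pairs, one per detected match.
fn build_trace_map(
    events: &[(&str, Severity)],
    lines_phantomed: usize,
    lines_processed: usize,
) -> TraceMap {
    let mut by_severity: BTreeMap<Severity, usize> = BTreeMap::new();
    let mut by_rule: BTreeMap<String, usize> = BTreeMap::new();
    for (rule, severity) in events {
        *by_severity.entry(*severity).or_insert(0) += 1;
        *by_rule.entry(rule.to_string()).or_insert(0) += 1;
    }
    // Coverage = lines phantomed / lines processed, as a percentage.
    let coverage_percent = if lines_processed == 0 {
        0.0
    } else {
        100.0 * lines_phantomed as f64 / lines_processed as f64
    };
    TraceMap {
        total_events: events.len(),
        by_severity,
        by_rule,
        coverage_percent,
    }
}
```

For example, three events spread over 2 phantomed lines out of 8 processed lines give a coverage of 25%.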
PhantomTrace can be integrated before log ingestion systems or used in pipelines:
- Process files produced by applications or agents.
- Pipe stdin/stdout in shell pipelines for streaming logs.
- The stream and TCP server modes in the repository’s code are oriented to real-time processing; ensure production readiness (e.g., proper error handling, retries) for your environment.
Example shell pipeline:
- tail -F /var/log/app.log | phantomtrace --format text > /var/log/app.clean.log
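The pipeline above relies on line-by-line streaming from stdin to stdout. As a rough sketch of that pattern (not the tool's actual stream mode), the function below reads lines, applies a caller-supplied obfuscation closure, and flushes after every line so that output from a long-running tail -F appears promptly. filter_stream and its parameters are hypothetical.

```rust
use std::io::{self, BufRead, Write};

/// Read lines from `input`, apply `obfuscate` to each one, and write the
/// result to `output`. Returns the number of lines processed.
fn filter_stream<R, W, F>(input: R, mut output: W, obfuscate: F) -> io::Result<usize>
where
    R: BufRead,
    W: Write,
    F: Fn(&str) -> String,
{
    let mut lines = 0;
    for line in input.lines() {
        let line = line?; // propagate read errors, including invalid UTF-8
        writeln!(output, "{}", obfuscate(&line))?;
        output.flush()?; // keep latency low for downstream consumers
        lines += 1;
    }
    Ok(lines)
}
```

In a real binary this would be called with io::stdin().lock() and io::stdout().lock(); flushing per line trades some throughput for latency, which is why batch size matters for high-volume input.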
Performance considerations:
- Throughput depends on the number and complexity of regex rules, batch size, and enabled reporting.
- Configuration options like performance_mode and batch_size can influence speed and memory usage.
- JSON/CSV/report generation adds overhead compared to plain text output.
Best practices:
- Start with the default configuration and verify that built-in patterns match expected data.
- Add custom rules conservatively and test regex performance.
- For auditing scenarios, enable event logging and trace reports (JSON output).
- For high-volume environments, benchmark with representative data and adjust batch size and reporting features.
- Validate output does not leak original sensitive values, especially when customizing rules or methods.
Troubleshooting:
- Pattern not detected:
  - Check regex syntax and escaping (Rust regex engine); in a JSON config, each backslash in a pattern must be doubled.
  - Verify case sensitivity expectations (config has case_sensitive).
- Slow processing:
  - Reduce the number of active rules.
  - Increase batch_size.
  - Disable per-event logging and heavy reporting when not needed.
- Regex compilation errors:
  - Test patterns in a Rust-compatible regex tool before adding them to configs.
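Security notes:
- Obfuscation is performed in-memory during processing; the obfuscated text output should not contain original values for matched patterns. Event outputs (JSON events, CSV rows) do include original values, so treat those files as sensitive.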
- The hashing/tokenization provided is intended for traceability and is not a cryptographic guarantee.
- Review patterns and methods to align with internal policies and applicable regulations.
Project information:
- Language: Rust
- CLI binary name: phantomtrace
- Library crate name: phantomtrace
- License: MIT-style for non-commercial use (see LICENSE). Commercial use requires a separate agreement as stated in the repository documentation.
- Repository: https://github.com/vabhishek6/PhantomTrace
- Releases may provide prebuilt binaries when available.
- Some repository commits are backdated to reflect earlier milestones; the GitHub creation date reflects when the repository was published there.
- For issues or contributions, use GitHub Issues and pull requests in the repository.
- For commercial licensing inquiries, follow the guidance in the repository’s license section.