Abhishek Venkatakrishna edited this page Aug 23, 2025 · 2 revisions

PhantomTrace


Overview

PhantomTrace is a Rust-based tool and library for detecting and obfuscating sensitive data in text, logs, and streams. It supports common PCI/PII patterns (e.g., credit cards, SSNs, emails), multiple obfuscation methods, and structured outputs for analysis and auditing. It can be used as a CLI for files and pipelines, or as a Rust library within applications.

Capabilities

  • Pattern detection for credit card numbers, SSNs, email addresses, phone numbers, IP addresses, API keys, AWS access keys, and generic passwords.
  • Obfuscation methods:
    • Phantom (mask characters while preserving structure)
    • Vanish (remove entirely)
    • Mirror (deterministic token/hash)
    • Mask (replace with a fixed string)
    • Tokenize (stable token for the same input)
  • Severity-aware processing (Critical, High, Medium, Low) to prioritize sensitive data.
  • Outputs: text (default), JSON (with optional events and trace report), CSV (event rows), and trace report JSON.
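The obfuscation methods above can be illustrated with a small std-only Rust sketch. This is not PhantomTrace's implementation. The Method enum, the choice to keep the trailing preserve_chars alphanumerics for Phantom (preserve_chars appears in the rule example under Configuration), the fixed "[REDACTED]" mask string, and the counter-style TOKEN_0001 tokens are all hypothetical. Mirror (hash-based) is omitted.

```rust
use std::collections::HashMap;

/// Hypothetical enum mirroring the method names listed above (Mirror omitted).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Phantom { preserve_chars: usize },
    Vanish,
    Mask,
    Tokenize,
}

/// Phantom: replace alphanumerics with '*' but keep separators and the
/// last `preserve_chars` alphanumerics (keeping the tail is an assumption).
fn phantom(value: &str, preserve_chars: usize) -> String {
    let total = value.chars().filter(|c| c.is_alphanumeric()).count();
    let keep_from = total.saturating_sub(preserve_chars);
    let mut seen = 0;
    value
        .chars()
        .map(|c| {
            if !c.is_alphanumeric() {
                return c; // keep '-', '@', '.', spaces so the shape survives
            }
            seen += 1;
            if seen > keep_from { c } else { '*' }
        })
        .collect()
}

/// Tokenize: the same input always yields the same token within one Tokenizer.
/// Counter-based tokens are a hypothetical scheme chosen for readability.
struct Tokenizer {
    tokens: HashMap<String, String>,
}

impl Tokenizer {
    fn new() -> Self {
        Tokenizer { tokens: HashMap::new() }
    }

    fn token_for(&mut self, value: &str) -> String {
        let next_id = self.tokens.len() + 1;
        self.tokens
            .entry(value.to_string())
            .or_insert_with(|| format!("TOKEN_{next_id:04}"))
            .clone()
    }
}

fn obfuscate(value: &str, method: Method, tokenizer: &mut Tokenizer) -> String {
    match method {
        Method::Phantom { preserve_chars } => phantom(value, preserve_chars),
        Method::Vanish => String::new(),          // remove entirely
        Method::Mask => "[REDACTED]".to_string(), // hypothetical fixed string
        Method::Tokenize => tokenizer.token_for(value),
    }
}
```

For example, Phantom with preserve_chars = 4 turns 4111-1111-1111-1234 into ****-****-****-1234, keeping the dashes so the value still looks like a card number.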

Installation

  • From source:
    • git clone the repository
    • cargo build --release
  • Binaries are provided via GitHub Releases when available.

CLI Usage

Basic file processing:

  • phantomtrace -i input.log -o output.log

Generate default configuration:

  • phantomtrace --generate-config phantom_config.json

Use a custom configuration:

  • phantomtrace -i input.txt -o output.txt -c phantom_config.json

JSON output with trace report:

  • phantomtrace -i logs.txt -o clean.txt --trace-report --format json

Create a trace map alongside output:

  • phantomtrace -i app.log -o clean.log --create-trace-map

CLI flags of note:

  • --format: text | json | csv | trace-report
  • --trace-report: include a trace report in JSON outputs
  • --log-phantoms: include event lists in JSON outputs
  • --create-trace-map: produce a .tracemap JSON with counts and coverage
  • --generate-config <file>: write a default JSON configuration to the given file

Configuration

PhantomTrace reads a JSON configuration to define:

  • Tracing rules (regex patterns, method, severity)
  • Processing options (batch size, overlaps, performance mode)
  • Output behavior (format, include reports, trace map)

Default rules include:

  • Credit cards (Critical)
  • SSNs (High)
  • Emails (High)
  • Phone numbers (Medium)
  • IPv4 addresses (Medium)
  • API keys (Critical)
  • AWS access keys (Critical)
  • Generic passwords (Critical)
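Severity-aware prioritization can be modeled with an ordered enum. In this hypothetical sketch the rule identifiers (credit_card, ssn, and so on) are invented labels for the defaults listed above, and sorting by severity stands in for whatever prioritization PhantomTrace actually performs.

```rust
/// Hypothetical severity levels. Declaring Low first means the derived Ord
/// ranks Critical as the greatest value.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Severity {
    Low,
    Medium,
    High,
    Critical,
}

/// The default rules listed above; the identifiers are invented labels.
fn default_rules() -> Vec<(&'static str, Severity)> {
    vec![
        ("credit_card", Severity::Critical),
        ("ssn", Severity::High),
        ("email", Severity::High),
        ("phone", Severity::Medium),
        ("ipv4", Severity::Medium),
        ("api_key", Severity::Critical),
        ("aws_access_key", Severity::Critical),
        ("generic_password", Severity::Critical),
    ]
}

/// Most severe first; sort_by is stable, so ties keep their listed order.
fn by_priority(mut rules: Vec<(&'static str, Severity)>) -> Vec<(&'static str, Severity)> {
    rules.sort_by(|a, b| b.1.cmp(&a.1));
    rules
}

/// Keep only the names of rules at or above a minimum severity.
fn at_least(rules: &[(&'static str, Severity)], min: Severity) -> Vec<&'static str> {
    rules.iter().filter(|r| r.1 >= min).map(|r| r.0).collect()
}
```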

Example custom rule (regex backslashes must be doubled inside JSON strings): { "name": "custom_id", "pattern": "\\bCUST-\\d{6}\\b", "method": "Phantom", "preserve_chars": 4, "severity": "Medium" }
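Because the pattern lives inside a JSON string, every regex backslash must itself be escaped: unescaped, \b is read as a JSON backspace and \d is an invalid JSON escape. The sketch below renders a rule with a minimal hand-written escaper to show the effect. The CustomRule struct is hypothetical; in practice a serializer such as serde_json would handle escaping.

```rust
/// Escape a string for a JSON string literal (backslash, quote, control chars).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical in-memory form of a tracing rule, using the field names above.
struct CustomRule<'a> {
    name: &'a str,
    pattern: &'a str, // raw regex text, e.g. r"\bCUST-\d{6}\b"
    method: &'a str,
    preserve_chars: u32,
    severity: &'a str,
}

impl CustomRule<'_> {
    fn to_json(&self) -> String {
        format!(
            r#"{{ "name": "{}", "pattern": "{}", "method": "{}", "preserve_chars": {}, "severity": "{}" }}"#,
            json_escape(self.name),
            json_escape(self.pattern),
            json_escape(self.method),
            self.preserve_chars,
            json_escape(self.severity)
        )
    }
}
```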

Library Usage (Rust)

Basic example:

  • Import with use phantomtrace::{phantom_text, PhantomTraceConfig, PhantomTraceProcessor};
  • Call phantom_text(&str) to get an obfuscated string with defaults
  • For advanced processing with access to events and stats, create a PhantomTraceProcessor from PhantomTraceConfig::default()

Outputs from processing include:

  • phantomed_text: String
  • phantom_events: Vec of events (rule, severity, original value, obfuscated value, position, trace_id)
  • processing stats: lines processed, lines phantomed, event totals, duration
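The relationship between phantom_events and phantomed_text can be sketched by splicing each event's obfuscated value into the input at its position. The PhantomEvent fields here (byte offsets start/end and phantom_value) are assumptions based on the event description above, and skipping overlapping spans is only one possible policy; PhantomTrace's configurable overlap handling may behave differently.

```rust
/// Hypothetical event shape: byte offsets into the input plus the replacement.
#[derive(Debug, Clone)]
struct PhantomEvent {
    start: usize, // inclusive byte offset
    end: usize,   // exclusive byte offset
    phantom_value: String,
}

/// Splice each event's phantom_value into `text`.
fn apply_events(text: &str, events: &[PhantomEvent]) -> String {
    let mut ordered: Vec<&PhantomEvent> = events.iter().collect();
    // Work right to left so splicing never shifts offsets still to be applied.
    ordered.sort_by(|a, b| b.start.cmp(&a.start));

    let mut out = text.to_string();
    // out[..limit] is always identical to text[..limit], so boundary checks
    // against the original text remain valid.
    let mut limit = text.len();
    for ev in ordered {
        let valid = ev.start <= ev.end
            && ev.end <= limit
            && text.is_char_boundary(ev.start)
            && text.is_char_boundary(ev.end);
        if !valid {
            continue; // overlapping or malformed span: skipped in this sketch
        }
        out.replace_range(ev.start..ev.end, &ev.phantom_value);
        limit = ev.start;
    }
    out
}
```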

Output Formats

  • Text: processed content only.
  • JSON: phantomed_text plus optional events and trace_report when enabled.
  • CSV: one row per event with rule name, severity, original/phantom values, start/end positions, trace_id.
  • TraceReport (JSON): totals, severity breakdown, rule-level stats, generation time.
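A CSV event row as described above can be sketched with RFC 4180 quoting, which matters because original or phantom values may contain commas or quotes. The header names and the EventRow struct are hypothetical, not PhantomTrace's exact column names. Note that the original_value column holds the sensitive data itself, so CSV output must be protected accordingly.

```rust
/// Quote a CSV field per RFC 4180 when it contains a comma, quote, CR, or LF.
fn csv_field(s: &str) -> String {
    if s.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", s.replace('"', "\"\""))
    } else {
        s.to_string()
    }
}

/// Hypothetical row shape following the column list above.
struct EventRow<'a> {
    rule: &'a str,
    severity: &'a str,
    original: &'a str, // sensitive: this is the unobfuscated value
    phantom: &'a str,
    start: usize,
    end: usize,
    trace_id: &'a str,
}

const HEADER: &str = "rule,severity,original_value,phantom_value,start,end,trace_id";

fn to_csv(rows: &[EventRow]) -> String {
    let mut out = String::from(HEADER);
    out.push('\n');
    for r in rows {
        let fields = [
            csv_field(r.rule),
            csv_field(r.severity),
            csv_field(r.original),
            csv_field(r.phantom),
            r.start.to_string(),
            r.end.to_string(),
            csv_field(r.trace_id),
        ];
        out.push_str(&fields.join(","));
        out.push('\n');
    }
    out
}
```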

Trace map (.tracemap) optional file contains:

  • Total events
  • Events by severity and by rule
  • Phantom coverage percentage (lines phantomed / lines processed)
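The trace map figures can be sketched as simple aggregations. Coverage is computed from lines, not events: a line with two matches counts once toward lines phantomed. The Event and TraceMap types are hypothetical stand-ins for PhantomTrace's own structures.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal event: only the fields the trace map aggregates.
struct Event<'a> {
    rule: &'a str,
    severity: &'a str,
}

struct TraceMap {
    total_events: usize,
    by_severity: BTreeMap<String, usize>,
    by_rule: BTreeMap<String, usize>,
    coverage_pct: f64, // lines phantomed / lines processed * 100
}

fn build_trace_map(events: &[Event], lines_processed: usize, lines_phantomed: usize) -> TraceMap {
    let mut by_severity: BTreeMap<String, usize> = BTreeMap::new();
    let mut by_rule: BTreeMap<String, usize> = BTreeMap::new();
    for e in events {
        *by_severity.entry(e.severity.to_string()).or_insert(0) += 1;
        *by_rule.entry(e.rule.to_string()).or_insert(0) += 1;
    }
    // Guard against division by zero on empty input.
    let coverage_pct = if lines_processed == 0 {
        0.0
    } else {
        lines_phantomed as f64 / lines_processed as f64 * 100.0
    };
    TraceMap { total_events: events.len(), by_severity, by_rule, coverage_pct }
}
```

For example, 3 phantomed lines out of 200 processed gives a coverage of 1.5%, regardless of how many events those 3 lines produced.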

Log and Stream Integration

PhantomTrace can be integrated before log ingestion systems or used in pipelines:

  • Process files produced by applications or agents.
  • Pipe stdin/stdout in shell pipelines for streaming logs.
  • The repository's code also includes stream and TCP server modes aimed at real-time processing; verify their production readiness (e.g., error handling, retries) for your environment before relying on them.

Example shell pipeline:

  • tail -F /var/log/app.log | phantomtrace --format text > /var/log/app.clean.log
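The line-by-line filtering behind a pipeline like this can be sketched generically over any reader and writer. The mask_long_digit_runs transform is a hypothetical toy (runs of 13 or more digits keep only their last 4) standing in for PhantomTrace's configured regex rules. Flushing after every line is a design choice in this sketch so that downstream consumers see output promptly.

```rust
use std::io::{self, BufRead, Write};

/// Toy detector: mask digit runs of 13+ characters, keeping the last 4.
fn push_run(run: &mut String, out: &mut String) {
    if run.len() >= 13 {
        let keep = run.len() - 4; // run holds ASCII digits only, so byte slicing is safe
        out.extend(std::iter::repeat('*').take(keep));
        out.push_str(&run[keep..]);
    } else {
        out.push_str(run.as_str());
    }
    run.clear();
}

fn mask_long_digit_runs(line: &str) -> String {
    let mut out = String::with_capacity(line.len());
    let mut run = String::new();
    for c in line.chars() {
        if c.is_ascii_digit() {
            run.push(c);
        } else {
            push_run(&mut run, &mut out);
            out.push(c);
        }
    }
    push_run(&mut run, &mut out);
    out
}

/// Read lines, transform each one, and write it out immediately.
fn filter_stream<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    transform: impl Fn(&str) -> String,
) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        writeln!(writer, "{}", transform(&line))?;
        writer.flush()?; // emit each line promptly for tail -F style consumers
    }
    Ok(())
}
```

Called as filter_stream(io::stdin().lock(), io::stdout().lock(), mask_long_digit_runs), the same function behaves like a stdin/stdout filter in the pipeline above.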

Performance Notes

  • Throughput depends on the number and complexity of regex rules, batch size, and enabled reporting.
  • Configuration options like performance_mode and batch_size can influence speed and memory usage.
  • JSON/CSV/report generation adds overhead compared to plain text output.

Operational Guidance

  • Start with the default configuration and verify that built-in patterns match expected data.
  • Add custom rules conservatively and test regex performance.
  • For auditing scenarios, enable event logging and trace reports (JSON output).
  • For high-volume environments, benchmark with representative data and adjust batch size and reporting features.
  • Validate output does not leak original sensitive values, especially when customizing rules or methods.
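One way to validate output is to check that none of the original values recorded in phantom_events appear verbatim in the processed text. This is a hypothetical helper, not part of PhantomTrace. It only catches exact copies; a value that leaks in a reformatted form (e.g., with separators removed) would need additional checks. The list of originals is itself sensitive and should be discarded after the check.

```rust
/// Return the original values that still appear verbatim in the output.
fn find_leaks<'a>(output: &str, originals: &[&'a str]) -> Vec<&'a str> {
    originals
        .iter()
        .copied()
        .filter(|o| !o.is_empty() && output.contains(*o))
        .collect()
}
```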

Troubleshooting

  • Pattern not detected:
    • Check regex syntax and escaping (Rust regex engine).
    • Check case-sensitivity expectations against the case_sensitive setting in the config.
  • Slow processing:
    • Reduce number of active rules.
    • Increase batch_size.
    • Disable per-event logging and heavy reporting when not needed.
  • Regex compilation errors:
    • Test patterns in a Rust-compatible regex tool before adding to configs.

Security Considerations

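  • Obfuscation is performed in memory during processing, and the processed text should not contain the original values of matched patterns. Event lists (--log-phantoms), CSV output, and phantom_events do record original values, so treat those outputs as sensitive.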
  • The hashing/tokenization provided is intended for traceability and is not a cryptographic guarantee.
  • Review patterns and methods to align with internal policies and applicable regulations.
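The limitation on deterministic tokens can be made concrete: if a token is an unkeyed hash of a low-entropy value, anyone holding the token can recover the value by hashing every candidate. The sketch uses 64-bit FNV-1a purely as a stand-in; the hashing PhantomTrace actually uses is not documented here. A 4-digit value keeps the example fast, but the same enumeration scales to the roughly one billion nine-digit SSN strings. A keyed construction, such as HMAC with a secret key, resists this attack as long as the key stays secret.

```rust
// 64-bit FNV-1a: a fast, non-cryptographic hash used here only as a stand-in.
const FNV_OFFSET: u64 = 0xcbf29ce484222325;
const FNV_PRIME: u64 = 0x100000001b3;

fn fnv1a(data: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET;
    for &b in data {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical deterministic token: same input, same token, no secret key.
fn token(value: &str) -> String {
    format!("{:016x}", fnv1a(value.as_bytes()))
}

/// Recover a 4-digit value from its token by trying all 10,000 candidates.
fn brute_force_pin(target: &str) -> Option<String> {
    (0..10_000u32)
        .map(|n| format!("{n:04}"))
        .find(|candidate| token(candidate) == target)
}
```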

Project Information

Change Management

  • Prebuilt binaries are published with GitHub Releases when available.
  • Some repository commits are backdated to reflect earlier milestones; the GitHub creation date reflects when the repository was published there.

Contact

  • For issues or contributions, use GitHub Issues and pull requests in the repository.
  • For commercial licensing inquiries, follow the guidance in the repository’s license section.