PhantomTrace

Overview

PhantomTrace is a Rust-based tool and library for detecting and obfuscating sensitive data in text, logs, and streams. It supports common PCI/PII patterns (e.g., credit cards, SSNs, emails), multiple obfuscation methods, and structured outputs for analysis and auditing. It can be used as a CLI for files and pipelines, or as a Rust library within applications.

Capabilities

Pattern detection for credit card numbers, SSNs, email addresses, phone numbers, IP addresses, API keys, and AWS access keys.
Obfuscation methods:
- Phantom (mask characters while preserving structure)
- Vanish (remove entirely)
- Mirror (deterministic token/hash)
- Mask (replace with a fixed string)
- Tokenize (stable token for the same input)
Severity-aware processing (Critical, High, Medium, Low) to prioritize sensitive data.
Outputs: text (default), JSON (with optional events and trace report), CSV (event rows), and trace report JSON.
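The std-only Rust sketch below applies each of the five methods to a single matched value, to show how they differ. The `Method` enum, the `apply` function, the `[REDACTED]` placeholder, and the `TOK_` token format are hypothetical. They are not PhantomTrace's API or its exact output. Hashing uses std's `DefaultHasher`, which is deterministic within a build but is neither cryptographic nor guaranteed stable across Rust releases.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the obfuscation methods listed above.
#[derive(Debug, Clone, Copy)]
pub enum Method {
    /// Mask alphanumerics but keep separators and the last `preserve` characters.
    Phantom { preserve: usize },
    /// Drop the value entirely.
    Vanish,
    /// Replace with a deterministic hex digest of the value.
    Mirror,
    /// Replace with a fixed placeholder string.
    Mask,
    /// Replace with a stable token derived from the value.
    Tokenize,
}

/// Non-cryptographic 64-bit hash (illustration only).
fn hash64(value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    value.hash(&mut h);
    h.finish()
}

pub fn apply(method: Method, value: &str) -> String {
    match method {
        Method::Phantom { preserve } => {
            let chars: Vec<char> = value.chars().collect();
            // Index from which characters are left visible.
            let keep_from = chars.len().saturating_sub(preserve);
            chars
                .iter()
                .enumerate()
                .map(|(i, &c)| {
                    // Keep separators (structure) and the trailing `preserve` chars.
                    if i >= keep_from || !c.is_alphanumeric() { c } else { '*' }
                })
                .collect()
        }
        Method::Vanish => String::new(),
        Method::Mirror => format!("{:016x}", hash64(value)),
        Method::Mask => "[REDACTED]".to_string(),
        Method::Tokenize => format!("TOK_{:08x}", hash64(value) as u32),
    }
}
```

For example, `apply(Method::Phantom { preserve: 4 }, "4111-1111-1111-1234")` returns `"****-****-****-1234"`. The separators and the last four characters survive, and everything else becomes `*`.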

Installation

From source:
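- git clone https://github.com/vabhishek6/PhantomTrace
- cd PhantomTrace && cargo build --release (Cargo writes the binary to target/release/phantomtrace)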
Binaries are provided via GitHub Releases when available.

CLI Usage

Basic file processing:

phantomtrace -i input.log -o output.log

Generate default configuration:

phantomtrace --generate-config phantom_config.json

Use a custom configuration:

phantomtrace -i input.txt -o output.txt -c phantom_config.json

JSON output with trace report:

phantomtrace -i logs.txt -o clean.txt --trace-report --format json

Create a trace map alongside output:

phantomtrace -i app.log -o clean.log --create-trace-map

CLI flags of note:

-i / -o / -c: input file, output file, and configuration file
--format: text | json | csv | trace-report
--trace-report: include a trace report in JSON output
--log-phantoms: include event lists in JSON output
--create-trace-map: produce a .tracemap JSON file with event counts and coverage
--generate-config <path>: write a default JSON configuration to the given file
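A Rust program that invokes the CLI, such as a build or deployment tool, can validate options before spawning the process. The sketch below uses only the flags documented on this page. The `CliRun` struct and the `build_args` and `command` helpers are hypothetical, and the sketch assumes `phantomtrace` is on `PATH`.

```rust
use std::process::Command;

/// Options for one PhantomTrace CLI run (hypothetical helper type).
#[derive(Debug, Default)]
pub struct CliRun<'a> {
    pub input: &'a str,
    pub output: &'a str,
    pub config: Option<&'a str>,
    /// One of: text | json | csv | trace-report.
    pub format: Option<&'a str>,
    pub trace_report: bool,
    pub log_phantoms: bool,
    pub create_trace_map: bool,
}

const FORMATS: [&str; 4] = ["text", "json", "csv", "trace-report"];

/// Build the argument list, rejecting unknown --format values up front.
pub fn build_args(run: &CliRun) -> Result<Vec<String>, String> {
    let mut args = vec![
        "-i".to_string(),
        run.input.to_string(),
        "-o".to_string(),
        run.output.to_string(),
    ];
    if let Some(cfg) = run.config {
        args.push("-c".into());
        args.push(cfg.into());
    }
    if let Some(fmt) = run.format {
        if !FORMATS.contains(&fmt) {
            return Err(format!("unsupported --format value: {}", fmt));
        }
        args.push("--format".into());
        args.push(fmt.into());
    }
    if run.trace_report {
        args.push("--trace-report".into());
    }
    if run.log_phantoms {
        args.push("--log-phantoms".into());
    }
    if run.create_trace_map {
        args.push("--create-trace-map".into());
    }
    Ok(args)
}

/// Prepare (but do not spawn) the command; call `.status()` on it to run.
pub fn command(run: &CliRun) -> Result<Command, String> {
    let mut cmd = Command::new("phantomtrace");
    cmd.args(build_args(run)?);
    Ok(cmd)
}
```

Setting `input: "logs.txt"`, `output: "clean.txt"`, `format: Some("json")`, and `trace_report: true` yields the same flags as the JSON trace-report example above, in a different order.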

Configuration

PhantomTrace reads a JSON configuration to define:

Tracing rules (regex patterns, method, severity)
Processing options (batch size, overlaps, performance mode)
Output behavior (format, include reports, trace map)

Default rules include:

Credit cards (Critical)
SSNs (High)
Emails (High)
Phone numbers (Medium)
IPv4 addresses (Medium)
API keys (Critical)
AWS Access Keys (Critical)
Generic passwords (Critical)
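Severity ordering is what lets Critical matches be handled first. The sketch below encodes the default rule list with its severities and sorts the rules from most to least severe. The `Severity` and `RuleSummary` types and the snake_case rule names are illustrative assumptions, not PhantomTrace's internal identifiers. The regex patterns themselves are omitted.

```rust
/// Severity levels named on this page; declaration order makes Critical sort first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    Critical,
    High,
    Medium,
    Low,
}

/// Hypothetical, simplified view of a tracing rule (pattern and method omitted).
#[derive(Debug, Clone)]
pub struct RuleSummary {
    pub name: &'static str,
    pub severity: Severity,
}

/// The default rules as listed above, in documentation order.
pub fn default_rules() -> Vec<RuleSummary> {
    let rules: [(&'static str, Severity); 8] = [
        ("credit_card", Severity::Critical),
        ("ssn", Severity::High),
        ("email", Severity::High),
        ("phone", Severity::Medium),
        ("ipv4", Severity::Medium),
        ("api_key", Severity::Critical),
        ("aws_access_key", Severity::Critical),
        ("password", Severity::Critical),
    ];
    rules
        .iter()
        .map(|&(name, severity)| RuleSummary { name, severity })
        .collect()
}

/// Most severe first; sort_by_key is stable, so ties keep documented order.
pub fn by_priority(mut rules: Vec<RuleSummary>) -> Vec<RuleSummary> {
    rules.sort_by_key(|r| r.severity);
    rules
}
```

After sorting, the four Critical rules come first in their documented order: credit cards, API keys, AWS access keys, and passwords. SSNs and emails (High) follow, then phone numbers and IPv4 addresses (Medium). None of the defaults is Low.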

Example custom rule (inside a JSON string, each regex backslash must be escaped as \\):

{ "name": "custom_id", "pattern": "\\bCUST-\\d{6}\\b", "method": "Phantom", "preserve_chars": 4, "severity": "Medium" }
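Here is a hand-written, std-only equivalent of `\bCUST-\d{6}\b` that shows what the custom rule's pattern accepts. It only illustrates word-boundary semantics. PhantomTrace evaluates the pattern with the Rust regex engine, where `\b` and `\d` are Unicode-aware by default. This sketch (`find_cust_ids` is a hypothetical name) treats only ASCII letters, digits, and `_` as word characters.

```rust
/// ASCII approximation of a regex word character.
fn is_word(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hand-written equivalent of `\bCUST-\d{6}\b` for ASCII input,
/// returning (start, end) byte offsets of each match.
pub fn find_cust_ids(text: &str) -> Vec<(usize, usize)> {
    const PREFIX: &[u8] = b"CUST-";
    let bytes = text.as_bytes();
    let len = PREFIX.len() + 6; // "CUST-" plus exactly six digits
    let mut out = Vec::new();
    let mut i = 0;
    while i + len <= bytes.len() {
        let candidate = &bytes[i..i + len];
        // \b before 'C': previous byte must be a non-word char (or start of text).
        let boundary_before = i == 0 || !is_word(bytes[i - 1]);
        // \b after the last digit: next byte must be a non-word char (or end).
        let boundary_after = i + len == bytes.len() || !is_word(bytes[i + len]);
        if boundary_before
            && boundary_after
            && candidate.starts_with(PREFIX)
            && candidate[PREFIX.len()..].iter().all(u8::is_ascii_digit)
        {
            out.push((i, i + len));
            i += len; // matches do not overlap
        } else {
            i += 1;
        }
    }
    out
}
```

In `"ids: CUST-123456, XCUST-654321, CUST-1234567"` only `CUST-123456` matches. `XCUST-654321` has no boundary before `C`, and `CUST-1234567` has no boundary after the sixth digit.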

Library Usage (Rust)

Basic example:

Import the items with use phantomtrace::{phantom_text, PhantomTraceConfig, PhantomTraceProcessor};
Call phantom_text with a &str to get an obfuscated string using the default rules.
For more control, including access to events and statistics, create a PhantomTraceProcessor from PhantomTraceConfig::default().

Outputs from processing include:

phantomed_text: String
phantom_events: Vec of events (rule, severity, original value, obfuscated value, position, trace_id)
processing stats: lines processed, lines phantomed, event totals, duration
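The position fields tie each event back to the input. The sketch below rebuilds the obfuscated text from the original plus the event list. It assumes that `start`/`end` are byte offsets with an exclusive end and that events never overlap; neither is documented here. The `Event` struct and the `splice` function are hypothetical and only mirror the fields listed above. The same routine doubles as a consistency check when auditing event logs.

```rust
/// Hypothetical shape of one detection event, mirroring the fields above.
#[derive(Debug, Clone)]
pub struct Event {
    pub rule: String,
    pub severity: String,
    pub original: String,
    pub phantom: String,
    /// Assumed: byte offsets into the input, end exclusive.
    pub start: usize,
    pub end: usize,
    pub trace_id: String,
}

/// Rebuild the obfuscated text by splicing each event's replacement into the
/// original. Returns None for overlapping, out-of-range, or inconsistent spans.
pub fn splice(original: &str, events: &[Event]) -> Option<String> {
    let mut sorted: Vec<&Event> = events.iter().collect();
    sorted.sort_by_key(|e| e.start);
    let mut out = String::with_capacity(original.len());
    let mut cursor = 0;
    for e in sorted {
        if e.start < cursor || e.end < e.start {
            return None; // overlapping or inverted span
        }
        // get() returns None for out-of-range or non-char-boundary offsets.
        out.push_str(original.get(cursor..e.start)?);
        // The recorded original value must match the text at that span.
        if original.get(e.start..e.end)? != e.original {
            return None;
        }
        out.push_str(&e.phantom);
        cursor = e.end;
    }
    out.push_str(original.get(cursor..)?);
    Some(out)
}
```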

Output Formats

Text: processed content only.
JSON: phantomed_text plus optional events and trace_report when enabled.
CSV: one row per event with rule name, severity, original/phantom values, start/end positions, trace_id.
TraceReport (JSON): totals, severity breakdown, rule-level stats, generation time.
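Values such as emails or free-text passwords can contain commas or quotes, so CSV fields need quoting. The sketch below builds one event row with RFC 4180-style quoting. The header names and the `csv_row` helper are assumptions for illustration, not PhantomTrace's exact CSV layout.

```rust
/// Quote a CSV field per RFC 4180: wrap it in quotes when it contains a comma,
/// quote, CR, or LF, and double any embedded quotes.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Assumed header matching the column list above.
pub const HEADER: &str = "rule_name,severity,original_value,phantom_value,start,end,trace_id";

/// Serialize one event as a CSV row (hypothetical helper).
pub fn csv_row(
    rule: &str,
    severity: &str,
    original: &str,
    phantom: &str,
    start: usize,
    end: usize,
    trace_id: &str,
) -> String {
    let (start, end) = (start.to_string(), end.to_string());
    let fields = [rule, severity, original, phantom, start.as_str(), end.as_str(), trace_id];
    fields
        .iter()
        .map(|f| csv_field(f))
        .collect::<Vec<_>>()
        .join(",")
}
```

Each row carries the original value, so CSV exports are themselves sensitive and should be stored and shared accordingly.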

Trace map (.tracemap) optional file contains:

Total events
Events by severity and by rule
Phantom coverage percentage (lines phantomed / lines processed)
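The trace-map figures are simple aggregates over the event list. The sketch below computes them from (rule, severity) pairs. The `TraceMapSummary` type is hypothetical. Expressing coverage as a 0 to 100 percentage rather than a 0 to 1 fraction is also an assumption.

```rust
use std::collections::BTreeMap;

/// Hypothetical aggregate mirroring the trace-map fields listed above.
#[derive(Debug, Default, PartialEq)]
pub struct TraceMapSummary {
    pub total_events: usize,
    pub by_severity: BTreeMap<String, usize>,
    pub by_rule: BTreeMap<String, usize>,
    /// lines_phantomed / lines_processed * 100; 0.0 when nothing was processed.
    pub coverage_percent: f64,
}

/// Build the summary from (rule, severity) pairs, one per event.
pub fn summarize(
    events: &[(&str, &str)],
    lines_processed: usize,
    lines_phantomed: usize,
) -> TraceMapSummary {
    let mut s = TraceMapSummary::default();
    for &(rule, severity) in events {
        s.total_events += 1;
        *s.by_rule.entry(rule.to_string()).or_insert(0) += 1;
        *s.by_severity.entry(severity.to_string()).or_insert(0) += 1;
    }
    s.coverage_percent = if lines_processed == 0 {
        0.0 // avoid dividing by zero on empty input
    } else {
        lines_phantomed as f64 / lines_processed as f64 * 100.0
    };
    s
}
```

With 50 of 200 lines phantomed, coverage comes out as 25.0.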

Log and Stream Integration

PhantomTrace can sit in front of log ingestion systems or be used in pipelines:

Process files produced by applications or agents.
Pipe stdin/stdout in shell pipelines for streaming logs.
The repository also contains stream and TCP server modes aimed at real-time processing; review them for production readiness (e.g., error handling, retries) before deploying them in your environment.

Example shell pipeline:

tail -F /var/log/app.log | phantomtrace --format text > /var/log/app.clean.log
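The same stdin-to-stdout pattern can be embedded in a Rust program. The sketch below reads lines from any `BufRead`, transforms each one, and flushes after every line so that downstream readers see output promptly. `filter_lines` is a hypothetical helper, and the `redact` closure is a placeholder for the actual obfuscation call.

```rust
use std::io::{self, BufRead, Write};

/// Read lines from `input`, transform each with `redact`, and write them to
/// `output`, flushing per line. Returns the number of lines written.
pub fn filter_lines<R, W, F>(input: R, mut output: W, mut redact: F) -> io::Result<u64>
where
    R: BufRead,
    W: Write,
    F: FnMut(&str) -> String,
{
    let mut count = 0;
    // lines() strips "\n" / "\r\n" and fails on invalid UTF-8.
    for line in input.lines() {
        let line = line?;
        writeln!(output, "{}", redact(&line))?;
        output.flush()?;
        count += 1;
    }
    Ok(count)
}
```

In a binary this would be called as `filter_lines(std::io::stdin().lock(), std::io::stdout().lock(), |line| ...)`. Flushing per line trades throughput for latency. For batch files, a single flush at the end is cheaper.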

Performance Notes

Throughput depends on the number and complexity of regex rules, batch size, and enabled reporting.
Configuration options like performance_mode and batch_size can influence speed and memory usage.
JSON/CSV/report generation adds overhead compared to plain text output.
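To tune batch_size on representative data, a small harness that processes input in fixed-size chunks and records wall time is often enough. In the sketch below, `run_batched` and the `process_batch` closure are hypothetical. This page does not document how PhantomTrace itself uses batch_size internally.

```rust
use std::time::{Duration, Instant};

/// Process `lines` in chunks of `batch_size`, returning the outputs, the number
/// of batches, and the elapsed wall time.
pub fn run_batched<F>(
    lines: &[String],
    batch_size: usize,
    mut process_batch: F,
) -> (Vec<String>, usize, Duration)
where
    F: FnMut(&[String]) -> Vec<String>,
{
    assert!(batch_size > 0, "batch_size must be at least 1");
    let start = Instant::now();
    let mut out = Vec::with_capacity(lines.len());
    let mut batches = 0;
    for chunk in lines.chunks(batch_size) {
        out.extend(process_batch(chunk));
        batches += 1;
    }
    (out, batches, start.elapsed())
}
```

Dividing `out.len()` by `elapsed.as_secs_f64()` gives lines per second for each candidate batch size.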

Operational Guidance

Start with the default configuration and verify that built-in patterns match expected data.
Add custom rules conservatively and test regex performance.
For auditing scenarios, enable event logging and trace reports (JSON output).
For high-volume environments, benchmark with representative data and adjust batch size and reporting features.
Validate that output does not leak original sensitive values, especially when customizing rules or methods.
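One inexpensive check is to search the output for every original value recorded in the event list. `find_leaks` in the sketch below is a hypothetical helper. It only catches verbatim leaks, so it will not detect partial exposure such as too many preserved characters. The list of originals it needs is itself sensitive.

```rust
/// Return the original values that still appear verbatim in `output`.
/// An empty result means no verbatim leak was found.
pub fn find_leaks<'a>(output: &str, originals: &[&'a str]) -> Vec<&'a str> {
    let mut leaks: Vec<&'a str> = originals
        .iter()
        .copied()
        .filter(|o| !o.is_empty() && output.contains(*o))
        .collect();
    // Report each leaked value once.
    leaks.sort_unstable();
    leaks.dedup();
    leaks
}
```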

Troubleshooting

Pattern not detected:
- Check regex syntax and escaping (Rust regex engine).
- Check the case_sensitive setting in the config against the case of the data you expect to match.
Slow processing:
- Reduce number of active rules.
- Increase batch_size.
- Disable per-event logging and heavy reporting when not needed.
Regex compilation errors:
- Test patterns in a Rust-compatible regex tool before adding to configs.

Security Considerations

Obfuscation is performed in-memory during processing; output should not contain original values for matched patterns.
The hashing/tokenization provided is intended for traceability and is not a cryptographic guarantee.
Review patterns and methods to align with internal policies and applicable regulations.

Project Information

Language: Rust
CLI binary name: phantomtrace
Library crate name: phantomtrace
License: MIT-style for non-commercial use (see LICENSE). Commercial use requires a separate agreement as stated in the repository documentation.
Repository: https://github.com/vabhishek6/PhantomTrace

Change Management

Releases may provide prebuilt binaries when available.
Some repository commits are backdated to reflect earlier milestones; the GitHub creation date reflects when the repository was published there.

Contact

For issues or contributions, use GitHub Issues and pull requests in the repository.
For commercial licensing inquiries, follow the guidance in the repository’s license section.
