[Geneva Exporter] Optimize EncodedBatch memory usage for retry scenarios using Arc-based sharing #432

@lalitb

Description

Problem

Currently, the upload path (upload_batch()) must clone the entire compressed data vector of an EncodedBatch on every upload attempt so that the data remains available for retries. This incurs an unnecessary allocation and full copy per attempt, which is especially costly for large compressed batches.

  // Current implementation requires expensive clone for retry capability
  pub async fn upload_batch(&self, batch: &EncodedBatch) -> Result<(), String> {
      self.uploader
          .upload(batch.data.clone(), &batch.event_name, &batch.metadata) // ← Full vector copy
          .await
  }

Impact

  • Memory overhead: Each retry attempt copies entire compressed batch data
  • Performance: O(n) memory allocation where n = compressed batch size
  • Scalability: Higher memory pressure under retry scenarios or high throughput

Proposed Solution

Replace Vec<u8> with Arc<Vec<u8>> in EncodedBatch to enable cheap reference sharing:

  #[derive(Debug, Clone)]
  pub struct EncodedBatch {
      pub event_name: String,
      pub data: Arc<Vec<u8>>,  // ← Shared ownership
      pub metadata: BatchMetadata,
  }

Benefits

  • Cheap clones: Arc clone is O(1) reference counting vs O(n) memory copy
  • Retry-friendly: Multiple references to same data without duplication
  • Memory efficient: Single allocation shared across retry attempts
  • API compatibility: Maintains existing retry semantics

Implementation Tasks

  1. Update EncodedBatch struct to use Arc<Vec<u8>>
  2. Modify OtlpEncoder::encode_log_batch() to wrap result in Arc
  3. Update uploader.upload() to handle Arc<Vec<u8>> input (with Arc::try_unwrap optimization)
  4. Update tests and benchmarks
  5. Consider backward compatibility if this is a breaking change

Alternatives Considered

  • Consume ownership: Change API to upload_batch(batch: EncodedBatch) - but this breaks retry capability
  • Keep current clone: Simple but inefficient for large batches

Priority
Medium - Performance optimization that becomes more important with:

  • Large compressed batch sizes (>100KB)
  • High retry rates
  • High-throughput scenarios
