[ENHANCEMENT] Codebase Indexing: Separate collections per branch (disabled by default) #8565

@yavpungggi

Description

Problem (one or two sentences)

When working with multiple Git branches, the current codebase indexing system uses a single shared index for the entire workspace. This causes several issues:

  • Inaccurate search results: When switching branches, search results may show code that doesn't exist in the current branch
  • Team conflicts: Multiple developers working on different branches can overwrite each other's indexes when they share a Qdrant instance
  • Manual re-indexing: Users need to manually trigger re-indexing after switching branches to ensure accuracy
  • Confusion: Search results don't match the actual code in the current branch, leading to wasted time and frustration

Alignment with Roadmap

  • Reliability First: Ensures consistent, accurate search results across branches
  • Enhanced User Experience: Reduces friction for developers working with multiple branches

Context (who is affected and when)

  • Developers who use Codebase Indexing and frequently switch Git branches in the same repository
  • Teams where multiple developers work on different branches of the same project (especially when sharing a Qdrant instance or dev container)
  • Situations where search results or code-context features rely on a single index that was built on a different branch
  • Common moments this shows up:
    • Immediately after switching branches
    • During active feature development across parallel branches
    • When re-indexing or background file watching occurs while changing branches

Desired behavior (conceptual, not technical)

  • Each Git branch has its own independent code index
  • When I switch branches, the extension automatically uses the index for the current branch
  • Search results and code-context always reflect the code in my current branch (no cross-branch contamination)
  • I don’t need to manually re-index after switching branches
  • The feature is optional (opt-in) and doesn’t change behavior for users who don’t need it

Constraints / preferences (optional)

  • Backward compatibility: disabled by default; no breaking changes for existing users
  • UX: a single clear toggle with an explanatory tooltip; non-intrusive and easy to discover in Advanced Configuration
  • Performance: no noticeable slowdown in search; branch switching should not block the UI; index creation should be on-demand
  • Storage: make users aware that enabling branch isolation increases storage (one index per branch)
  • Robustness: handle detached HEAD or non‑Git folders gracefully (fall back to a safe default)
  • Accessibility: labels and tooltips should be screen‑reader friendly and concise
  • Team usage: predictable behavior across machines; no accidental overwrites between branches

Request checklist

  • I've searched existing Issues and Discussions for duplicates
  • This describes a specific problem with clear context and impact

Roo Code Task Links (optional)

No response

Acceptance criteria (optional)

Scenario 1 — Opt-in is off by default

Given a fresh install or a workspace that has never set this option
When I open Codebase Indexing settings
Then the “Enable Branch Isolation” toggle is OFF
And saving without touching it keeps indexing behavior unchanged (single collection per workspace)
But switching Git branches does NOT change which collection is used

Scenario 2 — Enabling branch isolation persists and shows guidance

Given a workspace in a normal Git branch (e.g., main)
When I enable “Enable Branch Isolation” and click Save
Then the setting is persisted and remains enabled after reload
And a storage warning is displayed when the toggle is ON
But no errors are shown and no other unrelated settings are changed

Scenario 3 — Index uses branch-specific collection when enabled

Given “Enable Branch Isolation” is ON and the current branch is main
When indexing runs (initial scan or file watcher)
Then a branch-scoped collection is used whose name contains “-br-main” (sanitized)
And search results are served from that branch’s collection
But results from other branches do NOT appear

Scenario 4 — Switching branches selects the correct index

Given “Enable Branch Isolation” is ON and the index exists for main
When I switch to branch feature/user-auth and restart or trigger indexing
Then a new (or previously created) collection for feature/user-auth is used
And subsequent searches reflect only the code from feature/user-auth
But queries do NOT read from the main branch’s collection

Scenario 5 — Detached HEAD and non‑Git fallbacks

Given “Enable Branch Isolation” is ON
When the workspace is in a detached HEAD state or not a Git repo
Then indexing/search falls back to the workspace-only collection (no “-br-” suffix)
And the extension logs/telemetry indicate the fallback state (if logging is enabled)
But no runtime error or crash occurs and UI remains functional
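
The fallback in Scenario 5 could be sketched as below. This is a minimal illustration, not the extension's actual detection code: `branchFromHead` is a hypothetical helper, and it assumes the caller has already read the contents of `.git/HEAD` (passing `undefined` when the folder is not a Git repository).

```typescript
// Hypothetical helper: extracts a branch name from the contents of .git/HEAD.
// Returns undefined for a detached HEAD (a raw commit hash) or a non-Git folder,
// which callers treat as "use the workspace-only collection".
function branchFromHead(headContents: string | undefined): string | undefined {
  if (headContents === undefined) {
    return undefined; // not a Git repository
  }
  // On a normal branch, HEAD contains a line such as "ref: refs/heads/main".
  const match = /^ref: refs\/heads\/(.+)$/.exec(headContents.trim());
  return match ? match[1] : undefined; // a detached HEAD has no "ref:" prefix
}
```

Returning `undefined` instead of throwing keeps the fallback a normal code path, so a detached HEAD never surfaces as a runtime error.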

Scenario 6 — Disabling reverts to workspace-only behavior

Given “Enable Branch Isolation” was previously ON and is now turned OFF and saved
When indexing or searching runs
Then the system uses the workspace-only collection (no branch suffix)
And previously created branch collections are not deleted automatically
But branch-specific collections are NOT read while the feature is OFF

Scenario 7 — Branch name sanitization

Given “Enable Branch Isolation” is ON
When the current branch contains special characters (e.g., feature/user-auth or release/v2.0.0)
Then the chosen collection name uses a sanitized, lowercase branch segment (e.g., feature-user-auth, release-v2-0-0)
And the sanitized segment is capped to the documented length limit
But invalid characters are not present in the final collection name
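
One way to implement the sanitization in Scenario 7 is sketched below. The function name `sanitizeBranchName` and the cap of 50 characters are assumptions for illustration; the issue refers to a documented length limit without stating its value.

```typescript
// Assumed cap; the issue mentions a "documented length limit" but gives no number.
const MAX_BRANCH_SEGMENT_LENGTH = 50;

// Hypothetical helper: turns a Git branch name into a collection-safe segment.
function sanitizeBranchName(
  branch: string,
  maxLength: number = MAX_BRANCH_SEGMENT_LENGTH,
): string {
  return branch
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // each run of non-alphanumerics becomes one "-"
    .replace(/^-+|-+$/g, "") // trim leading and trailing dashes
    .slice(0, maxLength) // apply the length cap
    .replace(/-+$/g, ""); // the cut may leave a trailing dash; trim it again
}
```

Note that distinct branch names can map to the same segment under this scheme (for example `feature/a` and `feature-a`), so a real implementation may want to append a short hash of the original name.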

Scenario 8 — Persistence across reloads

Given “Enable Branch Isolation” is ON and the current branch is feature/a
When I reload the window or restart VS Code
Then the setting remains ON and searches run against the feature/a collection (after services initialize)
And no additional manual steps are required
But the system does not regress to a workspace-only collection unless I turn the feature OFF

Scenario 9 — Search correctness and isolation

Given two branches (main and feature/a), both indexed
When I am on main and perform a semantic search for a symbol that only exists on feature/a
Then no results from feature/a are returned
And the inverse is true when I am on feature/a
But cross-branch contamination of results does NOT occur
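
The isolation property in Scenario 9 can be modeled with a toy in-memory registry. This is only an illustration of the expected behavior: `BranchIndexRegistry` is a hypothetical stand-in, not the Qdrant client, and exact-match lookup stands in for semantic search.

```typescript
// Toy in-memory stand-in for per-branch collections (hypothetical, not Qdrant).
class BranchIndexRegistry {
  private readonly collections = new Map<string, Set<string>>();

  // Creates the branch's collection on first use, mirroring on-demand indexing.
  index(branch: string, symbol: string): void {
    let collection = this.collections.get(branch);
    if (!collection) {
      collection = new Set<string>();
      this.collections.set(branch, collection);
    }
    collection.add(symbol);
  }

  // Consults only the current branch's collection, never any other.
  search(currentBranch: string, symbol: string): boolean {
    return this.collections.get(currentBranch)?.has(symbol) ?? false;
  }
}

const registry = new BranchIndexRegistry();
registry.index("main", "parseConfig");
registry.index("feature/a", "parseConfig");
registry.index("feature/a", "loginWithToken");
```

Here `registry.search("main", "loginWithToken")` finds nothing because the symbol was only indexed under `feature/a`, while the same query on `feature/a` succeeds.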

Scenario 10 — Performance and UX expectations

Given branch isolation is enabled
When I search or switch branches in a typical repository
Then perceived search latency is not materially worse than without isolation (within reasonable variance)
And the UI remains responsive; no blocking dialogs are introduced
But enabling the feature should not cause noticeable slowdowns beyond first-time indexing per branch

Proposed approach (optional)

Add an opt-in branch isolation feature that creates separate indexes for each Git branch.

Implementation Details

  • Add configuration option in Codebase Indexing settings (Advanced Configuration)
    • Persist in global state and message contract
  • Separate Qdrant collections per branch: ws-{hash}-br-{branch-name}
    • Sanitization: lowercase, non-alphanumerics → “-”, trim/collapse dashes, length cap
  • Automatic Git branch detection
    • Detect current branch on service init (graceful fallback on detached HEAD / non‑Git)
  • Clear storage warnings in UI
    • Tooltip + inline warning when enabled
    • i18n keys:
      settings:codeIndex.branchIsolation.enableLabel
      settings:codeIndex.branchIsolation.enableDescription
      settings:codeIndex.branchIsolation.storageWarning
  • Backward compatible (disabled by default)
    • No migration required; per-branch indexes are created on demand when enabled
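
The naming scheme above could be resolved roughly as follows. This is a sketch under stated assumptions: `BranchIsolationInput` and `resolveCollectionName` are hypothetical names, `workspaceHash` is whatever per-workspace hash the indexer already derives, and `branchSegment` is assumed to be sanitized before it is passed in.

```typescript
// Hypothetical input shape; field names are illustrative, not the extension's API.
interface BranchIsolationInput {
  workspaceHash: string; // per-workspace hash the indexer already computes
  isolationEnabled: boolean; // the opt-in setting, false by default
  branchSegment?: string; // sanitized branch name; absent on detached HEAD or non-Git
}

// Returns ws-{hash}-br-{branch-name} only when isolation is on and a branch is known;
// otherwise returns the workspace-only name ws-{hash}.
function resolveCollectionName(input: BranchIsolationInput): string {
  const base = `ws-${input.workspaceHash}`;
  if (!input.isolationEnabled || !input.branchSegment) {
    return base;
  }
  return `${base}-br-${input.branchSegment}`;
}
```

Because the workspace-only name is the default return value, turning the setting off or losing the branch (detached HEAD, non-Git folder) both land on the existing collection without any migration step.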

Trade-offs / risks (optional)

Storage and operational overhead

  • One collection per branch increases Qdrant storage proportionally to the number of branches used locally
  • More collections to manage/clean up; stale collections may accumulate after short-lived feature branches
  • Risk of hitting Qdrant limits in large repos or long-running workspaces

Mitigations:

  • Provide a “Clean old branch indexes” action
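
A “Clean old branch indexes” action would first need to decide which collections are stale. The sketch below shows one possible selection step, assuming the caller supplies the existing collection names and the set of sanitized segments for branches that still exist; `findStaleBranchCollections` is a hypothetical helper, and deleting the collections is left out.

```typescript
// Hypothetical helper: returns branch-scoped collections for this workspace
// whose branch segment no longer matches any live branch.
function findStaleBranchCollections(
  collectionNames: string[],
  workspaceHash: string,
  liveBranchSegments: ReadonlySet<string>,
): string[] {
  const prefix = `ws-${workspaceHash}-br-`;
  return collectionNames.filter(
    (name) =>
      name.startsWith(prefix) && // skips the workspace-only collection and other workspaces
      !liveBranchSegments.has(name.slice(prefix.length)),
  );
}
```

Matching on the full `ws-{hash}-br-` prefix keeps the workspace-only collection and other workspaces' collections out of the cleanup list.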

Backward compatibility and migration

  • Existing workspace-level indexes remain; enabling isolation does not migrate prior data
  • Users may need to re-index per branch for best results

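Branch rename

  • Collection names are derived from the branch name, so renaming a branch leaves the old collection orphaned and the renamed branch is indexed from scratch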

Mitigations:

  • Document how to clean up or migrate manually
  • Consider a “rename/cleanup” maintenance command

Metadata

Assignees

Labels

Issue - In Progress: Someone is actively working on this. Should link to a PR soon.
enhancement: New feature or request

Type

No type

Projects

Status

Issue [In Progress]

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests