Skip to content

Conversation

@xinhaoz
Copy link
Member

@xinhaoz xinhaoz commented Oct 29, 2025

ValueStoragePolicy in SpanPolicy is now a struct defining the policy adjustment type (defaultPolicy, noValSeparation, override) and the override value for the minimum size used for value separation. Compactions now use the ValueStoragePolicy returned by the span policy for a table's key range to decide whether we can preserve blob references or not.

Previously, we would rewrite blob references if a compaction contained tables with mixed value storage policies, even if those policies were unchanged.

We now preserve blob references if:

  • The maximum blob reference depth is not exceeded.
  • For each input table, the value separation policy used when writing the table matches the current value separation policy applied to the table span.

db, valsep: do not separate mvcc values if DisableSeparationBySuffix is true

Move this field into the ValueStoragePolicy struct and make sure it's being read
at the time of value separation.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@xinhaoz xinhaoz force-pushed the value-storage-policy branch 3 times, most recently from df13ac1 to 0766ad3 Compare October 29, 2025 21:41
@xinhaoz xinhaoz marked this pull request as ready for review October 29, 2025 21:41
@xinhaoz xinhaoz requested a review from a team as a code owner October 29, 2025 21:41
@xinhaoz xinhaoz requested a review from sumeerbhola October 29, 2025 21:41
Copy link
Collaborator

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong:

@jbowens reviewed 11 of 11 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @sumeerbhola)

@xinhaoz xinhaoz force-pushed the value-storage-policy branch 2 times, most recently from 7c7791a to e809b40 Compare November 3, 2025 15:45
@xinhaoz
Copy link
Member Author

xinhaoz commented Nov 3, 2025

@jbowens I added another commit after your review to address the TODO on using the DisableValueSeparationBySuffix flag if you wouldn't mind taking another look 🙏

@xinhaoz xinhaoz requested a review from jbowens November 3, 2025 16:24
@xinhaoz xinhaoz force-pushed the value-storage-policy branch from e809b40 to 62b3e37 Compare November 3, 2025 19:42
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 of 12 files reviewed, 2 unresolved discussions (waiting on @jbowens)


options.go line 1323 at r2 (raw file):

	// keys (where the smallest suffix is the latest version), but should be
	// disabled for keys where the suffix does not correspond to a version.
	DisableSeparationBySuffix bool

So this is not related to the PolicyAdjustment? Can we lift this above the PolicyAdjustment so that the PolicyAdjustment and its fields are grouped?

In practice, in CockroachDB, for the lock table key space, do we expect to do
{DisableSeparationBySuffix: false, PolicyAdjustment: NoValueSeparation}?


options.go line 1346 at r2 (raw file):

	// Override indicates value retrieval can tolerate
	// additional latency, so Pebble should aggressively prefer storing values
	// separately if it can reduce write amplification.

(I am only looking at the policy configuration, and leaving the rest of the review to @jbowens)

I'm a bit confused by the language here. The term Override by itself suggests it can be any kind of override (except for the NoValueSeparation override), i.e., do more or less separation. But the "... can tolerate additional latency ..." implies it is only overriding to do more separation.

Then the "may chooise to separate values under the Override policy even ..." seems redundant. Presumably all that we are trying to say is that if the global options disables value separation, the override is ignored, but if value separation is globally enabled, we will do whatever the override says. Is my understanding accurate?

@xinhaoz xinhaoz force-pushed the value-storage-policy branch 2 times, most recently from e60c206 to a597f93 Compare November 3, 2025 21:22
ValueStoragePolicy in SpanPolicy is now a struct defining the policy adjustment
type (defaultPolicy, noValSeparation, override) and the override value for the
minimum size used for value separation. Compactions now use the
ValueStoragePolicy returned by the span policy for a table's key range to decide
whether we can preserve blob references or not.

Previously, we would rewrite blob references if a compaction contained tables
with mixed value storage policies, even if those policies were unchanged.

We now preserve blob references if:
- The maximum blob reference depth is not exceeded.
- For each input table, the value separation policy used when writing the table
matches the current value separation policy applied to the table span.
@xinhaoz xinhaoz force-pushed the value-storage-policy branch from a597f93 to a1be9b0 Compare November 3, 2025 21:37
Copy link
Member Author

@xinhaoz xinhaoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 4 of 13 files reviewed, 2 unresolved discussions (waiting on @jbowens and @sumeerbhola)


options.go line 1323 at r2 (raw file):
DisableSeparationBySuffix should also only apply on Override, so I moved it below and updated the comments.

In practice, in CockroachDB, for the lock table key space, do we expect to do
{DisableSeparationBySuffix: false, PolicyAdjustment: NoValueSeparation}?

If the PolicyAdjustment is NoValueSeparation, then the value of DisableSeparationBySuffix is not relevant and we never separate values. We'd need to make some further changes if we intend to allow only separating mvcc values.

We also don't currenty persist whether a table had DisableSeparationBySuffix on when writing sstables, which we'd need to read to figure out if the span policy has changed during compactions. Left a TODO and can do that in a followup.


options.go line 1346 at r2 (raw file):

Previously, sumeerbhola wrote…

(I am only looking at the policy configuration, and leaving the rest of the review to @jbowens)

I'm a bit confused by the language here. The term Override by itself suggests it can be any kind of override (except for the NoValueSeparation override), i.e., do more or less separation. But the "... can tolerate additional latency ..." implies it is only overriding to do more separation.

Then the "may chooise to separate values under the Override policy even ..." seems redundant. Presumably all that we are trying to say is that if the global options disables value separation, the override is ignored, but if value separation is globally enabled, we will do whatever the override says. Is my understanding accurate?

Yeah that's accuate - updated comment!

@xinhaoz xinhaoz force-pushed the value-storage-policy branch from a1be9b0 to 7147abb Compare November 3, 2025 21:43
Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Reviewable status: 4 of 13 files reviewed, 2 unresolved discussions (waiting on @jbowens)

…is true

Move this field into the ValueStoragePolicy struct and make sure it's being read
at the time of value separation.
@xinhaoz xinhaoz force-pushed the value-storage-policy branch from 7147abb to dc4fa9a Compare November 4, 2025 00:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants