@kbatuigas kbatuigas commented Apr 8, 2025

Description

PR to add to Cloud docs: redpanda-data/cloud-docs#260

This pull request introduces significant enhancements to Iceberg integration in Redpanda, including new documentation on supported Iceberg modes, updates to existing Iceberg-related pages, and improvements to the Schema Registry documentation. The changes aim to provide clearer guidance on configuring and using Iceberg modes, enhance usability, and ensure consistency across documentation.

Iceberg Integration Enhancements:

  • Added a new page, choose-iceberg-mode.adoc, detailing supported Iceberg modes (key_value, value_schema_id_prefix, value_schema_latest, and disabled), their configurations, and how they translate to table formats. This page provides examples and explains schema translation for Avro and Protobuf data.
  • Updated navigation in nav.adoc to include a link to the new "Choose Iceberg Mode" page.

Documentation Updates:

  • Revised the release notes in redpanda.adoc to list new features for Iceberg-enabled topics, such as custom partitioning, snapshot expiry, dead-letter queues, schema evolution, and structured Iceberg tables for Avro/Protobuf data without Schema Registry wire format.
  • Updated about-iceberg-topics.adoc to reflect changes in supported Iceberg modes and removed outdated details about custom partitioning. Added a cross-reference to the new "Choose Iceberg Mode" page.
  • Modified query-iceberg-topics.adoc to reference the new "Choose Iceberg Mode" page for clarity on consuming Iceberg topics.

Schema Registry Documentation:

  • Expanded schema-reg-overview.adoc with a new section on serialization and deserialization, explaining the Schema Registry wire format and its role in message processing.

Resolves https://redpandadata.atlassian.net/browse/
Review deadline: 10 April

Page previews

Choose an Iceberg Mode

Checks

  • New feature
  • Content gap
  • Support Follow-up
  • Small fix (typos, links, copyedits, etc)

Summary by CodeRabbit

  • New Features

    • Added documentation for a new Iceberg integration mode, value_schema_latest, enabling Iceberg table creation from the latest schema in the Schema Registry without requiring the wire format.
    • Expanded documentation on handling Avro and Protobuf data in Iceberg topics, including support for structured tables without Schema Registry wire format or SerDes.
  • Documentation

    • Updated navigation and added a new page detailing all supported Iceberg integration modes and their configurations.
    • Improved and clarified release notes and topic property documentation to reflect new Iceberg features and modes.
    • Enhanced Schema Registry documentation with a new section explaining the wire format for serialization and deserialization.
    • Streamlined and clarified Iceberg documentation, removing redundant environment-specific instructions and improving references and formatting throughout.

@kbatuigas kbatuigas requested a review from a team as a code owner April 8, 2025 13:31

netlify bot commented Apr 8, 2025

Deploy Preview for redpanda-docs-preview ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | b0d933c |
| 🔍 Latest deploy log | https://app.netlify.com/sites/redpanda-docs-preview/deploys/6802abb0eac6bb0008ea4057 |
| 😎 Deploy Preview | https://deploy-preview-1068--redpanda-docs-preview.netlify.app |

hyperlint-ai bot commented Apr 8, 2025

PR Change Summary

Enhanced Iceberg integration documentation in Redpanda with a focus on the new value_subject_latest mode and schema integration usage.

  • Expanded the list of features supported by Iceberg-enabled topics, including custom partitioning and schema evolution.
  • Introduced detailed descriptions and examples for the new value_subject_latest mode.
  • Updated guidance on using value_schema_id_prefix and value_subject_latest modes with Spark SQL queries.
  • Clarified syntax and usage for the value_subject_latest mode, including optional key-value pairs.

Modified Files

  • modules/get-started/pages/release-notes/redpanda.adoc
  • modules/manage/partials/iceberg/about-iceberg-topics.adoc
  • modules/manage/partials/iceberg/query-iceberg-topics.adoc
  • modules/reference/pages/properties/topic-properties.adoc


@kbatuigas kbatuigas changed the title from "Iceberg value_subject_latest mode" to "Iceberg value_schema_latest mode" Apr 8, 2025
@@ -43,7 +43,7 @@ endif::[]
{"user_id": 2324, "event_type": "BUTTON_CLICK", "ts": "2024-11-25T20:23:59.380Z"}
----

-=== Topic with schema (`value_schema_id_prefix` mode)
+=== Topic with schema (`value_schema_id_prefix` or `value_schema_latest` mode)

Contributor

Yes, because rpk only produces using the Schema Registry wire format, and the other mode is how to do it without the wire format.

Contributor Author

Discussed with Tyler last week and agreed that a new section for value_schema_latest would be a nice-to-have for later, if we want to demonstrate producing to a topic without using rpk.

@rockwotj rockwotj left a comment

LGTM, a couple of small suggestions.


=== value_schema_latest

Creates an Iceberg table whose structure matches the latest schema registered for the subject in the Schema Registry. You must register a schema in the xref:manage:schema-reg/schema-reg-overview.adoc[Schema Registry]. Unlike the `value_schema_id_prefix` mode, `value_schema_latest` does not require that producers use the wire format.

Contributor

The latest schema is cached periodically. The cache period is defined by the cluster config iceberg_latest_schema_cache_ttl_ms, which defaults to 5 minutes.
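
To make the practical effect of this comment concrete, here is a minimal sketch of a time-to-live cache for "latest schema" lookups. It is an illustration only, not Redpanda's implementation: the class name `LatestSchemaCache`, the `fetch_latest` callback, and the injectable `clock` are all hypothetical. The only detail taken from the thread is the 5-minute default.

```python
import time
from typing import Callable, Dict, Tuple


class LatestSchemaCache:
    """Hypothetical TTL cache for latest-schema lookups (illustration only)."""

    def __init__(
        self,
        fetch_latest: Callable[[str], str],
        ttl_ms: int = 5 * 60 * 1000,  # mirrors the 5-minute default in the comment
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_latest = fetch_latest
        self._ttl_s = ttl_ms / 1000.0
        self._clock = clock
        # subject -> (time the entry was fetched, schema text)
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, subject: str) -> str:
        now = self._clock()
        entry = self._entries.get(subject)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # entry is still fresh: reuse it
        schema = self._fetch_latest(subject)  # missing or expired: refetch
        self._entries[subject] = (now, schema)
        return schema
```

Under this model, a schema registered just after a fetch would not be visible until the cached entry expires, which is presumably why the reviewer wants the TTL property mentioned in the docs.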

Contributor Author

We don't have this config in our docs yet - we'll have to re-run our config script and double check that it gets pulled in.

[[override-value-schema-latest-default]]
=== Override `value_schema_latest` default

In `value_schema_latest` mode, only the string `value_schema_latest` is required in the property value. This sets `value_schema_latest` mode to its default behavior, which derives the subject for the topic using xref:manage:schema-reg/schema-id-validation.adoc#set-subject-name-strategy-per-topic[TopicNameStrategy]. For Protobuf data, the default behavior also deserializes records using the first message within the corresponding Protobuf schema in the Schema Registry.

Contributor

Worthwhile to give an example of TopicNameStrategy: if your topic is named foo, the schema is looked up in foo-value.
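
The naming rule in this suggestion can be sketched in a few lines. The function name `topic_name_strategy_subject` is hypothetical. The `-value` suffix comes from the comment above, and the `-key` suffix is the usual counterpart for record keys.

```python
def topic_name_strategy_subject(topic: str, is_key: bool = False) -> str:
    """Derive the Schema Registry subject for a topic under TopicNameStrategy."""
    suffix = "key" if is_key else "value"
    return f"{topic}-{suffix}"


# For a topic named "foo", the value schema is looked up under "foo-value".
print(topic_name_strategy_subject("foo"))
```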

@@ -76,8 +76,7 @@ rpk registry schema create ClickEvent-value --schema path/to/schema.avsc --type
echo '"key1" {"user_id":2324,"event_type":"BUTTON_CLICK","ts":"2024-11-25T20:23:59.380Z"}' | rpk topic produce ClickEvent --format='%k %v\n' --schema-id=topic
----
+
-The `value_schema_id_prefix` requires that you produce to a topic using the Schema Registry wire format, which includes the magic byte and schema ID in the prefix of the message payload. This allows Redpanda to identify the correct schema version in the Schema Registry for a record. See the https://www.redpanda.com/blog/schema-registry-kafka-streaming#how-does-serialization-work-with-schema-registry-in-kafka[Understanding Apache Kafka Schema Registry^] blog post to learn more.

+The `value_schema_id_prefix` mode requires that you produce to a topic using the Schema Registry wire format, which includes the magic byte and schema ID in the prefix of the message payload. This allows Redpanda to identify the correct schema version in the Schema Registry for a record.

Contributor

link to examples like in the modes doc?

Contributor Author

Added link to new section on wire format

-Redpanda Schema Registry uses the default port 8081.
+Redpanda Schema Registry uses the default port 8081.

== Serialization and deserialization

Contributor Author

@rockwotj @mattschumpert Does this subheading make sense or does it need to specifically mention the wire format?

Contributor

Calling it the wire format makes sense, because you can serialize/deserialize without it by having another mechanism to map a topic record to a schema: static mapping of topic to latest schema in your producer/consumer, communicating the schema ID using some other out of band mechanism (message header, control messages, etc).

Generally this is the "ecosystem standard" way of doing it.

The wire format is a sequence of bytes consisting of the following:

. The "magic byte," a single byte that always contains the value of 0.
. A four-byte integer containing the schema ID.

Contributor

Technically, for Protobuf there is additionally a series of varints encoding which Protobuf message in the Protobuf schema was used. I don't feel strongly about whether we need to call that out, however.
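
The two-part prefix described in the diff above can be sketched as follows. This is an illustrative sketch, not code from the PR: the function names are hypothetical, and big-endian byte order for the schema ID is an assumption based on the common Schema Registry convention, because the excerpt only says "four-byte integer". The extra Protobuf message-index bytes raised in the comment above are deliberately not handled.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0
# 1-byte magic value followed by a 4-byte schema ID (big-endian is assumed here).
_PREFIX = struct.Struct(">BI")


def encode_wire_format(schema_id: int, payload: bytes) -> bytes:
    """Prepend the magic byte and schema ID to an already-serialized payload."""
    return _PREFIX.pack(MAGIC_BYTE, schema_id) + payload


def decode_wire_format(message: bytes) -> Tuple[int, bytes]:
    """Split a message into (schema_id, payload), validating the magic byte."""
    if len(message) < _PREFIX.size:
        raise ValueError("message is shorter than the 5-byte prefix")
    magic, schema_id = _PREFIX.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[_PREFIX.size:]
```

A consumer would call `decode_wire_format` first, look up the returned schema ID in the registry, and only then deserialize the remaining payload bytes.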

@kbatuigas kbatuigas requested a review from Feediver1 April 18, 2025 15:16

Creates an Iceberg table whose structure matches the Redpanda schema for the topic, with columns corresponding to each field. You must register a schema in the xref:manage:schema-reg/schema-reg-overview.adoc[Schema Registry] and producers must write to the topic using the Schema Registry wire format.

In the xref:manage:schema-reg/schema-reg-overview.adoc#serialization-and-deserialization[Schema Registry wire format], a "magic byte" and schema ID are embedded in the message payload header. Producers to the topic must use the wire format in the serialization process so Redpanda can determine the schema used for each record, use the schema to define the Iceberg table, and store the topic values in the corresponding table columns.

Contributor

no def/link for "magic byte"?

)
----

Use `key_value` mode if the topic data is in JSON or if you can use the Iceberg data in its semi-structured format.

Contributor

I had to read this sentence 3-4 times, and am not 100% clear on its meaning.
Use key_value mode if the topic data is in JSON, or if you can, use the Iceberg data in its semi-structured format.
Use key_value mode if the topic data is in JSON, or the Iceberg data in its semi-structured format.

Contributor Author

Rephrased


The wire format is a sequence of bytes consisting of the following:

. The "magic byte," a single byte that always contains the value of 0.

Contributor

oh good--you defined it here. thx

@Feediver1 Feediver1 left a comment

Nice job Kat.

coderabbitai bot commented Apr 18, 2025

Walkthrough

This update introduces a new documentation page detailing the supported Iceberg integration modes in Redpanda, updates navigation and cross-references to include this new content, and refines existing documentation to clarify the configuration and schema translation for Iceberg-enabled topics. The release notes and topic property references are expanded to enumerate new features and modes, including support for a new value_schema_latest mode. The Schema Registry documentation is enhanced with a thorough explanation of the wire format. Several sections are streamlined for clarity, removing redundant environment-specific instructions and reorganizing content for better readability.

Changes

| File(s) | Change Summary |
| --- | --- |
| modules/manage/pages/iceberg/choose-iceberg-mode.adoc | Added a new documentation page explaining Iceberg integration modes, configuration, schema translation, and table format mappings for Redpanda topics. |
| modules/ROOT/nav.adoc | Added a navigation entry under "Iceberg" linking to the new "Choose Iceberg Mode" documentation. |
| modules/reference/pages/properties/topic-properties.adoc | Updated redpanda.iceberg.mode topic property documentation: expanded mode descriptions, added new value_schema_latest mode, clarified wire format requirements, and added visual separators. |
| modules/get-started/pages/release-notes/redpanda.adoc | Rewrote and expanded the Iceberg improvements section in release notes to enumerate features and add Avro/Protobuf support details. |
| modules/manage/pages/iceberg/query-iceberg-topics.adoc | Updated cross-reference to point to the new Iceberg mode documentation; removed a phrase for clarity. |
| modules/manage/pages/schema-reg/schema-reg-overview.adoc | Added a new "Wire format" section explaining the Schema Registry message format, serialization/deserialization process, and schema cache interactions. |
| modules/manage/partials/iceberg/about-iceberg-topics.adoc | Consolidated environment-specific instructions, removed the partitioning section, added the new Iceberg mode, and replaced schema translation details with a reference to the new documentation. |
| modules/manage/partials/iceberg/query-iceberg-topics.adoc | Clarified Iceberg table creation, improved explanation of schema registry wire format, updated references, and improved formatting. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Redpanda
    participant SchemaRegistry

    User->>Redpanda: Create or alter topic with redpanda.iceberg.mode
    alt value_schema_id_prefix mode
        User->>SchemaRegistry: Register schema (if needed)
        User->>Redpanda: Produce message with Schema Registry wire format
        Redpanda->>Redpanda: Parse message using schema ID from header
        Redpanda->>Iceberg: Map fields to table columns
    else value_schema_latest mode
        User->>SchemaRegistry: Register schema (if needed)
        User->>Redpanda: Produce message (no wire format required)
        Redpanda->>SchemaRegistry: Fetch latest schema for subject
        Redpanda->>Iceberg: Map fields to table columns
    else key_value mode
        User->>Redpanda: Produce message
        Redpanda->>Iceberg: Store key and value as columns
    else disabled mode
        User->>Redpanda: Produce message
        Redpanda->>Iceberg: Iceberg integration disabled
    end
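
The branches of the diagram above can be condensed into a small lookup of what each mode asks of the producer. This is a summary sketch based only on the diagram and the mode descriptions in this thread. The names `ModeRequirements`, `MODE_REQUIREMENTS`, and `requirements_for` are hypothetical and are not part of any Redpanda API.

```python
from typing import Dict, NamedTuple


class ModeRequirements(NamedTuple):
    iceberg_enabled: bool
    needs_registered_schema: bool
    needs_wire_format: bool


# One entry per value of the redpanda.iceberg.mode topic property.
MODE_REQUIREMENTS: Dict[str, ModeRequirements] = {
    "key_value": ModeRequirements(True, False, False),
    "value_schema_id_prefix": ModeRequirements(True, True, True),
    "value_schema_latest": ModeRequirements(True, True, False),
    "disabled": ModeRequirements(False, False, False),
}


def requirements_for(mode: str) -> ModeRequirements:
    """Return the producer-side requirements for an Iceberg mode."""
    try:
        return MODE_REQUIREMENTS[mode]
    except KeyError:
        raise ValueError(f"unknown redpanda.iceberg.mode: {mode!r}") from None
```

The only difference between the two schema-based modes in this table is the wire format flag, which matches the distinction the reviewers draw throughout the thread.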

Poem

In burrows deep, a change took root,
With modes for Iceberg, clear and astute.
No more confusion, the docs now gleam,
Four modes to choose, as smooth as cream.
Wire formats explained, schemas in tow—
This rabbit’s delighted to see knowledge grow!
🐇✨

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1136e9e and b0d933c.

📒 Files selected for processing (2)
  • modules/get-started/pages/release-notes/redpanda.adoc (1 hunks)
  • modules/manage/partials/iceberg/query-iceberg-topics.adoc (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/get-started/pages/release-notes/redpanda.adoc
  • modules/manage/partials/iceberg/query-iceberg-topics.adoc
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (11)
modules/manage/partials/iceberg/query-iceberg-topics.adoc (1)

4-5: Simplify table naming description.

The sentence "Redpanda generates an Iceberg table that has the same name as the topic name." is wordy and repetitive. Consider refactoring to:

Redpanda generates an Iceberg table with the same name as the topic.

This improves readability.

modules/manage/partials/iceberg/about-iceberg-topics.adoc (2)

121-122: Link new modes to the detailed mode guide.

You’ve added value_schema_id_prefix and value_schema_latest modes here, but these entries lack cross‑references to the more detailed configuration and schema‑translation guidance on the new “Choose an Iceberg Mode” page. Consider xref‑linking each mode name to that page (e.g., xref:manage/iceberg/choose-iceberg-mode.adoc[value_schema_latest]).


139-139: Include link to Schema Registry doc.

The step to register a schema is clear, but you may want to xref the exact Schema Registry API or UI page (e.g., xref:manage:schema-reg/schema-reg-overview.adoc[Schema Registry wire format]) so users know where to go next.

modules/manage/pages/schema-reg/schema-reg-overview.adoc (4)

7-7: Consider relocating this paragraph.

The new sentence on message exchange sits just above the design overview. It might fit more naturally under the “Serialization format” section to maintain topical flow.


36-36: Clarify default port note.

You’ve added “Redpanda Schema Registry uses the default port 8081.” To highlight this, consider wrapping it in an AsciiDoc [NOTE] block for greater visibility.


50-56: Unify conditional serialization blocks.

The non‑cloud and cloud variants for the serializer description are identical except for minor naming differences. Consider merging them into one block using xref macros or a single conditional, to reduce duplication.


60-61: Use precise terminology for prefixing.

Instead of “pads the beginning of the message,” you may want to say “prepends the magic byte and schema ID to the message payload” to avoid ambiguity.

modules/manage/pages/iceberg/choose-iceberg-mode.adoc (4)

3-3: Trim page categories.

You’ve listed six categories—consider narrowing this to the most relevant (e.g., Iceberg and Integration) to avoid over‑categorization.


36-37: Clarify “message payload header.”

There’s no separate header wrapper—this is simply prefixed data. Consider rephrasing to “embedded at the start of the message payload” for accuracy.


42-43: Link the TTL configuration.

You mention iceberg_latest_schema_cache_ttl_ms—xref the cluster property reference (e.g., xref:reference/cluster-properties.adoc#iceberg_latest_schema_cache_ttl_ms) so users can find details on adjusting this TTL.


67-73: Merge override blocks for clarity.

The ifndef::env-cloud[] and ifdef::env-cloud[] sections are identical. Combining them—or moving the shared content outside the conditional—would simplify maintenance.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f3bed1 and 4e3fca9.

⛔ Files ignored due to path filters (1)
  • modules/shared/images/schema-registry-wire-format.png is excluded by !**/*.png
📒 Files selected for processing (8)
  • modules/ROOT/nav.adoc (1 hunks)
  • modules/get-started/pages/release-notes/redpanda.adoc (1 hunks)
  • modules/manage/pages/iceberg/choose-iceberg-mode.adoc (1 hunks)
  • modules/manage/pages/iceberg/query-iceberg-topics.adoc (1 hunks)
  • modules/manage/pages/schema-reg/schema-reg-overview.adoc (2 hunks)
  • modules/manage/partials/iceberg/about-iceberg-topics.adoc (3 hunks)
  • modules/manage/partials/iceberg/query-iceberg-topics.adoc (2 hunks)
  • modules/reference/pages/properties/topic-properties.adoc (4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - redpanda-docs-preview
  • GitHub Check: Header rules - redpanda-docs-preview
  • GitHub Check: Pages changed - redpanda-docs-preview
🔇 Additional comments (10)
modules/ROOT/nav.adoc (1)

184-184: Added Iceberg mode navigation entry looks correct.

The new Choose Iceberg Mode page link is properly placed under the Iceberg section and follows the existing indentation and naming conventions.

modules/manage/pages/iceberg/query-iceberg-topics.adoc (1)

10-10: Updated cross-reference to the new Iceberg mode page is accurate.

The paragraph now points to the choose-iceberg-mode.adoc page for mode details, which improves discoverability. The removal of the redundant phrase enhances clarity.

modules/manage/partials/iceberg/query-iceberg-topics.adoc (1)

79-80:

✅ Verification successful

Verify anchor for the Schema Registry wire format link.

Please confirm that the target anchor #wire-format exists in modules/manage/pages/schema-reg/schema-reg-overview.adoc. If the actual anchor differs (e.g., #serialization-and-deserialization), update the xref accordingly.


🏁 Script executed:

#!/bin/bash
# Verify the anchor for wire format in the Schema Registry overview
grep -R "wire-format" modules/manage/pages/schema-reg/schema-reg-overview.adoc

Length of output: 159


Continuing verification to locate the actual section anchor:


🏁 Script executed:

#!/bin/bash
# List all explicit Asciidoc anchors
grep -R "\[\[" -n modules/manage/pages/schema-reg/schema-reg-overview.adoc

# List all section headings
grep -R "^=+" -n modules/manage/pages/schema-reg/schema-reg-overview.adoc

# Search for serialization-related headings
grep -R "[sS]erialization" -n modules/manage/pages/schema-reg/schema-reg-overview.adoc

# Search for any occurrence of “format”
grep -R "format" -n modules/manage/pages/schema-reg/schema-reg-overview.adoc

Length of output: 5430


Anchor Confirmed – No Change Required

The == Wire format heading in modules/manage/pages/schema-reg/schema-reg-overview.adoc (line 38) auto‑generates the wire‑format anchor. The existing

xref:manage:schema-reg/schema-reg-overview.adoc#wire-format[Schema Registry wire format]

is therefore correct.

modules/manage/partials/iceberg/about-iceberg-topics.adoc (3)

22-22: Verify the manifest file format.

You’ve streamlined the manifest files section to state they are in JSON format. Please confirm that Redpanda’s Iceberg integration indeed emits JSON manifest files (the Iceberg spec defaults to Avro manifests).


136-137: Good addition: cross‑reference to the new guide.

Linking out to choose-iceberg-mode.adoc here helps users find the in‑depth mode explanations.


166-166: Validate the schema‑translation cross‑reference.

Ensure that the anchor #schema-types-translation exists in the target page (choose-iceberg-mode.adoc) so the link resolves correctly.

modules/manage/pages/schema-reg/schema-reg-overview.adoc (1)

38-47: Great addition of the wire-format section.

This new “Wire format” section clearly defines the magic byte and schema ID prefix. It fills the previous documentation gap regarding message framing.

modules/manage/pages/iceberg/choose-iceberg-mode.adoc (3)

1-5: File header looks good.

The title, description, and single-source tag are correctly set for this new page.


14-16: Clear introduction.

The link to the about-iceberg-topics page and the property description provides good context.


52-56: Verify CLI syntax.

Please confirm that the --topic-config= and --set flags shown here match the latest rpk version’s syntax, as these occasionally change between releases.

Comment on lines 65 to 69
Iceberg-enabled topics now support the following:

- xref:manage:iceberg/about-iceberg-topics.adoc#use-custom-partitioning[Custom partitioning] for improved query performance.
- Snapshot expiry.
- xref:manage:iceberg/about-iceberg-topics.adoc#manage-dead-letter-queue[Dead-letter queue] for invalid records.

Contributor

🛠️ Refactor suggestion

Link snapshot expiry to its documentation anchor.

The bullet for Snapshot expiry is missing a cross-reference. Please add an xref to the corresponding section in about-iceberg-topics.adoc (e.g., #snapshot-expiry) for consistency with the other bullets.

Comment on lines +211 to +212
| sfixed64 | int
| string | string

Contributor

⚠️ Potential issue

Correct Protobuf fixed‑width mapping.

Currently sfixed64 is mapped to int; it should map to long.

-| sfixed64 | int
+| sfixed64 | long

| string | string
| record | struct
| array | list
| maps | list

Contributor

⚠️ Potential issue

Fix Avro map mapping.

The table lists Avro maps → Iceberg list, but Iceberg supports native map types. Please update to:

-| maps    | list
+| maps    | map

@@ -7,7 +7,7 @@
include::shared:partial$enterprise-license.adoc[]
====

-When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic xref:manage:iceberg/about-iceberg-topics.adoc#enable-iceberg-integration[Iceberg mode] and whether you've registered a schema for the topic in the xref:manage:schema-reg/schema-reg-overview.adoc[Redpanda Schema Registry]. In either mode, you do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda.
+When you access Iceberg topics from a data lakehouse or other Iceberg-compatible tools, how you consume the data depends on the topic xref:manage:iceberg/choose-iceberg-mode.adoc[Iceberg mode] and whether you've registered a schema for the topic in the xref:manage:schema-reg/schema-reg-overview.adoc[Redpanda Schema Registry]. You do not need to rely on complex ETL jobs or pipelines to access real-time data from Redpanda.

@kbatuigas this page has examples using the other mode but no mention of "value_schema_latest" mode at all. Even if we don't have an example it's probably worth a mention how querying works in this mode (essentially the same as with value_schema_id_prefix)

Contributor Author

Added note

@kbatuigas kbatuigas merged commit e68ee61 into main Apr 18, 2025
8 checks passed
@kbatuigas kbatuigas deleted the DOC-1140-Document-value_latest_schema-mode branch April 18, 2025 19:49
@coderabbitai coderabbitai bot mentioned this pull request Aug 6, 2025
4 tasks
@coderabbitai coderabbitai bot mentioned this pull request Aug 22, 2025
4 tasks