Unity Catalog config for predictive optimization (#1333)

kbatuigas · web-flow · commit 1126e17af03f · 2025-08-25T10:38:40.000-07:00
diff --git a/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc b/modules/manage/pages/iceberg/iceberg-topics-databricks-unity.adoc
@@ -23,6 +23,23 @@ endif::[]
 * A Databricks workspace in the same region as your S3 bucket. See the https://docs.databricks.com/aws/en/resources/supported-regions#supported-regions-list[list of supported AWS regions^].
 * Unity Catalog enabled in your Databricks workspace. See the https://docs.databricks.com/aws/en/data-governance/unity-catalog/get-started[Databricks documentation^] to set up Unity Catalog for your workspace.
 * https://docs.databricks.com/aws/en/optimizations/predictive-optimization#enable-predictive-optimization[Predictive optimization^] enabled for Unity Catalog.
++
+[NOTE]
+====
+When you enable predictive optimization, you must also set the following configurations in your Databricks workspace. These configurations allow predictive optimization to automatically generate column statistics and carry out background compaction for Iceberg tables:
+
+```sql
+SET spark.databricks.delta.liquid.lazyClustering.backfillStats=true;
+SET spark.databricks.delta.computeStats.autoConflictResolution=true;
+
+/*
+After setting these configurations, you can optionally run OPTIMIZE to
+immediately trigger compaction and liquid clustering, or let predictive
+optimization handle them automatically later.
+*/
+OPTIMIZE `<catalog-name>`.redpanda.`<table-name>`;
+```
+====
 * https://docs.databricks.com/aws/en/external-access/admin[External data access^] enabled in your metastore.
 * Workspace admin privileges to complete the steps to create a Unity Catalog storage credential and external location that connects your cluster's Tiered Storage bucket to Databricks.
 
@@ -189,7 +206,7 @@ The following example shows how to query the Iceberg table using SQL in Databric
 [,sql]
 ----
 -- Ensure that the catalog and table name are correctly parsed in case they contain special characters
-SELECT * FROM `<catalog-name>`.redpanda.`<table-name>`;
+SELECT * FROM `<catalog-name>`.redpanda.`<table-name>` LIMIT 10;
 ----
 +
 Your query results should look like the following: