
Commit c31d64b

Merge pull request #4545 from ClickHouse/dbt-clickhouse/documentation-updates
Small improvements on the dbt-clickhouse documentation
2 parents 87458ea + 97d617d commit c31d64b

3 files changed (+47, −33 lines)


docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md

Lines changed: 14 additions & 1 deletion
@@ -55,6 +55,7 @@ your_profile_name:
       tcp_keepalive: [False] # Native client only, specify TCP keepalive configuration. Specify custom keepalive settings as [idle_time_sec, interval_sec, probes].
       custom_settings: [{}] # A dictionary/mapping of custom ClickHouse settings for the connection - default is empty.
       database_engine: '' # Database engine to use when creating new ClickHouse schemas (databases). If not set (the default), new databases will use the default ClickHouse database engine (usually Atomic).
+      threads: [1] # Number of threads to use when running queries. Before setting it to a number higher than 1, make sure to read the [read-after-write consistency](#read-after-write-consistency) section.
 
       # Native (clickhouse-driver) connection settings
       sync_request_timeout: [5] # Timeout for server ping
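For orientation, a sketch of how these options might be filled in, assuming the standard `profiles.yml` layout under `your_profile_name` → `outputs` → `dev` (the values here are illustrative, not defaults):

```yaml
      custom_settings:        # pass arbitrary ClickHouse settings with every connection
        join_use_nulls: 1     # example: any server setting can go here
      threads: 4              # raise above 1 only after reading the read-after-write consistency section below
```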
@@ -87,6 +88,12 @@ seeds:
 
 ### About the ClickHouse Cluster {#about-the-clickhouse-cluster}
 
+When using a ClickHouse cluster, you need to consider two things:
+- Configuring the `cluster` setting.
+- Ensuring read-after-write consistency, especially if you set `threads` to more than one.
+
+#### Cluster Setting {#cluster-setting}
+
 The `cluster` setting in profile enables dbt-clickhouse to run against a ClickHouse cluster. If `cluster` is set in the profile, **all models will be created with the `ON CLUSTER` clause** by default—except for those using a **Replicated** engine. This includes:
 
 - Database creation
@@ -111,11 +118,17 @@ To **opt out** of cluster-based creation for a specific model, add the `disable_
 table and incremental materializations with non-replicated engine will not be affected by `cluster` setting (model would
 be created on the connected node only).
 
-#### Compatibility {#compatibility}
+**Compatibility**
 
 If a model has been created without a `cluster` setting, dbt-clickhouse will detect the situation and run all DDL/DML
 without `on cluster` clause for this model.
 
+#### Read-after-write Consistency {#read-after-write-consistency}
+
+dbt relies on a read-after-insert consistency model. This is not compatible with ClickHouse clusters that have more than one replica if you cannot guarantee that all operations will go to the same replica. You may not encounter problems in your day-to-day usage of dbt, but, depending on your cluster, there are some strategies to put this guarantee in place:
+- If you are using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property (sketched below). You can find more information about this setting [here](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
+- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer in front of it, use a replica-aware routing or sticky-sessions mechanism so that you always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
+
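As a minimal sketch of the ClickHouse Cloud option (indentation assumes the fragment sits with the other output settings in `profiles.yml`):

```yaml
      custom_settings:
        select_sequential_consistency: 1  # SELECTs see all previously committed inserts, whichever replica answers
```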
 ## General information about features {#general-information-about-features}
 
 ### General table configurations {#general-table-configurations}

docs/integrations/data-ingestion/etl-tools/dbt/guides.md

Lines changed: 6 additions & 2 deletions
@@ -23,7 +23,7 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
 
 <ClickHouseSupportedBadge/>
 
-This section provides guides on setting up dbt and the ClickHouse adapter, as well as an example of using dbt with ClickHouse. The example covers the following:
+This section provides guides on setting up dbt and the ClickHouse adapter, as well as an example of using dbt with ClickHouse based on a publicly available IMDB dataset. The example covers the following steps:
 
 1. Creating a dbt project and setting up the ClickHouse adapter.
 2. Defining a model.
@@ -1046,4 +1046,8 @@ dbt provides the ability to load data from CSV files. This capability is not sui
 |Horror |HOR |
 |War |WAR |
 +-------+----+
-```
+```
+
+## Further Information {#further-information}
+
+The previous guides only touch the surface of dbt functionality. We recommend reading the excellent [dbt documentation](https://docs.getdbt.com/docs/introduction).

docs/integrations/data-ingestion/etl-tools/dbt/index.md

Lines changed: 27 additions & 30 deletions
@@ -19,13 +19,13 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
 
 Within dbt, these models can be cross-referenced and layered to allow the construction of higher-level concepts. The boilerplate SQL required to connect models is automatically generated. Furthermore, dbt identifies dependencies between models and ensures they are created in the appropriate order using a directed acyclic graph (DAG).
 
-Dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse). We describe the process for connecting ClickHouse with a simple example based on a publicly available IMDB dataset. We additionally highlight some of the limitations of the current connector.
+dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse).
 
 <TOCInline toc={toc} maxHeadingLevel={2} />
 
 ## Supported features {#supported-features}
 
-**Supported features**
+List of supported features:
 - [x] Table materialization
 - [x] View materialization
 - [x] Incremental materialization
@@ -42,6 +42,10 @@ Dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https
 - [x] Distributed incremental materialization (experimental)
 - [x] Contracts
 
+All features up to dbt-core 1.9 are supported. We will soon add support for the features introduced in dbt-core 1.10.
+
+This adapter is not yet available for use inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to make it available soon. Please reach out to support for more information.
+
 ## Concepts {#concepts}
 
 dbt introduces the concept of a model. This is defined as a SQL statement, potentially joining many tables. A model can be "materialized" in a number of ways. A materialization represents a build strategy for the model's select query. The code behind a materialization is boilerplate SQL that wraps your SELECT query in a statement in order to create a new or update an existing relation.
@@ -79,27 +83,32 @@ The following are [experimental features](https://clickhouse.com/docs/en/beta-an
 
 ### Install dbt-core and dbt-clickhouse {#install-dbt-core-and-dbt-clickhouse}
 
+dbt provides several options for installing the command-line interface (CLI), which are detailed [here](https://docs.getdbt.com/dbt-cli/install/overview). We recommend using `pip` to install both dbt and dbt-clickhouse.
+
 ```sh
-pip install dbt-clickhouse
+pip install dbt-core dbt-clickhouse
 ```
 
 ### Provide dbt with the connection details for our ClickHouse instance. {#provide-dbt-with-the-connection-details-for-our-clickhouse-instance}
-Configure `clickhouse` profile in `~/.dbt/profiles.yml` file and provide user, password, schema host properties. The full list of connection configuration options is available in the [Features and configurations](/integrations/dbt/features-and-configurations) page:
+Configure the `clickhouse-service` profile in the `~/.dbt/profiles.yml` file and provide the schema, host, port, user, and password properties. The full list of connection configuration options is available on the [Features and configurations](/integrations/dbt/features-and-configurations) page:
 ```yaml
-clickhouse:
+clickhouse-service:
   target: dev
   outputs:
     dev:
       type: clickhouse
-      schema: <target_schema>
-      host: <host>
-      port: 8443 # use 9440 for native
-      user: default
-      password: <password>
-      secure: True
+      schema: [ default ] # ClickHouse database for dbt models
+
+      # Optional
+      host: [ localhost ]
+      port: [ 8123 ] # Defaults to 8123, 8443, 9000, 9440 depending on the secure and driver settings
+      user: [ default ] # User for all database operations
+      password: [ <empty string> ] # Password for the user
+      secure: True # Use TLS (native protocol) or HTTPS (http protocol)
 ```
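The block above uses the adapter's defaults-in-brackets notation. As a concrete sketch, a filled-in `~/.dbt/profiles.yml` for a hypothetical ClickHouse Cloud service (host, password, and database are placeholders) could look like:

```yaml
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: imdb_dbt                              # placeholder database for dbt models
      host: abc123.us-east-1.aws.clickhouse.cloud   # placeholder service hostname
      port: 8443                                    # HTTPS port
      user: default
      password: my_password                         # placeholder
      secure: True
```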
 
 ### Create a dbt project {#create-a-dbt-project}
+You can now use this profile in one of your existing projects or create a new one using:
 
 ```sh
 dbt init project_name
@@ -108,20 +117,12 @@ dbt init project_name
 Inside `project_name` dir, update your `dbt_project.yml` file to specify a profile name to connect to the ClickHouse server.
 
 ```yaml
-profile: 'clickhouse'
+profile: 'clickhouse-service'
 ```
 
 ### Test connection {#test-connection}
 Execute `dbt debug` with the CLI tool to confirm whether dbt is able to connect to ClickHouse. Confirm the response includes `Connection test: [OK connection ok]` indicating a successful connection.
 
-We assume the use of the dbt CLI for the following examples. This adapter is still not available for usage inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to get it available soon. Please reach out to support to get more info on this.
-
-dbt offers a number of options for CLI installation. Follow the instructions described[ here](https://docs.getdbt.com/dbt-cli/install/overview). At this stage install dbt-core only. We recommend the use of `pip` to install both dbt and dbt-clickhouse.
-
-```bash
-pip install dbt-clickhouse
-```
-
 Go to the [guides page](/integrations/dbt/guides) to learn more about how to use dbt with ClickHouse.
 
 ## Troubleshooting Connections {#troubleshooting-connections}
@@ -137,16 +138,12 @@ If you encounter issues connecting to ClickHouse from dbt, make sure the followi
 
 The current ClickHouse adapter for dbt has several limitations users should be aware of:
 
-1. The adapter currently materializes models as tables using an `INSERT TO SELECT`. This effectively means data duplication. Very large datasets (PB) can result in extremely long run times, making some models unviable. Aim to minimize the number of rows returned by any query, utilizing GROUP BY where possible. Prefer models which summarize data over those which simply perform a transform whilst maintaining row counts of the source.
-2. To use Distributed tables to represent a model, users must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter does not manage cluster creation.
-3. When dbt creates a relation (table/view) in a database, it usually creates it as: `{{ database }}.{{ schema }}.{{ table/view id }}`. ClickHouse has no notion of schemas. The adapter therefore uses `{{schema}}.{{ table/view id }}`, where `schema` is the ClickHouse database.
-4. Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This should not affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements. <!-- TODO review this limitation, looks like the issue was already closed and the fix was introduced in 24.10 -->
-
-Further Information
-
-The previous guides only touch the surface of dbt functionality. Users are recommended to read the excellent [dbt documentation](https://docs.getdbt.com/docs/introduction).
-
-Additional configuration for the adapter is described [here](https://github.com/silentsokolov/dbt-clickhouse#model-configuration).
+- The plugin uses syntax that requires ClickHouse version 25.3 or newer. We do not test older versions of ClickHouse. We also do not currently test Replicated tables.
+- Different runs of the `dbt-adapter` may collide if they run at the same time, because internally they can use the same table names for the same operations. For more information, check issue [#420](https://github.com/ClickHouse/dbt-clickhouse/issues/420).
+- The adapter currently materializes models as tables using an [INSERT INTO SELECT](https://clickhouse.com/docs/sql-reference/statements/insert-into#inserting-the-results-of-select). This effectively means data duplication if the run is executed again. Very large datasets (PB) can result in extremely long run times, making some models unviable. To improve performance, use ClickHouse Materialized Views by configuring the model with `materialized: materialized_view` (see the sketch after this list). Additionally, aim to minimize the number of rows returned by any query by utilizing `GROUP BY` where possible. Prefer models that summarize data over those that simply transform while maintaining row counts of the source.
+- To use Distributed tables to represent a model, users must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter does not manage cluster creation.
+- When dbt creates a relation (table/view) in a database, it usually creates it as: `{{ database }}.{{ schema }}.{{ table/view id }}`. ClickHouse has no notion of schemas. The adapter therefore uses `{{schema}}.{{ table/view id }}`, where `schema` is the ClickHouse database.
+- Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This should not affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements. <!-- TODO review this limitation, looks like the issue was already closed and the fix was introduced in 24.10 -->
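Where a materialized view is the better fit, a minimal sketch of that configuration in `dbt_project.yml` (the project and folder names are hypothetical):

```yaml
models:
  my_project:
    summaries:                           # hypothetical subfolder under models/
      +materialized: materialized_view   # build these models as ClickHouse materialized views
```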
 
 ## Fivetran {#fivetran}
152149
