
Commit c31d64b

Merge pull request #4545 from ClickHouse/dbt-clickhouse/documentation-updates
Small improvements on the dbt-clickhouse documentation
2 parents 87458ea + 97d617d commit c31d64b

3 files changed (+47, −33 lines)


docs/integrations/data-ingestion/etl-tools/dbt/features-and-configurations.md

Lines changed: 14 additions & 1 deletion
@@ -55,6 +55,7 @@ your_profile_name:
       tcp_keepalive: [False] # Native client only, specify TCP keepalive configuration. Specify custom keepalive settings as [idle_time_sec, interval_sec, probes].
       custom_settings: [{}] # A dictionary/mapping of custom ClickHouse settings for the connection - default is empty.
       database_engine: '' # Database engine to use when creating new ClickHouse schemas (databases). If not set (the default), new databases will use the default ClickHouse database engine (usually Atomic).
+      threads: [1] # Number of threads to use when running queries. Before setting it to a number higher than 1, make sure to read the [read-after-write consistency](#read-after-write-consistency) section.
 
       # Native (clickhouse-driver) connection settings
       sync_request_timeout: [5] # Timeout for server ping
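For orientation, a sketch of how these options might be filled in, assuming the standard `profiles.yml` layout under `your_profile_name` → `outputs` → `dev` (the values here are illustrative, not defaults):

```yaml
      custom_settings:        # pass arbitrary ClickHouse settings with every connection
        join_use_nulls: 1     # example: any server setting can go here
      threads: 4              # raise above 1 only after reading the read-after-write consistency section below
```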
@@ -87,6 +88,12 @@ seeds:
 
 ### About the ClickHouse Cluster {#about-the-clickhouse-cluster}
 
+When using a ClickHouse cluster, you need to consider two things:
+- Configuring the `cluster` setting.
+- Ensuring read-after-write consistency, especially if you set `threads` to more than one.
+
+#### Cluster Setting {#cluster-setting}
+
 The `cluster` setting in profile enables dbt-clickhouse to run against a ClickHouse cluster. If `cluster` is set in the profile, **all models will be created with the `ON CLUSTER` clause** by default—except for those using a **Replicated** engine. This includes:
 
 - Database creation
@@ -111,11 +118,17 @@ To **opt out** of cluster-based creation for a specific model, add the `disable_
 table and incremental materializations with non-replicated engine will not be affected by `cluster` setting (model would
 be created on the connected node only).
 
-#### Compatibility {#compatibility}
+**Compatibility**
 
 If a model has been created without a `cluster` setting, dbt-clickhouse will detect the situation and run all DDL/DML
 without `on cluster` clause for this model.
 
+#### Read-after-write Consistency {#read-after-write-consistency}
+
+dbt relies on a read-after-insert consistency model. This is not compatible with ClickHouse clusters that have more than one replica if you cannot guarantee that all operations will go to the same replica. You may not encounter problems in your day-to-day usage of dbt, but, depending on your cluster, there are some strategies to put this guarantee in place:
+- If you are using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property (sketched below). You can find more information about this setting [here](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
+- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If you have a load balancer in front of it, use a replica-aware routing or sticky-sessions mechanism so that you always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
+
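As a minimal sketch of the ClickHouse Cloud option (indentation assumes the fragment sits with the other output settings in `profiles.yml`):

```yaml
      custom_settings:
        select_sequential_consistency: 1  # SELECTs see all previously committed inserts, whichever replica answers
```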
 ## General information about features {#general-information-about-features}
 
 ### General table configurations {#general-table-configurations}

docs/integrations/data-ingestion/etl-tools/dbt/guides.md

Lines changed: 6 additions & 2 deletions
@@ -23,7 +23,7 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
 
 <ClickHouseSupportedBadge/>
 
-This section provides guides on setting up dbt and the ClickHouse adapter, as well as an example of using dbt with ClickHouse. The example covers the following:
+This section provides guides on setting up dbt and the ClickHouse adapter, as well as an example of using dbt with ClickHouse based on a publicly available IMDB dataset. The example covers the following steps:
 
 1. Creating a dbt project and setting up the ClickHouse adapter.
 2. Defining a model.
@@ -1046,4 +1046,8 @@ dbt provides the ability to load data from CSV files. This capability is not sui
 |Horror |HOR |
 |War |WAR |
 +-------+----+
-```
+```
+
+## Further Information {#further-information}
+
+The previous guides only touch the surface of dbt functionality. We recommend reading the excellent [dbt documentation](https://docs.getdbt.com/docs/introduction).

docs/integrations/data-ingestion/etl-tools/dbt/index.md

Lines changed: 27 additions & 30 deletions
@@ -19,13 +19,13 @@ import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
 
 Within dbt, these models can be cross-referenced and layered to allow the construction of higher-level concepts. The boilerplate SQL required to connect models is automatically generated. Furthermore, dbt identifies dependencies between models and ensures they are created in the appropriate order using a directed acyclic graph (DAG).
 
-Dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse). We describe the process for connecting ClickHouse with a simple example based on a publicly available IMDB dataset. We additionally highlight some of the limitations of the current connector.
+dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse).
 
 <TOCInline toc={toc} maxHeadingLevel={2} />
 
 ## Supported features {#supported-features}
 
-**Supported features**
+List of supported features:
 - [x] Table materialization
 - [x] View materialization
 - [x] Incremental materialization
@@ -42,6 +42,10 @@ Dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https
 - [x] Distributed incremental materialization (experimental)
 - [x] Contracts
 
+All features up to dbt-core 1.9 are supported. We will soon add support for the features introduced in dbt-core 1.10.
+
+This adapter is not yet available for use inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to make it available soon. Please reach out to support for more information.
+
 ## Concepts {#concepts}
 
 dbt introduces the concept of a model. This is defined as a SQL statement, potentially joining many tables. A model can be "materialized" in a number of ways. A materialization represents a build strategy for the model's select query. The code behind a materialization is boilerplate SQL that wraps your SELECT query in a statement in order to create a new or update an existing relation.
@@ -79,27 +83,32 @@ The following are [experimental features](https://clickhouse.com/docs/en/beta-an
 
 ### Install dbt-core and dbt-clickhouse {#install-dbt-core-and-dbt-clickhouse}
 
+dbt provides several options for installing the command-line interface (CLI), which are detailed [here](https://docs.getdbt.com/dbt-cli/install/overview). We recommend using `pip` to install both dbt and dbt-clickhouse.
+
 ```sh
-pip install dbt-clickhouse
+pip install dbt-core dbt-clickhouse
 ```
 
 ### Provide dbt with the connection details for our ClickHouse instance. {#provide-dbt-with-the-connection-details-for-our-clickhouse-instance}
-Configure `clickhouse` profile in `~/.dbt/profiles.yml` file and provide user, password, schema host properties. The full list of connection configuration options is available in the [Features and configurations](/integrations/dbt/features-and-configurations) page:
+Configure the `clickhouse-service` profile in the `~/.dbt/profiles.yml` file and provide the schema, host, port, user, and password properties. The full list of connection configuration options is available on the [Features and configurations](/integrations/dbt/features-and-configurations) page:
 ```yaml
-clickhouse:
+clickhouse-service:
   target: dev
   outputs:
     dev:
       type: clickhouse
-      schema: <target_schema>
-      host: <host>
-      port: 8443 # use 9440 for native
-      user: default
-      password: <password>
-      secure: True
+      schema: [ default ] # ClickHouse database for dbt models
+
+      # Optional
+      host: [ localhost ]
+      port: [ 8123 ] # Defaults to 8123, 8443, 9000, 9440 depending on the secure and driver settings
+      user: [ default ] # User for all database operations
+      password: [ <empty string> ] # Password for the user
+      secure: True # Use TLS (native protocol) or HTTPS (http protocol)
 ```
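The block above uses the adapter's defaults-in-brackets notation. As a concrete sketch, a filled-in `~/.dbt/profiles.yml` for a hypothetical ClickHouse Cloud service (host, password, and database are placeholders) could look like:

```yaml
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: imdb_dbt                              # placeholder database for dbt models
      host: abc123.us-east-1.aws.clickhouse.cloud   # placeholder service hostname
      port: 8443                                    # HTTPS port
      user: default
      password: my_password                         # placeholder
      secure: True
```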
 
 ### Create a dbt project {#create-a-dbt-project}
+You can now use this profile in one of your existing projects or create a new one using:
 
 ```sh
 dbt init project_name
@@ -108,20 +117,12 @@ dbt init project_name
 Inside `project_name` dir, update your `dbt_project.yml` file to specify a profile name to connect to the ClickHouse server.
 
 ```yaml
-profile: 'clickhouse'
+profile: 'clickhouse-service'
 ```
 
 ### Test connection {#test-connection}
 Execute `dbt debug` with the CLI tool to confirm whether dbt is able to connect to ClickHouse. Confirm the response includes `Connection test: [OK connection ok]` indicating a successful connection.
 
-We assume the use of the dbt CLI for the following examples. This adapter is still not available for usage inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to get it available soon. Please reach out to support to get more info on this.
-
-dbt offers a number of options for CLI installation. Follow the instructions described[ here](https://docs.getdbt.com/dbt-cli/install/overview). At this stage install dbt-core only. We recommend the use of `pip` to install both dbt and dbt-clickhouse.
-
-```bash
-pip install dbt-clickhouse
-```
-
 Go to the [guides page](/integrations/dbt/guides) to learn more about how to use dbt with ClickHouse.
 
 ## Troubleshooting Connections {#troubleshooting-connections}
@@ -137,16 +138,12 @@ If you encounter issues connecting to ClickHouse from dbt, make sure the followi
 
 The current ClickHouse adapter for dbt has several limitations users should be aware of:
 
-1. The adapter currently materializes models as tables using an `INSERT TO SELECT`. This effectively means data duplication. Very large datasets (PB) can result in extremely long run times, making some models unviable. Aim to minimize the number of rows returned by any query, utilizing GROUP BY where possible. Prefer models which summarize data over those which simply perform a transform whilst maintaining row counts of the source.
-2. To use Distributed tables to represent a model, users must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter does not manage cluster creation.
-3. When dbt creates a relation (table/view) in a database, it usually creates it as: `{{ database }}.{{ schema }}.{{ table/view id }}`. ClickHouse has no notion of schemas. The adapter therefore uses `{{schema}}.{{ table/view id }}`, where `schema` is the ClickHouse database.
-4. Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This should not affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements. <!-- TODO review this limitation, looks like the issue was already closed and the fix was introduced in 24.10 -->
-
-Further Information
-
-The previous guides only touch the surface of dbt functionality. Users are recommended to read the excellent [dbt documentation](https://docs.getdbt.com/docs/introduction).
-
-Additional configuration for the adapter is described [here](https://github.com/silentsokolov/dbt-clickhouse#model-configuration).
+- The plugin uses syntax that requires ClickHouse version 25.3 or newer. We do not test older versions of ClickHouse. We also do not currently test Replicated tables.
+- Different runs of the `dbt-adapter` may collide if they run at the same time, because internally they can use the same table names for the same operations. For more information, check issue [#420](https://github.com/ClickHouse/dbt-clickhouse/issues/420).
+- The adapter currently materializes models as tables using an [INSERT INTO SELECT](https://clickhouse.com/docs/sql-reference/statements/insert-into#inserting-the-results-of-select). This effectively means data duplication if the run is executed again. Very large datasets (PB) can result in extremely long run times, making some models unviable. To improve performance, use ClickHouse Materialized Views by configuring the model with `materialized: materialized_view` (see the sketch after this list). Additionally, aim to minimize the number of rows returned by any query by utilizing `GROUP BY` where possible. Prefer models that summarize data over those that simply transform while maintaining row counts of the source.
+- To use Distributed tables to represent a model, users must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter does not manage cluster creation.
+- When dbt creates a relation (table/view) in a database, it usually creates it as: `{{ database }}.{{ schema }}.{{ table/view id }}`. ClickHouse has no notion of schemas. The adapter therefore uses `{{schema}}.{{ table/view id }}`, where `schema` is the ClickHouse database.
+- Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement, see https://github.com/ClickHouse/ClickHouse/issues/30323. This should not affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements. <!-- TODO review this limitation, looks like the issue was already closed and the fix was introduced in 24.10 -->
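Where a materialized view is the better fit, a minimal sketch of that configuration in `dbt_project.yml` (the project and folder names are hypothetical):

```yaml
models:
  my_project:
    summaries:                           # hypothetical subfolder under models/
      +materialized: materialized_view   # build these models as ClickHouse materialized views
```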
 
 ## Fivetran {#fivetran}
152149
