custom_settings: [{}] # A dictionary/mapping of custom ClickHouse settings for the connection - default is empty.
database_engine: '' # Database engine to use when creating new ClickHouse schemas (databases). If not set (the default), new databases will use the default ClickHouse database engine (usually Atomic).
threads: [1] # Number of threads to use when running queries. Before setting it to a number higher than 1, make sure to read the [read-after-write consistency](#read-after-write-consistency) section.
# Native (clickhouse-driver) connection settings
sync_request_timeout: [5] # Timeout for server ping
### About the ClickHouse Cluster {#about-the-clickhouse-cluster}
When using a ClickHouse cluster, you need to consider two things:
- Configuring the `cluster` setting.
- Ensuring read-after-write consistency, especially if you are using more than one thread (`threads` > 1).
#### Cluster Setting {#cluster-setting}
The `cluster` setting in the profile enables dbt-clickhouse to run against a ClickHouse cluster. If `cluster` is set in the profile, **all models will be created with the `ON CLUSTER` clause** by default, except for those using a **Replicated** engine. This includes:
- Database creation
Table and incremental materializations with a non-replicated engine will not be affected by the `cluster` setting (the model will be created on the connected node only).
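As an illustration, a profile that targets a cluster might look like the following sketch; `my_cluster` is a hypothetical cluster name and the remaining connection fields are abbreviated:

```yaml
# Hypothetical profile excerpt: `my_cluster` must match a cluster
# defined in the server's remote_servers configuration.
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: analytics   # hypothetical target database
      cluster: my_cluster # adds ON CLUSTER to generated DDL by default
```

With this in place, a plain table model is created with `CREATE TABLE ... ON CLUSTER my_cluster`, while models using a Replicated engine are created without the clause.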
#### Compatibility {#compatibility}
If a model has been created without a `cluster` setting, dbt-clickhouse will detect the situation and run all DDL/DML for that model without the `ON CLUSTER` clause.
#### Read-after-write consistency {#read-after-write-consistency}

dbt relies on a read-after-insert consistency model. This is not compatible with ClickHouse clusters that have more than one replica if you cannot guarantee that all operations are sent to the same replica. You may not encounter problems in day-to-day dbt usage, but depending on your cluster there are some strategies to put this guarantee in place:
- If you are using a ClickHouse Cloud cluster, you only need to set `select_sequential_consistency: 1` in your profile's `custom_settings` property. You can find more information about this setting [here](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
- If you are using a self-hosted cluster, make sure all dbt requests are sent to the same ClickHouse replica. If there is a load balancer in front of it, use a replica-aware routing or sticky-sessions mechanism so that you always reach the same replica. Adding the setting `select_sequential_consistency = 1` in clusters outside ClickHouse Cloud is [not recommended](https://clickhouse.com/docs/operations/settings/settings#select_sequential_consistency).
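For the ClickHouse Cloud case, the setting can be passed through the profile's `custom_settings` property; a minimal sketch, with the other connection fields omitted:

```yaml
# Hypothetical profile excerpt for a ClickHouse Cloud service.
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      custom_settings:
        select_sequential_consistency: 1 # read from an up-to-date replica
```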
## General information about features {#general-information-about-features}
### General table configurations {#general-table-configurations}
docs/integrations/data-ingestion/etl-tools/dbt/guides.md
import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
<ClickHouseSupportedBadge/>
This section provides guides on setting up dbt and the ClickHouse adapter, as well as an example of using dbt with ClickHouse based on a publicly available IMDB dataset. The example covers the following steps:
1. Creating a dbt project and setting up the ClickHouse adapter.
2. Defining a model.
|Horror |HOR |
|War |WAR |
+-------+----+
```
## Further Information {#further-information}
The previous guides only scratch the surface of dbt's functionality. We recommend reading the excellent [dbt documentation](https://docs.getdbt.com/docs/introduction).
docs/integrations/data-ingestion/etl-tools/dbt/index.md
import ClickHouseSupportedBadge from '@theme/badges/ClickHouseSupported';
Within dbt, these models can be cross-referenced and layered to allow the construction of higher-level concepts. The boilerplate SQL required to connect models is automatically generated. Furthermore, dbt identifies dependencies between models and ensures they are created in the appropriate order using a directed acyclic graph (DAG).
dbt is compatible with ClickHouse through a [ClickHouse-supported adapter](https://github.com/ClickHouse/dbt-clickhouse).
<TOCInline toc={toc} maxHeadingLevel={2} />
## Supported features {#supported-features}
List of supported features:
- [x] Table materialization
- [x] View materialization
- [x] Incremental materialization
All features up to dbt-core 1.9 are supported. We will soon add support for the features introduced in dbt-core 1.10.
This adapter is not yet available for use inside [dbt Cloud](https://docs.getdbt.com/docs/dbt-cloud/cloud-overview), but we expect to make it available soon. Please reach out to support for more information.
## Concepts {#concepts}
dbt introduces the concept of a model. This is defined as a SQL statement, potentially joining many tables. A model can be "materialized" in a number of ways. A materialization represents a build strategy for the model's select query. The code behind a materialization is boilerplate SQL that wraps your SELECT query in a statement in order to create a new or update an existing relation.
### Install dbt-core and dbt-clickhouse {#install-dbt-core-and-dbt-clickhouse}
dbt provides several options for installing the command-line interface (CLI), which are detailed [here](https://docs.getdbt.com/dbt-cli/install/overview). We recommend using `pip` to install both dbt and dbt-clickhouse.
```sh
pip install dbt-core dbt-clickhouse
```
### Provide dbt with the connection details for our ClickHouse instance {#provide-dbt-with-the-connection-details-for-our-clickhouse-instance}
Configure the `clickhouse-service` profile in the `~/.dbt/profiles.yml` file and provide the schema, host, port, user, and password properties. The full list of connection configuration options is available in the [Features and configurations](/integrations/dbt/features-and-configurations) page:
```yaml
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: [ default ] # ClickHouse database for dbt models

      # Optional
      host: [ localhost ]
      port: [ 8123 ] # Defaults to 8123, 8443, 9000, 9440 depending on the secure and driver settings
      user: [ default ] # User for all database operations
      password: [ <empty string> ] # Password for the user
      secure: True # Use TLS (native protocol) or HTTPS (http protocol)
```
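The profile above connects over HTTP(S). The adapter also accepts a `driver` setting for the native protocol; the following is a sketch only, and the hostname and password are placeholders:

```yaml
clickhouse-service:
  target: dev
  outputs:
    dev:
      type: clickhouse
      schema: default
      host: myservice.clickhouse.cloud # placeholder hostname
      driver: native # use clickhouse-driver instead of HTTP
      port: 9440     # native protocol over TLS
      user: default
      password: my_password # placeholder
      secure: True
```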
### Create a dbt project {#create-a-dbt-project}
You can now use this profile in one of your existing projects or create a new one using:
```sh
dbt init project_name
```
Inside the `project_name` directory, update your `dbt_project.yml` file to specify the profile name used to connect to the ClickHouse server.
```yaml
profile: 'clickhouse-service'
```
### Test connection {#test-connection}
Execute `dbt debug` with the CLI tool to confirm that dbt can connect to ClickHouse. Confirm that the response includes `Connection test: [OK connection ok]`, indicating a successful connection.
Go to the [guides page](/integrations/dbt/guides) to learn more about how to use dbt with ClickHouse.
The current ClickHouse adapter for dbt has several limitations users should be aware of:
- The plugin uses syntax that requires ClickHouse version 25.3 or newer. We do not test older versions of ClickHouse. We also do not currently test Replicated tables.
- Concurrent runs of the adapter may collide, since internally they can use the same table names for the same operations. For more information, see issue [#420](https://github.com/ClickHouse/dbt-clickhouse/issues/420).
- The adapter currently materializes models as tables using an [INSERT INTO SELECT](https://clickhouse.com/docs/sql-reference/statements/insert-into#inserting-the-results-of-select). This effectively means data is duplicated if the run is executed again. Very large datasets (PB) can result in extremely long run times, making some models unviable. To improve performance, use ClickHouse materialized views by setting `materialized: materialized_view` in the model config. Additionally, aim to minimize the number of rows returned by any query by using `GROUP BY` where possible. Prefer models that summarize data over those that simply transform while maintaining the row counts of the source.
- To use Distributed tables to represent a model, users must create the underlying replicated tables on each node manually. The Distributed table can, in turn, be created on top of these. The adapter does not manage cluster creation.
- When dbt creates a relation (table/view) in a database, it usually creates it as: `{{ database }}.{{ schema }}.{{ table/view id }}`. ClickHouse has no notion of schemas. The adapter therefore uses `{{ schema }}.{{ table/view id }}`, where `schema` is the ClickHouse database.
- Ephemeral models/CTEs don't work if placed before the `INSERT INTO` in a ClickHouse insert statement; see [ClickHouse/ClickHouse#30323](https://github.com/ClickHouse/ClickHouse/issues/30323). This should not affect most models, but care should be taken where an ephemeral model is placed in model definitions and other SQL statements. <!-- TODO review this limitation, looks like the issue was already closed and the fix was introduced in 24.10 -->
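To illustrate the materialized-view suggestion above, a dbt model can opt into the ClickHouse materialized view materialization through its config block. This is a sketch only; the model, source, and column names are hypothetical:

```sql
-- models/daily_counts.sql (hypothetical model)
{{ config(
    materialized='materialized_view',
    engine='MergeTree()',
    order_by='day'
) }}

-- New rows arriving in the source table are aggregated incrementally
-- by the ClickHouse materialized view instead of re-running a full INSERT INTO SELECT.
select toDate(created_at) as day, count() as events
from {{ source('raw', 'events') }}
group by day
```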