feat(backend): postgres integration #12379
base: master
Hi @kaikaila. Thanks for your PR. I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with the verification command. Once the patch is verified, the new status will be reflected by the corresponding label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
🚫 This command cannot be processed. Only organization members or owners can use the commands.
Branch updated from cd1d08b to 85498ed.
Currently, both the MySQL and PGX setups use the DB superuser for all KFP operations, which is why client_manager.go contains a “create database if not exists” step. From a security standpoint, would it be preferable to:
If the team agrees, I can propose a follow-up PR to refactor accordingly.
I'm fine with this. I don't think it's great that KFP tries to create a database (or a bucket, frankly). FYI @mprahl / @droctothorpe
Thanks, @HumairAK. Totally agree on the security point.
Branch updated from 09fd370 to 1e0caa8.
Yes, that is fine.
Branch updated from 4d33821 to e6c943c.
Question about the PostgreSQL test workflow organization
Current situation
The V2 integration tests for PostgreSQL logically belong in a "PostgreSQL counterpart" to legacy-v2-api-integration-tests.yml.
Question: What's the recommended workflow organization for PostgreSQL tests?
Should I:
Would love guidance on the long-term vision for test workflow organization, especially from @nsingla.
Branch updated from 0948a18 to 591d9d6.
Branch updated from 591d9d6 to 5dd2a8a.
Branch updated from 5dd2a8a to 7edde91.
@@ -0,0 +1,171 @@
// Copyright 2025 The Kubeflow Authors
Just a thought: Postgres uses $1, $2, ... while MySQL uses ?. When query fragments are built separately and concatenated, placeholder indices can be wrong, causing runtime errors or misbound parameters.
My suggestion is to add unit tests that build complex queries from fragments (including empty optional fragments and IN-lists) and assert both the final SQL and the parameter slice order for both dialects.
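A minimal sketch of what such a test could exercise, assuming fragments are always written with MySQL-style `?` and rebound to PostgreSQL `$N` only once, after the final string is assembled (similar in spirit to Squirrel's `Dollar` placeholder format). The names `fragment`, `inList`, `assemble`, and `rebindDollar` are hypothetical and not part of this PR; the rebinder only skips `?` inside single-quoted literals.

```go
package main

import (
	"strconv"
	"strings"
)

// fragment is a hypothetical pairing of a SQL snippet with its bound args.
type fragment struct {
	sql  string
	args []any
}

// inList builds "col IN (?, ?, ...)" with one marker per value. An empty
// list yields an empty fragment here; real code must decide whether that
// means "no filter" or "match nothing".
func inList(col string, vals ...any) fragment {
	if len(vals) == 0 {
		return fragment{}
	}
	marks := strings.TrimSuffix(strings.Repeat("?, ", len(vals)), ", ")
	return fragment{sql: col + " IN (" + marks + ")", args: vals}
}

// assemble concatenates fragments in order, skipping empty optional ones,
// so the args slice always lines up with the order of '?' markers.
func assemble(frags ...fragment) (string, []any) {
	var sb strings.Builder
	var args []any
	for _, f := range frags {
		if f.sql == "" {
			continue
		}
		sb.WriteString(f.sql)
		args = append(args, f.args...)
	}
	return sb.String(), args
}

// rebindDollar rewrites each '?' outside single-quoted literals to $1, $2, ...
// Numbering runs once over the final string, so fragment order cannot skew it.
func rebindDollar(query string) string {
	var sb strings.Builder
	n := 0
	inQuote := false
	for _, r := range query {
		switch {
		case r == '\'':
			inQuote = !inQuote // a doubled '' toggles twice and stays balanced
			sb.WriteRune(r)
		case r == '?' && !inQuote:
			n++
			sb.WriteByte('$')
			sb.WriteString(strconv.Itoa(n))
		default:
			sb.WriteRune(r)
		}
	}
	return sb.String()
}
```

Because fragments never carry hard-coded `$N` indices, dropping or reordering an optional fragment cannot misnumber the rest; a test only needs to assert the rebound SQL and the args slice side by side.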
Branch updated from 5a4cefc to f68e26c.
Just a note: I have been working on similar functionality for Postgres support. I have modified the code so it no longer uses Squirrel and the SQL generator but uses native GORM throughout. This allows CamelCase support in table names and in many cases has resulted in significantly simpler queries on the DB. By maintaining CamelCase, scenarios where reflection is used to generate predicates function as is. All test cases are passing, and it would be great if we could look at potentially merging with the work that has been done here. Using native GORM should also make it easier to add support for other databases if required. It is currently in line with the master branch.
Branch updated from adbfbdb to 1a21b29.
Thanks for your comment. There are two main reasons why I decided not to use GORM for the store package:
Because complex queries are the dominant pattern here, I prioritized consistency of implementation: instead of mixing approaches, all queries in the store package are written using Squirrel, a transparent SQL builder that allows precise control, easy unit testing, and clear debugging. That said, I'm open to evaluating where GORM provides advantages in practice. You mentioned that you have been working on similar functionality using native GORM that passes all tests. Could you please share:
I'd be happy to review them and consider merging our efforts.
Signed-off-by: kaikaila <[email protected]>
Signed-off-by: kaikaila <[email protected]>
…iveExperiment
- Extract repeated subquery SQL into resourceReferenceSubquery variable
- Unify code style: consistently use SetMap() throughout
- Add detailed comments explaining PostgreSQL $N placeholder handling
- Simplify error messages optimization according to sanchesoon's suggestion
Signed-off-by: kaikaila <[email protected]>
Branch updated from 1a21b29 to 2362839.
image_registry: ${{ needs.build.outputs.IMAGE_REGISTRY }}
forward_port: "true"
- name: Port-forward Postgres
  run: kubectl -n "$POSTGRES_NAMESPACE" port-forward svc/postgres-service ${{ env.DB_PORT }}:${{ env.DB_PORT }} --address=${{ env.DB_FORWARD_IP }} &
Curious why you had to use --address=${{ env.DB_FORWARD_IP }}. By default, the port is forwarded to localhost, which typically resolves to 127.0.0.1. Could you not have omitted this and just used localhost? This is what we do for all other port-forwards.
@@ -0,0 +1,111 @@
name: API Server Tests - Postgres
I would like us to explore whether we can leverage the existing api-server-tests.yml; this would allow us to keep API tests focused in one workflow. Can we also use the ./.github/actions/test-and-report action like the other jobs in api-server-tests.yml? That way we can continue to have good test reports generated for the PostgreSQL case.
If we need to ignore upgrade tests, we can update the action to take optional --label-filter values.
cc @nsingla
@@ -0,0 +1,94 @@
name: KFP API Integration v1 tests - Postgres
Similar comments as for api-server-test-Postgres.yml; my preference is to consolidate into the api-server-tests.yml workflow.
cc @nsingla
kubectl -n kubeflow wait --for condition=Available --timeout=10m deployment/mysql
kubectl -n kubeflow wait --for condition=Available --timeout=3m deployment/metadata-grpc-deployment
.PHONY: dev-kind-cluster-pg
I think it would be more intuitive for the user if we just updated the dev-kind-cluster target to accept a DATABASE env var (similar to CONTAINER_ENGINE) that lets us specify mysql/postgres, with mysql as the default.
We should do the same for the kind-cluster-agnostic target and update the docs accordingly.
So then I just need to do:
# For dev
make -C backend dev-kind-cluster DATABASE=postgres
# For users
make -C backend kind-cluster-agnostic DATABASE=postgres
@HumairAK Thanks for the suggestion! I agree that dev-kind-cluster with a DATABASE parameter is better than a separate dev-kind-cluster-pg target.
On the other hand, I'd like to clarify the design intent for kind-cluster-agnostic before adding the parameter.
Since kind-cluster-agnostic appears in frontend/README.md but not in backend/README.md, I assume it's intended as a black-box backend for frontend development.
If that's the case, should we expose backend configuration parameters like DATABASE? Currently, we don't expose other backend configurations, e.g., object store, cache, MLMD version, etc.
3 Possible Approaches
Option 1: No configuration parameters (⭐️ my favorite)
- Frontend developers just run: make -C backend kind-cluster-agnostic
- Always uses sensible defaults to simplify the user experience.
Option 2: Expose all backend configuration parameters
- Add DATABASE, OBJECT_STORE, CACHE_BACKEND, etc.
- If we expose one, we should expose all for consistency.
Option 3: Expose only the DATABASE parameter
- Question: Why is DATABASE special compared to other backend configs?
That said, I'm happy to implement whichever approach you think best serves the user experience. What's your preference?
subsets:
  - addresses:
      - ip: 172.17.0.1 # docker0 bridge ip
      - ip: 192.168.65.254 # host.docker.internal IPv4 (for Mac Docker Desktop)
What is the reason for this change? Doesn't this break it for Linux machines? Same question for: manifests/kustomize/env/dev-kind-postgresql/forward-local-api-endpoint.yaml
If we want to support Docker Desktop, could we split it into a separate overlay?
query = query.Where(
    sq.Eq{
        "pipelines.Namespace": filterContext.ReferenceKey.ID,
        fmt.Sprintf("%s.%s", q("pipelines"), q("Namespace")): filterContext.ID,
Did you mean to change filterContext.ReferenceKey.ID -> filterContext.ID?
opts.SetQuote(s.dbDialect.QuoteIdentifier)
q := s.dbDialect.QuoteIdentifier
qb := s.dbDialect.QueryBuilder()
subQuery := qb.Select("t1.pvid, t1.pid").FromSelect(
Did we need additional quotes for the identifiers like t1.pvid / rn / etc.?
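One way to reason about this, sketched under the assumption that dotted references are quoted part by part: a hypothetical `quoteQualified` helper splits `t1.pvid` and quotes each segment, giving `"t1"."pvid"` on PostgreSQL and `` `t1`.`pvid` `` on MySQL. PostgreSQL folds unquoted identifiers to lower case, so all-lowercase aliases such as `t1`, `pvid`, and `rn` resolve the same whether quoted or not; quoting mainly matters for mixed-case names like `Namespace`. The quoter functions below are illustrative, not the PR's `QuoteIdentifier` implementation.

```go
package main

import "strings"

// quoteFunc matches the shape of a per-dialect identifier quoter.
type quoteFunc func(string) string

// quotePG wraps an identifier in double quotes, doubling embedded quotes.
func quotePG(id string) string {
	return `"` + strings.ReplaceAll(id, `"`, `""`) + `"`
}

// quoteMySQL wraps an identifier in backticks, doubling embedded backticks.
func quoteMySQL(id string) string {
	return "`" + strings.ReplaceAll(id, "`", "``") + "`"
}

// quoteQualified quotes each dot-separated part of a reference such as
// "t1.pvid" independently. Quoting the whole string at once would instead
// produce a single identifier literally named "t1.pvid". This naive split
// assumes no identifier itself contains a dot.
func quoteQualified(q quoteFunc, ref string) string {
	parts := strings.Split(ref, ".")
	for i, p := range parts {
		parts[i] = q(p)
	}
	return strings.Join(parts, ".")
}
```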
errorF := func(err error) ([]*model.Run, int, string, error) {
    return nil, 0, "", util.NewInternalServerError(err, "Failed to list runs: %v", err)
}
opts.SetQuote(s.dbDialect.QuoteIdentifier)
It feels strange to set this here. This is something we need to set for opts based on the dialect in every case, right? Not just runs? Maybe this should be set when we first initialize opts. What do you think?
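A minimal sketch of the reviewer's suggestion, using hypothetical `Options`/`NewOptions` names rather than the real `list.Options` API: the dialect's quote function is injected once at construction, so every store (runs, pipelines, experiments) gets dialect-correct quoting without a separate `SetQuote` call per list method. The column names in the test are only examples.

```go
package main

import (
	"fmt"
	"strings"
)

// Options is a hypothetical stand-in for list options; only sorting is modeled.
type Options struct {
	SortByField string
	IsDesc      bool
	quote       func(string) string // dialect-specific identifier quoter
}

// NewOptions takes the dialect's quote function at construction time, so
// every store builds options the same way and none can forget to set it.
func NewOptions(quote func(string) string, sortBy string, desc bool) *Options {
	return &Options{SortByField: sortBy, IsDesc: desc, quote: quote}
}

// OrderBy renders an ORDER BY clause using the injected quoting.
func (o *Options) OrderBy() string {
	dir := "ASC"
	if o.IsDesc {
		dir = "DESC"
	}
	return fmt.Sprintf("ORDER BY %s %s", o.quote(o.SortByField), dir)
}

// Example quoters; in the PR these would come from the dialect.
func pgQuote(id string) string    { return `"` + strings.ReplaceAll(id, `"`, `""`) + `"` }
func mysqlQuote(id string) string { return "`" + strings.ReplaceAll(id, "`", "``") + "`" }
```

The trade-off is that the dialect becomes a required dependency wherever options are created, which is the point: a missing quoter fails at construction rather than producing silently unquoted SQL in one store.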
// Add a metric as a new field to the select clause by join the passed-in SQL query with run_metrics table.
// With the metric as a field in the select clause enable sorting on this metric afterwards.
// TODO(jingzhang36): example of resulting SQL query and explanation for it.
func (s *RunStore) addSortByRunMetricToSelect(sqlBuilder sq.SelectBuilder, opts *list.Options) sq.SelectBuilder {
As per your earlier comment, this is no longer used, so we can remove it, yes?
[APPROVALNOTIFIER] This PR is NOT APPROVED.
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
1. dialect.go: fmt.Sprintf for sql string
2. merge review diff from Humair
Signed-off-by: kaikaila <[email protected]>
Branch updated from cbac1ef to 35d877a.
Summary
This PR adds full PostgreSQL (pgx driver) support to Kubeflow Pipelines backend, enabling users to choose between MySQL and PostgreSQL as the metadata database. The implementation introduces a clean dialect abstraction layer and includes a major query optimization that benefits both database backends.
Key achievements
✅ Complete PostgreSQL integration for API Server and Cache Server, addressing #7512, #9813
✅ All CI tests passing (MySQL + PostgreSQL).
✅ Significant performance improvement for ListRuns queries. This PR is expected to address the root causes behind #10778, #10230, #9780, #9701
✅ Zero breaking changes - backward compatible with existing MySQL deployments
What Changed
Problem
SQL syntax was tightly coupled to MySQL.
Solution
Introduced a DBDialect interface that encapsulates database-specific behavior:
Identifier quoting (MySQL backticks vs PostgreSQL double quotes)
Placeholder styles (? vs $1, $2, ...)
Aggregation functions (GROUP_CONCAT vs string_agg)
Concatenation syntax (CONCAT() vs ||)
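The four concerns listed above map naturally onto a small interface. The sketch below is illustrative only: it reuses the `DBDialect` and `QuoteIdentifier` names from this PR, but the other methods (`Placeholder`, `StringAgg`, `Concat`), the concrete types, and the `refsColumn` helper are hypothetical and do not reflect the actual contents of `dialect.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// DBDialect captures the SQL differences listed above (hypothetical methods).
type DBDialect interface {
	QuoteIdentifier(id string) string
	Placeholder(n int) string          // n is the 1-based parameter position
	StringAgg(expr, sep string) string // sep is an SQL literal, e.g. "','"
	Concat(parts ...string) string
}

type mysqlDialect struct{}

func (mysqlDialect) QuoteIdentifier(id string) string {
	return "`" + strings.ReplaceAll(id, "`", "``") + "`"
}
func (mysqlDialect) Placeholder(int) string { return "?" }
func (mysqlDialect) StringAgg(expr, sep string) string {
	return fmt.Sprintf("GROUP_CONCAT(%s SEPARATOR %s)", expr, sep)
}
func (mysqlDialect) Concat(parts ...string) string {
	return "CONCAT(" + strings.Join(parts, ", ") + ")"
}

type postgresDialect struct{}

func (postgresDialect) QuoteIdentifier(id string) string {
	return `"` + strings.ReplaceAll(id, `"`, `""`) + `"`
}
func (postgresDialect) Placeholder(n int) string { return fmt.Sprintf("$%d", n) }

// StringAgg: string_agg expects text input, so non-text columns need a cast.
func (postgresDialect) StringAgg(expr, sep string) string {
	return fmt.Sprintf("string_agg(%s, %s)", expr, sep)
}
func (postgresDialect) Concat(parts ...string) string {
	return strings.Join(parts, " || ")
}

// refsColumn builds the same aggregated column for either dialect.
func refsColumn(d DBDialect) string {
	col := d.QuoteIdentifier("ReferenceUUID")
	return d.StringAgg(col, "','") + " AS " + d.QuoteIdentifier("refs")
}
```

Storage code then calls `refsColumn(dialect)` (or its equivalent) without branching on the database type.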
Files
- backend/src/apiserver/common/sql/dialect/dialect.go
- backend/src/apiserver/storage/sql_dialect_util.go
- backend/src/apiserver/storage/list_filters.go
All storage layer code now uses
This ensures queries work correctly across MySQL, PostgreSQL, and SQLite (for tests).
The original ListRuns query called addMetricsResourceReferencesAndTasks, which performed a 3-layer LEFT JOIN with GROUP BY on all columns, including LONGTEXT fields like PipelineSpecManifest, WorkflowSpecManifest, etc. This caused slow response times for large datasets.
- Layers 1-3: LEFT JOIN carrying only the primary key (UUID) plus aggregated columns (refs, tasks, metrics)
- Final layer: INNER JOIN back to run_details to fetch the LONGTEXT columns
This eliminates GROUP BY on LONGTEXT columns entirely. Substantial performance improvements are expected for deployments with large pipeline specifications, though formal load testing has not yet been conducted.
- manifests/kustomize/env/platform-agnostic-postgresql/
- manifests/kustomize/env/dev-kind-postgresql/
- manifests/kustomize/third-party/postgresql/
Configuration is symmetric to the existing MySQL manifests for consistency.
Created CI-specific Kustomize overlays to ensure tests use locally built images from the Kind registry instead of pulling official images from ghcr.io:
- .github/resources/manifests/standalone/postgresql/
- kfp-cache-server image override to .github/resources/manifests/standalone/base/kustomization.yaml
- api-server-test-Postgres.yml
- integration-tests-v1-postgres.yml
PostgreSQL tests cover the core cache enabled/disabled matrix.
Testing
Unit Tests
23 test files modified/added
New test coverage: dialect_test.go, list_filters_test.go, sql_dialect_util_test.go
All existing tests updated to use dialect abstraction
Integration Tests
✅ V1 API integration tests (PostgreSQL)
✅ V2 API integration tests (PostgreSQL, cache enabled/disabled)
✅ Existing MySQL tests remain green
Migration Guide
kubectl apply -k manifests/kustomize/env/platform-agnostic-postgresql
No action required. This PR is fully backward compatible.
make -C backend dev-kind-cluster-pg
This PR continues from #12063.