CLOUDP-328217: Automation agent password secret #566

filipcirtog · 2025-10-31T21:15:12Z

Summary

If a deployment is moved to a different project, the automation agent password will be re-generated, triggering a password change in the automation plan.

In a sharded cluster this causes a deadlock, because multiple components require the automation to apply the change. Replica sets are not affected.

This is a blocker for migrating projects in sharded deployments.

Proof of Work

For SCRAM (the only authentication mechanism that regenerates the password), the automation agent's password is now saved in a secret. During migration, the stored secret is used to preserve the password, which makes project migration possible.
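The password-preservation idea can be modeled in a few lines. This is a minimal, language-neutral sketch in Python, not the operator's Go implementation: the secret is modeled as a plain dict, and `resolve_agent_password` and `generate_password` are hypothetical names. Only the key `automation-agent-password` comes from the PR (the `autoPwdSecretKey` constant in `controllers/om/automation_config.go`); the precedence order (stored secret, then the password already in the automation config, then a fresh one) is an assumption.

```python
import secrets

# Secret key name taken from the autoPwdSecretKey constant in the PR.
AUTO_PWD_SECRET_KEY = "automation-agent-password"


def generate_password() -> str:
    """Hypothetical stand-in for the operator's random password generator."""
    return secrets.token_urlsafe(24)


def resolve_agent_password(secret_data: dict, ac_password: str) -> str:
    """Return the automation agent password and persist it in the secret.

    Assumed precedence: a password already stored in the secret wins, then
    the password currently in the automation config; a new password is only
    generated when both are empty.
    """
    password = secret_data.get(AUTO_PWD_SECRET_KEY, "")
    if not password:
        password = ac_password or generate_password()
    secret_data[AUTO_PWD_SECRET_KEY] = password
    return password


# Original project: the existing automation config already has a password,
# so it is copied into the (hypothetical) secret.
secret = {}
password_before_move = resolve_agent_password(secret, ac_password="existing-ac-password")

# New project: its automation config has no password yet, but the secret
# survived the move, so the same password is reused instead of regenerated.
password_after_move = resolve_agent_password(secret, ac_password="")
```

Because the second call finds the stored value, the automation plan sees no password change, which is what avoids the sharded-cluster deadlock described in the summary.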

Observed problems

For LDAP (sharded cluster and replica set), the switch-project tests fail even though the only change applied is updating the MongoDB resource's project reference. The deployment returns to the Running state, but the users are missing from the automation configuration. To help further investigation, I have left the failing test code in place but commented it out (enabling it makes the tests fail), so the issue can be reproduced consistently.

Checklist

Have you linked a jira ticket and/or is the ticket in the title?
Have you checked whether your jira ticket required DOCSP changes?
Have you added changelog file?
- use skip-changelog label if not needed
- refer to Changelog files and Release Notes section in CONTRIBUTING.md for more details

github-actions · 2025-10-31T21:16:23Z

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.6.0 Release Notes

New Features

MongoDBCommunity: Added support to configure custom cluster domain via newly introduced spec.clusterDomain resource field. If spec.clusterDomain is not set, environment variable CLUSTER_DOMAIN is used as cluster domain. If the environment variable CLUSTER_DOMAIN is also not set, operator falls back to cluster.local as default cluster domain.
Helm Chart: Introduced two new helm fields operator.podSecurityContext and operator.securityContext that can be used to configure securityContext for Operator deployment through Helm Chart.
MongoDBSearch:
- Switched to gRPC and mTLS for internal communication between mongod and mongot.
  - Since MCK 1.4, the mongod and mongot processes communicated using the MongoDB Wire Protocol and used keyfile authentication. This release switches that to gRPC with mTLS authentication. gRPC will allow for load-balancing search queries against multiple mongot processes in the future, and mTLS decouples the internal cluster authentication mode and credentials among mongod processes from the connection to the mongot process. The Operator will automatically enable gRPC for existing and new workloads, and will enable mTLS authentication if both Database Server and MongoDBSearch resource are configured for TLS.
- Exposed configuration settings for mongot's prometheus metrics endpoint.
  - By default, if the spec.prometheus field is not provided, the metrics endpoint in mongot is disabled. This is a breaking change: previously the metrics endpoint was always enabled on port 9946.
  - To enable the Prometheus metrics endpoint, specify an empty spec.prometheus: field. This enables the metrics endpoint on the default port (9946). To change the port, set the spec.prometheus.port field.
- Simplified MongoDB Search setup: Removed the custom Search Coordinator polyfill (a piece of compatibility code previously needed to add the required permissions), as MongoDB 8.2.0 and later now include the necessary permissions via the built-in searchCoordinator role.
- Updated the default mongodb/mongodb-search image version to 0.55.0. This is the version MCK uses if .spec.version is not specified.
- MongoDB deployments using X509 internal cluster authentication are now supported. Previously MongoDB Search required SCRAM authentication among members of a MongoDB replica set. Note: SCRAM client authentication is still required, this change merely relaxes the requirements on internal cluster authentication.
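Two of the defaulting rules in these notes (the cluster domain lookup order and the new mongot metrics endpoint behavior) can be summarized as small pure functions. This is an illustrative Python sketch of the documented behavior only, not operator code; the function names are hypothetical, the resource spec is modeled as a plain dict, and treating an empty CLUSTER_DOMAIN value like an unset one is an assumption.

```python
from typing import Mapping, Optional

DEFAULT_CLUSTER_DOMAIN = "cluster.local"
DEFAULT_MONGOT_METRICS_PORT = 9946


def resolve_cluster_domain(spec_cluster_domain: Optional[str], env: Mapping[str, str]) -> str:
    """spec.clusterDomain first, then the CLUSTER_DOMAIN env var, then cluster.local."""
    if spec_cluster_domain:
        return spec_cluster_domain
    # Assumption: an empty CLUSTER_DOMAIN is treated the same as an unset one.
    return env.get("CLUSTER_DOMAIN") or DEFAULT_CLUSTER_DOMAIN


def resolve_metrics_port(spec: dict) -> Optional[int]:
    """Return the mongot metrics port, or None when the endpoint is disabled."""
    if "prometheus" not in spec:
        return None  # field absent: endpoint disabled (the breaking change)
    # A bare `prometheus:` in YAML parses as null, so normalize it to {}.
    prometheus = spec["prometheus"] or {}
    return prometheus.get("port", DEFAULT_MONGOT_METRICS_PORT)
```

For example, `resolve_metrics_port({"prometheus": None})` yields the default port 9946, while `resolve_metrics_port({})` yields `None`.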

Bug Fixes

Fixed parsing of the customEnvVars Helm value when values contain = characters.
ReplicaSet: Blocked disabling TLS and changing member count simultaneously. These operations must now be applied separately to prevent configuration inconsistencies.
MongoDBSearch now records the reconciled mongot version in status and exposes it via a dedicated kubectl print column.
Fixed inability to specify cluster-wide privileges in custom roles.

Other Changes

kubectl-mongodb plugin: cosign, the signing tool that is used to sign kubectl-mongodb plugin binaries, has been updated to version 3.0.2. With this change, released binaries will be bundled with .bundle files containing both signature and certificate information. For more information on how to verify signatures using the new cosign version, refer to https://github.com/sigstore/cosign/blob/v3.0.2/doc/cosign_verify-blob.md

lsierant · 2025-11-07T10:21:09Z

controllers/operator/authentication/authentication.go

 // Configure will configure all the specified authentication Mechanisms. We need to ensure we wait for
 // the agents to reach ready state after each operation as prematurely updating the automation config can cause the agents to get stuck.
-func Configure(conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error {
+func Configure(client kubernetesClient.Client, ctx context.Context, mdbNamespacedName *types.NamespacedName, conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error {


nit: ctx should always be first arg

done! thank you

lsierant · 2025-11-07T10:25:13Z

controllers/operator/authentication/authentication.go

 // Disable disables all authentication mechanisms, and waits for the agents to reach goal state. It is still required to provide
 // automation agent username, password and keyfile contents to ensure a valid Automation Config.
-func Disable(conn om.Connection, opts Options, deleteUsers bool, log *zap.SugaredLogger) error {
+func Disable(client kubernetesClient.Client, ctx context.Context, mdbNamespacedName *types.NamespacedName, conn om.Connection, opts Options, deleteUsers bool, log *zap.SugaredLogger) error {


types.NamespacedName is usually not passed by pointer. Is there a reason it's a pointer here? Is passing nil here a valid case?

No, nil is not a valid case here. I have changed it to pass the value instead. Thank you!

lsierant · 2025-11-07T10:26:56Z

...sts/tests/authentication/fixtures/switch-project/replica-set-scram-sha-1-switch-project.yaml

+    authentication:
+      agents:
+        # This may look weird, but without it we'll get this from OpsManager:
+        # Cannot configure SCRAM-SHA-1 without using MONGODB-CR in te Agent Mode","reason":"Cannot configure SCRAM-SHA-1 without using MONGODB-CR in te Agent Mode


could you pls fix the typo in "te" in the mentioned validation btw.?

sure! i have done that!

lsierant · 2025-11-07T10:27:28Z

...sts/tests/authentication/fixtures/switch-project/replica-set-scram-sha-1-switch-project.yaml

+  security:
+    authentication:
+      agents:
+        # This may look weird, but without it we'll get this from OpsManager:


nit: remove "This may look weird" - let's state the why objectively.

btw. SCRAM-SHA-1 is deprecated and it requires some additional legacy enablement with that MONGODB-CR IIRC

lsierant · 2025-11-07T10:30:08Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_ldap_switch_project.py

+    server_certs: str,
+    namespace: str,
+) -> MongoDB:
+    resource = MongoDB.from_yaml(find_fixture(f"switch-project/{MDB_FIXTURE_NAME}.yaml"), namespace=namespace)


could we use some basic fixture to not create redundant yamls? I see you're configuring all the security and auth in the test anyway

I have now used the existing fixtures from the auth module, modifying only their names. Would this be OK?

lsierant · 2025-11-07T10:31:42Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+
+    Ensures test isolation in a multi-namespace test environment.
+    """
+    return random_k8s_name(f"{namespace}-project-")


we don't need to randomize it - namespace is randomized in evg anyway. And randomizing it is just making local runs difficult and not possible to re-run.

if you want to have different project names for different deployments then just add a resource name to it: {namespace}-{mdb.name} - it will suffice for making them unique for the test

done! thank you!
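The naming scheme suggested above, written as a hypothetical helper (the name `switch_project_name` is illustrative; in the tests the second argument would be the resource's `mdb.name`). Because it contains no random suffix, a local run can be repeated against the same project.

```python
def switch_project_name(namespace: str, resource_name: str) -> str:
    """Hypothetical helper: unique per resource within a test run, and stable
    across local re-runs because it contains no random suffix."""
    return f"{namespace}-{resource_name}"


print(switch_project_name("my-namespace", "replica-set-scram-sha-1-switch-project"))
# prints: my-namespace-replica-set-scram-sha-1-switch-project
```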

lsierant · 2025-11-07T10:32:58Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+
+
+@pytest.fixture(scope="module")
+def replica_set(namespace: str) -> MongoDB:


resource fixture should be scoped to function and have if try_load(resource) applied (look for it in other tests).

We don't have this pattern applied across the board, but we try to use it for newer tests.
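A sketch of the suggested fixture pattern. Everything except the shape of the fixture is a hypothetical stand-in: `FakeMongoDB` replaces the framework's `MongoDB` class, `CLUSTER` replaces the Kubernetes API, and this `try_load` only mimics the contract described in the review (load the resource if it already exists and report whether it did). Because the fixture is function-scoped, each test step receives a freshly loaded object.

```python
from typing import Any, Dict

# Hypothetical in-memory store standing in for the Kubernetes API server.
CLUSTER: Dict[str, Dict[str, Any]] = {}


class FakeMongoDB:
    """Hypothetical stand-in for the test framework's MongoDB resource."""

    def __init__(self, name: str, namespace: str) -> None:
        self.name = name
        self.namespace = namespace
        self.spec: Dict[str, Any] = {}

    def key(self) -> str:
        return f"{self.namespace}/{self.name}"

    def update(self) -> None:
        CLUSTER[self.key()] = dict(self.spec)


def try_load(resource: FakeMongoDB) -> bool:
    """Hypothetical stand-in: refresh `resource` from the cluster and report
    whether it already existed there."""
    stored = CLUSTER.get(resource.key())
    if stored is None:
        return False
    resource.spec = dict(stored)
    return True


# In the real test module this is decorated with @pytest.fixture(scope="function").
def replica_set(namespace: str) -> FakeMongoDB:
    resource = FakeMongoDB("my-replica-set", namespace)
    if try_load(resource):
        # Created by an earlier test step: hand out the current cluster state,
        # so test bodies never work on a stale object.
        return resource
    # First use: build the desired state (members, version, security, ...).
    resource.spec = {"members": 3}
    return resource
```

This is also why the manual `load()` calls inside individual test steps become unnecessary.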

lsierant · 2025-11-07T10:35:29Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+        tester.assert_expected_users(0)
+
+    def test_create_secret(self):
+        print(f"creating password for MongoDBUser {self.USER_NAME} in secret/{self.PASSWORD_SECRET_NAME} ")


is this necessary? normally we have the progress visible when running the test, i.e. which test step is currently executing. There is nothing more than that so I think it's redundant

it was indeed redundant! I have now changed that! thank you!

lsierant · 2025-11-07T10:36:25Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+            },
+        )
+
+        replica_set.load()


when you use the pattern with function scope and try_load, you don't need to load the resource manually in the tests, which avoids flakiness caused by stale objects

sure! now everything uses try_load. thank you!

lsierant · 2025-11-07T10:36:58Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+
+        replica_set.load()
+        replica_set["spec"]["opsManager"]["configMapRef"]["name"] = new_project_configmap
+        replica_set.set_version(custom_mdb_version)


the version should be only set in the resource's fixture unless the changing version is part of the test.

done! thank you!

lsierant · 2025-11-07T10:37:28Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+
+    def test_moved_replica_set_connectivity(self):
+        """
+        Verify connectivity to the replica set after switching projects.


the comment is redundant; the function name is already self-describing (and it's a good name)

I have deleted the redundant comments. thank you!

lsierant · 2025-11-07T10:38:19Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+
+    def test_ops_manager_state_correctly_updated_in_moved_replica_set(self, replica_set: MongoDB):
+        """
+        Ensure Ops Manager state is correctly updated in the moved replica set after the project switch.


this comment also is not adding much over the already good function name

same as above. deleted! thank you

lsierant · 2025-11-07T10:39:22Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+MDB_RESOURCE_NAME = "replica-set-scram-sha-1-switch-project"
+MDB_FIXTURE_NAME = MDB_RESOURCE_NAME
+
+CONFIG_MAP_KEYS = {


is this necessary? Those keys won't be ever changed so we can just inline them.

i have inlined them. thanks

lsierant · 2025-11-07T10:46:33Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_x509_switch_project.py

+
+
+@pytest.mark.e2e_replica_set_x509_switch_project
+class TestReplicaSetCreationAndProjectSwitch(KubernetesTester):


are the three or four test classes here any different? Do you think we could extract it as a reusable test class parametrized with resource and auth mechanism?

The way to do this is to have a generic test helper (important: without the pytest.mark annotation!)

class ReplicaSetCreationAndProjectSwitchTestHelper:
    def test_create_replica_set
    def test_ops_manager_state_correctly_updated_in_initial_replica_set
    [...]

And in the actual test files we could have only test functions that simply delegate to the common code:

@pytest.fixture
def test_helper() -> ReplicaSetCreationAndProjectSwitchTestHelper:
    ... configure and return test helper

@pytest.mark.e2e_replica_set_x509_switch_project
def test_create_replica_set(test_helper: ReplicaSetCreationAndProjectSwitchTestHelper):
    test_helper.test_create_replica_set()

... etc.

I'm a bit worried that we've added too much duplicated code. We already have a code duplication problem - let's try not to exacerbate it further.

example of using test helper to reduce duplication:

mongodb-kubernetes/docker/mongodb-kubernetes-tests/tests/search/search_enterprise_tls.py

Line 216 in 78e9a13

def test_search_restore_sample_database(mdb: MongoDB):

unfortunately, for now we cannot easily reuse whole test classes with the testing steps. The test functions/classes must be defined in each file separately, but we can organize the code to minimize the duplication

I've created ReplicaSetCreationAndProjectSwitchTestHelper and ShardedClusterCreationAndProjectSwitchTestHelper, to enable code reuse across the tests. Thank you!

lsierant · 2025-11-07T10:50:09Z

...ongodb-kubernetes-tests/tests/authentication/sharded_cluster_scram_sha_256_switch_project.py

+    # def test_create_secret(self):
+    #     print(f"creating password for MongoDBUser {self.USER_NAME} in secret/{self.PASSWORD_SECRET_NAME} ")
+
+    #     create_or_update_secret(


either remove or uncomment commented code; if necessary to leave it - explain why is commented

The commented-out tests are expected to pass once moving deployments across projects is supported. For instance, with LDAP the cluster does reach the Running state, but the automation config ends up with an empty users array.
If this topic is investigated further, I believe the code will be useful for reproducing the issues.

lsierant

Awesome you've added so many e2e tests! But let's try to think how we could minimize the code duplication in there. It looks like all the tests are almost identical.

lucian-tosa · 2025-11-12T15:51:32Z

controllers/om/automation_config.go

+const (
+	autoPwdSecretKey = "automation-agent-password"
+)


This could be moved to our constants.go file

lucian-tosa · 2025-11-12T16:09:35Z

controllers/om/automation_config.go

+			password = ac.Auth.AutoPwd
+		}
+
+		err := EnsureEmptySecret(ctx, k8sClient, secretNamespacedName)


Why create an empty secret then update it? It can be done in one go when the password variable is set
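A minimal sketch of the single-write alternative suggested here, in Python rather than the operator's Go. `FakeSecretsClient`, `save_agent_password`, and the secret name are hypothetical; only the `automation-agent-password` key comes from the PR. Resolving the password first and writing the secret once with its final contents halves the API calls, and no reconcile can ever observe an intermediate empty secret.

```python
from typing import Dict


class FakeSecretsClient:
    """Hypothetical in-memory stand-in for a Kubernetes secrets client."""

    def __init__(self) -> None:
        self.secrets: Dict[str, Dict[str, str]] = {}
        self.writes = 0  # number of API write calls

    def create_or_update(self, name: str, data: Dict[str, str]) -> None:
        self.writes += 1
        self.secrets[name] = dict(data)


def save_agent_password(client: FakeSecretsClient, secret_name: str, password: str) -> None:
    # The password is resolved before touching the API, so the secret is
    # written exactly once with its final contents.
    client.create_or_update(secret_name, {"automation-agent-password": password})


client = FakeSecretsClient()
save_agent_password(client, "my-replica-set-agent-password", "s3cret")  # hypothetical name
```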

lucian-tosa · 2025-11-12T16:11:06Z

controllers/operator/authentication/authentication.go

 // Configure will configure all the specified authentication Mechanisms. We need to ensure we wait for
 // the agents to reach ready state after each operation as prematurely updating the automation config can cause the agents to get stuck.
-func Configure(conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error {
+func Configure(ctx context.Context, client kubernetesClient.Client, mdbNamespacedName types.NamespacedName, conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error {


nit: the Options struct could be reused to include the name of the resource

lucian-tosa · 2025-11-12T16:47:08Z

docker/mongodb-kubernetes-tests/tests/authentication/helper_sharded_cluster_switch_project.py

+    def test_switch_sharded_cluster_project(self):
+        original_configmap = read_configmap(namespace=self.namespace, name="my-project")
+        new_project_name = f"{self.namespace}-second"
+
+        new_project_configmap = create_or_update_configmap(
+            namespace=self.namespace,
+            name=new_project_name,
+            data={
+                "baseUrl": original_configmap["baseUrl"],
+                "projectName": new_project_name,
+                "orgId": original_configmap["orgId"],
+            },
+        )
+
+        self.sharded_cluster["spec"]["opsManager"]["configMapRef"]["name"] = new_project_configmap
+        self.sharded_cluster.update()
+        self.sharded_cluster.assert_reaches_phase(Phase.Running, timeout=800)


nit: the replicaset and sharded helpers have a few methods in common such as this one
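One way to act on this nit, sketched in Python with hypothetical names: move the shared project-switch step into a base class that both helpers inherit. The ConfigMap data mirrors the `test_switch_sharded_cluster_project` snippet in this review (same `baseUrl` and `orgId`, new `projectName`, `{namespace}-second` naming); the resource is modeled as a dict, and the real step's ConfigMap creation, `update()` and `Phase.Running` wait are left out.

```python
from typing import Any, Dict


def switched_project_configmap_data(original: Dict[str, str], new_project_name: str) -> Dict[str, str]:
    """Keep the Ops Manager URL and organization of the original project and
    replace only the project name."""
    return {
        "baseUrl": original["baseUrl"],
        "projectName": new_project_name,
        "orgId": original["orgId"],
    }


class ProjectSwitchTestHelperBase:
    """Hypothetical base class for steps shared by both helpers."""

    def __init__(self, namespace: str, resource: Dict[str, Any]) -> None:
        self.namespace = namespace
        self.resource = resource  # stands in for the MongoDB custom resource

    def switch_project(self, original_configmap: Dict[str, str]) -> Dict[str, str]:
        new_project_name = f"{self.namespace}-second"
        data = switched_project_configmap_data(original_configmap, new_project_name)
        # Only the reference change is modeled here.
        self.resource["spec"]["opsManager"]["configMapRef"]["name"] = new_project_name
        return data


class ReplicaSetCreationAndProjectSwitchTestHelper(ProjectSwitchTestHelperBase):
    pass  # replica-set-specific steps only


class ShardedClusterCreationAndProjectSwitchTestHelper(ProjectSwitchTestHelperBase):
    pass  # sharded-cluster-specific steps only
```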

lucian-tosa · 2025-11-12T16:49:48Z

docker/mongodb-kubernetes-tests/tests/authentication/replica_set_scram_sha_1_switch_project.py

+
+
+@pytest.mark.e2e_replica_set_scram_sha_1_switch_project
+class TestReplicaSetCreationAndProjectSwitch(KubernetesTester):


as we discussed on the call, please add a method that asserts the automation agent password in the AC has not changed after migrating projects
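A minimal sketch of such an assertion, assuming the automation config is available as parsed JSON with the `{"auth": {"autoPwd": ...}}` layout that corresponds to `ac.Auth.AutoPwd` in the Go code. The helper names are hypothetical, and how the two automation configs are fetched (old project before the switch, new project after) is left out.

```python
from typing import Any, Dict


def automation_agent_password(automation_config: Dict[str, Any]) -> str:
    # Assumes the automation config JSON layout {"auth": {"autoPwd": ...}}.
    return automation_config["auth"]["autoPwd"]


def assert_agent_password_unchanged(before: Dict[str, Any], after: Dict[str, Any]) -> None:
    """Fail if the agent password differs between the old and new project."""
    assert automation_agent_password(before) == automation_agent_password(after), (
        "automation agent password changed after switching projects"
    )
```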

lucian-tosa · 2025-11-12T17:02:03Z

...ongodb-kubernetes-tests/tests/authentication/sharded_cluster_scram_sha_256_switch_project.py

+    ):
+        test_helper.test_ops_manager_state_with_expected_authentication(expected_users=1)
+
+    def test_ops_manager_state_with_users_correctly_updated_after_switch(


Make sure that this is not flaky. If you suspect it is, then disable it, and add a comment

lucian-tosa · 2025-11-12T17:03:13Z

.evergreen.yml

+      - e2e_replica_set_x509_switch_project
+      - e2e_replica_set_ldap_switch_project
+      - e2e_sharded_cluster_ldap_switch_project


Since these don't use the password secret, better to disable them. For now, they only test whether project migrations work, but we don't fully support that yet.

feature: create secret for agent password

58f831d

filipcirtog mentioned this pull request Oct 31, 2025

CLOUDP-328217: Automation agent password secret #560

Closed

filipcirtog added 2 commits November 3, 2025 09:28

improvments

219e880

tests + lint

c0a80c1

filipcirtog requested review from Julien-Ben, lsierant and lucian-tosa November 4, 2025 14:04

filipcirtog marked this pull request as ready for review November 4, 2025 14:05

filipcirtog requested a review from a team as a code owner November 4, 2025 14:05

filipcirtog added 2 commits November 6, 2025 10:47

tests: refactoring+improvements

5db3adb

lint

863e347

lsierant reviewed Nov 7, 2025

lsierant requested changes Nov 7, 2025

filipcirtog added 2 commits November 10, 2025 11:09

remove new fixtures and use existing ones instead

a913281

add replicaset helper class

4f00574

filipcirtog added 2 commits November 11, 2025 10:51

refactoring helper classes

7032528

arg nit: ctx before client

6d572dc

lucian-tosa requested changes Nov 12, 2025
