-
Notifications
You must be signed in to change notification settings - Fork 25
CLOUDP-328217: Automation agent password secret #566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
CLOUDP-328217: Automation agent password secret #566
Conversation
MCK 1.6.0 Release NotesNew Features
Bug Fixes
Other Changes
|
| // Configure will configure all the specified authentication Mechanisms. We need to ensure we wait for | ||
| // the agents to reach ready state after each operation as prematurely updating the automation config can cause the agents to get stuck. | ||
| func Configure(conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error { | ||
| func Configure(client kubernetesClient.Client, ctx context.Context, mdbNamespacedName *types.NamespacedName, conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ctx should always be first arg
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! thank you
| // Disable disables all authentication mechanisms, and waits for the agents to reach goal state. It is still required to provide | ||
| // automation agent username, password and keyfile contents to ensure a valid Automation Config. | ||
| func Disable(conn om.Connection, opts Options, deleteUsers bool, log *zap.SugaredLogger) error { | ||
| func Disable(client kubernetesClient.Client, ctx context.Context, mdbNamespacedName *types.NamespacedName, conn om.Connection, opts Options, deleteUsers bool, log *zap.SugaredLogger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
types.NamespacedName is usually not passed by pointer. Is there a reason it's a pointer here? Is passing nil here a valid case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it is not a case in here. I have now changed that. thank you!
| authentication: | ||
| agents: | ||
| # This may look weird, but without it we'll get this from OpsManager: | ||
| # Cannot configure SCRAM-SHA-1 without using MONGODB-CR in te Agent Mode","reason":"Cannot configure SCRAM-SHA-1 without using MONGODB-CR in te Agent Mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you pls fix the typo in "te" in the mentioned validation btw.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure! i have done that!
| security: | ||
| authentication: | ||
| agents: | ||
| # This may look weird, but without it we'll get this from OpsManager: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove "This may look weird" - let's state the why objectively.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw. SCRAM-SHA-1 is deprecated and it requires some additional legacy enablement with that MONGODB-CR IIRC
| server_certs: str, | ||
| namespace: str, | ||
| ) -> MongoDB: | ||
| resource = MongoDB.from_yaml(find_fixture(f"switch-project/{MDB_FIXTURE_NAME}.yaml"), namespace=namespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use some basic fixture to not create redundant yamls? I see you're configuring all the security and auth in the test anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have now utilized the existing fixtures from the auth module, with modifying only their names. would this be ok?
| Ensures test isolation in a multi-namespace test environment. | ||
| """ | ||
| return random_k8s_name(f"{namespace}-project-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to randomize it - namespace is randomized in evg anyway. And randomizing it is just making local runs difficult and not possible to re-run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you want to have different project names for different deployments then just add a resource name to it: {namespace}-{mdb.name} - it will suffice for making them unique for the test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! thank you!
|
|
||
|
|
||
| @pytest.fixture(scope="module") | ||
| def replica_set(namespace: str) -> MongoDB: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resource fixture should be scoped to function and have if try_load(resource) applied (look for it in other tests).
We don't have this pattern applied across the board, but we try to use it for newer tests.
| tester.assert_expected_users(0) | ||
|
|
||
| def test_create_secret(self): | ||
| print(f"creating password for MongoDBUser {self.USER_NAME} in secret/{self.PASSWORD_SECRET_NAME} ") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary? normally we have the progress visible when running the test, i.e. which test step is currently executing. There is nothing more than that so I think it's redundant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it was indeed redundant! I have now changed that! thank you!
| }, | ||
| ) | ||
|
|
||
| replica_set.load() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when you use the pattern with function scope and try_load, you don't need to load it manually in the tests avoiding flakiness errors due to stale object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure! now everything uses try_load. thank you!
|
|
||
| replica_set.load() | ||
| replica_set["spec"]["opsManager"]["configMapRef"]["name"] = new_project_configmap | ||
| replica_set.set_version(custom_mdb_version) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the version should be only set in the resource's fixture unless the changing version is part of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done! thank you!
|
|
||
| def test_moved_replica_set_connectivity(self): | ||
| """ | ||
| Verify connectivity to the replica set after switching projects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment is redundant, just name the function name so it's self describing (it's already good)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have deleted the redundant comments. thank you!
|
|
||
| def test_ops_manager_state_correctly_updated_in_moved_replica_set(self, replica_set: MongoDB): | ||
| """ | ||
| Ensure Ops Manager state is correctly updated in the moved replica set after the project switch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment also is not adding much over the already good function name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above. deleted! than you
| MDB_RESOURCE_NAME = "replica-set-scram-sha-1-switch-project" | ||
| MDB_FIXTURE_NAME = MDB_RESOURCE_NAME | ||
|
|
||
| CONFIG_MAP_KEYS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this necessary? Those keys won't be ever changed so we can just inline them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i have inlined them. thanks
|
|
||
|
|
||
| @pytest.mark.e2e_replica_set_x509_switch_project | ||
| class TestReplicaSetCreationAndProjectSwitch(KubernetesTester): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are the three or four test classes here any different? Do you think we could extract it as a reusable test class parametrized with resource and auth mechanism?
The way to do this is to have a generic test helper (important: without the pytest.mark annotation!)
class ReplicaSetCreationAndProjectSwitchTestHelper:
def test_create_replica_set
def test_ops_manager_state_correctly_updated_in_initial_replica_set
[...]
And in the actual test files we could have only test functions that simply delegate to the common code:
@pytest.fixture
def test_helper() -> ReplicaSetCreationAndProjectSwitchTestHelper:
... configure and return test helper
@pytest.mark.e2e_replica_set_x509_switch_project
def test_create_replica_set(test_helper: ReplicaSetCreationAndProjectSwitchTestHelper):
test_helper.test_create_replica_set()
... etc.
I'm a bit worried that we've added just too much of duplicated code. We've already have a code duplication problem - let's try to not exacerbate it further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
example of using test helper to reduce duplication:
mongodb-kubernetes/docker/mongodb-kubernetes-tests/tests/search/search_enterprise_tls.py
Line 216 in 78e9a13
| def test_search_restore_sample_database(mdb: MongoDB): |
unfortunately for now we cannot reuse easily whole test classes with the testing steps. The test functions/classes must be defined in each file separately, but we can organize the code to minimize the duplication
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created ReplicaSetCreationAndProjectSwitchTestHelper and ShardedClusterCreationAndProjectSwitchTestHelper, to enable code reuse across the tests. Thank you!
| # def test_create_secret(self): | ||
| # print(f"creating password for MongoDBUser {self.USER_NAME} in secret/{self.PASSWORD_SECRET_NAME} ") | ||
|
|
||
| # create_or_update_secret( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
either remove or uncomment commented code; if necessary to leave it - explain why is commented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The commented tests are theoretically expected to work in scenarios where moving deployments across projects is supported. For instance, in the case of LDAP, the cluster does reach a running state but the automation config ends up with an empty users array.
If more investigation will be done in this topic, I believe the code will be useful for reproducing the issues
lsierant
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome you've added so many e2e tests! But let's try to think how we could minimize the code duplication in there. It looks like all the tests are almost identical.
| const ( | ||
| autoPwdSecretKey = "automation-agent-password" | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be moved to our constants.go file
| password = ac.Auth.AutoPwd | ||
| } | ||
|
|
||
| err := EnsureEmptySecret(ctx, k8sClient, secretNamespacedName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why create an empty secret then update it? It can be done in one go when the password variable is set
| // Configure will configure all the specified authentication Mechanisms. We need to ensure we wait for | ||
| // the agents to reach ready state after each operation as prematurely updating the automation config can cause the agents to get stuck. | ||
| func Configure(conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error { | ||
| func Configure(ctx context.Context, client kubernetesClient.Client, mdbNamespacedName types.NamespacedName, conn om.Connection, opts Options, isRecovering bool, log *zap.SugaredLogger) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the Options struct could be reused to include the name of the resource
| def test_switch_sharded_cluster_project(self): | ||
| original_configmap = read_configmap(namespace=self.namespace, name="my-project") | ||
| new_project_name = f"{self.namespace}-second" | ||
|
|
||
| new_project_configmap = create_or_update_configmap( | ||
| namespace=self.namespace, | ||
| name=new_project_name, | ||
| data={ | ||
| "baseUrl": original_configmap["baseUrl"], | ||
| "projectName": new_project_name, | ||
| "orgId": original_configmap["orgId"], | ||
| }, | ||
| ) | ||
|
|
||
| self.sharded_cluster["spec"]["opsManager"]["configMapRef"]["name"] = new_project_configmap | ||
| self.sharded_cluster.update() | ||
| self.sharded_cluster.assert_reaches_phase(Phase.Running, timeout=800) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: the replicaset and sharded helpers have a few methods in common such as this one
|
|
||
|
|
||
| @pytest.mark.e2e_replica_set_scram_sha_1_switch_project | ||
| class TestReplicaSetCreationAndProjectSwitch(KubernetesTester): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we talked on the call, please add a method to assert that the automation agent password in the AC has not changed after migrating projects
| ): | ||
| test_helper.test_ops_manager_state_with_expected_authentication(expected_users=1) | ||
|
|
||
| def test_ops_manager_state_with_users_correctly_updated_after_switch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure that this is not flaky. If you suspect it is, then disable it, and add a comment
| - e2e_replica_set_x509_switch_project | ||
| - e2e_replica_set_ldap_switch_project | ||
| - e2e_sharded_cluster_ldap_switch_project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these don't use the password secret, better to disable them. For now, they only test whether project migrations work, but we don't fully support that yet.
Summary
If a deployment is moved to a different project, the automation agent password will be re-generated, triggering a password change in the automation plan.
This will cause a deadlock in a sharded cluster due to the multiple components requiring automation. However, it will not cause issues in replicasets.
This is a blocker for migrating projects in sharded deployments.
Proof of Work
For SCRAM (the only auth mechanism who re-generates a pwd), we now save the automation agent's password in a secret. During migration, the stored secret is utilized to preserve the password ,ensuring project migration possible.
Observed problems
For LDAP (Sharded + Replica), the following tests are failing, even though the only modification made is updating the MongoDB resource's project reference, with no other changes applied. To help further investigation, I have commented out certain code in the tests (which can make them fail) so the issue can be consistently reproduced. While the deployment returns to the "running" state, the users are missing from the automation configuration.
Checklist
skip-changeloglabel if not needed