Skip to content

Conversation

lllamnyp
Copy link
Member

@lllamnyp lllamnyp commented Apr 18, 2025

  • We create several etcd clients in the reconciliation loop, but never had a defer .Close() for them.
  • We erroneously try to configure these clients with etcd-0 instead of etcd-0.etcd-headless.namespace as the url.

Now fixed.

Summary by CodeRabbit

  • Bug Fixes

    • Improved resource cleanup by ensuring all etcd client connections are properly closed, preventing potential resource leaks.
  • New Features

    • Added support for TLS client authentication using certificates and CA secrets for enhanced security.
    • Set a default connection timeout to improve reliability.
  • Refactor

    • Updated endpoint addressing to use fully qualified domain names for improved accuracy and consistency.

* We create several etcd clients in the reconciliation loop, but never
  had a defer .Close() for them.
* We erroneously try to configure these clients with etcd-0 instead of
  etcd-0.etcd-headless.namespace as the url.

Now fixed.

Signed-off-by: Timofei Larkin <[email protected]>
Copy link
Contributor

coderabbitai bot commented Apr 18, 2025

Walkthrough

The changes introduce improved resource management and endpoint formatting in the etcd client handling logic. In the etcd cluster controller, a deferred cleanup ensures that all etcd client connections are closed at the end of the reconciliation process. Additionally, the etcd client factory now closes the cluster client if single-endpoint client creation fails, and it updates endpoint URLs to use a fully qualified DNS format, enhancing clarity and correctness in endpoint addressing. TLS client authentication is supported by loading client certificates and server CA certificates from Kubernetes secrets when security is enabled.

Changes

File(s) Change Summary
internal/controller/etcdcluster_controller.go Added a deferred cleanup function in the Reconcile method to close all etcd clients before returning; simplified deferred closure in configureAuth; added TODO and nolint to updateStatusOnErr.
internal/controller/factory/etcd_client.go Added TLS client authentication support by loading certs from secrets; updated endpoint URLs to use full DNS format; ensured cluster client is closed on single client creation failure; introduced helper functions to parse TLS secrets.

Poem

A cluster of clients, now tidy and neat,
With cleanup deferred, no leaks to defeat.
Endpoints addressed with DNS flair,
Each connection closed with utmost care.
Certs loaded safe from secrets deep,
This rabbit’s code leaps bounds in a sweep!
🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (1.64.8)

level=warning msg="The linter 'exportloopref' is deprecated (since v1.60.2) due to: Since Go1.22 (loopvar) this linter is no longer relevant. Replaced by copyloopvar."
level=error msg="[linters_context] exportloopref: This linter is fully inactivated: it will not produce any reports."
{"Issues":[],"Report":{"Warnings":[{"Text":"The linter 'exportloopref' is deprecated (since v1.60.2) due to: Since Go1.22 (loopvar) this linter is no longer relevant. Replaced by copyloopvar."}],"Linters":[{"Name":"asasalint"},{"Name":"asciicheck"},{"Name":"bidichk"},{"Name":"bodyclose"},{"Name":"canonicalheader"},{"Name":"containedctx"},{"Name":"contextcheck"},{"Name":"copyloopvar"},{"Name":"cyclop"},{"Name":"decorder"},{"Name":"deadcode"},{"Name":"depguard"},{"Name":"dogsled"},{"Name":"dupl","Enabled":true},{"Name":"dupword"},{"Name":"durationcheck"},{"Name":"errcheck","Enabled":true,"EnabledByDefault":true},{"Name":"errchkjson"},{"Name":"errname"},{"Name":"errorlint"},{"Name":"execinquery"},{"Name":"exhaustive"},{"Name":"exhaustivestruct"},{"Name":"exhaustruct"},{"Name":"exportloopref","Enabled":true},{"Name":"exptostd"},{"Name":"forbidigo"},{"Name":"forcetypeassert"},{"Name":"fatcontext"},{"Name":"funlen"},{"Name":"gci"},{"Name":"ginkgolinter"},{"Name":"gocheckcompilerdirectives"},{"Name":"gochecknoglobals"},{"Name":"gochecknoinits"},{"Name":"gochecksumtype"},{"Name":"gocognit"},{"Name":"goconst","Enabled":true},{"Name":"gocritic"},{"Name":"gocyclo","Enabled":true},{"Name":"godot"},{"Name":"godox"},{"Name":"err113"},{"Name":"gofmt","Enabled":true},{"Name":"gofumpt"},{"Name":"goheader"},{"Name":"goimports","Enabled":true},{"Name":"golint"},{"Name":"mnd"},{"Name":"gomnd"},{"Name":"gomoddirectives"},{"Name":"gomodguard"},{"Name":"goprintffuncname"},{"Name":"gosec"},{"Name":"gosimple","Enabled":true,"EnabledByDefault":true},{"Name":"gosmopolitan"},{"Name":"govet","Enabled":true,"EnabledByDefault":true},{"Name":"grouper"},{"Name":"ifshort"},{"Name":"iface"},{"Name":"importas"},{"Name":"inamedparam"},{"Name":"ineffassign","Enabled":true,"EnabledByDefault":true},{"Name":"interfacebloat"},{"Name":"interfacer"},{"Name":"intrange"},{"Name":"ireturn"},{"Name":"lll","Enabled":true},{"Name":"loggercheck"},{"Name":"maintidx"},{"Name":"makezero"},{"Name":"maligned"},{"Name":"mirror"},{"Name":"misspell","Enabled":true},{"Name":"musttag"},{"Name":"nakedret","Enabled":true},{"Name":"nestif"},{"Name":"nilerr"},{"Name":"nilnesserr"},{"Name":"nilnil"},{"Name":"nlreturn"},{"Name":"noctx"},{"Name":"nonamedreturns"},{"Name":"nosnakecase"},{"Name":"nosprintfhostport"},{"Name":"paralleltest"},{"Name":"perfsprint"},{"Name":"prealloc","Enabled":true},{"Name":"predeclared"},{"Name":"promlinter"},{"Name":"protogetter"},{"Name":"reassign"},{"Name":"recvcheck"},{"Name":"revive"},{"Name":"rowserrcheck"},{"Name":"sloglint"},{"Name":"scopelint"},{"Name":"sqlclosecheck"},{"Name":"spancheck"},{"Name":"staticcheck","Enabled":true,"EnabledByDefault":true},{"Name":"structcheck"},{"Name":"stylecheck"},{"Name":"tagalign"},{"Name":"tagliatelle"},{"Name":"tenv"},{"Name":"testableexamples"},{"Name":"testifylint"},{"Name":"testpackage"},{"Name":"thelper"},{"Name":"tparallel"},{"Name":"typecheck","Enabled":true,"EnabledByDefault":true},{"Name":"unconvert","Enabled":true},{"Name":"unparam","Enabled":true},{"Name":"unused","Enabled":true,"EnabledByDefault":true},{"Name":"usestdlibvars"},{"Name":"usetesting"},{"Name":"varcheck"},{"Name":"varnamelen"},{"Name":"wastedassign"},{"Name":"whitespace"},{"Name":"wrapcheck"},{"Name":"wsl"},{"Name":"zerologlint"},{"Name":"nolintlint"}],"Error":"exportloopref: This linter is fully inactivated: it will not produce any reports."}}

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
internal/controller/etcdcluster_controller.go (1)

114-123: Good cleanup implementation to prevent resource leaks

This deferred function properly cleans up all etcd client resources at the end of the reconciliation process. This ensures that both the cluster client and all single-endpoint clients are closed regardless of how the function returns, preventing memory leaks.

However, the return values from the Close() calls are not being checked.

Consider capturing and logging errors from the Close() calls, especially if they could indicate potential issues:

 defer func() {
     if clusterClient != nil {
-        clusterClient.Close()
+        if err := clusterClient.Close(); err != nil {
+            log.Debug(ctx, "error closing cluster client", "error", err)
+        }
     }
     for i := range singleClients {
         if singleClients[i] != nil {
-            singleClients[i].Close()
+            if err := singleClients[i].Close(); err != nil {
+                log.Debug(ctx, "error closing single client", "index", i, "error", err)
+            }
         }
     }
 }()
🧰 Tools
🪛 golangci-lint (1.64.8)

116-116: Error return value of clusterClient.Close is not checked

(errcheck)


120-120: Error return value of (*go.etcd.io/etcd/client/v3.Client).Close is not checked

(errcheck)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b6e4ea9 and 0716ee0.

📒 Files selected for processing (2)
  • internal/controller/etcdcluster_controller.go (1 hunks)
  • internal/controller/factory/etcd_client.go (2 hunks)
🧰 Additional context used
🪛 golangci-lint (1.64.8)
internal/controller/etcdcluster_controller.go

116-116: Error return value of clusterClient.Close is not checked

(errcheck)


120-120: Error return value of (*go.etcd.io/etcd/client/v3.Client).Close is not checked

(errcheck)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test on k8s latest version
  • GitHub Check: test on k8s previous version
  • GitHub Check: test-e2e on k8s latest version
  • GitHub Check: test on k8s penultimate version
  • GitHub Check: test-e2e on k8s penultimate version
  • GitHub Check: test-e2e on k8s previous version
  • GitHub Check: pre-commit
🔇 Additional comments (2)
internal/controller/factory/etcd_client.go (2)

32-32: Good fix for preventing resource leaks

Adding clusterClient.Close() ensures that if any single-endpoint client creation fails, the cluster client is properly closed. This prevents resource leaks, addressing one of the memory leak issues mentioned in the PR objectives.


60-60: Improved endpoint URL formatting with proper FQDN

This update properly constructs the endpoint URLs as fully qualified domain names using Kubernetes DNS naming conventions. This fixes the incorrect URL issue mentioned in the PR objectives, changing from simple hostnames to a format that includes the service name and namespace for proper resolution.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
internal/controller/factory/etcd_client.go (1)

119-132: CA certificate parsing function could improve error handling

The parseTLSSecretCA function is generally well-implemented, but the error handling for PEM decoding could be improved.

Consider enhancing the PEM decoding error handling:

-	block, _ := pem.Decode(caPemBytes)
+	block, rest := pem.Decode(caPemBytes)
 	if block == nil {
 		err = fmt.Errorf("failed to decode PEM bytes")
 		return
 	}
+	if len(rest) > 0 {
+		// Optional: Log or handle remaining data if there are multiple certificates
+	}

This change captures the remaining data and allows you to handle cases where the PEM might contain multiple certificates.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0716ee0 and 92c9f35.

📒 Files selected for processing (1)
  • internal/controller/factory/etcd_client.go (3 hunks)
🧰 Additional context used
🪛 golangci-lint (1.64.8)
internal/controller/factory/etcd_client.go

36-36: Error return value of clusterClient.Close is not checked

(errcheck)

🪛 ast-grep (0.31.1)
internal/controller/factory/etcd_client.go

[warning] 80-80: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: test-e2e on k8s penultimate version
  • GitHub Check: test-e2e on k8s latest version
  • GitHub Check: test-e2e on k8s previous version
  • GitHub Check: pre-commit
🔇 Additional comments (3)
internal/controller/factory/etcd_client.go (3)

64-64: Good improvement in endpoint URL formatting

The URL format has been correctly updated to use the fully qualified domain name, which addresses the issue with incorrect client URLs mentioned in the PR.


66-66: Good addition of dial timeout

Setting a default dial timeout of 5 seconds is a good practice to prevent clients from hanging indefinitely when connecting to unavailable endpoints.


103-117: Well-structured TLS certificate parsing function

The parseTLSSecret function is well implemented with proper error checking and clear error messages.

Signed-off-by: Timofei Larkin <[email protected]>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
internal/controller/factory/etcd_client.go (1)

69-99: 🛠️ Refactor suggestion

Missing TLS MinVersion in the configuration

The TLS configuration is missing a MinVersion setting, which could expose the connection to older, less secure TLS protocol versions.

Add the MinVersion setting to the TLS configuration:

 		etcdClient.TLS = &tls.Config{}
+		etcdClient.TLS.MinVersion = tls.VersionTLS13
 		etcdClient.TLS.Certificates = []tls.Certificate{cert}

This ensures that only TLS 1.3 or newer will be used, which offers better security.

🧰 Tools
🪛 ast-grep (0.31.1)

[warning] 81-81: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

🧹 Nitpick comments (1)
internal/controller/factory/etcd_client.go (1)

120-133: Improve error handling in parseTLSSecretCA

The function structure is good, but there's a missing error check after pem.Decode.

 	block, _ := pem.Decode(caPemBytes)
 	if block == nil {
 		err = fmt.Errorf("failed to decode PEM bytes")
 		return
 	}
+	if block.Type != "CERTIFICATE" {
+		err = fmt.Errorf("PEM block is not a certificate (type: %s)", block.Type)
+		return
+	}
 	ca, err = x509.ParseCertificate(block.Bytes)

This additional check ensures the PEM block is actually a certificate before attempting to parse it.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92c9f35 and d76c473.

📒 Files selected for processing (2)
  • internal/controller/etcdcluster_controller.go (3 hunks)
  • internal/controller/factory/etcd_client.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/etcdcluster_controller.go
🧰 Additional context used
🪛 ast-grep (0.31.1)
internal/controller/factory/etcd_client.go

[warning] 81-81: MinVersionis missing from this TLS configuration. By default, TLS 1.2 is currently used as the minimum when acting as a client, and TLS 1.0 when acting as a server. General purpose web applications should default to TLS 1.3 with all other protocols disabled. Only where it is known that a web server must support legacy clients with unsupported an insecure browsers (such as Internet Explorer 10), it may be necessary to enable TLS 1.0 to provide support. AddMinVersion: tls.VersionTLS13' to the TLS configuration to bump the minimum version to TLS 1.3.
Context: tls.Config{}
Note: [CWE-327]: Use of a Broken or Risky Cryptographic Algorithm [OWASP A03:2017]: Sensitive Data Exposure [OWASP A02:2021]: Cryptographic Failures [REFERENCES]
https://owasp.org/Top10/A02_2021-Cryptographic_Failures

(missing-ssl-minversion-go)

⏰ Context from checks skipped due to timeout of 90000ms (7)
  • GitHub Check: test-e2e on k8s previous version
  • GitHub Check: test-e2e on k8s latest version
  • GitHub Check: test-e2e on k8s penultimate version
  • GitHub Check: test on k8s latest version
  • GitHub Check: test on k8s penultimate version
  • GitHub Check: test on k8s previous version
  • GitHub Check: pre-commit
🔇 Additional comments (4)
internal/controller/factory/etcd_client.go (4)

36-37: Correctly handled client close for resource cleanup

Good to see the comment added to explain why the error is ignored. The nolint:errcheck directive is an acceptable approach here since the primary concern is resource cleanup rather than handling the error.

This change helps fix the memory leak issue mentioned in the PR objectives by ensuring the cluster client is closed if any single-endpoint client creation fails.


65-65: Correct URL format with fully qualified domain name

The endpoint URL now properly uses the fully qualified domain name format (name.headless-service.namespace.svc:port) instead of just the hostname. This addresses the connectivity issue mentioned in the PR objectives.


67-67: Added reasonable dial timeout

Adding a 5-second dial timeout is a good practice to prevent hanging connections when etcd nodes are unavailable.


104-118: Well-structured TLS secret parsing function

This helper function correctly extracts the TLS certificate and key from the Kubernetes secret. The error handling is comprehensive, with descriptive error messages.

@lllamnyp lllamnyp self-assigned this Apr 22, 2025
Copy link
Member

@kvaps kvaps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lllamnyp lllamnyp merged commit 4bc99b5 into main Apr 22, 2025
9 checks passed
@lllamnyp lllamnyp deleted the fix/address-and-gc branch April 22, 2025 08:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants