Skip to content

Conversation

VietND96
Copy link
Member

@VietND96 VietND96 commented Apr 2, 2025

User description

Thanks for contributing to Selenium!
A PR well described will help maintainers to quickly review and merge it

Before submitting your PR, please check our contributing guidelines.
Avoid large PRs, help reviewers by making them as simple and short as possible.

Motivation and Context

The javadoc of Guava's CacheBuilder recommends moving to Caffeine. So, besides concrete issues, it makes sense to switch.

https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/package-info.java#L16-L17

More details here: https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L49

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • I have read the contributing document.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

PR Type

Enhancement


Description

  • Migrate from Guava's CacheBuilder to Caffeine for improved performance

  • Add custom cache weigher with memory-aware sizing in GraphqlHandler

  • Implement proper resource cleanup with AutoCloseable interface

  • Update dependency versions (GraphQL, Netty, OpenTelemetry, Log4j)


Diagram Walkthrough

flowchart LR
  A["Guava CacheBuilder"] --> B["Caffeine Cache"]
  B --> C["Custom Weigher"]
  B --> D["Memory-aware Sizing"]
  B --> E["AutoCloseable Support"]
  F["Dependency Updates"] --> G["GraphQL 24.1"]
  F --> H["Netty 4.2.3"]
  F --> I["OpenTelemetry 1.52.0"]
Loading

File Walkthrough

Relevant files
Enhancement
GraphqlHandler.java
Migrate GraphQL handler to Caffeine cache                               

java/src/org/openqa/selenium/grid/graphql/GraphqlHandler.java

  • Replace Guava cache with Caffeine cache implementation
  • Add custom QueryCacheWeigher for memory-aware cache sizing
  • Implement AutoCloseable interface for proper resource cleanup
  • Add query size validation with bypass for oversized queries
+77/-20 
LocalNode.java
Update LocalNode caching to use Caffeine                                 

java/src/org/openqa/selenium/grid/node/local/LocalNode.java

  • Replace Guava CacheBuilder with Caffeine for session caches
  • Update removal listener lambda syntax for Caffeine API
  • Simplify cache invalidation logic in stopAllSessions method
  • Remove ExecutionException handling due to Caffeine's different API
+35/-46 
Dependencies
MODULE.bazel
Update build dependencies and add Caffeine                             

MODULE.bazel

  • Add Caffeine dependency (3.2.2)
  • Update GraphQL Java to version 24.1
  • Update Google Java Format to 1.28.0
  • Update various dependency versions (Netty, OpenTelemetry, Log4j)
+7/-6     
maven_install.json
Maven dependency resolution updates                                           

java/maven_install.json

  • Add Caffeine artifact with checksums and version info
  • Update dependency resolution hashes
  • Add new Netty codec modules and update versions
  • Update OpenTelemetry and other dependency versions
+234/-127
Configuration changes
BUILD.bazel
Add Caffeine to GraphQL build deps                                             

java/src/org/openqa/selenium/grid/graphql/BUILD.bazel

  • Add Caffeine artifact dependency to build configuration
+1/-0     
BUILD.bazel
Add Caffeine to LocalNode build deps                                         

java/src/org/openqa/selenium/grid/node/local/BUILD.bazel

  • Add Caffeine artifact dependency to build configuration
+1/-0     

@VietND96 VietND96 added the B-grid Everything grid and server related label Apr 2, 2025
@VietND96 VietND96 requested a review from joerg1985 April 2, 2025 00:18
@diemol
Copy link
Member

diemol commented Apr 2, 2025

Can you share some context for this? Why do we need this?

@VietND96
Copy link
Member Author

VietND96 commented Apr 2, 2025

This relates to #15370 where we try to see how caffeine could resolve something better than guava.

@ben-manes
Copy link

Hard to tell from the thread as an outside observer, but is that thread’s issue that removal notifications occur sometime after expiration? In guava/caffeine the default is to evict as part of maintenance triggered by other activity, so the explicit cleanUp calls are required for promptness. Caffeine offers a scheduler option for a thread that does this based on the next ttl event. Since the removalListener is called asynchronously, caffeine also offers an evictionListener if you need it as part of the eviction’s atomic map operation.

Neither provides linearizable invalidateAll so in-flight loads would be skipped. Guava’s loads are never linearizable whereas Caffeine’s are except for that method because otherwise we’d need to fork ConcurrentHashMap which suppresses initial loads during a traverse.

but I don’t know if that’s helpful or if I grok the issues

@joerg1985
Copy link
Member

@diemol the javadoc of Guava's CacheBuilder does recommend to move to Caffeine. So beside concrete issues it makes sense for me to switch.

https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/package-info.java#L16-L17

More details here: https://github.com/google/guava/blob/master/guava/src/com/google/common/cache/CacheBuilder.java#L49

@diemol
Copy link
Member

diemol commented Jul 5, 2025

@VietND96, should we proceed with this?

@VietND96
Copy link
Member Author

VietND96 commented Jul 7, 2025

I will resume to check this PR in this release

@VietND96 VietND96 requested a review from joerg1985 July 21, 2025 10:25
@VietND96 VietND96 marked this pull request as ready for review July 24, 2025 01:53
@VietND96 VietND96 requested a review from joerg1985 July 24, 2025 01:53
Copy link
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Memory Calculation

The custom weigher implementation uses hardcoded overhead estimates (200 bytes for CompletableFuture, 500 bytes for PreparsedDocumentEntry) which may not accurately reflect actual memory usage across different JVM implementations and versions. This could lead to ineffective cache sizing.

public int weigh(String key, CompletableFuture<PreparsedDocumentEntry> value) {
  // Estimate memory usage including CompletableFuture overhead
  long keyWeight = key.length() * 2; // UTF-16 encoding
  long futureOverhead = 200; // Estimated CompletableFuture overhead
  long documentOverhead = 500; // Estimated PreparsedDocumentEntry overhead

  long totalWeight = keyWeight + futureOverhead + documentOverhead;

  // Bounds checking to prevent single entries from dominating cache
  return (int) Math.min(totalWeight, maxSingleEntryWeight);
}
Logic Change

The session timeout logic has changed from checking notification.wasEvicted() to only checking cause == RemovalCause.EXPIRED. This removes the broader eviction check and may not handle all timeout scenarios that were previously covered.

if (cause == RemovalCause.EXPIRED) {
  // Session is timing out, stopping it by sending a DELETE
  LOG.log(Level.INFO, () -> String.format("Session id %s timed out, stopping...", id));
  span.setStatus(Status.CANCELLED);
  span.addEvent(String.format("Stopping the the timed session %s", id), attributeMap);
} else {
  LOG.log(Level.INFO, () -> String.format("Session id %s is stopping on demand...", id));
  span.addEvent(String.format("Stopping the session %s on demand", id), attributeMap);
}
if (cause == RemovalCause.EXPIRED) {
Resource Management

The close method cancels pending CompletableFutures with interruption, which could potentially leave GraphQL operations in an inconsistent state. The aggressive cancellation approach may cause issues if operations are in critical sections.

public void close() {
  // Cancel pending CompletableFutures
  cache
      .asMap()
      .values()
      .forEach(
          future -> {
            if (!future.isDone()) {
              future.cancel(true);
            }
          });

  // Invalidate and clean cache
  cache.invalidateAll();
  cache.cleanUp();
}

Copy link
Contributor

qodo-merge-pro bot commented Jul 24, 2025

PR Code Suggestions ✨

Latest suggestions up to 87249f2

CategorySuggestion                                                                                                                                    Impact
General
Improve cache weight calculation accuracy

The weigher implementation has a potential issue where it doesn't account for
the actual PreparsedDocumentEntry content size, only estimates overhead. This
could lead to inaccurate cache weight calculations and suboptimal eviction
behavior.

java/src/org/openqa/selenium/grid/graphql/GraphqlHandler.java [260-272]

 @Override
 public int weigh(String key, CompletableFuture<PreparsedDocumentEntry> value) {
-  // Estimate memory usage including CompletableFuture overhead
+  // Base weight calculation
   long keyWeight = key.length() * 2; // UTF-16 encoding
   long futureOverhead = 200; // Estimated CompletableFuture overhead
-  long documentOverhead = 500; // Estimated PreparsedDocumentEntry overhead
+  
+  // Try to get actual document size if available, otherwise use estimate
+  long documentWeight = 500; // Default estimate
+  if (value.isDone() && !value.isCompletedExceptionally()) {
+    try {
+      PreparsedDocumentEntry entry = value.getNow(null);
+      if (entry != null && entry.getDocument() != null) {
+        // Rough estimate based on document string representation
+        documentWeight = entry.getDocument().toString().length() * 2 + 500;
+      }
+    } catch (Exception ignored) {
+      // Fall back to estimate if we can't get actual size
+    }
+  }
 
-  long totalWeight = keyWeight + futureOverhead + documentOverhead;
-
-  // Bounds checking to prevent single entries from dominating cache and integer overflow
+  long totalWeight = keyWeight + futureOverhead + documentWeight;
   long boundedWeight = Math.min(totalWeight, maxSingleEntryWeight);
   return (int) Math.min(boundedWeight, Integer.MAX_VALUE);
 }
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the cache weigher uses a fixed estimate and proposes a valid improvement to calculate a more accurate weight from the completed future, which would improve cache eviction behavior.

Medium
  • Update

Previous suggestions

✅ Suggestions up to commit ea55a8a
CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent integer overflow in weigher
Suggestion Impact:The suggestion was directly implemented in the commit. The code was modified to add overflow protection by introducing a boundedWeight variable and using Math.min with Integer.MAX_VALUE before casting to int, exactly as suggested.

code diff:

-      // Bounds checking to prevent single entries from dominating cache
-      return (int) Math.min(totalWeight, maxSingleEntryWeight);
+      // Bounds checking to prevent single entries from dominating cache and integer overflow
+      long boundedWeight = Math.min(totalWeight, maxSingleEntryWeight);
+      return (int) Math.min(boundedWeight, Integer.MAX_VALUE);

The weigher implementation has potential integer overflow issues when casting
long to int. Large cache weights could overflow and return negative values,
causing unpredictable cache behavior. Add overflow protection by checking if the
value exceeds Integer.MAX_VALUE before casting.

java/src/org/openqa/selenium/grid/graphql/GraphqlHandler.java [260-270]

 public int weigh(String key, CompletableFuture<PreparsedDocumentEntry> value) {
   // Estimate memory usage including CompletableFuture overhead
   long keyWeight = key.length() * 2; // UTF-16 encoding
   long futureOverhead = 200; // Estimated CompletableFuture overhead
   long documentOverhead = 500; // Estimated PreparsedDocumentEntry overhead
 
   long totalWeight = keyWeight + futureOverhead + documentOverhead;
 
   // Bounds checking to prevent single entries from dominating cache
-  return (int) Math.min(totalWeight, maxSingleEntryWeight);
+  long boundedWeight = Math.min(totalWeight, maxSingleEntryWeight);
+  
+  // Prevent integer overflow when casting to int
+  return (int) Math.min(boundedWeight, Integer.MAX_VALUE);
 }

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential integer overflow when casting a long to an int, which could occur on systems with very large heap sizes, and provides a robust fix.

Medium
General
Use explicit charset for bytes
Suggestion Impact:The suggestion was directly implemented. The code was changed from query.getBytes().length to query.getBytes(StandardCharsets.UTF_8).length, and the necessary import for StandardCharsets was added.

code diff:

+import java.nio.charset.StandardCharsets;
 import java.time.Duration;
 import java.util.HashMap;
 import java.util.Map;
@@ -114,7 +115,7 @@
                   String query = executionInput.getQuery();
 
                   // Query size validation with bypass for oversized queries
-                  if (query.getBytes().length > MAX_QUERY_SIZE_BYTES) {
+                  if (query.getBytes(StandardCharsets.UTF_8).length > MAX_QUERY_SIZE_BYTES) {

Using getBytes() without specifying charset can lead to platform-dependent
behavior and incorrect size calculations. Since GraphQL queries are typically
UTF-8 encoded, explicitly specify the charset to ensure consistent byte length
calculation across different platforms.

java/src/org/openqa/selenium/grid/graphql/GraphqlHandler.java [116-121]

 // Query size validation with bypass for oversized queries
-if (query.getBytes().length > MAX_QUERY_SIZE_BYTES) {
+if (query.getBytes(StandardCharsets.UTF_8).length > MAX_QUERY_SIZE_BYTES) {
   // Bypass cache for oversized queries to prevent cache pollution
   return CompletableFuture.supplyAsync(
       () -> computeFunction.apply(executionInput));
 }

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that query.getBytes() uses the platform's default charset, which can lead to inconsistent behavior; explicitly using StandardCharsets.UTF_8 ensures correctness and portability.

Medium

@VietND96 VietND96 changed the title [grid] Migrate cache builder to use Caffeine [grid] Migrate from Guava's CacheBuilder to Caffeine Jul 24, 2025
@VietND96 VietND96 requested a review from diemol July 24, 2025 07:15
Copy link
Member

@joerg1985 joerg1985 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @VietND96

@VietND96 VietND96 merged commit 08507ee into trunk Jul 26, 2025
32 of 33 checks passed
@VietND96 VietND96 deleted the migrate-caffeine branch July 26, 2025 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-grid Everything grid and server related Review effort 4/5
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants