[grid] Restructuring classes have stateful data and improve Node health checks in LocalDistributor #16151

VietND96 · 2025-08-09T17:45:09Z

User description

🔗 Related Issues

Part of SeleniumHQ/docker-selenium#1849
Restructuring classes to get ready for the next step in the implementation of the external datastore support

💥 What does this PR do?

No new functions added or removed, just restructuring.

Separation of Complex Logic

Breaking down functions in a single LocalDistributor class
- Delegating node management logic to LocalNodeRegistry
- Session distribution logic remains under LocalDistributor
- GridModel is updated to LocalGridModel

Clean Separation of Concerns
By restructuring these classes to bring all stateful functions under distributor/local, try to create a clear distinction between:

Interface definitions that define behavior (GridModel, NodeRegistry)
Local implementations that store state in-memory (LocalGridModel, LocalNodeRegistry, LocalDistributor).

Get ready for external data store support

Implementation of new packages (e.g., distributor/redis or distributor/jdbc) without modifying core components
Common interface definitions shared across all implementations
Plug-and-play storage backends through consistent interfaces

Get ready for high availability support
Once the external datastore can be implemented, it would enable multiple distributor replicas to run simultaneously:

Externalize all state to shared storage systems (Redis, JDBC, etc.)
Implement distributed locking for coordinated operations between replicas

Architectural Improvements

Dependency Inversion: Higher-level modules depend on abstractions, not concrete implementations
Extension Points: Clear extension points for new storage backends
Testability: Easier mocking of components for testing
Scalability: Path to horizontal scaling of distributor component

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

Cleanup (formatting, renaming)
Bug fix (backwards compatible)
New feature (non-breaking change which adds functionality and tests!)
Breaking change (fix or feature that would cause existing functionality to change)

PR Type

Enhancement

Description

Restructure LocalDistributor to separate node management from session distribution
Extract node registry logic into LocalNodeRegistry interface
Convert GridModel to abstract class with LocalGridModel implementation
Prepare architecture for external datastore support

Diagram Walkthrough

flowchart LR
  A["LocalDistributor"] --> B["NodeRegistry Interface"]
  B --> C["LocalNodeRegistry"]
  C --> D["LocalGridModel"]
  A --> E["Session Distribution Logic"]
  F["GridModel Abstract"] --> D

File Walkthrough

Relevant files

Enhancement

GridModel.java `Convert GridModel to abstract interface` java/src/org/openqa/selenium/grid/distributor/GridModel.java Convert concrete class to abstract class Define abstract methods for node management operations Remove all implementation details and state management Add comprehensive method documentation	+70/-489
NodeRegistry.java `Add NodeRegistry interface definition` java/src/org/openqa/selenium/grid/distributor/NodeRegistry.java Create new interface for node registry operations Define methods for node lifecycle management Include health checking and availability management Add session and slot management methods	+148/-0
LocalDistributor.java `Delegate node management to NodeRegistry` java/src/org/openqa/selenium/grid/distributor/local/LocalDistributor.java Replace direct node management with NodeRegistry delegation Remove GridModel instantiation and node storage Simplify session creation and slot reservation logic Clean up health check and node lifecycle code	+38/-293
LocalGridModel.java `Implement LocalGridModel with original logic` java/src/org/openqa/selenium/grid/distributor/local/LocalGridModel.java Move GridModel implementation from abstract to concrete class Implement all abstract methods with original logic Maintain thread-safe node state management Handle session and slot operations	+544/-0
LocalNodeRegistry.java `Implement LocalNodeRegistry with node management` java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java Implement NodeRegistry interface with local storage Manage node lifecycle and health checking Handle event bus integration and scheduling Provide thread-safe node operations	+536/-0

qodo-merge-pro · 2025-08-09T17:45:50Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ❌ 1849 - Not compliant Non-compliant requirements: Return a clear, catchable error when proxy server refuses connections Provide a programmatic detection mechanism for proxy refusal without DOM scraping Python-specific handling or cross-language consistency
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review Possible Issue The method `getNodeFromURI` now relies on `nodeRegistry.getAvailableNodes()` which filters out DOWN/DRAINING nodes. If a session is created and the node transitions to a non-available state quickly, looking up by URI may fail to find the node for cleanup, leading to leaked sessions. protected Node getNodeFromURI(URI uri) { Lock readLock = this.lock.readLock(); readLock.lock(); try { Set<NodeStatus> nodes = nodeRegistry.getAvailableNodes(); Optional<NodeStatus> nodeStatus = nodes.stream().filter(node -> node.getExternalUri().equals(uri)).findFirst(); return nodeStatus.map(status -> nodeRegistry.getNode(status.getNodeId())).orElse(null); } finally { Concurrency Health checks are scheduled both via a global fixed-rate task and additionally per-node scheduling stored in `scheduledHealthChecks`. Ensure no duplicate execution or race with `runHealthChecks()` that iterates `allChecks` concurrently; verify visibility and cancellation semantics to avoid double-running checks. nodes.put(node.getId(), node); model.add(initialNodeStatus); ScheduledFuture<?> future = nodeHealthCheckService.scheduleAtFixedRate( GuardedRunnable.guard(healthCheck), healthcheckInterval.toMillis(), healthcheckInterval.toMillis(), TimeUnit.MILLISECONDS); allChecks.put(node.getId(), healthCheck); scheduledHealthChecks.put(node.getId(), future); } finally { Behavior Change `isNotSupported` no longer checks node availability explicitly. It depends on `getAvailableNodes()` to filter. Confirm that `getAvailableNodes()` semantics match previous UP-only behavior in all cases to avoid offering capacity from nodes that are not actually available. private boolean isNotSupported(Capabilities caps) { return getAvailableNodes().stream().noneMatch(node -> node.hasCapability(caps, slotMatcher)); }

qodo-merge-pro · 2025-08-09T17:46:57Z

PR Code Suggestions ✨

Latest suggestions up to f604877

Category	Suggestion	Impact
Incremental ^[*]	✅ ~~Process all health check batches~~ Suggestion Impact: The commit replaced the limited parallelStream().limit(maxConcurrentBatches) approach with a custom executor-based processing that submits all batches, ensuring every batch is processed while controlling concurrency. This addresses the issue of batches being dropped. code diff: @@ -402,6 +412,42 @@ } catch (RuntimeException e) { return false; } + } + + private void processBatchesInParallel(List<List<Runnable>> batches) { + if (batches.isEmpty()) { + return; + } + + // Process all batches with controlled parallelism + batches.forEach( + batch -> + nodeHealthCheckExecutor.submit( + () -> + batch.parallelStream() + .forEach( + r -> { + try { + r.run(); + } catch (Throwable t) { + LOG.log( + getDebugLogLevel(), + "Health check execution failed in batch", + t); + } + }))); + } + + private static List<List<Runnable>> partition(List<Runnable> list, int size) { + List<List<Runnable>> batches = new ArrayList<>(); + if (list.isEmpty() \|\| size <= 0) { + return batches; + } + for (int i = 0; i < list.size(); i += size) { + int end = Math.min(i + size, list.size()); + batches.add(new ArrayList<>(list.subList(i, end))); + } + return batches; } Ensure all batches are processed by removing the `limit(maxConcurrentBatches)` which drops excess batches, or implement a proper throttling mechanism that still visits every batch java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [402-422] private void processBatchesInParallel(List<List<Runnable>> batches, int maxConcurrentBatches) { if (batches.isEmpty()) { return; } - // Process batches with controlled parallelism batches.parallelStream() - .limit(maxConcurrentBatches) .forEach( batch -> batch.parallelStream() .forEach( r -> { try { r.run(); } catch (Throwable t) { - LOG.log( - getDebugLogLevel(), "Health check execution failed in batch", t); + LOG.log(getDebugLogLevel(), "Health check execution failed in batch", t); } })); } `[Suggestion processed]` Suggestion importance[1-10]: 9 __ Why: This suggestion identifies a critical bug where `limit(maxConcurrentBatches)` would cause health checks for some nodes to be skipped entirely, and the proposed fix correctly ensures all batches are processed.	High
Learned best practice	✅ ~~Simplify heartbeat timeout math~~ Suggestion Impact: The commit implemented the inverse change of the suggestion: it replaced the simplified single multiplication by half the multiplier with the chained multipliedBy(...).dividedBy(2) form. This shows the code area and logic were directly affected, albeit in the opposite direction of the suggestion. code diff: Instant lostTime = - lastTouched.plus(node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER / 2)); + lastTouched.plus( + node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER).dividedBy(2)); Use consistent and clearer time calculations; avoid chained multiply/divide on Duration which can be error-prone. Replace with a single multiplication by half the multiplier to match intent and readability. java/src/org/openqa/selenium/grid/distributor/local/LocalGridModel.java [245-247] Instant lostTime = - lastTouched.plus( - node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER).dividedBy(2)); + lastTouched.plus(node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER / 2)); `[Suggestion processed]` Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Fix syntax errors, typos, and naming inconsistencies to maintain code quality.	Low
Learned best practice	Guard updates for unknown nodes Validate that the node exists before updating its availability to avoid operating on unknown nodes. Add a guard that checks `nodes.containsKey(id)` and log/return early if not found. java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [274-289] public void updateNodeAvailability(URI nodeUri, NodeId id, Availability availability) { Require.nonNull("Node URI", nodeUri); Require.nonNull("Node ID", id); Require.nonNull("Availability", availability); Lock writeLock = lock.writeLock(); writeLock.lock(); try { + if (!nodes.containsKey(id)) { + LOG.log(Level.FINE, String.format("Ignoring availability update for unknown node %s", id)); + return; + } LOG.log( getDebugLogLevel(), String.format("Health check result for %s was %s", nodeUri, availability)); model.setAvailability(id, availability); model.updateHealthCheckCount(id, availability); } finally { writeLock.unlock(); } } Apply / Chat Suggestion importance[1-10]: 5 __ Why: Relevant best practice - Add null checks and validate parameters before use to prevent NullPointerException and similar runtime errors.	Low
Update

Previous suggestions

✅ Suggestions up to commit 7ef687a

Category	Suggestion	Impact
Incremental ^[*]	✅ ~~Cancel tasks and free resources~~ Suggestion Impact: The commit modified close() to acquire the write lock, clear maps, and close RemoteNodes under the lock. While it did not cancel ScheduledFutures (it removed the scheduling and the scheduledHealthChecks map entirely), it addressed the resource/leak concern by clearing allChecks and nodes and properly closing RemoteNodes during shutdown. code diff: @@ -535,5 +571,25 @@ @Override public void close() { LOG.info("Shutting down LocalNodeRegistry"); + Lock writeLock = lock.writeLock(); + writeLock.lock(); + try { + allChecks.clear(); + nodes + .values() + .forEach( + n -> { + if (n instanceof RemoteNode) { + try { + ((RemoteNode) n).close(); + } catch (Exception e) { + LOG.log(Level.WARNING, "Unable to close node properly: " + e.getMessage()); + } + } + }); + nodes.clear(); + } finally { + writeLock.unlock(); + } } Ensure scheduled tasks are cancelled and resources are released on close; cancel per-node futures and clear maps to prevent thread and memory leaks java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [535-539] @Override public void close() { LOG.info("Shutting down LocalNodeRegistry"); + Lock writeLock = lock.writeLock(); + writeLock.lock(); + try { + scheduledHealthChecks.values().forEach(f -> { + if (f != null) { + f.cancel(false); + } + }); + scheduledHealthChecks.clear(); + allChecks.clear(); + nodes.values().forEach(n -> { + if (n instanceof RemoteNode) { + try { + ((RemoteNode) n).close(); + } catch (Exception e) { + LOG.log(Level.WARNING, "Unable to close node properly: " + e.getMessage()); + } + } + }); + nodes.clear(); + } finally { + writeLock.unlock(); + } } `[Suggestion processed]` Suggestion importance[1-10]: 9 __ Why: The suggestion correctly points out a critical resource leak where scheduled health checks are not cancelled on close, which would prevent proper shutdown and cause memory leaks.	High
Incremental ^[*]	Eliminate duplicate healthcheck scheduling Prevent running both global periodic health checks and per-node scheduled checks simultaneously; choose one mechanism to avoid double healthcheck execution causing noisy state flaps java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [132-136] -this.nodeHealthCheckService.scheduleAtFixedRate( - GuardedRunnable.guard(this::runHealthChecks), - healthcheckInterval.toMillis(), - healthcheckInterval.toMillis(), - TimeUnit.MILLISECONDS); +// Use only per-node scheduled health checks; remove global scheduler +// this.nodeHealthCheckService.scheduleAtFixedRate( +// GuardedRunnable.guard(this::runHealthChecks), +// healthcheckInterval.toMillis(), +// healthcheckInterval.toMillis(), +// TimeUnit.MILLISECONDS); Suggestion importance[1-10]: 8 __ Why: This suggestion correctly identifies that health checks are scheduled twice, which could lead to race conditions and performance issues, and proposes a valid fix.	Medium
Learned best practice	Add parameter null check Validate that the provided `id` is not null before accessing the map to avoid potential NullPointerException. Add a guard clause using `Require.nonNull` or an explicit null check. java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [367-369] public Node getNode(NodeId id) { + Require.nonNull("Node ID", id); return nodes.get(id); } Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Add null checks and validate parameters before use to prevent NullPointerException.	Low
General	Clarify purge timeout calculations Avoid the chained arithmetic for half the timeout; it is easy to misread and risks integer rounding surprises if the multiplier changes. Compute the half-timeout explicitly to preserve intent and clarity. This also prevents accidental overflow if the multiplier is adjusted. java/src/org/openqa/selenium/grid/distributor/local/LocalGridModel.java [245-249] -Instant lostTime = - lastTouched.plus( - node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER).dividedBy(2)); -Instant deadTime = - lastTouched.plus(node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER)); +Duration heartbeat = node.getHeartbeatPeriod(); +Duration halfTimeout = heartbeat.multipliedBy(PURGE_TIMEOUT_MULTIPLIER).dividedBy(2); +Duration fullTimeout = heartbeat.multipliedBy(PURGE_TIMEOUT_MULTIPLIER); +Instant lostTime = lastTouched.plus(halfTimeout); +Instant deadTime = lastTouched.plus(fullTimeout); Suggestion importance[1-10]: 4 __ Why: The suggestion correctly proposes extracting complex `Duration` calculations into named variables, which improves code readability and maintainability.	Low

✅ Suggestions up to commit c7aeee4

Category	Suggestion	Impact
High-level	Duplicated health-check scheduling Node health checks are scheduled both inside LocalNodeRegistry (per-node ScheduledFuture) and also via a fixed-rate executor created in LocalDistributor previously; this restructuring removed the latter but retained periodic scheduling plus a global runHealthChecks, leading to potential double execution if extended later or misused. Consolidate to a single mechanism (either per-node ScheduledFuture or a single periodic sweep) to avoid duplicate checks, race conditions, and resource leaks across different registry implementations. Examples: java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [132-136] this.nodeHealthCheckService.scheduleAtFixedRate( GuardedRunnable.guard(this::runHealthChecks), healthcheckInterval.toMillis(), healthcheckInterval.toMillis(), TimeUnit.MILLISECONDS); java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [208-215] ScheduledFuture<?> future = nodeHealthCheckService.scheduleAtFixedRate( GuardedRunnable.guard(healthCheck), healthcheckInterval.toMillis(), healthcheckInterval.toMillis(), TimeUnit.MILLISECONDS); allChecks.put(node.getId(), healthCheck); scheduledHealthChecks.put(node.getId(), future); Solution Walkthrough: Before: class LocalNodeRegistry { LocalNodeRegistry(...) { // A global sweep of all health checks is scheduled this.nodeHealthCheckService.scheduleAtFixedRate( GuardedRunnable.guard(this::runHealthChecks), ...); } public void add(Node node) { // ... // An individual health check is also scheduled for each new node ScheduledFuture<?> future = nodeHealthCheckService.scheduleAtFixedRate( GuardedRunnable.guard(healthCheck), ...); scheduledHealthChecks.put(node.getId(), future); } } After: class LocalNodeRegistry { LocalNodeRegistry(...) { // Keep the global sweep of all health checks this.nodeHealthCheckService.scheduleAtFixedRate( GuardedRunnable.guard(this::runHealthChecks), ...); } public void add(Node node) { // ... // Remove the duplicated, per-node scheduling allChecks.put(node.getId(), healthCheck); // The global sweep will pick up the new check from `allChecks` } } Suggestion importance[1-10]: 9 __ Why: The suggestion correctly identifies a significant design flaw in `LocalNodeRegistry` where health checks are scheduled twice, leading to redundant execution, increased load, and potential race conditions.	High
Possible issue	✅ ~~Use concurrent maps for health checks~~ Suggestion Impact: The commit uses ConcurrentHashMap for allChecks (line 15 shows it defined as ConcurrentHashMap). Additionally, it removes the scheduled health checks map entirely and related scheduling/cancellation logic, which eliminates the thread-safety concern for that map. Thus, the suggestion to use concurrent maps impacted the commit: one map was made concurrent and the other was removed. code diff: @@ -83,7 +83,6 @@ private final GridModel model; private final Map<NodeId, Node> nodes; private final Map<NodeId, Runnable> allChecks = new ConcurrentHashMap<>(); - private final Map<NodeId, ScheduledFuture<?>> scheduledHealthChecks = new ConcurrentHashMap<>(); private final ReadWriteLock lock = new ReentrantReadWriteLock(/* fair */ true); private final ScheduledExecutorService nodeHealthCheckService; private final Duration purgeNodesInterval; @@ -205,14 +204,7 @@ try { nodes.put(node.getId(), node); model.add(initialNodeStatus); - ScheduledFuture<?> future = - nodeHealthCheckService.scheduleAtFixedRate( - GuardedRunnable.guard(healthCheck), - healthcheckInterval.toMillis(), - healthcheckInterval.toMillis(), - TimeUnit.MILLISECONDS); allChecks.put(node.getId(), healthCheck); - scheduledHealthChecks.put(node.getId(), future); Access to `allChecks` and `scheduledHealthChecks` occurs under locks but the maps themselves are non-concurrent. Replace them with concurrent maps to avoid accidental unsynchronized access paths and reduce risk of race conditions during listener callbacks. This is critical since listeners and scheduled tasks can run concurrently. java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [85-86] -private final Map<NodeId, Runnable> allChecks = new HashMap<>(); -private final Map<NodeId, ScheduledFuture<?>> scheduledHealthChecks = new HashMap<>(); +private final Map<NodeId, Runnable> allChecks = new ConcurrentHashMap<>(); +private final Map<NodeId, ScheduledFuture<?>> scheduledHealthChecks = new ConcurrentHashMap<>(); Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies that using `ConcurrentHashMap` for `allChecks` and `scheduledHealthChecks` improves robustness and maintainability in a highly concurrent class, reducing the risk of future errors.	Low
Possible issue	✅ ~~Fix integer division in timing~~ Suggestion Impact: The commit updated the lostTime calculation to multiply by PURGE_TIMEOUT_MULTIPLIER before dividing by 2, preventing integer truncation. The deadTime line remained unchanged in the patch hunk, but the key issue (lostTime integer division) was addressed. code diff: Instant lostTime = - lastTouched.plus(node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER / 2)); + lastTouched.plus( + node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER).dividedBy(2)); The division `PURGE_TIMEOUT_MULTIPLIER / 2` performs integer division; for odd values it truncates and may degrade timing logic. Compute using long arithmetic to preserve intended scaling, e.g., multiply first or use exact constants. java/src/org/openqa/selenium/grid/distributor/local/LocalGridModel.java [245-248] Instant lostTime = - lastTouched.plus(node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER / 2)); + lastTouched.plus(node.getHeartbeatPeriod().multipliedBy((long) PURGE_TIMEOUT_MULTIPLIER).dividedBy(2)); Instant deadTime = - lastTouched.plus(node.getHeartbeatPeriod().multipliedBy(PURGE_TIMEOUT_MULTIPLIER)); + lastTouched.plus(node.getHeartbeatPeriod().multipliedBy((long) PURGE_TIMEOUT_MULTIPLIER)); `[Suggestion processed]` Suggestion importance[1-10]: 5 __ Why: The suggestion correctly points out that integer division on `PURGE_TIMEOUT_MULTIPLIER` could lead to precision loss, and using long arithmetic or floating-point division would make the timing calculation more robust.	Low
Learned best practice	✅ ~~Cancel tasks and close nodes~~ Suggestion Impact: The close() method was updated to acquire the write lock, clear checks, and close RemoteNode instances. However, cancellation of ScheduledFutures was not implemented; in fact, the code removed tracking of ScheduledFutures entirely. So the commit partially implemented the shutdown/cleanup aspect (closing nodes and clearing checks) but did not cancel scheduled tasks as suggested. code diff: @@ -532,5 +571,25 @@ @Override public void close() { LOG.info("Shutting down LocalNodeRegistry"); + Lock writeLock = lock.writeLock(); + writeLock.lock(); + try { + allChecks.clear(); + nodes + .values() + .forEach( + n -> { + if (n instanceof RemoteNode) { + try { + ((RemoteNode) n).close(); + } catch (Exception e) { + LOG.log(Level.WARNING, "Unable to close node properly: " + e.getMessage()); + } + } + }); + nodes.clear(); + } finally { + writeLock.unlock(); + } } Ensure scheduled tasks are cancelled and nodes are closed on shutdown. Cancel futures in `scheduledHealthChecks`, clear listeners if needed, and close any `RemoteNode` instances to prevent resource leaks. java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [532-535] @Override public void close() { LOG.info("Shutting down LocalNodeRegistry"); + Lock writeLock = lock.writeLock(); + writeLock.lock(); + try { + // Cancel scheduled health checks + for (ScheduledFuture<?> future : scheduledHealthChecks.values()) { + if (future != null) { + future.cancel(false); + } + } + scheduledHealthChecks.clear(); + allChecks.clear(); + // Close remote nodes + for (Node node : nodes.values()) { + if (node instanceof RemoteNode) { + try { + ((RemoteNode) node).close(); + } catch (Exception e) { + LOG.log(Level.WARNING, "Unable to close node properly: " + e.getMessage()); + } + } + } + nodes.clear(); + } finally { + writeLock.unlock(); + } } Suggestion importance[1-10]: 6 __ Why: Relevant best practice - Properly dispose and clean up scheduled tasks and resources to prevent leaks.	Low
Learned best practice	✅ ~~Add parameter null validation~~ Suggestion Impact: The commit added Require.nonNull checks for Node URI, Node ID, and Availability at the start of updateNodeAvailability, matching the suggestion. code diff: + Require.nonNull("Node URI", nodeUri); + Require.nonNull("Node ID", id); + Require.nonNull("Availability", availability); Validate inputs before using them. Add null checks for `nodeUri`, `id`, and `availability` to avoid potential NullPointerExceptions during logging and model updates. java/src/org/openqa/selenium/grid/distributor/local/LocalNodeRegistry.java [292-304] public void updateNodeAvailability(URI nodeUri, NodeId id, Availability availability) { + Require.nonNull("Node URI", nodeUri); + Require.nonNull("Node ID", id); + Require.nonNull("Availability", availability); Lock writeLock = lock.writeLock(); writeLock.lock(); try { LOG.log( getDebugLogLevel(), String.format("Health check result for %s was %s", nodeUri, availability)); model.setAvailability(id, availability); model.updateHealthCheckCount(id, availability); } finally { writeLock.unlock(); } } `[Suggestion processed]` Suggestion importance[1-10]: 5 __ Why: Relevant best practice - Add null checks and validation for parameters before use to prevent NullPointerException.	Low

Signed-off-by: Viet Nguyen Duc <[email protected]>

diemol

Overall, it makes sense to me. Thank you for this.

Please have a look at the AI reviewer suggestions and see if they make sense or not.

Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 · 2025-08-11T11:55:02Z

@diemol on AI review points that I fixed

Use concurrent maps for health checks
Add parameter null validation
Fix integer division in timing

On point "Duplicated health-check scheduling" - I am thinking another improvement on other PR.
Assume the Distributor manages 1000 Nodes, health checks performed on those Nodes would consume high resources. I intend to improve health checks as below strategy

1st sort out and priority list of Nodes which have at least 1 available slot (catch up latest health status of those would help process to distribute new request to available Nodes ASAP and reliably). The rest of the Nodes (most are full slots, or DOWN) will check the following.
Apply Atomic Batch Processing for efficient resource allocation and faster.

Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 requested review from shs96c and diemol August 9, 2025 17:45

selenium-ci added B-grid Everything grid and server related C-java Java Bindings labels Aug 9, 2025

qodo-merge-pro bot added the Review effort 4/5 label Aug 9, 2025

[grid] Restructuring classes have stateful data in LocalDistributor

c7e2984

Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 force-pushed the local-distributor branch from c7aeee4 to c7e2984 Compare August 9, 2025 19:11

Merge branch 'trunk' into local-distributor

579723d

diemol approved these changes Aug 11, 2025

View reviewed changes

Fix review comment

7ef687a

Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 added 3 commits August 11, 2025 20:13

Fix review comment

ef07920

Signed-off-by: Viet Nguyen Duc <[email protected]>

Merge remote-tracking branch 'origin/trunk' into local-distributor

b27e15c

Improve Node health checks by batch runs

bf1ab2c

Signed-off-by: Viet Nguyen Duc <[email protected]>

VietND96 force-pushed the local-distributor branch from f604877 to bf1ab2c Compare August 11, 2025 14:57

VietND96 changed the title ~~[grid] Restructuring classes have stateful data in LocalDistributor~~ [grid] Restructuring classes have stateful data and improve Node health checks in LocalDistributor Aug 11, 2025

VietND96 merged commit 2dad5fa into trunk Aug 11, 2025
31 checks passed

VietND96 deleted the local-distributor branch August 11, 2025 15:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

[grid] Restructuring classes have stateful data and improve Node health checks in LocalDistributor #16151

[grid] Restructuring classes have stateful data and improve Node health checks in LocalDistributor #16151

Uh oh!

VietND96 commented Aug 9, 2025 •

edited by qodo-merge-pro bot

Loading

Uh oh!

qodo-merge-pro bot commented Aug 9, 2025

Uh oh!

qodo-merge-pro bot commented Aug 9, 2025 •

edited

Loading

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

diemol left a comment

Uh oh!

VietND96 commented Aug 11, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[grid] Restructuring classes have stateful data and improve Node health checks in LocalDistributor #16151

[grid] Restructuring classes have stateful data and improve Node health checks in LocalDistributor #16151

Uh oh!

Conversation

VietND96 commented Aug 9, 2025 • edited by qodo-merge-pro bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

🔗 Related Issues

💥 What does this PR do?

🔧 Implementation Notes

💡 Additional Considerations

🔄 Types of changes

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-merge-pro bot commented Aug 9, 2025

PR Reviewer Guide 🔍

Uh oh!

qodo-merge-pro bot commented Aug 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

diemol left a comment

Choose a reason for hiding this comment

Uh oh!

VietND96 commented Aug 11, 2025

Uh oh!

Uh oh!

Uh oh!

VietND96 commented Aug 9, 2025 •

edited by qodo-merge-pro bot

Loading

qodo-merge-pro bot commented Aug 9, 2025 •

edited

Loading