Skip to content

Conversation

zhangkun83
Copy link
Contributor

This is a revised version of #5503 (62b03fd), which was rolled back in f8d0868. The newer version passes SubchannelStateListener to Subchannel.start() instead of SubchannelCreationArgs, which allows us to remove the Subchannel argument from the listener, which works as a solution for #5676.

LoadBalancers that call the old createSubchannel() will get start() implicitly called with a listener that passes updates to the deprecated LoadBalancer.handleSubchannelState(). Those who call the new createSubchannel() will have to call start() explicitly.

This PR has 3 commits:

  1. Roll forward the original change.
  2. Revert GRPCLB back to the old API. See commit message for the reason.
  3. The revision.

Nothing new is introduced in commits 1 and 2. Reviewers should only need to review commit 3.

zhangkun83 added 2 commits May 9, 2019 13:47
It's a pain to migrate the SubchannelPool to the new API.  Since
CachedSubchannelHelper is on the way, it's easier to switch to it when
it's ready.

Keeping GRPCLB with the old API would also confirm the backward
compatibility.
@zhangkun83 zhangkun83 requested a review from ejona86 May 10, 2019 06:36
Also stop passing Subchannel to SubchannelStateListener.  This is part
of the bigger change to remove Subchannel from all arguments on the
LoadBalancer API, so that wrapping of Subchannel won't cause identity
problems.  Details in grpc#5676
@zhangkun83 zhangkun83 force-pushed the subchannelstatelistener2 branch from b3a9b3e to 98710a3 Compare May 10, 2019 07:04
@zhangkun83
Copy link
Contributor Author

zhangkun83 commented May 10, 2019

There is a caveat, that if the parent LB and the child LB are using different versions of APIs, the parent won't be able to intercept Subchannel states, like what HealthCheckingLoadBalancerFactory is doing.

  1. Parent LB is on the old API and the child is on the new: child passes listener to Subchannel.start(), which the parent is not intercepting, because the parent is intercepting LoadBalancer.handleSubchannelState(). Subchannel state updates go to the child listener intact.
  2. Parent LB is on the new API and the child is on the old: child calls the old createSubchannel() which the parent is not intercepting, because the parent is intercepting the new createSubchannel() and returning a wrapped Subchannel that will intercept start(). ChannelImpl converts the old createSubchannel() into a new one followed by an implicit start(), which calls handleSubchannelState() on the parent which is unintercepted. Subchannel state updates go to the child listener intact.
  3. When parent and child use the same version, Subchannel states get correctly intercepted.

I can't think of a way to perfectly solve this. Hopefully and probably nobody other than us is intercepting Subchannel states.

* Starts the Subchannel. Can only be called once.
*
* <p>This method <strong>must be called from the {@link #getSynchronizationContext
* Synchronization Context}</strong>, otherwise it may throw. This is to avoid the race between
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is now a bit out-of-date. For example the race mentioned in "This is to avoid the race ..." is no longer a problem for the caller.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased.

@@ -1164,6 +1148,7 @@ public ChannelLogger getChannelLogger() {

/**
* Asks the Subchannel to create a connection (aka transport), if there isn't an active one.
* Has no effect if {@link #start} has not been called or {@link #shutdown} has been called.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should throw if start has not been called. I'd be willing to have a blanket statement for the class that all methods may throw if called before start. We could look at ClientCall for seeing how we worded it there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized we can't make the sync-context requirement for this method, because pick-first calls it from the data-path. I have sent out #5736

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our discussion offline, we will keep the sync-context requirement, and all methods except shutdown() should be called after start().

@@ -1529,6 +1550,18 @@ public void run() {

@Override
public void requestConnection() {
syncContext.execute(new Runnable() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed? It seems it should be safe to check started here without going onto the syncContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted per our discussion. There is a happens-after relationship between start() and the other methods, thus started don't need to be read in the syncContext.

@@ -274,13 +279,16 @@ void setServiceName(@Nullable String newServiceName) {
adjustHealthCheck();
}

void updateRawState(ConnectivityStateInfo rawState) {
public void onSubchannelState(ConnectivityStateInfo rawState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't HealthCheckState implementing SubchannelStateListener any more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Made HealthCheckState implement SubchannelStateListener again.

@@ -881,6 +883,13 @@ private void maybeTerminateChannel() {
}
}

// Must be called from syncContext
private void handleInternalSubchannelState(ConnectivityStateInfo newState) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to move this method, because you can access the LbHelperImpl field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method doesn't need to be on LbHelperImpl in the first place. What it does is in the scope of the top-level class.

});
}

private void internalShutdown() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is only used in one place. Consider move it to a named inner class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is intended to replace shutdown() when we are ready to turn the warning into an exception. I have added a comment to make that clear.

}

private void internalShutdown() {
syncContext.throwIfNotInThisSynchronizationContext();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's obvious in syncContext already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my response above.

syncContext.execute(new Runnable() {
@Override
public void run() {
listener.onSubchannelState(ConnectivityStateInfo.forNonError(SHUTDOWN));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The javadoc says SHUTDOWN can be safely ignored, do we need call onSubchannelState(SHUTDOWN)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The paragraph where this statement is from is the continuation of the previous paragraph, which talks about the need to create a picker for each update. Although a LoadBalancer usually don't need to react to a SHUTDOWN update, the ChannelImpl still need to propagate it for completeness. And in some cases e.g., HealthCheckingLoadBalancer and CachedSubchannelHelper, the SHUTDOWN update is actually consumed for cleanups. I have rephrased the javadoc to prevent confusion.

@zhangkun83
Copy link
Contributor Author

@ejona86 @dapengzhang0 @voidzcy thanks for the review. PTAL.

zhangkun83 added a commit to zhangkun83/grpc-java that referenced this pull request May 16, 2019
We will require Subchannel.requestConnection() to be called from
sync-context (grpc#5722), but SubchannelPicker.requestConnection() is
currently calling it with the assumption of thread-safety.  Actually
SubchannelPicker.requestConnection() is called already from
sync-context by ChannelImpl, it makes more sense to move this method
to LoadBalancer where all other methods are sync-context'ed, rather than
making SubchannelPicker.requestConnection() sync-context'ed and fragmenting
the SubchannelPicker API because pickSubchannel() is thread-safe.

C++ also has the requestConnection() equivalent on their LoadBalancer
interface.
zhangkun83 added a commit that referenced this pull request May 16, 2019
We will require Subchannel.requestConnection() to be called from
sync-context (#5722), but SubchannelPicker.requestConnection() is
currently calling it with the assumption of thread-safety.  Actually
SubchannelPicker.requestConnection() is called already from
sync-context by ChannelImpl, it makes more sense to move this method
to LoadBalancer where all other methods are sync-context'ed, rather than
making SubchannelPicker.requestConnection() sync-context'ed and fragmenting
the SubchannelPicker API because pickSubchannel() is thread-safe.

C++ also has the requestConnection() equivalent on their LoadBalancer
interface.
@voidzcy voidzcy self-requested a review May 17, 2019 23:03
@zhangkun83 zhangkun83 merged commit 7934594 into grpc:master May 17, 2019
@zhangkun83 zhangkun83 deleted the subchannelstatelistener2 branch May 17, 2019 23:37
@lock lock bot locked as resolved and limited conversation to collaborators Aug 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants