Skip to content

Conversation

@tatarintsevsv
Copy link

  • Ignore the absence of the FI_COMPLETION flag if CQ configured without FI_SELECTIVE_COMPLETION flag.
  • Error completions must generated for all operations, including those for which a completion was not requested

ofi_ep_cntr_inc(&u_ep->ep, CNTR_RD);

if (flags & FI_COMPLETION)
if (flags & FI_COMPLETION || u_ep->ep.tx_op_flags & FI_COMPLETION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tx_op_flags is already considered when ucx_proc_rma_msg is called.


if (ucx_req->completion.flags & FI_COMPLETION)
if (ucx_req->ep->ep.tx_op_flags & FI_COMPLETION ||
ucx_req->completion.flags & FI_COMPLETION)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Should use !(u_ep->ep.tx_op_flags & FI_SELECTIVE_COMPLETION) to check for the case that selective completion is not asked for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just be able to check the completion.flags which are set when initializing the request. The op flags are set to force include FI_COMPLETION when FI_SELECTIVE_COMPLETION is not requested so it should be ok. See here

I think the missing thing in the current code is the OR'ing in of the msg flags on the *msg APIs to make sure FI_COMPLETION is set in the req->completion.flags for the non FI_SELECTIVE_COMPLETION case.

@tatarintsevsv Can you explain the selective completion path that is incorrectly writing a completion when it shouldn't be? I don't immediately see it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should just be able to check the completion.flags which are set when initializing the request. The op flags are set to force include FI_COMPLETION when FI_SELECTIVE_COMPLETION is not requested so it should be ok. See here

I think the missing thing in the current code is the OR'ing in of the msg flags on the *msg APIs to make sure FI_COMPLETION is set in the req->completion.flags for the non FI_SELECTIVE_COMPLETION case.

@tatarintsevsv Can you explain the selective completion path that is incorrectly writing a completion when it shouldn't be? I don't immediately see it.

According to manual, in this this case cq records must be generated

fi_ep_bind(ep, &cq->fid, FI_TRANSMIT|FI_RECV);
fi_writemsg(ep, &msg, 0);

but it's so only if we pass FI_COMPLETION to fi_writemsg flags.

As far as I understand, the correct usage of FI_COMPLETION / FI_SELECTIVE_COMPLETION is

fi_ep_bind(ep, &cq->fid, FI_TRANSMIT|FI_RECV|FI_SELECTIVE_COMPLETION);
fi_writemsg(ep, &msg, FI_COMPLETION); // completion must be generated
fi_writemsg(ep, &msg, 0); // completion doesn't matter, will not be generated

and

fi_ep_bind(ep, &cq->fid, FI_TRANSMIT|FI_RECV);
fi_writemsg(ep, &msg, FI_COMPLETION); // completion must be generated
fi_writemsg(ep, &msg, 0); // completion must be generated anyway

Copy link
Contributor

@shijin-aws shijin-aws Oct 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

U just need to use flags | util_ep->tx_msg_flags as the internal flags when you implement fi_*msg. tx_msg_flags will be 0 when FI_SELECTIVE_COMPLETION is used when ep bind, it will be FI_COMPLETION when there is no FI_SELECTIVE_COMPLETION

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you check whether that internal flags have FI_COMPLETION to determine whether you want to generate completions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as it pointed by @j-xiong this is not correct - we must adds flags but not override.
fixed with only FI_COMPLETION is taken into account

@tatarintsevsv
Copy link
Author

After looking more closely, it seems this error only occurs with ucx_writemsg and ucx_readmsg. I'll fix the PR.

@tatarintsevsv tatarintsevsv force-pushed the ucx_selective_completion branch from 9eb350d to 8d6b373 Compare October 16, 2025 04:03
@tatarintsevsv
Copy link
Author

can anyone review last changes?

{
return ucx_proc_rma_msg(ep, msg, flags, UCX_DO_WRITE);
struct ucx_ep *u_ep = container_of(ep, struct ucx_ep, ep.ep_fid);
return ucx_proc_rma_msg(ep, msg, flags | u_ep->ep.tx_op_flags, UCX_DO_WRITE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect. The flags passed to the msg version override the default op flags, no adds to.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I've fixed it so that only FI_COMPLETION is taken into account in the internal flags.

{
return ucx_proc_rma_msg(ep, msg, flags, UCX_DO_READ);
struct ucx_ep *u_ep = container_of(ep, struct ucx_ep, ep.ep_fid);
return ucx_proc_rma_msg(ep, msg, flags | u_ep->ep.rx_op_flags, UCX_DO_READ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

@zachdworkin
Copy link
Contributor

libfabric ci is failing udp fabtests. Can you please rebase to pickup the changes from #11501 to fix this issue?

@tatarintsevsv tatarintsevsv force-pushed the ucx_selective_completion branch from 1fae2f8 to 0e0953f Compare October 27, 2025 09:35
Copy link
Contributor

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also take a look at your commits and remove Merge commits

Comment on lines 164 to 165
if (u_ep->ep.tx_op_flags & FI_COMPLETION)
flags |= FI_COMPLETION;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still incorrect.

The *msg APIs don't take into account the op_flags. They are specifically to override the op_flags so you shouldn't be using the op_flags on the *msg paths. You should only use the u_ep->tx_msg_flags here which only has FI_COMPLETION for the non FI_SELECTIVE_COMPLETION case and are 0 if FI_SELECTIVE_COMPLETION is being used. You should be able to always OR in the tx_msg_flags for the msg APIs

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for explaining. looks it just flags|msg_flags

- Ignore the absence of the FI_COMPLETION flag for fi_writemsg/fi_readmsg
  if CQ configured without FI_SELECTIVE_COMPLETION flag.
- Error completions must generated for all operations, including those
  for which a completion was not requested

Signed-off-by: Sergey Tatarintsev <[email protected]>
@tatarintsevsv tatarintsevsv force-pushed the ucx_selective_completion branch from 0e0953f to b37faa7 Compare October 28, 2025 03:11
Copy link
Contributor

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{
return ucx_proc_rma_msg(ep, msg, flags, UCX_DO_WRITE);
struct ucx_ep *u_ep = container_of(ep, struct ucx_ep, ep.ep_fid);
return ucx_proc_rma_msg(ep, msg, flags | u_ep->ep.tx_op_flags, UCX_DO_WRITE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be return ucx_proc_rma_msg(ep, msg, flags | u_ep->ep.tx_msg_flags, UCX_DO_WRITE);

{
return ucx_proc_rma_msg(ep, msg, flags, UCX_DO_READ);
struct ucx_ep *u_ep = container_of(ep, struct ucx_ep, ep.ep_fid);
return ucx_proc_rma_msg(ep, msg, flags | u_ep->ep.rx_op_flags, UCX_DO_READ);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be return ucx_proc_rma_msg(ep, msg, flags | u_ep->ep.tx_msg_flags, UCX_DO_READ);
(A read is still a transmitted operation)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants