Conversation

pforderique
Contributor

Add the Tokenizer base class, from which BytePairEncoding and other future tokenizers will inherit.
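
For readers unfamiliar with the pattern, the inheritance relationship described above could look roughly like the sketch below. This is only an illustration under stated assumptions: the method names `tokenize`, `detokenize`, and `roundTrip`, and the `WhitespaceTokenizer` subclass, are hypothetical and are not taken from the merged code.

```typescript
// Hypothetical sketch: names and signatures are illustrative assumptions,
// not the API that was merged in this pull request.
abstract class Tokenizer {
  // Subclasses decide how text is split into tokens.
  abstract tokenize(input: string): string[];

  // Subclasses decide how tokens are joined back into text.
  abstract detokenize(tokens: string[]): string;

  // Shared helper that every subclass inherits unchanged.
  roundTrip(input: string): string {
    return this.detokenize(this.tokenize(input));
  }
}

// A toy subclass standing in for a real tokenizer such as a byte-pair one.
class WhitespaceTokenizer extends Tokenizer {
  tokenize(input: string): string[] {
    // Split on runs of whitespace and drop empty pieces.
    return input.split(/\s+/).filter((t) => t.length > 0);
  }

  detokenize(tokens: string[]): string {
    return tokens.join(' ');
  }
}

const tokenizer: Tokenizer = new WhitespaceTokenizer();
const tokens: string[] = tokenizer.tokenize('the quick  brown fox');
const restored: string = tokenizer.roundTrip('the quick  brown fox');
```

Code that accepts the base type can then work with any later subclass without modification, which is the usual motivation for introducing a shared base class first.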

To see the logs from the Cloud Build CI, please join either our discussion or announcement mailing list.

Collaborator

@Linchenn left a comment

LGTM!

Member

@mattsoulanille left a comment

This looks good! I have a few minor changes, and some longer comments on some things I've discovered about BytePairTokenizer.

Member

@mattsoulanille left a comment

LGTM with a couple of nits.

@mattsoulanille merged commit 6b94f63 into tensorflow:master on Jun 15, 2023