Skip to content

Conversation

leebyron
Copy link
Contributor

The lexer needed some cleanup, I found myself doing this as part of a Unicode RFC, but factoring all that out to make the Unicode RFC PR easier to follow.

  • Always use hexadecimal form for code values.
  • Remove use of isNaN for checking source over-reads.
  • Defines isSourceCharacter
  • Add more documentation and comments, also replaces regex with lexical grammar
  • Simplifies error messages
  • Adds additional tests

@leebyron leebyron requested a review from IvanGoncharov May 19, 2021 21:59
@leebyron leebyron force-pushed the lexer-cleanup branch 2 times, most recently from 09def45 to 2cd8510 Compare May 19, 2021 22:12
leebyron added a commit that referenced this pull request May 19, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 19, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 19, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 19, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 21, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 21, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 27, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 27, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
@leebyron leebyron mentioned this pull request May 27, 2021
leebyron added a commit that referenced this pull request May 28, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request May 28, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
@IvanGoncharov
Copy link
Member

@leebyron It's hard to review since a lot of changes happened simultaneously.
I also spotted some minor issues I like to fix.
So if you don't mind I will extract some of the changes in separate PRs?

@leebyron
Copy link
Contributor Author

leebyron commented Jun 2, 2021

I'm happy to factor some things out if it helps

@IvanGoncharov
Copy link
Member

I'm happy to factor some things out if it helps

@leebyron Thanks, a lot it would be helpful.
I got carried away for a bit so I already extracted 5 commits and create one of my own 😊
I will push those in a branch for you to review.

@IvanGoncharov
Copy link
Member

@leebyron Can you please take a look at #3155?

@leebyron
Copy link
Contributor Author

leebyron commented Jun 2, 2021

@IvanGoncharov - Applied most of your suggested changes and a few other related clarifying changes in 0d6a096

leebyron and others added 2 commits June 3, 2021 09:51
The lexer needed some cleanup, I found myself doing this as part of a Unicode RFC, but factoring all that out to make the Unicode RFC PR easier to follow.

* Always use hexadecimal form for code values.
* Remove use of `isNaN` for checking source over-reads.
* Defines `isSourceCharacter`
* Add more documentation and comments, also replaces regex with lexical grammar
* Simplifies error messages
* Adds additional tests
Co-authored-by: Ivan Goncharov <[email protected]>
leebyron added a commit that referenced this pull request Jun 3, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request Jun 3, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
@IvanGoncharov IvanGoncharov added PR: polish 💅 PR doesn't change public API or any observed behaviour and removed PR: internal 🏠 labels Jun 3, 2021
@IvanGoncharov IvanGoncharov merged commit fd3d8c9 into main Jun 3, 2021
@IvanGoncharov IvanGoncharov deleted the lexer-cleanup branch June 3, 2021 17:34
leebyron added a commit that referenced this pull request Jun 3, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request Jun 3, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request Jul 1, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request Jul 1, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request Jul 1, 2021
Depends on #3115

Implements RFC at graphql/graphql-spec#849.

* Replaces `isSourceCharacter` with `isUnicodeScalarValue`
* Adds `isSupplementaryCodePoint`, used in String, BlockStrings, and Comments to ensure correct lexing of JavaScript's UTF-16 source.
* Updates `printCodePointAt` to correctly print supplementary code points.
* Adds variable-width Unicode escape sequences
* Adds explicit support for legacy JSON-style fixed-width Unicode escape sequence surrogate pairs.
* Adds `printString` to no longer rely on `JSON.stringify`. Borrows some implementation details from Node.js internals for string printing.

  Implements:

  > When producing a {StringValue}, implementations should use escape sequences to
  > represent non-printable control characters (U+0000 to U+001F and U+007F to
  > U+009F). Other escape sequences are not necessary, however an implementation may
  > use escape sequences to represent any other range of code points.

Closes #2449

Co-authored-by: Andreas Marek <[email protected]>
leebyron added a commit that referenced this pull request Sep 1, 2025
Depends on #3115

Implements graphql/graphql-spec#794

Adds:

* DOT punctuator in lexer
* Improvements to lexer errors around misuse of `.`
* Minor improvement to parser core which simplified this addition
* `SchemaCoordinate` node and `isSchemaCoodinate()` predicate
* Support in `print()` and `visit()`
* Added function `parseSchemaCoordinate()` since it is a parser entry
point.
* Added function `resolveSchemaCoordinate()` and
`resolveASTSchemeCoordinate()` which implement the semantics (name
mirrored from `buildASTSchema`) as well as the return type
`GraphQLSchemaElement`

---------

Co-authored-by: Benjie Gillam <[email protected]>
Co-authored-by: Mark Larah <[email protected]>
Co-authored-by: Yaacov Rydzinski <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR: polish 💅 PR doesn't change public API or any observed behaviour
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants