-
Notifications
You must be signed in to change notification settings - Fork 9
Replace row/column based Location with byte-offsets. #4
Conversation
| @@ -0,0 +1,22 @@ | |||
| [package] | |||
| name = "ruff_text_size" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to move ruff_text_size into this repository because:
- I want to avoid git submodules in our main repository
ruff_text_sizeshould rarely change
29ff5d8 to
b16267a
Compare
|
Yeah I think the biggest question for me here is the branching strategy... What I've done in the past is had |
I don't have any experience with this but we could try to creat a new upstream branch that matches RustPython's main and we merge our changes into our |
|
Using merge seems like the most straightforward approach from all the approaches listed in the friendly forks blog post |
e53251a to
9787548
Compare
|
I haven't fully figured out how we want to keep our branch up to date, but rebasing merges seems promising, which is used by git-windows. It's currently hard to figure out the details without having a real merge to play through. That's why I want to merge this PR and look into documenting the merge strategy later. |
b59d7ef to
53bef1a
Compare
|
I'd like to share I am going to fully merge this changes into RustPython upstream. 2 motivations.
One question is how much RustPython change is required to adapt this to upstream. For now, I am waiting you to find the best design to enhance performance of ruff. Please let me know if the patch is stablized and you are satisfied. |
Oh wow, this is huge for us! I'm happy to give a hand with upstreaming these changes. Something I noticed that requires change is that
The design is finalized and I plan to merge this PR and the PR to ruff by the end of this week. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there more patches immediately mergable into rustpython?
Otherwise I will try to ship this patch on RustPython late this week and next week.
| pub use crate::constant::*; | ||
| pub use crate::Location; | ||
| pub use ruff_text_size::{TextSize, TextRange}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe rustpython_parser::text_size if the only direct dependency is rustpython-parser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if I understand the suggestion. Do you want to move ruff_text_size to the rustpython-parser or re-export the types for crates that only depend on rustpython_parser? Having the type in a separate crate does have advantages for us, for example, ruff_diagnostics no longer depends on rustpython_parser, which helps speed up our compile times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure it is only used to replace Location or have more usage in ruff. Thank you!
eb818e2 to
5ffa8a1
Compare
|
Current dependencies on/for this PR: This comment was auto-generated by Graphite. |
8f01c4f to
767772f
Compare
767772f to
a42e77a
Compare
|
Woohoo! |
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9
- Split parser core and compiler core. Fix RustPython#14 - AST int type to `u32` - Updated asdl_rs.py and update_asdl.sh fix RustPython#6 - Use `ruff_python_ast::SourceLocation` for Python source location. Deleted our own Location. - Renamed ast::Located to ast::Attributed to distinguish terms for TextSize and SourceLocation - `ast::<Node>`s for TextSize located ast. `ast::located::<Node>` for Python source located ast. - And also strictly renaming `located` to refer only python location related interfaces. - `SourceLocator` to convert locations. - New `source-code` features of to disable python locations when unnecessary. - Also including fully merging astral-sh/RustPython#4 closes RustPython#9

This PR replaces RustPython's
Location { row: u32, column: u32 }withTextSize, a byte offset from the beginning of the file. The motivation of the change is that:Locationis 8 bytes, whereasTextSizeis only 4 bytes -> an 8 byte size reduction for every LexerItemand 12 bytes forLocated(This PR also changesOption<Location>toTextSize).Other notable changes
Locatednow stores aTextRangewhere theend_locationis no longer optional (exposed vialocated.end()).Itemtype of theLexerfrom(Location, Tok, Location)to(Tok, TextRange)See astral-sh/ruff#3931 for more details and the upstream work necessary in ruff to use byte offsets.
Micro Benchmarks
Created running
cargo bench -p ruff_benchmark --bench parserin the ruff repository.This change improves parsing and visiting performance by about 10%.
Downsides
The main downside of this change is that we now have to maintain our own fork of RustPython and that we would need to convert our AST to RustPython's AST if we ever decide to use RustPython to run e.g. tests or plugins.
Another downside is that this change is based on the assumption that it isn't necessary to compute row/column information for most parsed files, as is the case when linting a project that has adopted ruff.
I'm convinced that the speedups justify the change.
Target branch
This is a bit unclear to me. It could make sense to not merge to
mainor create a newupstreambranch that keepsLocationand is where we pull upstream changes in (or branch off when contributing upstream).