-
Notifications
You must be signed in to change notification settings - Fork 0
[Python] Gradually add type checks to Arrow, initial step #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3548797 to
674f0cd
Compare
db51538 to
67b4698
Compare
3e78327 to
ebb90a2
Compare
python/pyarrow/__lib_pxi/array.pyi
Outdated
| memory_pool: MemoryPool | None = None, | ||
| ) -> BooleanArray: ... | ||
| @overload | ||
| def array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this approach scalable? I notice this doesn't seem to include some extension types such as tensor arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you asking about annotating constructors per type? Overload approach is nice because it expresses the relationship between the argument and return types. We can use a single parameter type union and a single return union, but we then lose the relationship.
FixedShapeTensorArray is on line 4578, I'm not sure about other extension types. If we use type checkers to validate our tests we should get errors if our symbols are not annotated or if annotations disagree with tests. Our test coverage is pretty good and we can assume every symbol is tested at least once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overloads are now removed and docstring sync is automated.
We should also add an automated alert for when symbols don't have annotations but should.
python/pyarrow/__lib_pxi/builder.pyi
Outdated
| from .array import StringArray, StringViewArray | ||
|
|
||
| class StringBuilder(_Weakrefable): | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are docstrings duplicated in the stubs files? Do they need to be kept in sync? Why (and how)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my reply here. I am not yet sure we want to include docstrings here, but if we do we should definitely script their extraction. stubgen --include-docstrings does this but I suspect it doesn't read cython-only symbols.
|
Is the |
b15753e to
024aabc
Compare
It was adopted from |
024aabc to
343acc9
Compare
99c26c3 to
d9adb91
Compare
Would |
Since we already have OTOH, we could put everything in a subdirectory ( |
|
As for docstrings: do they need to be in the repo, or can they be generated when building a wheel file? |
I'll give these a shot, starting with a new folder approach.
I think that's doable and would make for nicer diffs. Tradeof would be docstring generation at build time would make wheel building less robust. Docsting generation is not terribly complex so I'd say it's worth it. |
d9adb91 to
baa30c5
Compare
baa30c5 to
8749967
Compare
8749967 to
1870692
Compare
1870692 to
a0ce53c
Compare
|
Closing this in favor of apache#47609 |
This proposes adding type annotation to pyarrow by adopting pyarrow-stubs into pyarrow. To do so we copy a subset of pyarrow-subs's stubfiles into pyarrow tree. We then add annotation checks for some stubsfiles and some test files and make sure checks pass. Annotation checks should be expanded until all (or most) project files are covered in future work.
PR introduces:
arrow/python/pyarrow/