-
Notifications
You must be signed in to change notification settings - Fork 355
Add ryu float parser. #875
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
ahh, I forgot the main, benchmark |
82d04ba to
ad8803a
Compare
ext/json/ext/parser/ryu_json.h
Outdated
| #include <stdbool.h> | ||
| #include <string.h> | ||
|
|
||
| // Detect __builtin_clzll availability (for floor_log2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen simd.h file, but per my poor understanding, __builtin_clzll is not SIMD. I decided to keep it aside from simd.h, but copy the detection pattern. This should be compatible with anything modern (the fast way), except MSVC. It is possible to provide also fast MSVC implementation, but conisdering MSVC Ruby built is rarely used (over other Windows build alternatives), IMHO it is worth of time to maintain.
The gains are indeed pretty significant. The branch needs a bit of a cleanup, unfortunately your fork doesn't allow pushes: So I'll push my cleanup in another branch: master...byroot:json:ryu-float-parser |
|
No idea how to do it :-o, but I have added you to my fork directly @byroot. Feel free to push there. Just please ping me for final review. |
d3f7f06 to
87a6d01
Compare
c07b56a to
062a418
Compare
It's probably because your fork is in an organization. I know organizations can disable this entirely (is the case of Shopify and it's annoying). |
ext/json/ext/parser/parser.c
Outdated
|
|
||
| if (state->cursor == state->end || *state->cursor < '0' || *state->cursor > '9') { | ||
| if (state->cursor == state->end || !rb_isdigit(*state->cursor)) { | ||
| raise_parse_error("invalid number: %s", state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use raise_parse_error_at here, so the error message contain the entire number, not just what is past the exponent. (I'm taking care of it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
After the last few tweeks, we're now closing on 7x speedup: We're still slightly behind The code LGTM at this point, but I've been meaning to add some fuzzing for a while, and that may be worth it here. I also need to dig a bit deeper in the parsing differences for very large numbers (all the |
|
@byroot this is my simple naive fuzzer - https://gist.github.com/simi/d5b4800e9eb050dbefb34f218c8e49b4. |
Co-Authored-By: Jean Boussier <[email protected]>
6bc46f5 to
9c4db31
Compare



As promised @byroot, opening initial draft of porting Ryu raw C float parser.
Original Ryu does 2 passes. Since JSON parser already does 1 pass to explore the data type, first Ryu pass was merged into that one to save one whole additional pass. Once mantissa, exponent and sign are known, it can be passed directly to second pass in
ryu_s2d_from_partsfunction returningdoubleand casted to Ruby VALUE.I did also some simple fuzzing and found out, there is imprecision on some edge cases. I have decided to fix it in the code (that's addition to Ryu, see the additional branch in
json_ryu_parse_float) and fall back in this case to slow parsing. I'm not 100% sure it is actually needed. All original tests are also passing without, not sure this kind of precision is required.I have added also some tests for those edge cases, to make it clear those all works (even it is not simple/asserted to find out in tests which parsing was used).
I have left comments in all files, which were mostly for me, since I'm not well skilled C coder and IEEE 754 expert. Those can be removed, as they doesn't fit the rest of the codebase styling. I have tried to resolve also licensing differences and added my attribution to the parser extraction to make it clear, it was modified. I'm happy to remove that if not needed.
For now this ported only C parser, but there's also JAVA one in Ryu repository if needed.
"fuzzying" results