gh-150638: Improve performance of json.loads and json.load for numeric data #150639
Open
eendebakpt wants to merge 5 commits into
Add a fast path to _match_number_unicode for integers that fit in a 64-bit integer (at most 19 decimal digits): accumulate the value directly into an unsigned long long instead of allocating a PyBytes and calling the generic PyLong_FromString. Positive values use PyLong_FromUnsignedLongLong; negatives within long long range use PyLong_FromLongLong; larger integers fall back to the previous path.

For floats and big integers, copy the (always-ASCII) number text into a stack buffer for the common short case to avoid the PyBytes allocation, and call PyOS_string_to_double directly for floats.

Benchmarks (optimized free-threaded build):

* pyperformance json_loads: 1.06x faster overall
* microbench: small int arrays ~2x, 20-int doc 1.48x, mixed dict 1.16x

All test_json tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
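The integer fast path described in the commit message can be sketched in plain C, without the CPython API. This is an illustrative sketch, not the PR's code: `parse_small_int` and the `int_kind` enum are hypothetical names, and the two result kinds merely stand in for the places where the real code would call `PyLong_FromUnsignedLongLong` or `PyLong_FromLongLong`. It relies on the fact that any 19-digit decimal value is below 2^64, so the accumulator cannot overflow.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical result kinds: which constructor the caller would use. */
typedef enum { INT_FALLBACK, INT_UNSIGNED, INT_SIGNED } int_kind;

/* Hypothetical helper: parse an optional '-' followed by at most 19 decimal
   digits. Returns INT_FALLBACK when the generic parser should be used. */
static int_kind
parse_small_int(const char *s, size_t len,
                unsigned long long *uval, long long *sval)
{
    size_t i = 0;
    int negative = 0;
    if (len > 0 && s[0] == '-') {
        negative = 1;
        i = 1;
    }
    size_t ndigits = len - i;
    if (ndigits == 0 || ndigits > 19) {
        return INT_FALLBACK;            /* empty or possibly too large */
    }
    /* 19 digits is at most 9999999999999999999 < 2^64, so no overflow. */
    unsigned long long acc = 0;
    for (; i < len; i++) {
        if (s[i] < '0' || s[i] > '9') {
            return INT_FALLBACK;        /* not a plain integer */
        }
        acc = acc * 10 + (unsigned long long)(s[i] - '0');
    }
    if (!negative) {
        *uval = acc;
        return INT_UNSIGNED;
    }
    /* Negative values must fit in long long; |LLONG_MIN| is LLONG_MAX + 1. */
    unsigned long long limit = (unsigned long long)LLONG_MAX + 1ULL;
    if (acc > limit) {
        return INT_FALLBACK;
    }
    *sval = (acc == limit) ? LLONG_MIN : -(long long)acc;
    return INT_SIGNED;
}
```

The digit-count check up front is what keeps the loop free of per-digit overflow tests; anything longer simply takes the slower existing path.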
`_match_number_unicode()` (the C accelerator behind `json.loads`) previously allocated a `PyBytes` object for every number, copied the digits into it, and then called the generic `PyLong_FromString`/`PyFloat_FromString` parsers. This PR parses the common cases directly from the already-scanned text.
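For floats, the commit message describes replacing the per-number heap allocation with a copy into a stack buffer for short inputs. A rough standalone illustration of that pattern follows. The helper name `parse_float_text` and the 64-byte buffer size are assumptions made for this sketch, and the standard `strtod` stands in for `PyOS_string_to_double`; unlike the CPython function, `strtod` is locale-dependent, so this is only equivalent under the default "C" locale.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NUM_STACK_BUF 64   /* assumed size, not taken from the PR */

/* Hypothetical helper: convert len bytes of ASCII number text (not
   NUL-terminated) to a double. Returns 0 on success, -1 on failure. */
static int
parse_float_text(const char *s, size_t len, double *out)
{
    char stackbuf[NUM_STACK_BUF];
    char *buf = stackbuf;

    if (len == 0) {
        return -1;
    }
    /* Common short case: no heap allocation at all. */
    if (len >= sizeof(stackbuf)) {
        buf = malloc(len + 1);          /* rare long case */
        if (buf == NULL) {
            return -1;
        }
    }
    memcpy(buf, s, len);
    buf[len] = '\0';                    /* strtod needs a terminator */

    char *end = NULL;
    double value = strtod(buf, &end);
    int ok = (end == buf + len);        /* whole text must be consumed */

    if (buf != stackbuf) {
        free(buf);
    }
    if (!ok) {
        return -1;
    }
    *out = value;
    return 0;
}
```

The copy is still needed because the number text sits in the middle of a larger document and is not NUL-terminated; the saving comes from where the copy lives, not from skipping it.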
Benchmark cases:

* `json.loads`, number-heavy document (script below)
* `json.load`, same document via file object
* `bm_json_loads`

The standard `bm_json_loads` document is string/dict-dominated, so it gains less.
Benchmark script