43 Commits

Author SHA1 Message Date
Chanho Lee
1cb24c5ebb Reject non-ASCII digits in JSON numbers (#7982) 2026-05-27 16:40:52 +09:00
Changjoon
fb1218d6ba Accept surrogates in _json.JsonScanner decode path (#7675)
The _json decoder had two failure modes when a Python str value would
contain a lone surrogate (legal per the Python 3 str model):

1. Boundary UnicodeEncodeError: JsonScanner::Callable::call rejected
   any input str with surrogates via try_into_utf8 before scanning
   began.
2. Silent U+FFFD corruption: call_scan_once and parse_object's key
   path called .to_string() on scanstring's Wtf8Buf output, which
   routes through Wtf8::Display (lossy). Array values and dict keys
   decoded from JSON \uXXXX escapes silently became U+FFFD.
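
As a rough illustration of the behaviour the decoder is meant to match, the snippet below (not part of the commit) uses only the standard json module as CPython implements it. It decodes lone-surrogate escapes at the root, inside a list, and as a dict key, and checks that none of them is replaced with U+FFFD.

```python
import json

# Lone surrogates written as JSON \uXXXX escapes. A Python str can hold
# them, so the decoder is expected to return them unchanged.
root = json.loads('"\\ud800"')
in_list = json.loads('["\\udc00"]')
as_key = json.loads('{"\\ud800": 1}')

assert root == "\ud800"
assert in_list == ["\udc00"]
assert as_key == {"\ud800": 1}

# None of the decoded strings contains the replacement character.
assert "\ufffd" not in root + in_list[0] + next(iter(as_key))

# Re-encode with ensure_ascii=True so the surrogate comes back out as an
# escape; this keeps the result printable and comparable as plain ASCII.
round_trip = json.dumps(as_key, ensure_ascii=True, sort_keys=True)
print(round_trip)
```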

Switch JsonScanner's five PyUtf8StrRef signatures to PyStrRef, drop
the entry-point try_into_utf8 call, and feed Wtf8Buf directly to
new_str instead of going through .to_string(). Key memoization now
uses HashMap<Wtf8Buf, PyStrRef> so surrogate-bearing keys survive
interning. parse_number takes &[u8] since JSON numbers are ASCII.

Extends the WTF-8 refactor pattern established in #7673 to the
decoder. machinery::scanstring already returns Wtf8Buf and is
unchanged.

Unmasks test_single_surrogate_decode. 214 tests in test.test_json
pass with no regressions. Decoder output verified byte-identical to
CPython 3.13.4 over 10,000 random fuzz cases (JSON docs containing
random surrogate escapes at root/list/dict positions, compared via
json.dumps(..., ensure_ascii=True, sort_keys=True)).
2026-04-25 05:16:12 +09:00
Changjoon
2e5c2be7fa Accept surrogates in _json.encode_basestring{,_ascii} (#7673)
encode_basestring/encode_basestring_ascii took PyUtf8StrRef, so
json.dumps(str_with_lone_surrogate) raised UnicodeEncodeError at the
Python/Rust boundary before write_json_string ran. CPython's encoder
emits \uXXXX under ensure_ascii=True and passes raw WTF-8 otherwise.

Switch to PyStrRef + s.as_wtf8(), matching scanstring in the same file.
Rewrite write_json_string to accept &Wtf8 and iterate
code_point_indices, emitting \uXXXX for surrogates in ascii mode and
passing their bytes through otherwise. Stop escaping 0x7F in the
ensure_ascii=False path (matches py_encode_basestring). Return Wtf8Buf
via the checked from_bytes so that a broken invariant panics instead of
causing undefined behavior.
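
The encoder behaviour described here can be observed from Python. The following is a small sketch, not taken from the commit, that relies only on json.dumps as CPython implements it. It shows the two modes for a lone surrogate and for DEL (0x7F).

```python
import json

lone = "\ud800"  # a lone high surrogate, legal in a Python str

# ensure_ascii=True (the default): the surrogate is written as an escape.
ascii_out = json.dumps(lone)

# ensure_ascii=False: the code point is passed through unescaped.
raw_out = json.dumps(lone, ensure_ascii=False)

# DEL (0x7F) is escaped in ASCII mode only.
del_ascii = json.dumps("\x7f")
del_raw = json.dumps("\x7f", ensure_ascii=False)

# ascii() keeps the output printable even when a lone surrogate is present,
# since writing one to a UTF-8 stdout would raise UnicodeEncodeError.
print(ascii_out)
print(ascii(raw_out))
print(del_ascii, ascii(del_raw))
```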

Fuzzing also exposed two pre-existing ESCAPE_CHARS typos: 0x0B was
"\u000" and 0x1B was "\u001" (both missing trailing 'b'). Fixed here.
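
A table check of the kind that would catch such typos might look like the sketch below. The helper control_escape and the ESCAPES list are hypothetical names for illustration and do not mirror the Rust ESCAPE_CHARS table itself; the cross-check uses the standard json.dumps.

```python
import json


def control_escape(code):
    """Return the JSON escape for a control character (hypothetical helper)."""
    short = {0x08: "\\b", 0x09: "\\t", 0x0A: "\\n", 0x0C: "\\f", 0x0D: "\\r"}
    return short.get(code, f"\\u{code:04x}")


# One entry per control character 0x00..0x1F.
ESCAPES = [control_escape(c) for c in range(0x20)]

# Every long-form escape must be exactly six characters: \uXXXX.
bad = [e for e in ESCAPES if e.startswith("\\u") and len(e) != 6]

# Cross-check each entry against the standard encoder.
mismatches = [c for c in range(0x20) if json.dumps(chr(c)) != f'"{ESCAPES[c]}"']

print(ESCAPES[0x0B], ESCAPES[0x1B], bad, mismatches)
```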

Verified byte-identical with CPython 3.13.4 over 16 manual + 10,000
random fuzz cases. Full test.test_json: 214 tests, 0 failures, 0
unexpected successes. Unmasks test_ascii_non_printable_encode and
test_single_surrogate_encode. Decoder path is a follow-up.
2026-04-25 00:08:14 +09:00
Changjoon
175f12b664 Fix stack overflow on deeply-nested JSON in json.loads() (#7632)
* Fix stack overflow on deeply-nested JSON in json.loads()

json.loads() on a deeply-nested array or object payload (e.g.
'[' * 50000 + ']' * 50000) overflowed the native Rust stack and
crashed the interpreter process with SIGSEGV. CPython raises
RecursionError on the same input via _Py_EnterRecursiveCall in
Modules/_json.c.

The recursion lives in the mutual call chain:
  JsonScanner::parse_object / parse_array
    -> JsonScanner::call_scan_once
      -> JsonScanner::parse_object / parse_array

Every descent funnels through call_scan_once, so wrapping its body
with vm.with_recursion covers both '{' and '[' paths (and their
mixed nesting) with a single guard.
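
The single-guard idea can be sketched in Python as follows. This is a toy model, not the RustPython code: DepthGuard, scan_once, and parse_array are hypothetical stand-ins, the parser handles only nested arrays, and the limit is an arbitrary illustrative value. The point is that placing the guard in the one function every descent passes through covers all nesting paths.

```python
from contextlib import contextmanager


class DepthGuard:
    """Counts active descents and raises RecursionError past a limit."""

    def __init__(self, limit):
        self.limit = limit
        self.depth = 0

    @contextmanager
    def enter(self, what):
        if self.depth >= self.limit:
            raise RecursionError(f"maximum recursion depth exceeded {what}")
        self.depth += 1
        try:
            yield
        finally:
            self.depth -= 1


def scan_once(text, idx, guard):
    """Parse one value at idx and return (value, end). Arrays only."""
    # Every descent funnels through here, so one guard is enough.
    with guard.enter("while decoding a JSON object from a string"):
        if text[idx] == "[":
            return parse_array(text, idx + 1, guard)
        raise ValueError(f"unexpected character at {idx}")


def parse_array(text, idx, guard):
    """Parse array contents after '[' and return (list, end)."""
    items = []
    if text[idx] == "]":
        return items, idx + 1
    while True:
        value, idx = scan_once(text, idx, guard)
        items.append(value)
        if text[idx] == ",":
            idx += 1
        elif text[idx] == "]":
            return items, idx + 1
        else:
            raise ValueError(f"unexpected character at {idx}")


shallow = scan_once("[[],[[]]]", 0, DepthGuard(50))

try:
    scan_once("[" * 1000 + "]" * 1000, 0, DepthGuard(50))
    deep_result = "parsed"
except RecursionError as exc:
    deep_result = str(exc)

print(shallow, deep_result)
```

The sketch does no bounds checking on truncated input; a real scanner would report that case as a decode error rather than an IndexError.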

Before:
  ./rustpython -c "import json; json.loads('[' * 50000 + ']' * 50000)"
    -> SIGSEGV (exit 139)

After:
  -> RecursionError: maximum recursion depth exceeded while
     decoding a JSON object from a string

Verified:
  - extra_tests/snippets/stdlib_json.py: all assertions pass
    (includes 3 new regression cases: array, object, alternating
    nesting at depth 100000)
  - cargo run -- -m test test_json: 214 passed, 0 regressed
    (9 skipped, 13 expected failures, all pre-existing)
  - depth 500000 no longer crashes (RecursionError)
  - shallow parsing unchanged

* Enable test_highly_nested_objects_decoding

Per @ShaharNaveh's review on #7632: this test was previously marked
`@unittest.skip("TODO: RUSTPYTHON; crashes")` because json.loads
would SIGSEGV on the 500_000-deep input. The recursion-guard added
in this PR makes it raise RecursionError like CPython, so the skip
decorator can be removed.

  $ cargo run -- -m unittest \
        test.test_json.test_recursion.TestCRecursion.test_highly_nested_objects_decoding \
        test.test_json.test_recursion.TestPyRecursion.test_highly_nested_objects_decoding
  ...
  Ran 2 tests in 0.825s
  OK

  $ cargo run -- -m test test_json
  Ran 214 tests (7 skipped, 13 expected failures) — all pass.
2026-04-20 21:52:17 +09:00
ShaharNaveh
f73df6a102 Update test_json from 3.14.3 2026-02-10 21:00:40 +09:00
Jeong, YunWon
100b870175 Implement UTF-32 encode/decode and fix UTF-16 empty encode
- Add UTF-32, UTF-32-LE, UTF-32-BE encode/decode in _pycodecs.py
- Register utf_32 codec functions in codecs.rs via delegate_pycodecs
- Fix PyUnicode_EncodeUTF16 returning "" instead of [] for empty input
- Remove resolved expectedFailure decorators in test_codecs.py
- Add failure reasons to remaining expectedFailure comments
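
For reference, the byte layouts these codecs are expected to produce can be checked from Python. The snippet below is an illustrative sketch using only the standard str.encode/bytes.decode API and the codecs BOM constants; it does not exercise _pycodecs.py directly.

```python
import codecs

text = "a\U0001F600"  # one BMP character and one astral character

le = text.encode("utf-32-le")
be = text.encode("utf-32-be")
with_bom = text.encode("utf-32")  # BOM followed by native byte order

# Each code point occupies exactly four bytes.
assert le == b"a\x00\x00\x00\x00\xf6\x01\x00"
assert be == b"\x00\x00\x00a\x00\x01\xf6\x00"

# The generic codec prepends a BOM in the platform's byte order.
assert with_bom.startswith(codecs.BOM_UTF32)
assert len(with_bom) == 4 + 4 * len(text)

# All three variants decode back to the original text.
decoded = (
    le.decode("utf-32-le"),
    be.decode("utf-32-be"),
    with_bom.decode("utf-32"),
)

# Empty input to an explicit-endian UTF-16 codec yields empty bytes.
empty_utf16 = "".encode("utf-16-le")
print(decoded, empty_utf16)
```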
2026-02-02 12:50:34 +09:00
Jeong, YunWon
8f7b1343bc mark and unmark successful/failing tests 2026-01-18 20:00:15 +09:00
Lee Dogeon
5242ff5243 Bump json to 3.14.2 (#6774) 2026-01-18 19:16:48 +09:00
Lee Dogeon
ef871d227e Update json module to 3.13.11 (#6743) 2026-01-16 21:38:15 +09:00
Lee Dogeon
3a702ac772 Improve json.loads performance (#6704)
* Parse JSON in Rust

* Reuse key when decoding JSON

* Unmark resolved test

* Parse null/true/false directly in call_scan_once

Parse JSON constants (null, true, false) directly in Rust within
call_scan_once() instead of falling back to Python scan_once.
This reduces Python-Rust boundary crossings for array/object values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Parse numbers directly in call_scan_once

Parse JSON numbers starting with digits (0-9) directly in Rust within
call_scan_once() by reusing the existing parse_number() method.
This reduces Python-Rust boundary crossings for array/object values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Parse NaN/Infinity/-Infinity in call_scan_once

Parse special JSON constants (NaN, Infinity, -Infinity) and negative
numbers directly in Rust within call_scan_once(). This handles:
- 'N' -> NaN via parse_constant callback
- 'I' -> Infinity via parse_constant callback
- '-' -> -Infinity or negative numbers via parse_constant/parse_number

This reduces Python-Rust boundary crossings for array/object values.
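
The dispatch described in these steps is visible from the Python side through the parse_constant hook. The sketch below is not from the commit and record_constant is a hypothetical callback; it shows which tokens reach the callback and which are produced directly.

```python
import json

seen = []


def record_constant(name):
    """Hypothetical callback: log the constant name and return it as-is."""
    seen.append(name)
    return name


doc = "[null, true, false, NaN, Infinity, -Infinity, -1, 2.5]"
values = json.loads(doc, parse_constant=record_constant)

# null/true/false and ordinary numbers never reach the callback;
# only NaN, Infinity and -Infinity do.
print(values)
print(seen)
```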

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Correct wrong index access

* Leave more flame span

* Refactor json scanstring with byte index

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-15 19:53:46 +09:00
Shahar Naveh
ceb7046bc4 Fix int to respect sys.set_int_max_str_digits value (#6094) 2025-08-21 13:14:10 +09:00
ShaharNaveh
a5b240aab8 skip crashing test 2025-07-25 16:13:07 +02:00
Shahar Naveh
c497061290 Update json from 3.13.5 (#6007)
* Update `json` from 3.13.5

* Update `test_json` from 3.13.5
2025-07-20 18:44:46 +09:00
Noa
0a07cd931f Fix more surrogate crashes 2025-03-26 23:12:21 -05:00
Noa
a86126419c Fix remaining tests 2025-03-25 19:05:12 -05:00
Jeong Yunwon
2f4000b239 mark failing tests from test_json 2022-07-19 01:33:15 +09:00
CPython developers
413e8250f0 Update {test_}json from CPython 3.10.5 2022-07-19 01:33:15 +09:00
Dean Li
6f98288e84 test: use import_helper 2021-11-29 21:03:02 +08:00
Dean Li
5ee4fb899b test: use os_helper 2021-11-28 20:51:32 +08:00
Jeong YunWon
913b78ca44 Revert "Merge pull request #3433 from deantvv/test-update"
This reverts commit 9fa5c5ac66, reversing
changes made to e7fa32c687.
2021-11-17 17:06:51 +09:00
Dean Li
49a5805d11 test: use os_helper 2021-11-13 02:18:33 +00:00
Padraic Fanning
e5acfc3a67 Clean up skip in test_json.test_speedups 2021-10-18 22:09:36 -04:00
Padraic Fanning
05f3ef557b Clean up skip in test_json.test_decode 2021-10-18 22:09:17 -04:00
Jeong YunWon
22322fafe7 Merge pull request #2506 from fanninpm/more-expected-failures
Unskip more tests (follow-up from #2443)
2021-02-28 18:34:43 +09:00
Padraic Fanning
f1152a345c Unskip test(s) in test_json.test_unicode 2021-02-25 21:41:02 -05:00
Padraic Fanning
63c3a306c4 Unskip tests in test_json.test_scanstring 2021-02-25 21:39:08 -05:00
Padraic Fanning
4a485c2c70 Unskip test(s) in test_json.test_fail 2021-02-25 21:35:13 -05:00
Noah
491c4e775b Fix json.scanstring unicode handling 2021-02-20 21:04:30 -06:00
Padraic Fanning
6a21d3ce3d Explain test_bytes_decode skip 2021-02-07 15:49:42 -05:00
Padraic Fanning
a5bc2bb909 Explain test_overflow skip 2021-02-07 15:49:42 -05:00
Padraic Fanning
961472e6fe Explain test_bad_escapes skip 2021-02-07 15:49:42 -05:00
Padraic Fanning
c283f64a96 Explain test_surrogates skip 2021-02-07 15:49:42 -05:00
Padraic Fanning
d696eac3a7 Unskip test_truncated_input 2021-02-07 15:49:42 -05:00
Padraic Fanning
c22ecc7347 Explain test_failures skip 2021-02-07 15:49:42 -05:00
Noah
1f4f407d5d Implement json.decoder.scanstring in Rust 2020-10-04 13:04:43 -05:00
Noah
0876c19c04 Unskip test_json.test_tool 2020-08-03 13:20:06 -05:00
Noah
b1aa11bf9e Uncomment some things that were dependent on proper subprocess 2020-06-21 16:47:41 -05:00
Noah
0fb79e1086 Implement _json.encode_basestring{,_ascii} 2020-06-06 15:33:29 -05:00
Noah
84b71c9563 Enable doctest in test_json 2020-05-05 12:23:58 -05:00
Noah
316ee37b38 Mark unsupported tests for _json 2020-04-28 13:45:53 -05:00
Noah
d92cebd953 Unskip tests that depend on \N 2020-04-14 13:06:32 -05:00
Noah
95d12d02ae Mark failing tests for test_json 2020-04-08 12:26:28 -05:00
Noah
9d136d6450 Add test.test_json from CPython 3.8.2 2020-04-08 12:16:19 -05:00