RustPython

mirror of https://github.com/RustPython/RustPython.git synced 2026-06-02 19:39:49 +09:00

Files

Changjoon fb1218d6ba Accept surrogates in _json.JsonScanner decode path (#7675 )

The _json decoder had two failure modes when a Python str value would
contain a lone surrogate (legal per the Python 3 str model):

1. Boundary UnicodeEncodeError: JsonScanner::Callable::call rejected
   any input str with surrogates via try_into_utf8 before scanning
   began.
2. Silent U+FFFD corruption: call_scan_once and parse_object's key
   path called .to_string() on scanstring's Wtf8Buf output, which
   routes through Wtf8::Display (lossy). Array values and dict keys
   decoded from JSON \uXXXX escapes silently became U+FFFD.

Switch JsonScanner's five PyUtf8StrRef signatures to PyStrRef, drop
the entry-point try_into_utf8 call, and feed Wtf8Buf directly to
new_str instead of going through .to_string(). Key memoization now
uses HashMap<Wtf8Buf, PyStrRef> so surrogate-bearing keys survive
interning. parse_number takes &[u8] since JSON numbers are ASCII.

Extends the WTF-8 refactor pattern established in #7673 to the
decoder. machinery::scanstring already returns Wtf8Buf and is
unchanged.

Unmasks test_single_surrogate_decode. 214 tests in test.test_json
pass with no regressions. Decoder output verified byte-identical to
CPython 3.13.4 over 10,000 random fuzz cases (JSON docs containing
random surrogate escapes at root/list/dict positions, compared via
json.dumps(..., ensure_ascii=True, sort_keys=True)).

2026-04-25 05:16:12 +09:00

__init__.py

Update test_json from 3.14.3

2026-02-10 21:00:40 +09:00

__main__.py

Add test.test_json from CPython 3.8.2

2020-04-08 12:16:19 -05:00

test_decode.py

Update test_json from 3.14.3

2026-02-10 21:00:40 +09:00

test_default.py

Update test_json from 3.14.3