Files
RustPython/Lib/test/test_json
Changjoon fb1218d6ba Accept surrogates in _json.JsonScanner decode path (#7675)
The _json decoder had two failure modes when a Python str value would
contain a lone surrogate (legal per the Python 3 str model):

1. Boundary UnicodeEncodeError: JsonScanner::Callable::call rejected
   any input str with surrogates via try_into_utf8 before scanning
   began.
2. Silent U+FFFD corruption: call_scan_once and parse_object's key
   path called .to_string() on scanstring's Wtf8Buf output, which
   routes through Wtf8::Display (lossy). Array values and dict keys
   decoded from JSON \uXXXX escapes silently became U+FFFD.

Switch JsonScanner's five PyUtf8StrRef signatures to PyStrRef, drop
the entry-point try_into_utf8 call, and feed Wtf8Buf directly to
new_str instead of going through .to_string(). Key memoization now
uses HashMap<Wtf8Buf, PyStrRef> so surrogate-bearing keys survive
interning. parse_number takes &[u8] since JSON numbers are ASCII.

Extends the WTF-8 refactor pattern established in #7673 to the
decoder. machinery::scanstring already returns Wtf8Buf and is
unchanged.

Unmasks test_single_surrogate_decode. 214 tests in test.test_json
pass with no regressions. Decoder output verified byte-identical to
CPython 3.13.4 over 10,000 random fuzz cases (JSON docs containing
random surrogate escapes at root/list/dict positions, compared via
json.dumps(..., ensure_ascii=True, sort_keys=True)).
2026-04-25 05:16:12 +09:00
..
2026-02-10 21:00:40 +09:00
2026-02-10 21:00:40 +09:00
2026-02-10 21:00:40 +09:00
2025-07-20 18:44:46 +09:00
2026-02-10 21:00:40 +09:00
2025-07-20 18:44:46 +09:00
2025-07-20 18:44:46 +09:00
2025-07-20 18:44:46 +09:00
2025-07-20 18:44:46 +09:00
2026-02-10 21:00:40 +09:00
2026-02-10 21:00:40 +09:00