14 Commits

Author SHA1 Message Date
Changjoon
fb1218d6ba Accept surrogates in _json.JsonScanner decode path (#7675)
The _json decoder had two failure modes when a Python str value would
contain a lone surrogate (legal per the Python 3 str model):

1. Boundary UnicodeEncodeError: JsonScanner::Callable::call rejected
   any input str with surrogates via try_into_utf8 before scanning
   began.
2. Silent U+FFFD corruption: call_scan_once and parse_object's key
   path called .to_string() on scanstring's Wtf8Buf output, which
   routes through Wtf8::Display (lossy). Lone surrogates decoded from
   JSON \uXXXX escapes in array values and dict keys silently became
   U+FFFD.

Switch JsonScanner's five PyUtf8StrRef signatures to PyStrRef, drop
the entry-point try_into_utf8 call, and feed Wtf8Buf directly to
new_str instead of going through .to_string(). Key memoization now
uses HashMap<Wtf8Buf, PyStrRef> so surrogate-bearing keys survive
interning. parse_number takes &[u8] since JSON numbers are ASCII.

Extends the WTF-8 refactor pattern established in #7673 to the
decoder. machinery::scanstring already returns Wtf8Buf and is
unchanged.

Unmasks test_single_surrogate_decode. 214 tests in test.test_json
pass with no regressions. Decoder output verified byte-identical to
CPython 3.13.4 over 10,000 random fuzz cases (JSON docs containing
random surrogate escapes at root/list/dict positions, compared via
json.dumps(..., ensure_ascii=True, sort_keys=True)).
2026-04-25 05:16:12 +09:00
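
The decoder behaviour described in the commit above can be sketched from
the Python side. This is a minimal illustration, not the commit's fuzz
harness: the sample document and variable names are assumptions, and only
the standard json module is used. The expected results are those of
CPython, which the commit uses as its reference.

```python
import json

# Lone surrogates written as JSON \uXXXX escapes, once as a dict key and
# once as an array value (the positions the commit says were corrupted).
doc = '{"\\ud800": ["\\udc00", "ok"]}'
decoded = json.loads(doc)

key = next(iter(decoded))
# ascii() keeps the output printable even though the str holds surrogates.
print(ascii(key), ascii(decoded[key]))

# An input str that already contains a raw lone surrogate (the case that
# used to fail at the boundary before scanning began).
raw = json.loads('["\ud800"]')
print(ascii(raw))

# Canonical form using the same dumps flags the commit names for its
# comparison against CPython.
canonical = json.dumps(decoded, ensure_ascii=True, sort_keys=True)
print(canonical)
```

If a surrogate were replaced with U+FFFD anywhere on the decode path, the
canonical form would no longer equal the input document.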
Changjoon
2e5c2be7fa Accept surrogates in _json.encode_basestring{,_ascii} (#7673)
encode_basestring/encode_basestring_ascii took PyUtf8StrRef, so
json.dumps(str_with_lone_surrogate) raised UnicodeEncodeError at the
Python/Rust boundary before write_json_string ran. CPython's encoder
emits \uXXXX under ensure_ascii=True and passes raw WTF-8 otherwise.

Switch to PyStrRef + s.as_wtf8(), matching scanstring in the same file.
Rewrite write_json_string to accept &Wtf8 and iterate
code_point_indices, emitting \uXXXX for surrogates in ascii mode and
passing their bytes through otherwise. Stop escaping 0x7F in the
ensure_ascii=False path (matches py_encode_basestring). Return Wtf8Buf
via the checked from_bytes so that an invariant violation panics instead
of causing undefined behavior.

Fuzzing also exposed two pre-existing ESCAPE_CHARS typos: 0x0B was
"\u000" and 0x1B was "\u001" (both missing trailing 'b'). Fixed here.

Verified byte-identical with CPython 3.13.4 over 16 manual + 10,000
random fuzz cases. Full test.test_json: 214 tests, 0 failures, 0
unexpected successes. Unmasks test_ascii_non_printable_encode and
test_single_surrogate_encode. The decoder path is left for a follow-up.
2026-04-25 00:08:14 +09:00
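
The encoder rules in the commit above (surrogates, 0x7F, and the 0x0B and
0x1B escapes) can be checked with a short sketch. The inputs and variable
names are illustrative assumptions rather than the commit's 16 manual
cases, and the expected values are CPython's json.dumps results.

```python
import json

lone = "a\ud800b"  # str containing a lone high surrogate

# ensure_ascii=True (the default): the surrogate is emitted as \uXXXX.
ascii_out = json.dumps(lone)
# ensure_ascii=False: the surrogate passes through unescaped.
raw_out = json.dumps(lone, ensure_ascii=False)

# 0x7F (DEL) is escaped only in ascii mode.
del_ascii = json.dumps("\x7f")
del_raw = json.dumps("\x7f", ensure_ascii=False)

# The two control characters whose escape table entries were truncated.
ctrl = json.dumps("\x0b\x1b")

for value in (ascii_out, raw_out, del_ascii, del_raw, ctrl):
    print(ascii(value))
```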
ShaharNaveh
f73df6a102 Update test_json from 3.14.3 2026-02-10 21:00:40 +09:00
Jeong, YunWon
100b870175 Implement UTF-32 encode/decode and fix UTF-16 empty encode
- Add UTF-32, UTF-32-LE, UTF-32-BE encode/decode in _pycodecs.py
- Register utf_32 codec functions in codecs.rs via delegate_pycodecs
- Fix PyUnicode_EncodeUTF16 returning "" instead of [] for empty input
- Remove resolved expectedFailure decorators in test_codecs.py
- Add failure reasons to remaining expectedFailure comments
2026-02-02 12:50:34 +09:00
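
As a rough illustration of what the three UTF-32 codec variants in the
commit above are expected to produce, the sketch below uses only str.encode,
bytes.decode, and the codecs and sys modules. The sample text is an
assumption, and the byte values are those CPython produces; it does not
exercise _pycodecs.py directly.

```python
import codecs
import sys

text = "A\U0001F600"  # one BMP character and one astral character

le = text.encode("utf-32-le")   # explicit little-endian, no BOM
be = text.encode("utf-32-be")   # explicit big-endian, no BOM
bom = text.encode("utf-32")     # native byte order, BOM prepended

# Every code point takes exactly four bytes in UTF-32.
print(len(le), len(be), len(bom))

native = le if sys.byteorder == "little" else be
round_trip = bom.decode("utf-32")

# Empty input in an explicit byte order encodes to empty bytes.
empty16 = "".encode("utf-16-le")
empty32 = "".encode("utf-32-le")
```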
Lee Dogeon
5242ff5243 Bump json to 3.14.2 (#6774) 2026-01-18 19:16:48 +09:00
Lee Dogeon
ef871d227e Update json module to 3.13.11 (#6743) 2026-01-16 21:38:15 +09:00
Shahar Naveh
c497061290 Update json from 3.13.5 (#6007)
* Update `json` from 3.13.5

* Update `test_json` from 3.13.5
2025-07-20 18:44:46 +09:00
Jeong YunWon
22322fafe7 Merge pull request #2506 from fanninpm/more-expected-failures
Unskip more tests (follow-up from #2443)
2021-02-28 18:34:43 +09:00
Padraic Fanning
f1152a345c Unskip test(s) in test_json.test_unicode 2021-02-25 21:41:02 -05:00
Noah
491c4e775b Fix json.scanstring unicode handling 2021-02-20 21:04:30 -06:00
Padraic Fanning
6a21d3ce3d Explain test_bytes_decode skip 2021-02-07 15:49:42 -05:00
Noah
d92cebd953 Unskip tests that depend on \N 2020-04-14 13:06:32 -05:00
Noah
95d12d02ae Mark failing tests for test_json 2020-04-08 12:26:28 -05:00
Noah
9d136d6450 Add test.test_json from CPython 3.8.2 2020-04-08 12:16:19 -05:00