* sqlite3: fix Blob.__setitem__ value range validation
Previously, assigning an out-of-range integer (negative or > 255) or an
integer too large for i64 (e.g. 2**65) to a Blob index raised OverflowError
instead of ValueError.
Mirror CPython's ass_subscript_index logic:
- Convert the value via to_i64(), treating any overflow as -1
- Validate the result is in [0, 255], raising ValueError("byte must be in range(0, 256)") otherwise
- Separate deletion error messages: "item deletion" for index, "slice deletion" for slice
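A rough Python model of the validation order listed above. The helper name
`check_blob_byte` is hypothetical (it is not a name from the patch); the point
is only that a value overflowing i64 is collapsed to -1 first, so it reaches
the same ValueError as any other out-of-range value.

```python
I64_MIN, I64_MAX = -(2**63), 2**63 - 1


def check_blob_byte(value: int) -> int:
    """Hypothetical helper: validate a value assigned to blob[i]."""
    # Anything that does not fit in i64 is treated as -1, so it fails the
    # range check below instead of surfacing as OverflowError.
    as_i64 = value if I64_MIN <= value <= I64_MAX else -1
    if not 0 <= as_i64 <= 255:
        raise ValueError("byte must be in range(0, 256)")
    return as_i64
```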
* sqlite3: fix Blob.__setitem__ negative-step slice write
In the step != 1 branch of Blob.ass_subscript, the loop used
i_in_temp += step as usize
where step is isize. For negative steps (e.g. step = -2),
(-2isize) as usize = 18446744073709551614
causing an out-of-bounds panic whenever slice_len >= 2.
Fix: use SaturatedSliceIter (already used by the read path) to iterate
over the correct absolute blob indices, then map each index back to a
temp buffer offset via abs_idx - range_start.
Also fix a Clippy lint: replace
val < 0 || val > 255
with the idiomatic
!(0..=255).contains(&val)
Add a regression test in extra_tests/snippets/stdlib_sqlite.py that
exercises blob[9:0:-2] (negative step, slice_len=5).
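The index mapping can be sketched in Python as follows. This is a model under
assumptions, not the Rust code: `write_stepped_slice` is a hypothetical name,
and `range()` stands in for SaturatedSliceIter.

```python
def write_stepped_slice(blob: bytearray, start: int, stop: int,
                        step: int, values: bytes) -> None:
    """Hypothetical model of the read-modify-write path for step != 1."""
    indices = list(range(start, stop, step))  # absolute blob indices
    if len(indices) != len(values):
        raise IndexError("slice assignment is the wrong size")
    if not indices:
        return
    range_start, range_end = min(indices), max(indices) + 1
    temp = bytearray(blob[range_start:range_end])  # read the covered span once
    for abs_idx, byte in zip(indices, values):
        temp[abs_idx - range_start] = byte  # temp offset, never negative
    blob[range_start:range_end] = temp  # write the span back


# The old loop's cast, reproduced with modular arithmetic:
assert (-2) % 2**64 == 18446744073709551614
```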
* fix: guard blob negative-step snippet from CPython 3.11 bug
* style: add blank line after import sys in stdlib_sqlite snippet (ruff)
* Update extra_tests/snippets/stdlib_sqlite.py
---------
Co-authored-by: Jeong, YunWon <69878+youknowone@users.noreply.github.com>
* Do not call `import socket` on each send()/recv() when using rustls
Use method references cached during socket creation.
* Implement reading of at most one TLS record from socket
The previous algorithm did not account for recv() returning fewer bytes
than requested, which can happen even on blocking sockets.
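A minimal Python sketch of reading exactly one record from a blocking
socket, assuming only the standard 5-byte TLS record header (content type,
legacy version, 2-byte big-endian length). The names `recv_exact` and
`read_one_record` are hypothetical and do not come from the glue code.

```python
import struct

TLS_HEADER_LEN = 5  # content type (1) + legacy version (2) + length (2)


def recv_exact(recv, n: int) -> bytes:
    """Call recv() until exactly n bytes arrive; recv may return fewer."""
    chunks = bytearray()
    while len(chunks) < n:
        chunk = recv(n - len(chunks))
        if not chunk:
            raise EOFError("connection closed mid-record")
        chunks += chunk
    return bytes(chunks)


def read_one_record(recv) -> bytes:
    """Read one TLS record (header plus payload) and nothing beyond it."""
    header = recv_exact(recv, TLS_HEADER_LEN)
    (length,) = struct.unpack("!H", header[3:5])
    return header + recv_exact(recv, length)
```

Because each call asks for at most the bytes still missing from the current
record, a recv() that hands back a single byte at a time still terminates
and never consumes data belonging to the next record.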
* Remove special handling of rustls "buffer full" errors
First, the existing code does not actually work and leads to an
infinite loop: https://github.com/RustPython/RustPython/issues/7891
Second, this should never happen when rustls is used properly (with
respect to wants_read() and wants_write()), so any such error is an
implementation bug that must be fixed at its source.
* Replace own TlsConnection with rustls::Connection
* Fix waiting on a socket
1) Ensure that socket_wait() called from TLS glue code allows threads
2) Ensure that socket_wait() called from TLS glue code properly handles
EINTR on *nix
3) Ensure that select() or poll() error conditions are checked
4) Use poll() on *nix so socket descriptor values are not limited by
FD_SETSIZE, as they are with select()
* Remove dead code from rustls glue
* Do not present rustls errors as OSError(0, "Success")
* Remove infinite loop "detection" from rustls glue
A TLS handshake cannot be infinite. Any infinite loop here is a serious
bug in the implementation and should be fixed properly.
The detection code also fires spuriously in some cases (very short
reads), raising a misleading
`ssl_error.SSLWantReadError: The operation did not complete (read)`.
* Add test for 1-byte max recv in TLS client
* Add regression test for https://github.com/RustPython/RustPython/issues/7891
* Fix constants in rustls glue code
* Deduplicate verify flags / record-size constants
* Larger "max encrypted TLS record length"
* PyBytes.title should be ASCII-only.
* Use icu_casemap over unicode-casing for titles
`icu_casemap` is consistently maintained, official, and tracks the
latest Unicode versions. RustPython is also using other `icu4x` crates,
so using `icu_casemap` is more consistent.
As with islower and isupper, tracking the latest Unicode version is
important because character definitions shift over time which causes
discrepancies between RustPython and CPython.
This commit fixes title().
* Use icu_casemap for capitalize()
I dropped unicode-casing because it's cleaner to use icu4x for
everything. `icu4x` will also stay up to date whereas unicode-casing
will need to be periodically updated with new Unicode tables. Dropping
unicode-casing also removes some binary bloat due to the tables.
`capitalize()` mimics CPython behavior more closely now as well.
Notably, I implemented CPython's sigma edge case handler.
* Match CPython's title() exactly
CPython's `builtin_compile_impl` (Python/bltinmodule.c) accepts only
optimize ∈ {-1, 0, 1, 2}; anything else raises
`ValueError("compile(): invalid optimize value")`. The previous logic
only validated via `i32::try_into::<u8>()`, which silently accepts every
value in [0, 255], so `compile(..., optimize=3)`, `optimize=99`, etc.
were silently truncated to a u8. The error wording also had the wrong
word order.
Replace the cast-based check with a `match` against the spec range.
Adjacent: the unrecognised-flags message used American spelling
("unrecognized") and missed the colon separator. CPython uses British
"unrecognised" with a colon — match it.
Verified byte-identical with CPython 3.14.4 across 12 boundary values
for optimize and 4 cases for flags. Preserves the existing OverflowError
path for `optimize=1 << 1000` (raised at the ArgPrimitiveIndex<i32>
conversion layer, before this check).
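For illustration, the accepted set can be modelled in Python as below.
`check_optimize` is a hypothetical stand-in for the Rust `match`, and
`compile_rejects` probes whatever interpreter runs it.

```python
def check_optimize(optimize: int) -> int:
    """Hypothetical helper mirroring the accepted set {-1, 0, 1, 2}."""
    if optimize not in (-1, 0, 1, 2):
        raise ValueError("compile(): invalid optimize value")
    return optimize


def compile_rejects(optimize: int) -> bool:
    """Report whether the running interpreter rejects this optimize value."""
    try:
        compile("1", "<demo>", "eval", optimize=optimize)
    except ValueError:
        return True
    return False
```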
CPython raises `OverflowError("cannot convert float infinity to integer")`
and `ValueError("cannot convert float NaN to integer")` from
`Objects/floatobject.c::float___trunc___impl` and friends. The exception
type name is added by Python's traceback display layer; the message
itself should not duplicate it.
`try_to_bigint` was producing
`OverflowError("OverflowError: cannot convert ...")` etc., which made
`repr(e)` and any code path that inspects `str(e)` diverge from CPython.
Affects all 5 callers of `try_to_bigint`: `__int__`, `__floor__`,
`__ceil__`, `__round__` (no-arg), `__trunc__` — i.e. `int(x)`,
`math.floor/ceil/trunc(x)`, `round(x)` for non-finite floats.
Verified byte-identical with CPython 3.14.4 across 14 affected sites.
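A small Python probe for the behaviour described above. `message_for` is a
hypothetical helper; it returns the exception type name and `str(e)`
separately so a duplicated prefix would be visible.

```python
import math


def message_for(func, value):
    """Return (exception type name, str(exception)) for func(value)."""
    try:
        func(value)
    except (OverflowError, ValueError) as exc:
        return type(exc).__name__, str(exc)
    raise AssertionError("expected an exception")


inf_result = message_for(int, float("inf"))
nan_result = message_for(math.floor, float("nan"))
```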
* Round float at the decimal level to match CPython's _Py_dg_dtoa
CPython's `float.__round__` (Objects/floatobject.c) routes through
`_Py_dg_dtoa` and rounds at the decimal level. The previous
`round_float_digits` multiplied by 10**ndigits and rounded at the
IEEE 754 binary level, which diverges for values that aren't exactly
representable. For example, 2.675 stores as 2.67499...; dtoa correctly
rounds it down to 2.67, but `(2.675 * 100.0).round() / 100.0` lands on
2.68 because the multiplication produces a phantom 267.5 tie that
`round()` (which rounds halves away from zero) snaps up.
Rust's `{:.*}` float formatting uses dtoa-style algorithms (Grisu3 +
Dragon4 fallback) and matches CPython's `_Py_dg_dtoa` byte-for-byte.
Replace the multiply-then-round step with `format!` + `parse` for
ndigits >= 0. The ndigits < 0 path is unchanged because dividing
typical inputs by 10**|ndigits| produces genuine ties rather than
synthesizing them.
Verified byte-identical with CPython 3.14.4 over a 108-case random
fuzz plus targeted half-tie probes. Unmasks
`test_float.RoundTestCase.test_matches_float_format` and
`test_previous_round_bugs`.
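The difference between the two strategies can be reproduced in Python. Both
function names are hypothetical, and Python's `round()` stands in for the
Rust call; both map 267.5 to 268.

```python
def round_binary(x: float, ndigits: int) -> float:
    """The old approach: scale, round in binary, scale back."""
    scale = 10.0 ** ndigits
    return round(x * scale) / scale


def round_decimal(x: float, ndigits: int) -> float:
    """The new approach for ndigits >= 0: format to a string, then parse."""
    return float(f"{x:.{ndigits}f}")
```

For 2.675 the scaled product is exactly 267.5, so `round_binary` sees a tie
that the stored value never had, while `round_decimal` agrees with the
built-in `round(2.675, 2)`.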
* Use #[expect] with reason for float_cmp suppression
Co-authored-by: ShaharNaveh <50263213+ShaharNaveh@users.noreply.github.com>
---------
Co-authored-by: ShaharNaveh <50263213+ShaharNaveh@users.noreply.github.com>
float_ops::divmod, mod_, and floordiv each carried their own conversion
from Rust's dividend-sign `%` to CPython's divisor-sign convention. Both
divmod and mod_ mishandled the zero-remainder case where the dividend
is a non-zero exact multiple of the divisor (e.g. divmod(6.0, -3.0),
6.0 % -3.0): the sign-correction branch fired on a zero remainder and
produced (-3.0, -3.0) and -3.0 respectively, violating the magnitude
invariant 0 <= abs(r) < abs(b). divmod also leaked the wrong signed-
zero quotient when the true quotient was zero (divmod(-1.0, -2.0)
returned (-0.0, -1.0) instead of (+0.0, -1.0)).
These are independent bugs in two functions, but both come from the
same root cause: zero-remainder needs a separate path from the sign-
correction branch.
Mirror CPython's `_float_div_mod` (Objects/floatobject.c) by making
divmod the canonical implementation and turning mod_ and floordiv into
thin wrappers. divmod(a, b) == (a // b, a % b) now holds by
construction.
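A Python transliteration of the shape of CPython's float divmod helper,
offered as a sketch rather than the Rust implementation. `float_divmod` is
a hypothetical name and `math.fmod` plays the role of Rust's `%`.

```python
import math


def float_divmod(vx: float, wx: float) -> tuple[float, float]:
    if wx == 0.0:
        raise ZeroDivisionError("division by zero")
    mod = math.fmod(vx, wx)  # remainder takes the dividend's sign
    div = (vx - mod) / wx
    if mod:
        # Only a non-zero remainder is moved to the divisor's sign.
        if (wx < 0) != (mod < 0):
            mod += wx
            div -= 1.0
    else:
        mod = math.copysign(0.0, wx)  # zero remainder: separate path
    if div:
        floordiv = float(math.floor(div))
        if div - floordiv > 0.5:
            floordiv += 1.0
    else:
        floordiv = math.copysign(0.0, vx / wx)  # zero quotient keeps true sign
    return floordiv, mod
```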
Closes #7722
PR #7727 added an overflow guard in float_pow's negative-base
non-integer-exponent branch (which delegates to complex_pow). The
remaining else branch — covering positive base or negative-base
integer exponent — directly returned v1.powf(v2) without inspecting
the result, so finite inputs that produced an out-of-range value
silently leaked f64::INFINITY (or -INFINITY) instead of raising.
Examples that now raise OverflowError as in CPython:
pow(2.0, 2000)
pow(10.0, 400)
pow(-2.0, 2001)
pow(0.5, -2000)
pow(1e150, 3)
Mirror the inline overflow guard already used in complex.rs::complex_pow:
when the result is infinite but neither input is, raise OverflowError.
The both-finite check preserves intentional infinities like
pow(inf, 2.0).
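The guard condition, sketched in Python. `check_pow_result` is a
hypothetical name and its error message is illustrative only; the raw
result is passed in because Python's own `pow` already raises.

```python
import math


def check_pow_result(base: float, exp: float, result: float) -> float:
    """Hypothetical guard: reject an infinity that finite inputs produced."""
    if math.isinf(result) and math.isfinite(base) and math.isfinite(exp):
        raise OverflowError("numerical result out of range")
    return result
```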
Closes #7729
* Fix del obj.__dict__ to match CPython behavior (issue #5355)
* Address CodeRabbit concerns: fix GC clearing and improve thread safety of lazy __dict__ recreation
* Fix del obj.__dict__: improve GC safety, implement lazy re-creation in setattr, and enable passing CPython tests
* Restore expectedFailure for test_has_inline_values
* Fix ObjExt::new call site to include has_dict parameter
* Remove stray test.py to avoid CI syntax errors
* Remove debug txt files and clean test_class.py comments
* Delete Lib/test/test_class.py
* Restore test_class.py with correct changes (remove expectedFailure, no deletion)
* Fix clippy warnings: remove unused into_inner, collapse nested if-let
* Fix rustfmt formatting and ruff PEP 8 E302 blank line
* Align __dict__ error messages and ensure safety for function/partial objects
* Fix compilation errors: change &self to &Py<Self> in __dict__ methods
* Fix compilation errors: resolve borrow-after-move and replace transpose on PySetterValue
- type.rs: Replace invalid .transpose() on PySetterValue with explicit
match on Assign/Delete variants in subtype_set_dict
- function.rs: Fix borrow-after-move in set___dict__ by capturing class
name before downcast; use as_object() for instance_dict/set_dict calls
- _functools.rs: Same borrow-after-move fix and as_object() calls for
PyPartial's __dict__ getter/setter
* Fix compilation errors: resolve borrow-after-move and replace transpose on PySetterValue
* Fix snippet formatting and mark test_remote as expected failure
* Fix test_remote by removing HAS_DICT flag from function type
* Fix lint formatting error
* Remove unnecessary print statement in test_del_dict
* Fix trailing newlines in snippet test
* Trigger CI
* Align __dict__ generic setter behavior
* Move __dict__ deletion tests to relevant snippets
---------
Co-authored-by: Jeong, YunWon <jeong@youknowone.org>
* Enforce int_max_str_digits on int-to-str conversions
The str-to-int direction already enforced sys.get_int_max_str_digits()
via bytes_to_int; the int-to-str direction did not. CPython 3.14 enforces
both as its mitigation for CVE-2020-10735.
Adds check_int_to_str_digits helper in builtins::int (bit-count fast path
+ digit upper-bound from log10(2)), wired into the four Python-level
entry points: repr, the str fast path in protocol::object, int.__format__
(decimal/n/empty spec only — binary bases x/o/b are exempt per CPython),
and the DecimalD/I/U branches of vm::cformat for both str % and bytes %.
Unmasks 8 expectedFailure tests across test_int (max_str_digits, DoS
prevention, int_from_other_bases — each mirrored in IntSubclass),
test_ast (test_repr_large_input_crash) and test_reprlib (test_numbers).
Boundary cases (4299/4300/4301 digits at limit=4300) match CPython 3.14.4.
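A simplified Python model of a bit-count fast path followed by an exact
check. Note that it swaps in a coarser integer-only bound (8**limit <
10**limit) for the log10(2)-based bound the commit describes, and that
`exceeds_digit_limit` is a hypothetical name.

```python
def exceeds_digit_limit(n: int, limit: int) -> bool:
    """Hypothetical check: does abs(n) need more than `limit` decimal digits?"""
    if limit == 0:  # 0 disables the limit in CPython
        return False
    # Fast path: abs(n) < 2**bits <= 8**limit < 10**limit, so it fits.
    if n.bit_length() <= 3 * limit:
        return False
    return abs(n) >= 10**limit  # exact check for the remaining cases
```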
* Skip int-to-str DoS test on platforms without time.get_clock_info
The test_denial_of_service_prevented_int_to_str regression test uses
support.Stopwatch, which calls time.get_clock_info('monotonic'). In
RustPython that function is gated to unix/windows targets only, so on
wasm32-wasip1 it surfaces as AttributeError and breaks the wasm-wasi CI.
Guard the test with skipUnless(hasattr(time, 'get_clock_info'), ...) so
it runs everywhere it can and is skipped on wasm.
Also narrow is_decimal_int_format to Number(Case::Lower): 'N' is rejected
by format_int as UnknownFormatCode, so excluding it preserves that error
path instead of intercepting it with the digit-limit check.
* Add TODO: RUSTPYTHON marker to skipUnless reason
scripts/update_lib uses TODO: RUSTPYTHON markers inside unittest
decorator reason strings to identify and migrate custom RustPython
patches across CPython library updates.
* Use expectedFailureIf for wasm get_clock_info gap
skipUnless silently hides the test forever; expectedFailureIf surfaces
unexpected success once RustPython implements time.get_clock_info on
wasm, prompting marker removal.
* Report invalid \uXXXX escape position at the u character
CPython's json decoder reports the position of the `u` specifier
when a \uXXXX escape fails to parse, but RustPython was reporting
the preceding `\`. For surrogate-pair cases (\uXXXX\uYYYY) the
second call was passing char_offset + next_char_i + 1, which
lands on the first hex digit of the first escape -- unrelated to
the actual failure site.
Pass next_char_i (position of the primary `u`) to the primary
decode_unicode call, and capture the second `u`'s char index from
the next_tuple peek to pass to the surrogate-pair decode_unicode
call.
Verified: 13 targeted probes across invalid-hex, short, and pair
cases now all match CPython positions. test.test_json 214 tests
pass with no regressions.
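The expected positions can be probed with a few lines of Python.
`escape_error_pos` is a hypothetical helper and the two documents are
illustrative inputs, not the probes used for the verification above.

```python
import json


def escape_error_pos(document: str) -> int:
    """Return the character index reported for a bad \\uXXXX escape."""
    try:
        json.loads(document)
    except json.JSONDecodeError as exc:
        return exc.pos
    raise AssertionError("expected JSONDecodeError")


single = '"\\uZZZZ"'       # quote, backslash, u, four non-hex characters
pair = '"\\ud834\\uZZZZ"'  # valid high surrogate, then a broken escape
```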
* Add regression test for invalid \uXXXX escape position
* Use raise AssertionError instead of assert False (B011)
* Match CPython error type for non-ASCII struct format arguments
Struct() raised the wrong exception type when the format argument
contained non-ASCII characters:
- str input with non-ASCII char: RustPython raised UnicodeDecodeError
with an empty message; CPython raises UnicodeEncodeError as if
format.encode('ascii') had been called directly.
- bytes input with non-ASCII byte: same wrong UnicodeDecodeError;
CPython passes the bytes through to the format parser, which then
errors with struct.error("bad char in struct format").
Restructure IntoStructFormatBytes::try_from_object to:
- raise UnicodeEncodeError("ascii", s, start, start+1, "ordinal not
in range(128)") for non-ASCII str, with start computed as the
first non-ASCII code point position (matching CPython's natural
encoding-error format);
- raise struct.error("bad char in struct format") for non-ASCII bytes,
produced via the existing new_struct_error helper.
Probed byte-identical with CPython 3.14.4 for both cases. Full
test.test_struct (43 tests) passes with no regressions. Sanity-tested
all standard format/pack/unpack/calcsize call shapes remain unchanged.
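A short Python probe for the two cases. `error_type` is a hypothetical
helper that reports which exception class `struct.Struct` raises.

```python
import struct


def error_type(fmt) -> type:
    """Return the exception class raised by struct.Struct(fmt)."""
    try:
        struct.Struct(fmt)
    except Exception as exc:
        return type(exc)
    raise AssertionError("expected an exception")


str_case = error_type("\u00e9")  # non-ASCII str format
bytes_case = error_type(b"\xe9")  # non-ASCII bytes format
```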
* Add regression test for non-ASCII format string error types
* Use raise AssertionError instead of assert False (B011)
This PR fixes a regression from my last islower/isupper patch.
Python's Bytes doesn't assume an encoding, so methods like islower
should only consider ASCII casing.
I updated islower/isupper for UTF-8 and WTF-8 to match CPython more
closely. The two functions now use the same properties as CPython and
now match CPython exactly.
I updated the unit tests to pass on Python 3.15. Unicode updates
sometimes cause properties to shift. I previously tested everything on
Python 3.14, but that led to failures that I assumed were bugs but were
actually due to Unicode differences. For example, U+0295 is a lower case
letter in older Unicode versions but is NOT in newer versions.
One of the new tests is disabled on Python 3.14 for now because it will
fail in CI till CI is bumped to 3.15.
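The str/bytes difference in a few lines of Python, as an illustration of
the ASCII-only rule for bytes (the variable names are just for the demo):

```python
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
encoded = text.encode("utf-8")  # b'\xc3\xa9': no ASCII cased bytes

str_result = text.islower()  # str consults Unicode properties
bytes_result = encoded.islower()  # bytes only sees ASCII letters
mixed_result = b"abc\xc3\xa9".islower()  # ASCII letters decide the answer
```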
* Fix process abort on large float format precision
Formatting a float with a large precision (>= ~65535) aborted the
interpreter with a Rust panic instead of returning a result. CPython
handles the same input by returning a clean string.
# Before
./rustpython -c "print(f'{1.5:.1000000}')"
thread 'main' panicked at crates/literal/src/float.rs:135:
Formatting argument out of range (exit 101, abort)
# After
./rustpython -c "print(f'{1.5:.1000000}')"
1.5
Root cause: Rust's `format!("{:.*}", n, x)` panics when `n`
exceeds the fmt runtime's internal precision limit. `format_fixed`
already caps `n` at u16::MAX, but `format_general` and
`format_exponent` (and the `%` branch in `crates/common/src/format.rs`)
passed user-supplied precision straight through to `format!`.
Fix:
* Introduce `FMT_MAX_PRECISION` + `clamp_fmt_precision()` in
crates/literal/src/float.rs. Cap is `u16::MAX - 1` because
`{:.*e}` hits a second panic (`ndigits > 0` in core flt2dec)
at exactly u16::MAX; the smaller value covers both paths.
* Apply the helper to `format_fixed` (replacing the existing
ad-hoc cap), `format_exponent` (entry), and `format_general`
(three separate format! calls with saturating arithmetic on
derived precision values).
* Apply the helper in the `FormatType::Percentage` branch in
crates/common/src/format.rs.
This is harmless for all normal inputs — f64 carries only ~17
significant digits, so precision beyond 65K is padding zeros at
best. Complex-number and old-style `%`-formatting paths transitively
benefit because they dispatch to the same library functions.
Verified:
* cargo run -- -m test test_float test_fstring test_format:
144 passed, 0 regressed.
* extra_tests/snippets/builtin_format.py: all assertions pass,
including 7 new regression cases covering e / E / g / G / f /
% at precision 1_000_000.
* Probed with 10 magnitude values (0, ±1.5, ±inf, nan, 1e-300,
1e300, f64::MAX, 5e-324) x 4 format types = 40 combinations,
plus precision 0/1/2 boundary, complex formatting, old-style
`%` formatting, and combined specs (fill/align/sign/grouping/
zero-pad). All return clean strings; no process abort.
* Address CodeRabbit review: split cap + drop redundant clamp
Two refinements after CodeRabbit review:
1. Drop the redundant `format!("{:.*}", precision + 1, base)` in
`format_general`'s scientific branch. It was a no-op pre-fix
(magnitude is `.abs()`-ed at the caller, so `base` has no sign
and its length was exactly `precision + 1`), but after I added
the cap it turned into an active truncate — dropping 1 char of
precision at the cap boundary. Reuse `base` directly and extract
`exp_precision` for reuse by `decimal_point_or_empty`.
2. Split the cap into two helpers.
`FMT_MAX_PRECISION = u16::MAX` — for plain `{:.*}` (format_fixed,
%-branch, format_general's
non-scientific branch).
`FMT_MAX_EXP_PRECISION = u16::MAX - 1` — for `{:.*e}` (format_exponent,
format_general's scientific
entry).
The second value is one lower because `{:.*e}` trips an additional
`ndigits > 0` assertion in `core::num::flt2dec` at exactly
`u16::MAX`. The first commit used the tighter cap uniformly,
which silently regressed `format_fixed` by 1 char at
`precision == u16::MAX` (it previously capped at exactly that
value). Two helpers restore byte-identical CPython parity for
fixed / percent / general-non-scientific paths up through
`precision == u16::MAX`.
Verification:
* precision 5 .. 65534: 360 outputs byte-identical to CPython
across 8 magnitudes x 9 precisions x 5 types.
* precision == 65535: f / g / G / % now match CPython (0 diff).
e / E remain 1 char shorter — unavoidable
within the `u16::MAX - 1` exp cap.
* precision > 65535: output stops at cap; CPython emits full
padding — same design divergence as before.
* No panic regression: f-string default, e/E, g/G, %, f at
precision 1_000_000 all return cleanly.
* Test suite: test_float + test_fstring + test_format,
162 passed, 0 regressed.
* Fix ruff format: single-line precision clamp
* Address @youknowone review: byte-identical CPython parity at boundary
Per review comment on `extra_tests/snippets/builtin_format.py:209`:
the patch declares `FMT_MAX_PRECISION = u16::MAX`, so the tests must
cover 65535 and 65536 and demonstrate CPython parity at the boundary.
The previous version only avoided panic — at the cap it silently
truncated 1 char short of CPython for e / E, and thousands of chars
short for f / % at precision beyond the cap. This commit restores
byte-identical CPython output at every precision up to the format-
spec parser's own `i32::MAX` ceiling.
Fix: pad the Rust-format result with '0's up to the user-requested
precision.
Why this is correct, not a workaround: IEEE 754 double has at most
~767 significant decimal digits; past that, every digit is
deterministically '0' in both CPython and the native Rust output.
Our cap (65534 for exp, 65535 for plain) sits far above 767, so
appending zeros reconstructs precisely what CPython would have
produced. Verified on hard inputs: `1e-100`, `5e-324` (subnormal
boundary), `f64::MAX`, mixed magnitudes — the last 100 chars of
Rust-format output at precision 65534 are all '0' for every case.
Changes:
* `format_fixed`: after format!(), extend with (precision - capped)
'0' chars before appending the optional decimal point.
* `format_exponent`: same, applied to the parsed mantissa before
reassembling with the exponent marker.
* `FormatType::Percentage` branch: same. Also fixed a bug the
boundary audit surfaced: the finite-input overflow guard used
`return Ok("inf%")`, which bypasses the outer sign handler.
Changed to a match-arm value so `format_sign_and_align` still
runs and produces "-inf%" for `-f64::MAX`, matching CPython.
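The cap-then-pad idea for the fixed path can be modelled in Python, where
the built-in `format` stands in for Rust's `format!`. `PLAIN_CAP` and
`format_fixed_capped` are hypothetical names used only for this sketch.

```python
PLAIN_CAP = 65535  # u16::MAX, the cap used for plain fixed formatting


def format_fixed_capped(value: float, precision: int) -> str:
    """Format with at most PLAIN_CAP digits, then pad with zeros."""
    capped = min(precision, PLAIN_CAP)
    text = format(value, f".{capped}f")  # stand-in for the capped format!()
    # Every digit past the cap is '0' for any finite double, so padding
    # reconstructs what an uncapped formatter would have produced.
    return text + "0" * (precision - capped)
```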
Verification:
* 7 magnitudes × 5 precisions × 6 format types = 210 comparisons
against CPython at precisions {65534, 65535, 65536, 100000,
200000}. All 210 byte-identical.
* Gap audit (complex formatting, old-style % formatting, negative
magnitudes, -0.0, combined specs with fill / sign / alternate /
grouping) at boundary precisions. All but 20 byte-identical.
The 20 remaining diffs all stem from a pre-existing
complex-imaginary-part repr bug (`1e100j` expands to 100 '0's
in RustPython vs CPython's `1e+100j`) which reproduces on
upstream main without any part of this patch and is out of
scope here.
* `cargo run -- -m test test_float test_fstring test_format`:
162 passed, 0 regressed.
* `extra_tests/snippets/builtin_format.py` now pins exact
expected strings at 65534 / 65535 / 65536 / 1_000_000 for
every format type, plus the `f64::MAX × 100 → 'inf%'`
overflow case.
* `cargo fmt --check`: pass.
* Clarify boundary test labels + add past-cap depth assertions
Rename the boundary-test section so the three precision points per
format type are labeled below / at / past the cap inline, making the
"past MAX_PRECISION" unhappy-case coverage explicit. Add len-based
assertions at precision 1_000_000 for f, e, and % to exercise the
cap-then-pad path at a depth far beyond the boundary.
* Fix complex repr to use scientific notation for large integer-valued components
repr of a complex number whose real or imaginary part is an integer-valued
float with |x| >= 1e16 emitted the full decimal expansion instead of
scientific notation, diverging from CPython:
Before (RustPython):
repr(1e100 + 1e100j)
(10000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000+1000000000000000
000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000j)
After / CPython:
(1e+100+1e+100j)
Root cause in crates/literal/src/complex.rs::to_string — it bifurcated
each component by .fract() == 0.0:
if im.fract() == 0.0 { im.to_string() } // Rust's default Display
else { float::to_string(im) } // scientific for large/small
Rust's Display never uses scientific notation, so any integer-valued f64
(including 1e16, 1e17, 1e100 which are exactly representable as integers)
routed through the wrong branch and produced the full decimal expansion.
Non-integer magnitudes reached float::to_string and rendered correctly.
The fix is to use one helper per component that implements CPython's
actual PyOS_double_to_string(format='r') rule: scientific notation when
|x| < 1e-4 or |x| >= 1e16, otherwise Rust's default Display (which drops
the trailing '.0' for integer-valued floats — matching CPython's
(1+2j) convention rather than (1.0+2.0j)). The threshold matches
float::to_string; the only behavioral difference is that complex
components render 1.0 as "1" rather than "1.0".
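The per-component rule can be modelled in Python by leaning on the float
repr, which already switches to scientific notation at the same thresholds.
`complex_component` is a hypothetical helper for finite values only.

```python
def complex_component(x: float) -> str:
    """Hypothetical helper: one finite component of repr(complex)."""
    text = repr(x)  # scientific for |x| < 1e-4 or |x| >= 1e16
    # Complex components drop the trailing ".0" of integer-valued floats.
    return text[:-2] if text.endswith(".0") else text
```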
Verified:
* 29 CPython reference cases (normal / boundary / extremes / special /
signed-zero) — all byte-identical after fix.
* 18 additional edge cases (subnormal 5e-324, f64::MAX, MIN_POSITIVE,
DBL_EPSILON, threshold-straddling values) — all byte-identical.
* Lib/test/test_complex.py::test_repr_str /
test_negative_zero_repr_str / test_repr_roundtrip — all pass.
* cargo run -- -m test test_complex — 37 passed.
* cargo run -- -m test test_float test_long — 101 passed.
* ast.unparse() round-trip of source containing complex literals
(e.g. 1e100 + 1e-100j, 1e17 + 1j) produces CPython-identical output.
* extra_tests/snippets/builtin_complex.py — 20+ new regression cases.
* Address CodeRabbit review: clarify threshold boundary test comment
The comment claimed all three assertions stay in non-scientific form,
but the 1e-5 case explicitly verifies scientific notation (since
|1e-5| < 1e-4 falls outside the decimal-form range). Reworded the
header to describe the axis being tested (threshold boundary) and
added per-case inline notes indicating each assertion's expected
form.
* Fix stack overflow on deeply-nested JSON in json.loads()
json.loads() on a deeply-nested array or object payload (e.g.
'[' * 50000 + ']' * 50000) overflowed the native Rust stack and
crashed the interpreter process with SIGSEGV. CPython raises
RecursionError on the same input via _Py_EnterRecursiveCall in
Modules/_json.c.
The recursion lives in the mutual call chain:
JsonScanner::parse_object / parse_array
-> JsonScanner::call_scan_once
-> JsonScanner::parse_object / parse_array
Every descent funnels through call_scan_once, so wrapping its body
with vm.with_recursion covers both '{' and '[' paths (and their
mixed nesting) with a single guard.
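A toy Python parser showing why one guard on the shared entry point is
enough. Everything here is hypothetical: `DepthGuard` plays the role of
vm.with_recursion, and the parser only handles nested arrays of integers.

```python
class DepthGuard:
    """Count nested entries and refuse to go deeper than `limit`."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.depth = 0

    def __enter__(self) -> None:
        if self.depth >= self.limit:
            raise RecursionError("maximum recursion depth exceeded")
        self.depth += 1

    def __exit__(self, *exc_info) -> None:
        self.depth -= 1


def scan_once(text: str, pos: int, guard: DepthGuard):
    with guard:  # every descent passes through here
        if text[pos] == "[":
            return parse_array(text, pos + 1, guard)
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        if end == pos:
            raise ValueError(f"unexpected character at {pos}")
        return int(text[pos:end]), end


def parse_array(text: str, pos: int, guard: DepthGuard):
    items = []
    if text[pos] == "]":
        return items, pos + 1
    while True:
        value, pos = scan_once(text, pos, guard)  # recursion re-enters guard
        items.append(value)
        if text[pos] == ",":
            pos += 1
        elif text[pos] == "]":
            return items, pos + 1
        else:
            raise ValueError(f"expected ',' or ']' at {pos}")
```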
Before:
./rustpython -c "import json; json.loads('[' * 50000 + ']' * 50000)"
-> SIGSEGV (exit 139)
After:
-> RecursionError: maximum recursion depth exceeded while
decoding a JSON object from a string
Verified:
- extra_tests/snippets/stdlib_json.py: all assertions pass
(includes 3 new regression cases: array, object, alternating
nesting at depth 100000)
- cargo run -- -m test test_json: 214 passed, 0 regressed
(9 skipped, 13 expected failures, all pre-existing)
- depth 500000 no longer crashes (RecursionError)
- shallow parsing unchanged
* Enable test_highly_nested_objects_decoding
Per @ShaharNaveh's review on #7632: this test was previously marked
`@unittest.skip("TODO: RUSTPYTHON; crashes")` because json.loads
would SIGSEGV on the 500_000-deep input. The recursion-guard added
in this PR makes it raise RecursionError like CPython, so the skip
decorator can be removed.
$ cargo run -- -m unittest \
test.test_json.test_recursion.TestCRecursion.test_highly_nested_objects_decoding \
test.test_json.test_recursion.TestPyRecursion.test_highly_nested_objects_decoding
...
Ran 2 tests in 0.825s
OK
$ cargo run -- -m test test_json
Ran 214 tests (7 skipped, 13 expected failures) — all pass.
* Fix struct_time field overflow to raise OverflowError in time module
* Address CodeRabbit review: cover tm_gmtoff and chain AssertionError
* Fix ruff format: single space before inline comment
Rust and Python differ in which Unicode properties they use for
alphanumeric, numeric, et cetera. Both languages document the properties
they use, which makes it easy to mimic Python's behavior in Rust.
My previous patch was a bit shortsighted because I filtered out
combining characters from is_alphanumeric. Using properties is exact and
also much cleaner. It also covers edge cases that my initial approach
missed.
Besides isalnum, I also fixed isnumeric and isdigit in the same way by
using properties.
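A property-based model of `str.isalnum()` for a single character, written
in Python with `unicodedata`. It follows the documented definition (letter
categories, or any numeric type); `py_isalnum` is a hypothetical name and
this is not the Rust implementation.

```python
import unicodedata

LETTER_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo"}


def py_isalnum(ch: str) -> bool:
    """Hypothetical property-based model of str.isalnum() for one character."""
    is_alpha = unicodedata.category(ch) in LETTER_CATEGORIES
    is_numeric = unicodedata.numeric(ch, None) is not None
    return is_alpha or is_numeric
```

A combining mark such as U+0301 has category Mn and no numeric value, so it
is rejected without any special-case filtering.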
* fix: Python-Rust combining char diff in isalnum
Related to: #7518
Rust and Python differ on alphanumeric characters. Rust follows the
Unicode standard more closely than Python does, so is_alphanumeric
(char function in Rust) differs from isalnum (Python). To fix the
discrepancy, RustPython needs to mimic Python by rejecting certain
characters. Some classes of combining characters count as alphanumeric
in Rust but not Python. Combining characters are accent marks
that are combined with other characters to create a single grapheme.
It's possible that this PR is not exhaustive. I fixed the combining
character issue BUT I don't know the full range of discrepancies.
* fix: Ignore combining characters in SRE
Closes: #7518
* fix: Handle char expansion in islower, isupper
Closes: #7526
`py_islower` and `py_isupper` need to handle expansions for letter
casing. Comparing chars directly can miss edge cases in certain
languages. Unfortunately, like the last PR, this allocates to handle
potential expansions.
I also had to add `icu_casemap` as a dependency.
RustPython is already using parts of icu4x so this doesn't add many
transitive dependencies.
* Ensure islower/isupper handles strs without chars
This fixes a regression mentioned by CodeRabbit. I also figured out how
to check a string's case without allocation using Unicode properties.
Thus, this commit removes `icu_casemap` again. `icu_casemap` and my old
solution are required for a fully robust case check, but the current
code appears to be sufficient for matching Python.
- Use POP_TOP instead of POP_ITER for for-loop break/return cleanup
- Expand duplicate_end_returns to clone final return for jump predecessors
- Restrict late jump threading pass to unconditional jumps only
- Skip exception blocks in inline/reorder passes
- Simplify threaded_jump_instr NoInterrupt handling
`swapcase` used `to_ascii_lowercase` and its uppercase counterpart to
swap cases. This is fine for ASCII, but code points may expand into
multiple bytes, which leads to incorrect case swaps for some languages.
The fix is to use `to_lowercase` and `to_uppercase` instead.
Unfortunately, this leads to a realloc in `swapcase` when bytes are
expanded.
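An example of such an expansion, shown in Python for reference (the
variable names are just for the demo):

```python
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S
swapped = sharp_s.swapcase()  # full case mapping yields two code points
ascii_only = "aBc".swapcase()  # ASCII never changes length
```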
Part of #7526.
When an inlined comprehension's first iterator expression contains
nested scopes (such as a lambda), those scopes' sub_tables appear at the
current position in the parent's sub_table list. The previous code
spliced the comprehension's own child sub_tables (e.g. inner inlined
comprehensions) into that same position before compiling the iterator,
which shifted the iterator's sub_tables to wrong indices.
Move the splice after the first iterator is compiled so its sub_tables
are consumed at their original positions.
Fixes nested list comprehensions like:
```python
[[x for _, x in g] for _, g in itertools.groupby(..., lambda x: ...)]
```
Disclosure: I used AI to develop the patch though I was heavily
involved.
* Use patched parking_lot_core with fork-safe HASHTABLE reset
parking_lot_core's global HASHTABLE retains stale ThreadData after
fork(), causing segfaults when contended locks enter park(). Use the
patched version from youknowone/parking_lot (rustpython branch) which
registers a pthread_atfork handler to reset the hash table.
Unskip test_asyncio TestFork. Add Manager+fork integration test.
* Unskip fork-related flaky tests after parking_lot fix
With parking_lot_core's HASHTABLE now properly reset via
pthread_atfork, fork-related segfaults and connection errors
in multiprocessing tests should be resolved.
Remove skip/expectedFailure markers from:
- test_concurrent_futures/test_wait.py (6 tests)
- test_concurrent_futures/test_process_pool.py (1 test)
- test_multiprocessing_fork/test_manager.py (all WithManagerTest*)
- test_multiprocessing_fork/test_misc.py (5 tests)
- test_multiprocessing_fork/test_threads.py (2 tests)
- _test_multiprocessing.py (2 shared_memory tests)
Keep test_repr_rlock skipped (flaky thread start latency,
not fork-related).
* Add missing _winapi functions and fix WinHandle int conversion
Add 13 functions: ReadFile, SetNamedPipeHandleState, CreateFileMapping,
OpenFileMapping, MapViewOfFile, UnmapViewOfFile, VirtualQuerySize,
CopyFile2, ResetEvent, CreateMutex, OpenEventW, LoadLibrary,
_mimetypes_read_windows_registry.
Add constants: INVALID_HANDLE_VALUE, FILE_MAP_READ/WRITE/COPY/EXECUTE.
Change WinHandle integer type from usize to isize so negative values
like INVALID_HANDLE_VALUE (-1) can be passed from Python.
* Align _winapi module with CPython
- Rename winapi.rs to _winapi.rs with #[path] attribute
- Rename CreateMutex to CreateMutexW
- Add missing constants: ERROR_ACCESS_DENIED, ERROR_PRIVILEGE_NOT_HELD,
PROCESS_ALL_ACCESS, 10 STARTF_ constants, LOCALE_NAME_SYSTEM_DEFAULT,
LOCALE_NAME_USER_DEFAULT, COPY_FILE_DIRECTORY
- Fix OpenMutexW return type and ReleaseMutex param type to use WinHandle
* Fix ReadFile/WriteFile overlapped keyword argument
Use FromArgs structs so the overlapped parameter can be passed as
a keyword argument (overlapped=True), matching the CPython API.
* Remove extra constants and LoadLibrary not in CPython _winapi
Remove 19 constants (WAIT_ABANDONED, CREATE_ALWAYS, CREATE_NEW,
OPEN_ALWAYS, TRUNCATE_EXISTING, FILE_ATTRIBUTE_NORMAL, 8 FILE_FLAG_*,
3 FILE_SHARE_*, NMPWAIT_NOWAIT, NMPWAIT_USE_DEFAULT_WAIT) and the
LoadLibrary function, none of which are present in CPython's _winapi module.
* Fix utf8_mode default to 0 and add PYTHONUTF8 env var support
Default utf8_mode was incorrectly set to 1, causing text-mode
subprocess to always decode as UTF-8 instead of locale encoding.
Changed default to 0 to match CPython 3.13 behavior on Windows.
Added PYTHONUTF8 environment variable handling with -X utf8 override.
* Fix CopyFile2 to raise proper OSError subclass
Use std::io::Error::from_raw_os_error instead of vm.new_os_error so
that winerror attribute is set and errno-to-exception mapping works
(e.g. ERROR_ACCESS_DENIED → PermissionError).
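The errno-to-exception mapping mentioned above is observable from Python:
constructing OSError with an errno yields the matching subclass. This sketch
uses the portable errno.EACCES rather than the Windows-only winerror path, so
it illustrates the mapping in general, not the CopyFile2 change itself:

```python
import errno

# Constructing OSError with (errno, message) selects the matching subclass.
err = OSError(errno.EACCES, "access denied")

print(type(err).__name__, err.errno == errno.EACCES)
```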
* Fix syntax_non_utf8 test to not depend on locale encoding
Use explicit encoding='latin-1' so the test works regardless of
the system locale (e.g. C/POSIX locale uses ASCII by default).
* Add compile_bool_op_inner and optimize nested opposite-operator BoolOps to avoid redundant __bool__ calls
When a nested BoolOp has the opposite operator (e.g., `And` inside `Or`),
the inner BoolOp's short-circuit exits are redirected to skip the outer
BoolOp's redundant truth test. This avoids calling `__bool__()` twice on
the same value (e.g., `Test() and False or False` previously called
`Test().__bool__()` twice instead of once).
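A small way to observe this from Python, as a sketch: the class below counts
its own `__bool__` calls. The choice of a falsy `__bool__` is an assumption
made here so that the `and` short-circuits and the outer `or` sees the same
object. With the optimization in place the counter is expected to be 1.

```python
class Test:
    calls = 0  # number of times __bool__ has run

    def __bool__(self):
        Test.calls += 1
        # Falsy: `Test() and False` short-circuits to the Test instance,
        # which the outer `or` would otherwise have to truth-test again.
        return False


result = Test() and False or False

print(result, Test.calls)
```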
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add snapshot test for nested BoolOp bytecode
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Add runtime test for redundant __bool__ check (issue #3567)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Apply clippy and rustfmt
* Apply ruff format
* Refactor compile_bool_op: extract emit_short_circuit_test and unify with compile_bool_op_inner
Reduce code duplication by:
- Extracting the repeated Copy + conditional jump pattern into emit_short_circuit_test
- Merging compile_bool_op and compile_bool_op_inner into a single
compile_bool_op_with_target with an optional short_circuit_target parameter
- Keeping compile_bool_op as a thin wrapper for the public interface
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Relocate redundant __bool__ check test snippet
* Update extra_tests/snippets/syntax_short_circuit_bool.py
* Fix assertion in syntax_short_circuit_bool
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jeong, YunWon <69878+youknowone@users.noreply.github.com>
* Add host_env feature for sandbox isolation
Introduce a `host_env` feature flag that gates all host environment
access (filesystem, network, signals, processes). When disabled,
the VM operates in sandbox mode:
- _io module always available; FileIO gated by host_env
- SandboxStdio provides lightweight stdin/stdout/stderr via Rust std::io
- BytesIO/StringIO/BufferedIO/TextIOWrapper work without host_env
- open() raises UnsupportedOperation in sandbox
- stdlib modules (os, socket, signal, etc.) gated by host_env
- CI checks both host_env ON and OFF builds
* Auto-format: ruff check --select I --fix
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* test
* Fix sorted() to use __lt__ instead of __gt__
CPython's sort uses __lt__ for comparisons, but RustPython was using
__gt__. This caused issues when only __lt__ was overridden on a
subclass (e.g., NamedTuple with custom __lt__), as it would fall back
to the parent class's comparison instead of using the overridden method.
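A sketch of the affected case, with a hypothetical NamedTuple whose field
names and data are made up for illustration. Only `__lt__` is overridden, and
the field order is chosen so that plain tuple ordering would give a different
result:

```python
from typing import NamedTuple


class Job(NamedTuple):
    label: str
    priority: int

    def __lt__(self, other):
        # Order by priority only, ignoring the label.
        return self.priority < other.priority


jobs = [Job("a", 3), Job("b", 1), Job("c", 2)]

# sorted() compares elements with `<`, so the override above is used.
labels = [job.label for job in sorted(jobs)]

print(labels)
```

Default tuple comparison would sort these by label (a, b, c), so the result
shows whether the overridden `__lt__` was actually called.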