* Apply allow_threads in more places
* Simplify STW thread state transitions
- Fix park_detached_threads: successful CAS no longer sets
all_suspended=false, avoiding unnecessary polling rounds
- Replace park_timeout(50µs) with park() in wait_while_suspended
- Remove redundant self-suspension in attach_thread and detach_thread;
the STW controller handles DETACHED→SUSPENDED via park_detached_threads
- Add double-check under mutex before condvar wait to prevent lost wakes
- Remove dead stats_detach_wait_yields field and add_detach_wait_yields
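The lost-wakeup fix follows the usual condvar rule: check the flag while holding the mutex, and only then wait. Below is a minimal std-only sketch. `SuspendGate` and its methods are hypothetical names, not RustPython's API. `resume()` must take the same mutex to clear the flag, so it cannot slip in between the check and the wait.

```rust
use std::sync::{Condvar, Mutex};

/// Hypothetical gate a stop-the-world controller could use to park threads.
pub struct SuspendGate {
    suspended: Mutex<bool>,
    resumed: Condvar,
}

impl SuspendGate {
    pub fn new() -> Self {
        Self { suspended: Mutex::new(false), resumed: Condvar::new() }
    }

    pub fn suspend(&self) {
        *self.suspended.lock().unwrap() = true;
    }

    /// Clear the flag under the mutex, then wake every parked thread.
    pub fn resume(&self) {
        *self.suspended.lock().unwrap() = false;
        self.resumed.notify_all();
    }

    /// Block until not suspended. The check happens with the mutex held,
    /// and wait_while re-checks after every (possibly spurious) wakeup.
    pub fn wait_while_suspended(&self) {
        let guard = self.suspended.lock().unwrap();
        let _guard = self.resumed.wait_while(guard, |s| *s).unwrap();
    }
}
```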
* Implement Representable for ThreadHandle
* Set ThreadHandle state to Running in parent thread after spawn
Like CPython's ThreadHandle_start, set RUNNING state in the parent
thread immediately after spawn() succeeds, rather than in the child.
This eliminates a race where join() could see Starting state if called
before the child thread executes.
Also reverts the macOS skip for test_start_new_thread_failed since the
root cause is fixed.
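Here is a sketch of the ordering change, using std threads. `ThreadHandle`, `ThreadState`, `start` and `join` are illustrative stand-ins, not the real types. Because the parent stores Running before `start()` returns, a later `join()` cannot see Starting, however late the child gets scheduled.

```rust
use std::io;
use std::sync::Mutex;
use std::thread::{self, JoinHandle};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ThreadState { NotStarted, Starting, Running, Done }

pub struct ThreadHandle {
    state: Mutex<ThreadState>,
    join: Mutex<Option<JoinHandle<()>>>,
}

impl ThreadHandle {
    pub fn new() -> Self {
        Self { state: Mutex::new(ThreadState::NotStarted), join: Mutex::new(None) }
    }

    pub fn state(&self) -> ThreadState {
        *self.state.lock().unwrap()
    }

    pub fn start<F: FnOnce() + Send + 'static>(&self, f: F) -> io::Result<()> {
        *self.state.lock().unwrap() = ThreadState::Starting;
        match thread::Builder::new().spawn(f) {
            Ok(jh) => {
                *self.join.lock().unwrap() = Some(jh);
                // Set by the parent right after spawn() succeeds, not by the child.
                *self.state.lock().unwrap() = ThreadState::Running;
                Ok(())
            }
            Err(e) => {
                // Spawn failed: roll back so the handle can report the failure.
                *self.state.lock().unwrap() = ThreadState::NotStarted;
                Err(e)
            }
        }
    }

    pub fn join(&self) {
        assert_ne!(self.state(), ThreadState::Starting, "join() raced with start()");
        if let Some(jh) = self.join.lock().unwrap().take() {
            jh.join().expect("thread panicked");
        }
        *self.state.lock().unwrap() = ThreadState::Done;
    }
}
```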
* Add debug_assert for thread state in start_the_world
* Unskip now-passing test_get_event_loop_thread and test_start_new_thread_at_finalization
* Wrap IO locks and file ops in allow_threads
Add lock_wrapped to ThreadMutex for detaching thread state
while waiting on contended locks. Use it for buffered and
text IO locks. Wrap FileIO read/write in allow_threads via
crt_fd to prevent STW hangs on blocking file operations.
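In outline, lock_wrapped works like this. The uncontended path stays cheap, and the thread detaches only when it would actually block. The sketch below uses std's `Mutex`. The `Detachable` trait is a hypothetical stand-in for detaching and re-attaching the Python thread state; ThreadMutex's real signature may differ.

```rust
use std::sync::{Mutex, MutexGuard, TryLockError};

/// Hypothetical hooks standing in for detaching/re-attaching thread state.
pub trait Detachable {
    fn detach(&self);
    fn attach(&self);
}

/// Acquire `m`, detaching only while blocked on a contended lock.
pub fn lock_wrapped<'a, T, D: Detachable>(m: &'a Mutex<T>, ts: &D) -> MutexGuard<'a, T> {
    match m.try_lock() {
        // Fast path: no state transition needed.
        Ok(guard) => guard,
        Err(TryLockError::Poisoned(p)) => p.into_inner(),
        Err(TryLockError::WouldBlock) => {
            // Contended: detach so stop-the-world can park this thread while it waits.
            ts.detach();
            let guard = m.lock().unwrap_or_else(|p| p.into_inner());
            ts.attach();
            guard
        }
    }
}
```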
* Use std::sync for thread start/ready events
Replace parking_lot Mutex/Condvar with std::sync (pthread-based)
for started_event and handle_ready_event. This prevents hangs
in forked children where parking_lot's global HASHTABLE may be
corrupted.
* Suspend Python threads before fork()
Add stop-the-world thread suspension around fork() to prevent
deadlocks from locks held by dead parent threads in the child.
- Thread states: DETACHED / ATTACHED / SUSPENDED with atomic CAS
transitions matching _PyThreadState_{Attach,Detach,Suspend}
- stop_the_world / start_the_world: park all non-requester threads
before fork, resume after (parent) or reset (child)
- allow_threads (Py_BEGIN/END_ALLOW_THREADS): detach around blocking
syscalls (os.read/write, waitpid, Lock.acquire, time.sleep) so
stop_the_world can force-park via CAS
- Acquire/release import lock around fork lifecycle
- zero_reinit_after_fork: generic lock reset for parking_lot types
- gc_clear_raw: detach dict instead of clearing entries
- Lock-free double-check for descriptor cache reads (no read-side
seqlock); write-side seqlock retained for writer serialization
- fork() returns PyResult, checks PythonFinalizationError, calls
sys.audit
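One AtomicU8 per thread is enough to sketch the DETACHED / ATTACHED / SUSPENDED transitions. The numeric encoding, `ThreadSlot`, and the method names are hypothetical. The controller can force-park only a DETACHED thread, via CAS. An ATTACHED thread must detach first, which is why blocking calls get wrapped in allow_threads. For brevity the retry loop in `allow_threads` yields; a real implementation would block until resumed.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

pub const DETACHED: u8 = 0;
pub const ATTACHED: u8 = 1;
pub const SUSPENDED: u8 = 2;

pub struct ThreadSlot {
    state: AtomicU8,
}

impl ThreadSlot {
    pub const fn new() -> Self {
        Self { state: AtomicU8::new(DETACHED) }
    }

    pub fn get(&self) -> u8 {
        self.state.load(Ordering::Acquire)
    }

    /// DETACHED -> ATTACHED. Fails while the controller holds us SUSPENDED.
    pub fn try_attach(&self) -> bool {
        self.state
            .compare_exchange(DETACHED, ATTACHED, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// ATTACHED -> DETACHED, before a blocking syscall.
    pub fn detach(&self) {
        let prev = self.state.swap(DETACHED, Ordering::Release);
        debug_assert_eq!(prev, ATTACHED);
    }

    /// Controller: DETACHED -> SUSPENDED. Attached threads cannot be forced.
    pub fn try_suspend(&self) -> bool {
        self.state
            .compare_exchange(DETACHED, SUSPENDED, Ordering::AcqRel, Ordering::Relaxed)
            .is_ok()
    }

    /// Controller (start_the_world): SUSPENDED -> DETACHED.
    pub fn resume(&self) {
        let _ = self
            .state
            .compare_exchange(SUSPENDED, DETACHED, Ordering::Release, Ordering::Relaxed);
    }
}

/// Detach around a blocking call, then re-attach (retrying while suspended).
pub fn allow_threads<R>(slot: &ThreadSlot, f: impl FnOnce() -> R) -> R {
    slot.detach();
    let r = f();
    while !slot.try_attach() {
        std::thread::yield_now();
    }
    r
}
```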
* Implement locale-aware 'n' format specifier for int, float, complex
Add LocaleInfo struct and locale-aware formatting methods to FormatSpec.
The 'n' format type now reads thousands_sep, decimal_point, and grouping
from C localeconv() and applies proper locale-based number grouping.
Remove @unittest.skip from test_format.test_locale.
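Here is the grouping step alone, with no sign, decimal point, or padding handling. It assumes `grouping` has already been copied out of localeconv() as a byte slice without C's NUL terminator. Per the C rules, running past the end repeats the last group size, and CHAR_MAX means "no further grouping"; the sketch assumes CHAR_MAX is 127 (signed `char`). `group_digits` is a hypothetical helper name.

```rust
/// Assumed value of C's CHAR_MAX (platforms where `char` is signed).
const CHAR_MAX: u8 = 127;

/// Insert `sep` into an ASCII digit string, grouping from the right.
pub fn group_digits(digits: &str, sep: &str, grouping: &[u8]) -> String {
    let mut groups: Vec<&str> = Vec::new();
    let mut end = digits.len();
    let mut idx = 0;
    let mut size = 0usize;
    while end > 0 {
        // Past the end of `grouping`: keep reusing the previous size.
        if idx < grouping.len() {
            size = grouping[idx] as usize;
            idx += 1;
        }
        if size == 0 || size >= CHAR_MAX as usize {
            // No grouping (empty) or CHAR_MAX: the rest is a single group.
            groups.push(&digits[..end]);
            break;
        }
        let start = end.saturating_sub(size);
        groups.push(&digits[start..end]);
        end = start;
    }
    groups.reverse();
    groups.join(sep)
}
```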
* Fix complex 'n' format and remove locale expectedFailure markers
Rewrite format_complex_locale to reuse format_complex_re_im, matching
formatter_unicode.c: add_parens=0 and skip_re=0 for 'n' type.
Remove @expectedFailure from test_float__format__locale and
test_int__format__locale in test_types.py.
* Auto-format: cargo fmt --all
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix thread-safety in GC, type cache, and instruction cache
GC / refcount:
- Add safe_inc() check for strong()==0 in RefCount
- Add try_to_owned() to PyObject for atomic refcount acquire
- Replace strong_count()+to_owned() with try_to_owned() in GC
collection and weakref callback paths to prevent TOCTOU races
Type cache:
- Add proper SeqLock (sequence counter) to TypeCacheEntry
- Readers spin-wait on odd sequence, validate after read
- Writers bracket updates with begin_write/end_write
- Use try_to_owned + pointer revalidation on read path
- Call modified() BEFORE attribute modification in set_attr
Instruction cache:
- Add pointer_cache (AtomicUsize array) to CodeUnits for
single atomic pointer load/store (prevents torn reads)
- Add try_read_cached_descriptor with try_to_owned + pointer
and version revalidation after increment
- Add write_cached_descriptor with version-bracketed writes
RLock:
- Fix release() to check is_owned_by_current_thread
- Add _release_save/_acquire_restore methods
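Here is a minimal sketch of the type-cache SeqLock protocol (readers spin on an odd sequence and validate after reading; writers bracket their updates). `SeqLockEntry`, `write` and `read` are hypothetical names. A pair of words stands in for the version tag and the cached pointer. Both are atomics so that a racy read is still defined behavior in Rust. In the real cache the value is a pointer, so a reader would additionally take a reference with try_to_owned and re-check.

```rust
use std::hint;
use std::sync::atomic::{fence, AtomicUsize, Ordering};

pub struct SeqLockEntry {
    seq: AtomicUsize,     // odd while a write is in progress
    version: AtomicUsize, // stands in for the type version tag
    value: AtomicUsize,   // stands in for the cached pointer
}

impl SeqLockEntry {
    pub const fn new() -> Self {
        Self { seq: AtomicUsize::new(0), version: AtomicUsize::new(0), value: AtomicUsize::new(0) }
    }

    pub fn write(&self, version: usize, value: usize) {
        // begin_write: even -> odd. The CAS also serializes concurrent writers.
        let mut s = self.seq.load(Ordering::Relaxed);
        loop {
            if s & 1 == 1 {
                hint::spin_loop();
                s = self.seq.load(Ordering::Relaxed);
                continue;
            }
            match self.seq.compare_exchange_weak(s, s + 1, Ordering::Acquire, Ordering::Relaxed) {
                Ok(_) => break,
                Err(cur) => s = cur,
            }
        }
        // A reader that sees any store below must also see the odd sequence.
        fence(Ordering::Release);
        self.version.store(version, Ordering::Relaxed);
        self.value.store(value, Ordering::Relaxed);
        // end_write: back to even, publishing the new pair.
        self.seq.store(s.wrapping_add(2), Ordering::Release);
    }

    pub fn read(&self) -> (usize, usize) {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 & 1 == 1 {
                hint::spin_loop(); // writer active: spin-wait
                continue;
            }
            let version = self.version.load(Ordering::Relaxed);
            let value = self.value.load(Ordering::Relaxed);
            fence(Ordering::Acquire);
            // Validate: consistent only if no write began in the meantime.
            if self.seq.load(Ordering::Relaxed) == s1 {
                return (version, value);
            }
        }
    }
}
```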
* Fix RLock _acquire_restore tuple handling and unxfail threading test
* Align type cache seqlock writer protocol with CPython
* RLock: use single parking_lot level, track recursion manually
Instead of calling lock()/unlock() N times for recursion depth N,
keep parking_lot at 1 level and manage the count ourselves.
This makes acquire/release O(1) and matches CPython's
_PyRecursiveMutex approach (lock once + set level directly).
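Here is the same idea as a std-only sketch: the lock is held at one level, and the recursion depth is a plain counter stored next to the owner id. `RLock`, `release_save` and `acquire_restore` are illustrative. The save/restore pair here carries only the depth, whereas CPython's `_release_save` returns a (count, owner) tuple. `release()` refuses callers that are not the owner, as in the is_owned_by_current_thread fix.

```rust
use std::sync::{Condvar, Mutex};
use std::thread::{self, ThreadId};

#[derive(Default)]
struct State {
    owner: Option<ThreadId>,
    count: usize,
}

pub struct RLock {
    state: Mutex<State>,
    cv: Condvar,
}

impl RLock {
    pub fn new() -> Self {
        Self { state: Mutex::new(State::default()), cv: Condvar::new() }
    }

    pub fn acquire(&self) {
        let me = thread::current().id();
        let mut st = self.state.lock().unwrap();
        if st.owner == Some(me) {
            st.count += 1; // re-entry: O(1), no second lock level
            return;
        }
        let mut st = self.cv.wait_while(st, |s| s.owner.is_some()).unwrap();
        st.owner = Some(me);
        st.count = 1;
    }

    pub fn release(&self) -> Result<(), &'static str> {
        let mut st = self.state.lock().unwrap();
        if st.owner != Some(thread::current().id()) {
            return Err("cannot release un-acquired lock");
        }
        st.count -= 1;
        if st.count == 0 {
            st.owner = None;
            drop(st);
            self.cv.notify_one();
        }
        Ok(())
    }

    /// Fully release regardless of depth; returns the depth to restore.
    pub fn release_save(&self) -> Result<usize, &'static str> {
        let mut st = self.state.lock().unwrap();
        if st.owner != Some(thread::current().id()) {
            return Err("cannot release un-acquired lock");
        }
        let depth = std::mem::take(&mut st.count);
        st.owner = None;
        drop(st);
        self.cv.notify_one();
        Ok(depth)
    }

    /// Reacquire once and set the saved depth directly.
    pub fn acquire_restore(&self, depth: usize) {
        let me = thread::current().id();
        let st = self.state.lock().unwrap();
        let mut st = self.cv.wait_while(st, |s| s.owner.is_some()).unwrap();
        st.owner = Some(me);
        st.count = depth;
    }
}
```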
* Add try_to_owned_from_ptr to avoid &PyObject on stale ptrs
Use addr_of! to access ref_count directly from a raw pointer
without forming &PyObject first. Applied in type cache and
instruction cache hit paths where the pointer may be stale.
* Fix CI: spelling typo and xfail flaky test_thread_safety
- Fix "minimising" -> "minimizing" for cspell
- xfail test_thread_safety: dict iteration races with
concurrent GC mutations in _finalizer_registry
* Replace GC tracking HashSet with intrusive linked list
Replace per-generation HashSet<GcObjectPtr> with intrusive doubly-linked
lists for GC object tracking. Each PyInner now carries gc_pointers
(prev/next) and gc_generation fields, enabling O(1) track/untrack
without hashing.
- Add gc_pointers (Pointers<PyObject>) and gc_generation (u8) to PyInner
- Implement GcLink trait for intrusive list integration
- Replace generation_objects/permanent_objects/tracked_objects/finalized_objects
HashSets with generation_lists/permanent_list LinkedLists
- Use GcBits::FINALIZED flag instead of finalized_objects HashSet
- Change default_dealloc to untrack directly before memory free
- Hold both src/dst list locks in promote_survivors to prevent race
conditions with concurrent untrack_object calls
- Add pop_front to LinkedList for freeze/unfreeze operations
Move unreachable_refs creation before drop(gen_locks) so that raw
pointer dereferences and refcount increments happen while generation
list read locks are held. Previously, after dropping read locks, other
threads could untrack and free objects, causing use-after-free when
creating strong references from the raw GcPtr pointers.
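Here is a sketch of why intrusive links make track and untrack O(1): each object header carries its own prev/next/generation fields, so unlinking needs no hashing or search. To stay in safe Rust, it uses indices into a `Vec` instead of raw `Pointers<PyObject>`. `GcLists`, `GcLinks` and their methods are hypothetical. Locking is omitted; `&mut self` stands in for holding the generation list locks, including both source and destination locks during promotion.

```rust
/// Per-object link fields stored in the object header itself (intrusive).
#[derive(Clone, Copy, Default)]
struct GcLinks {
    prev: Option<usize>,
    next: Option<usize>,
    generation: Option<u8>,
}

pub struct GcLists {
    headers: Vec<GcLinks>,      // one header per object id
    heads: [Option<usize>; 3],  // list head per generation
}

impl GcLists {
    pub fn new(n_objects: usize) -> Self {
        Self { headers: vec![GcLinks::default(); n_objects], heads: [None; 3] }
    }

    /// O(1): push `obj` onto the front of generation `gen`.
    pub fn track(&mut self, obj: usize, gen: u8) {
        debug_assert!(self.headers[obj].generation.is_none(), "already tracked");
        let old_head = self.heads[gen as usize];
        self.headers[obj] = GcLinks { prev: None, next: old_head, generation: Some(gen) };
        if let Some(h) = old_head {
            self.headers[h].prev = Some(obj);
        }
        self.heads[gen as usize] = Some(obj);
    }

    /// O(1): unlink `obj` using only its own link fields.
    pub fn untrack(&mut self, obj: usize) {
        let GcLinks { prev, next, generation } = self.headers[obj];
        let Some(gen) = generation else { return };
        match prev {
            Some(p) => self.headers[p].next = next,
            None => self.heads[gen as usize] = next,
        }
        if let Some(n) = next {
            self.headers[n].prev = prev;
        }
        self.headers[obj] = GcLinks::default();
    }

    /// Move every object of `from` into `to` (survivor promotion).
    pub fn promote(&mut self, from: u8, to: u8) {
        while let Some(obj) = self.heads[from as usize] {
            self.untrack(obj);
            self.track(obj, to);
        }
    }

    pub fn members(&self, gen: u8) -> Vec<usize> {
        let mut out = Vec::new();
        let mut cur = self.heads[gen as usize];
        while let Some(i) = cur {
            out.push(i);
            cur = self.headers[i].next;
        }
        out
    }
}
```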
* Remove PyMutex<FrameState> from Frame, use UnsafeCell fields directly
Move stack, cells_frees, prev_line out of the mutex-protected FrameState
into Frame as FrameUnsafeCell fields. This eliminates mutex lock/unlock
overhead on every frame execution (with_exec).
Safety relies on the same single-threaded execution guarantee that
FastLocals already uses.
* Add thread-local DataStack for bump-allocating frame data
Introduce DataStack with linked chunks (16KB initial, doubling) and
push/pop bump allocation. Add datastack field to VirtualMachine.
Not yet wired to frame creation.
* Unify FastLocals and BoxVec stack into LocalsPlus
Replace separate FastLocals (Box<[Option<PyObjectRef>]>) and
BoxVec<Option<PyStackRef>> with a single LocalsPlus struct that
stores both in a contiguous Box<[usize]> array. The first
nlocalsplus slots are fastlocals and the rest is the evaluation
stack. Typed access is provided through transmute-based methods.
Remove BoxVec import from frame.rs.
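Here is a safe, generic sketch of the LocalsPlus layout: one contiguous buffer whose first `nlocalsplus` slots are fast locals and whose tail is the evaluation stack. Where the real struct uses `Box<[usize]>` with transmute-based typed access, this sketch uses `Option<T>` slots. The method names echo the commit text, but the signatures are assumptions.

```rust
pub struct LocalsPlus<T> {
    data: Box<[Option<T>]>,
    nlocalsplus: usize,
    stack_top: usize, // one past the last pushed stack slot
}

impl<T> LocalsPlus<T> {
    pub fn new(nlocalsplus: usize, stacksize: usize) -> Self {
        let len = nlocalsplus.checked_add(stacksize).expect("frame size overflow");
        let data: Box<[Option<T>]> = (0..len).map(|_| None).collect();
        Self { data, nlocalsplus, stack_top: nlocalsplus }
    }

    /// The fast-locals prefix of the buffer.
    pub fn fastlocals_mut(&mut self) -> &mut [Option<T>] {
        &mut self.data[..self.nlocalsplus]
    }

    pub fn stack_push(&mut self, v: T) {
        assert!(self.stack_top < self.data.len(), "value stack overflow");
        self.data[self.stack_top] = Some(v);
        self.stack_top += 1;
    }

    pub fn stack_pop(&mut self) -> Option<T> {
        if self.stack_top == self.nlocalsplus {
            return None;
        }
        self.stack_top -= 1;
        self.data[self.stack_top].take()
    }

    /// Drop stack entries until only `depth` remain.
    pub fn stack_truncate(&mut self, depth: usize) {
        let new_top = self.nlocalsplus + depth;
        while self.stack_top > new_top {
            self.stack_top -= 1;
            self.data[self.stack_top] = None;
        }
    }

    pub fn stack_len(&self) -> usize {
        self.stack_top - self.nlocalsplus
    }
}
```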
* Use DataStack for LocalsPlus in non-generator function calls
Normal function calls now bump-allocate LocalsPlus data from the
per-thread DataStack instead of a separate heap allocation.
Generator/coroutine frames continue using heap allocation since
they outlive the call.
On frame exit, data is copied to the heap (materialize_to_heap)
to preserve locals for tracebacks, then the DataStack is popped.
VirtualMachine.datastack is wrapped in UnsafeCell for interior
mutability (safe because frame allocation is single-threaded LIFO).
* Fix clippy: import Layout from core::alloc instead of alloc::alloc
* Fix vectorcall compatibility with LocalsPlus API
Update vectorcall dispatch functions to use localsplus stack
accessors instead of direct stack field access. Add
stack_truncate method to LocalsPlus. Update vectorcall fast
path in function.rs to use datastack and fastlocals_mut().
* Add datastack, nlocalsplus, ncells, tstate to cspell dictionary
* Fix DataStack pop() for non-monotonic allocation addresses
Check both bounds of the current chunk when determining if a
pop base is in the current chunk. The previous check (base >=
chunk_start) fails on Windows where newer chunks may be
allocated at lower addresses than older ones.
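Here is a sketch of a chunked bump allocator in the shape described for DataStack (16 KB first chunk, doubling growth, LIFO pop). `pop` tests both ends of the current chunk instead of only `base >= chunk_start`, because a newer chunk may lie below an older one. Names, the word-based sizing, and the ban on zero-sized pushes are assumptions of this sketch.

```rust
use std::mem::size_of;

/// 16 KiB expressed in `usize` words.
pub const INITIAL_CHUNK_WORDS: usize = 16 * 1024 / size_of::<usize>();

struct Chunk {
    buf: Box<[usize]>,
    top: usize, // words in use
}

impl Chunk {
    fn new(words: usize) -> Self {
        Chunk { buf: vec![0usize; words].into_boxed_slice(), top: 0 }
    }

    /// Check both bounds: chunk addresses are not monotonic.
    fn contains(&self, p: *const usize) -> bool {
        let start = self.buf.as_ptr() as usize;
        let end = start + self.buf.len() * size_of::<usize>();
        (start..end).contains(&(p as usize))
    }
}

pub struct DataStack {
    chunks: Vec<Chunk>, // last = current chunk
}

impl DataStack {
    pub fn new() -> Self {
        DataStack { chunks: vec![Chunk::new(INITIAL_CHUNK_WORDS)] }
    }

    /// Bump-allocate `words` slots, adding a doubled chunk when full.
    pub fn push(&mut self, words: usize) -> *mut usize {
        assert!(words > 0, "zero-sized pushes are not supported in this sketch");
        let cur = self.chunks.last().expect("at least one chunk");
        if cur.buf.len() - cur.top < words {
            let size = cur.buf.len().checked_mul(2).expect("chunk size overflow").max(words);
            self.chunks.push(Chunk::new(size));
        }
        let cur = self.chunks.last_mut().unwrap();
        let p = cur.buf[cur.top..].as_mut_ptr();
        cur.top += words;
        p
    }

    /// Free the most recent allocation, which starts at `base`.
    pub fn pop(&mut self, base: *mut usize) {
        if !self.chunks.last().unwrap().contains(base) {
            // `base` is in the previous chunk, so the current one is empty.
            assert!(self.chunks.len() > 1, "pop of unknown pointer");
            self.chunks.pop();
        }
        let cur = self.chunks.last_mut().unwrap();
        assert!(cur.contains(base), "pop of unknown pointer");
        cur.top = (base as usize - cur.buf.as_ptr() as usize) / size_of::<usize>();
    }

    pub fn chunk_count(&self) -> usize {
        self.chunks.len()
    }

    pub fn used_in_current(&self) -> usize {
        self.chunks.last().unwrap().top
    }
}
```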
* Fix stale comments: release_datastack -> materialize_localsplus
* Fix non-threading mode for parallel test execution
Two fixes for Cell-based types used in static items under non-threading
mode, which caused data races when the Rust test runner ran tests on parallel threads:
1. LazyLock: use std::sync::LazyLock when std is available instead of
wrapping core::cell::LazyCell with a false `unsafe impl Sync`.
The LazyCell wrapper is kept only for no-std (truly single-threaded).
2. gc_state: use static_cell! (thread-local in non-threading mode)
instead of OnceLock, so each thread gets its own GcState with
Cell-based PyRwLock/PyMutex that are not accessed concurrently.
* Fix CallAllocAndEnterInit to use LocalsPlus stack API
* Use checked arithmetic in LocalsPlus and DataStack allocators
* Address code review: checked arithmetic, threading feature deps, Send gate
- Use checked arithmetic for nlocalsplus in Frame::new
- Add "std" to threading feature dependencies in rustpython-common
- Gate GcState Send impl with #[cfg(feature = "threading")]
* Clean up comments: remove redundant/stale remarks, fix CPython references
* Reinit IO buffer locks after fork to prevent deadlocks
BufferedReader/Writer/TextIOWrapper use PyThreadMutex internally.
If a parent thread held one of these locks during fork(), the child
would deadlock on any IO operation.
Add reinit_after_fork() to RawThreadMutex and call it on sys.stdin/
stdout/stderr in the child process fork handler, analogous to
CPython's _PyIO_Reinit().
* Address review: unsafe fn + decoder lock reinit
- Mark reinit_std_streams_after_fork as unsafe fn to encode
fork-only precondition, update call site in posix.rs
- Reinit IncrementalNewlineDecoder's PyThreadMutex via
TextIOWrapper's decoder field to prevent child deadlocks
* Auto-format: cargo fmt --all
* Use Mutex::raw() accessor in reinit_mutex_after_fork
Use lock_api's Mutex::raw() to access the underlying RawMutex
instead of casting &PyMutex<T> directly. This avoids layout
assumptions about lock_api::Mutex<R, T> field ordering.
* Replace force_unlock with reinit_*_after_fork
Replace all force_unlock() + try_lock() patterns with zero-based
reinit that bypasses parking_lot internals entirely. After fork(),
the child is single-threaded, so reinitialized locks cannot be contended.
Add reinit_rwlock_after_fork to common::lock alongside the existing
reinit_mutex_after_fork. Replace force_unlock_after_fork methods in
codecs, intern, and gc_state with reinit_after_fork equivalents.
This fixes after_fork_child silently dropping thread handles when
try_lock() failed on per-handle Arc<Mutex> locks.
* Fix _at_fork_reinit to write INIT directly instead of calling unlock()
unlock() goes through unlock_slow() which accesses parking_lot's
global hash table to unpark waiters. After fork(), this hash table
contains stale entries from dead parent threads, making unlock_slow()
unsafe. Writing INIT directly bypasses parking_lot internals entirely.
* Add import lock (IMP_LOCK) reinit after fork
The import lock is a ReentrantMutex that was never reinitialized after
fork(). If a parent thread held it during fork, the child would
deadlock on any import. Reset it only if the owner is a dead thread;
if the surviving thread held it, a normal unlock still works.
* Relax RefCount atomic ordering from SeqCst to Arc pattern
- inc/inc_by/get: SeqCst → Relaxed
- safe_inc CAS: SeqCst → Relaxed + compare_exchange_weak
- dec: SeqCst → Release + Acquire fence when count drops to 0
- leak CAS: SeqCst → AcqRel/Relaxed + compare_exchange_weak
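These orderings match std's `Arc`: a Relaxed increment is enough when the caller already holds a reference, and a Release decrement, followed by an Acquire fence only on the final drop, orders all earlier uses before the free. The sketch below also shows the safe_inc CAS loop behind try_to_owned, which refuses to resurrect a count that is already zero. `RefCount` here is a standalone illustration, not the VM's type.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

pub struct RefCount {
    strong: AtomicUsize,
}

impl RefCount {
    pub fn new() -> Self {
        Self { strong: AtomicUsize::new(1) }
    }

    pub fn get(&self) -> usize {
        self.strong.load(Ordering::Relaxed)
    }

    /// Caller already owns a reference, so Relaxed suffices (as in Arc::clone).
    pub fn inc(&self) {
        self.strong.fetch_add(1, Ordering::Relaxed);
    }

    /// Increment only if still alive; used for pointers read from caches or GC lists.
    pub fn safe_inc(&self) -> bool {
        let mut cur = self.strong.load(Ordering::Relaxed);
        loop {
            if cur == 0 {
                return false; // already dying: do not resurrect
            }
            match self.strong.compare_exchange_weak(cur, cur + 1, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => cur = actual, // lost a race or spurious failure: retry
            }
        }
    }

    /// Returns true when this call released the last reference.
    pub fn dec(&self) -> bool {
        if self.strong.fetch_sub(1, Ordering::Release) != 1 {
            return false;
        }
        // Synchronize with all earlier Release decrements before freeing.
        fence(Ordering::Acquire);
        true
    }
}
```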
* Reuse existing Vec via prepend_arg in execute_call
Replace vec![self_val] + extend(args.args) with
FuncArgs::prepend_arg() to avoid a second heap allocation
on every method call.
* Skip downcast_ref checks in invoke when tracing is disabled
Early return in PyCallable::invoke() when use_tracing is false,
avoiding two downcast_ref type checks on every function call.
* Replace fastlocals PyMutex with UnsafeCell-based FastLocals
Eliminate per-instruction mutex lock/unlock overhead for local
variable access. FastLocals uses UnsafeCell with safety guaranteed
by the frame's state mutex and sequential same-thread execution.
Affects 14+ lock() call sites in hot instruction paths (LoadFast,
StoreFast, DeleteFast, and their paired variants).
* Auto-format: cargo fmt --all
* Clear frame locals and stack on generator close
Add Frame::clear_locals_and_stack() to release references held by
closed generators/coroutines, matching _PyFrame_ClearLocals behavior.
Call it from Coro::close() after marking the coroutine as closed.
Update test_generators.py expectedFailure markers accordingly.
* Add dir_fd support for rmdir, remove/unlink, scandir
- rmdir: use unlinkat(fd, path, AT_REMOVEDIR) when dir_fd given
- remove/unlink: use unlinkat(fd, path, 0) when dir_fd given
- scandir: accept fd via fdopendir, add ScandirIteratorFd
- listdir: rewrite fd path to use raw readdir instead of nix::dir::Dir
- DirEntry: add d_type and dir_fd fields for fd-based scandir
- Update supports_fd/supports_dir_fd entries accordingly
* cells_free
* Replace `once_cell` with `std::sync::OnceLock`/`core::cell::OnceCell`
- Replace `once_cell::sync::{Lazy, OnceCell}` with
`std::sync::{LazyLock, OnceLock}`
- Replace `once_cell::unsync::{Lazy, OnceCell}` with
`core::cell::{LazyCell, OnceCell}`
- Inline `get_or_try_init` at call sites (unstable in std as of 1.93)
- Replace `OnceCell::with_value()` with `OnceCell::from()` in codecs.rs
- Remove `once_cell` direct dependency from common and vm crates
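`OnceLock::get_or_try_init` is unstable, so each call site now spells out the check-init-set pattern. For brevity it is shown here as a hypothetical helper. If two threads race, both may run `init`, but `set` keeps only the first value and every caller gets that stored value back.

```rust
use std::sync::OnceLock;

pub fn get_or_try_init<T, E>(
    cell: &OnceLock<T>,
    init: impl FnOnce() -> Result<T, E>,
) -> Result<&T, E> {
    if let Some(v) = cell.get() {
        return Ok(v);
    }
    // On error nothing is stored, so a later call can try again.
    let v = init()?;
    // set() fails if another thread won the race; the winner's value is kept.
    let _ = cell.set(v);
    Ok(cell.get().expect("value was just set"))
}
```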
* Auto-format: cargo fmt --all
* Add `std` feature for common
- Gate OS-dependent modules behind `#[cfg(feature = "std")]`
- Replace `std::f64` with `core::f64` in float_ops
- Replace `std::process::abort` with panic in refcount
- Remove `thread_local` from levenshtein (use a stack buffer instead)
- Split static_cell into threading/non_threading/no_std
* Add `std` feature for codegen
* Support `no_std` in pylib
* Replace WeakListInner with inline atomic weakref list and stripe locks
Remove heap-allocated WeakListInner (OncePtr<PyMutex<WeakListInner>>).
WeakRefList now holds two inline atomic pointers (head, generic).
PyWeak.parent replaced with wr_object pointing directly to referent.
Add weakref_lock module with AtomicU8 spinlock array for thread safety.
Rewrite upgrade/clear/drop_inner/count/get_weak_references with stripe lock.
Make Pointers methods public in linked_list.rs.
Remove expectedFailure from test_subclass_refs_dont_replace_standard_refs.
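A stripe lock maps each referent address onto a small fixed array of spinlocks, so weakref lists need no per-object heap mutex. This is a sketch under assumptions: the stripe count, the shift, and the names `lock_for` and `StripeGuard` are hypothetical. Unrelated objects that land on the same stripe share a lock. That costs contention, not correctness, as long as a thread never holds more than one stripe at a time.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

const STRIPES: usize = 64;

static LOCKS: [AtomicU8; STRIPES] = [const { AtomicU8::new(0) }; STRIPES];

/// Objects are aligned, so drop low address bits before picking a stripe.
fn stripe(addr: usize) -> &'static AtomicU8 {
    &LOCKS[(addr >> 4) % STRIPES]
}

/// Releases the stripe when dropped.
pub struct StripeGuard(&'static AtomicU8);

impl Drop for StripeGuard {
    fn drop(&mut self) {
        self.0.store(0, Ordering::Release);
    }
}

/// Spin until the stripe for `addr` is free, then hold it.
pub fn lock_for(addr: usize) -> StripeGuard {
    let l = stripe(addr);
    while l.compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed).is_err() {
        std::hint::spin_loop();
    }
    StripeGuard(l)
}
```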
* Auto-format: cargo fmt --all
* Enable the alloc_instead_of_core, std_instead_of_alloc, and std_instead_of_core clippy lints
- Manually change part of the code to use core/alloc
- Use clippy --fix to fix issues in stdlib
- Use clippy --fix to fix issues in vm
- Import Range in vm/src/anystr.rs
- Use clippy --fix to fix issues in common