- Add CO_NESTED flag (0x10) for nested function scopes
- Emit LOAD_SMALL_INT for integers 0..=255 instead of LOAD_CONST
- Eliminate dead constant expression statements (no side effects)
- Ensure None in co_consts for functions with no other constants
- Add code.__replace__() for copy.replace() support
- Mark test_co_lnotab and test_invalid_bytecode as expectedFailure
* Fix thread-safety in GC, type cache, and instruction cache
GC / refcount:
- Add safe_inc() check for strong()==0 in RefCount
- Add try_to_owned() to PyObject for atomic refcount acquire
- Replace strong_count()+to_owned() with try_to_owned() in GC
collection and weakref callback paths to prevent TOCTOU races
Type cache:
- Add proper SeqLock (sequence counter) to TypeCacheEntry
- Readers spin-wait on odd sequence, validate after read
- Writers bracket updates with begin_write/end_write
- Use try_to_owned + pointer revalidation on read path
- Call modified() BEFORE attribute modification in set_attr
Instruction cache:
- Add pointer_cache (AtomicUsize array) to CodeUnits for
single atomic pointer load/store (prevents torn reads)
- Add try_read_cached_descriptor with try_to_owned + pointer
and version revalidation after increment
- Add write_cached_descriptor with version-bracketed writes
RLock:
- Fix release() to check is_owned_by_current_thread
- Add _release_save/_acquire_restore methods
* Fix RLock _acquire_restore tuple handling and unxfail threading test
* Align type cache seqlock writer protocol with CPython
* RLock: use single parking_lot level, track recursion manually
Instead of calling lock()/unlock() N times for recursion depth N,
keep parking_lot at 1 level and manage the count ourselves.
This makes acquire/release O(1) and matches CPython's
_PyRecursiveMutex approach (lock once + set level directly).
* Add try_to_owned_from_ptr to avoid &PyObject on stale ptrs
Use addr_of! to access ref_count directly from a raw pointer
without forming &PyObject first. Applied in type cache and
instruction cache hit paths where the pointer may be stale.
* Fix CI: spelling typo and xfail flaky test_thread_safety
- Fix "minimising" -> "minimizing" for cspell
- xfail test_thread_safety: dict iteration races with
concurrent GC mutations in _finalizer_registry
* Add CALL_ALLOC_AND_ENTER_INIT specialization
Optimizes user-defined class instantiation MyClass(args...)
when tp_new == object.__new__ and __init__ is a simple
PyFunction. Allocates the object directly and calls __init__
via invoke_exact_args, bypassing the generic type.__call__
dispatch path.
* Invalidate JIT cache when __code__ is reassigned
Change jitted_code from OnceCell to PyMutex<Option<CompiledCode>> so
it can be cleared on __code__ assignment. The setter now sets the
cached JIT code to None to prevent executing stale machine code.
* Atomic operations for specialization cache
- range iterator: deduplicate fast_next/next_fast
- Replace raw pointer reads/writes in CodeUnits with atomic
operations (AtomicU8/AtomicU16) for thread safety
- Add read_op (Acquire), read_arg (Relaxed), compare_exchange_op
- Use Release ordering in replace_op to synchronize cache writes
- Dispatch loop reads opcodes atomically via read_op/read_arg
- Fix adaptive counter access: use read/write_adaptive_counter
instead of read/write_cache_u16 (was reading wrong bytes)
- Add pre-check guards to all specialize_* functions to prevent
concurrent specialization races
- Move modified() before attribute changes in type.__setattr__
to prevent use-after-free of cached descriptors
- Use SeqCst ordering in modified() for version invalidation
- Add Release fence after quicken() initialization
* Fix slot wrapper override for inherited attributes
For __getattribute__: only use getattro_wrapper when the type
itself defines the attribute; otherwise inherit native slot from
base class via MRO.
For __setattr__/__delattr__: only store setattro_wrapper when
the type has its own __setattr__ or __delattr__; otherwise keep
the inherited base slot.
* Fix StoreAttrSlot cache overflow corrupting next instruction
write_cache_u32 at cache_base+3 writes 2 code units (positions 3 and 4),
but STORE_ATTR only has 4 cache entries (positions 0-3). This overwrites
the next instruction with the upper 16 bits of the slot offset.
Changed to write_cache_u16/read_cache_u16 since member descriptor offsets
fit within u16 (max 65535 bytes).
* Exclude method_descriptor from has_python_cmp check
has_python_cmp incorrectly treated method_descriptor as Python-level
comparison methods, causing richcompare slot to use wrapper dispatch
instead of inheriting the native slot.
* Fix CompareOpFloat NaN handling
partial_cmp returns None for NaN comparisons. is_some_and incorrectly
returned false for all NaN comparisons, but NaN != x should be true
per IEEE 754 semantics.
* Fix invoke_exact_args borrow in CallAllocAndEnterInit
* Distinguish Python method vs not-found in slot MRO lookup
Change lookup_slot_in_mro to return a 3-state SlotLookupResult
enum (NativeSlot/PythonMethod/NotFound) instead of Option<T>.
Previously, both "found a Python-level method" and "found nothing"
returned None, causing incorrect slot inheritance. For example,
class Test(Mixin, TestCase) would inherit object.slot_init from
Mixin via inherit_from_mro instead of using init_wrapper to
dispatch TestCase.__init__.
Apply this fix consistently to all slot update sites:
update_main_slot!, update_sub_slot!, TpGetattro, TpSetattro,
TpDescrSet, TpHash, TpRichcompare, SqAssItem, MpAssSubscript.
* Extract specialization helper functions to reduce boilerplate
- deoptimize() / deoptimize_at(): replace specialized op with base op
- adaptive(): decrement warmup counter or call specialize function
- commit_specialization(): replace op on success, backoff on failure
- execute_binary_op_int() / execute_binary_op_float(): typed binary ops
Removes 10 duplicate deoptimize_* functions, consolidates 13 adaptive
counter blocks, 6 binary op handlers, and 7 specialize tail patterns.
Also replaces inline deopt blocks in LoadAttr/StoreAttr handlers.
* Improve specialization guards and fix mark_stacks
- CONTAINS_OP_SET: add frozenset support in handler and specialize
- TO_BOOL_ALWAYS_TRUE: cache type version instead of checking slots
- LOAD_GLOBAL_BUILTIN: cache builtins dict version alongside globals
- mark_stacks: deoptimize specialized opcodes for correct reachability
* Auto-format: cargo fmt --all
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Implement LOAD_ATTR inline caching with adaptive specialization
Add type version counter (tp_version_tag) to PyType with subclass
invalidation cascade. Add cache read/write methods (u16/u32/u64)
to CodeUnits. Implement adaptive specialization in load_attr that
replaces the opcode with specialized variants on first execution:
- LoadAttrMethodNoDict: cached method lookup for slotted types
- LoadAttrMethodWithValues: cached method with dict shadow check
- LoadAttrInstanceValue: direct dict lookup skipping descriptors
Specialized opcodes guard on type_version_tag and deoptimize back
to generic LOAD_ATTR with backoff counter on cache miss.
* Add BINARY_OP and CALL adaptive specialization
BINARY_OP: Specialize int add/subtract/multiply and float
add/subtract/multiply with type guards and deoptimization.
CALL: Add func_version to PyFunction, specialize simple
function calls (CallPyExactArgs, CallBoundMethodExactArgs)
with invoke_exact_args fast path that skips FuncArgs
allocation and fill_locals_from_args.
* Lazy quickening for adaptive specialization counters
Move counter initialization from compile-time to RESUME execution,
matching CPython's _PyCode_Quicken pattern. Store counter in CACHE
entry's arg byte to preserve op=Instruction::Cache for dis/JIT.
Add PyCode.quickened flag for one-time initialization.
* Add Instruction::deoptimize() and CodeUnits::original_bytes()
- deoptimize() maps specialized opcodes back to their base adaptive variant
- original_bytes() produces deoptimized bytecode with zeroed CACHE entries
- co_code now returns deoptimized bytes, _co_code_adaptive returns current bytes
- Marshal serialization uses original_bytes() instead of raw transmute
* Fix monitoring and specialization interaction
- cache_entries() returns correct count for instrumented opcodes
- deoptimize() maps instrumented opcodes back to base
- quicken() skips adaptive counter for instrumented opcodes
- instrument_code Phase 3 deoptimizes specialized opcodes and
clears CACHE entries to prevent stale pointer dereferences
* Address review: bounds checks, UB fix, version overflow, error handling
- Add bounds checks to read_cache_u16/u32/u64
- Fix quicken() aliasing UB by using &mut directly
- Add JumpBackwardJit/JumpBackwardNoJit to deoptimize()
- Guard can_specialize_call with NEWLOCALS flag check
- Use compare_exchange_weak for version tag to prevent wraparound
- Propagate dict lookup errors in LoadAttrMethodWithValues
- Apply adaptive backoff on version tag assignment failure
- Remove duplicate imports in frame.rs
* Emit CACHE code units in bytecode to match CPython 3.14
- Add cache_entries() method to Instruction enum
- Emit CACHE code units after opcodes in finalize_code
- Handle NO_LOCATION (line=-1) in linetable for CACHE entries
- Account for CACHE entries in exception table generation
- Skip CACHE entries in VM execution loop (with jump detection)
- Handle CACHE in InstrumentedLine/InstrumentedInstruction/InstrumentedForIter/InstrumentedNotTaken
- Skip CACHE in monitoring instrumentation phases
- Update co_branches() for cache-adjusted offsets
- Restore _cache_format in Lib/opcode.py
- Remove expectedFailure from test_c_call, test_start_offset
* Use relative jump offsets and fix bytecode layout
- Convert jump arguments from absolute to relative offsets
in frame.rs, monitoring.rs, and stack_analysis
- Add jump_relative_forward/backward helpers to ExecutingFrame
- Resolve pseudo jump instructions before offset fixpoint loop
- Emit NOP for break, continue, pass to match line-tracing
- Fix async for: emit EndAsyncFor with correct target, add NotTaken
- Fix comprehension if-cleanup to use separate block
- Fix super() source range for multi-line calls
- Fix NOP removal to preserve line-marker NOPs
- Fix InstrumentedLine cache skipping after re-dispatch
- Match InstrumentedResume/YieldValue in yield_from_target
- Remove CALL_FUNCTION_EX cache entry from opcode.py
- Remove resolved expectedFailure markers
* Align CPython 3.14 LOAD_GLOBAL null-bit and RERAISE semantics
* Remove redundant CPython-referencing comments
Clean up comments that unnecessarily mention CPython per project
convention. Replace with concise descriptions of the behavior itself.
- Implement set_f_lineno with stack analysis and deferred unwinding
- Add Frame::set_lasti() for trace callback line jumps
- Implement co_branches() on code objects
- Clear _cache_format in opcode.py (no inline caches)
- Fix getattro slot inheritance: preserve native slot from inherit_slots
- Fix BRANCH_RIGHT src_offset in InstrumentedPopJumpIf*
- Move lasti increment before line event for correct f_lineno
- Skip RESUME instruction from generating line events
- Defer stack pops via pending_stack_pops/pending_unwind_from_stack
to avoid deadlock with state mutex
- Fix ForIter exhaust target in mark_stacks to skip END_FOR
- Prevent exception handler paths from overwriting normal-flow stacks
- Replace #[cfg(feature = "threading")] duplication with PyAtomic<T>
from rustpython_common::atomic (Radium-based unified API)
- Remove expectedFailure from 31 now-passing jump tests
Add CoMonitoringData with line_opcodes and per_instruction_opcodes
side-tables. INSTRUMENTED_LINE and INSTRUMENTED_INSTRUCTION read
original opcodes from side-tables and re-dispatch after firing events.
- Add decode_exception_table() and CodeUnits::replace_op()
- Add Instruction::to_instrumented/to_base/is_instrumented mappings
- Three-phase instrument_code: de-instrument, re-instrument regular,
then layer INSTRUCTION and LINE with side-table storage
- Mark exception handler targets as line starts in LINE placement
- InstrumentedLine resolves side-table chain atomically when wrapping
InstrumentedInstruction
- InstrumentedForIter fires both BRANCH_LEFT and BRANCH_RIGHT
- Remove callback on DISABLE return for non-local events
* cold block reordering and jump normalization
Add mark_cold, push_cold_blocks_to_end, and normalize_jumps
passes to the codegen CFG pipeline. Use JumpNoInterrupt for
exception handler exit paths in try-except-finally compilation.
* mark test_peepholer
* Generate optimized oparg enums
* No need to match on 255
* Remove `num_enum` crate from `compiler-core`
* Update `Cargo.lock`
* macro fmt
* Rename macro vars
* Match without `,`
* Support alternative values
* Fix alternatives
* Improve docs
* Add const assert
* Don't use `as u32`
* Make only ComparisonOperator unoptimized
* Fix test
* cleanup
* All opargs are optimized
* Remove comment
Move exception handler tracking from compile-time fblock metadata to
a post-codegen CFG analysis pass, matching flowgraph.c's pipeline.
Compiler changes (compile.rs):
- Remove fb_handler, fb_stack_depth, fb_preserve_lasti from FBlockInfo
- Remove handler_stack_depth() and current_except_handler() helpers
- Emit SetupFinally/SetupCleanup/SetupWith/PopBlock pseudo instructions
instead of manually tracking handlers in fblocks
- Add dead blocks after raise/break/continue/return to prevent
dead code from corrupting the except stack
- Add missing PopTop after CallIntrinsic1 ImportStar
- Fix async comprehension SetupFinally/GetANext emission order
- Fix try-except* handler: move BUILD_LIST/COPY before handler loop
- Rework unwind_fblock for HandlerCleanup, TryExcept, FinallyTry,
FinallyEnd, and With/AsyncWith to emit proper PopBlock sequences
New CFG analysis passes (ir.rs):
- mark_except_handlers(): mark blocks targeted by SETUP_* instructions
- label_exception_targets(): walk CFG with except stack to set
per-instruction handler info and convert POP_BLOCK to NOP
- convert_pseudo_ops(): lower remaining pseudo ops after analysis
- Compute handler entry depth from SETUP_* type in max_stackdepth()
(SETUP_CLEANUP=+2, SETUP_FINALLY/SETUP_WITH=+1)
- Fix SEND jump_effect from -1 to 0 (receiver stays until END_SEND)
- Add underflow guard for handler stack_depth calculation
Instruction metadata (instruction.rs):
- SetupCleanup/SetupFinally/SetupWith now carry target: Arg<Label>
- Add is_block_push()/is_pop_block() to PseudoInstruction/AnyInstruction
- Fix LoadSpecial stack_effect from (2,2) to (1,1)
- Set Setup* stack_effect_info to (0,0) for fall-through consistency
Fixes 3 test_coroutines expectedFailures:
- test_with_8, test_for_assign_raising_stop_async_iteration{,_2}
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Made many lightweight/internal types trivially copyable to simplify
value handling across the codebase.
* Added a workspace lint to surface missing copy implementations as
warnings.
* Extended buffer support with explicit retain/release hooks.
* **Bug Fixes**
* Improved diagnostics and debug output for thread/lock acquisition
scenarios.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
See:
1fa166888b/Include/internal/pycore_opcode_utils.h (L39-L44)
and
1fa166888b/Include/internal/pycore_opcode_utils.h (L52-L55)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Updated compiler control flow analysis for improved accuracy in
dead-code elimination and stack-depth tracking.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
This pull request updates `types` module to v3.14.2. While doing it, it
fixes also async-related feature. This pull request's base is generated
by #6827.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Generators that act as iterable coroutines are now recognized as
awaitable, improving async behavior.
* Yield-from and await interactions now handle coroutine-iterable
sources more consistently.
* **Bug Fixes**
* Reduces spurious TypeError cases when awaiting or yielding from
wrapped coroutine-like generators.
<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
New Features
Direct small-integer loading (0–255) and locals-loading for faster execution
Async-generator wrapping and improved generator resume behavior
Performance
Faster integer loads and streamlined jump/loop handling for better runtime performance
Bug Fixes
More robust StopIteration handling and stricter init return checks
Corrected iterator cleanup for async and sync loops
Improvements
Aligns loop and jump semantics with CPython 3.14 patterns