* Revert __class__ lookup skip in object_isinstance
The optimization to skip __class__ lookup based on getattro check
was incorrect: a class can override __class__ as a property while
still using standard __getattribute__. Revert to always performing
the lookup, matching CPython's object_isinstance behavior.
* Collapse nested if in object_isinstance for clippy
* Fix symbol table sub_table desync for non-simple annotation targets
Non-simple annotations (subscript/attribute/parenthesized targets like
`a[0]: expr`) were scanned in the annotation scope during symbol table
analysis, creating sub_tables for any comprehensions. But codegen only
compiles simple name annotations into __annotate__, so those sub_tables
were never consumed. This caused subsequent simple annotations'
comprehension sub_tables to get the wrong index, resulting in
"the symbol 'X' must be present in the symbol table" errors.
Fix: skip entering annotation scope for non-simple annotations since
they are never compiled into __annotate__.
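The index desync can be modeled as a toy two-pass sketch. It assumes each annotation is reduced to a target kind plus a comprehension count, and that codegen consumes sub_tables strictly in order. `Target`, `scan`, and `codegen_in_sync` are hypothetical names, not RustPython's symbol-table API.

```rust
/// Whether an annotation target is a plain name (`x: int`) or not (`a[0]: int`).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Target {
    Simple,
    NonSimple,
}

/// Symbol-table pass: pushes one sub_table per comprehension, recording the
/// index of the annotation it came from.
fn scan(anns: &[(Target, usize)]) -> Vec<usize> {
    let mut tables = Vec::new();
    for (i, &(target, comprehensions)) in anns.iter().enumerate() {
        // The fix: non-simple targets are never compiled into __annotate__,
        // so no annotation scope (and no sub_table) is created for them.
        if target == Target::NonSimple {
            continue;
        }
        for _ in 0..comprehensions {
            tables.push(i);
        }
    }
    tables
}

/// Codegen pass: consumes sub_tables in order, for simple annotations only.
/// Returns false if a consumed sub_table belongs to a different annotation.
fn codegen_in_sync(anns: &[(Target, usize)], tables: &[usize]) -> bool {
    let mut next = tables.iter();
    for (i, &(target, comprehensions)) in anns.iter().enumerate() {
        if target == Target::NonSimple {
            continue;
        }
        for _ in 0..comprehensions {
            if next.next() != Some(&i) {
                return false;
            }
        }
    }
    // Every sub_table produced by the scan must have been consumed.
    next.next().is_none()
}
```

With `[(NonSimple, 1), (Simple, 1)]`, the old behavior produced tables `[0, 1]`, so codegen for the simple annotation picked up annotation 0's table; skipping the non-simple scope yields `[1]` and both passes agree.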
* Validate forbidden expressions in non-simple annotation targets
Fix cspell "desynchronize" warning and validate yield/await/named/async
comprehension expressions in non-simple annotations without creating
annotation scopes.
* Restore in_annotation flag before propagating error
* Implement LOAD_ATTR inline caching with adaptive specialization
Add type version counter (tp_version_tag) to PyType with subclass
invalidation cascade. Add cache read/write methods (u16/u32/u64)
to CodeUnits. Implement adaptive specialization in load_attr that
replaces the opcode with specialized variants on first execution:
- LoadAttrMethodNoDict: cached method lookup for slotted types
- LoadAttrMethodWithValues: cached method with dict shadow check
- LoadAttrInstanceValue: direct dict lookup skipping descriptors
Specialized opcodes guard on type_version_tag and deoptimize back
to generic LOAD_ATTR with backoff counter on cache miss.
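The guard/deoptimize cycle can be sketched with a simplified model. `TypeObj`, `AttrCache`, `Op`, and `BACKOFF_ON_MISS` are hypothetical stand-ins; the real cache lives in CACHE code units and stores more than one value.

```rust
use std::cell::Cell;
use std::collections::HashMap;

/// Hypothetical type object: a version tag plus attribute storage.
/// A tag of 0 means "no valid version assigned".
struct TypeObj {
    version_tag: Cell<u32>,
    attrs: HashMap<&'static str, i64>,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Op {
    LoadAttr,       // generic, adaptive
    LoadAttrCached, // specialized, guarded on the type version
}

/// Hypothetical inline cache for one LOAD_ATTR site (the attribute name is
/// fixed per site, so it is not stored here).
struct AttrCache {
    op: Op,
    type_version: u32,
    value: i64,
    backoff: u16,
}

const BACKOFF_ON_MISS: u16 = 4; // hypothetical value

fn generic_lookup(ty: &TypeObj, name: &str) -> Option<i64> {
    ty.attrs.get(name).copied()
}

fn load_attr(cache: &mut AttrCache, ty: &TypeObj, name: &str) -> Option<i64> {
    match cache.op {
        Op::LoadAttrCached => {
            // Guard: the cached value is valid only for the version it was recorded under.
            if ty.version_tag.get() == cache.type_version {
                return Some(cache.value);
            }
            // Miss: deoptimize to the generic op and back off before re-specializing.
            cache.op = Op::LoadAttr;
            cache.backoff = BACKOFF_ON_MISS;
            generic_lookup(ty, name)
        }
        Op::LoadAttr => {
            let result = generic_lookup(ty, name);
            if cache.backoff > 0 {
                cache.backoff -= 1;
            } else if let Some(v) = result {
                let tag = ty.version_tag.get();
                if tag != 0 {
                    // Specialize: record version and value, then rewrite the opcode.
                    cache.op = Op::LoadAttrCached;
                    cache.type_version = tag;
                    cache.value = v;
                }
            }
            result
        }
    }
}
```

Bumping the type's version tag (as the subclass invalidation cascade does) is enough to make every specialized site fall back on its next execution.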
* Add BINARY_OP and CALL adaptive specialization
BINARY_OP: Specialize int add/subtract/multiply and float
add/subtract/multiply with type guards and deoptimization.
CALL: Add func_version to PyFunction, specialize simple
function calls (CallPyExactArgs, CallBoundMethodExactArgs)
with invoke_exact_args fast path that skips FuncArgs
allocation and fill_locals_from_args.
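The exact-args fast path amounts to a guard followed by a direct copy into the local slots. `CodeInfo`, its fields, and this `invoke_exact_args` signature are hypothetical simplifications; the real guard checks more code flags than shown here.

```rust
/// Hypothetical summary of what the fast-path guard inspects.
struct CodeInfo {
    argcount: usize,
    nlocals: usize,
    has_varargs: bool,
    has_kwonly: bool,
}

/// When positional args match the parameter list exactly, move them straight
/// into the first local slots with no FuncArgs/binding step. Returns None to
/// signal "take the generic call path".
fn invoke_exact_args(code: &CodeInfo, args: Vec<i64>) -> Option<Vec<Option<i64>>> {
    if code.has_varargs || code.has_kwonly || args.len() != code.argcount {
        return None;
    }
    let mut locals: Vec<Option<i64>> = args.into_iter().map(Some).collect();
    // Remaining locals start unbound (assumes nlocals >= argcount).
    locals.resize(code.nlocals, None);
    Some(locals)
}
```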
* Lazy quickening for adaptive specialization counters
Move counter initialization from compile-time to RESUME execution,
matching CPython's _PyCode_Quicken pattern. Store counter in CACHE
entry's arg byte to preserve op=Instruction::Cache for dis/JIT.
Add PyCode.quickened flag for one-time initialization.
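Lazy quickening can be sketched as a one-shot pass triggered from RESUME. `Op`, `CodeUnit`, `Code`, and `INITIAL_COUNTER` are hypothetical; only the idea of storing the counter in the CACHE unit's arg byte comes from the change itself.

```rust
/// Hypothetical opcodes; only LoadAttr is adaptive here.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Op {
    Resume,
    LoadAttr,
    Cache,
    Nop,
}

#[derive(Clone, Copy, Debug)]
struct CodeUnit {
    op: Op,
    arg: u8,
}

struct Code {
    units: Vec<CodeUnit>,
    quickened: bool,
}

const INITIAL_COUNTER: u8 = 1; // hypothetical warm-up value

impl Code {
    /// Run from RESUME; only the first call does any work.
    fn quicken(&mut self) {
        if self.quickened {
            return;
        }
        self.quickened = true;
        for i in 0..self.units.len() {
            if self.units[i].op != Op::LoadAttr {
                continue;
            }
            // The counter goes into the arg byte of the following CACHE unit,
            // so that unit still decodes as Op::Cache for dis and the JIT.
            if let Some(next) = self.units.get_mut(i + 1) {
                if next.op == Op::Cache {
                    next.arg = INITIAL_COUNTER;
                }
            }
        }
    }
}
```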
* Add Instruction::deoptimize() and CodeUnits::original_bytes()
- deoptimize() maps specialized opcodes back to their base adaptive variant
- original_bytes() produces deoptimized bytecode with zeroed CACHE entries
- co_code now returns deoptimized bytes, _co_code_adaptive returns current bytes
- Marshal serialization uses original_bytes() instead of raw transmute
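The relationship between `deoptimize()` and `original_bytes()` can be sketched over (opcode, arg) pairs. The opcode numbering and variant names are hypothetical; the real mapping covers every specialized family.

```rust
/// Hypothetical opcode numbering; CACHE is 0.
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(u8)]
enum Op {
    Cache = 0,
    LoadAttr = 1,
    LoadAttrInstanceValue = 2,
    BinaryOp = 3,
    BinaryOpAddInt = 4,
}

impl Op {
    /// Map a specialized opcode back to its adaptive base form.
    fn deoptimize(self) -> Op {
        match self {
            Op::LoadAttrInstanceValue => Op::LoadAttr,
            Op::BinaryOpAddInt => Op::BinaryOp,
            other => other,
        }
    }
}

/// Serialize code units as (opcode, arg) byte pairs with specializations
/// undone and CACHE contents (counters, version tags) zeroed.
fn original_bytes(units: &[(Op, u8)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(units.len() * 2);
    for &(op, arg) in units {
        if op == Op::Cache {
            out.extend_from_slice(&[Op::Cache as u8, 0]);
        } else {
            out.extend_from_slice(&[op.deoptimize() as u8, arg]);
        }
    }
    out
}
```

Producing bytes this way keeps `co_code` and marshal output stable regardless of how far a code object has been specialized at runtime.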
* Fix monitoring and specialization interaction
- cache_entries() returns correct count for instrumented opcodes
- deoptimize() maps instrumented opcodes back to base
- quicken() skips adaptive counter for instrumented opcodes
- instrument_code Phase 3 deoptimizes specialized opcodes and
clears CACHE entries to prevent stale pointer dereferences
* Address review: bounds checks, UB fix, version overflow, error handling
- Add bounds checks to read_cache_u16/u32/u64
- Fix quicken() aliasing UB by using &mut directly
- Add JumpBackwardJit/JumpBackwardNoJit to deoptimize()
- Guard can_specialize_call with NEWLOCALS flag check
- Use compare_exchange_weak for version tag to prevent wraparound
- Propagate dict lookup errors in LoadAttrMethodWithValues
- Apply adaptive backoff on version tag assignment failure
- Remove duplicate imports in frame.rs
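The wraparound guard on version tags can be sketched with a CAS loop. The function name and the choice to return `None` when the counter is exhausted are assumptions for illustration; tag 0 is assumed to mean "invalid".

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hand out a fresh version tag, or None once the counter is exhausted,
/// instead of wrapping around and reusing tags that caches may still hold.
fn next_version_tag(counter: &AtomicU32) -> Option<u32> {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        // checked_add refuses to wrap, so an exhausted counter stays exhausted.
        let next = current.checked_add(1)?;
        match counter.compare_exchange_weak(current, next, Ordering::Relaxed, Ordering::Relaxed) {
            Ok(_) => return Some(current),
            // Lost a race (or spurious failure): retry with the observed value.
            Err(actual) => current = actual,
        }
    }
}
```

A plain `fetch_add` would silently wrap to 0 and then to previously issued tags; the CAS loop checks before committing the increment.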
* Remove intermediate Vec allocation in unpack_sequence fast path
Push elements directly from tuple/list slice in reverse order
instead of cloning into a temporary Vec first.
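The reverse push can be sketched generically over a value stack. The function name is hypothetical; the point is that iterating the slice backwards leaves the first element on top without an intermediate Vec.

```rust
/// Push `items` so that items[0] ends up on top of `stack`.
fn unpack_into_stack<T: Clone>(stack: &mut Vec<T>, items: &[T]) {
    // One reservation up front, then clone straight from the source slice.
    stack.extend(items.iter().rev().cloned());
}
```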
* Use read-only atomic load before swap in check_signals
Add Relaxed load guard before the Acquire swap to avoid cache-line
invalidation on every instruction dispatch when no signal is pending.
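The load-before-swap pattern can be sketched on a single flag. `take_pending` is a hypothetical name; the flag is assumed to be set with a Release store by the signal handler.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Returns true (and clears the flag) if a signal is pending.
fn take_pending(flag: &AtomicBool) -> bool {
    // A plain load does not need exclusive ownership of the cache line,
    // so the common "nothing pending" case stays read-only.
    if !flag.load(Ordering::Relaxed) {
        return false;
    }
    // Only now do the read-modify-write; Acquire pairs with the setter's Release.
    flag.swap(false, Ordering::Acquire)
}
```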
* Cache builtins downcast in ExecutingFrame for LOAD_GLOBAL
Pre-compute builtins.downcast_ref::<PyDict>() at frame entry and reuse
the cached reference in load_global_or_builtin and LoadBuildClass.
Also add get_chain_exact to skip redundant exact_dict type checks.
* Add number Add slot to PyStr for direct str+str dispatch
binary_op1 can now resolve str+str addition directly via the number
slot instead of falling through to the sequence concat path.
* Guard FastLocals access in locals() with try_lock on state mutex
Address CodeRabbit review: f_locals() could access fastlocals without
synchronization when called from another thread. Use try_lock on the
state mutex so concurrent access is properly serialized.
* Use exact type check for builtins_dict cache
downcast_ref::<PyDict>() matches dict subclasses, causing
get_chain_exact to bypass custom __getitem__ overrides.
Use downcast_ref_if_exact to only fast-path exact dict types.
* Consolidate with_recursion in _cmp to single guard
Move the recursion depth check to wrap the entire _cmp body
instead of each individual call_cmp direction, reducing Cell
read/write pairs and scopeguard overhead per comparison.
* Add opcode-level fast paths for FOR_ITER, COMPARE_OP, BINARY_OP
- FOR_ITER: detect PyRangeIterator and bypass generic iterator
protocol (atomic slot load + indirect call)
- COMPARE_OP: inline int/float comparison for exact types,
skip rich_compare dispatch and with_recursion overhead
- BINARY_OP: inline int add/sub with i64 checked arithmetic
to avoid BigInt heap allocation and binary_op1 dispatch
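The i64 fast path for BINARY_OP can be sketched with checked arithmetic. `Value`, `BinOp`, and `slow_path` are hypothetical, and `i128` merely stands in for a heap-allocated BigInt.

```rust
/// Hypothetical value model: small ints fit in i64, anything else is "big".
#[derive(Debug, PartialEq)]
enum Value {
    Small(i64),
    Big(i128), // stand-in for a heap-allocated BigInt
}

#[derive(Clone, Copy)]
enum BinOp {
    Add,
    Sub,
}

fn slow_path(op: BinOp, a: i64, b: i64) -> Value {
    let (a, b) = (a as i128, b as i128);
    match op {
        BinOp::Add => Value::Big(a + b),
        BinOp::Sub => Value::Big(a - b),
    }
}

fn binary_op_int(op: BinOp, a: i64, b: i64) -> Value {
    // checked_* returns None on overflow, which is the only case that needs bignums.
    let fast = match op {
        BinOp::Add => a.checked_add(b),
        BinOp::Sub => a.checked_sub(b),
    };
    match fast {
        Some(r) => Value::Small(r),
        None => slow_path(op, a, b),
    }
}
```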
* Also check globals is exact dict for LOAD_GLOBAL fast path
get_chain_exact bypasses __missing__ on dict subclasses.
Move get_chain_exact to PyExact<PyDict> impl with debug_assert,
and have get_chain delegate to it. Store builtins_dict as
Option<&PyExact<PyDict>> to enforce exact type at compile time.
Use PyRangeIterator::next_fast() instead of pub(crate) fields.
Fix comment style issues.
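Encoding "exact dict" in the type can be sketched with a wrapper that is only built after an exactness check. `Dict`, `Exact`, and `get_chain` here are hypothetical models of `PyExact<PyDict>` and `get_chain_exact`, not their real signatures.

```rust
use std::collections::HashMap;

/// Hypothetical dict object; `is_exact` stands in for "type is dict, not a subclass".
struct Dict {
    is_exact: bool,
    items: HashMap<String, i32>,
}

/// Intended to be constructed only via `Exact::new`, so holding one means no
/// subclass __getitem__ or __missing__ needs to be honored.
struct Exact<'a>(&'a Dict);

impl<'a> Exact<'a> {
    fn new(d: &'a Dict) -> Option<Self> {
        d.is_exact.then_some(Exact(d))
    }

    /// Look in `self` first (e.g. globals), then in `other` (e.g. builtins).
    fn get_chain(&self, other: &Exact<'_>, key: &str) -> Option<i32> {
        self.0.items.get(key).copied().or_else(|| other.0.items.get(key).copied())
    }
}
```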
* Add str(int) and repr(int) fast path using i64 conversion
- Skip __str__ method resolution for exact PyInt in PyObject::str()
- Use i64::to_string() for small integers, BigInt::to_string() for large ones
- ~36% improvement in str(int) benchmarks
* Extract PyInt::to_str_radix_10() to deduplicate i64 fast path logic
In object_isinstance(), when is_subtype() returns false, the __class__
attribute lookup via get_attribute_opt is redundant for objects using
standard __getattribute__, since __class__ is a data descriptor on
object that always returns obj.class().
* [update_lib] fast date lookup for todo
* add deps
* Auto-format: ruff format
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Implement PyStackRef borrowed stack references for LOAD_FAST_BORROW
* Fix PyStackRef: use NonZeroUsize for niche optimization, revert borrowed refs
- Change PyStackRef.bits from usize to NonZeroUsize so Option<PyStackRef>
has the same size as Option<PyObjectRef> via niche optimization
- Revert LoadFastBorrow to use clone instead of actual borrowed refs
to avoid borrowed refs remaining on stack at yield points
- Add static size assertions for Option<PyStackRef>
- Add stackref, fastlocal to cspell dictionary
- Remove debug eprintln statements
- Fix clippy warning for unused push_borrowed
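The niche optimization can be checked with a small tagged-pointer model. `StackRef`, `BORROWED_TAG`, and the low-bit tagging scheme are assumptions for illustration; the size assertion is the part that mirrors the change.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// Hypothetical stack reference: pointer bits with the low bit marking a
/// borrowed ref. Object pointers are non-null, so the bits are never zero.
#[derive(Clone, Copy, Debug, PartialEq)]
struct StackRef {
    bits: NonZeroUsize,
}

const BORROWED_TAG: usize = 1;

impl StackRef {
    /// Assumes `addr` is at least 2-byte aligned so the low bit is free.
    fn new(addr: usize, borrowed: bool) -> Option<StackRef> {
        let tag = if borrowed { BORROWED_TAG } else { 0 };
        NonZeroUsize::new(addr | tag).map(|bits| StackRef { bits })
    }

    fn is_borrowed(self) -> bool {
        self.bits.get() & BORROWED_TAG != 0
    }

    fn addr(self) -> usize {
        self.bits.get() & !BORROWED_TAG
    }
}

// The forbidden zero value lets Option<StackRef> encode None for free.
const _: () = assert!(size_of::<Option<StackRef>>() == size_of::<usize>());
```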
* Add borrowed ref debug_assert to InstrumentedYieldValue, clean up comments
- Add prefix, exec_prefix, BINDIR to sysconfigdata build_time_vars
- Add Py_DEBUG and ABIFLAGS to sysconfigdata
- Fix Py_GIL_DISABLED/Py_DEBUG to use int (1/0) instead of bool
- Gitignore generated _sysconfig_vars*.json
* Relax RefCount atomic ordering from SeqCst to Arc pattern
- inc/inc_by/get: SeqCst → Relaxed
- safe_inc CAS: SeqCst → Relaxed + compare_exchange_weak
- dec: SeqCst → Release + Acquire fence when count drops to 0
- leak CAS: SeqCst → AcqRel/Relaxed + compare_exchange_weak
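The orderings listed above follow the pattern used by `std::sync::Arc`. A minimal sketch, with `RefCount` as a hypothetical standalone type (overflow checks and the leak path omitted):

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

struct RefCount {
    count: AtomicUsize,
}

impl RefCount {
    fn new() -> Self {
        RefCount { count: AtomicUsize::new(1) }
    }

    /// The caller already holds a reference, so no ordering is needed.
    fn inc(&self) {
        self.count.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns true when this call dropped the last reference.
    fn dec(&self) -> bool {
        // Release makes this thread's prior writes visible to whoever frees the object.
        if self.count.fetch_sub(1, Ordering::Release) != 1 {
            return false;
        }
        // Acquire fence synchronizes with every earlier Release decrement.
        fence(Ordering::Acquire);
        true
    }

    /// Increment only if the object is still alive.
    fn safe_inc(&self) -> bool {
        let mut cur = self.count.load(Ordering::Relaxed);
        loop {
            if cur == 0 {
                return false;
            }
            match self.count.compare_exchange_weak(cur, cur + 1, Ordering::Relaxed, Ordering::Relaxed) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
    }

    fn get(&self) -> usize {
        self.count.load(Ordering::Relaxed)
    }
}
```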
* Reuse existing Vec via prepend_arg in execute_call
Replace vec![self_val] + extend(args.args) with
FuncArgs::prepend_arg() to avoid a second heap allocation
on every method call.
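Prepending in place can be sketched on a simplified argument container. This `FuncArgs` models only positional args and is not RustPython's real type.

```rust
/// Hypothetical stand-in: positional args only.
struct FuncArgs {
    args: Vec<String>,
}

impl FuncArgs {
    /// Insert `item` (the bound `self`) in front of the existing args.
    /// This may grow the existing buffer, but never builds a second Vec.
    fn prepend_arg(&mut self, item: String) {
        self.args.insert(0, item);
    }
}
```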
* Skip downcast_ref checks in invoke when tracing is disabled
Early return in PyCallable::invoke() when use_tracing is false,
avoiding two downcast_ref type checks on every function call.
* Replace fastlocals PyMutex with UnsafeCell-based FastLocals
Eliminate per-instruction mutex lock/unlock overhead for local
variable access. FastLocals uses UnsafeCell with safety guaranteed
by the frame's state mutex and sequential same-thread execution.
Affects 14+ lock() call sites in hot instruction paths (LoadFast,
StoreFast, DeleteFast, and their paired variants).
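An `UnsafeCell`-backed slot array can be sketched as follows. `FastLocals` here is a hypothetical model; its soundness depends entirely on the external invariant stated in the comments, which the real frame enforces through its state mutex and single-thread execution.

```rust
use std::cell::UnsafeCell;

/// Local-variable slots with no per-access lock. Sound only if a single
/// executing thread touches the slots and never through two live references.
struct FastLocals<T> {
    slots: UnsafeCell<Vec<Option<T>>>,
}

impl<T: Clone> FastLocals<T> {
    fn new(n: usize) -> Self {
        FastLocals { slots: UnsafeCell::new(vec![None; n]) }
    }

    /// # Safety
    /// No other reference into the slots may be live during the call.
    unsafe fn load(&self, i: usize) -> Option<T> {
        let slots = unsafe { &*self.slots.get() };
        slots[i].clone()
    }

    /// # Safety
    /// Same requirement as `load`; returns the previous value.
    unsafe fn store(&self, i: usize, value: T) -> Option<T> {
        let slots = unsafe { &mut *self.slots.get() };
        slots[i].replace(value)
    }
}
```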
* Auto-format: cargo fmt --all
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Emit CACHE code units in bytecode to match CPython 3.14
- Add cache_entries() method to Instruction enum
- Emit CACHE code units after opcodes in finalize_code
- Handle NO_LOCATION (line=-1) in linetable for CACHE entries
- Account for CACHE entries in exception table generation
- Skip CACHE entries in VM execution loop (with jump detection)
- Handle CACHE in InstrumentedLine/InstrumentedInstruction/InstrumentedForIter/InstrumentedNotTaken
- Skip CACHE in monitoring instrumentation phases
- Update co_branches() for cache-adjusted offsets
- Restore _cache_format in Lib/opcode.py
- Remove expectedFailure from test_c_call, test_start_offset
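Skipping CACHE units during dispatch can be sketched as advancing the program counter by `1 + cache_entries()`. The opcodes and cache sizes below are hypothetical and do not match CPython 3.14's actual tables.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum Op {
    Cache,
    LoadAttr,
    BinaryOp,
    Nop,
}

impl Op {
    /// Hypothetical number of CACHE units following each opcode.
    fn cache_entries(self) -> usize {
        match self {
            Op::LoadAttr => 2,
            Op::BinaryOp => 1,
            Op::Cache | Op::Nop => 0,
        }
    }
}

/// Offsets of the instructions a dispatch loop would actually execute.
fn executed_offsets(code: &[Op]) -> Vec<usize> {
    let mut pc = 0;
    let mut out = Vec::new();
    while pc < code.len() {
        let op = code[pc];
        debug_assert_ne!(op, Op::Cache, "pc landed on a CACHE unit");
        out.push(pc);
        // Step over the instruction and its inline cache.
        pc += 1 + op.cache_entries();
    }
    out
}
```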
* Use relative jump offsets and fix bytecode layout
- Convert jump arguments from absolute to relative offsets
in frame.rs, monitoring.rs, and stack_analysis
- Add jump_relative_forward/backward helpers to ExecutingFrame
- Resolve pseudo jump instructions before offset fixpoint loop
- Emit NOP for break, continue, pass to match line-tracing
- Fix async for: emit EndAsyncFor with correct target, add NotTaken
- Fix comprehension if-cleanup to use separate block
- Fix super() source range for multi-line calls
- Fix NOP removal to preserve line-marker NOPs
- Fix InstrumentedLine cache skipping after re-dispatch
- Match InstrumentedResume/YieldValue in yield_from_target
- Remove CALL_FUNCTION_EX cache entry from opcode.py
- Remove resolved expectedFailure markers
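Relative jump encoding can be sketched under the assumption that offsets are measured from the unit after the jump and its CACHE entries, as in CPython's `JUMP_FORWARD`/`JUMP_BACKWARD`. `Jump`, `encode_jump`, and `decode_jump` are hypothetical helpers.

```rust
#[derive(Debug, PartialEq)]
enum Jump {
    Forward(u32),
    Backward(u32),
}

/// Position right after the jump instruction and its inline cache.
fn next_unit(jump_at: usize, cache_entries: usize) -> usize {
    jump_at + 1 + cache_entries
}

fn encode_jump(jump_at: usize, cache_entries: usize, target: usize) -> Jump {
    let next = next_unit(jump_at, cache_entries);
    if target >= next {
        Jump::Forward((target - next) as u32)
    } else {
        Jump::Backward((next - target) as u32)
    }
}

fn decode_jump(jump_at: usize, cache_entries: usize, jump: &Jump) -> usize {
    let next = next_unit(jump_at, cache_entries);
    match *jump {
        Jump::Forward(d) => next + d as usize,
        Jump::Backward(d) => next - d as usize,
    }
}
```

Because the argument is a distance rather than an address, the fixpoint loop only has to re-encode jumps whose distance actually changes when instruction sizes shift.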
* Align CPython 3.14 LOAD_GLOBAL null-bit and RERAISE semantics
* Remove redundant CPython-referencing comments
Clean up comments that unnecessarily mention CPython per project
convention. Replace with concise descriptions of the behavior itself.
- Add TraceEvent::Exception and Opcode variants with profile filtering
- Extract dispatch_traced_frame helper for Call/Return trace events
- Fire exception trace on new raises, SEND StopIteration, FOR_ITER StopIteration
- Fire opcode trace events gated by f_trace_opcodes
- Move prev_line to FrameState for persistence across generator suspend/resume
- Reset prev_line in gen_throw for correct LINE monitoring after yield
- Add per-code event mask (events_for_code) to prevent unrelated code instrumentation
- Remove expectedFailure markers from test_bdb (5) and test_sys_setprofile (14)
Add non-ASCII string check to _hashlib.compare_digest, matching the
behavior of _operator._compare_digest. When both arguments are strings,
non-ASCII characters now correctly raise TypeError.
Also replace the non-constant-time == comparison with constant_time_eq
for proper timing-attack resistance, and return PyResult<bool> instead
of PyResult<PyObjectRef>.
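Both behaviors can be sketched in plain Rust. `constant_time_eq` here is a hand-rolled stand-in (the `constant_time_eq` crate takes extra care against compiler optimizations that this loop does not), and `compare_digest_str`/`CompareError` are hypothetical names.

```rust
/// Running time depends only on the lengths, not on where the first
/// mismatching byte is.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    diff == 0
}

#[derive(Debug, PartialEq)]
enum CompareError {
    NonAscii, // surfaces as TypeError at the Python level
}

/// Two str arguments: reject non-ASCII input before comparing.
fn compare_digest_str(a: &str, b: &str) -> Result<bool, CompareError> {
    if !a.is_ascii() || !b.is_ascii() {
        return Err(CompareError::NonAscii);
    }
    Ok(constant_time_eq(a.as_bytes(), b.as_bytes()))
}
```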
Add #[pygetset] getter/setter for Cursor.row_factory so that Python-level
attribute access reads/writes the Rust struct field instead of the
instance dict.
Fix Connection.cursor() to only propagate the connection's row_factory
to the cursor when the connection's row_factory is not None, matching
CPython behavior. Previously it unconditionally overwrote the cursor's
row_factory, discarding any factory set by a cursor subclass __init__.
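The propagation rule can be sketched with a closure standing in for a cursor subclass `__init__`. `Connection`, `Cursor`, and the `init` parameter are hypothetical models; `None` means "no row factory".

```rust
struct Connection {
    row_factory: Option<String>,
}

struct Cursor {
    row_factory: Option<String>,
}

impl Connection {
    fn cursor(&self, init: impl FnOnce(&mut Cursor)) -> Cursor {
        let mut cur = Cursor { row_factory: None };
        init(&mut cur); // subclass __init__ may set its own factory
        // Only overwrite when the connection actually has a factory.
        if let Some(f) = &self.row_factory {
            cur.row_factory = Some(f.clone());
        }
        cur
    }
}
```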
When a custom `closed` property on the underlying buffer calls
`detach()` during the `file_closed()` check in `close()`, the
wrapper's internal buffer becomes None. Subsequent flush/close
operations then fail with AttributeError on NoneType.
Add a guard after the `file_closed()` check to detect if the buffer
was detached reentrantly, and return early in that case (detach has
already flushed the stream).
This mirrors the fix applied in CPython
(https://github.com/python/cpython/issues/142594).
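The reentrancy guard can be sketched with a closure standing in for the user-defined `closed` property. `Wrapper`, its fields, and the error string are hypothetical; only the "re-check after the callback" step reflects the change.

```rust
struct Wrapper {
    buffer: Option<Vec<u8>>,
    flushed: bool,
    closed: bool,
}

impl Wrapper {
    fn detach(&mut self) -> Option<Vec<u8>> {
        self.flushed = true; // detach flushes pending data first
        self.buffer.take()
    }

    fn close(&mut self, file_closed: impl FnOnce(&mut Wrapper) -> bool) -> Result<(), &'static str> {
        if self.buffer.is_none() {
            return Err("underlying buffer has been detached");
        }
        if file_closed(self) {
            return Ok(());
        }
        // The check above may run user code that called detach(); that
        // already flushed, and there is no buffer left to close.
        if self.buffer.is_none() {
            return Ok(());
        }
        self.flushed = true;
        self.closed = true;
        Ok(())
    }
}
```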