The tests for swapcase() were failing for two reasons. The first is the
casing of '𐐧', which should be fixed with modern Unicode tables. The
second is CPython's sigma override, which I implemented in
PR #7717.
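For reference, a minimal sketch of the sigma behavior the swapcase() tests
rely on, as observed in CPython. The sample strings are picked here for
illustration and are not taken from the test suite.

```python
# A capital sigma (U+03A3) that ends a word lowercases to the final form
# U+03C2; anywhere else it lowercases to the regular small sigma U+03C3.
word = "\u0391\u03a3"            # GREEK CAPITAL ALPHA + GREEK CAPITAL SIGMA
swapped = word.swapcase()        # sigma is preceded by a cased letter

lone = "\u03a3".swapcase()       # no preceding cased letter, so not "final"

print(swapped == "\u03b1\u03c2")
print(lone == "\u03c3")
```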
* PyBytes.title should be ASCII-only.
* Use icu_casemap over unicode-casing for titles
`icu_casemap` is consistently maintained, official, and tracks the
latest Unicode versions. RustPython is also using other `icu4x` crates,
so using `icu_casemap` is more consistent.
As with islower and isupper, tracking the latest Unicode version is
important because character definitions shift over time, which causes
discrepancies between RustPython and CPython.
This commit fixes title().
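A small sketch of the CPython behavior both title() changes target: str
uses full Unicode titlecase mappings, while bytes only touches ASCII
letters. The inputs are illustrative assumptions, not test-suite cases.

```python
# str.title(): U+01C6 (LATIN SMALL LETTER DZ WITH CARON) has a dedicated
# titlecase form U+01C5 that differs from its uppercase form U+01C4.
digraph = "\u01c6"
str_title = digraph.title()
str_upper = digraph.upper()

# bytes.title(): only ASCII letters are cased. 0xE9 ("e" with acute in
# Latin-1) counts as an uncased byte, so the following "c" starts a new word.
ascii_title = b"hello world".title()
latin1_title = b"\xe9cole".title()

print(str_title == "\u01c5", str_upper == "\u01c4")
print(ascii_title, latin1_title)
```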
* Use icu_casemap for capitalize()
I dropped unicode-casing because it's cleaner to use icu4x for
everything. `icu4x` will also stay up to date whereas unicode-casing
will need to be periodically updated with new Unicode tables. Dropping
unicode-casing also removes some binary bloat due to the tables.
`capitalize()` mimics CPython behavior more closely now as well.
Notably, I implemented CPython's sigma edge case handler.
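To illustrate the sigma edge case for capitalize(), here is a hedged
CPython-side sketch. The Greek word is an example chosen here and is
written with escapes so the code points are unambiguous.

```python
# "ΟΔΥΣΣΕΥΣ": the two sigmas in the middle are followed by cased letters,
# the last one ends the word.
upper_word = "\u039f\u0394\u03a5\u03a3\u03a3\u0395\u03a5\u03a3"
capitalized = upper_word.capitalize()

# Expected shape: first letter stays capital, inner sigmas become U+03C3,
# and only the trailing sigma becomes the final form U+03C2.
expected = "\u039f\u03b4\u03c5\u03c3\u03c3\u03b5\u03c5\u03c2"
print(capitalized == expected)
```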
* Match CPython's title() exactly
Five related CPython parity gaps in `str` formatting and construction:
1. **`str(bytes, errors=...)` triggers decode mode.** Previously, only
`encoding=` triggered decode; passing only `errors=` fell back to
`repr()`. CPython's behavior: presence of `encoding` OR `errors`
triggers decode mode (default UTF-8 when only `errors` is given).
2. **`'{...}'.format()` IndexError wording.** Generic Rust "tuple index
out of range" replaced with CPython's "Replacement index N out of
range for positional args tuple".
3. **`'{0:3.2s}'.format('abc')` → `'ab '`.** The string format spec
applied precision after width padding; CPython truncates BEFORE
padding. Reorder the operations.
4. **`%x` / `%o` / `%X` / `%c` accept `__index__` objects.** Previously
only `PyInt` downcast was attempted. Mirror CPython's
PyNumber_Index dispatch via `try_index_opt`.
5. **`%d` / `%u` / `%i` error wording.** "a number is required" →
"a real number is required" (matches CPython).
Also adds `not <type>` suffix to `%c` error messages so the type is
visible in TypeError text (matches CPython structure even without
fully-qualified names).
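A few of the gaps above can be probed with a short script like the
following. This is only a sketch of the expected CPython results;
`Index255` is a hypothetical helper class introduced here, and exact
error wording is left unchecked because it varies across CPython
versions.

```python
class Index255:
    """Hypothetical helper: not an int, but convertible via __index__."""

    def __index__(self):
        return 255


# Gap 1: errors= alone selects decode mode (UTF-8 by default).
decoded = str(b"abc", errors="strict")

# Gap 3: precision truncates to "ab" first, then width pads to 3 columns.
truncated = "{0:3.2s}".format("abc")

# Gap 4: %x and %o accept any object that implements __index__.
hex_text = "%x" % Index255()
oct_text = "%o" % Index255()

# Gap 2: a missing positional argument raises IndexError.
try:
    "{0} {1}".format("only one")
except IndexError as exc:
    index_error = exc
else:
    index_error = None

print(repr(decoded), repr(truncated), hex_text, oct_text, index_error)
```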
Verified byte-identical with CPython 3.14.4 across 25+ probes covering
the format/spec/constructor combinations. Unmasks
`test_str.test_constructor_keyword_args` and
`test_str.test_constructor_defaults`. test_str/test_bytes/test_format/
test_codecs/test_io/test_unicode_identifiers — 1,429 tests pass, 0
regressions. All 188 `extra_tests/snippets/*.py` pass under the CI
feature set.
`test_str.test_format` and `test_str.test_formatting` markers retained:
`test_format` still trips on `'{0:08s}'.format('result')` (numeric
zero-pad treated as fill+left-align by CPython for str type — separate
format-spec parser concern). `test_formatting` still trips on
`%c` error message expecting fully qualified `module.qualname` (RP
returns bare class name — separate broader concern).
CPython rejects digit-only format-string field names that overflow
Py_ssize_t at parse time with ValueError: Too many decimal digits in
format string (Python/string_parser.c::get_integer). RustPython's
FieldName::parse accepted any digit string usize::from_str could parse,
producing IndexError or KeyError at lookup instead.
Cap the parsed index at isize::MAX (Py_ssize_t::MAX on every platform)
inside FieldName::parse. Also reject digits-only strings whose value
overflows usize itself (caught when parse_usize returns None on an
all-digit input). A new FormatParseError::TooManyDecimalDigits maps to
the byte-identical CPython wording.
Unmasks test_str.StrTest.test_format_huge_item_number.
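A minimal reproduction of the case described above, modeled on what the
unmasked test exercises (a sketch, not the test itself):

```python
import sys

# Build a field name one past Py_ssize_t's maximum, e.g. "{9223...808:.6f}"
# on a 64-bit build.
huge_index = sys.maxsize + 1
format_string = "{{{}:.6f}}".format(huge_index)

try:
    format_string.format(2.34)
except ValueError as exc:
    # Rejected while parsing the field name, before any argument lookup.
    caught = exc
else:
    caught = None

print(type(caught).__name__)
```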
CPython rejects format-spec widths that exceed Py_ssize_t::MAX with
ValueError: Too many decimal digits in format string. RustPython's
FormatSpec::_parse only capped precision (via parse_precision); width
was accepted up to usize::MAX, so values like sys.maxsize + 1 silently
produced an effectively-ignored width.
Reject any width above i32::MAX with FormatSpecError::DecimalDigitsTooMany,
matching the existing precision cap and producing the byte-identical
ValueError wording.
Unmasks test_str.StrTest.test_format_huge_width.
Closes #7450.
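The width case can be checked the same way through the format() builtin.
This is a sketch under the assumption that only the exception type
matters; the message text is not asserted here.

```python
import sys

# A width one past sys.maxsize, followed by the "f" presentation type.
huge_width_spec = "{}f".format(sys.maxsize + 1)

try:
    format(2.34, huge_width_spec)
except ValueError as exc:
    width_error = exc
else:
    width_error = None

# A sane width keeps working as before.
padded = format(2.34, "8.2f")

print(type(width_error).__name__, repr(padded))
```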
CPython's unicode_new_impl returns the PyObject_Str result as-is when
type == &PyUnicode_Type, only invoking unicode_subtype_new for actual
str subclasses. RustPython's PyStr::Constructor stripped the result via
Self::from(s.as_wtf8().to_owned()) and re-materialized through
into_ref_with_type, dropping the subclass type even when cls is exactly
str.
Add a slot_new branch that returns input.str(vm)? directly when cls is
str_type with no encoding. Subtype construction and the bytes-decoding
path are unchanged.
Unmasks test_str.StrTest.test_conversion (11 assertTypedEqual cases).
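The distinction can be sketched as follows. `StrSubclass` and `WithStr`
are illustrative names defined here for the example.

```python
class StrSubclass(str):
    pass


class WithStr:
    """Object whose __str__ returns whatever value it was given."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return self.value


# cls is exactly str: the __str__ result is returned as-is, so the
# subclass type survives.
kept = str(WithStr(StrSubclass("abc")))

# cls is a str subclass: the result is rebuilt as an instance of cls.
rebuilt = StrSubclass(WithStr("abc"))

print(type(kept).__name__, type(rebuilt).__name__)
```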
Rust and Python differ in which Unicode properties they use for
alphanumeric, numeric, et cetera. Both languages document which
properties are used, which makes it easy to mimic Python's behavior in
Rust.
My previous patch was a bit shortsighted because I filtered out
combining characters from is_alphanumeric. Using properties is exact and
also much cleaner. It also covers edge cases that my initial approach
missed.
Besides isalnum, I also fixed isnumeric and isdigit in the same way by
using properties.
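A few CPython-side probes show why the property choice matters. The
characters are examples picked here; my understanding is that the
Devanagari vowel sign counts as alphabetic for Rust's
`char::is_alphanumeric` but not for Python, which only accepts letter
categories plus the numeric types.

```python
import unicodedata

vowel_sign = "\u093e"       # DEVANAGARI VOWEL SIGN AA, a spacing mark
superscript_two = "\u00b2"  # a digit, but not a decimal digit
one_half = "\u00bd"         # numeric only

checks = {
    "vowel_sign_category": unicodedata.category(vowel_sign),
    "vowel_sign_isalnum": vowel_sign.isalnum(),
    "superscript_isdigit": superscript_two.isdigit(),
    "superscript_isdecimal": superscript_two.isdecimal(),
    "half_isnumeric": one_half.isnumeric(),
    "half_isdigit": one_half.isdigit(),
}

for name, value in checks.items():
    print(name, value)
```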
* Downgraded skips in tests
* Fixed failing tests
* Fixed test_ftplib + test_socket + test_ssl + test_threaded_import failures
* Removed comments about which tests are run in which environment
* Addressed PR comments
* Annotated skips on failing tests
* Removed unneeded tests
* Removed unneeded sys import from test_ftplib
* Added annotation to test_ftplib
* Readded skipIf to test_cleanup_with_symlink_modes with a more general ENV_POLLUTING_TESTS_WINDOWS
* Addressed PR comments
* Made changes to minimize diff in PR
* Apply suggestion from @youknowone
---------
Co-authored-by: Jeong, YunWon <69878+youknowone@users.noreply.github.com>
* Use `ast.unparse` for decorator generation and every ut_method
* Ensure ut_method type for external patches
* Use textwrap
* Apply patches to `test_os.py`
* Apply on `test_xml_etree.py`
* Run on some test files
* Update `test_str.py`
* Update `test_logging.py` from 3.13.7