Rust and Python differ in which Unicode properties they use to classify
characters as alphanumeric, numeric, et cetera. Both languages document
which properties they use, which makes it easy to mimic Python's behavior
in Rust.
My previous patch was a bit shortsighted because I filtered out
combining characters from is_alphanumeric. Using properties is exact and
also much cleaner. It also covers edge cases that my initial approach
missed.
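As a rough illustration (not code from this patch), the sketch below uses
only Rust's standard library to show the kind of mismatch involved. The
sample characters and the hand-written CPython column are my own picks for
demonstration and are not an exhaustive list.

```rust
/// Sample characters paired with what CPython's `str.isalnum` returns for
/// them. The Python column is written by hand (an assumption of this sketch):
/// marks in categories Mn/Mc are neither letters nor numbers in CPython.
const ALNUM_SAMPLES: [(char, bool); 4] = [
    ('a', true),         // LATIN SMALL LETTER A (Ll)
    ('\u{0300}', false), // COMBINING GRAVE ACCENT (Mn)
    ('\u{0345}', false), // COMBINING GREEK YPOGEGRAMMENI (Mn, Other_Alphabetic)
    ('\u{093E}', false), // DEVANAGARI VOWEL SIGN AA (Mc, Other_Alphabetic)
];

/// Returns the samples on which std's `char::is_alphanumeric` disagrees
/// with the hand-written CPython column.
fn mismatches() -> Vec<char> {
    ALNUM_SAMPLES
        .iter()
        .filter(|&&(c, py)| c.is_alphanumeric() != py)
        .map(|&(c, _)| c)
        .collect()
}

fn main() {
    for c in mismatches() {
        println!("U+{:04X}: Rust and CPython disagree", c as u32);
    }
}
```

Only some combining marks are affected: U+0300 is rejected by both
languages, while marks carrying the Other_Alphabetic property are accepted
by Rust's `is_alphanumeric`.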
Besides isalnum, I also fixed isnumeric and isdigit in the same way by
using properties.
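The same kind of gap exists for the numeric checks. The following is a
minimal sketch, again standard library only and not taken from the patch,
comparing `char::is_numeric` with hand-written CPython results for a few
characters I chose as examples.

```rust
/// (character, CPython `str.isdigit`, CPython `str.isnumeric`).
/// The two Python columns are written by hand (an assumption of this sketch).
const NUMERIC_SAMPLES: [(char, bool, bool); 4] = [
    ('5', true, true),         // DIGIT FIVE (Nd)
    ('\u{00B2}', true, true),  // SUPERSCRIPT TWO (No, Numeric_Type=Digit)
    ('\u{00BD}', false, true), // VULGAR FRACTION ONE HALF (No, Numeric_Type=Numeric)
    ('\u{4E00}', false, true), // CJK ideograph "one" (Lo, Numeric_Type=Numeric)
];

/// What std's `char::is_numeric` (general categories Nd, Nl, No) reports
/// for each sample, in order.
fn rust_is_numeric_column() -> Vec<bool> {
    NUMERIC_SAMPLES.iter().map(|&(c, _, _)| c.is_numeric()).collect()
}

fn main() {
    let rust = rust_is_numeric_column();
    for (i, (c, py_isdigit, py_isnumeric)) in NUMERIC_SAMPLES.iter().enumerate() {
        println!(
            "U+{:04X}: Rust is_numeric={}, CPython isdigit={}, isnumeric={}",
            *c as u32, rust[i], py_isdigit, py_isnumeric
        );
    }
}
```

On these samples `is_numeric` matches neither Python method: it accepts
U+00BD, which `isdigit` rejects, and rejects U+4E00, which `isnumeric`
accepts.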
* fix: Python-Rust combining char diff in isalnum
Related to: #7518
Rust and Python differ on which characters are alphanumeric. Rust follows
the Unicode standard more closely than Python. This means that
is_alphanumeric (char function in Rust) behaves differently from isalnum
(Python). To fix the discrepancy, RustPython needs to mimic Python by
rejecting certain characters. Some classes of combining characters count
as alphanumeric in Rust but not in Python. Combining characters are marks,
such as accents, that attach to a base character to form a single grapheme.
It's possible that this PR is not exhaustive. I fixed the combining
character issue, but I don't know the full range of discrepancies.
* fix: Ignore combining characters in SRE
Closes: #7518
* Updated re library + test
* Copied over generate_sre_constants from cpython/Tools
* Customized `generate_sre_constants.py` + ran to update `constants.rs`
* Clarified `dump_enum` docstring in `generate_sre_constants.py`
* Added alloc_instead_of_core, std_instead_of_alloc, and std_instead_of_core clippy rules
* Manually changed part of the code to use core/alloc
* Used clippy --fix to fix issues in stdlib
* Used clippy --fix to fix issues in vm
* Imported Range in vm/src/anystr.rs
* Used clippy --fix to fix issues in common