RustPython/extra_tests/snippets/builtin_format.py
Changjoon 71380bead9 Fix process abort on large float format precision (#7633)
* Fix process abort on large float format precision

Formatting a float with a large precision (>= ~65535) aborted the
interpreter process with a Rust panic instead of producing a result.
CPython handles the same input by returning a clean string.

  # Before
  ./rustpython -c "print(f'{1.5:.1000000}')"
  thread 'main' panicked at crates/literal/src/float.rs:135:
  Formatting argument out of range   (exit 101, abort)

  # After
  ./rustpython -c "print(f'{1.5:.1000000}')"
  1.5

Root cause: Rust's `format!("{:.*}", n, x)` panics when `n`
exceeds the fmt runtime's internal precision limit. `format_fixed`
already caps `n` at u16::MAX, but `format_general` and
`format_exponent` (and the `%` branch in `crates/common/src/format.rs`)
passed user-supplied precision straight through to `format!`.

Fix:

  * Introduce `FMT_MAX_PRECISION` + `clamp_fmt_precision()` in
    crates/literal/src/float.rs. Cap is `u16::MAX - 1` because
    `{:.*e}` hits a second panic (`ndigits > 0` in core flt2dec)
    at exactly u16::MAX; the smaller value covers both paths.
  * Apply the helper to `format_fixed` (replacing the existing
    ad-hoc cap), `format_exponent` (entry), and `format_general`
    (three separate format! calls with saturating arithmetic on
    derived precision values).
  * Apply the helper in the `FormatType::Percentage` branch in
    crates/common/src/format.rs.

This is harmless for all normal inputs: an f64 round-trips with ~17
significant digits, and even its exact decimal expansion needs at most
~767, so digits at precision beyond 65K are padding zeros at best.
Complex-number and old-style `%`-formatting paths transitively
benefit because they dispatch to the same library functions.
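
The clamp described above can be modelled in a few lines of Python.
This is only an illustrative sketch: the real helper is Rust code in
crates/literal/src/float.rs, the Python names below merely mirror the
ones in this message, and `format_general_clamped` is a hypothetical
wrapper, not a RustPython function.

```python
# Illustrative Python model of the Rust clamp; not the RustPython source.
U16_MAX = 65535
FMT_MAX_PRECISION = U16_MAX - 1  # first-commit cap, shared by all paths


def clamp_fmt_precision(precision: int) -> int:
    """Limit a user-supplied precision to what the formatter accepts."""
    return min(precision, FMT_MAX_PRECISION)


def format_general_clamped(value: float, precision: int) -> str:
    """Clamp first, then delegate to the underlying formatter (hypothetical)."""
    return format(value, f".{clamp_fmt_precision(precision)}g")


print(clamp_fmt_precision(5))                  # small precisions pass through
print(clamp_fmt_precision(1_000_000))          # large ones are capped
print(format_general_clamped(1.5, 1_000_000))  # g-format still yields "1.5"
```

For g-format the clamp is invisible because trailing zeros are stripped
anyway; for f / e / % it shortens the output, which the later commits
in this series address.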

Verified:

  * cargo run -- -m test test_float test_fstring test_format:
    144 passed, 0 regressed.
  * extra_tests/snippets/builtin_format.py: all assertions pass,
    including 7 new regression cases covering e / E / g / G / f /
    % at precision 1_000_000.
  * Probed with 10 magnitude values (0, ±1.5, ±inf, nan, 1e-300,
    1e300, f64::MAX, 5e-324) x 4 format types = 40 combinations,
    plus precision 0/1/2 boundary, complex formatting, old-style
    `%` formatting, and combined specs (fill/align/sign/grouping/
    zero-pad). All return clean strings; no process abort.

* Address CodeRabbit review: split cap + drop redundant clamp

Two refinements after CodeRabbit review:

1. Drop the redundant `format!("{:.*}", precision + 1, base)` in
   `format_general`'s scientific branch. It was a no-op pre-fix
   (magnitude is `.abs()`-ed at the caller, so `base` has no sign
   and its length was exactly `precision + 1`), but after I added
   the cap it turned into an active truncate — dropping 1 char of
   precision at the cap boundary. Reuse `base` directly and extract
   `exp_precision` for reuse by `decimal_point_or_empty`.

2. Split the cap into two helpers.

   `FMT_MAX_PRECISION = u16::MAX`
       for plain `{:.*}` (format_fixed, %-branch, format_general's
       non-scientific branch).
   `FMT_MAX_EXP_PRECISION = u16::MAX - 1`
       for `{:.*e}` (format_exponent, format_general's scientific
       entry).

   The second value is one lower because `{:.*e}` trips an additional
   `ndigits > 0` assertion in `core::num::flt2dec` at exactly
   `u16::MAX`. The first commit used the tighter cap uniformly,
   which silently regressed `format_fixed` by 1 char at
   `precision == u16::MAX` (it previously capped at exactly that
   value). Two helpers restore byte-identical CPython parity for
   fixed / percent / general-non-scientific paths up through
   `precision == u16::MAX`.
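
The one-character regression described in item 2 can be reproduced
with a small Python model. `fixed_with_cap` is a hypothetical stand-in
for a clamp-only fixed formatter (no zero padding); CPython's own
`format` plays the role of the reference output.

```python
# Hypothetical clamp-only formatter used to compare the two cap choices.
U16_MAX = 65535


def fixed_with_cap(value: float, precision: int, cap: int) -> str:
    """Format with f-notation after clamping the precision to `cap`."""
    return format(value, f".{min(precision, cap)}f")


reference = format(1.5, f".{U16_MAX}f")              # CPython's output
uniform = fixed_with_cap(1.5, U16_MAX, U16_MAX - 1)  # one tight cap everywhere
split = fixed_with_cap(1.5, U16_MAX, U16_MAX)        # separate plain cap

print(len(reference) - len(uniform))  # the tight cap drops one trailing char
print(split == reference)             # the plain cap matches at the boundary
```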

Verification:
  * precision 5 .. 65534:  360 outputs byte-identical to CPython
                           across 8 magnitudes x 9 precisions x 5 types.
  * precision == 65535:    f / g / G / % now match CPython (0 diff).
                           e / E remain 1 char shorter — unavoidable
                           within the `u16::MAX - 1` exp cap.
  * precision > 65535:     output stops at cap; CPython emits full
                           padding — same design divergence as before.
  * No panic regression:   f-string default, e/E, g/G, %, f at
                           precision 1_000_000 all return cleanly.
  * Test suite:            test_float + test_fstring + test_format,
                           162 passed, 0 regressed.

* Fix ruff format: single-line precision clamp

* Address @youknowone review: byte-identical CPython parity at boundary

Per review comment on `extra_tests/snippets/builtin_format.py:209`:
the patch declares `FMT_MAX_PRECISION = u16::MAX`, so the tests must
cover 65535 and 65536 and demonstrate CPython parity at the boundary.

The previous version only avoided the panic: at the cap it silently
truncated 1 char short of CPython for e / E, and thousands of chars
short for f / % at precisions beyond the cap. This commit restores
byte-identical CPython output at every precision up to the format-
spec parser's own `i32::MAX` ceiling.

Fix: pad the Rust-format result with '0's up to the user-requested
precision.

Why this is correct, not a workaround: IEEE 754 double has at most
~767 significant decimal digits; past that, every digit is
deterministically '0' in both CPython and the native Rust output.
Our cap (65534 for exp, 65535 for plain) sits far above 767, so
appending zeros reconstructs precisely what CPython would have
produced. Verified on hard inputs: `1e-100`, `5e-324` (subnormal
boundary), `f64::MAX`, mixed magnitudes — the last 100 chars of
Rust-format output at precision 65534 are all '0' for every case.
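
The "every digit past that is '0'" argument can be checked from
CPython itself, since `decimal.Decimal(float)` converts a float to its
exact decimal expansion. The helper name `exact_digit_counts` is made
up for this sketch and is not part of the patch.

```python
from decimal import Decimal


def exact_digit_counts(value: float) -> tuple[int, int]:
    """Return (significant digits, digits after the decimal point) of the
    exact decimal expansion of a finite float."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    return len(digits), max(-exponent, 0)


# The smallest subnormal has the longest fractional expansion of any f64.
significant, fractional = exact_digit_counts(5e-324)
print(significant, fractional)

# Everything f-format emits beyond that many decimal places is '0'.
tail = format(5e-324, ".1200f")[2 + fractional:]
print(set(tail))
```

Both counts sit far below either cap, which is why appending zeros
after the capped output loses no information.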

Changes:

  * `format_fixed`: after format!(), extend with (precision - capped)
    '0' chars before appending the optional decimal point.
  * `format_exponent`: same, applied to the parsed mantissa before
    reassembling with the exponent marker.
  * `FormatType::Percentage` branch: same. Also fixed a bug the
    boundary audit surfaced: the finite-input overflow guard used
    `return Ok("inf%")`, which bypasses the outer sign handler.
    Changed to a match-arm value so `format_sign_and_align` still
    runs and produces "-inf%" for `-f64::MAX`, matching CPython.
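
A minimal Python sketch of the cap-then-pad idea for the fixed and
exponent paths follows. It assumes the cap values quoted in this
message; the function names are hypothetical, and CPython's `format`
stands in for the capped Rust `format!` call.

```python
import math

FMT_MAX_PRECISION = 65535      # plain `{:.*}` cap quoted in this message
FMT_MAX_EXP_PRECISION = 65534  # `{:.*e}` cap quoted in this message


def format_fixed_padded(value: float, precision: int) -> str:
    """Format at the capped precision, then append the missing zeros."""
    capped = min(precision, FMT_MAX_PRECISION)
    body = format(value, f".{capped}f")
    if not math.isfinite(value):
        return body  # "nan" / "inf" carry no digits to pad
    return body + "0" * (precision - capped)


def format_exponent_padded(value: float, precision: int) -> str:
    """Same idea, but the zeros go into the mantissa, before the exponent."""
    capped = min(precision, FMT_MAX_EXP_PRECISION)
    body = format(value, f".{capped}e")
    if not math.isfinite(value):
        return body
    mantissa, exponent = body.split("e")
    return mantissa + "0" * (precision - capped) + "e" + exponent


print(format_fixed_padded(1.5, 65536) == format(1.5, ".65536f"))
print(format_exponent_padded(1.5, 65536) == format(1.5, ".65536e"))
```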

Verification:

  * 7 magnitudes × 5 precisions × 6 format types = 210 comparisons
    against CPython at precisions {65534, 65535, 65536, 100000,
    200000}. All 210 byte-identical.
  * Gap audit (complex formatting, old-style % formatting, negative
    magnitudes, -0.0, combined specs with fill / sign / alternate /
    grouping) at boundary precisions. All but 20 byte-identical.
    The 20 remaining diffs all stem from a pre-existing
    complex-imaginary-part repr bug (`1e100j` expands to 100 '0's
    in RustPython vs CPython's `1e+100j`) which reproduces on
    upstream main without any part of this patch and is out of
    scope here.
  * `cargo run -- -m test test_float test_fstring test_format`:
    162 passed, 0 regressed.
  * `extra_tests/snippets/builtin_format.py` now pins exact
    expected strings at 65534 / 65535 / 65536 / 1_000_000 for
    every format type, plus the `f64::MAX × 100 → 'inf%'`
    overflow case.
  * `cargo fmt --check`: pass.

* Clarify boundary test labels + add past-cap depth assertions

Rename the boundary-test section so the three precision points per
format type are labeled below / at / past the cap inline, making the
"past MAX_PRECISION" unhappy-case coverage explicit. Add len-based
assertions at precision 1_000_000 for f, e, and % to exercise the
cap-then-pad path at a depth far beyond the boundary.
2026-04-23 15:25:52 +09:00

254 lines
9.1 KiB
Python

from testutils import assert_raises
assert format(5, "b") == "101"
assert_raises(TypeError, format, 2, 3, _msg="format called with number")
assert format({}) == "{}"
assert_raises(TypeError, format, {}, "b", _msg="format_spec not empty for dict")
class BadFormat:
    def __format__(self, spec):
        return 42
assert_raises(TypeError, format, BadFormat())
def test_zero_padding():
    i = 1
    assert f"{i:04d}" == "0001"
test_zero_padding()
assert "{:,}".format(100) == "100"
assert "{:,}".format(1024) == "1,024"
assert "{:_}".format(65536) == "65_536"
assert "{:_}".format(4294967296) == "4_294_967_296"
assert f"{100:_}" == "100"
assert f"{1024:_}" == "1_024"
assert f"{65536:,}" == "65,536"
assert f"{4294967296:,}" == "4,294,967,296"
assert "F" == "{0:{base}}".format(15, base="X")
assert f"{255:#X}" == "0XFF"
assert f"{65:c}" == "A"
assert f"{0x1F5A5:c}" == "🖥"
assert_raises(
    ValueError,
    "{:+c}".format,
    1,
    _msg="Sign not allowed with integer format specifier 'c'",
)
assert_raises(
    ValueError,
    "{:#c}".format,
    1,
    _msg="Alternate form (#) not allowed with integer format specifier 'c'",
)
assert f"{256:#010x}" == "0x00000100"
assert f"{256:0=#10x}" == "0x00000100"
assert f"{256:0>#10x}" == "000000x100"
assert f"{256:0^#10x}" == "000x100000"
assert f"{256:0<#10x}" == "0x10000000"
assert f"{512:+#010x}" == "+0x0000200"
assert f"{512:0=+#10x}" == "+0x0000200"
assert f"{512:0>+#10x}" == "0000+0x200"
assert f"{512:0^+#10x}" == "00+0x20000"
assert f"{512:0<+#10x}" == "+0x2000000"
assert f"{123:,}" == "123"
assert f"{1234:,}" == "1,234"
assert f"{12345:,}" == "12,345"
assert f"{123456:,}" == "123,456"
assert f"{123:03_}" == "123"
assert f"{123:04_}" == "0_123"
assert f"{123:05_}" == "0_123"
assert f"{123:06_}" == "00_123"
assert f"{123:07_}" == "000_123"
assert f"{255:#010_x}" == "0x000_00ff"
assert f"{255:+#010_x}" == "+0x00_00ff"
assert f"{123.4567:,}" == "123.4567"
assert f"{1234.567:,}" == "1,234.567"
assert f"{12345.67:,}" == "12,345.67"
assert f"{123456.7:,}" == "123,456.7"
assert f"{123.456:07,}" == "123.456"
assert f"{123.456:08,}" == "0,123.456"
assert f"{123.456:09,}" == "0,123.456"
assert f"{123.456:010,}" == "00,123.456"
assert f"{123.456:011,}" == "000,123.456"
assert f"{123.456:+011,}" == "+00,123.456"
assert f"{1234:.3g}" == "1.23e+03"
assert f"{1234567:.6G}" == "1.23457E+06"
assert f"{1234:10}" == "      1234"
assert f"{1234:10,}" == "     1,234"
assert f"{1234:010,}" == "00,001,234"
assert f"{'🐍':4}" == "🐍   "
assert_raises(
    ValueError, "{:,o}".format, 1, _msg="ValueError: Cannot specify ',' with 'o'."
)
assert_raises(
    ValueError, "{:_n}".format, 1, _msg="ValueError: Cannot specify '_' with 'n'."
)
assert_raises(
    ValueError, "{:,o}".format, 1.0, _msg="ValueError: Cannot specify ',' with 'o'."
)
assert_raises(
    ValueError, "{:_n}".format, 1.0, _msg="ValueError: Cannot specify '_' with 'n'."
)
assert_raises(
    ValueError, "{:,}".format, "abc", _msg="ValueError: Cannot specify ',' with 's'."
)
assert_raises(
    ValueError, "{:,x}".format, "abc", _msg="ValueError: Cannot specify ',' with 'x'."
)
assert_raises(
    OverflowError,
    "{:c}".format,
    0x110000,
    _msg="OverflowError: %c arg not in range(0x110000)",
)
assert f"{3:f}" == "3.000000"
assert f"{3.1415:.0f}" == "3"
assert f"{3.1415:.1f}" == "3.1"
assert f"{3.1415:.2f}" == "3.14"
assert f"{3.1415:.3f}" == "3.142"
assert f"{3.1415:.4f}" == "3.1415"
assert f"{3.1415:#.0f}" == "3."
assert f"{3.1415:#.1f}" == "3.1"
assert f"{3.1415:#.2f}" == "3.14"
assert f"{3.1415:#.3f}" == "3.142"
assert f"{3.1415:#.4f}" == "3.1415"
assert f"{3:g}" == "3"
assert f"{3.1415:.0g}" == "3"
assert f"{3.1415:.1g}" == "3"
assert f"{3.1415:.2g}" == "3.1"
assert f"{3.1415:.3g}" == "3.14"
assert f"{3.1415:.4g}" == "3.142"
assert f"{0.000012:g}" == "1.2e-05"
assert f"{0.000012:G}" == "1.2E-05"
assert f"{3:#g}" == "3.00000"
assert f"{3.1415:#.0g}" == "3."
assert f"{3.1415:#.1g}" == "3."
assert f"{3.1415:#.2g}" == "3.1"
assert f"{3.1415:#.3g}" == "3.14"
assert f"{3.1415:#.4g}" == "3.142"
assert f"{0.000012:#g}" == "1.20000e-05"
assert f"{0.000012:#G}" == "1.20000E-05"
assert f"{3.1415:.0e}" == "3e+00"
assert f"{3.1415:.1e}" == "3.1e+00"
assert f"{3.1415:.2e}" == "3.14e+00"
assert f"{3.1415:.3e}" == "3.142e+00"
assert f"{3.1415:.4e}" == "3.1415e+00"
assert f"{3.1415:.5e}" == "3.14150e+00"
assert f"{3.1415:.5E}" == "3.14150E+00"
assert f"{3.1415:#.0e}" == "3.e+00"
assert f"{3.1415:#.1e}" == "3.1e+00"
assert f"{3.1415:#.2e}" == "3.14e+00"
assert f"{3.1415:#.3e}" == "3.142e+00"
assert f"{3.1415:#.4e}" == "3.1415e+00"
assert f"{3.1415:#.5e}" == "3.14150e+00"
assert f"{3.1415:#.5E}" == "3.14150E+00"
assert f"{3.1415:.0%}" == "314%"
assert f"{3.1415:.1%}" == "314.2%"
assert f"{3.1415:.2%}" == "314.15%"
assert f"{3.1415:.3%}" == "314.150%"
assert f"{3.1415:#.0%}" == "314.%"
assert f"{3.1415:#.1%}" == "314.2%"
assert f"{3.1415:#.2%}" == "314.15%"
assert f"{3.1415:#.3%}" == "314.150%"
assert f"{3.1415:.0}" == "3e+00"
assert f"{3.1415:.1}" == "3e+00"
assert f"{3.1415:.2}" == "3.1"
assert f"{3.1415:.3}" == "3.14"
assert f"{3.1415:.4}" == "3.142"
assert f"{3.1415:#.0}" == "3.e+00"
assert f"{3.1415:#.1}" == "3.e+00"
assert f"{3.1415:#.2}" == "3.1"
assert f"{3.1415:#.3}" == "3.14"
assert f"{3.1415:#.4}" == "3.142"
assert f"{1234.5:10}" == "    1234.5"
assert f"{1234.5:10,}" == "   1,234.5"
assert f"{1234.5:010,}" == "0,001,234.5"
assert f"{12.34 + 5.6j}" == "(12.34+5.6j)"
assert f"{12.34 - 5.6j: }" == "( 12.34-5.6j)"
assert f"{12.34 + 5.6j:20}" == "        (12.34+5.6j)"
assert f"{12.34 + 5.6j:<20}" == "(12.34+5.6j)        "
assert f"{-12.34 + 5.6j:^20}" == "   (-12.34+5.6j)    "
assert f"{12.34 + 5.6j:^+20}" == "   (+12.34+5.6j)    "
assert f"{12.34 + 5.6j:_^+20}" == "___(+12.34+5.6j)____"
assert f"{-12.34 + 5.6j:f}" == "-12.340000+5.600000j"
assert f"{12.34 + 5.6j:.3f}" == "12.340+5.600j"
assert f"{12.34 + 5.6j:<30.8f}" == "12.34000000+5.60000000j       "
assert f"{12.34 + 5.6j:g}" == "12.34+5.6j"
assert f"{12.34 + 5.6j:e}" == "1.234000e+01+5.600000e+00j"
assert f"{12.34 + 5.6j:E}" == "1.234000E+01+5.600000E+00j"
assert f"{12.34 + 5.6j:^30E}" == "  1.234000E+01+5.600000E+00j  "
assert f"{12345.6 + 7890.1j:,}" == "(12,345.6+7,890.1j)"
assert f"{12345.6 + 7890.1j:_.3f}" == "12_345.600+7_890.100j"
assert f"{12345.6 + 7890.1j:>+30,f}" == "  +12,345.600000+7,890.100000j"
assert f"{123456:,g}" == "123,456"
assert f"{123456:,G}" == "123,456"
assert f"{123456:,e}" == "1.234560e+05"
assert f"{123456:,E}" == "1.234560E+05"
assert f"{123456:,%}" == "12,345,600.000000%"
# test issue 4558
x = 123456789012345678901234567890
for i in range(0, 30):
    format(x, ",")
    x = x // 10
# Large float precision must not abort the interpreter.
# Previously these paths hit unguarded `format!("{:.*e}", ...)` calls in
# crates/literal/src/float.rs and crates/common/src/format.rs (the `%`
# branch), which panic past Rust's fmt precision limit and killed the
# process instead of returning a string. Internally the precision is
# capped at u16::MAX (u16::MAX - 1 for exponent formats); output is
# zero-padded past that cap to match CPython byte for byte.
# Three precision points per format type — below the cap (uncapped
# path), exactly at the cap (boundary), and one past the cap (the
# unhappy case, where internal clamping plus zero-padding has to
# reconstruct CPython's output). All must byte-match CPython.
# f-format pads with trailing zeros up to the requested precision.
assert "{:.65534f}".format(1.5) == "1." + "5" + "0" * 65533 # below cap
assert "{:.65535f}".format(1.5) == "1." + "5" + "0" * 65534 # at cap
assert "{:.65536f}".format(1.5) == "1." + "5" + "0" * 65535 # past cap → padding
# e-format emits a fixed mantissa width + 'e+00'.
assert "{:.65534e}".format(1.5) == "1." + "5" + "0" * 65533 + "e+00" # below
assert "{:.65535e}".format(1.5) == "1." + "5" + "0" * 65534 + "e+00" # at cap
assert (
    "{:.65536e}".format(1.5) == "1." + "5" + "0" * 65535 + "e+00"
)  # past cap → padding
# %-format multiplies by 100 then applies f-format.
assert "{:.65534%}".format(1.5) == "150." + "0" * 65534 + "%" # below
assert "{:.65535%}".format(1.5) == "150." + "0" * 65535 + "%" # at cap
assert "{:.65536%}".format(1.5) == "150." + "0" * 65536 + "%" # past cap → padding
# g-format strips trailing zeros, so the short form is the natural
# representation regardless of precision.
for p in (65534, 65535, 65536, 1_000_000):
    assert ("{:." + str(p) + "g}").format(1.5) == "1.5"
# Far past the cap — verifies the pad path handles arbitrary precision,
# not just one-off values near the boundary.
assert len("{:.1000000f}".format(1.5)) == 1_000_002  # "1." + 1M digits
assert len("{:.1000000e}".format(1.5)) == 1_000_006 # + "e+00"
assert len("{:.1000000%}".format(1.5)) == 1_000_005 # "150." + 1M zeros + "%"
# Percent overflow: finite input whose *100 is +inf produces "inf%"
# rather than crashing. CPython does the same.
assert "{:.100000%}".format(1.7976931348623157e308) == "inf%"
# Shallow cases unchanged.
assert f"{1.5:.5}" == "1.5"
assert "{:.3f}".format(1.5) == "1.500"
assert "{:.2%}".format(0.25) == "25.00%"
assert "{:.4e}".format(1234.5) == "1.2345e+03"
assert "{:.3g}".format(1234.5) == "1.23e+03"
assert f"{float('nan'):.10f}" == "nan"
assert f"{float('inf'):.10f}" == "inf"