
Is there any way to specify a format specifier when, for example, casting a pl.Float32 to a string, without resorting to complex searches for the period character? Something like:

import polars as pl

s = pl.Series([1.2345, 2.3456, 3.4567])

s.cast(pl.Utf8, fmt="%0.2f")  # fmt obviously isn't an argument

My current method is the following:

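n = 2  # number of decimals desired
width = 6  # hypothetical total width used for the padding below
c = pl.col("a")  # hypothetical name of the float column
expr = pl.concat_str((
    c.floor().cast(pl.Int32).cast(pl.Utf8),           # integer part
    pl.lit('.'),
    ((c % 1) * (10**n)).round(0).cast(pl.Int32).cast(pl.Utf8)  # fractional part
)).str.ljust(width)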

That is, I separate the integer and fractional parts, format each one as a string, and concatenate them. Is there an easier way to do this?
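The split-and-concatenate idea can be checked in plain Python before translating it to polars. The sketch below is not polars code; the function name split_format is hypothetical, and it assumes non-negative inputs. It rounds the scaled value once, so a carry such as 1.999 becomes "2.00" rather than "1.100", and it zero-pads the fractional digits so 1.05 becomes "1.05" rather than "1.5".

```python
def split_format(x, n):
    """Format a non-negative float to n decimals by splitting the
    rounded, scaled value into integer and fractional parts."""
    # Round once on the scaled value so any carry reaches the integer part.
    scaled = round(x * 10**n)
    int_part, frac_part = divmod(scaled, 10**n)
    # Zero-pad the fractional digits to exactly n characters.
    return f"{int_part}.{frac_part:0{n}d}"


values = [1.05, 1.999, 2.3456, 3.4567]
print([split_format(v, 2) for v in values])
# ['1.05', '2.00', '2.35', '3.46']
```

Note that this rounds; the expected output below truncates instead (2.3456 becomes "2.34").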

Expected output:

shape: (3,)
Series: '' [str]
[
    "1.23"
    "2.34"
    "3.45"
]
FObersteiner
NedDasty

1 Answer


I'm not aware of a direct way to specify a format when casting, but here are two easy ways to obtain a specific number of decimal places.

Use write_csv

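We can write the data as CSV with write_csv, which accepts a float_precision parameter; called without a file argument, it returns the CSV text as a string. We then wrap that string in a StringIO buffer and parse it with read_csv to obtain our result. (This is much faster than you might think.) Note: we must pass infer_schema_length=0 to read_csv to prevent it from parsing the strings back to floats. Also note that float_precision rounds rather than truncates, so 2.3456 becomes "2.35" instead of the "2.34" shown in the question.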
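The same write-then-read-as-text round trip can be sketched with only the standard library, which may help show why the result comes back as strings. This is an analogue, not what polars does internally: the function name roundtrip_strings and the header "a" are hypothetical, and the f-string format stands in for float_precision.

```python
import csv
import io


def roundtrip_strings(values, n):
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["a"])  # hypothetical header name
    for v in values:
        # Fixed-precision formatting on write, in the role of float_precision=n.
        writer.writerow([f"{v:.{n}f}"])
    buf.seek(0)
    # csv.reader never converts types, much like read_csv with
    # infer_schema_length=0 keeps every column as strings.
    rows = list(csv.reader(buf))
    return [row[0] for row in rows[1:]]  # skip the header row


print(roundtrip_strings([1.2345, 2.3456, 3.4567], 2))
# ['1.23', '2.35', '3.46']
```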

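from io import StringIO

import polars as pl

s = pl.Series([1.2345, 2.3456, 3.4567])

n = 2  # number of decimals desired
(
    pl.read_csv(
        StringIO(
            # write_csv() with no file argument returns the CSV text as a str
            pl.select(s)
            .write_csv(float_precision=n)
        ),
        infer_schema_length=0  # read every column as strings
    )
    .to_series()
)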
shape: (3,)
Series: '1.23' [str]
[
        "1.23"
        "2.35"
        "3.46"
]

Pad with zeros and then use a single regex

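Another approach is to cast to a string and then append n zeros, so that every value has at least n digits after the period. From this, we can extract the result with a single regular expression. Unlike write_csv, this truncates rather than rounds.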
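The pad-then-extract logic can be tried on plain Python strings with the re module, using the same pattern as the polars expression. A minimal sketch; the function name pad_and_truncate is hypothetical, and it assumes str() produces plain decimal notation (a value like 1e-05 has no period, so the match would fail).

```python
import re


def pad_and_truncate(x, n):
    # Append n zeros so short values such as 1.2 still have n digits
    # after the period.
    padded = str(x) + "0" * n
    # Everything up to the period, the period, then exactly n characters.
    match = re.match(r"^([^\.]*\..{" + str(n) + r"})", padded)
    return match.group(1)


print([pad_and_truncate(v, 2) for v in [1.2345, 2.3456, 3.4567, 1.2]])
# ['1.23', '2.34', '3.45', '1.20']
```

Without the padding step, "1.2" would not match the pattern at all, which is why the zeros are appended first.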

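n = 2
zfill = '0' * n  # n zeros to append
# capture everything up to the period, the period, and n more characters
regex = r"^([^\.]*\..{" + str(n) + r"})"
(
    pl.select(s)
    .with_columns(
        pl.concat_str([
            pl.col(pl.Float64).cast(pl.Utf8),
            pl.lit(zfill)
        ])
        .str.extract(regex)  # returns capture group 1 by default
    )
    .to_series()
)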

shape: (3,)
Series: '' [str]
[
        "1.23"
        "2.34"
        "3.45"
]