As part of the Kaggle competition (https://www.kaggle.com/competitions/amex-default-prediction/overview), I'm trying to use a trick shared by other competitors in their solutions: they reduce the size of a column by interpreting the last 16 characters of a hexadecimal string as a base-16 uint64. I'm trying to work out whether this is possible in polars/Rust:
# The python approach - this is used via .apply in pandas.
string = "0000099d6bd597052cdcda90ffabf56573fe9d7c79be5fbac11a8ed792feb62a"
def func(x):
    # Parse the last 16 hex characters of the argument as a base-16 integer.
    return int(x[-16:], 16)
func(string)
# 13914591055249847850
My attempt at a solution in polars yields nearly the right answer, but the final digits are off, which is a bit confusing:
import polars as pl
def func(x: str) -> int:
    return int(x[-16:], 16)
strings = [
    "0000099d6bd597052cdcda90ffabf56573fe9d7c79be5fbac11a8ed792feb62a",
    "00000fd6641609c6ece5454664794f0340ad84dddce9a267a310b5ae68e9d8e5",
]
df = pl.DataFrame({"id": strings})
result_polars = df.with_column(pl.col("id").apply(func).cast(pl.UInt64)).to_series().to_list()
result_python = [func(x) for x in strings]
result_polars, result_python
# ([13914591055249848320, 11750091188498716672],
# [13914591055249847850, 11750091188498716901])
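The pattern in the wrong digits can be probed without polars. The following is a minimal sketch, assuming (not confirmed) that the values pass through a 64-bit float somewhere between `apply` and the cast. A float64 has a 53-bit mantissa, so integers between 2^63 and 2^64 can only be represented in steps of 2048. The names `exact` and `via_float` are just for illustration.

```python
def func(x: str) -> int:
    return int(x[-16:], 16)

strings = [
    "0000099d6bd597052cdcda90ffabf56573fe9d7c79be5fbac11a8ed792feb62a",
    "00000fd6641609c6ece5454664794f0340ad84dddce9a267a310b5ae68e9d8e5",
]

# Exact values as arbitrary-precision Python ints.
exact = [func(s) for s in strings]

# Assumption being tested: a round trip through float64 loses the low bits.
via_float = [int(float(v)) for v in exact]

print(via_float)
# [13914591055249848320, 11750091188498716672]
print([e - f for e, f in zip(exact, via_float)])
# [-470, 229]
```

The round-tripped values match the polars output exactly, which is at least consistent with a float conversion being involved, though it does not show where in polars that would happen.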
I've also tried casting directly from Utf8 to UInt64, but I get the following error (or nulls if I pass strict=False):
df.with_column(pl.col("id").str.slice(-16).cast(pl.UInt64)).to_series().to_list()
###
ComputeError: strict conversion of cast from Utf8 to UInt64 failed. consider non-strict cast.
If you were trying to cast Utf8 to Date,Time,Datetime, consider using `strptime`
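For comparison, plain Python behaves analogously when no base is given. This is only a sketch of the likely cause, assuming the Utf8 to UInt64 cast parses the string as a decimal number, which I have not verified in the polars source: a string containing the hex digits a-f is not a valid base-10 numeral, so parsing fails unless the base is stated.

```python
tail = "c11a8ed792feb62a"  # last 16 characters of the first id

# Without an explicit base, int() expects decimal digits and rejects a-f.
try:
    int(tail)
    decimal_parse_ok = True
except ValueError:
    decimal_parse_ok = False

print(decimal_parse_ok)  # False
print(int(tail, 16))     # 13914591055249847850
```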