This runs on a single core, even though it (seemingly) uses nothing outside Polars. What am I doing wrong?
(The goal is to convert the list in the `doc_ids` field of every row into its string representation, so that `[1, 2, 3]` (`list[int]`) becomes `'[1, 2, 3]'` (`string`).)
```python
import polars as pl

df = pl.DataFrame(dict(ent=['a', 'b'], doc_ids=[[2, 3], [3]]))
df = (
    df.lazy()
    .with_column(
        pl.concat_str([
            pl.lit('['),
            # cast each list's elements to strings via a Python lambda, then join them
            pl.col('doc_ids').apply(lambda x: x.cast(pl.Utf8)).arr.join(', '),
            pl.lit(']'),
        ])
        .alias('docs_str')
    )
    .drop('doc_ids')
).collect()
```