I have a dataframe as follows:
import polars as pl

df = pl.DataFrame(
    {"a": [([1, 2, 3], [2, 3, 4], [6, 7, 8]), ([1, 2, 3], [3, 4, 5], [5, 7, 9])]}
)
Basically, each cell of column a is a tuple of three arrays of the same length. I want to split them fully into separate columns, one scalar per column, as in the shape below:
shape: (2, 9)
┌─────────┬─────────┬─────────┬─────────┬─────┬─────────┬─────────┬─────────┬─────────┐
│ field_0 ┆ field_1 ┆ field_2 ┆ field_3 ┆ ... ┆ field_5 ┆ field_6 ┆ field_7 ┆ field_8 │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════════╪═════════╪═════════╪═════════╪═════╪═════════╪═════════╪═════════╪═════════╡
│ 1 ┆ 2 ┆ 3 ┆ 2 ┆ ... ┆ 4 ┆ 6 ┆ 7 ┆ 8 │
│ 1 ┆ 2 ┆ 3 ┆ 3 ┆ ... ┆ 5 ┆ 5 ┆ 7 ┆ 9 │
└─────────┴─────────┴─────────┴─────────┴─────┴─────────┴─────────┴─────────┴─────────┘
One way I have tried is to use arr.to_struct and unnest twice to flatten the two nested levels. Two levels is manageable here, but if the nesting depth varies and cannot be determined ahead of time, the code becomes very long.
Is there any simpler (or more systematic) way to achieve this?
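
To make "systematic" concrete, here is a minimal sketch of the depth-independent behaviour I am after, written in plain Python rather than with Polars expressions. It assumes the nested data is still available as Python lists and tuples before the DataFrame is built, and that every cell flattens to the same number of scalars. The helper names flatten and flatten_rows are hypothetical, not part of Polars.

```python
def flatten(value):
    """Yield the scalars of an arbitrarily nested list/tuple, depth first."""
    if isinstance(value, (list, tuple)):
        for item in value:
            yield from flatten(item)
    else:
        yield value


def flatten_rows(cells, prefix="field_"):
    """Turn each nested cell into a flat dict keyed field_0, field_1, ..."""
    return [
        {f"{prefix}{i}": scalar for i, scalar in enumerate(flatten(cell))}
        for cell in cells
    ]


# Same data as column a in the question (hypothetical variable name).
cells = [
    ([1, 2, 3], [2, 3, 4], [6, 7, 8]),
    ([1, 2, 3], [3, 4, 5], [5, 7, 9]),
]
rows = flatten_rows(cells)
print(rows[0])
```

The resulting list of dicts can be passed to pl.DataFrame(rows) to get one column per scalar. This sidesteps nested Polars dtypes entirely, so it does not help when the data already lives in a DataFrame, which is the case I am really asking about.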