I have a Polars dataframe in the form:

import polars as pl
df = pl.DataFrame({'a':[1,2,3], 'b':[['a','b'],['a'],['c','d']]})
┌─────┬────────────┐
│ a   ┆ b          │
│ --- ┆ ---        │
│ i64 ┆ list[str]  │
╞═════╪════════════╡
│ 1   ┆ ["a", "b"] │
│ 2   ┆ ["a"]      │
│ 3   ┆ ["c", "d"] │
└─────┴────────────┘

I want to convert it to the following form, with one row per list element. I plan to save the result to a Parquet file and query that file with SQL.

┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ "a" │
│ 1   ┆ "b" │
│ 2   ┆ "a" │
│ 3   ┆ "c" │
│ 3   ┆ "d" │
└─────┴─────┘

I have seen an answer that works on struct columns, but df.unnest('b') on my data results in the error:

SchemaError: Series of dtype: List(Utf8) != Struct

I also found a GitHub issue showing that a list can be converted to a struct, but I can't work out how to do that, or whether it applies here.

kristianp

1 Answer

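To expand a list column into one row per element, use the .explode() method (doc). unnest() only applies to struct columns, which is why it raises the SchemaError on a list column.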

import polars as pl
df = pl.DataFrame({'a':[1,2,3], 'b':[['a','b'],['a'],['c','d']]})

df.explode("b")
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ str │
╞═════╪═════╡
│ 1   ┆ a   │
│ 1   ┆ b   │
│ 2   ┆ a   │
│ 3   ┆ c   │
│ 3   ┆ d   │
└─────┴─────┘
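
To make the row semantics concrete, here is a minimal plain-Python sketch of what exploding a list column does: every other column is repeated once per list element. The helper explode_rows is hypothetical and only illustrates the idea; it is not part of Polars and says nothing about how Polars implements it. The handling of empty lists (one row with None) is an assumption made for this sketch, so check the Polars documentation for how your version treats empty or null lists.

```python
def explode_rows(rows, column):
    """Return one output row per element of row[column] (illustrative helper)."""
    out = []
    for row in rows:
        values = row[column]
        if not values:
            # Assumption for this sketch: an empty or missing list
            # yields a single row whose value is None.
            out.append({**row, column: None})
            continue
        for value in values:
            # Copy the other columns and replace the list with one element.
            out.append({**row, column: value})
    return out


rows = [
    {"a": 1, "b": ["a", "b"]},
    {"a": 2, "b": ["a"]},
    {"a": 3, "b": ["c", "d"]},
]

exploded = explode_rows(rows, "b")
for row in exploded:
    print(row)
```

The five printed rows match the df.explode("b") output above, with the value of a repeated for each element of b.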
glebcom