I'm really new to Polars.
After running this code:
use polars::prelude::*;

fn read_csv() -> Result<(), PolarsError> {
    println!("Hello, polars! ");
    // The file has no header row, so columns get default names (column_1, column_2, ...).
    let df = CsvReader::from_path("./test_data/tar-data/csv/unziped/body.csv")?
        .has_header(false)
        .finish()?;
    // Split the pipe-delimited strings into lists of strings.
    let df = df.lazy().select([
        col("column_15").str().split("|").alias("origin"),
        col("column_16").str().split("|").alias("destination"),
    ]);
    let df = df.collect()?;
    println!("Schema {:?}", df.schema());
    println!("{:?}", df);
    Ok(())
}
I get two columns of lists, as shown below. Each column is a List of Utf8.
Schema Schema:
name: origin, data type: List(Utf8)
name: destination, data type: List(Utf8)
shape: (10, 2)
┌───────────────────────┬───────────────────────┐
│ origin                ┆ destination           │
│ ---                   ┆ ---                   │
│ list[str]             ┆ list[str]             │
╞═══════════════════════╪═══════════════════════╡
│ ["JOI", "GRU", "DFW"] ┆ ["VCP", "DFW", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "ATL"] ┆ ["GRU", "ATL", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "MEX"] ┆ ["GRU", "MEX", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "MCO"] ┆ ["GRU", "MCO", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ...                   ┆ ...                   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "IAH"] ┆ ["VCP", "IAH", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "ORD"] ┆ ["VCP", "ORD", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "JFK"] ┆ ["GRU", "JFK", "SLC"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ["JOI", "GRU", "EWR"] ┆ ["GRU", "EWR", "SLC"] │
└───────────────────────┴───────────────────────┘
Now I need to zip these two columns into a single column (a Series, in Polars terms) that holds a list of structs.
My goal is to save the DataFrame as JSON with the following structure:
{
  "MyData": [
    [
      {
        "Origin": "JOI",
        "Destination": "VCP"
      },
      {
        "Origin": "GRU",
        "Destination": "DFW"
      },
      {
        "Origin": "DFW",
        "Destination": "SLC"
      }
    ],
    [
      {
        "Origin": "JOI",
        "Destination": "GRU"
      },
      {
        "Origin": "GRU",
        "Destination": "ATL"
      },
      {
        "Origin": "ATL",
        "Destination": "SLC"
      }
    ],
    ......
  ]
}
which corresponds roughly to this schema:
name: MyData, data type: List(List(Struct([Field { name: "origin", dtype: Utf8 }, Field { name: "destination", dtype: Utf8 }])))
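To make the per-row result concrete, here is a plain-Rust sketch of the zip I am after, without Polars. The `Leg` struct and the `zip_legs` function are hypothetical names used only for illustration (they are not Polars API), and I am assuming both lists in a row have the same length.

```rust
// Illustration only: `Leg` and `zip_legs` are made-up names, not part of Polars.
#[derive(Debug, Clone, PartialEq)]
struct Leg {
    origin: String,
    destination: String,
}

/// Pair the i-th origin with the i-th destination of a single row.
/// `zip` stops at the shorter input, so unequal lengths would be truncated.
fn zip_legs(origins: &[&str], destinations: &[&str]) -> Vec<Leg> {
    origins
        .iter()
        .zip(destinations.iter())
        .map(|(o, d)| Leg {
            origin: o.to_string(),
            destination: d.to_string(),
        })
        .collect()
}
```

Calling `zip_legs(&["JOI", "GRU", "DFW"], &["VCP", "DFW", "SLC"])` on the first row would give the three origin/destination pairs of the first inner JSON array above. What I am missing is the Polars expression that does the same thing for whole columns.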
I tried to approach this with apply, fold_exprs, etc., but with no luck.
So my question is: how do I create a column holding a list of predefined structs from N columns that each hold a list of data?