I have a simple dataframe as follows:
```python
import polars as pl

df = pl.DataFrame(
    {
        "group": [1, 1, 1, 1, 2, 2, 2, 2],
        "a": [1, 2, 3, 4, 1, 2, 3, 4],
        "b": [5, 1, 7, 9, 2, 4, 9, 7],
        "c": [2, 6, 3, 9, 1, 5, 3, 6],
    }
)
```
I want a per-group correlation "matrix" stored in a Polars DataFrame, structured like the one below. How can I do that?
```text
┌───────┬──────┬──────────┬──────────┬──────────┐
│ group ┆ name ┆ a        ┆ b        ┆ c        │
│ ---   ┆ ---  ┆ ---      ┆ ---      ┆ ---      │
│ i64   ┆ str  ┆ f64      ┆ f64      ┆ f64      │
╞═══════╪══════╪══════════╪══════════╪══════════╡
│ 1     ┆ a    ┆ 1.0      ┆ 0.680336 ┆ 0.734847 │
│ 1     ┆ b    ┆ 0.680336 ┆ 1.0      ┆ 0.246885 │
│ 1     ┆ c    ┆ 0.734847 ┆ 0.246885 ┆ 1.0      │
│ 2     ┆ a    ┆ 1.0      ┆ 0.830455 ┆ 0.756889 │
│ 2     ┆ b    ┆ 0.830455 ┆ 1.0      ┆ 0.410983 │
│ 2     ┆ c    ┆ 0.756889 ┆ 0.410983 ┆ 1.0      │
└───────┴──────┴──────────┴──────────┴──────────┘
```
Currently, this is what I tried:
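```python
# `groupby` was renamed to `group_by` in Polars 0.19; newer releases only accept `group_by`.
df.group_by("group").agg(
    [
        pl.corr(col1, col2).alias(f"{col1}_{col2}")
        for col1 in ["a", "b", "c"]
        for col2 in ["a", "b", "c"]
    ]
)
```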
```text
shape: (2, 10)
┌───────┬─────┬──────────┬──────────┬─────┬──────────┬──────────┬──────────┬─────┐
│ group ┆ a_a ┆ a_b      ┆ a_c      ┆ ... ┆ b_c      ┆ c_a      ┆ c_b      ┆ c_c │
│ ---   ┆ --- ┆ ---      ┆ ---      ┆     ┆ ---      ┆ ---      ┆ ---      ┆ --- │
│ i64   ┆ f64 ┆ f64      ┆ f64      ┆     ┆ f64      ┆ f64      ┆ f64      ┆ f64 │
╞═══════╪═════╪══════════╪══════════╪═════╪══════════╪══════════╪══════════╪═════╡
│ 2     ┆ 1.0 ┆ 0.830455 ┆ 0.756889 ┆ ... ┆ 0.410983 ┆ 0.756889 ┆ 0.410983 ┆ 1.0 │
│ 1     ┆ 1.0 ┆ 0.680336 ┆ 0.734847 ┆ ... ┆ 0.246885 ┆ 0.734847 ┆ 0.246885 ┆ 1.0 │
└───────┴─────┴──────────┴──────────┴─────┴──────────┴──────────┴──────────┴─────┘
```
I'm not sure how to transform this into the shape/structure I want. Alternatively, is there another (potentially better) way to produce that result directly?