I want to calculate spearman (rank) correlation in a groupby
context using DuckDB/SQL syntax. I tried the following, but failed.
import duckdb
import pandas as pd
df = pd.DataFrame(
{
"a": [1, 1, 2, 2, 6, 1, 3, 6, 3],
"b": [4, 2, 6, 4, 3, 1, 6, 4, 8],
"c": [1, 1, 1, 1, 2, 2, 2, 2, 2],
}
)
duckdb.sql(
"""
SELECT c, corr(rank() OVER (ORDER BY a), rank() OVER (ORDER BY b)) as corr
FROM df
GROUP BY c
"""
)
Error message:
Parse Error: aggregate function calls cannot contain window function calls
Expected result:
c corr
- ----
1 0.70711
2 0.10541