I am experimenting with polars and would like to understand why it is slower than pandas on this particular example:
import pandas as pd
import polars as pl
n = 10_000_000
df1 = pd.DataFrame(range(n), columns=['a'])
df2 = pd.DataFrame(range(n), columns=['b'])
df1p = pl.from_pandas(df1.reset_index())
df2p = pl.from_pandas(df2.reset_index())
# pandas, index-aligned join: takes ~60 ms
df1.join(df2)
# polars, join on the 'index' column: takes ~950 ms
df1p.join(df2p, on='index')
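
In case the way the timings are taken matters, here is a minimal sketch of how numbers like these could be measured. `best_of` is a hypothetical helper written for this post, not part of pandas or polars. The pandas `merge` on the `index` column in the commented usage is only an assumed like-for-like counterpart to the polars key-based join, since `df1.join(df2)` aligns on the index rather than on a column.

```python
import time


def best_of(fn, repeats=5):
    """Call fn `repeats` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to background noise than the mean.
    return min(timings)


# Possible usage with the frames defined above (left commented out here
# because it needs pandas and polars installed):
# best_of(lambda: df1.join(df2))                                           # pandas, index-aligned
# best_of(lambda: df1.reset_index().merge(df2.reset_index(), on='index'))  # pandas, key-based
# best_of(lambda: df1p.join(df2p, on='index'))                             # polars, key-based
```

Comparing the second and third lines would show whether the gap comes from the library or from joining on a key column instead of an aligned index.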