I'm trying to convert a Pandas DataFrame to a Polars one.

I simply used result_polars = pl.from_pandas(result). The conversion runs without errors, but when I check the shapes of the two DataFrames, the Polars one has about half as many rows as the original Pandas DataFrame.

I believe that a length of 4172903059 is close to the maximum number of rows a Polars DataFrame allows.

Does anyone have suggestions?

Here is a screenshot of the shapes of the two DataFrames.

Here is a minimal working example:

import polars as pl
import pandas as pd
import numpy as np

# 4292903069 rows of uint8 zeros (about 4.3 GB), just below 2^32 rows
df = pd.DataFrame(np.zeros((4292903069,1), dtype=np.uint8))
df_polars = pl.from_pandas(df)

With these dimensions, the two DataFrames have the same size. If I instead use the following:

import polars as pl
import pandas as pd
import numpy as np

# 4392903069 rows of uint8 zeros (about 4.4 GB), just above 2^32 rows
df = pd.DataFrame(np.zeros((4392903069,1), dtype=np.uint8))
df_polars = pl.from_pandas(df)

The Polars DataFrame has far fewer rows (97935773).

081N

1 Answer

The default Polars wheel installed with pip install polars "only" allows for 2^32 rows, i.e. about 4.29 billion.

If you need more than that, uninstall the existing installation and install the 64-bit index build instead: pip install polars-u64-idx.
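
As a quick sanity check, the numbers in the question fit a row count that wraps around a 32-bit index. The sketch below is plain Python arithmetic and does not call Polars. The helper name wrapped_length is hypothetical, and the modulo wraparound is an assumption inferred from the reported lengths rather than documented behavior.

```python
# Hypothetical helper: models a row count stored in an unsigned integer of
# `index_bits` bits. Assumption: an overflowing count wraps modulo 2**index_bits.
def wrapped_length(n_rows: int, index_bits: int = 32) -> int:
    return n_rows % (2 ** index_bits)


MAX_ROWS_U32 = 2 ** 32  # 4294967296 rows with a 32-bit index

# First example from the question: below the limit, so the length is unchanged.
print(wrapped_length(4292903069))

# Second example: above the limit, so the length wraps to 97935773,
# which is the row count reported for the Polars DataFrame.
print(wrapped_length(4392903069))

# With a 64-bit index (as in polars-u64-idx) the same length fits.
print(wrapped_length(4392903069, index_bits=64))
```

The wrapped value for the second example equals 4392903069 - 4294967296, which matches the 97935773 rows seen in the question.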

ritchie46
  • Yes, this is definitely the reason! pip install polars-u64-idx additionally requires Cargo and Rust. – 081N Feb 08 '23 at 11:47
  • On Linux it should not. I think we currently ship that binary only for Linux, indeed. – ritchie46 Feb 08 '23 at 12:29
  • Unfortunately, I'm on Windows, and after installing Rust the install fails with the error "failed to get `ahash` as a dependency of package `py-polars v0.16.2`". I can't get it to run. – 081N Feb 08 '23 at 12:45