In pandas, one can reduce boolean DataFrames with the all and any methods by providing an axis argument. For example:
import pandas as pd
data = dict(A=["a","b","?"], B=["d","?","f"])
pd_df = pd.DataFrame(data)
To get a boolean mask on columns containing the element "?":
(pd_df == "?").any(axis=0)
and to get a mask on rows:
(pd_df == "?").any(axis=1)
Also, to get a single boolean:
(pd_df == "?").any().any()
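For concreteness, here is a small self-contained sketch of the three pandas reductions on the sample data above. The names is_q, col_mask, row_mask and any_match are only illustrative; the values in the comments follow from the sample data.

```python
import pandas as pd

data = dict(A=["a", "b", "?"], B=["d", "?", "f"])
pd_df = pd.DataFrame(data)

# Element-wise comparison yields a boolean DataFrame of the same shape.
is_q = pd_df == "?"

# Reduce down each column: does the column contain "?" anywhere?
col_mask = is_q.any(axis=0)   # A: True, B: True

# Reduce across each row: does the row contain "?" anywhere?
row_mask = is_q.any(axis=1)   # 0: False, 1: True, 2: True

# Reduce twice to collapse everything into a single Python bool.
any_match = bool(is_q.any().any())   # True
```

This is the behavior I would like to reproduce concisely in polars.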
In comparison, in polars, the best I could come up with is the following:
import polars as pl
pl_df = pl.DataFrame(data)
To get a mask on columns:
(pl_df == "?").select(pl.all().any())
To get a mask on rows:
pl_df.select(
    pl.concat_list(pl.all() == "?").alias("mask")
).select(
    pl.col("mask").arr.eval(pl.element().any()).arr.first()
)
And to get a single boolean value:
pl_df.select(
    pl.concat_list(pl.all() == "?").alias("mask")
).select(
    pl.col("mask").arr.eval(pl.element().any()).arr.first()
)["mask"].any()
The last two cases seem particularly verbose and convoluted for such a straightforward task. Are there shorter or faster equivalents?