Why is Pandera's documented example really slow?

Asked Mar 15 '23 at 18:23


I tried the following example from the Pandera documentation:

import pandera as pa

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # output a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

schema.example(size=100)

I got multiple warnings of the form:

UserWarning: Column check doesn't have a defined strategy. Falling back to filtering drawn values based on the check definition. This can considerably slow down data-generation.

and it took around 10-30 seconds for the example to finish generating. I would expect this to be much faster out of the box. Why is it so slow?
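To make the warning's wording concrete, here is a minimal plain-Python sketch of "filtering drawn values based on the check definition" compared with building valid values directly. It is only an illustration under stated assumptions, not Pandera's or Hypothesis's actual implementation: the function names, the tiny alphabet, and the short `"v_"` prefix (a stand-in for `"value_"`) are all hypothetical and chosen so the demo finishes quickly.

```python
import random

ALPHABET = "abv_"  # hypothetical tiny alphabet so the demo finishes quickly
PREFIX = "v_"      # hypothetical stand-in for the "value_" prefix in the schema


def random_string(rng, length):
    """Return an unconstrained random string over ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def generate_by_filtering(rng, size):
    """Draw unconstrained strings and discard the ones that fail the check."""
    kept, draws = [], 0
    while len(kept) < size:
        draws += 1
        candidate = random_string(rng, 6)
        if candidate.startswith(PREFIX):  # the "check"
            kept.append(candidate)
    return kept, draws


def generate_directly(rng, size):
    """Build strings that satisfy the check by construction."""
    values = [PREFIX + random_string(rng, 4) for _ in range(size)]
    return values, size  # exactly one draw per value


rng = random.Random(0)
filtered, filter_draws = generate_by_filtering(rng, 100)
direct, direct_draws = generate_directly(rng, 100)
print(f"filtering: {filter_draws} draws, direct: {direct_draws} draws")
```

Even with a four-character alphabet and a two-character prefix, only about one draw in sixteen passes the filter. A longer prefix over a realistic character set would reject far more, which may be the kind of slowdown the warning refers to.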

asked Mar 15 '23 at 18:23

Galen


Did you report it as an [issue](https://github.com/unionai-oss/pandera/issues) or see if it's a known issue? – Random Davis Mar 15 '23 at 18:32

