I tried the following example from the pandera documentation:

import pandera as pa

# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
    "column3": pa.Column(str, checks=[
        pa.Check.str_startswith("value_"),
        # define custom checks as functions that take a series as input and
        # outputs a boolean or boolean Series
        pa.Check(lambda s: s.str.split("_", expand=True).shape[1] == 2)
    ]),
})

schema.example(size=100)

I got multiple warnings of the form:

UserWarning: Column check doesn't have a defined strategy. Falling back to filtering drawn values based on the check definition. This can considerably slow down data-generation.

and the example took around 10-30 seconds to finish generating. This feels like something that should be much faster out of the box. Why is it so slow?
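
For context, here is my rough reading of what the warning describes ("filtering drawn values based on the check definition"), as a standard-library-only sketch. This is not pandera's or hypothesis's actual code: the function names, the value range, and the deliberately narrow check are all made up for illustration. It only compares how many draws are needed when values are drawn and then filtered versus drawn directly from a range known to be valid.

```python
import random


def draw_by_filtering(check, rng, low=-100.0, high=100.0, max_tries=1_000_000):
    """Draw random floats until one passes `check`; return (value, tries)."""
    for tries in range(1, max_tries + 1):
        value = rng.uniform(low, high)
        if check(value):
            return value, tries
    raise RuntimeError("no value satisfied the check")


def draw_directly(rng, low=-1.3, high=-1.2):
    """Draw from a range chosen up front to be valid: always one attempt."""
    return rng.uniform(low, high), 1


# Hypothetical narrow check, chosen so that most random draws are rejected.
def narrow_check(value):
    return -1.3 <= value <= -1.2


rng = random.Random(0)  # seeded so repeated runs behave the same

filtered = [draw_by_filtering(narrow_check, rng) for _ in range(100)]
direct = [draw_directly(rng) for _ in range(100)]

filtered_tries = sum(tries for _, tries in filtered)
direct_tries = sum(tries for _, tries in direct)

print(f"filtering: {filtered_tries} draws for 100 values")
print(f"direct:    {direct_tries} draws for 100 values")
```

If the fallback behaves anything like the filtering version, I can see how a check that rejects most candidates would be slow, but I don't know whether that is what actually happens for the checks in my schema.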

Galen