pandera provides a flexible and expressive API for performing data validation on dataframes to make data processing pipelines more readable and robust.
Questions tagged [pandera]
38 questions
1
vote
1 answer
Pandera - validation based on multiple columns
I have created a Pandera validation schema for a Pandas dataframe with ~150 columns, like the first two rows in the schema below.
The single column validation is working, but how can I combine two or more columns for validation?
I found two related…

wl_
- 37
- 7
1
vote
1 answer
Initializing Class Attributes With Pydantic and Pandera
I'm new to Pydantic and Pandera, and need some help with class instantiation and initialization.
I have the following code in one file, sim.py:
import pandera as pa
from pydantic import BaseModel
from datetime import datetime
class…

Horse
- 137
- 5
1
vote
1 answer
validation check - dictionary value types
After converting my csv to dictionary with pandas, a sample of the dictionary will look like this:
[{'Name': '1234', 'Age': 20},
{'Name': 'Alice', 'Age': 30.1},
{'Name': '5678', 'Age': 41.0},
{'Name': 'Bob 1', 'Age': 14},
{'Name': '!@#$%',…
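A minimal sketch of one way to check value types in records like these, written in plain Python, not pandera. The names `EXPECTED_TYPES` and `find_type_errors` are hypothetical, and the rule that `Age` must be an `int` is an assumption. Only the four complete records from the excerpt are used.

```python
# The four complete records from the excerpt; the truncated fifth one is omitted.
records = [
    {"Name": "1234", "Age": 20},
    {"Name": "Alice", "Age": 30.1},
    {"Name": "5678", "Age": 41.0},
    {"Name": "Bob 1", "Age": 14},
]

# Assumed rule (not stated in the question): Name is a str, Age is an int.
EXPECTED_TYPES = {"Name": str, "Age": int}


def find_type_errors(rows, expected):
    """Return (row_index, key, actual_type_name) for every mismatched value."""
    errors = []
    for i, row in enumerate(rows):
        for key, typ in expected.items():
            value = row.get(key)
            # Exact type match, so a float such as 41.0 does not pass as an int.
            if type(value) is not typ:
                errors.append((i, key, type(value).__name__))
    return errors


errors = find_type_errors(records, EXPECTED_TYPES)
```

Here the two float ages are reported, while numeric-looking names such as '1234' pass because they are still strings. Checking the content of a string would need a separate rule.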

H_SOCIAL MEDIA
- 25
- 6
1
vote
0 answers
Creating alias for a pandera type: DataFrame with particular pandera schema
I want to define a type to be a DataFrame with a particular pandera schema. However, when I lint this code:
import pandera as pa
from pandera.typing import DataFrame, Series
class MySchema(pa.SchemaModel):
    foo: Series[str]
MyDF = DataFrame[MySchema]
def…

Mose Wintner
- 290
- 1
- 10
1
vote
1 answer
Checking for units with Pandera
Recently started using Pandera; what an excellent Python package!
Does anyone know if it is possible to include so-called metadata of a column into the SchemaModel of a dataframe? For instance, add the unit of a column (seconds, kilometers,…

flow_me_over
- 182
- 9
1
vote
1 answer
pandera - use decorator to specify multiple output schemas
I would like to know if it is possible to use a pandera decorator to specify multiple output schemas.
Let's say for example you have a function that returns 2 dataframes and you want to check the schema of these dataframes using check_io()…

kanimbla
- 858
- 1
- 9
- 23
0
votes
0 answers
Data quality rules with pandera for PV timeseries
I am trying to apply some data quality rules using the pandera library. I am trying to check the quality of PV timeseries and I want to apply these 2 rules (whether there are any negative values, and whether a particular threshold is exceeded). I tried this code,…

Robin_hood_963
- 65
- 6
0
votes
0 answers
How to set a custom name for a custom check method in pandera?
Using custom checks, I couldn't find how to define a custom name for a check. I want something other than null or None to be displayed in the error log.
@register_check_method(statistics=["str_length"], check_type="element_wise")
def…
0
votes
1 answer
How to parse multiple date formats using pandera schema
How can I process a column containing datetimes in two formats, both "%Y-%m-%dT%H:%M", and "%Y-%m-%dT%H:%M:%S" ?
MWE showing what I'm trying to do:
from pandera.engines import pandas_engine
from pathlib import Path
import io
import pandas as…
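The fallback logic behind parsing two formats can be sketched with the standard library alone, independent of pandera. This is an illustration, not the asker's MWE: the helper `parse_datetime` and the sample strings are hypothetical, and only the two format strings come from the question.

```python
from datetime import datetime

# The two formats named in the question, tried in order.
FORMATS = ("%Y-%m-%dT%H:%M", "%Y-%m-%dT%H:%M:%S")


def parse_datetime(text):
    """Return the datetime from the first format that matches the whole string."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            # strptime rejects both missing and leftover characters,
            # so a mismatch simply moves on to the next format.
            continue
    raise ValueError(f"{text!r} matches none of {FORMATS}")


# Hypothetical sample values, one per format.
parsed = [parse_datetime(s) for s in ("2023-01-02T03:04", "2023-01-02T03:04:05")]
```

A function of this shape could be applied to a column before validation, so that the schema only ever sees one datetime dtype.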

baxx
- 3,956
- 6
- 37
- 75
0
votes
1 answer
A pandera DataFrame Schema with special characters in column names
I have received a dataframe from an institute and the column names have some special characters which are not allowed in Python variable naming. I would like to use the DataFrameModel and NOT the DataFrameSchema in pandera to create a schema to…

wisedoe
- 101
- 2
0
votes
0 answers
Why is Pandera's documented example really slow?
I tried the following example from the Pandera documentation:
import pandera as pa
# define schema
schema = pa.DataFrameSchema({
    "column1": pa.Column(int, checks=pa.Check.le(10)),
    "column2": pa.Column(float, checks=pa.Check.lt(-1.2)),
…

Galen
- 1,128
- 1
- 14
- 31
0
votes
1 answer
Finding where Pandera schemas are different
I have two Pandas data frames df1 and df2 which should have the same inferred Pandera schema. Unfortunately they do not because when I run pa.infer_schema(df1) != pa.infer_schema(df2) I get a return of False. The print out (which should be __repr__)…

Galen
- 1,128
- 1
- 14
- 31
0
votes
0 answers
How can I specify the equivalent of a spark ArrayType(StringType()) in a pandas dataframe and use a Pandera schema
I would like to be able to specify the dtype within the array as well. Spark has something like ArrayType(StringType()). Does this exist in the Python/pandas/NumPy/pandera world?
import pandas as pd
import pandera as pa
import numpy as np
# data to…

jwsmithers
- 276
- 2
- 5
- 15
0
votes
1 answer
Pandas Dataframe - enforce data properties
I would like to enforce properties on pandas data tables.
Mostly a "uniqueness" of "primary keys" in the table would be interesting.
Is there a way to ensure such properties without having to call a validation function?
It would be preferable that…

David K.
- 76
- 6
0
votes
0 answers
Pandera. Column validation based on a secondary column
Is it possible to validate a column based on another column using Pandera?
My dataframe looks like this:
df = pd.DataFrame({
    "Name": ["Thomas","",""],
    "Address": ["Address 1", "Address 1", "Address 3"],
    "Zip": ["65989", "65989",…

Nebiros
- 13
- 4