Questions tagged [pandera]

pandera provides a flexible and expressive API for performing data validation on dataframes to make data processing pipelines more readable and robust.

38 questions
1 vote · 1 answer

Pandera - validation based on multiple columns

I have created a Pandera validation schema for a Pandas dataframe with ~150 columns, like the first two rows in the schema below. The single column validation is working, but how can I combine two or more columns for validation? I found two related…
wl_
1 vote · 1 answer

Initializing Class Attributes With Pydantic and Pandera

I'm new to Pydantic and Pandera, and need some help with class instantiation and initialization. I have the following code in one file, sim.py: import pandera as pa from pydantic import BaseModel from datetime import datetime class…
Horse
1 vote · 1 answer

validation check - dictionary value types

After converting my csv to dictionary with pandas, a sample of the dictionary will look like this: [{'Name': '1234', 'Age': 20}, {'Name': 'Alice', 'Age': 30.1}, {'Name': '5678', 'Age': 41.0}, {'Name': 'Bob 1', 'Age': 14}, {'Name': '!@#$%',…
1 vote · 0 answers

Creating alias for a pandera type: DataFrame with particular pandera schema

I want to define a type to be a DataFrame with a particular pandera schema. However, when I lint this code: from pandera.typing import DataFrame, Series class MySchema(pa.SchemaModel): foo: Series[str] MyDF = DataFrame[MySchema] def…
Mose Wintner
1 vote · 1 answer

Checking for units with Pandera

Recently started using Pandera; what an excellent Python package! Does anyone know if it is possible to include so-called metadata of a column into the SchemaModel of a dataframe? For instance, add the unit of a column (seconds, kilometers,…
flow_me_over
1 vote · 1 answer

pandera - use decorator to specify multiple output schemas

I would like to know if it is possible to use pandera decorator to specify multiple output schemas. Let's say for example you have a function that returns 2 dataframes and you want to check the schema of these dataframes using check_io()…
kanimbla
0 votes · 0 answers

Data quality rules with pandera for PV timeseries

I am trying to apply some data quality rules using the pandera library. I am trying to check the quality of PV timeseries and I want to apply these 2 rules (if there are any negative values, and if a particular threshold is exceeded). I tried this code,…
0 votes · 0 answers

How to set a custom name for a custom check method in pandera?

Using custom checks, I couldn't find how to define a custom name to check. I want something other than null or None to be displayed in the error log @register_check_method(statistics=["str_length"], check_type="element_wise") def…
0 votes · 1 answer

How to parse multiple date formats using pandera schema

How can I process a column containing datetimes in two formats, both "%Y-%m-%dT%H:%M", and "%Y-%m-%dT%H:%M:%S" ? MWE showing what I'm trying to do: from pandera.engines import pandas_engine from pathlib import Path import io import pandas as…
baxx
0 votes · 1 answer

A pandera DataFrame Schema with special characters in column names

I have received a dataframe from an institute and the column names have some special characters which are not allowed in Python variable naming. I would like to use the DataFrameModel and NOT the DataFrameSchema in pandera to create a schema to…
wisedoe
0 votes · 0 answers

Why is Pandera's documented example really slow?

I tried the following example from the pandera documentation: import pandera as pa # define schema schema = pa.DataFrameSchema({ "column1": pa.Column(int, checks=pa.Check.le(10)), "column2": pa.Column(float, checks=pa.Check.lt(-1.2)), …
Galen
0 votes · 1 answer

Finding where Pandera schemas are different

I have two Pandas data frames df1 and df2 which should have the same inferred Pandera schema. Unfortunately they do not because when I run pa.infer_schema(df1) != pa.infer_schema(df2) I get a return of False. The print out (which should be __repr__)…
Galen
0 votes · 0 answers

How can I specify the equivalent of a Spark ArrayType(StringType()) in a pandas dataframe and use a Pandera schema

I would like to be able to specify the dtype within the array as well. Spark has something like ArrayType(StringType()). Does this exist in python/pandas/numpy/pandera world? import pandas as pd import pandera as pa import numpy as np # data to…
jwsmithers
0 votes · 1 answer

Pandas Dataframe - enforce data properties

I would like to enforce properties on pandas data tables. Mostly a "uniqueness" of "primary keys" in the table would be interesting. Is there a way to ensure such properties without having to call a validation function? It would be preferable that…
David K.
0 votes · 0 answers

Pandera. Column validation based on a secondary column

Is it possible to validate a column based on another column using Pandera? My dataframe looks like this: df = pd.DataFrame({ "Name": ["Thomas","",""], "Address": ["Address 1", "Address 1", "Address 3"], "Zip": ["65989", "65989",…
Nebiros