I am using MongoDB to store the results of a script in a database. When I want to load the data back into Python, I need to decode the JSON (or BSON) string into a pydantic `BaseModel`. For a pydantic model containing only JSON-compatible types, I can simply do:

```python
base_model = BaseModelClass.parse_raw(string)
```

But the default `json.loads` decoder doesn't know how to deal with a DataFrame. I can override the `.parse_raw` method with something like:
```python
import json

import pandas as pd
from pydantic import BaseModel


class BaseModelClass(BaseModel):
    df: pd.DataFrame

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {
            pd.DataFrame: lambda df: df.to_json()
        }

    @classmethod
    def parse_raw(cls, data):
        data = json.loads(data)
        data['df'] = pd.read_json(data['df'])
        return cls(**data)
```
But ideally I would want fields of type `pd.DataFrame` to be decoded automatically, rather than having to change the `parse_raw` function manually every time. Is there any way of doing something like:
```python
class Config:
    arbitrary_types_allowed = True
    json_encoders = {
        pd.DataFrame: lambda df: df.to_json()
    }
    json_decoders = {
        pd.DataFrame: lambda v: pd.read_json(v)
    }
```
so that any field that should be a DataFrame is detected and converted automatically, without having to modify the `parse_raw()` method?
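I am aware I could attach a pydantic validator to each DataFrame field, roughly like the sketch below (pydantic v1-style; the class name `ValidatedModel` and field name `df` are just my example), but that is still per-field boilerplate rather than the type-driven `json_decoders` behaviour I'm after:

```python
import io

import pandas as pd
from pydantic import BaseModel, validator


class ValidatedModel(BaseModel):
    df: pd.DataFrame

    class Config:
        arbitrary_types_allowed = True
        json_encoders = {pd.DataFrame: lambda df: df.to_json()}

    # A pre-validator decodes the JSON string back into a DataFrame,
    # so the stock parse_raw() works without being overridden.
    @validator("df", pre=True)
    def _decode_df(cls, v):
        if isinstance(v, str):
            return pd.read_json(io.StringIO(v))
        return v


model = ValidatedModel(df=pd.DataFrame({"a": [1, 2]}))
restored = ValidatedModel.parse_raw(model.json())
assert restored.df.equals(model.df)
```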