I have a json object that I want to validate using Pydantic.

The problem I am facing is that:

1 - I don't know how many fields I will have in the JSON. The example below has 2 keys/fields: "225_5_99_0" and "225_5_99_1"

2 - I don't know the names of the fields in this case: It can be "225_5_99_0" or "226_0_0_0". I only know that they adhere to some convention.

3 - Some of the fields have keywords that are not valid for the model's field names, like the '*' key. This I might overcome by aliasing the field name to "asterisk".

Any ideas on the best way to tackle such a JSON?

{
    "225_5_99_0": {
        "*": {
            "flag": "",
            "group1": "225.5.99.0",
            "group_type": "M",
            "uptime": "0:11:23",
            "iif": {"lo5": {
                "flag": "R",
                "uptime": "0:44:41"
            }
            },
            "oil": {
                "bun_1215": {
                    "flag": "",
                    "join_time": "00:03:14",
                    "uptime": "0:42:27"
                },
                "bun_1218": {
                    "flag": "",
                    "join_time": "00:02:44",
                    "uptime": "0:44:41"
                }
            },
            "address": "100.100.100.100",
            "rp": "*",
            "source": "*",
            "upstream": "Joined(00:00:19)"
        }
    },
    "225_5_99_1": {
        "*": {
            "flag": "",
            "group1": "225.5.99.0",
            "group_type": "M",
            "uptime": "0:11:23",
            "iif": {"lo5": {
                "flag": "R",
                "uptime": "0:44:41"
            }
            },
            "oil": {
                "bun_1215": {
                    "flag": "",
                    "join_time": "00:03:14",
                    "uptime": "0:42:27"
                },
                "bun_1218": {
                    "flag": "",
                    "join_time": "00:02:44",
                    "uptime": "0:44:41"
                }
            },
            "address": "100.100.100.100",
            "rp": "*",
            "source": "*",
            "upstream": "Joined(00:00:19)"
        }
    }
}
RaamEE
  • Could do with a bit more info: what exactly do you want to validate about it? If you could post some draft model that would help. In general pydantic is a *parsing* library, not a validation library: the difference likely doesn't matter here, but you want to think of it as giving you valid *models* (and bailing if it can't) rather than validating its input. Thus it would help to know what you want the model to look like... – 2e0byo Oct 26 '21 at 11:03
  • Related https://stackoverflow.com/a/69265184/13782669 – alex_noname Oct 26 '21 at 11:23
  • Thanks @alex_noname - I used your reply to the linked question. – RaamEE Oct 26 '21 at 13:21

1 Answer

Thanks to @alex_noname's reply, I got what I needed. The full code and its printout are at the end.

Going through the points in my question:

1 - I don't know how many fields I will have in the JSON. The example below has 2 keys/fields: "225_5_99_0" and "225_5_99_1"

The Example class defines the __root__ attribute as a dictionary, so the model itself becomes a dictionary of the nested objects.

To iterate over the model or index it directly, also implement __iter__ and __getitem__ so that the Example class behaves like the dict it now wraps.

Also see: Custom Root Types


2 - I don't know the names of the fields in this case: It can be "225_5_99_0" or "226_0_0_0". I only know that they adhere to some convention.

I defined:

UnderScoreNumbers = constr(regex=r'^[0-9_]*$')
...
__root__: Dict[UnderScoreNumbers, Dict]

so each key is now validated as a string consisting of digits and underscores, e.g. 225_5_99_0.
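
To see what that key pattern actually accepts, it can be checked with the standard re module, independently of Pydantic. This is only a sketch: the sample keys other than the two from the question are made up for illustration, and STRICTER is a hypothetical tightening that is not part of the original answer.

```python
import re

# Pattern from the answer: any run of digits and underscores, including none.
KEY_PATTERN = re.compile(r'^[0-9_]*$')

# Hypothetical stricter variant: exactly four underscore-separated numbers.
STRICTER = re.compile(r'^[0-9]+(_[0-9]+){3}$')

samples = ["225_5_99_0", "226_0_0_0", "225.5.99.0", "bun_1215", "", "___"]

for key in samples:
    loose = bool(KEY_PATTERN.match(key))
    strict = bool(STRICTER.match(key))
    print(f"{key!r:15} loose={loose} strict={strict}")
```

The loose pattern also accepts the empty string and strings made only of underscores; whether that matters depends on the real naming convention.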


3 - Some of the fields have keywords that are not valid for the model's field names, like the '*' key. This I might overcome by aliasing the field name to "asterisk".

I defined a field with the alias '*' to consume the dict's key '*', which is unique in the JSON. If you can influence the developers who generate the JSON, suggest that they avoid keys that can't be valid attribute names, e.g. '*', '123a', etc.

asterisk: Dict[str, Dict] = Field(alias='*')
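
A minimal sketch of the alias in isolation, assuming a trimmed-down payload (the inner dict below is a shortened stand-in for the real data, not the full JSON). Passing the data with ** avoids version-specific parsing helpers.

```python
from typing import Dict

from pydantic import BaseModel, Field


class Asterisk(BaseModel):
    # '*' is not a valid Python identifier, so the field is named
    # 'asterisk' and populated from the '*' key through the alias.
    asterisk: Dict[str, Dict] = Field(alias='*')


# Shortened stand-in for one entry of the JSON in the question.
inner = {"*": {"iif": {"lo5": {"flag": "R", "uptime": "0:44:41"}}}}

entry = Asterisk(**inner)
print(entry.asterisk["iif"]["lo5"]["flag"])  # R
```

The stand-in only contains dict values under '*'. The full payload also has plain strings there (for example "flag": ""), so a looser value type such as Dict[str, Any] would likely be needed to parse it with this model.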


from typing import Dict

# Uses the Pydantic 1.x API (constr(regex=...), __root__).
from pydantic import BaseModel, Field, constr

example_dict = {
    "225_5_99_0": {
        "*": {
            "flag": "",
            "group1": "225.5.99.0",
            "group_type": "M",
            "uptime": "0:11:23",
            "iif": {"lo5": {
                "flag": "R",
                "uptime": "0:44:41"
            }
            },
            "oil": {
                "bun_1215": {
                    "flag": "",
                    "join_time": "00:03:14",
                    "uptime": "0:42:27"
                },
                "bun_1218": {
                    "flag": "",
                    "join_time": "00:02:44",
                    "uptime": "0:44:41"
                }
            },
            "address": "100.100.100.100",
            "rp": "*",
            "source": "*",
            "upstream": "Joined(00:00:19)"
        }
    },
    "225_5_99_1": {
        "*": {
            "flag": "",
            "group1": "225.5.99.0",
            "group_type": "M",
            "uptime": "0:11:23",
            "iif": {"lo5": {
                "flag": "R",
                "uptime": "0:44:41"
            }
            },
            "oil": {
                "bun_1215": {
                    "flag": "",
                    "join_time": "00:03:14",
                    "uptime": "0:42:27"
                },
                "bun_1218": {
                    "flag": "",
                    "join_time": "00:02:44",
                    "uptime": "0:44:41"
                }
            },
            "address": "100.100.100.100",
            "rp": "*",
            "source": "*",
            "upstream": "Joined(00:00:19)"
        }
    }
}

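
class Asterisk(BaseModel):
    # Maps the JSON key '*' (not a valid identifier) to the field 'asterisk'.
    # Note: Example below types its values as plain Dict, so this model is
    # defined for illustration but is not applied during parsing.
    asterisk: Dict[str, Dict] = Field(alias='*')


# Keys must consist only of digits and underscores.
UnderScoreNumbers = constr(regex=r'^[0-9_]*$')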


class Example(BaseModel):
    __root__: Dict[UnderScoreNumbers, Dict]

    def __iter__(self):
        return iter(self.__root__)

    def __getitem__(self, item):
        return self.__root__[item]


if __name__ == '__main__':
    e1 = Example.parse_obj(example_dict)
    for key in e1:
        print(f'{key}: {e1[key]}')
    print(e1)
    print('The End!')

Output:

225_5_99_0: {'*': {'flag': '', ...
225_5_99_1: {'*': ...
__root__={'225_5_99_0': {'*': {'flag': '', ....
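
The code above types the root values as plain Dict, so the Asterisk model is never applied. One possible way to wire it in is sketched below, under some assumptions: the value type is loosened to Dict[str, Any] because the payload mixes strings and nested dicts, the data dict is a shortened stand-in for the real JSON, and the try/except import is only there so the Pydantic 1.x API is also found on a Pydantic 2.x installation.

```python
from typing import Any, Dict

try:
    # Pydantic 2.x bundles the 1.x API under pydantic.v1.
    from pydantic.v1 import BaseModel, Field, ValidationError, constr
except ImportError:
    # Plain Pydantic 1.x installation.
    from pydantic import BaseModel, Field, ValidationError, constr

UnderScoreNumbers = constr(regex=r'^[0-9_]*$')


class Asterisk(BaseModel):
    # Any instead of Dict: the payload mixes strings and nested dicts.
    asterisk: Dict[str, Any] = Field(alias='*')


class Example(BaseModel):
    # Each top-level value is now parsed into an Asterisk model.
    __root__: Dict[UnderScoreNumbers, Asterisk]

    def __getitem__(self, item):
        return self.__root__[item]


# Shortened stand-in for the JSON in the question.
data = {"225_5_99_0": {"*": {"flag": "", "group_type": "M"}}}

parsed = Example.parse_obj(data)
print(parsed["225_5_99_0"].asterisk["group_type"])

# A key that breaks the naming convention is rejected.
try:
    Example.parse_obj({"not-a-valid-key": {"*": {}}})
except ValidationError as exc:
    print("rejected:", len(exc.errors()), "error(s)")
```

With this variant each entry is reached as parsed[key].asterisk instead of parsed[key]['*'].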
RaamEE