
How do I generate a JSON schema in Python for the below JSON and validate every json against the schema?

Requirements:

  • There is a type, which can be CSV or JSON.
  • There is a list of properties.
  • Each property has a key, which is a column name, and a value, which is a dict that defines the column attributes. For example, the first column, col1, has type string and corresponds to index zero.
    {
            "type": "csv/json",
            "properties": {
                "col1": {
                    "type": "string",
                    "index":0
                },
                "col2": {
                    "type": "number",
                    "index":1
                },....
            }
        }

How do I generate a JSON schema for this json?

Sample Valid json

{
    "type": "csv",
    "properties": {
        "header1": {
            "type": "string",
            "index":0
        },
        "header2": {
            "type": "number",
            "index":1
        }   
    }
}

Sample invalid JSON (the type of header1 is bool and its index attribute is missing):

{
    "type": "CSV",
    "properties": {
        "header1": {
            "type": "bool"
        },
        "header2": {
            "type": "number",
            "index":1
        }   
    }
}

Harish
  • Hi! Just to clarify: does `type` permit three values, `csv`, `json`, and `csv/json`? And does the JSON follow the same schema for all three? – Simon David May 16 '23 at 06:29

2 Answers

1

You can use the jsonschema library to validate your JSON against a JSON schema in Python. The schema itself is written by hand as a Python dict.

Install the library first with pip install jsonschema.

You can then define a schema for your JSON structure and validate each document against it.

For example:

import jsonschema
from jsonschema import validate

# Define the JSON schema
schema = {
    "type": "object",
    "properties": {
        "type": {"enum": ["csv", "json"]},
        "properties": {
            "type": "object",
            "patternProperties": {
                "^.*$": {
                    "type": "object",
                    "properties": {
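                        # Allow only the column types shown in the question,
                        # so that a value such as "bool" is rejected
                        "type": {"enum": ["string", "number"]},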
                        "index": {"type": "integer"},
                    },
                    "required": ["type", "index"],
                }
            },
            "additionalProperties": False,
        },
    },
    "required": ["type", "properties"],
}

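# Sample valid JSON (taken from the question)
valid_json = {
    "type": "csv",
    "properties": {
        "header1": {"type": "string", "index": 0},
        "header2": {"type": "number", "index": 1},
    },
}

# Sample invalid JSON (taken from the question):
# "CSV" is not an allowed type and header1 has no index
invalid_json = {
    "type": "CSV",
    "properties": {
        "header1": {"type": "bool"},
        "header2": {"type": "number", "index": 1},
    },
}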

# Validate JSON against the schema
try:
    validate(instance=valid_json, schema=schema)
    print("Valid JSON")
except jsonschema.exceptions.ValidationError as e:
    print("Invalid JSON:", e)

try:
    validate(instance=invalid_json, schema=schema)
    print("Valid JSON")
except jsonschema.exceptions.ValidationError as e:
    print("Invalid JSON:", e)

You can customize the JSON schema according to your specific requirements.
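Note that validate() stops at the first problem it finds. If you want every problem reported for every document, one option is to build a validator object and loop over iter_errors(). The following is only a sketch: the helper name collect_errors and the documents dict are made up for illustration, and restricting column types to string and number is an assumption based on the samples in the question.

```python
from jsonschema import Draft7Validator

# Schema shaped like the one above; limiting column types to
# "string" and "number" is an assumption taken from the question's samples.
schema = {
    "type": "object",
    "properties": {
        "type": {"enum": ["csv", "json"]},
        "properties": {
            "type": "object",
            "patternProperties": {
                "^.*$": {
                    "type": "object",
                    "properties": {
                        "type": {"enum": ["string", "number"]},
                        "index": {"type": "integer"},
                    },
                    "required": ["type", "index"],
                }
            },
            "additionalProperties": False,
        },
    },
    "required": ["type", "properties"],
}

# Fail early if the schema itself is malformed (raises SchemaError).
Draft7Validator.check_schema(schema)
validator = Draft7Validator(schema)


def collect_errors(document):
    """Hypothetical helper: return sorted 'path: message' strings, empty if valid."""
    return sorted(
        "/".join(str(part) for part in err.absolute_path) + ": " + err.message
        for err in validator.iter_errors(document)
    )


# The two samples from the question, keyed by an arbitrary label.
documents = {
    "valid": {
        "type": "csv",
        "properties": {
            "header1": {"type": "string", "index": 0},
            "header2": {"type": "number", "index": 1},
        },
    },
    "invalid": {
        "type": "CSV",
        "properties": {
            "header1": {"type": "bool"},
            "header2": {"type": "number", "index": 1},
        },
    },
}

for name, document in documents.items():
    errors = collect_errors(document)
    print(name, "is valid" if not errors else errors)
```

With this approach the invalid sample reports the bad top-level type, the bad column type, and the missing index together instead of only the first of them.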

0

You can use the marshmallow library to specify a schema against which you want to validate your JSONs:

from marshmallow import Schema, fields, validate

class ColumnProperty(Schema):
    type = fields.Str(
        required=True,
        validate=validate.OneOf(["string", "number"])
    )
    index = fields.Integer(
        required=True
    )

class JSONSchema(Schema):
    type = fields.Str(
        required=True,
        validate=validate.OneOf(["json", "csv", "csv/json"])
    )
    properties = fields.Dict(
        keys=fields.String(),
        values=fields.Nested(ColumnProperty)
    )

Let's define your examples and see how we can validate them against this schema:

# instantiate schema
schema = JSONSchema()

example_data1 = {
    "type": "csv/json",
    "properties": {
        "col1": {
            "type": "string",
            "index":0
            },
        "col2": {
            "type": "number",
            "index":1
            }
    }
}

result = schema.load(example_data1)  # passes

example_data2 = {
    "type": "csv",
    "properties": {
        "header1": {
            "type": "string",
            "index":0
        },
        "header2": {
            "type": "number",
            "index":1
        }   
    }
}

result = schema.load(example_data2)  # passes

example_data3 = {
    "type": "CSV",
    "properties": {
        "header1": {
            "type": "bool"
        },
        "header2": {
            "type": "number",
            "index":1
        }   
    }
}

result = schema.load(example_data3) # raises: ValidationError: {'properties': defaultdict(<class 'dict'>,
 # {'header1': {'value': {'index': ['Missing data for required field.'], 
 # 'type': ['Must be one of: string, number.']}}}), 'type': ['Must be one of: json, csv, csv/json.']}
Simon David