
I have a JSON Schema file like this one, which contains a couple of intentional bugs:

{
    "$schema": "http://json-schema.org/schema#",
    "type": "object",
    "description": "MWE for JSON Schema Validation",
    "properties": {
      "valid_prop": {
        "type": ["string", "number"],
        "description": "This can be either a string or a number."
      },
      "invalid_prop": {
        // NOTE: "type:" here should have been "type" (without the colon)
        "type:": ["string", "null"],
        "description": "Note the extra colon in the name of the type property above"
      }
    },
    // NOTE: Reference to a non-existent property
    "required": ["valid_prop", "nonexistent_prop"]
}

I'd like to write a Python script (or, even better, install a CLI with pip) that can find those bugs.

I've seen this answer, which suggests doing the following (modified for my use case):

import json
from jsonschema import Draft4Validator

with open('./my-schema.json') as schemaf:
    schema = json.loads('\n'.join(schemaf.readlines()))
    Draft4Validator.check_schema(schema)
    print("OK!") # on invalid schema we don't get here

but the above script doesn't detect either of the errors in the schema file. I would have expected it to detect at least the extra colon in the "type:" property.

Am I using the library incorrectly? How do I write a validation script that detects this error?

petezurich
Tomas Aschan
  • Shouldn't it be possible to describe JSON Schema in JSON Schema? I wonder if they have done that somewhere in the spec. Couldn't find it while poking around, but it wouldn't surprise me if that exists... – Tomalak Jan 13 '20 at 18:31
  • FYI, `schema = json.load(schemaf)` is a more succinct way to load the file. Even `schema = json.loads(schemaf.read())` would be better than reading line-by-line just to join them up again. `.readlines()` also keeps the newlines in each line, so joining with `\n` creates double-spaced results. – Mark Tolonen Jan 13 '20 at 18:33
  • From https://json-schema.org/understanding-json-schema/about.html#about *"However, since a JSON Schema can’t contain arbitrary code, there are certain constraints on the relationships between data elements that can’t be expressed. Any “validation tool” for a sufficiently complex data format, therefore, will likely have two phases of validation: one at the schema (or structural) level, and one at the semantic level. The latter check will likely need to be implemented using a more general-purpose programming language."* - So, yes and no. Although errors like `"type:"` could be caught. – Tomalak Jan 13 '20 at 18:33

1 Answer


You say the schema is invalid, but that isn't the case with the example you've provided.

Unknown keywords are ignored. This is to allow for extensions to be created. If unknown keywords were prevented, we wouldn't have the ecosystem of extensions that various people and groups have created, like form generation.
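To illustrate the practical effect, here is a minimal sketch using the `jsonschema` package and the Draft 4 validator from the question (the draft is an assumption; the schema's `$schema` value does not pin one). Because `"type:"` is not a recognised keyword, `invalid_prop` ends up with no type constraint at all:

```python
from jsonschema import Draft4Validator

schema = {
    "type": "object",
    "properties": {
        "valid_prop": {"type": ["string", "number"]},
        # "type:" is not a Draft 4 keyword, so the validator ignores it
        "invalid_prop": {"type:": ["string", "null"]},
    },
}
validator = Draft4Validator(schema)

# valid_prop is constrained: a list is neither a string nor a number
valid_prop_rejects_list = not validator.is_valid({"valid_prop": []})
# invalid_prop is unconstrained: a number passes even though the author
# presumably meant to allow only strings and null
invalid_prop_accepts_number = validator.is_valid({"invalid_prop": 42})

print(valid_prop_rejects_list, invalid_prop_accepts_number)
```

So the misspelled keyword does not make the schema invalid; it silently makes it more permissive than intended.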

You say that the value in required is a "Reference to a non-existent property". The required keyword has no link to the properties keyword.

required determines which keys an object must have.

properties determines how a subschema should be applied to values in an object.

There's no need for values in required to also be included in properties. In fact, it's common for them not to be when building complex modular schemas.
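A short sketch of that independence, again assuming the `jsonschema` package and Draft 4 as in the question. The name `nonexistent_prop` is required but has no entry in properties, so the key must be present and its value can be anything:

```python
from jsonschema import Draft4Validator

schema = {
    "type": "object",
    "properties": {"valid_prop": {"type": ["string", "number"]}},
    "required": ["valid_prop", "nonexistent_prop"],
}
validator = Draft4Validator(schema)

# Fails: the required key "nonexistent_prop" is absent
missing_key = validator.is_valid({"valid_prop": "x"})
# Passes: the key is present, and no subschema restricts its value
both_keys = validator.is_valid({"valid_prop": "x", "nonexistent_prop": [1, 2]})

print(missing_key, both_keys)
```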

In terms of validating if a schema is valid, you can use the JSON Schema meta schema.
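With the `jsonschema` package, `check_schema` is the call that validates a schema against its meta-schema. The sketch below assumes Draft 4, as in the question; `schema_problem` is a hypothetical helper name. It shows that a schema like the one in the question passes, while a genuinely invalid value for a known keyword is rejected:

```python
from jsonschema import Draft4Validator
from jsonschema.exceptions import SchemaError


def schema_problem(schema):
    """Return None if the schema passes the Draft 4 meta-schema,
    otherwise the validation error message."""
    try:
        Draft4Validator.check_schema(schema)
    except SchemaError as err:
        return err.message
    return None


# Unknown keyword and an undeclared required name: both are allowed
ok = schema_problem(
    {"properties": {"p": {"type:": ["string", "null"]}}, "required": ["q"]}
)
# A misspelled value for the known keyword "type" is a real schema error
bad = schema_problem({"type": "strnig"})

print(ok is None, bad is not None)
```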

In terms of checking for additional things that you consider undesirable, that's down to you, given the examples you've provided are valid.

Some libraries may provide a sanity check, but it is unlikely to flag the examples you've provided, as they aren't errors.
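If you do want to flag such things yourself, a hand-rolled lint pass is one option. The following is only a sketch using the standard library: `lint_schema` and the keyword tables are hypothetical names, the keyword set assumes Draft 4, and the input is assumed to have already passed the meta-schema check. It does not descend into `dependencies` or follow `$ref`.

```python
# Draft 4 keywords; extend this set if you rely on extension keywords.
KNOWN_KEYWORDS = {
    "$schema", "$ref", "id", "title", "description", "default",
    "type", "enum", "allOf", "anyOf", "oneOf", "not", "definitions",
    "properties", "patternProperties", "additionalProperties",
    "required", "minProperties", "maxProperties", "dependencies",
    "items", "additionalItems", "minItems", "maxItems", "uniqueItems",
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
    "multipleOf", "minLength", "maxLength", "pattern", "format",
}
# Keywords whose value maps arbitrary names to subschemas
MAP_OF_SCHEMAS = ("properties", "patternProperties", "definitions")
# Keywords whose value is a subschema or a list of subschemas
NESTED_SCHEMAS = ("allOf", "anyOf", "oneOf", "not", "items",
                  "additionalItems", "additionalProperties")


def lint_schema(schema, path="#"):
    """Return warnings for things that are valid but probably mistakes."""
    warnings = []
    if not isinstance(schema, dict):
        return warnings  # e.g. "additionalProperties": false
    for key in schema:
        if key not in KNOWN_KEYWORDS:
            warnings.append(f"{path}: unknown keyword {key!r}")
    declared = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in declared:
            warnings.append(f"{path}: required name {name!r} not in properties")
    for key in MAP_OF_SCHEMAS:
        for name, sub in schema.get(key, {}).items():
            warnings += lint_schema(sub, f"{path}/{key}/{name}")
    for key in NESTED_SCHEMAS:
        sub = schema.get(key)
        subs = sub if isinstance(sub, list) else [sub]
        for i, item in enumerate(subs):
            warnings += lint_schema(item, f"{path}/{key}/{i}")
    return warnings


example = {
    "type": "object",
    "properties": {
        "valid_prop": {"type": ["string", "number"]},
        "invalid_prop": {"type:": ["string", "null"]},
    },
    "required": ["valid_prop", "nonexistent_prop"],
}
for warning in lint_schema(example):
    print(warning)
```

Treat the output as warnings rather than errors: both patterns are legitimate in schemas that use extension keywords or are composed from several modules.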

Relequestual