
I've been struggling to validate a JSON schema against a meta-schema (that is, to check whether the JSON actually follows the JSON Schema standard). I tried to follow the documentation (link, link), and I'm basing this on the official JSON Schema specification.

My use case is this: I'm developing an endpoint that can receive a JSON payload with a schema in it. That schema will later be used to validate some entities, but I would also like to validate the schema itself.

I tried all of these, but they all return the same result: valid. So it seems to me that they don't validate anything.

private void ValidateSchema(string schemaString)
{
    var element = JsonNode.Parse(schemaString);
    var metaSchema = Json.Schema.MetaSchemas.Metadata202012;
    var options = new ValidationOptions
    {
        OutputFormat = OutputFormat.Detailed,
        ValidateMetaSchema = false // tried also with true
    };
    var results = metaSchema.Validate(element, options);
}

private void ValidateSchema(string schemaString)
{
    var element = JsonNode.Parse(schemaString);
    var metaSchema = Json.Schema.MetaSchemas.Draft202012; 
    var options = new ValidationOptions
    {
        OutputFormat = OutputFormat.Detailed,
        ValidateMetaSchema = false // tried also with true
    };
    var results = metaSchema.Validate(element, options);
}

And these were the inputs I tried. I expected that some would return invalid.

@"{""f"":""a""}"
@"{}"
@"{""required"": [""prop1"", ""prop2"", ""prop3"", ""prop4"", ""prop5"", ""prop6""]}"
@"{
""$schema"": ""http://json-schema.org/draft-07/schema#"",
""type"": ""object"",
""required"": [""prop1"", ""prop2"", ""prop3"", ""prop4"", ""prop5"", ""prop6""]
}"
Lombas

3 Answers


Your examples are not covered by validation against the meta schemas. The meta schemas use an open model and there is also no semantic checking. You would need a JSON schema linter like the one coming with JSONBuddy (https://www.json-buddy.com), also available at json-schema-linter.com for quick testing.

Clemens
  • That is weird. I updated my question providing some links where they clearly talk about that kind of validation. Maybe I need to keep reading the JSON Schema Standard, to understand what that validation is about... what it actually validates. I wasn't expecting to validate the meaning (semantic) as you said but the syntax of the schema. – Lombas Feb 17 '23 at 14:50
  • But why should your examples be not valid against the meta schema? The validator does not check if required properties are also defined. In addition, JSON Schema is an open model. Unknown keywords are allowed but have no effect. – Clemens Feb 17 '23 at 15:00
  • OK, I get what you mean now. So maybe what I would like is a validation option that throws an error if an unknown keyword is present. That would be useful in my case. If that is not possible, then I'll leave it at that... – Lombas Feb 17 '23 at 16:04
  • You might want to do that, but the JSON Schema specification says to ignore unknown keywords so it is actually not compliant behavior. – Clemens Feb 17 '23 at 16:15

As far as the code is concerned, you're doing it right, and you don't need the ValidateMetaSchema option. That option validates the schema against its metaschema while you're validating other JSON data against the schema.

For example, if I have deserialized your last example

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": [
    "prop1",
    "prop2",
    "prop3",
    "prop4",
    "prop5", 
    "prop6"
  ]
}

into JsonSchema and I'm using that to validate some other JSON data, then that option will add a secondary validation of the schema against the draft 7 metaschema. If the schema is invalid for some reason (for example, "$defs": 42 would be ignored in draft 7, but in draft 2020-12 it's invalid), then the $schema keyword would raise an error that would be included in the output.

In your case, you'd be directly validating the draft 7 metaschema against its metaschema (which is just itself). But we already know that the draft 7 metaschema is valid against itself, so this is just an extra check that's unnecessary. Go ahead and leave that option off.


In a comment, you asked if there was a way to raise an error if unknown keywords were present. There is no such option.

However, what you can do is check the schema.Keywords property for any that are of type UnrecognizedKeyword. If there are any, then you have extra data in the schema.

Be mindful that schemas can nest, though, so you'll need to check each level.

{
  "allOf": [
    { "unrecognized": "keyword" }
  ]
}

Here, you'll need to go find the AllOfKeyword and check its subschemas for UnrecognizedKeywords.
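If you'd rather not depend on the library's keyword types, the same nested check can be approximated on the raw JSON. The following is only a rough sketch using System.Text.Json.Nodes: the function name FindUnknownKeywords and the keyword lists are hypothetical and deliberately partial, so you would need to extend them for the draft you actually target.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical, deliberately partial keyword lists; extend them for your draft.
var singleSubschema = new HashSet<string> { "not", "if", "then", "else", "additionalProperties" };
var subschemaArrays = new HashSet<string> { "allOf", "anyOf", "oneOf" };
var subschemaMaps = new HashSet<string> { "properties", "patternProperties", "$defs", "definitions" };
var plainKeywords = new HashSet<string>
{
    "$schema", "$id", "$ref", "type", "required", "enum", "const",
    "minimum", "maximum", "title", "description"
};

// Collects a JSON Pointer-style path for every key that is in none of the lists above.
void FindUnknownKeywords(JsonNode? node, string path, List<string> results)
{
    // Boolean schemas (true/false) and malformed values carry no keywords to inspect.
    if (node is not JsonObject schema) return;

    foreach (var (key, value) in schema)
    {
        var childPath = $"{path}/{key}";
        if (singleSubschema.Contains(key))
        {
            // The value is itself a schema.
            FindUnknownKeywords(value, childPath, results);
        }
        else if (subschemaArrays.Contains(key))
        {
            // The value is an array of schemas.
            if (value is JsonArray array)
                for (var i = 0; i < array.Count; i++)
                    FindUnknownKeywords(array[i], $"{childPath}/{i}", results);
        }
        else if (subschemaMaps.Contains(key))
        {
            // The value maps arbitrary names to schemas; the names are not keywords.
            if (value is JsonObject map)
                foreach (var (name, subschema) in map)
                    FindUnknownKeywords(subschema, $"{childPath}/{name}", results);
        }
        else if (!plainKeywords.Contains(key))
        {
            results.Add(childPath);
        }
    }
}

var unknown = new List<string>();
FindUnknownKeywords(JsonNode.Parse(@"{ ""allOf"": [ { ""unrecognized"": ""keyword"" } ] }"), "", unknown);
Console.WriteLine(string.Join(", ", unknown)); // prints /allOf/0/unrecognized
```

Keep in mind that rejecting unknown keywords is stricter than what the specification asks for, so treat this as a policy of your endpoint rather than as schema validation.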


Aside from that, I'm going to expand on @Clemens' answer a bit to explain why your examples are coming back as valid.

{
  "f": "a"
}

When validated against a metaschema, this JSON is going to produce the same validation result as your second example {} because (as @Clemens mentioned) JSON Schema ignores unknown keywords. Since f isn't a recognized keyword, it's ignored by validation. (There's an annotation collected for f in the output, though.)

Because this has no validation keywords in it, it will validate all JSON instances. Technically this is a valid schema, although it doesn't do much.

{
  "required": [
    "prop1",
    "prop2",
    "prop3",
    "prop4",
    "prop5",
    "prop6"
  ]
}

Here, you're requiring certain properties to be present if the JSON instance is an object. But if the instance is not an object, required has no effect. You'll probably want to constrain values more with "type": "object".

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": [
    "prop1",
    "prop2",
    "prop3",
    "prop4",
    "prop5", 
    "prop6"
  ]
}

Here you have all the pieces in place, and the schema will probably behave the way you expect. The JSON is still a valid draft 7 schema (it's also valid for draft 2020-12).

In order for the schema to be invalid, you'd have to put a value in for a defined keyword that it doesn't support, like giving maximum a string value. In this case, the schema will fail validation.

That said, if you were to try to deserialize invalid schema JSON into a JsonSchema model, the serializer will throw an exception because there's some validation that occurs during deserialization.

I think your approach of validating the schema JSON against a meta-schema is better than letting the serializer throw an exception, but you'll want to be sure you validate against the metaschema that's represented in the $schema keyword. (So don't validate a draft 2020-12 schema against the draft 7 metaschema.)
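To pick the right metaschema, you first need to read the $schema value out of the submitted JSON. Here is a minimal sketch of that step using only System.Text.Json.Nodes. The helper names GetDeclaredDialect and DescribeMetaSchema are hypothetical, and the switch returns plain labels only to keep the example self-contained; in real code each arm would return the corresponding metaschema object (for instance the MetaSchemas.Draft202012 used in the question).

```csharp
using System;
using System.Text.Json.Nodes;

// Returns the dialect URI declared by "$schema", or the fallback when it is absent.
string GetDeclaredDialect(JsonNode? schemaNode, string fallback)
{
    if (schemaNode is JsonObject obj &&
        obj.TryGetPropertyValue("$schema", out var declared) &&
        declared is JsonValue value &&
        value.TryGetValue<string>(out var uri))
    {
        return uri;
    }
    return fallback;
}

// Maps a dialect URI to the metaschema to validate against (labels stand in for objects).
string DescribeMetaSchema(string dialect) => dialect.TrimEnd('#') switch
{
    "http://json-schema.org/draft-07/schema" => "draft 7",
    "https://json-schema.org/draft/2020-12/schema" => "draft 2020-12",
    _ => throw new ArgumentException($"Unsupported $schema: {dialect}")
};

var node = JsonNode.Parse(@"{ ""$schema"": ""http://json-schema.org/draft-07/schema#"", ""type"": ""object"" }");
// Assumption: schemas without "$schema" are treated as draft 2020-12.
var declaredDialect = GetDeclaredDialect(node, "https://json-schema.org/draft/2020-12/schema");
Console.WriteLine(DescribeMetaSchema(declaredDialect)); // prints draft 7
```

Which draft to assume when $schema is missing is your call; the fallback above is just one possible default.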

gregsdennis

I'm posting here the code I used after @gregsdennis's answer, so anyone can use it:

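// Requires: using System; using System.Linq; using System.Text.Json; using Json.Schema;
private static void ValidateSchema(string schema)
{
    JsonSchema jsonSchema;
    try
    {
        // Deserialization itself rejects some invalid schemas.
        jsonSchema = JsonSerializer.Deserialize<JsonSchema>(schema)!;
    }
    catch (Exception e)
    {
        // Keep the original exception as InnerException for diagnostics.
        throw new ArgumentException($"Submitted schema is invalid. Could not deserialize. Error: {e.Message}", e);
    }

    // Note: this also rejects boolean schemas (true/false), which have no keywords.
    if (jsonSchema.Keywords == null)
    {
        throw new ArgumentException("Submitted schema is invalid. No Keywords found.");
    }

    // Only top-level keywords are checked here; nested subschemas are not visited.
    var unrecognizedKeywords = jsonSchema.Keywords.Where(k => k is UnrecognizedKeyword).ToArray();
    if (unrecognizedKeywords.Any())
    {
        // SerializeForLogging is a helper defined elsewhere (not shown).
        var data = string.Join(", ", unrecognizedKeywords.Select(SerializeForLogging));
        throw new ArgumentException($"Submitted schema is invalid. Unrecognized Keywords: {data}");
    }
}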
Lombas
  • `jsonSchema.Keywords == null` is a valid case. Boolean schemas deserialize without a keywords collection. `true` is analogous to `{}`, and `false` is analogous to `{"not": {}}`. – gregsdennis Mar 07 '23 at 21:34
  • It may not be valid for _your_ use case, but it's generally valid. – gregsdennis Mar 07 '23 at 21:34