
I have code that correctly validates an article returned from an endpoint that returns single articles. I'm pretty sure it's working correctly as it gives a validation error when I deliberately don't include a required field in the article.

I also have this code that tries to validate an array of articles returned from an endpoint that returns an array of articles. However, I'm pretty sure that isn't working correctly, as it always says the data is valid, even when I deliberately don't include a required field in the articles.

How do I correctly validate an array of data against the schema?

The full test code is below as a standalone runnable test. Both of the tests should fail; however, only one of them does.

<?php

declare(strict_types=1);

error_reporting(E_ALL);

require_once __DIR__ . '/vendor/autoload.php';


// Return the definition of the schema, either as an array
// or a PHP object
function getSchema($asArray = false)
{
    $schemaJson = <<< 'JSON'
{
  "swagger": "2.0",
  "info": {
    "termsOfService": "http://swagger.io/terms/",
    "version": "1.0.0",
    "title": "Example api"
  },
  "paths": {
    "/articles": {
      "get": {
        "tags": [
          "article"
        ],
        "summary": "Find all articles",
        "description": "Returns a list of articles",
        "operationId": "getArticleById",
        "produces": [
          "application/json"
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Article"
              }
            }
          }
        },
        "parameters": [
        ]
      }
    },
    "/articles/{articleId}": {
      "get": {
        "tags": [
          "article"
        ],
        "summary": "Find article by ID",
        "description": "Returns a single article",
        "operationId": "getArticleById",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "articleId",
            "in": "path",
            "description": "ID of article to return",
            "required": true,
            "type": "integer",
            "format": "int64"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "$ref": "#/definitions/Article"
            }
          }
        }
      }
    }
  },
  "definitions": {
    "Article": {
      "type": "object",
      "required": [
        "id",
        "title"
      ],
      "properties": {
        "id": {
          "type": "integer",
          "format": "int64"
        },
        "title": {
          "type": "string",
          "description": "The title for the link of the article"
        }
      }
    }
  },
  "schemes": [
    "http"
  ],
  "host": "example.com",
  "basePath": "/",
  "tags": [],
  "securityDefinitions": {
  },
  "security": [
    {
      "ApiKeyAuth": []
    }
  ]
}
JSON;

    return json_decode($schemaJson, $asArray);
}

// Extract the schema of the 200 response of an api endpoint.
function getSchemaForPath($path)
{
    $swaggerData = getSchema(true);
    if (isset($swaggerData["paths"][$path]['get']["responses"][200]['schema']) !== true) {
        echo "response not defined";
        exit(-1);
    }

    return $swaggerData["paths"][$path]['get']["responses"][200]['schema'];
}

// JsonSchema needs to know about the ID used for the top-level
// schema apparently.
function aliasSchema($prefix, $schemaForPath)
{
    $aliasedSchema = [];

    foreach ($schemaForPath as $key => $value) {
        if ($key === '$ref') {
            $aliasedSchema[$key] = $prefix . $value;
        }
        else if (is_array($value) === true) {
            $aliasedSchema[$key] = aliasSchema($prefix, $value);
        }
        else {
            $aliasedSchema[$key] = $value;
        }
    }
    return $aliasedSchema;
}


// Test the data matches the schema.
function testDataMatches($endpointData, $schemaForPath)
{
    // Setup the top level schema and get a validator from it.
    $schemaStorage = new \JsonSchema\SchemaStorage();
    $id = 'file://example';
    $swaggerClass = getSchema(false);
    $schemaStorage->addSchema($id, $swaggerClass);
    $factory = new \JsonSchema\Constraints\Factory($schemaStorage);
    $jsonValidator = new \JsonSchema\Validator($factory);

    // Alias the schema for the endpoint, so JsonSchema can work with it.
    $schemaForPath = aliasSchema($id, $schemaForPath);

    // Validate the things
    $jsonValidator->check($endpointData, (object)$schemaForPath);

    // Process the result
    if ($jsonValidator->isValid()) {
        echo "The supplied JSON validates against the schema definition: " . \json_encode($schemaForPath) . " \n";
        return;
    }

    $messages = [];
    $messages[] = "End points does not validate. Violations:\n";
    foreach ($jsonValidator->getErrors() as $error) {
        $messages[] = sprintf("[%s] %s\n", $error['property'], $error['message']);
    }

    $messages[] = "Data: " . \json_encode($endpointData, JSON_PRETTY_PRINT);

    echo implode("\n", $messages);
    echo "\n";
}



// We have two data sets to test. A list of articles.

$articleListJson = <<< JSON
[
  {
      "id": 19874
  },
  {
      "id": 19873
  }
]
JSON;
$articleListData = json_decode($articleListJson);


// A single article
$articleJson = <<< JSON
{
  "id": 19874
}
JSON;
$articleData = json_decode($articleJson);


// This passes, when it shouldn't as none of the articles have a title
testDataMatches($articleListData, getSchemaForPath("/articles"));


// This fails correctly, as it is correct for it to fail to validate, as the article doesn't have a title
testDataMatches($articleData, getSchemaForPath("/articles/{articleId}"));

The minimal composer.json is:

{
    "require": {
        "justinrainbow/json-schema": "^5.2"
    }
}
Danack

4 Answers


Edit-2: 22nd May

After digging further, it turns out the issue is caused by your top-level cast of the schema to an object:

$jsonValidator->check($endpointData, (object)$schemaForPath);

If you had not done that, it would all have worked:

$jsonValidator->check($endpointData, $schemaForPath);

So it doesn't seem to be a bug; it was just incorrect usage. If you remove (object) and run the code:

$ php test.php
End points does not validate. Violations:

[[0].title] The property title is required

[[1].title] The property title is required

Data: [
    {
        "id": 19874
    },
    {
        "id": 19873
    }
]
End points does not validate. Violations:

[title] The property title is required

Data: {
    "id": 19874
}
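
The reason the cast matters can be seen without the library at all: PHP's (object) cast is shallow, so only the outermost array becomes a stdClass and the nested items entry stays an array. The following is a minimal sketch using a schema array shaped like the one in the question; the variable names $shallow and $deep are only for illustration.

```php
<?php

declare(strict_types=1);

// Same shape as the response schema for "/articles", as a nested PHP array.
$schemaForPath = [
    'type'  => 'array',
    'items' => ['$ref' => 'file://example#/definitions/Article'],
];

// (object) converts only the outermost array; 'items' is still an array.
$shallow = (object) $schemaForPath;
var_dump(is_object($shallow));       // bool(true)
var_dump(is_array($shallow->items)); // bool(true)

// A JSON round trip converts every nested associative array to stdClass.
$deep = json_decode(json_encode($schemaForPath));
var_dump(is_object($deep->items));   // bool(true)
```

Passing the plain array lets the validator do its own recursive conversion, which is why dropping the cast is enough.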

Edit-1

To fix the original code, you would need to update CollectionConstraint.php:

/**
 * Validates the items
 *
 * @param array            $value
 * @param \stdClass        $schema
 * @param JsonPointer|null $path
 * @param string           $i
 */
protected function validateItems(&$value, $schema = null, JsonPointer $path = null, $i = null)
{
    // Resolve a $ref in "items" when it arrives as an array rather than an object.
    if (is_array($schema->items) && array_key_exists('$ref', $schema->items)) {
        $schema->items = $this->factory->getSchemaStorage()->resolveRefSchema((object)$schema->items);
    }

    if (is_object($schema->items)) {

This will certainly handle your use case, but if you would rather not change code in a dependency, use my original answer.

Original Answer

The library has a bug/limitation: in src/JsonSchema/Constraints/CollectionConstraint.php a $ref is not resolved as such. If I update your code as below

// Alias the schema for the endpoint, so JsonSchema can work with it.
$schemaForPath = aliasSchema($id, $schemaForPath);

if (array_key_exists('items', $schemaForPath))
{
  $schemaForPath['items'] = $factory->getSchemaStorage()->resolveRefSchema((object)$schemaForPath['items']);
}
// Validate the things
$jsonValidator->check($endpointData, (object)$schemaForPath);

and run it again, I get the expected validation errors

$ php test2.php
End points does not validate. Violations:

[[0].title] The property title is required

[[1].title] The property title is required

Data: [
    {
        "id": 19874
    },
    {
        "id": 19873
    }
]
End points does not validate. Violations:

[title] The property title is required

Data: {
    "id": 19874
}

You need to either fix CollectionConstraint.php or open an issue with the developer of the repo. Otherwise, manually replace the $ref entries throughout the schema, as shown above. My code resolves the issue for your specific schema, but adapting it to any other schema should not be a big problem.
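
Replacing the $ref entries throughout a schema by hand could look roughly like the sketch below. It works on the array form of the schema with the original local references (before aliasSchema prefixes them), and inlineDefinitions is a hypothetical helper, not part of the library. It assumes the definitions do not reference themselves, since there is no cycle detection.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: replace every {"$ref": "#/definitions/Name"} node in
 * an array-form schema with a copy of the named definition.
 * Assumes definitions are not self-referential (no cycle detection).
 */
function inlineDefinitions(array $schema, array $definitions): array
{
    $prefix = '#/definitions/';
    if (isset($schema['$ref']) && is_string($schema['$ref'])
        && strncmp($schema['$ref'], $prefix, strlen($prefix)) === 0) {
        $name = substr($schema['$ref'], strlen($prefix));
        if (!array_key_exists($name, $definitions)) {
            throw new InvalidArgumentException("Unknown definition: $name");
        }
        // The definition may itself contain references, so recurse into it.
        return inlineDefinitions($definitions[$name], $definitions);
    }

    foreach ($schema as $key => $value) {
        if (is_array($value)) {
            $schema[$key] = inlineDefinitions($value, $definitions);
        }
    }

    return $schema;
}

$definitions = [
    'Article' => [
        'type'       => 'object',
        'required'   => ['id', 'title'],
        'properties' => ['id' => ['type' => 'integer'], 'title' => ['type' => 'string']],
    ],
];
$listSchema = ['type' => 'array', 'items' => ['$ref' => '#/definitions/Article']];

$inlined = inlineDefinitions($listSchema, $definitions);
echo json_encode($inlined, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), "\n";
```

The result contains no references at all, so it no longer depends on how the validator resolves them.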

Issue fixed

Tarun Lalwani
  • Thanks for the comprehensive answer, it seems correct and I've opened PR for the library......... "You may award your bounty in 21 hours." – Danack May 21 '18 at 13:52
  • @Danack, no worries. Do post the link to the PR in comments here, so it is here for reference – Tarun Lalwani May 21 '18 at 13:55
  • It is a bit premature to fix `justinrainbow/json-schema`; though this library is somewhat obsolete in terms of supporting the latest JSON Schema spec, it is still solid and reliable for draft-04. – vearutop May 21 '18 at 14:35
  • @Danack, bounty assignment should be available now – Tarun Lalwani May 22 '18 at 13:44
  • @Danack, also see the latest update. Turns out that you should not have interfered with type cast of schema. So no PR needed :-) – Tarun Lalwani May 22 '18 at 13:54

EDIT: The important point is that the provided document is a Swagger schema, which uses an extended subset of JSON Schema to describe requests and responses. A Swagger 2.0 document can itself be validated against its JSON Schema, but it cannot be used directly as a JSON Schema for an API response structure.

If an entity schema is compatible with standard JSON Schema, you can validate against it with a general-purpose validator, but you have to provide all relevant definitions. That is easy with absolute references, but more complicated with local (relative) references that start with #/. IIRC, those must be defined in the local schema.
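
To illustrate why a local reference depends on the document it is resolved against, here is a rough sketch of a lookup for references of the form #/a/b. resolveLocalRef is a hypothetical helper written for this example and is not how the library resolves references; it only shows that the same reference resolves against the full Swagger document but not against the extracted response schema.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: look up a local reference such as
 * "#/definitions/Article" inside an already decoded document.
 * Returns null when the target is not present in that document.
 */
function resolveLocalRef(array $document, string $ref): ?array
{
    if (strncmp($ref, '#/', 2) !== 0) {
        return null; // only local references are handled in this sketch
    }

    $node = $document;
    foreach (explode('/', substr($ref, 2)) as $segment) {
        // Undo JSON Pointer escaping (~1 is "/", ~0 is "~").
        $segment = str_replace(['~1', '~0'], ['/', '~'], $segment);
        if (!is_array($node) || !array_key_exists($segment, $node)) {
            return null;
        }
        $node = $node[$segment];
    }

    return is_array($node) ? $node : null;
}

// The full document contains the definition...
$swagger = ['definitions' => ['Article' => ['type' => 'object', 'required' => ['id', 'title']]]];
// ...but the response schema extracted from it does not.
$responseSchema = ['type' => 'array', 'items' => ['$ref' => '#/definitions/Article']];

var_dump(resolveLocalRef($swagger, '#/definitions/Article') !== null); // bool(true)
var_dump(resolveLocalRef($responseSchema, '#/definitions/Article'));   // NULL
```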


The problem here is that you are using schema references detached from their resolution scope. I've added an id to make the references absolute, so they no longer need to be in scope.

"$ref": "http://example.com/my-schema#/definitions/Article"

The code below works well.

<?php

require_once __DIR__ . '/vendor/autoload.php';

$swaggerSchemaData = json_decode(<<<'JSON'
{
  "id": "http://example.com/my-schema",
  "swagger": "2.0",
  "info": {
    "termsOfService": "http://swagger.io/terms/",
    "version": "1.0.0",
    "title": "Example api"
  },
  "paths": {
    "/articles": {
      "get": {
        "tags": [
          "article"
        ],
        "summary": "Find all articles",
        "description": "Returns a list of articles",
        "operationId": "getArticleById",
        "produces": [
          "application/json"
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "http://example.com/my-schema#/definitions/Article"
              }
            }
          }
        },
        "parameters": [
        ]
      }
    },
    "/articles/{articleId}": {
      "get": {
        "tags": [
          "article"
        ],
        "summary": "Find article by ID",
        "description": "Returns a single article",
        "operationId": "getArticleById",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "articleId",
            "in": "path",
            "description": "ID of article to return",
            "required": true,
            "type": "integer",
            "format": "int64"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "$ref": "http://example.com/my-schema#/definitions/Article"
            }
          }
        }
      }
    }
  },
  "definitions": {
    "Article": {
      "type": "object",
      "required": [
        "id",
        "title"
      ],
      "properties": {
        "id": {
          "type": "integer",
          "format": "int64"
        },
        "title": {
          "type": "string",
          "description": "The title for the link of the article"
        }
      }
    }
  },
  "schemes": [
    "http"
  ],
  "host": "example.com",
  "basePath": "/",
  "tags": [],
  "securityDefinitions": {
  },
  "security": [
    {
      "ApiKeyAuth": []
    }
  ]
}
JSON
);



$schemaStorage = new \JsonSchema\SchemaStorage();
$schemaStorage->addSchema('http://example.com/my-schema', $swaggerSchemaData);
$factory = new \JsonSchema\Constraints\Factory($schemaStorage);
$validator = new \JsonSchema\Validator($factory);

$schemaData = $swaggerSchemaData->paths->{"/articles"}->get->responses->{"200"}->schema;

$data = json_decode('[{"id":1},{"id":2,"title":"Title2"}]');
$validator->validate($data, $schemaData);
var_dump($validator->isValid()); // bool(false)
$data = json_decode('[{"id":1,"title":"Title1"},{"id":2,"title":"Title2"}]');
$validator->validate($data, $schemaData);
var_dump($validator->isValid()); // bool(true)
vearutop
  • "you are trying to use schema references detached from resolution scope" That might be true, but it's irrelevant. The example petstore schema doesn't have absolute references, http://petstore.swagger.io/v2/swagger.json and they shouldn't be necessary. – Danack May 21 '18 at 15:44
  • You can validate swagger schema (e.g. petstore.json) with JSON schema, but you can not directly validate swagger entities with JSON schema. You need to either adapt them, or use a Swagger Response/Request validator. When you try to extract `$swaggerData["paths"][$path]['get']["responses"][200]['schema']` you miss references. Local reference `#/...` has to be defined in local document. – vearutop May 21 '18 at 15:51

I'm not sure I fully understand your code here, but I have an idea based on some assumptions.

Assuming $typeForEndPoint is the schema you're using for validation, your items keyword needs to be an object rather than an array.

The items keyword can be an array or an object. If it's an object, that schema applies to every item in the array. If it's an array, each schema in it applies to the item at the same position in the array being validated.

This means you're only validating the first item in the array.

If "items" is a schema, validation succeeds if all elements in the array successfully validate against that schema.

If "items" is an array of schemas, validation succeeds if each element of the instance validates against the schema at the same position, if any.

https://datatracker.ietf.org/doc/html/draft-handrews-json-schema-validation-01#section-6.4.1
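
The difference between the two forms of items quoted above can be sketched in plain PHP. This is an illustration only, not the library's implementation: each "schema" is modelled as a closure returning a bool, and itemsValid and $hasTitle are names made up for the example.

```php
<?php

declare(strict_types=1);

/**
 * Illustration only: each "schema" is modelled as a closure returning bool.
 * $items is either one closure (applies to every element) or a list of
 * closures (applied position by position).
 */
function itemsValid(array $instance, $items): bool
{
    if ($items instanceof Closure) {
        foreach ($instance as $element) {
            if (!$items($element)) {
                return false;
            }
        }
        return true;
    }

    foreach ($instance as $index => $element) {
        // Elements beyond the end of the schema list are not checked by "items".
        if (isset($items[$index]) && !$items[$index]($element)) {
            return false;
        }
    }
    return true;
}

$hasTitle = function (array $article): bool {
    return array_key_exists('title', $article);
};
$articles = [['id' => 1, 'title' => 'First'], ['id' => 2]];

var_dump(itemsValid($articles, $hasTitle));   // bool(false): the second element has no title
var_dump(itemsValid($articles, [$hasTitle])); // bool(true): only position 0 is checked
```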

Relequestual
  • "I'm not sure I fully understand your code here" yeah, I get that a lot. I've refactored the code to be standalone and a complete example by itself. – Danack May 21 '18 at 11:02
  • OK, so it looks like my answer just picked up on issues in your make shift example. Taking another look. – Relequestual May 21 '18 at 12:40
  • The schema is valid and works as expected when tested outside of your code. Something else is in play. Digging. – Relequestual May 21 '18 at 12:51
  • tbh, I've got a suspicion that it could just be a bug or something not supported in the JsonSchema library I'm using. – Danack May 21 '18 at 12:58
  • Yeah I think it must be. They don't use the official test suite. – Relequestual May 21 '18 at 13:04
  • So that I can link to it in the issue I open, please could you link me the official test suite please? – Danack May 21 '18 at 13:05
  • Of course! https://github.com/json-schema-org/JSON-Schema-Test-Suite - We also run a slack if you have any other JSON Schema related questions, found on the official site. – Relequestual May 21 '18 at 13:06
  • Confirmed. I see a number of issues and PRs regarding the use of $ref within specific key words. It should be universal =/ – Relequestual May 21 '18 at 13:14
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171469/discussion-between-relequestual-and-danack). – Relequestual May 21 '18 at 13:18
  • They don't use official test suite, but they are tested by a 3rd party with about 95% PASS result: https://github.com/swaggest/php-json-schema-bench/blob/master/report-draft-04.md – vearutop May 21 '18 at 15:33
  • Looks like they DO, but don't publish the results =/ https://github.com/justinrainbow/json-schema/blob/master/tests/Drafts/Draft4Test.php – Relequestual May 21 '18 at 15:36

The validator doesn't like a mix of objects and associative arrays. You can use either:

$jsonValidator->check($endpointData, $schemaForPath);

or

$jsonValidator->check($endpointData, json_decode(json_encode($schemaForPath)));
  • They have check in the code itself for the same `// make sure $schema is an object if (is_array($schema)) { $schema = self::arrayToObjectRecursive($schema); }`, which does exactly what you pointed out – Tarun Lalwani May 21 '18 at 15:21
  • By casting the variable `$schemaForPath` to an object, this check is no longer executed (`is_array($schema)` returns false), which is why the nested parts of $schema are not converted to objects. My point is to either not cast the array to an object and let the library call `self::arrayToObjectRecursive`, or to convert the entire array into an object (which is equivalent to the original call to `self::arrayToObjectRecursive`) – jderusse May 22 '18 at 12:33