Context
I implemented a Rust library for JSON Schema validation that operates on `serde_json::Value` instances, and now I want to use it from Python. PyO3 is my primary choice for connecting the two. Python values should be converted to `serde_json::Value` when passed to the library, and `serde_json::Value` should be converted back to Python inside the validation errors returned by the Rust side. One possible approach is to implement `serde::ser::Serialize` for a newtype wrapper around `pyo3::types::PyAny` and then pass it to `serde_json::to_value`, but I am not sure how efficient that will be. What are the options, and what trade-offs do they involve? On the Python side I am mostly interested in the built-in types that `json.dumps` can serialize, without custom classes for now.
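For reference, the type mapping involved is small. Here is a pure-Python sketch of the recursive conversion that the Rust side would have to mirror (the function is hypothetical; the comments name the `serde_json::Value` variants each branch would map to):

```python
import json

def to_json_value(obj):
    # Hypothetical sketch of the PyAny -> serde_json::Value mapping;
    # the real conversion would happen in Rust.
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj  # Null / Bool / Number / String
    if isinstance(obj, (list, tuple)):
        return [to_json_value(item) for item in obj]  # Array
    if isinstance(obj, dict):
        # JSON object keys are strings
        return {str(key): to_json_value(value) for key, value in obj.items()}  # Object
    raise TypeError(f"Unsupported type: {type(obj).__name__}")

# Covers exactly the built-ins json.dumps handles:
json.dumps(to_json_value({"a": [1, 2.5, None, True]}))
```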
Rust-side example:
```rust
use jsonschema::JSONSchema;
use serde_json::Value;

/// Compiles the schema and validates the instance against it.
pub fn is_valid(schema: &Value, instance: &Value) -> bool {
    let compiled = JSONSchema::compile(schema, None).expect("Invalid schema");
    compiled.is_valid(instance)
}
```
I.e. there is a function that accepts two references to `serde_json::Value`, and I want to expose it to Python. From the Python side there are two use-cases:
- The instance is a JSON-encoded string:

```python
import jsonschema_rs

assert jsonschema_rs.is_valid(
    {"minItems": 2},
    "[1, 2]"
)
```
- The instance is a Python structure (not a JSON-encoded string):

```python
import jsonschema_rs

assert jsonschema_rs.is_valid(
    {"minItems": 2},
    [1, 2]
)
```
Possible use-cases
Web app request/response structure validation:
- When a request comes in, its body is validated as-is, without parsing, according to the schema.
- When a response is returned, its structure is validated according to the schema before being serialized to JSON.

In the future, both steps might be combined with Rust-powered JSON deserialization (on the request side) and serialization (on the response side).
- Property-based testing, as an extension for Hypothesis. In this case, the faster the input validation is, the more test cases can be generated. The current implementation uses a Python library under the hood, which is quite slow for the complex schemas I usually work with.
Update
I tried to implement the `Serialize` trait here and added a comparison with raw string input, as suggested by @Sven Marnach in the comments. Indeed, the raw string is the fastest option, but if it involves calling `json.dumps` in Python, it performs significantly worse than the variant with the trait.
Small objects & schema (100000 iterations):

```
String        : 1.31617
Trait         : 1.52797 (x1.16)
String + dumps: 2.77378 (x2.1)
```

Big objects & schema (100 iterations):

```
String        : 1.42146
Trait         : 3.70745 (x2.6)
String + dumps: 6.21213 (x4.37)
```
Benchmark code and test data.
Having a version for strings definitely makes sense, but calling `json.dumps` is quite expensive. I don't know whether there are better options for such scenarios.
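The shape of that comparison can be reproduced without the Rust extension; in this sketch `len()` is a hypothetical stand-in for the validator call, so only the relative cost of the extra `json.dumps` step is meaningful (absolute numbers are machine-dependent):

```python
import json
import timeit

document = {"values": list(range(100)), "name": "example"}
encoded = json.dumps(document)

# "String": the caller already holds a JSON string, so the per-call Python
# overhead is just handing it over (len() stands in for the validator).
t_string = timeit.timeit(lambda: len(encoded), number=10_000)

# "String + dumps": every call pays for json.dumps before validation.
t_dumps = timeit.timeit(lambda: len(json.dumps(document)), number=10_000)
```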
Python version: 3.7
Rust version: 1.42.0
Dependencies:
- serde_json = "1.0.48"
- serde = "1.0.105"
- jsonschema = "0.2.0"