
I want to deserialize JSON values in parallel using rayon. A valid JSON document from the serde_json example fails to deserialize inside par_iter, even though it parses correctly without parallelization. This is the code:

use rayon::prelude::*; // 1.7.0
use serde_json::{Result, Value};

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Value = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());
    
    let _values = batch.par_iter()
        .for_each(|json: &String| {
            serde_json::from_str(json.as_str()).unwrap()
        });
        
    Ok(())
}

and this is the error

thread 'thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: map, expected unit", line: 2, column: 8)', src/main.rs:23:49


IIRC, I've seen other par_iter examples that use unwrap inside. Is this not recommended? In my case, I want to do it because I need the program to panic if an invalid input comes in.

ZJaume
  • The `.for_each` method doesn't yield any `_values` so the closure is expected to return nothing (aka `()`). So due to type inference, `serde_json::from_str` is attempting to deserialize your JSON into the unit type, which isn't going to work. – kmdreko Jul 18 '23 at 16:12
  • @kmdreko: Indeed, changing the inner code to `let v: Value = serde_json::from_str(json.as_str()).unwrap();` just works. – rodrigo Jul 18 '23 at 16:29
  • About the panic: panicking in production code is discouraged except for bugs. Examples often use it, though, for convenience (some say this is unfortunate). In small scripts this is also fine. And `par_iter()` is no exception to this rule ([`try_for_each()`](https://docs.rs/rayon/latest/rayon/iter/trait.ParallelIterator.html#method.try_for_each) exists). – Chayim Friedman Jul 18 '23 at 18:29

1 Answer

serde_json::from_str infers its output type from the context in which the result is used, for example the type of the variable it is assigned to. In your case, however, for_each expects the closure to return nothing, so from_str attempts to deserialize the JSON into the unit type ().

Use map().collect() together with a : Vec<Value> annotation to make this work:

use rayon::prelude::*; // 1.7.0
use serde_json::{Result, Value};

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Value = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v["name"], v["phones"][0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());

    let values: Vec<Value> = batch
        .par_iter()
        .map(|json: &String| serde_json::from_str(json.as_str()).unwrap())
        .collect();

    println!("Values:\n{:#?}", values);

    Ok(())
}

Output:

Please call "John Doe" at the number "+44 1234567"
Values:
[
    Object {
        "age": Number(43),
        "name": String("John Doe"),
        "phones": Array [
            String("+44 1234567"),
            String("+44 2345678"),
        ],
    },
    Object {
        "age": Number(43),
        "name": String("John Doe"),
        "phones": Array [
            String("+44 1234567"),
            String("+44 2345678"),
        ],
    },
]

Although honestly, it's a little unusual to use serde_json::Value here; usually people deserialize directly into a struct:

use rayon::prelude::*;
use serde::{Deserialize, Serialize};
use serde_json::Result;

#[derive(Debug, Serialize, Deserialize)]
struct Entry {
    name: String,
    age: u32,
    phones: Vec<String>,
}

fn main() -> Result<()> {
    let data = r#"
        {
            "name": "John Doe",
            "age": 43,
            "phones": [
                "+44 1234567",
                "+44 2345678"
            ]
        }"#;
    let v: Entry = serde_json::from_str(data)?;
    println!("Please call {} at the number {}", v.name, v.phones[0]);

    let mut batch = Vec::<String>::new();
    batch.push(data.to_string());
    batch.push(data.to_string());

    let values: Vec<Entry> = batch
        .par_iter()
        .map(|json: &String| serde_json::from_str(json.as_str()).unwrap())
        .collect();

    println!("Values:\n{:#?}", values);

    Ok(())
}

Output:

Please call John Doe at the number +44 1234567
Values:
[
    Entry {
        name: "John Doe",
        age: 43,
        phones: [
            "+44 1234567",
            "+44 2345678",
        ],
    },
    Entry {
        name: "John Doe",
        age: 43,
        phones: [
            "+44 1234567",
            "+44 2345678",
        ],
    },
]
Finomnis