At the beginning of my program, I read data from a file:

let file = std::fs::File::open("data/games.json").unwrap();
let data: Games = serde_json::from_reader(file).unwrap();

I would like to know how it would be possible to do this at compile time for the following reasons:

  1. Performance: no need to deserialize at runtime
  2. Portability: the program can run on any machine without needing the JSON data file shipped alongside it.

It might also be useful to mention that the data is read-only, which means the solution can store it as a static.

Nils André
  • 1. I don't think deserialization performance will be an issue (for a huge data set, a faster serialization method such as bincode can be used). 2. For portability, use [`include_bytes`](https://doc.rust-lang.org/std/macro.include_bytes.html), then deserialize the required bytes (avoid using JSON; it will increase the binary size for nothing) – Asya Corbeau Oct 12 '19 at 23:47
  • @AsyaCorbeau is there any reason to use [`include_bytes!`](https://doc.rust-lang.org/std/macro.include_bytes.html) over [`include_str!`](https://doc.rust-lang.org/std/macro.include_str.html)? – Nils André Oct 13 '19 at 11:16
  • @NilsAndré none, as JSON implies valid UTF-8, which is the only requirement separating an array of bytes from a string in Rust. – Sébastien Renauld Oct 13 '19 at 16:44
  • True, but as I said, with a binary encoding the executable size will decrease (without any compression overhead). JSON is useless when it is not read by humans and there are no encoding limitations, and it has an overhead: deserializing JSON requires a lot more work than an optimized binary encoding. Optionally, this can avoid all of JSON's limitations at the cost of interoperability (which you can recover by keeping JSON as the asset format and transcoding it at compile time) – Asya Corbeau Oct 15 '19 at 18:59

2 Answers

7

This is straightforward, but leads to some potential issues. First, we need to decide something: do we want to build the tree of objects at compile time, or embed the file in the binary and parse it at runtime?

99% of the time, parsing on boot into a static ref is enough for people, so I'm going to give you that solution; I will point you to the "other" version at the end, but that requires a lot more work and is domain-specific.

The macro (because it has to be a macro) you are looking for to be able to include a file at compile-time is in the standard library: std::include_str!. As the name suggests, it takes your file at compile-time and generates a &'static str from it for you to use. You are then free to do whatever you like with it (such as parsing it).

From there, it is a simple matter to then use lazy_static! to generate a static ref to our JSON Value (or whatever it may be that you decide to go for) for every part of the program to use. In your case, for instance, it could look like this:

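use lazy_static::lazy_static;
use serde::{Deserialize, Serialize};

// The file contents are embedded into the binary at compile time.
const GAME_JSON: &str = include_str!("my/file.json");

#[derive(Serialize, Deserialize, Debug)]
struct Game {
    name: String,
}

lazy_static! {
    // Parsed once, on first access, then shared for the rest of the program.
    static ref GAMES: Vec<Game> = serde_json::from_str(GAME_JSON).unwrap();
}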
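
For comparison, the same "embed, then parse on first use" pattern can be sketched with the standard library alone. This is only an illustration under stated assumptions: `std::sync::OnceLock` (Rust 1.70+) stands in for lazy_static!, a string literal stands in for the `include_str!` call, and the hypothetical `parse_games` helper reads one name per line where the real program would call `serde_json::from_str`.

```rust
use std::sync::OnceLock;

// Stand-in for `include_str!("my/file.json")`: a literal keeps the sketch
// self-contained, and it is a `&'static str` either way.
const GAME_DATA: &str = "Chess\nGo\nCheckers\n";

#[derive(Debug)]
struct Game {
    name: String,
}

// Hypothetical parser: one game name per non-empty line.
// The real program would call `serde_json::from_str` here instead.
fn parse_games(text: &str) -> Vec<Game> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| Game { name: line.to_string() })
        .collect()
}

// Parses the embedded data on the first call and returns the cached
// result on every later call.
fn games() -> &'static [Game] {
    static GAMES: OnceLock<Vec<Game>> = OnceLock::new();
    GAMES.get_or_init(|| parse_games(GAME_DATA))
}

fn main() {
    for game in games() {
        println!("{}", game.name);
    }
}
```

Wrapping the static in an accessor function keeps the initialization logic in one place; callers only ever see a shared, immutable slice.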

You need to be aware of two things when doing this:

  1. This will bloat your binary size, as the &str is embedded uncompressed. Consider compressing the data (with gzip, for example) and decompressing it at startup.
  2. The usual concerns around access to the same static ref from multiple threads apply, but since it isn't mutable you only need to worry about a portion of them: lazy_static! runs the initializer once, and shared reads of immutable data need no locking.
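
The threading point can be checked with a small standard-library sketch. It assumes `std::sync::OnceLock` as a stand-in for lazy_static!, and the `INIT_CALLS` counter and `total_len_from_threads` helper are hypothetical names added purely to observe how often the initializer runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;
use std::thread;

// Counts how many times the initializer closure actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Lazily initialized, immutable data shared by every thread.
fn games() -> &'static Vec<String> {
    static GAMES: OnceLock<Vec<String>> = OnceLock::new();
    GAMES.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        vec!["Chess".to_string(), "Go".to_string()]
    })
}

// Spawns `threads` readers and sums the lengths they each observe.
fn total_len_from_threads(threads: usize) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| thread::spawn(|| games().len()))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let total = total_len_from_threads(4);
    let inits = INIT_CALLS.load(Ordering::SeqCst);
    println!("total = {total}, initializations = {inits}");
}
```

However many threads race to the first access, only one initializer runs, and every reader then gets a plain shared reference with no lock involved.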

The other way requires generating your objects at compile time using a procedural macro. As stated, I wouldn't recommend it unless parsing that JSON has a really expensive startup cost; most people will not have one, and the last time I did was when dealing with deeply nested multi-GB JSON files.

The crates you want to look out for are proc_macro2 and syn for the code generation; the rest is very similar to how you would write a normal method.

Shepmaster
Sébastien Renauld
  • For compile time generation of objects, would I use `serde` or have to do my own parser? – Nils André Oct 13 '19 at 17:15
  • @NilsAndré The option is yours on that one. The last time I had to do this, I had the advantage of being able to transform the structure into something that both took less space and killed an intermediary parsing step, and I used [protocol buffers](https://github.com/stepancheg/rust-protobuf) (combined with serde) along with a pre-processing step as a proc macro. I would **not** recommend this approach unless you absolutely have to. – Sébastien Renauld Oct 13 '19 at 18:25
  • @NilsAndré to reiterate: what you "gain" in not having a JSON parsing step on bootup of your application, you lose *massively* by having a defined-by-you parsing step, because the data will get parsed no matter what, by something, for you to be able to use it. The gains are small and the headache is large. – Sébastien Renauld Oct 13 '19 at 18:26
  • A build script would accomplish the same as your procedural macro and be simpler. – Shepmaster Nov 18 '19 at 19:53
4

When you deserialize something at runtime, you're essentially building a representation in program memory from another representation on disk. But at compile time, there's no notion of "program memory" yet, so where would this data be deserialized to?

However, what you're trying to achieve is, in fact, possible. The main idea is as follows: to create something in program memory, you must write some code which creates the data. What if you could generate that code automatically, based on the serialized data? That's what the uneval crate does (disclaimer: I'm the author, so you're encouraged to look through the source to see if you can do better).

To use this approach, you'll have to create build.rs with approximately the following content:

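// somehow include the Games struct with its Serialize and Deserialize implementations
fn main() {
    // Parse the JSON once, while the build script runs.
    let games: Games = serde_json::from_str(include_str!("data/games.json")).unwrap();
    // Write Rust code that reconstructs `games` into $OUT_DIR/games.rs.
    uneval::to_out_dir(games, "games.rs").unwrap();
}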

And in your initialization code you'll have the following:

let data: Games = include!(concat!(env!("OUT_DIR"), "/games.rs"));

Note, however, that this might be fairly hard to do in an ergonomic way, since the necessary struct definitions must now be shared between build.rs and the crate itself, as I mentioned in the comment. It might be a little easier if you split your crate in two, keeping the struct definitions (and only them) in one crate and the logic which uses them in another. There are other ways, such as include! trickery or using the fact that the build script is an ordinary Rust binary and can include other modules as well, but these will complicate things even more.
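
To make the code-generation idea concrete without depending on any crate, here is a minimal hand-rolled sketch. It is not how uneval works internally; the `generate_games_source` function and the `Game { name: &'static str }` shape it targets are assumptions made for illustration only.

```rust
// Hypothetical generator: turns a list of game names into the text of a
// Rust expression such as `&[Game { name: "Chess" }, ...]`.
fn generate_games_source(names: &[&str]) -> String {
    let mut src = String::from("&[\n");
    for name in names {
        // `{:?}` on a string emits a quoted, escaped Rust string literal.
        src.push_str(&format!("    Game {{ name: {:?} }},\n", name));
    }
    src.push_str("]\n");
    src
}

fn main() {
    // In a real build.rs the names would come from the parsed JSON, and the
    // text would be written to a file under the OUT_DIR directory that
    // Cargo provides to build scripts.
    let source = generate_games_source(&["Chess", "Go"]);
    print!("{source}");
}
```

The crate would then pull the generated expression in with include!, in the same way as the line shown earlier, and the data ends up in the binary with no parsing step at startup.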

Cerberus