
How do you efficiently save data to disk in Rust?

I have a `Vec` of ~100 million structs in Rust. I'd like to write them to a file that isn't too large and then read them back, as fast as possible.

I'm overwhelmed by the plethora of libraries (speedy, alkahest, rkyv, postcard), none of which make it clear how to use them with zstd. I'm sure all of them are great, but can someone give an example using any of these libraries?

Even ChatGPT can't figure it out!

Test
  • What is "not too large" for you? 100 million is a lot of data. If each struct is 8 bytes and there's no overhead from the `Vec` at all, that's *already* 800MB without compression. And if your structs are larger or need to store metadata with them, then it's going to be larger than that. I might also ask where this data is coming from. Do you actually have a nearly gigabyte-large `Vec` stored in memory that needs to be dumped to disk, or is this being streamed from somewhere? – Silvio Mayolo May 09 '23 at 03:11
  • It is on disk. I am trying to read something in from disk and iterate over it and write something else to disk. It will fit in memory. If you'd prefer to stream it, you can. Just literally any way to save things on disk. – Test May 09 '23 at 03:18
  • This should not be that hard in Rust. There are a million speedy serialization formats in Go that you can easily iterate over and that have a clear API. I'm not writing a bad question by asking, "How do you save a list in Rust?" – Test May 09 '23 at 03:19
  • How about `chunks`? If you need to write or read data in chunks, you can use the `File::write_all` and `File::read_exact` methods with a fixed-size buffer. This can be more efficient than reading or writing the data all at once. You might also want to use `serde` to serialize your structs into a format that can be written to a file. – metaphor May 09 '23 at 03:22
  • Sure, but how do you actually do it? Rust is awash in libraries but short on how to actually use them efficiently. – Test May 09 '23 at 03:23
  • I believe you can do it by yourself, using your skill and make it work then share your solution to the forum to receive advice from an expert developer. – metaphor May 09 '23 at 03:29
  • No, I can't. I've been trying for literally 16 hours. This is not easy for a beginner. These libraries do *not* make clear how to interoperate with one another. – Test May 09 '23 at 03:31
  • I'm switching back to Go. – Test May 09 '23 at 03:31

1 Answer


Rust has two traits for dealing with I/O: `Read` and `Write`. Generally, you create a reader and a writer, then use `std::io::copy` to move bytes from one to the other. If you are going to or from memory instead, you pass the reader or writer to whichever function in your situation accepts one.
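As a minimal sketch of the reader-to-writer pattern, using only the standard library: `pipe` is a hypothetical helper name, not something from a crate, and any type implementing `Read` or `Write` (a `File`, a byte slice, a `Vec<u8>`) can be passed in.

```rust
use std::io::{self, Read, Write};

/// Copies everything from `reader` into `writer` and returns the number of bytes moved.
/// `pipe` is a hypothetical name chosen for this sketch.
fn pipe<R: Read, W: Write>(mut reader: R, mut writer: W) -> io::Result<u64> {
    // io::copy reads until EOF and writes each chunk to the writer.
    let copied = io::copy(&mut reader, &mut writer)?;
    // Flush in case the writer holds buffered bytes.
    writer.flush()?;
    Ok(copied)
}
```

For instance, calling `pipe` with a `File` opened for reading and a `File` created for writing would copy one file into the other, and calling it with a byte slice and a `&mut Vec<u8>` does the same thing entirely in memory.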

For serializing Rust data, you need to decide on a serialization format. I'll use JSON because it's easy to read, but you'll probably want something else if you care about space efficiency, although compression should bring most formats to a broadly similar size.

The most important serialization library is `serde`, which is explained more in this question. Annotate your struct with the derive macros for `Serialize` and `Deserialize`.

`serde_json` has the `to_writer` function, which takes a serde-compatible type and writes it out as JSON to a writer. You have the data, so now you need a writer.

The `zstd` crate has an `Encoder`, which is a writer that wraps another writer. That other writer will just be a `File`. Then you simply give the writer and the data to `to_writer`.

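use std::fs::File;

// Open the file
let writer = File::create("temp.json.zstd")?;
// Wrap the file in the zstd encoder; level 0 selects zstd's default compression level,
// and auto_finish() finalizes the compressed stream when the writer is dropped
let writer = zstd::Encoder::new(writer, 0)?.auto_finish();
// Write the data into the encoder
serde_json::to_writer(writer, &data)?;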

Note that you can pass `&mut writer` instead if you want to keep using the writer afterwards. Also note that reading or writing a file without buffering is very slow. Fortunately, zstd buffers both sides by default, so you don't need to wrap your file in a `BufWriter`/`BufReader` manually.
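A std-only sketch of those two points, under stated assumptions: `write_record` is a hypothetical stand-in for a function such as `to_writer` that takes its writer by value, and `BufWriter` plays the role of the buffering that zstd would otherwise provide. Passing `&mut writer` lends the writer out, so it can be reused for the next call.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a function that consumes a writer by value.
fn write_record<W: Write>(mut writer: W, id: u32) -> io::Result<()> {
    writeln!(writer, "record {id}")
}

/// Writes `count` records through one buffered writer, lending it out each time,
/// then returns the underlying sink.
fn write_all_records<W: Write>(sink: W, count: u32) -> io::Result<W> {
    // Buffer small writes so the sink sees fewer, larger ones.
    let mut writer = BufWriter::new(sink);
    for id in 0..count {
        // `&mut writer` also implements Write, so `writer` stays usable afterwards.
        write_record(&mut writer, id)?;
    }
    // into_inner flushes the buffer and hands back the wrapped sink.
    writer.into_inner().map_err(|e| e.into_error())
}
```

With a plain `File` as the sink, the `BufWriter` matters for speed. With a zstd encoder as the sink, the explicit buffer should be unnecessary, for the reason given above.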

Then reading the file is much the same, but with the Decoder and from_reader instead.

// Open the file
let reader = File::open("temp.json.zstd")?;
// Wrap the file in the zstd decoder
let reader = zstd::Decoder::new(reader)?;
// Read the data from the decoder
let data: Vec<Data> = serde_json::from_reader(reader)?;
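The write and read halves mirror each other, which is easiest to see as a round trip. The sketch below is not the serde approach from this answer: it swaps in a hand-written fixed-width binary encoding so that it runs with the standard library alone, and the `Data` struct with its `id` and `value` fields is hypothetical, since the question never shows the real one. Both functions are generic over `Write` and `Read`, so a zstd encoder or decoder could be passed in place of the in-memory buffer.

```rust
use std::io::{self, Read, Write};

/// Hypothetical record type; the question does not show its struct.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Data {
    id: u64,
    value: f32,
}

/// Writes a u64 length prefix, then each record as 12 little-endian bytes.
fn write_records<W: Write>(mut writer: W, records: &[Data]) -> io::Result<()> {
    writer.write_all(&(records.len() as u64).to_le_bytes())?;
    for r in records {
        writer.write_all(&r.id.to_le_bytes())?;
        writer.write_all(&r.value.to_le_bytes())?;
    }
    writer.flush()
}

/// Reads back what `write_records` produced.
fn read_records<R: Read>(mut reader: R) -> io::Result<Vec<Data>> {
    let mut len_buf = [0u8; 8];
    reader.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf);

    // Grow the Vec as records arrive rather than trusting the prefix for allocation.
    let mut records = Vec::new();
    for _ in 0..len {
        let mut id = [0u8; 8];
        let mut value = [0u8; 4];
        reader.read_exact(&mut id)?;
        reader.read_exact(&mut value)?;
        records.push(Data {
            id: u64::from_le_bytes(id),
            value: f32::from_le_bytes(value),
        });
    }
    Ok(records)
}
```

To compress on disk, the writer passed to `write_records` could be the `zstd::Encoder::new(file, 0)?.auto_finish()` value from the writing example, and the reader passed to `read_records` could be the `zstd::Decoder::new(file)?` value from the reading example.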
drewtato