
I'm trying to get access to an Iterator over the contents of a file uploaded via an input field.

I can pass the JS File object into Wasm just fine via web-sys, but I cannot for the life of me figure out how to access anything other than the length and name of the passed file in Rust.

I think I could pass the whole file into Wasm as a byte array and iterate over that, but I would prefer to iterate directly over the file contents without copying, since the files themselves will be quite large (~1 GB).

I found in the Mozilla JS docs that I should be able to access the underlying file blob, get a ReadableStream from it via the .stream() method, and get a reader from that stream which can be iterated over. But in web-sys, the .getReader() method of ReadableStream returns a plain JsValue, which I can't do anything useful with.

Am I missing something here, is this functionality simply missing from web-sys, or is there some other way to do this? Maybe I could create the iterator in JS and pass that to Rust?

Matthias Braun
Erik Schulze
  • Have you tried casting the `JsValue` into a usable type using `.dyn_into::().unwrap()`? If you have any examples of code you tried, we can start from that. And maybe link the reference you mentioned... – Todd Jun 12 '21 at 07:55
  • Or something like `let reader: Reader = rstream.getReader().try_into().unwrap();` – Todd Jun 12 '21 at 08:06
  • There is no `Reader` in web-sys. – frankenapps Jun 13 '21 at 09:18
  • There's a [bug report](https://github.com/rustwasm/wasm-bindgen/issues/1727) about this for wasm-bindgen with some pointers in it. – Matthias Braun Jan 22 '23 at 18:52

4 Answers


I managed to get a working example using read_as_binary_string.

Here's the code:

lib.rs

use js_sys::JsString;
use std::cell::RefCell;
use std::rc::Rc;
use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;
use web_sys::{console, Event, FileReader, HtmlInputElement};

#[wasm_bindgen(start)]
pub fn main_wasm() {
    let my_file: Rc<RefCell<Vec<u8>>> = Rc::new(RefCell::new(Vec::new()));
    set_file_reader(&my_file);
}

fn set_file_reader(file: &Rc<RefCell<Vec<u8>>>) {
    // `FileReader::new()` already returns a `FileReader`, so no cast is needed.
    let filereader = FileReader::new().unwrap();
    let my_file = Rc::clone(file);

    // Runs when reading finishes: copy the file contents into the shared buffer.
    let onload = Closure::wrap(Box::new(move |event: Event| {
        let element = event.target().unwrap().dyn_into::<FileReader>().unwrap();
        let data = element.result().unwrap();
        let file_string: JsString = data.dyn_into::<JsString>().unwrap();
        // A binary string holds one byte (0-255) per UTF-16 code unit, so the cast is lossless.
        let file_vec: Vec<u8> = file_string.iter().map(|x| x as u8).collect();
        *my_file.borrow_mut() = file_vec;
        // Note: this logs the entire file contents.
        console::log_1(&format!("file loaded: {:?}", file_string).into());
    }) as Box<dyn FnMut(_)>);

    filereader.set_onloadend(Some(onload.as_ref().unchecked_ref()));
    onload.forget();

    let fileinput: HtmlInputElement = web_sys::window()
        .unwrap()
        .document()
        .expect("should have a document.")
        .create_element("input")
        .unwrap()
        .dyn_into::<HtmlInputElement>()
        .unwrap();

    fileinput.set_id("file-upload");
    fileinput.set_type("file");

    web_sys::window()
        .unwrap()
        .document()
        .unwrap()
        .body()
        .expect("document should have a body")
        .append_child(&fileinput)
        .unwrap();

    let callback = Closure::wrap(Box::new(move |event: Event| {
        let element = event
            .target()
            .unwrap()
            .dyn_into::<HtmlInputElement>()
            .unwrap();
        let filelist = element.files().unwrap();

        let _file = filelist.get(0).expect("should have a file handle.");
        filereader.read_as_binary_string(&_file).unwrap();
    }) as Box<dyn FnMut(_)>);

    fileinput
        .add_event_listener_with_callback("change", callback.as_ref().unchecked_ref())
        .unwrap();
    callback.forget();
}

index.html

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
  </head>
  <body>
    <noscript
      >This page contains webassembly and javascript content, please enable
      javascript in your browser.</noscript
    >
    <script src="./stack.js"></script>
    <script>
      wasm_bindgen("./stack_bg.wasm");
    </script>
  </body>
</html>

And the Cargo.toml:

[package]
name = "stack"
version = "0.1.0"
authors = [""]
edition = "2018"


[lib]
crate-type = ["cdylib", "rlib"]

[dependencies]
js-sys = "0.3.55"

wee_alloc = { version = "0.4.2", optional = true }


[dependencies.web-sys]
version = "0.3.4"
features = [
  'Document',
  'Window',
  'console',
  'Event',
  'FileReader',
  'File',
  'FileList',
  'HtmlInputElement']

[dev-dependencies]
wasm-bindgen-test = "0.2"

[dependencies.wasm-bindgen]
version = "0.2.70"

[profile.release]
# Tell `rustc` to optimize for small code size.
opt-level = "s"
debug = false


You can check the example working here: http://rustwasmfileinput.glitch.me/
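
The `file_string.iter().map(|x| x as u8)` step in lib.rs relies on read_as_binary_string producing a string in which every UTF-16 code unit holds exactly one byte (0 to 255). Below is a minimal, std-only sketch of that conversion with an explicit range check instead of a silent cast. The function name `binary_string_to_bytes` is hypothetical, and a plain iterator of `u16` values stands in for `JsString::iter()`.

```rust
/// Converts the UTF-16 code units of a "binary string" into raw bytes.
///
/// Returns `Err(unit)` with the first code unit above 0xFF, which would
/// mean the input was not a binary string after all.
/// (Hypothetical helper for illustration; not part of web-sys or js-sys.)
fn binary_string_to_bytes<I>(units: I) -> Result<Vec<u8>, u16>
where
    I: IntoIterator<Item = u16>,
{
    units
        .into_iter()
        .map(|unit| {
            if unit <= 0xFF {
                // Safe: the value fits into a single byte.
                Ok(unit as u8)
            } else {
                Err(unit)
            }
        })
        // Collecting into `Result` stops at the first out-of-range unit.
        .collect()
}
```

Inside the onload closure this would be called as `binary_string_to_bytes(file_string.iter())`. Note that this approach still keeps both the JS string and the resulting `Vec<u8>` in memory, so it does not avoid copying the file.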

Armand

Your best bet would be to use the wasm_streams crate, which bridges Web stream APIs such as the ReadableStream you get from the .stream() method to Rust's async stream APIs.

The official example uses the Fetch API as a source, but this part is relevant for your File use case as well: https://github.com/MattiasBuelens/wasm-streams/blob/f6dacf58a8826dc67923ab4a3bae87635690ca64/examples/fetch_as_stream.rs#L25-L33

let body = ReadableStream::from_raw(raw_body.dyn_into().unwrap_throw());

// Convert the JS ReadableStream to a Rust stream
let mut stream = body.into_stream();

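// Consume the stream, logging each individual chunk.
// `next()` is provided by the `StreamExt` trait (for example from the `futures` crate);
// note that this loop also stops silently at the first `Err` chunk.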
while let Some(Ok(chunk)) = stream.next().await {
    console::log_1(&chunk);
}
RReverser

I think you can do something similar using FileReader.

Here is an example, where I log the text content of a file:

use wasm_bindgen::prelude::*;
use web_sys::{Event, FileReader, HtmlInputElement};

use wasm_bindgen::JsCast;

#[wasm_bindgen]
extern "C" {
    #[wasm_bindgen(js_namespace = console)]
    fn log(s: &str);
}

#[wasm_bindgen(start)]
pub fn main() -> Result<(), JsValue> {
    let window = web_sys::window().expect("no global `window` exists");
    let document = window.document().expect("should have a document on window");
    let body = document.body().expect("document should have a body");

    // `FileReader::new()` already returns a `FileReader`, so no cast is needed.
    let filereader = FileReader::new()?;

    let closure = Closure::wrap(Box::new(move |event: Event| {
        let element = event.target().unwrap().dyn_into::<FileReader>().unwrap();
        let data = element.result().unwrap();
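        // `read_as_text` yields a JS string, so convert it directly to a Rust `String`
        // (an empty string is logged if the read failed and the result is null).
        let rust_str: String = data.as_string().unwrap_or_default();
        log(rust_str.as_str());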
    }) as Box<dyn FnMut(_)>);
 
    filereader.set_onloadend(Some(closure.as_ref().unchecked_ref()));
    closure.forget();

    let fileinput: HtmlInputElement = document.create_element("input").unwrap().dyn_into::<HtmlInputElement>()?;
    fileinput.set_type("file");

    let closure = Closure::wrap(Box::new(move |event: Event| {
        let element = event.target().unwrap().dyn_into::<HtmlInputElement>().unwrap();
        let filelist = element.files().unwrap();

        let file = filelist.get(0).unwrap();

        filereader.read_as_text(&file).unwrap();
        //log(filelist.length().to_string().as_str());
    }) as Box<dyn FnMut(_)>);
    fileinput.add_event_listener_with_callback("change", closure.as_ref().unchecked_ref())?;
    closure.forget();

    body.append_child(&fileinput)?;

    Ok(())
}

And the HTML:

<html>
  <head>
    <meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
  </head>
  <body>
    <script type="module">
      import init from './pkg/without_a_bundler.js';

      async function run() {
        await init();
      }

      run();
    </script>
  </body>
</html>

And the Cargo.toml:

[package]
name = "without-a-bundler"
version = "0.1.0"
authors = [""]
edition = "2018"

[lib]
crate-type = ["cdylib"]

[dependencies]
js-sys = "0.3.51"
wasm-bindgen = "0.2.74"

[dependencies.web-sys]
version = "0.3.4"
features = [
  'Blob',
  'BlobEvent',
  'Document',
  'Element',
  'Event',
  'File',
  'FileList',
  'FileReader',
  'HtmlElement',
  'HtmlInputElement',
  'Node',
  'ReadableStream',
  'Window',
]

However, I have no idea how to use get_reader() of ReadableStream. According to the linked documentation, it should return either a ReadableStreamDefaultReader or a ReadableStreamBYOBReader. The latter is experimental, so it is understandable that it is not present in web-sys, but I do not know why ReadableStreamDefaultReader is missing as well.

frankenapps

You should use ReadableStreamDefaultReader::new().

let stream: ReadableStream = response.body().unwrap();
let reader = ReadableStreamDefaultReader::new(&stream)?;

Then you can use ReadableStreamDefaultReader.read() the same way as in JS.

You will also need a struct for deserialization:

#[derive(serde::Serialize, serde::Deserialize)]
struct ReadableStreamDefaultReadResult<T> {
    pub value: T,
    pub done: bool,
}

Here is an example of usage:

loop {
    let reader_promise = JsFuture::from(reader.read());
    let result = reader_promise.await?;

    let result: ReadableStreamDefaultReadResult<Option<Vec<u8>>> =
        serde_wasm_bindgen::from_value(result).unwrap();

    if result.done {
        break;
    }

    // here you can read chunk of bytes from `result.value`
}
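
Chunks returned by read() end at arbitrary byte offsets, so a record such as a text line can straddle two chunks. Below is a minimal, std-only sketch of carrying the unfinished remainder over from one chunk to the next. The names `LineSplitter`, `push_chunk` and `finish` are hypothetical, newline-delimited records are an assumption about the file format, and plain byte slices stand in for `result.value`.

```rust
/// Accumulates byte chunks and hands out complete, newline-terminated lines.
/// (Hypothetical helper for illustration; not part of web-sys.)
#[derive(Default)]
struct LineSplitter {
    /// Bytes received after the last newline seen so far.
    pending: Vec<u8>,
}

impl LineSplitter {
    /// Feeds one chunk and returns every line it completes (without the `\n`).
    fn push_chunk(&mut self, chunk: &[u8]) -> Vec<Vec<u8>> {
        let mut lines = Vec::new();
        for &byte in chunk {
            if byte == b'\n' {
                // Hand out the finished line and start a fresh buffer.
                lines.push(std::mem::take(&mut self.pending));
            } else {
                self.pending.push(byte);
            }
        }
        lines
    }

    /// Call once the stream reports `done`; returns the trailing
    /// partial line, if there is one.
    fn finish(self) -> Option<Vec<u8>> {
        if self.pending.is_empty() {
            None
        } else {
            Some(self.pending)
        }
    }
}
```

In the loop above, `push_chunk` would be called with each chunk taken from `result.value`, and `finish` after the `break`. Only the current chunk and one partial line are held at a time, so memory use stays bounded even for a large file.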