
What is the best way to pass data in the Apache Arrow format from Node.js to Rust? Storing the data in each language is easy enough, but it's the sharing of memory that is giving me trouble.

I'm using NAPI-RS to generate the Node.js API bindings.

I'm getting a "Failed to create reference from Buffer" error for the JavaScript code below. When I try to pass arrowVector.data[0].buffers to the Rust function, I get "../src/node_buffer.cc:245:char *node::Buffer::Data(Local<v8::Value>): Assertion `val->IsArrayBufferView()' failed."

I think I'm missing something core here.

Here is my sample Node test code:

import { makeVector } from 'apache-arrow';
import {testFn} from './index.js';

// Create arrow Vec
const LENGTH = 2000;
const rainAmounts = Float32Array.from(
    { length: LENGTH },
    () => Number((Math.random() * 20).toFixed(1)));

const arrowVector = makeVector(rainAmounts);

// how to get buffers from vec? and send to rust function

testFn(arrowVector);

Here is my sample Rust code:

use napi::bindgen_prelude::Buffer;

#[napi]
pub fn test_fn(buffers: Buffer) {
    println!("test_fn called");
}
cafce25
lostAstronaut

1 Answer


NAPI-RS does mention:

Zero copy data interactive between Rust & Node.js via Buffer and TypedArray

But... what you are passing directly to the Rust function is not a buffer.
It is an Arrow vector.

testFn(arrowVector);

That is not compatible with the Rust side:

#[napi]
pub fn test_fn(buffers: Buffer) {
    println!("test_fn called");
}

Hence the "Failed to create reference from Buffer" error message.

In your Node.js code, get the buffers for the Arrow vector and pass them directly to Rust.

import { makeVector } from 'apache-arrow';
import {testFn} from './index.js';

// Create arrow Vec
const LENGTH = 2000;
const rainAmounts = Float32Array.from(
    { length: LENGTH },
    () => Number((Math.random() * 20).toFixed(1))
);

const arrowVector = makeVector(rainAmounts);

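// Get the values buffer from the Arrow vector and send it to Rust.
// arrowVector.data is an array of Data chunks; .values is the Float32Array of this chunk.
// Buffer.from(arrayBuffer, byteOffset, length) creates a view over the same memory (no copy).
const values = arrowVector.data[0].values;
testFn(Buffer.from(values.buffer, values.byteOffset, values.byteLength));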

On the Rust side, the NAPI-RS Buffer type should be able to take a Node.js Buffer directly.

use napi::bindgen_prelude::Buffer;
use napi_derive::napi;

#[napi]
pub fn test_fn(buffer: Buffer) {
    // Buffer exposes the underlying bytes as a slice (&[u8]) over memory owned by Node.js
    let bytes: &[u8] = buffer.as_ref();

    println!("test_fn called with buffer of length {}", bytes.len());
}
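
Once the bytes arrive in Rust, they still have to be interpreted as Float32 values. Here is a minimal sketch using only the standard library, assuming the bytes are little-endian (Arrow's default, and what Node.js typed arrays produce on common platforms). The helper name `decode_f32_le` is hypothetical, and this version copies the values into a `Vec<f32>` rather than reinterpreting the memory in place:

```rust
/// Decodes little-endian f32 values from a raw byte slice.
/// Returns None if the length is not a multiple of 4 bytes.
/// (Hypothetical helper; assumes little-endian input.)
fn decode_f32_le(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            // Each chunk is exactly 4 bytes, so indexing 0..=3 is safe.
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

Inside `test_fn`, this could be called as `decode_f32_le(buffer.as_ref())`; for the 2000-element vector above, the buffer would hold 8000 bytes.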

However, keep in mind that Arrow vectors often contain multiple buffers. Depending on the data types involved, you might need to pass multiple buffers from Node.js to Rust and interpret them correctly in Rust.
See apache/arrow/js/src/data.ts#buffers()

public get buffers() {
    return [this.valueOffsets, this.values, this.nullBitmap, this.typeIds] as Buffers<T>;
}
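
If you also pass the nullBitmap buffer, the Rust side has to read it according to the Arrow layout: one bit per value, least-significant bit first, where a set bit means the value is valid. Here is a rough sketch under that assumption. The function names are hypothetical, and `None` stands for a vector whose bitmap was not passed because it has no nulls:

```rust
/// Returns true if the value at `index` is non-null.
/// `null_bitmap` is the Arrow validity bitmap (LSB-first, 1 = valid);
/// None is treated as "no nulls". Panics if `index` is past the bitmap.
fn is_valid(null_bitmap: Option<&[u8]>, index: usize) -> bool {
    match null_bitmap {
        None => true,
        Some(bits) => bits[index / 8] & (1u8 << (index % 8)) != 0,
    }
}

/// Counts the non-null values among the first `len` entries.
fn count_valid(null_bitmap: Option<&[u8]>, len: usize) -> usize {
    (0..len).filter(|&i| is_valid(null_bitmap, i)).count()
}
```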

This approach allows you to avoid copying the data, but the memory is still managed by Node.js, so you need to be careful about lifetimes. If Node.js garbage collects the original Arrow vector, the buffer's data you passed to Rust might be deallocated. To avoid this, ensure that the Arrow vector in Node.js remains in scope and alive as long as the Rust code might access its data.
See "how long does variables stay in memory in Node.js" for more.
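
If the Rust side needs the data beyond the duration of the call (for example on a background thread), one conservative option is to give up zero-copy for that path and copy the bytes into memory that Rust owns. Here is a minimal sketch using only the standard library. The function name is hypothetical, and the byte sum is only a stand-in for real work:

```rust
use std::thread;

/// Copies the borrowed bytes so the worker thread owns its data and
/// no longer depends on the lifetime of the JavaScript-side buffer.
fn sum_bytes_in_background(borrowed: &[u8]) -> thread::JoinHandle<u64> {
    let owned: Vec<u8> = borrowed.to_vec(); // explicit copy
    thread::spawn(move || owned.iter().map(|&b| u64::from(b)).sum())
}
```

The copy costs one allocation per call, so it is best reserved for data that really must outlive the original buffer.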

VonC