Taking the example below from the npm documentation (https://www.npmjs.com/package/parquets), how would I write the resulting Parquet file directly to MinIO? I want to avoid writing the Parquet file to disk and then needing a second operation to move the file into MinIO.

The example below writes the file to disk as soon as close() is called.

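// advanced fruits table (the awaits below assume an async context)
const { ParquetSchema, ParquetWriter } = require('parquets');
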
let schema = new ParquetSchema({
  name: { type: 'UTF8' },
  colours: { type: 'UTF8', repeated: true },
  stock: {
    repeated: true,
    fields: {
      price: { type: 'DOUBLE' },
      quantity: { type: 'INT64' },
    }
  }
});

// the above schema allows us to store the following rows:
let writer = await ParquetWriter.openFile(schema, 'fruits.parquet');

await writer.appendRow({
  name: 'banana',
  colours: ['yellow'],
  stock: [
    { price: 2.45, quantity: 16 },
    { price: 2.60, quantity: 420 }
  ]
});

await writer.appendRow({
  name: 'apple',
  colours: ['red', 'green'],
  stock: [
    { price: 1.20, quantity: 42 },
    { price: 1.30, quantity: 230 }
  ]
});

await writer.close();
PrestonDocks
  • Are you trying to create the Parquet data in memory and then write it to S3? DuckDB has an option to write directly to S3-compatible object stores without creating files on disk: https://stackoverflow.com/a/74207838/6563567 . With the parquets library you can stream the file to memory, but writing it to MinIO from there might not be possible. – ns15 Oct 26 '22 at 12:40

1 Answer

Write your Parquet files to /tmp and then use MinIO's SDK to upload each file.

To make sure this scales properly, include a random string or UUID in the name of the file you store in /tmp, so that concurrent writes do not collide.
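The temp-file approach above can be sketched roughly as follows. This is only a minimal sketch using Node's standard library: `uniqueTmpPath` and `writeThenUpload` are hypothetical helper names (not part of parquets or the MinIO SDK), and the `produce` and `upload` callbacks stand in for the Parquet writing and MinIO upload steps.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: build a collision-resistant path under the OS temp dir.
function uniqueTmpPath(prefix = 'fruits', ext = '.parquet') {
  return path.join(os.tmpdir(), `${prefix}-${crypto.randomUUID()}${ext}`);
}

// Hypothetical helper: run `produce(tmpPath)` to write the file, then
// `upload(tmpPath)` to ship it, and always delete the temp file afterwards.
async function writeThenUpload(produce, upload) {
  const tmpPath = uniqueTmpPath();
  try {
    await produce(tmpPath);
    return await upload(tmpPath);
  } finally {
    // force: true means a missing file is not treated as an error
    await fs.promises.rm(tmpPath, { force: true });
  }
}
```

Here `produce` would presumably wrap the writer code from the question, passing `tmpPath` to `ParquetWriter.openFile(schema, tmpPath)`, and `upload` would call something like the MinIO JavaScript client's `fPutObject(bucketName, objectName, tmpPath)`, with the bucket and object names being placeholders of your choosing.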

Unfortunately, I couldn't find any stream option in the library, so unless someone knows how to get a Stream object out of it, you cannot do exactly what you are looking for.

  • Sorry it took so long to get back to you. I'm not sure whether the SDK has been updated yet, but without a streaming option the size of the files you can work with is limited. – PrestonDocks Nov 03 '22 at 07:54