Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.

Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.
The columnar layout of data also allows for better use of CPU caches by placing all data relevant to a column operation in as compact a format as possible.
Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Implementations in Java, C, C++, and Python are underway, and more languages are expected soon.
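
The two ideas in the description above, columnar layout and zero-copy reads, can be sketched in plain Python. This is only an illustration of the concepts using the standard library's `array` and `memoryview`; it does not use the Arrow API, and the column names `ids` and `prices` are made up for the example.

```python
from array import array

# Row-oriented layout: every record is a separate Python tuple.
rows = [(1, 10.5), (2, 20.0), (3, 30.25), (4, 40.0)]

# Column-oriented layout: one contiguous, typed buffer per column.
ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
prices = array("d", (r[1] for r in rows))  # 64-bit floats

# A column operation only touches the buffer it needs.
total = sum(prices)

# A memoryview slice shares the underlying buffer: no bytes are copied.
view = memoryview(prices)
tail = view[2:]      # window over the last two values
prices[3] = 44.0     # mutate the original buffer...
tail_last = tail[1]  # ...and the view sees the new value
```

Because `tail` is a window onto the same memory as `prices`, reading through it involves no serialization or copying, which is the property the Arrow memory format is described as providing across systems.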

595 questions

2 answers

How can I get the row view of data read from parquet file?

Example: Let's say a table named user has id, name, email, phone, and is_active as attributes, and there are thousands of users in this table. I would like to read the details per user. void ParquetReaderPlus::read_next_row(long row_group_index,…

asked Sep 24 '21 at 05:08

Shravan40

1 answer

How to cast pyarrow timestamp dtype to time64 type?

I'm trying to cast a pyarrow timestamp type to a time64 type, but it raises a cast error. import pyarrow as pa from datetime import datetime dt = datetime.now() table = pa.Table.from_pydict({'ts': pa.array([dt, dt])}) new_schema = table.schema.set(0,…

python pyarrow apache-arrow

asked Aug 13 '21 at 04:38

Avinash Raj

1 answer

Data compression using Arrow.jl in Julia

I tried to compress data using Arrow.jl. However, a test run using the code below didn't show any size reduction (or compression). May I ask for advice on my implementation? Is there something I am doing wrong? Code: using CSV, DataFrames,…

julia compression apache-arrow lz4

asked Jul 29 '21 at 15:40

Mohammad Saad

0 answers

How to build an Apache Arrow message containing a list of structs with arrow-rs?

I'm using the arrow-rs crate (version 4.4) to declare the following schema: Schema::new(vec![ Field::new("name", DataType::Utf8, false), Field::new("attributes", DataType::List( Box::new(Field::new( …

rust apache-arrow

asked Jul 16 '21 at 21:55

lquerel

1 answer

How to initialise a fixed-size ListArray in pyarrow from a numpy array efficiently?

How would I efficiently initialise a fixed-size pyarrow.ListArray from a suitably prepared numpy array? The documentation of pyarrow.array indicates that a nested iterable input structure works, but in practice that does not work if the outer…

python numpy pyarrow apache-arrow

asked Jun 16 '21 at 21:02

burnpanck

1 answer

Writing a Vec of Rows to a Parquet file

I know how to read a Parquet file into a Vec. extern crate parquet; use parquet::file::reader::{FileReader, SerializedFileReader}; use std::{fs, sync::Arc}; use parquet::column::writer::ColumnWriter; use parquet::{ file::{ …

rust parquet apache-arrow

asked Jun 09 '21 at 09:04

tsorn

3 answers

What is the best way of using arrow parquet in more modern cmake?

Below is the solution that worked for me, but I'm not sure if it is the best way to do this. I used brew to install it. vcpkg does not work at the moment, unfortunately. What I don't like about this solution is that I need to set Parquet_DIR and…

c++ cmake apache-arrow

asked Jun 07 '21 at 18:09

Amir

1 answer

Error: Invalid: Unrecognized filesystem type in URI when loading parquet file from url using arrow package

I'm pretty new to the Parquet file format and I'm using read_parquet() (in the arrow package) to load a Parquet file (stored in my Dropbox shared folder) into R. However, I received the following error message: library(arrow) df <-…

r parquet apache-arrow

asked Apr 23 '21 at 19:16

Chris T.

0 answers

Write C++ data to Apache Parquet: ParquetFileWriter or Write Arrow Table?

I'm looking for the proper way to write data to a Parquet file in C++. It seems like there are two choices: either writing directly to Parquet or writing to Arrow and then to Parquet. Is writing to Arrow then converting to Parquet with WriteTable…

c++ parquet apache-arrow

asked Apr 12 '21 at 22:01

user2183336

1 answer

apache-arrow does not compile with typescript

I posted this question for the @apache-arrow/ts library as well. I've been able to get this to bundle with webpack, but I've been considering using rollup instead for other issues I'm having with my library. However, that requires me to do a tsc…

typescript apache-arrow

asked Apr 05 '21 at 16:11

westandy

2 answers

MethodError when trying to get a row from an Arrow Dataframe in Julia

I have a dataset that looks like this: I am taking a CSV file, converting it to Parquet and then sending it to Arrow. There is a reason why I am doing it like this. My goal is to get access to the information in row "Algeria". This is my code: df =…

julia parquet apache-arrow

asked Mar 17 '21 at 21:26

Onur-Andros Ozbek

0 answers

How can I get "year" "month" "date" from timestamp in pyarrow?

I am trying to extract the "year", "month", and "date" from Arrow's timestamp[s] type. I know how to do it in pandas, as follows: import pyarrow.dataset as ds dataset = ds.dataset(path, format="csv") table = dataset.to_table() ## following codes wont…

csv parquet pyarrow apache-arrow

asked Feb 24 '21 at 09:30

Xion

1 answer

How to change column datatype with pyarrow

I am reading a set of Arrow files and writing them to a Parquet file: import pathlib from pyarrow import parquet as pq from pyarrow import feather import pyarrow as pa base_path = pathlib.Path('../mydata') fields = [ pa.field('value',…

parquet pyarrow apache-arrow

asked Feb 16 '21 at 11:43

ARF

1 answer

Can I add a new column without rewriting an entire file?

I've been experimenting with Apache Arrow. I have used column-oriented memory-mapped files for many years. In the past, I've used a separate file for each column. Arrow seems to like to store everything in one file. Is there a way to add a…

apache-arrow

asked Feb 11 '21 at 21:55

Kevin Atteson

1 answer

Apache Arrow Bus Error/Seg Fault when using Python bindings

I am writing data to parquet files. Apache Arrow provides a straightforward example for doing this: parquet-arrow, in which the data flow is essentially: data => arrow::ArrayBuilder => arrow::Array => arrow::Table => parquet file. This works fine as…

python c++ boost-python pybind11 apache-arrow

asked Jan 09 '21 at 07:55

AJ Donich
