Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.

Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.
The columnar layout of data also allows better use of CPU caches by placing all data relevant to a column operation in as compact a format as possible.
Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Implementations for Java, C, C++, and Python are underway, and more languages are expected soon.

For installation details see this

595 questions

0 answers

unable to cast refs returned from `ChunkedArray::chunks` to concrete arrow type

I'm trying to extract the raw buffers from ChunkedArray. The arrow2 documentation suggests doing this by casting a &dyn arrow2::array::Array as its concrete type, see here. This seems to work fine when I create an arrow buffer directly, however,…

apache-arrow rust-polars

asked Oct 04 '22 at 01:17

ExpandingMan

2 answers

Trying to use arrow-dataset java library but got missing arrow_dataset_jni.dll error

I followed the Maven instructions to include arrow-dataset in pom.xml. However, when running the code, it complained that arrow-dataset-jni.dll was not found. How do I create or install the DLL? Thank you, J

apache-arrow

asked Oct 03 '22 at 15:44

Jac

2 answers

PyArrow issue with timestamp data

I am trying to load data from a csv into a parquet file using pyarrow. I am using the convert options to set the data types to their proper type and then using the timestamp_parsers option to dictate how the timestamp data should be interpreted:…

pyarrow strptime apache-arrow

asked Sep 20 '22 at 00:17

not_a_comp_scientist

2 answers

Selecting deep columns in pyarrow.dataset parquet

Let's say I have a deeply nested arrow table like: pyarrow.Table arr: struct not null, b: list not null> not null> child 0, arr: struct not null, b:…

python parquet pyarrow apache-arrow

asked Sep 09 '22 at 17:31

mdurant

2 answers

Java Apache Arrow Copy data from one VectorSchemaRoot to another

If I have a VectorSchemaRoot that already contains data using the Java Apache Arrow library, how would I go about copying that data to another VectorSchemaRoot?

java apache-arrow

asked Sep 07 '22 at 19:10

jjbskir

0 answers

can't import pyarrow on macOS because symbol __Py_FatalErrorFunc is not found in lib.cpython-37m-darwin.so

I'm on macOS Monterey 12.5 (M1 chip), using Python 3.7.13 in a virtualenv created as follows: pyenv install 3.7.13; pyenv virtualenv 3.7.13 qtrainer; pyenv activate qtrainer. OpenSSL version is 1.1.1q. apache-arrow version is 9.0.0. My .zshrc file…

python pyarrow apache-arrow

asked Aug 22 '22 at 16:30

Borbag

0 answers

reading multiple parquet files in java takes unreasonable amount of memory

Reading 20 uncompressed parquet files with a total size of 3.2GB takes more than 12GB of RAM when reading them "concurrently". "Concurrently" means that I need to read the second file before closing the first file, not multithreading. The data is time…

java pyarrow apache-arrow

asked Aug 22 '22 at 16:04

driedplum

2 answers

How to avoid getting a memory leak while copying a VectorSchemaRoot

I need to copy all of the contents of a stream of VectorSchemaRoots into a single object: Stream data = fetchStream(); VectorSchemaRoot finalResult = VectorSchemaRoot.create(schema, allocator); VectorLoader = new…

java apache-arrow

asked Aug 22 '22 at 12:17

Pablo

0 answers

memory use for reading the same .csv file using baseR::read.csv(), readr::read_csv(), data.table::fread(), and arrow::read_csv_arrow() in R

I tried to read the same .csv file using different functions in R (base::read.csv(), readr::read_csv(), data.table::fread(), and arrow::read_csv_arrow()), but this same file leads to very different sizes in memory. See an example…

r data.table ram readr apache-arrow

asked Jul 31 '22 at 01:59

Miao Cai

1 answer

What is the difference between StringType and LargeStringType in Apache Arrow?

According to documentation: class arrow::StringType : public arrow::BinaryType #include Concrete type class for variable-size string data, utf8-encoded. class arrow::LargeStringType : public arrow::LargeBinaryType #include…

apache-arrow apache-arrow-cpp

asked Jul 22 '22 at 02:26

NekoApocalypse

1 answer

list not supported in join non-key field?

I am trying to join 2 Arrow tables where some columns are of list data type. Note that my join columns/keys are primitive data types and some of my non-join columns/keys are of list type. But PyArrow join() cannot join such a table, although…

pyarrow apache-arrow

asked Jul 21 '22 at 18:33

Jayjeet Chakraborty

0 answers

Authenticating R Arrow With Temporary AWS Credentials in a Profile?

I am trying to use the arrow R package to read a parquet file from S3. The documentation only describes how to specify AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY when authenticating for access to a private S3 bucket. However, I have to generate…

r amazon-web-services amazon-s3 apache-arrow

asked Jul 12 '22 at 20:13

Ramón J Romero y Vigil

1 answer

How to query pyarrow table struct field

I have a table, let's say 2 columns A (list), B (list) and 2 rows: A: ["X", "Y"], ["Y", "Z"] B: [1, 3], [5, 6] I'd like to achieve something like SELECT * FROM table WHERE A.Y = 5 and it'd return a single (second) row. How do I achieve this using…

pyarrow apache-arrow

asked Jul 06 '22 at 06:02

alippai

0 answers

how to vectorize arrow::compute::Take?

I have an array of large size input_array and an array of offsets take_array. I want to return the elements with those offsets very fast. Can I vectorize it for the arrow array? If so, how? arrow::compute::Take(input_array, take_array) Use Case: I…

llvm simd intel-mkl apache-arrow

asked Jun 23 '22 at 19:32

cpchung

1 answer

Is there an established means of using AzureStor and arrow together in R?

In the arrow R guide there's info about using S3 buckets but nothing about using Azure cloud storage. There's an unrelated package AzureStor which connects to Azure Storage but uses different syntax so they don't (seemingly) work together. Is there…

r azure-blob-storage apache-arrow

asked Jun 08 '22 at 00:05

Dean MacGregor
