Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.

For installation details, see the Apache Arrow installation documentation

595 questions
3
votes
1 answer

How can I parse timestamp with time zone?

What I am trying to do: I am using PyArrow to parse data from a CSV (originally from a Postgres database). I am having issues parsing a timestamp (with a time zone) that looks like 2017-08-19 14:22:11.802755+00. I am then receiving an error that…
alt-f4
  • 2,112
  • 17
  • 49
3
votes
0 answers

Access individual elements of ChunkedArray by its index within column

What is the best method to randomly access individual elements ("Scalars") of arrow::ChunkedArray e.g. for testing and display purposes? Is there some equivalent method to Array::GetScalar which takes into account that the ChunkedArray consists of…
VolkerM
  • 316
  • 2
  • 5
3
votes
0 answers

R and RStudio crashes running read_parquet() on Mac M1

As the title states, both R and RStudio crash with a 'fatal error' when I try to run read_parquet('abc.parquet'). For reference, read_parquet() is a function from the arrow library. Using: MacBook Pro M1 2020, R version 4.1.0 (I…
gmarais
  • 1,801
  • 4
  • 16
  • 32
3
votes
1 answer

What is a common use case for Apache arrow in a data pipeline built in Spark

What is the purpose of Apache Arrow? It converts from one binary format to another, but why do I need that? If I have a Spark program, then Spark can read Parquet, so why do I need to convert it into another format midway through my processing? Is it…
Victor
  • 16,609
  • 71
  • 229
  • 409
3
votes
0 answers

Apache Arrow Flight: Multiple calls to FlightServer

I've been following this tutorial on how to set up and use Apache Arrow Flight. From the example, server.py: import pyarrow as pa import pyarrow.flight as fl def create_table_int(): data = [ pa.array([1, 2, 3]), pa.array([4, 5,…
ajp619
  • 670
  • 7
  • 11
3
votes
1 answer

call StructArray.from_arrays specifying a missing value mask

I'm trying to create a pyarrow.StructArray with missing values. It works fine when I use pyarrow.array passing tuples representing my records: >>> pyarrow.array( [ None, (1, "foo"), ], type=pyarrow.struct( …
0x26res
  • 11,925
  • 11
  • 54
  • 108
3
votes
1 answer

How do you tell the Apache Arrow Format Version for a given Library Version?

Apache Arrow in their documentation list that each release has two versions, a Library Version and a Format Version: https://arrow.apache.org/docs/format/Versioning.html It appears that over the last year there have been 4 Library Versions, but it's…
3
votes
1 answer

Error occurs when debugging rust program with vscode (windows only)

I am trying to debug the code below with vscode, but an error occurs. Development environment Microsoft Windows 10 Home 10.0.19042 Build 19042 rustc 1.49.0 (e1884a8e3 2020-12-29) Vscode 1.54.3 CodeLLDB v1.6.1 //…
3
votes
1 answer

How to correctly read an Apache Arrow Feather file produced by pyarrow?

I have been unsuccessful in reading, with the JavaScript library of Arrow, an Apache Arrow Feather file produced by a Python script. I am using pyarrow and arrow/js from the Apache Arrow project. I created a simple Python script to create the Feather…
ToniR
  • 33
  • 7
3
votes
1 answer

How to read column names and metadata from feather files in R arrow?

The (now-superseded) stand-alone feather library for R had a function called feather_metadata() that allowed reading column names and types from feather files on disk, without opening them. This was useful to select only specific columns when…
MatteoS
  • 745
  • 2
  • 6
  • 17
3
votes
1 answer

Comparison of protobuf and arrow

Both are language-neutral and platform-neutral data exchange libraries. I wonder what the differences between them are, and which library is suited to which situations.
Benjamin Du
  • 1,391
  • 1
  • 17
  • 25
3
votes
0 answers

Issue with writing Parquet Files via Arrow Package in R

Just wondering if there's a difference in the read/write parquet functions from the arrow package in R when running on Windows vs a Linux OS? Example code (insert anything in the dataframe): mydata = data.frame(...) write_parquet(mydata,…
3
votes
0 answers

Efficient way to calculate area of a 2D polygon in Pyspark for N rows in a group-by

I have a dataframe in pyspark (I get it from reading in a partition with around 1.6 million rows, but often I read in multiple partitions). For each week of data, there are ~200,000 different timestamps and for each timestamp there are up to 8…
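The per-group computation itself is just the shoelace formula; a minimal NumPy sketch of it that could be wrapped in a grouped pandas UDF (the Spark wiring and column names are not shown and would be assumptions):

```python
import numpy as np

def shoelace_area(xs: np.ndarray, ys: np.ndarray) -> float:
    """Area of a simple 2D polygon from ordered vertex coordinates
    (shoelace formula); works for an open or closed ring."""
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))

# Unit square:
area = shoelace_area(np.array([0.0, 1.0, 1.0, 0.0]),
                     np.array([0.0, 0.0, 1.0, 1.0]))
print(area)  # 1.0
```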
3
votes
0 answers

pyarrow convert string to dict array in table without going to pandas

I have a daily process where I read in a historical parquet dataset and then concatenate it with a new file each day. I'm trying to optimize memory by making better use of Arrow's dictionary arrays. I want to avoid doing a round trip to pandas…
matthewmturner
  • 566
  • 7
  • 21
3
votes
1 answer

Building Apache Arrow inside existing C++ Executable project CMAKE

I'm working on a C++ CMake project that uses Apache Arrow as a dependency. My goal is to be able to include and use arrow/api.h. However, I couldn't find any documentation or tutorial that explains how to achieve that, so my first thought…
eyadMhanna
  • 2,412
  • 3
  • 31
  • 49
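A minimal CMake sketch, assuming Arrow is already installed somewhere `find_package` can locate its CMake config (e.g. via a system package, conda, or vcpkg); the project and file names are placeholders:

```cmake
cmake_minimum_required(VERSION 3.16)
project(arrow_demo CXX)

# Assumes an existing Arrow installation whose CMake config files
# are discoverable (adjust CMAKE_PREFIX_PATH if not).
find_package(Arrow REQUIRED)

add_executable(arrow_demo main.cpp)
# Recent Arrow releases export namespaced imported targets;
# use Arrow::arrow_static for a static link instead.
target_link_libraries(arrow_demo PRIVATE Arrow::arrow_shared)
```

An alternative, if building Arrow from source inside the project is truly required, is `FetchContent` or `ExternalProject_Add`, but linking against an installed Arrow as above is the simpler path.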