Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.

Arrow memory format supports zero-copy reads for lightning-fast data access without serialization overhead.
The columnar layout of data also allows for better use of CPU caches by placing all data relevant to a column operation in as compact a format as possible.
Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Implementations in Java, C, C++, and Python are underway, and more languages are expected soon.
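
As a rough illustration of the columnar-layout and zero-copy ideas described above, the sketch below uses only Python's standard library (`array` and `memoryview`). It does not use the Arrow API, and the sample data and variable names are made up for the example.

```python
from array import array

# Row-oriented: the fields of each record are stored together.
rows = [(1, 30.5), (2, 41.0), (3, 28.25)]

# Column-oriented: each column lives in its own contiguous, typed buffer.
# This is only loosely analogous to an Arrow array (illustration, not Arrow).
ids = array("q", (r[0] for r in rows))     # 64-bit signed integers
values = array("d", (r[1] for r in rows))  # 64-bit floats

# A column operation touches only the one contiguous buffer it needs.
total = sum(values)

# A memoryview slice shares the underlying buffer instead of copying it,
# which is the basic idea behind zero-copy reads.
view = memoryview(values)
tail = view[1:]

# Mutating the original buffer is visible through the slice,
# showing that no copy was made.
values[1] = 40.0
shared = tail[0] == values[1]
```

Arrow itself adds much more on top of this idea (a language-independent format specification, validity bitmaps, nested types), so the snippet should be read as an analogy for the memory layout only.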


595 questions

votes

0 answers

The error of read more columns than the original data in r::arrow

The original dataset should only contain 28 columns but arrow returns 29 columns. The code is as below: schema1 = arrow::schema( Key =int64(), Sex = string(), Age = int64(), …

r tidyverse apache-arrow

asked Mar 29 '23 at 07:38

doraemon

votes

0 answers

Apache arrow C++ Parquet, how to read and decode min and max values statistics

I'm writing a program using the Apache Arrow C++ library to extract metadata from a parquet file, and I've been having a lot of trouble finding documentation and examples. After some trial and error I managed to do the job using this…

c++ parquet apache-arrow

asked Mar 07 '23 at 22:39

Alberto Pires

votes

1 answer

Identify partitioning variable in parquet file

Is there an easy way of identifying the variable that was used to partition a parquet dataset? As an example, below I create a toy parquet using the mtcars dataset. # Load library library(arrow) # Write data to parquet mtcars |>…

r parquet apache-arrow

asked Jan 10 '23 at 21:00

Dan


votes

1 answer

How to prevent arrow from pulling data into R when a binding is not found for a given function?

I wonder if there is a way to prevent arrow from pulling data into R by default when it cannot find a suitable binding. So that instead of getting the following warning message pulling data into R, arrow will throw an error instead. Is there an…

r apache-arrow

asked Nov 08 '22 at 10:45

andreranza

votes

2 answers

R + Arrow 10 : convert blank to numeric NA

Please have a look at the reprex at the end of the post. I need to read a column as a string, perform several manipulations and then convert it to a numerical column. The blanks ("") in the string column give me a headache because arrow does…

r na apache-arrow

asked Nov 07 '22 at 14:37

larry77


votes

0 answers

JS apache-arrow tableFromIPC's supported compression method/level?

I'm using file systems (local, Google Cloud Storage, and maybe S3) to exchange data between the web front end (JS) and back end (Python). After writing Arrow IPC data format to file systems using the Python back end like below: with…

pyarrow apache-arrow

asked Nov 04 '22 at 20:51

Yan Yang


votes

0 answers

R+Arrow 10.0: Bug when Using gsub?

A bit of a follow-up to the question "R+arrow: Error when using the dataset API". Please have a look at the reprex at the end of the post. Essentially, I work on a data file without loading it into memory and I want to replace "" with "0" in a string. In…

r linux apache-arrow

asked Oct 31 '22 at 13:31

larry77


votes

1 answer

R, How to refer to variable name as string, in function that uses arrow:open_dataset internally

Trying to create a function that will compute the average of some variable, whose name is provided in the function. For instance: mean_of_var <- function(var){ open_dataset('myfile') %>% summarise(meanB=mean(get(var) ,na.rm = T), …

r get apache-arrow open-dataset

asked Oct 25 '22 at 04:38

LucasMation


votes

1 answer

Can we read a parquet file and partition file in java arrow similar to pyarrow?

I have been trying to implement the pyarrow code below in Java but could not find anything. Can you please suggest whether it is even possible to implement the code below in Java Arrow, or whether there is an alternative library to achieve this? table1 =…

java apache-spark pyspark pyarrow apache-arrow

asked Oct 13 '22 at 16:55

ganga ramana

votes

0 answers

How to read csv with \" within quoted string with read_csv_arrow

I have a large csv file that I'd like to read with arrow::read_csv_arrow(). However, the file contains quoted strings. readr::read_delim() is able to read the file (given correct settings), while arrow::read_csv_arrow() is…

r readr apache-arrow

asked Oct 13 '22 at 14:15

Thomas K


votes

1 answer

How do I use generics in Apache Arrow?

Say I have a function called boop. It has different behaviour depending on the class of its argument, so I use generics, like so: library(dplyr) df <- data.frame(a = c("these", "are", "some", "strings"), b = 1:4) boop <-…

r dplyr apache-arrow

asked Oct 04 '22 at 16:12

Dan


votes

1 answer

Summarise before collecting in arrow using strings for column names

Say I want to summarise a column in an arrow table prior to collecting (because the actual dataset is larger than memory). I could do something like this: arrow_table(mtcars) %>% summarise(mean(mpg)) %>% collect() # A tibble: 1 × 1 # …

r dplyr apache-arrow

asked Sep 29 '22 at 14:26

Dan


votes

0 answers

How can I use R Arrow and AWS S3 in a shiny app deployed on EC2 with shinyproxy

I have been testing out the apache-arrow R package to fetch data from S3 (parquet files) for some shiny apps and have had some success. However, while everything works as expected during local development, after deploying to shinyproxy on an EC2…

r amazon-web-services amazon-s3 apache-arrow shinyproxy

asked Sep 08 '22 at 20:11

Devin

votes

0 answers

Set as `NA` when Arrow's schema cannot parse values of a CSV in R

I am trying to read a csv (~ 18,000,000 rows, ~ 1000 columns) into arrow (in R) with open_dataset pre-specifying a schema. There are some instances in which the csv was generated incorrectly and some values don't match the intended schema (say some…

r csv parsing apache-arrow

asked Aug 31 '22 at 22:48

Rodrigo Zepeda


votes

2 answers

How to get columns data from golang apache-arrow？

I am using apache-arrow/go to read parquet data. I can parse the data into a table using apache-arrow. reader, err := ipc.NewReader(buf, ipc.WithAllocator(alloc)) if err != nil { log.Println(err.Error()) return nil } …

go apache-arrow

asked Aug 25 '22 at 10:39

Pccc
