Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.


595 questions
0
votes
1 answer

Join operation on attributes from arrow structs

Assume I have a JSON file named 'my_data.json' as below: {"a": [1, 2], "b": {"c": true, "d": "1991-02-03"}} {"a": [3, 4, 5], "b": {"c": false, "d": "2019-04-01"}} If I need to do a join operation based on attribute d, can I do it directly from…
0
votes
1 answer

Return a pandas Series from a pandas_udf in Spark

On Apache Spark I have a pandas_udf function that should return a pd.Series. How can this be achieved? I tried: @pandas_udf(ArrayType(LongType()), PandasUDFType.SCALAR_ITER) # Only works with spark 3.0 def udf(iterator): ... return…
Jorge Machado
0
votes
1 answer

How to install libparquet-dev on Docker so I can use R's {arrow}?

I am basing my Docker image on https://hub.docker.com/r/rocker/tidyverse/dockerfile, so I tried to add the following line to the Dockerfile to install libparquet-dev, which is required to use Arrow from R. RUN apt-get update -qq && apt-get…
xiaodai
0
votes
0 answers

Core dump in the official example of reading CSV with Apache Arrow?

I am trying to write an example for reading CSV with apache-arrow in C++ according to the official one, https://arrow.apache.org/docs/cpp/csv.html#, but it hits a segmentation fault at status = reader->Read(&table); Can anyone help? Thank you. environment…
0
votes
1 answer

Reading Arrow Feather files in C++

I've scoured the Arrow docs, but haven't found much clarity on how to read Feather files generated via pyarrow back into C++. import pyarrow.feather as feather feather.write_feather(df, 'test_file.feather') Is this not a recommended flow? It looks…
winter
0
votes
1 answer

R: error installing arrow on Ubuntu 18.04

I tried to install {arrow} using install.packages("arrow"), but I am getting the following error: In file included from array.cpp:18:0: ./arrow_types.h:198:10: fatal error: parquet/arrow/reader.h: No such file or directory. I am using R 3.6.1.
xiaodai
0
votes
1 answer

Apache Arrow table from iostream or memory buffer

I have some code that retrieves a parquet file from AWS S3 using the AWS API. The result is a std iostream: std::basic_iostream<char, std::char_traits<char>>. From this I want to create an Apache Arrow Table without saving the iostream to a…
user1978816
0
votes
1 answer

Spark Arrow, toPandas() and wide transformation

What does toPandas() actually do when using Arrow optimization? Is the resulting pandas dataframe safe for wide transformations (those that require data shuffling) on the pandas dataframe, e.g. merge operations? What about group and aggregate? What kind…
Alwyn
0
votes
1 answer

Row based access using ParquetSharp library in C# which is based on apache-parquet-cpp (Arrow)

Does anyone know how row-based read access to a parquet file is performed using ParquetSharp? This is where I have got to, but the inputStream throws a "cannot convert to string" error. using (var buffer = new ResizableBuffer()) { using (var…
azuric
0
votes
1 answer

Arrow build fails on Windows

I am trying to build Apache Arrow on Windows offline. As per the instructions on the website, I have downloaded all the dependencies and set the environment variables: SET ARROW_BOOST_URL=%ARROW_DEPENDENCY_ROOT%boost-1.67.0.tar.gz SET…
0
votes
1 answer

Converting Python sequence to Arrow Array via C++ API

I'm attempting to investigate how Arrow converts a Python list into an equivalent arrow::Array using the C++ API below. #include #include #include #include #include #include…
clery00
0
votes
0 answers

Configuring incomplete, errors occurred

I get "Configuring incomplete, errors occurred!" while running cmake to build Apache Arrow (running on Ubuntu 16.04.6 LTS). I am using cmake version 3.5.2 with the following flags: cmake ../arrow/cpp/ -DARROW_PARQUET=ON…
newme
0
votes
1 answer

Can Apache Arrow support infinitely nested structs?

According to this Apache Arrow documentation page, https://arrow.apache.org/docs/format/Metadata.html, it seems to support it. Would someone please post some code showing an infinitely nested struct? Thanks.
0
votes
1 answer

Performing transformations on Arrow table

What kind of transformations can you apply to an Arrow table? Is its main use (for now) an interchange format for languages?
marz
0
votes
1 answer

Access Gandiva filter result by index in Apache Arrow

Maybe I'm missing something obvious, but for the life of me, I can't figure out how I can access the elements of an array after a Gandiva filter operation. I have linked a minimal example which I compile like this: $ /usr/lib64/ccache/g++ -g -Wall…
suvayu