Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.

For installation details see this

595 questions
0
votes
1 answer

Struct of Arrays in flatbuffer?

Let's say I have the following flatbuffer IDL file: table Monster { mana:short = 150; inventory:[ubyte]; // Vector of scalars. } And that I want to serialize an array of 2 Monster objects in a buffer. Apparently it is possible to create the…
lezebulon
  • 7,607
  • 11
  • 42
  • 73
0
votes
1 answer

Construct an Arrow DoubleArray from double pointer

I have a two-dimensional double array pointer. I can cast it to u_int8_t, copy it into the mutable_data() of an Arrow pool buffer, and construct an Arrow DoubleArray. However, when I read values via Value() or raw_values() on the array, I cannot get correct…
mrz1603
  • 3
  • 2
-1
votes
1 answer

How to convert a buffer of data into an arrow::Table without intermediate file creation in C++?

I have a pipeline of processes that do different stuff. One of the pipes reads a file and de-compresses it into a buffer. The buffer in question contains an arrow Table. There is a component that takes this buffer and returns a table with the…
mohabouje
  • 3,867
  • 2
  • 14
  • 28
-1
votes
1 answer

Create DataFrame from Object HuggingFace

I recently downloaded a dataset from HuggingFace. I've used datasets.Dataset.load_dataset() and it gives me a Dataset backed by an Apache Arrow table. So I have problems exporting the data into a DataFrame to work with pandas. The…
-1
votes
2 answers

How to convert a list into a schema in R?

The code is as below: schema = schema(`Key`=int64(), Sex = string(), `Age` = int64(), `Date of Birth` = date32(), `Institution` = string(), `Admission Date` =…
doraemon
  • 439
  • 3
  • 10
-1
votes
1 answer

Spark dataframe creation through already distributed in-memory data sets

I am new to the Spark community. Please ignore this question if it doesn't make sense. My PySpark DataFrame takes just a fraction of the time (in ms) in 'Sorting', but moving the data is much more expensive (> 14 sec). Explanation: I have a huge Arrow…
-2
votes
1 answer

What's the best practice for swapping Apache Arrow data between different processes?

I have a data API, written in Rust and running as an independent service process, that fetches streaming data, and I plan to write several client processes to read the data, because the client processes have some functions based on Apache Arrow data types. I think this might be…
Hakase
  • 211
  • 1
  • 12
-2
votes
1 answer

Difference Between apache-arrow-flight and apache-kafka (accessing large datasets over a network)

As far as I know, both platforms support big data ingestion (streaming). What are the advantages and disadvantages of each platform?
sailfish009
  • 2,561
  • 1
  • 24
  • 31
-2
votes
3 answers

apache arrow - reading csv file

Hi all, I'm working with Apache Arrow now. When reading a CSV file with the arrow::csv::TableReader::Read function, I want to read the file as one with no header. But it reads the CSV file and treats the first row as the CSV header (data field). Are there any options…
-4
votes
1 answer

How to create arrow array of dates using ArrayFromJSON

Basically, I want to create an array of date32 type using the nice ArrayFromJSON function, which is super handy for writing unit tests. I've tried: auto dateArray = arrow::ArrayFromJSON(arrow::date32(), R"(["2017-11-01"])"); But this doesn't work at least…
Kirill Lykov
  • 1,293
  • 2
  • 22
  • 39