Questions tagged [apache-arrow]

Apache Arrow™ enables execution engines to take advantage of the latest SIMD (Single Instruction, Multiple Data) operations included in modern processors, for native vectorized optimization of analytical data processing.

For installation details, see the official Apache Arrow installation guide.

595 questions
2
votes
1 answer

Converting Arbitrary Objects into Bytes in Python3

My goal is to feed an object that supports the buffer protocol into hashlib's sha2 generator such that sha2 hashes generated from the same underlying data in different execution environments are consistent, and so can be used for equality tests. I…
Alex Flanagan
  • 557
  • 4
  • 9
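A minimal sketch of the question's goal, using only the standard library: a `memoryview` exposes the raw bytes of any buffer-protocol object without copying, and `hashlib` can digest it directly. The function name `sha256_of_buffer` is illustrative, not from the question; note that multi-byte element types make the digest endianness-dependent across machines.

```python
import array
import hashlib

def sha256_of_buffer(obj) -> str:
    # Cast to a flat unsigned-byte view so the digest depends only on
    # the raw underlying bytes, not on the object's shape or item size.
    view = memoryview(obj).cast("B")
    return hashlib.sha256(view).hexdigest()

a = array.array("i", [1, 2, 3])
b = array.array("i", [1, 2, 3])  # distinct object, same underlying data
assert sha256_of_buffer(a) == sha256_of_buffer(b)
```

Two objects backed by identical bytes therefore hash equal, which is what makes the digest usable for equality tests.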
2
votes
2 answers

Is there a way to deal with embedded nuls while reading in parquet files?

I have data scraped from the internet (hence varied encodings) and stored as parquet files. While processing it in R I use the arrow library. For the following code…
Akash21795
  • 61
  • 1
  • 12
2
votes
1 answer

Write Parquet MAP datatype by PyArrow

I'm writing in Python and would like to use PyArrow to generate Parquet files. Per my understanding and the Implementation Status page, the C++ (and Python) library has already implemented the MAP type. From the Data Types, I can also find the type…
Yucan
  • 21
  • 3
2
votes
1 answer

Reading schema & metadata from a parquet file

I am reading a third-party parquet file using parquetjs-lite: const parquet = require("parquetjs-lite"); reader = await parquet.ParquetReader.openFile(fileName); cursor = reader.getCursor(); I am able to read the records (and rowCount) but how…
user14013917
  • 149
  • 1
  • 10
2
votes
0 answers

Error: "as_tibble not exported by namespace arrow" with Apache Arrow on Databricks using R

I am working with R on (Azure) Databricks and wanted to enable Apache Arrow for I/O. However, using the sample code below, I'm getting a weird error that I cannot trace back. The error is occurring on clusters using Databricks runtime ML7.0 (Spark…
K.O.T.
  • 111
  • 10
2
votes
0 answers

Read Parquet file in to array of C++ structs

Originally I was writing and reading C++ struct data to file as binary, using reinterpret_cast. This was good because no code changes were required when a new member was added; the cast handled it automatically. I'm now writing to a Parquet file…
user997112
  • 29,025
  • 43
  • 182
  • 361
2
votes
0 answers

Updating Parquet datasets where the schema changes over time

I have a single parquet file that I have been incrementally building every day for several months. The file size is around 1.1GB now, and when read into memory it approaches my PC's memory limit. So, I would like to split it up into several…
matthewmturner
  • 566
  • 7
  • 21
2
votes
1 answer

Install Apache Arrow Java in Eclipse

I'm currently trying to install Apache Arrow for Java in Eclipse and am having some trouble. I've found the Java packages on https://search.maven.org/search?q=g:org.apache.arrow%20AND%20v:0.17.1 Because I didn't find any information about the…
G.M
  • 530
  • 4
  • 20
2
votes
0 answers

Using apache-arrow in a browser application - Typescript compiler errors

Attempting to use apache-arrow within a browser application, but the TypeScript compiler throws the following errors in some of Arrow's .d.ts files: import { Table } from "../node_modules/@apache-arrow/es2015-esm/Arrow"; export class SomeClass…
shyamals
  • 21
  • 1
2
votes
1 answer

Pyarrow table memory compared to raw csv size

I have a 2GB CSV file that I read into a pyarrow table with the following: from pyarrow import csv tbl = csv.read_csv(path) When I call tbl.nbytes I get 3.4GB. I was surprised at how much larger the data was in Arrow memory than as a CSV. Maybe…
matthewmturner
  • 566
  • 7
  • 21
2
votes
1 answer

TypeError: field Customer: Can not merge type and

My DataFrame:

SL No  Customer  Month      Amount
1      A1        12-Jan-04  495414.75
2      A1        3-Jan-04   245899.02
3      A1        15-Jan-04  259490.06

Code: import findspark findspark.init('/home/mak/spark-3.0.0-preview2-bin-hadoop2.7') import pyspark from…
user6882757
2
votes
2 answers

Add a subproject by CMake

Apache Arrow submodule is stored at thirdparty/apache_arrow/cpp, so my main CMakeLists.txt looks like cmake_minimum_required(VERSION 3.0.0) project(arrow_parcer VERSION 0.1.0) add_subdirectory(src) add_subdirectory(thirdparty/apache_arrow/cpp) At…
2
votes
2 answers

Is there Spark Arrow Streaming = Arrow Streaming + Spark Structured Streaming?

Currently we have Spark Structured Streaming. In the Arrow docs I found Arrow streaming, where we can create a stream in Python, produce the data, and use a StreamReader to consume the stream in Java/Scala. I am wondering if there is an integration of these…
2
votes
0 answers

Pyarrow table create column from existing columns

Is there a way to use append_column to create a column based on columns that currently exist in a pyarrow table? I want to create a pa.struct() field using columns that already exist. Looking for something along the lines of the following: pa_table…
R.Z.
  • 101
  • 6
2
votes
1 answer

How to convert PyArrow table to Arrow table when interfacing between PyArrow in python and Arrow in C++

I have a C++ library which is built against the Apache Arrow C++ libraries, with a binding to Python using pybind11. I'd like to be able to write a function in C++ that takes a table constructed with PyArrow, like: void test(arrow::Table test); Passing…
Tim P
  • 415
  • 3
  • 11