Questions tagged [pyarrow]

pyarrow is a Python interface for Apache Arrow

About:

pyarrow provides the Python API of Apache Arrow.

Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable big data systems to process and move data fast. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.

1078 questions

1 answer

"Raise RuntimeError('Not supported on 32-bit Windows')" when installing pyarrow

I get this error whenever I try to install pyarrow on my PC. The machine is 64-bit, so I don't understand it: raise RuntimeError('Not supported on 32-bit Windows') RuntimeError: Not supported on 32-bit Windows ---------------------------------------- …

python pip pyarrow

asked Apr 24 '19 at 16:12

WorkDoubts

1 answer

How to use Pandas UDFs on macOS Mojave? (that fails due to [__NSPlaceholderDictionary initialize] may have been in progress...)

I'm trying to use Pandas UDFs (a.k.a. Vectorized UDFs) in Apache Spark 2.4.0 on macOS 10.14.3 (macOS Mojave). I installed pandas and pyarrow using pip (and later pip3). Whenever I execute the sample code from the official documentation of Spark SQL…

apache-spark pyspark apache-spark-sql pyarrow

asked Mar 27 '19 at 14:09

Jacek Laskowski

2 answers

Read CSV with PyArrow

I have large CSV files that I'd ultimately like to convert to parquet. Pandas won't help because of memory constraints and its difficulty handling NULL values (which are common in my data). I checked the PyArrow docs and there are tools for…

python pyarrow

asked Sep 19 '18 at 19:53

dudemonkey

2 answers

Python pandas_udf spark error

I started playing around with Spark locally and ran into this weird issue: 1) pip install pyspark==2.3.1 2) pyspark> import pandas as pd from pyspark.sql.functions import pandas_udf, PandasUDFType, udf df = pd.DataFrame({'x':…

pandas apache-spark pyspark pyarrow

asked Aug 06 '18 at 18:33

Shrikar

1 answer

Can I [de]serialize a dictionary of dataframes in the arrow/js implementation?

I want to use Apache Arrow to send data from a Django backend to an Angular frontend. I want to use a dictionary of dataframes/tables as the payload in messages. It's possible with pyarrow to share data in this way between Python microservices, but I…

javascript python ipc pyarrow apache-arrow

asked Jul 18 '18 at 19:02

gabomgp

4 answers

How to efficiently read rows from Google BigTable into a pandas DataFrame

Use case: I am using Google BigTable to store counts like this:

| rowkey | columnfamily |      |      |
|        | col1 | col2 | col3 |
|--------|------|------|------|
| row1   | 1    | 2    | 3    |
| row2   | 2    | 4    | 8    |
| row3   | 3 …

python pandas bigtable pyarrow

asked Feb 16 '18 at 14:07

bartaelterman

1 answer

Are parquet files created with pyarrow vs pyspark compatible?

I have to convert analytics data in JSON to parquet in two steps. For the large amounts of existing data I am writing a PySpark job and doing df.repartition(*partitionby).write.partitionBy(partitionby). …

python aws-lambda parquet amazon-athena pyarrow

asked Jan 18 '18 at 06:11

siberiancrane

1 answer

hdfs.connect() vs HdfsClient in PyArrow

I apologize if this is a noob question, but I couldn't find any relevant reference - what is the difference between these two? If I'd like to read parquet files from hdfs using pyarrow, which one would I use?

hadoop hdfs parquet pyarrow

asked Nov 20 '17 at 21:01

Jay

3 answers

Connect python-polars to SQL server (no support currently)

How can I directly connect MS SQL Server to polars? The documentation does not list any supported connections but recommends the use of pandas. Update: SQL Server Authentication works per answer, but Windows domain authentication is not working. see…

sql-server sqlalchemy pyarrow python-polars

asked Dec 31 '22 at 04:48

Isaacnfairplay

3 answers

ModuleNotFoundError: No module named 'pyarrow.lib'

This is the full error message:

Traceback (most recent call last):
  File "C:\Users\adi\OneDrive\Desktop\Python310\machine learning project.py", line 3, in <module>
    import streamlit as st
  File…

python pyarrow streamlit facebook-prophet

asked May 10 '22 at 20:20

Addy

0 answers

How to convert pyarrow.Table to PySpark Dataframe?

I have a pyarrow.Table object that I want to pass to PySpark (and save as a Spark table). How can I convert pyarrow.Table to pyspark.sql.DataFrame? The only way I can see is to convert it to pandas.DataFrame, but aren't there some more direct and…

python pandas apache-spark pyspark pyarrow

asked Feb 19 '22 at 23:27

Felix

2 answers

How to sort a Pyarrow table?

How do I sort an Arrow table in PyArrow? There does not appear to be a single function that will do this; the closest is sort_indices.

python pyarrow apache-arrow

asked Jan 28 '22 at 12:06

Contango

3 answers

How would I go about converting a .csv to an .arrow file without loading it all into memory?

I found a similar question here: Read CSV with PyArrow. The answer there references sys.stdin.buffer and sys.stdout.buffer, but I am not exactly sure how that would be used to write the .arrow file, or name it. I can't seem to find the exact…

python pandas csv pyarrow apache-arrow

asked Oct 18 '21 at 01:32

kasbah512

1 answer

Why does Dask seem to store Parquet inefficiently?

When I save the same table using Pandas and Dask into Parquet, Pandas creates a 4k file, whereas Dask creates a 39M file. Create the dataframe import pandas as pd import pyarrow as pa import pyarrow.parquet as pq import dask.dataframe as dd n =…

python pandas dask parquet pyarrow

asked Aug 06 '21 at 23:22

Dahn

2 answers

Read last N rows of S3 parquet table

If I apply what was discussed here to read parquet files in an S3 bucket into a pandas DataFrame, particularly: import pyarrow.parquet as pq import s3fs s3 = s3fs.S3FileSystem() pandas_dataframe = pq.ParquetDataset('s3://your-bucket/',…

python amazon-web-services amazon-s3 pyarrow

asked Jun 20 '21 at 05:41

Tristan Tran
