7

How can I open a .snappy.parquet file in Python 3.5? So far, I have used this code:

import numpy
import pyarrow

filename = "/Users/T/Desktop/data.snappy.parquet" 
df = pyarrow.parquet.read_table(filename).to_pandas()

But it gives this error:

AttributeError: module 'pyarrow' has no attribute 'compat'

P.S. I installed pyarrow this way:

pip install pyarrow
user9439906

3 Answers

8

I had the same issue and managed to solve it by following the solution proposed in https://github.com/dask/fastparquet/issues/366.

1) Install python-snappy using conda install (for some reason, I couldn't download it with pip install).

2) Add the snappy_decompress function.

from fastparquet import ParquetFile
import snappy

def snappy_decompress(data, uncompressed_size):
    return snappy.decompress(data)

pf = ParquetFile('filename')  # filename includes .snappy.parquet extension
dff = pf.to_pandas()
Bengi Koseoglu
4

The error AttributeError: module 'pyarrow' has no attribute 'compat' is sadly a bit misleading. To execute the to_pandas() function on a pyarrow.Table instance, you need pandas installed. The above error is a symptom of that missing requirement.

pandas is not a hard requirement of pyarrow, as most of its functionality is usable with just Python built-ins and NumPy. Thus users of pyarrow who do not need pandas can work with it without having pandas installed.
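
A quick way to check for the missing requirement before calling to_pandas() is to ask Python whether both packages are importable. This is only a minimal sketch using the standard library; the helper name missing_packages is made up for illustration and is not part of pyarrow or pandas.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported.

    Hypothetical helper for illustration; not part of pyarrow or pandas.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


# pyarrow reads the Parquet file; pandas is only needed for Table.to_pandas().
missing = missing_packages(["pyarrow", "pandas"])
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("pyarrow and pandas are both importable")
```

If pandas shows up as missing, installing it (for example with pip install pandas) should make the to_pandas() call from the question work.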

Uwe L. Korn
4

You can use pandas to read .snappy.parquet files into a pandas DataFrame.

import pandas as pd
filename = "/Users/T/Desktop/data.snappy.parquet"
df = pd.read_parquet(filename)