
I am trying to read a larger (40–400 MB) Parquet file in Google Colab, and I am getting an error that says:

46 raise ValueError("engine must be one of 'pyarrow', 'fastparquet'")

ValueError: engine must be one of 'pyarrow', 'fastparquet'

I have referred to this page and run the command it suggests, but I am still facing the same error. [screenshot]

My code looks like this: [screenshot]

What am I missing here?

Gautam
I managed to resolve this error after adding this code: "data = pd.read_parquet('/content/drive/My Drive/ColabNotebooks/000.snappy.parquet', engine='pyarrow')" – Ramgau May 19 '21 at 09:07

1 Answer


First, import all the necessary libraries:

import numpy as np
import pandas as pd
import io
import sys
import math

We need to install pyarrow to read the Parquet file:

!pip install pyarrow

import pyarrow as pa
import pyarrow.parquet as pq
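To confirm the install worked, you can check which Parquet backends are importable before calling pandas. This check is not part of the original answer, just a quick optional sanity test:

```python
import importlib.util

# pandas raises "engine must be one of 'pyarrow', 'fastparquet'"
# when neither backend can be imported, so check which ones exist.
status = {
    engine: importlib.util.find_spec(engine) is not None
    for engine in ('pyarrow', 'fastparquet')
}
print(status)
```

If both come back False, the ValueError from the question will reappear.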

Now mount Google Drive and read the file; the output looks like this:

from google.colab import drive
drive.mount('/content/gdrive')

output_file_parquet = pq.read_table('/content/gdrive/MyDrive/ColabNotebooks/000.snappy.parquet')
output = output_file_parquet.to_pandas()
output

Note: I was not able to mount at 'drive'; using 'gdrive' as the mount point worked for me.

mbty7813