
I'm very new to R, and even to bash. I'm trying to read a local Parquet file with the read_parquet function, but it requires the arrow package, and install.packages('arrow') takes forever (i.e., it appears stuck at the installation step) on my Ubuntu WSL. I have tried everything else.

install.packages('arrow')  # takes forever to install
library(arrow)
df <- read_parquet("Financial_Sample.parquet")

Could someone please help me find another function or library for reading a Parquet file? Any lead would be appreciated!



To be able to use read_parquet, I had to install arrow with:

Sys.setenv(LIBARROW_MINIMAL = "false")
install.packages("arrow")

which installs arrow with the following capabilities:

dataset    TRUE
parquet    TRUE
json       TRUE
s3         TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE
Gorka
  • This works, IMHO, only if the required C++ libraries are installed, e.g. libbrotli-dev on Debian. – darked89 Dec 10 '21 at 10:04
  • I have brotli installed on my system too, but I believe the arrow installation process builds any missing libraries itself; see `arrow/cpp/cmake_modules/ThirdpartyToolchain.cmake`. – Gorka Dec 10 '21 at 11:04
  • I stand corrected. The CMake file you pointed to does appear to define the source files required for the dependencies. Really good to know. – darked89 Dec 10 '21 at 15:02

Check which capabilities your arrow build has:

library(arrow)
arrow_info()
Arrow package version: 6.0.1

Capabilities:
               
dataset    TRUE
parquet    TRUE
json       TRUE
<snip>

The default install of arrow in R does not provide a number of compression algorithms (in fact, none of these):

snappy    FALSE
gzip      FALSE
brotli    FALSE
zstd      FALSE
lz4       FALSE
lz4_frame FALSE
lzo       FALSE
bz2       FALSE
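To see whether your own build is affected, a minimal sketch like the following queries each codec with arrow's codec_is_available() helper. It assumes arrow is already installed; the list of codec names checked is my own choice, and the remark about snappy reflects a common default among Parquet writers, not anything known about Financial_Sample.parquet in particular.

```r
# Minimal sketch: report which compression codecs the installed arrow build
# supports. Assumes the arrow package is already installed.
library(arrow)

codecs <- c("snappy", "gzip", "brotli", "zstd", "lz4", "bz2")

# codec_is_available() returns TRUE/FALSE for a single codec name;
# vapply() names the result vector after the codecs.
available <- vapply(codecs, codec_is_available, logical(1))
print(available)

# Many Parquet writers default to snappy, so flag it when it is missing.
if (!available[["snappy"]]) {
  message("snappy is not available; consider reinstalling arrow with a full build")
}
```

If snappy (or whichever codec your file uses) shows FALSE, reading the file is likely to fail even though the parquet capability itself is TRUE.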
darked89

The fastest way I found to install arrow on remote servers:

Sys.setenv(NOT_CRAN = "true")
install.packages("arrow")

This is the method recommended by the warning messages I also got when a standard CRAN install hung indefinitely.
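A minimal sketch that combines this setting with a post-install check follows. The additions are assumptions: ARROW_R_DEV = "true" asks the arrow installer for more verbose build output (per arrow's installation documentation), which can help show which step is stalling, and the explicit CRAN mirror URL is only needed when R runs non-interactively without a default repository.

```r
# Minimal sketch: install arrow with NOT_CRAN set, as suggested above.
# Assumption: ARROW_R_DEV = "true" makes the installer print verbose output,
# so a stalled build step is easier to spot.
Sys.setenv(NOT_CRAN = "true", ARROW_R_DEV = "true")

# An explicit mirror avoids the "no CRAN mirror" error in non-interactive R.
install.packages("arrow", repos = "https://cloud.r-project.org")

# Check that the package loads and was built with Parquet support.
library(arrow)
if (!arrow_with_parquet()) {
  stop("arrow installed, but without Parquet support")
}
```

Once this succeeds, read_parquet("Financial_Sample.parquet") from the question should work, subject to the codec caveat in the previous answer.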