I'm pretty new to the Parquet file format, and I'm using read_parquet() (from the arrow package) to load a Parquet file (stored in my Dropbox shared folder) into R. However, I received the following error message:

library(arrow)
df <- read_parquet("https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1")

Error: Invalid: Unrecognized filesystem type in URI: https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1

What might be causing this problem, and do I need to partition the URL beforehand?

Chris T.

1 Answer

The file reading functions in the arrow package do not yet support HTTP[S] URIs. We hope to add this feature in a future release (ARROW-7594). In the meantime:

If you have Dropbox installed on the computer where you're running this, use the local path to the file instead of the HTTPS URI.

If you do not have Dropbox installed, then download the file first, like this:

myfile <- tempfile()
download.file(
  "https://www.dropbox.com/s/mysgf4sojpjgyp7/part-394.parquet?dl=1",
  myfile,
  mode = "wb"
)
df <- read_parquet(myfile)
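
If read_parquet() still fails after the download, it can help to check whether the downloaded file is a Parquet file at all. Parquet files begin and end with the 4-byte magic string "PAR1", so a file that was altered by a text-mode download, or that is really an HTML page, should fail this check. The following is only a minimal sketch: looks_like_parquet() is a hypothetical helper, not part of the arrow package, and it reads the whole file into memory, so it is best suited to small files.

```r
# Hypothetical helper (not part of arrow): returns TRUE if the file starts
# and ends with the Parquet magic bytes "PAR1", FALSE otherwise.
looks_like_parquet <- function(path) {
  size <- file.size(path)
  # A missing file gives NA; anything under 8 bytes cannot hold both markers.
  if (is.na(size) || size < 8) return(FALSE)
  # Read the raw bytes (binary read, so no line-ending translation).
  bytes <- readBin(path, what = "raw", n = size)
  magic <- charToRaw("PAR1")
  identical(bytes[1:4], magic) && identical(bytes[(size - 3):size], magic)
}
```

For example, calling looks_like_parquet(myfile) right after download.file() and before read_parquet() gives a quick signal about whether the download itself went wrong. Passing this check does not guarantee the file is valid, only that it has the expected markers.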
ianmcook
  • Thanks for clarifying this. But I received yet another error message after running your code. It says `Error: IOError: Couldn't deserialize thrift: don't know what type:` – Chris T. Apr 23 '21 at 19:48
  • If you're using Windows, you'll need to use `mode = "wb"` in the call to `download.file()`. I updated the answer to use this. – ianmcook Apr 23 '21 at 20:23
  • Yes, I'm using Windows, and after adding this argument, the code works. Thanks so much for your help. – Chris T. Apr 24 '21 at 00:59