
I am using the feather package to exchange data between Python (which collects the data) and R (which I use for analysis). Writing and reading the data in Python is extremely fast. However, reading the same feather object in R is VERY slow: on the order of minutes for a roughly 10MB feather object with about 80K rows and 24 columns. I am reading the feather object locally each time, so network latency is not the cause.

The only explanation I can think of is that some of the variables (5, to be exact) are int64 in Python and get coerced to double when R imports them. R issues a warning about coercing int64 to double while reading the feather object. Can anyone confirm this, or is there another explanation?

EDIT: Coercion is not the problem. I re-saved the int64 columns in Python as int32, and reading in R is still just as slow. Need help.
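For anyone checking the same thing, here is a minimal sketch of how the integer columns could be sanity-checked on the Python side before re-saving. It only tests whether values fit in int32 and whether they would survive R's coercion to double without losing precision. The function names and sample values are hypothetical and are not part of feather or pandas; it says nothing about read speed.

```python
# Hypothetical helpers for illustration; not part of feather or pandas.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def fits_int32(values):
    """True if every value can be stored in a 32-bit signed integer."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


def survives_double(values):
    """True if every value round-trips through a 64-bit float unchanged.

    R's numeric type is a double, which represents integers exactly
    only up to 2**53 in magnitude.
    """
    return all(int(float(v)) == v for v in values)


# Made-up sample values standing in for an int64 column.
small_ids = [1, 80_000, 2**31 - 1]
large_ids = [2**53 + 1]

print(fits_int32(small_ids), survives_double(small_ids))
print(fits_int32(large_ids), survives_double(large_ids))
```

If both checks pass for a column, downcasting it to int32 or letting R coerce it to double should not change the values.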

EDIT 2: Example code. As requested, here is the code I am running. It essentially just reads the feather object from a sub-folder:

library(feather)
test_feather = read_feather("C:/my_folder/subfolder/test.feather")
guy
  • I've used feather with R to read several gigabytes of unstructured text and it's extremely fast. Perhaps you could provide your script, per usual Stack Overflow standards, so that we may find your user error. Thanks – Justin Aug 12 '17 at 07:58
  • @Justin I am just running the read command, so my code is simply `data <- read_feather("c:/my_project/subfolder/my_data.feather")` – guy Aug 12 '17 at 18:33
  • `sessionInfo()` and a **reproducible** example please? – Ben Bolker Aug 14 '17 at 12:13
  • @BenBolker Added the `sessionInfo()`, and I don't know how best to post a reproducible example since I can't simply post my feather object here. As I said, it comes from a Python dataframe that is about 50k rows with 25 variables, with a mix of float64, int32, datetimes, and unicode strings – guy Aug 14 '17 at 12:29
  • Could it be because the data was first handled and written on a Linux system, and I am then trying to read the feather object in R on a Windows machine? – guy Aug 14 '17 at 13:08

1 Answer


The issue was that the feather object was created in a Linux environment while it was being read in R on a Windows system. I don't fully know the details, but essentially each OS has a different specification for representing binary data on disk.

I don't remember reading about this issue or seeing a warning in the documentation (though I suppose it is obvious and implicit), but perhaps a little reminder might save future readers from making the same mistake.

guy