
The (now-superseded) stand-alone feather library for R had a function called feather_metadata() that read column names and types from feather files on disk without loading them. This was useful for selecting only specific columns when loading a feather file in R with read_feather(path, columns = c(...)).

Now that the feather format is part of the arrow library, feather_metadata() is not included anymore.

Is there an equivalent function in arrow to read column names and types of files on disk from R before loading them?

MatteoS

1 Answer


In the current version of the arrow R package, there is no direct replacement for feather::feather_metadata(path), but there are two workarounds that might work for you:

  • If you just need the column names (not the data types), you can do this:

    rf <- arrow::ReadableFile$create(path)
    fr <- arrow::FeatherReader$create(rf)
    names(fr)
    
  • If you need the data types of the columns, you can try this:

    arrow::read_feather(path, as_data_frame = FALSE)
    

    That gives output like what you're looking for (each column's name with its type). It should be pretty fast because it does not convert the file to an R data frame, but it does read the full file (or at least memory-maps it), so you might not want to do this if your Feather files are really large.
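
To make the second workaround concrete, here is a minimal sketch that pulls the column names and types out of the Table's schema and then loads only a subset of columns. The temporary file and its columns (id, name, value) are made up for illustration, and the exact type strings you get back may differ between arrow versions.

```r
library(arrow)

# Hypothetical example file so the sketch is self-contained
path <- tempfile(fileext = ".feather")
write_feather(
  data.frame(
    id = 1:3,
    name = c("a", "b", "c"),
    value = c(0.5, 1.5, 2.5),
    stringsAsFactors = FALSE
  ),
  path
)

# Read as an Arrow Table, skipping the conversion to an R data frame
tbl <- read_feather(path, as_data_frame = FALSE)

# Column names, and the Arrow type of each field as a string
col_names <- names(tbl)
col_types <- vapply(
  tbl$schema$fields,
  function(f) f$type$ToString(),
  character(1)
)

# Use what was learned to load only the columns of interest
df <- read_feather(path, columns = c("id", "value"))
```

Note that this still pays the cost of reading (or memory-mapping) the whole file once, as described above.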

ianmcook
  • Is there a similar way to do this with parquet files that have partitions? – Ryan Garnett Sep 01 '21 at 21:54
  • Yes, you can use [arrow::open_dataset()](https://arrow.apache.org/docs/r/reference/open_dataset.html) to open multi-file datasets, optionally with partitioning. It does not read the files from disk. – ianmcook Sep 02 '21 at 22:43
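
As a rough illustration of the open_dataset() suggestion, the sketch below writes a small partitioned Parquet dataset to a temporary directory and then inspects its schema without collecting any rows. The directory, the column names (year, value), and the choice of year as the partition column are assumptions for the example, and it presumes an arrow build with dataset and Parquet support.

```r
library(arrow)

# Hypothetical partitioned dataset: one sub-directory per value of `year`
dir <- tempfile()
write_dataset(
  data.frame(year = c(2020L, 2020L, 2021L), value = c(1.5, 2.5, 3.5)),
  dir,
  format = "parquet",
  partitioning = "year"
)

# Open the dataset; no rows are loaded into R at this point
ds <- open_dataset(dir)

# Column names (including the partition column) and the full schema
ds_names <- ds$schema$names
print(ds$schema)
```

From there, a dplyr select() or filter() followed by collect() would bring only the requested columns and rows into R.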