How to read column names and metadata from feather files in R arrow?

Question

The (now-superseded) stand-alone feather library for R had a function called feather_metadata() that could read column names and types from feather files on disk without loading them. This was useful for selecting only specific columns when loading a feather file in R with read_feather(path, columns = c(...)).

Now that the feather format is part of the arrow library, feather_metadata() is not included anymore.

Is there an equivalent function in arrow to read column names and types of files on disk from R before loading them?

score 2 · Accepted Answer · answered Mar 09 '21 at 03:55

In the current version of the arrow R package, there is no direct replacement for feather::feather_metadata(path), but there are two workarounds that might work for you:

If you just need the column names (not the data types), you can do this:

```
rf <- arrow::ReadableFile$create(path)
fr <- arrow::FeatherReader$create(rf)
names(fr)
```

If you need the data types of the columns, you can try this:
```
arrow::read_feather(path, as_data_frame = FALSE)
```
That returns an Arrow Table whose printed output shows the column names and types, much like what you're looking for. It should be pretty fast because it does not convert the file to an R data frame. However, it does read the full file (or at least memory-maps the full file), so you might not want to do this if your Feather files are really large.
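As a minimal sketch of how the second workaround could feed a selective read: the example below writes a small throwaway file (the data and the `path` value are made up for illustration), pulls the column names and Arrow types out of the Table's schema, and then loads only a subset of columns. It assumes a reasonably recent arrow release where `names()` works on a Table and `Field` objects expose `$type$ToString()`.

```
library(arrow)

# Hypothetical example file, written here only so the sketch is self-contained
path <- tempfile(fileext = ".feather")
write_feather(
  data.frame(id = 1:3, name = c("a", "b", "c"), value = c(1.5, 2.5, 3.5)),
  path
)

# Read as an Arrow Table (no conversion to an R data frame)
tbl <- read_feather(path, as_data_frame = FALSE)

# Column names, and the Arrow data type of each column as a string
col_names <- names(tbl)
col_types <- vapply(tbl$schema$fields, function(f) f$type$ToString(), character(1))
names(col_types) <- col_names
print(col_types)

# Use the names to load only the columns you need
df <- read_feather(path, columns = c("id", "value"))
```

The same caveat applies: building `tbl` still touches the whole file, so this only saves the cost of converting to a data frame.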

Is there a similar way to do this with parquet files that have partitions? — Ryan Garnett, Sep 01 '21 at 21:54
Yes, you can use [arrow::open_dataset()](https://arrow.apache.org/docs/r/reference/open_dataset.html) to open multi-file datasets optionally with partitioning. It does not read the files from disk. — ianmcook, Sep 02 '21 at 22:43
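A rough sketch of the `open_dataset()` approach from the comment above, assuming an arrow build with dataset and Parquet support. The directory `ds_path` and its contents are hypothetical and are written here only so the example runs on its own; with real data you would pass your existing dataset directory instead.

```
library(arrow)

# Hypothetical partitioned Parquet dataset, created only for illustration
ds_path <- tempfile()
write_dataset(
  data.frame(year = c(2020L, 2020L, 2021L), value = c(1.5, 2.5, 3.5)),
  ds_path,
  format = "parquet",
  partitioning = "year"
)

# Open the dataset without loading the data into R
ds <- open_dataset(ds_path)

# Column names and types, including the partition column
print(ds$schema)
ds_names <- names(ds)
```

The partition column (`year` here) is recovered from the directory names, so it appears in the schema even though it is not stored inside the individual files.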
