
I recently ran into a problem reading a fixed-width file. For example:

Name   Income
John   $10,000
Mary   $15,000
Walter $25,000

How can I read a fixed-width file by supplying just the column names, without specifying the column widths?

Gregor Thomas
Slayer
  • Thank you for posting a question you knew the answer to! – Gregor Thomas Apr 05 '19 at 21:58
  • I encountered this at work today and thought I should put it out into the world so that everyone can benefit from it at some point. – Slayer Apr 05 '19 at 21:59
  • Don't bother trying this on variable-width files. `fwf_empty` looks for *consistent* widths and will give bad results if the widths are not consistent. If you need to read a delimited file where you don't know the delimiter, `data.table::fread` usually does a good job of guessing. – Gregor Thomas Apr 05 '19 at 22:01
  • `data.table::fread` is pretty cool, but it did not solve my problem, so I researched this a little further. – Slayer Apr 05 '19 at 22:05

1 Answer


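While looking for a solution, I came across the readr function read_fwf(). It takes the file name as its first argument and a column specification as its second. Passing fwf_empty() as that specification tells readr to guess the column positions by looking for character positions that are blank in every row it samples.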
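To see what fwf_empty() actually guesses before reading anything, you can inspect the specification it returns. This is a minimal sketch, assuming the sample data from the question is written to a temporary file (the `path` and `spec` names are only for illustration):

```r
library(readr)

# Hypothetical sample file: the data from the question, written to a temp path.
path <- tempfile(fileext = ".txt")
writeLines(c("Name   Income",
             "John   $10,000",
             "Mary   $15,000",
             "Walter $25,000"), path)

# fwf_empty() scans the file for blank character positions and returns
# the guessed column boundaries together with the names supplied.
spec <- fwf_empty(path, col_names = c("Name", "Income"))

# The result is a plain list; str() shows its begin, end and col_names fields.
str(spec)
```

If the number of guessed columns does not match the number of names you pass, that is a sign the file does not have consistently blank separator columns.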

Say my file is named fixed_width_file.csv and has a million rows. I can read it by supplying just the column names:

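library(readr)

# fwf_empty() guesses the column positions; col_names labels the result.
# skip = 1 tells read_fwf() to skip the header line in the file.
read_fwf("fixed_width_file.csv",
         fwf_empty("fixed_width_file.csv",
                   col_names = c("Name", "Income")),
         skip = 1)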

Check that the columns were split in the right places by looking at the head() of the resulting data frame.
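A small reproducible version of that check might look like the following. It is only a sketch: the temporary file stands in for the real one, and the explicit `fwf_widths()` call (with a width of 7 chosen by eye from the sample data) is a fallback I am assuming you would want if the guessed positions turn out to be wrong.

```r
library(readr)

# Hypothetical sample file standing in for fixed_width_file.csv.
path <- tempfile(fileext = ".txt")
writeLines(c("Name   Income",
             "John   $10,000",
             "Mary   $15,000",
             "Walter $25,000"), path)

# Read using guessed positions, skipping the header line.
guessed <- read_fwf(path,
                    fwf_empty(path, col_names = c("Name", "Income")),
                    skip = 1)
head(guessed)

# Fallback: state the widths explicitly. The first field is 7 characters
# wide in this sample; NA lets the last field run to the end of the line.
explicit <- read_fwf(path,
                     fwf_widths(c(7, NA), col_names = c("Name", "Income")),
                     skip = 1)
head(explicit)
```

If both data frames show the same values in each column, the guess from fwf_empty() can be trusted for this file.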

I will update the answer as I learn more.
