
I have a large (1.1 GB) tab-separated dataset. When I read it into R using the standard read.table function:

data <- read.table(file="C:/Localdata/Postcode model/Data/FinalDataset.txt", 
                   header=TRUE, sep="\t")

it works fine. However, I want to read it in with the read.table.ffdf function from the ff package, so I use:

library(ff)
data <- read.table.ffdf(file="C:/Localdata/Postcode model/Data/FinalDataset.txt", 
                        header=TRUE, sep="\t")

The ff package loads without any problem, but the call throws this error:

Error in read.table(header = FALSE, sep = "\t", file = 3L, fileEncoding = "", : more columns than column names

Why is this?
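The call shown in the error, `read.table(header = FALSE, sep = "\t", file = 3L, ...)`, suggests that read.table.ffdf reads the file in chunks from an already-open connection, with header = FALSE and (presumably) the column names fixed from the first chunk. Assuming that is what happens, the same message can be reproduced in base R by giving read.table fewer column names than the data has fields. The data and names below are made up for illustration:

```r
# Hypothetical later chunk: every line has 3 tab-separated fields
chunk <- "1\t2\t3\n4\t5\t6\n"
# Hypothetical names carried over from an earlier chunk: only 2 of them
fixed_names <- c("a", "b")

# Capture the error message instead of stopping
msg <- tryCatch(
  read.table(text = chunk, sep = "\t", header = FALSE, col.names = fixed_names),
  error = conditionMessage
)
print(msg)
```

According to ?read.table, the number of data columns is determined from the first five lines of input (or from col.names if that is longer), so a chunk whose first few lines contain a 17-field row would fail in this way against 16 names.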

jbaums
Emrys Komen
  • You might consider including the contents of the first few lines of your file, to make it easier for us to diagnose the problem. – jbaums Oct 03 '14 at 08:12
  • The data has 16 columns, each with its own header. Two of the columns are string types and the rest numeric – Emrys Komen Oct 03 '14 at 08:38
  • Does `unique(count.fields('C:/Localdata/Postcode model/Data/FinalDataset.txt', sep='\t', comment.char=''))` return `16`? – jbaums Oct 03 '14 at 08:51
  • It returns `16 17`, so I'm guessing that's where the problem lies. read.table does read it in correctly, but I need to use ffdf, so an explanation of why it's playing up would be much appreciated. – Emrys Komen Oct 03 '14 at 10:33
  • Try adding `comment.char='#'` to the `read.table.ffdf` call. Maybe one of the lines of the txt file has a comment, and this would be interpreted as such by `read.table` but not (by default) by `read.table.ffdf`. – jbaums Oct 03 '14 at 11:05
  • Didn't work, unfortunately :( – Emrys Komen Oct 03 '14 at 11:35
  • It's worth inspecting those rows with 17 fields, then. For example with: `f <- 'C:/Localdata/Postcode model/Data/FinalDataset.txt'; i <- which(count.fields(f, sep='\t') == 17); do.call(rbind, lapply(i, function(x) read.table(f, header=FALSE, skip=x-1, nrows=1, sep='\t')))`. – jbaums Oct 03 '14 at 15:37
  • Can you just manually put in a 17th header into your file? Then read it in normally? – Apprentice Queue Nov 02 '14 at 22:59
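Following up on the count.fields check: count.fields applies quote (and, optionally, comment) rules, so a raw count of tab characters per line can help separate a genuine extra tab (a trailing one, for instance) from a quoting effect. A minimal base-R sketch; `count_tab_fields` and `field_counts_by_line` are hypothetical helper names, and the sample rows are made up:

```r
# Fields per raw line = number of tab characters + 1.
# Quotes and comment characters are deliberately NOT interpreted here.
count_tab_fields <- function(lines) {
  no_tabs <- gsub("\t", "", lines, fixed = TRUE, useBytes = TRUE)
  nchar(lines, type = "bytes") - nchar(no_tabs, type = "bytes") + 1L
}

# Read a large file through a connection in chunks, so the whole
# file never has to sit in memory; returns one count per physical line.
field_counts_by_line <- function(path, chunk_size = 100000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size, warn = FALSE)
    if (length(lines) == 0L) break
    pieces[[length(pieces) + 1L]] <- count_tab_fields(lines)
  }
  unlist(pieces, use.names = FALSE)
}

# Made-up sample with one trailing tab (an extra, empty field)
sample_lines <- c("id\tname\tvalue",
                  "1\tfoo\t2.5",
                  "2\tbar\t3.1\t",
                  "3\tbaz\t4.0")
n <- count_tab_fields(sample_lines)
print(table(n))
odd <- which(n != n[1])  # rows whose count differs from the header row
# Show the offending rows with the tabs made visible
print(gsub("\t", "<TAB>", sample_lines[odd], fixed = TRUE))
```

For the real file, `table(field_counts_by_line("C:/Localdata/Postcode model/Data/FinalDataset.txt"))` would give the raw distribution, with the header as line 1. If no row has 17 raw fields, the extra field reported by count.fields most likely comes from how quote characters in the text columns are interpreted.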
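The extra-header idea can also be tried without editing the file. ?read.table documents that the number of columns comes from the first five lines, or from the length of col.names if that is longer, so supplying one extra name together with fill = TRUE lets the shorter rows be padded with NA. A base-R sketch on made-up data; whether read.table.ffdf forwards col.names, skip and fill to its internal read.table calls in the same way is an assumption to check against ?read.table.ffdf:

```r
# Made-up sample: the header has 2 names, but one data row has a 3rd field
txt <- "a\tb\n1\t2\n3\t4\t5\n"

df <- read.table(
  text = txt, sep = "\t",
  header = FALSE, skip = 1,          # skip the file's own header line...
  col.names = c("a", "b", "extra"),  # ...and supply it plus one extra name
  fill = TRUE                        # pad rows that are short with NA
)
print(df)
```

For the real file this would mean the 16 existing header names plus a placeholder (the name "extra" here is hypothetical); rows with a non-NA value in that column are the ones carrying the additional field.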

0 Answers