How to read first four columns from a file with different number of columns on each row into a data frame

Question

I have a text file whose first rows look like this:

 3  a         1       4   6   2
 3  a         1       4   6   2
 4  a         1       4   6   8   2
 4  a         1       4   6   8   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 5  a         1       4   8  10   2   6
 5  a         2       6   8  10   2   4
 5  a         1       4   8  10   2   6
 5  a         1       4   8  10   2   6
 5  a         2       6   8  10   2   4

I only want to read the first four columns of each row and save them into a data frame.

I've tried several approaches, the most recent being:

library(data.table)

nudos<-fread("caliz.txt",select=c(1:4),fill=TRUE)

which keeps giving this error message:

Stopped early on line 119. Expected 11 fields but found 13. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<10 n 21 4 8 -14 2 -16 -18 -20 -6 -10 -12>>

Thanks!

Well, as the error message suggests, the problem is at line 119, so seeing only the first rows does not help to solve the exact problem. Can you share the text at line 119, or is it possible to share the complete text file? – Ronak Shah, Jun 26 '21 at 05:06
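Following up on the comment above, one way to see which lines differ is to count the fields on every line. This is only a sketch: the temp file and its three sample lines are stand-ins for the real caliz.txt, and n_fields and odd_lines are illustrative names.

```r
# Hypothetical sample standing in for the real file, written to a temp file
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  " 3  a         1       4   6   2",
  " 4  a         1       4   6   8   2",
  " 5  a         1       4   8  10   2   6"
), tmp)

# Number of whitespace-separated fields on each line
n_fields <- count.fields(tmp, sep = "")

# How many lines have each field count
print(table(n_fields))

# Positions whose field count differs from the first line's.
# Note: count.fields skips blank lines by default, so these positions
# match file line numbers only if the file has no blank lines.
odd_lines <- which(n_fields != n_fields[1])
print(odd_lines)
```

Running the same two calls on the real file should show whether line 119 is the first line with more fields than the lines before it.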

score 1 · Answer 1 · answered Jun 26 '21 at 05:14

Your table seems malformed. Even if you only want to select the first 4 columns, R reads all of them and can't cope with rows that contain more or fewer elements than expected. You'll have to split the lines manually and select the values yourself:

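# read the file as text lines and split each line on single spaces
lin = readLines("test.txt")
cells = strsplit(lin, " ")
data = c()
for (line in cells) {
  found = 0
  cell = 1
  # collect the first four non-empty fields of this line
  # (assumes every line has at least four fields)
  while (found < 4) {
    value = line[[cell]]
    if (nchar(value) > 0) {
      found = found + 1
      data = c(data, value)
    }
    cell = cell + 1
  }
}

# four values per line, so fill the matrix row by row
df = as.data.frame(matrix(data, ncol = 4, byrow = TRUE))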

This results in the data frame:

> df
   V1 V2 V3 V4
1   3  a  1  4
2   3  a  1  4
3   4  a  1  4
4   4  a  1  4
5   3  a  1  4
6   3  a  1  4
7   3  a  1  4
8   3  a  1  4
9   3  a  1  4
10  3  a  1  4
11  5  a  1  4
12  5  a  2  6
13  5  a  1  4
14  5  a  1  4
15  5  a  2  6

You can now change the class of individual columns (e.g. df[,1] = as.integer(df[,1])), as they are all character at the moment. You might want numeric values, but that's up to you.
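If all columns should be converted in one step, something like the following may work. It is a sketch, not part of the original answer: df_chr is a hypothetical stand-in for the all-character data frame built above, and type.convert guesses each column's type from its contents.

```r
# Hypothetical stand-in for the all-character data frame built above
df_chr <- data.frame(
  V1 = c("3", "4", "5"),
  V2 = c("a", "a", "a"),
  V3 = c("1", "1", "2"),
  V4 = c("4", "4", "6"),
  stringsAsFactors = FALSE
)

# Let R guess each column's type; as.is = TRUE keeps text as character
# instead of turning it into a factor
df_chr[] <- lapply(df_chr, type.convert, as.is = TRUE)

# Inspect the resulting column classes
str(df_chr)
```

Assigning into df_chr[] keeps the object a data frame while replacing every column, so the column names are preserved.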

score 1 · Accepted Answer · Rui Barradas · 2021-06-26T20:33:50.513

Here is a base R solution. It uses readLines to read the file and a series of *apply loops to parse it.

# read the file as text lines
txt <- readLines("test.txt")
# split by one or more spaces
txt <- strsplit(txt, " +")
# keep only the vector elements with more than 0 chars
txt <- lapply(txt, function(x) x[sapply(x, nchar) > 0])
# the last line may have a '\n' only, remove it
txt <- txt[lengths(txt) > 0]
# now extract the first 4 elements of each vector
txt <- lapply(txt, '[', 1:4)
# and rbind to data.frame
df1 <- do.call(rbind.data.frame, txt)
names(df1) <- paste0("V", 1:4)

head(df1)
#  V1 V2 V3 V4
#1  3  a  1  4
#2  3  a  1  4
#3  4  a  1  4
#4  4  a  1  4
#5  3  a  1  4
#6  3  a  1  4
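The same steps can be tried without a file by wrapping them in a small function and feeding it inline lines. This is a minimal sketch under stated assumptions: first_fields is a hypothetical helper name, the sample lines are copied from the question, and the columns are left as character.

```r
# Hypothetical helper (not from the answer): return the first n
# whitespace-separated fields of each non-empty line as a data frame.
first_fields <- function(lines, n = 4) {
  # drop leading/trailing whitespace, then drop lines that are now empty
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]
  # split on runs of whitespace
  parts <- strsplit(lines, "[[:space:]]+")
  # keep the first n fields; rows with fewer fields are padded with NA
  parts <- lapply(parts, function(x) x[seq_len(n)])
  # bind the vectors row-wise into a character matrix, then a data frame
  df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
  names(df) <- paste0("V", seq_len(n))
  df
}

# Sample lines taken from the question, plus a trailing empty line
sample_lines <- c(
  " 3  a         1       4   6   2",
  " 4  a         1       4   6   8   2",
  " 5  a         2       6   8  10   2   4",
  ""
)

df2 <- first_fields(sample_lines)
print(df2)
```

With a real file, the call would be first_fields(readLines("test.txt")).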

edited Jun 26 '21 at 20:33

answered Jun 26 '21 at 05:22


Hi Rui. I get an error message at "\(x)": "unexpected input". – Sergio Enrique Yarza Acuña Jun 26 '21 at 19:06
@SergioEnriqueYarzaAcuña See if the error goes away after the edit. – Rui Barradas Jun 26 '21 at 20:34
