I have a text file whose first rows look like this:

 3  a         1       4   6   2
 3  a         1       4   6   2
 4  a         1       4   6   8   2
 4  a         1       4   6   8   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 5  a         1       4   8  10   2   6
 5  a         2       6   8  10   2   4
 5  a         1       4   8  10   2   6
 5  a         1       4   8  10   2   6
 5  a         2       6   8  10   2   4

I only want to read the first four columns of each row and save them in a data frame.

I've tried several approaches, the most recent being:

library(data.table)

nudos<-fread("caliz.txt",select=c(1:4),fill=TRUE)

which keeps giving this error message:

Stopped early on line 119. Expected 11 fields but found 13. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<10 n 21 4 8 -14 2 -16 -18 -20 -6 -10 -12>>

Thanks!

Rui Barradas
  • Can you set fill=TRUE and then discard the extra rows? – IRTFM Jun 26 '21 at 05:06
  • Well, as the error message suggests, the problem is at line 119, so the first 10 rows do not help to solve the exact problem. Can you share the text at line 119, or is it possible to share the complete text file? – Ronak Shah Jun 26 '21 at 05:06

2 Answers

Your table seems malformed. Even if you only select the first 4 columns, R reads every column and can't cope with rows that contain more or fewer elements than expected. You'll have to split the lines and select the values manually:

lin = readLines("test.txt")
cells = strsplit(lin, " ")
data = c()
for (line in cells) {
  # runs of spaces produce empty strings; drop them
  fields = line[nchar(line) > 0]
  # skip blank or short lines instead of indexing past the end
  if (length(fields) < 4) next
  # keep the first four values of this row
  data = c(data, fields[1:4])
}

# one row per input line, four columns
df = as.data.frame(matrix(data, ncol = 4, byrow = TRUE))

This results in the data frame:

> df
   V1 V2 V3 V4
1   3  a  1  4
2   3  a  1  4
3   4  a  1  4
4   4  a  1  4
5   3  a  1  4
6   3  a  1  4
7   3  a  1  4
8   3  a  1  4
9   3  a  1  4
10  3  a  1  4
11  5  a  1  4
12  5  a  2  6
13  5  a  1  4
14  5  a  1  4
15  5  a  2  6

All columns are character at this point. You can change the class of individual columns, e.g. df[,1] = as.integer(df[,1]), if you want numeric values. But that's up to you.
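As a minimal sketch of that conversion step, the snippet below lets base R's type.convert guess a type for each column. The small data frame and the name df_typed are made up here for illustration and only stand in for the character-only df built above.

```r
# Hypothetical stand-in for the all-character data frame built above
df <- data.frame(V1 = c("3", "4", "5"),
                 V2 = c("a", "a", "a"),
                 V3 = c("1", "1", "2"),
                 V4 = c("4", "4", "6"),
                 stringsAsFactors = FALSE)

# Let R pick a type per column; as.is = TRUE keeps text columns as character
df_typed <- df
df_typed[] <- lapply(df, type.convert, as.is = TRUE)

str(df_typed)
```

Columns that contain only digits should come out as integer, while V2 stays character.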

Martin Wettstein

Here is a base R solution. It uses readLines to read the file and a series of *apply loops to parse it.

# read the file as text lines
txt <- readLines("test.txt")
# split by one or more spaces
txt <- strsplit(txt, " +")
# keep only the vector elements with more than 0 chars
txt <- lapply(txt, function(x) x[sapply(x, nchar) > 0])
# the last line may have a '\n' only, remove it
txt <- txt[lengths(txt) > 0]
# now extract the first 4 elements of each vector
txt <- lapply(txt, '[', 1:4)
# and rbind to data.frame
df1 <- do.call(rbind.data.frame, txt)
names(df1) <- paste0("V", 1:4)

head(df1)
#  V1 V2 V3 V4
#1  3  a  1  4
#2  3  a  1  4
#3  4  a  1  4
#4  4  a  1  4
#5  3  a  1  4
#6  3  a  1  4
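
To check that rows of different lengths are handled, the same idea can be tried on a few lines held in memory instead of a file. This is only a sketch: the sample lines are copied from the question (the fourth is the line quoted in the error message), and the names lines, fields, first4 and df2 are made up for this example.

```r
# Sample rows from the question, plus a trailing empty line
lines <- c(" 3  a         1       4   6   2",
           " 4  a         1       4   6   8   2",
           " 5  a         2       6   8  10   2   4",
           "10 n 21 4 8 -14 2 -16 -18 -20 -6 -10 -12",
           "")

# drop blank lines, then split each line on runs of spaces
lines <- lines[nzchar(trimws(lines))]
fields <- strsplit(trimws(lines), " +")

# take the first 4 fields of every row (shorter rows would be padded with NA)
first4 <- t(vapply(fields, function(x) x[1:4], character(4)))
df2 <- as.data.frame(first4, stringsAsFactors = FALSE)

df2
```

Because every row is cut to four fields before the rows are combined, the number of trailing values on a line no longer matters.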
Rui Barradas