
I have a series of txt files that are all formatted the same way. The first few rows contain file information, and there are no variable names. As you can see, the spacing between fields is inconsistent, but the columns are either left-aligned or right-aligned. I know SAS can read data in this format directly, and I wonder whether R provides a similar function.

I tried the read.csv function to load the data, which I want to store in a data.frame with 3 columns, but it turns out that the sep argument does not accept a regular expression, so sep = "\s" (multiple spaces) does not work.

So I first read each line into a single variable and then used the substr function to split it, as follows:

 Factor<-data.frame(substr(Share$V1,1,9),substr(Share$V1,9,14),as.numeric(substr(Share$V1,15,30)))


But this is clumsy, since I need to count the character positions of each column by hand. I wonder if there is a method to load the data directly as three columns.

    > Factor
   F  T      S
1   +B2P       A     1005757219
2   +BETA      A      826083789
Mengqiu

1 Answer


We can use read.table to read it as 3 columns

read.table(text=as.character(Share$V1), sep="", header=FALSE, 
                 stringsAsFactors=FALSE, col.names = c("FactorName", "Type", "Share"))
#  FactorName Type      Share
#1       +B2P    A 1005757219
#2      +BETA    A  826083789
#3       +E2P    A  499237181
#4      +EF2P    A   38647147
#5     +EFCHG    A  866171133
#6    +IL1QNS    A  945726018
#7    +INDMOM    A  862690708

Another option would be to read it directly from the file, skipping the header line and setting the column names:

read.table("yourfile.txt", header=FALSE, skip=1, stringsAsFactors=FALSE,
              col.names = c("FactorName", "Type", "Share"))
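For a self-contained check of the file-based approach, here is a minimal sketch. It assumes a file with exactly one line of file information before the data; the temporary file and its first line are made up for illustration, and only the two data rows come from the output shown above.

```r
# Hypothetical sample file: one line of file information, then data rows
# whose columns are separated by a varying number of spaces.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "Sample file information line (made up for this sketch)",
  "+B2P       A     1005757219",
  "+BETA      A      826083789"
), tmp)

# sep = "" is the read.table default: any run of whitespace is one delimiter.
# skip is assumed to be 1 here; it has to match the number of information
# lines at the top of the real file.
Factor <- read.table(tmp, header = FALSE, skip = 1, stringsAsFactors = FALSE,
                     col.names = c("FactorName", "Type", "Share"))
str(Factor)

# Remove the temporary file again.
unlink(tmp)
```

If the real files carry more than one information line, increasing skip accordingly should be all that is needed.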
akrun
    I see where my code went wrong. The default argument sep = "" (empty string) matches one or more whitespace characters, but I typed sep = " " (a single space), which matches exactly one space and therefore split the columns incorrectly. So read.csv also works for this problem when sep = "" is used. – Mengqiu Jul 27 '16 at 08:08
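
The difference described in this comment can be reproduced with a small sketch. The two input lines are taken from the output shown in the question; the variable names are only illustrative.

```r
# Two lines with a varying number of spaces between the fields.
raw_lines <- c("+B2P       A     1005757219",
               "+BETA      A      826083789")

# sep = "" (the read.table default): a run of whitespace is one separator.
ok <- read.table(text = raw_lines, sep = "", header = FALSE,
                 stringsAsFactors = FALSE)

# sep = " ": every single space is a separator, so each extra space
# produces an additional empty field.
bad <- read.table(text = raw_lines, sep = " ", header = FALSE,
                  stringsAsFactors = FALSE, fill = TRUE)

# read.csv defaults to sep = ",", so sep = "" has to be passed explicitly.
csv <- read.csv(text = raw_lines, sep = "", header = FALSE,
                stringsAsFactors = FALSE)

ncol(ok)   # three columns
ncol(bad)  # more than three columns, most of them empty
ncol(csv)  # three columns
```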