
I have 48 tar.gz files in a directory "C:/Users/Me/Desktop/JUNETEST/". The file names all have the same format, except for a number that represents the hour of the day for each file (e.g. voa_20170601-110000.tar.gz as opposed to voa_20170601-120000.tar.gz). I need to import all 48 files, untar them, and pull out just the data file, which has the same name in each ".tar.gz": "hit_data.tsv". I want to store each "hit_data.tsv" as an element in a list of data frames. My code appears to untar all 48 files correctly.

However, the problem is in reading the hit_data.tsv file. It attempts to read in every column, but has problems because there is no column header, so it reads only the first column. If I were pulling each file individually, I would assign colnames(hit_data.tsv) the value of a vector called Headers. Each file has the same column names, so the Headers vector can apply to every hit_data.tsv. My question is: how do I assign the column names to each file during the loop? Or how do I correct my code to read in all columns of the "hit_data.tsv" file?

The code in its current state is below:

    files <- list.files(path = "C:/Users/Reginald/Desktop/JUNETEST/",pattern = "tar.gz")
    VOA<-length(files)

    for (i in files){
      eval(parse(text = paste0("untar(\"C:/Users/Reginald/Desktop/JUNETEST/",i,"\",files=\"hit_data.tsv\")",sep="" )))
      VOA[i] <- read_tsv("~/hit_data.tsv")
      #VOA[i]<- as.data.frame(VOA[i])
      #colnames(VOA[i])<-Headers[1,]
      VOA.df <-do.call(rbind,VOA[i])

    }
Reric

1 Answer


I assume you are using `read_tsv` from the readr package.

If so, you can try the following in your for loop:

    read_tsv('~/hit_data.tsv', col_names=Headers)

It is also good practice to use help() or read the package documentation (readr.pdf) to understand the functions you are using.
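
Putting this together with the loop from the question, here is a minimal sketch of one way to build the list of data frames. It is not tested against the real archives and rests on several assumptions: `read_hit_data` is a hypothetical helper name, "hit_data.tsv" sits at the top level of every archive (as the `untar` call in the question implies), and `headers` is a plain character vector. Each archive is extracted into its own temporary directory so that one file never overwrites another.

```r
library(readr)

# Hypothetical helper (not part of readr): extract hit_data.tsv from every
# .tar.gz archive in `dir` and return a named list of data frames.
read_hit_data <- function(dir, headers) {
  archives <- list.files(dir, pattern = "\\.tar\\.gz$", full.names = TRUE)

  # Pre-allocate a list with one slot per archive, named after the archive
  out <- vector("list", length(archives))
  names(out) <- basename(archives)

  for (i in seq_along(archives)) {
    # Fresh temporary directory for each archive's extracted file
    exdir <- tempfile("hit_")
    dir.create(exdir)

    # Assumes hit_data.tsv is stored at the archive root
    untar(archives[i], files = "hit_data.tsv", exdir = exdir)

    # A character vector in col_names supplies the column names;
    # the first line of the file is then read as data
    out[[i]] <- read_tsv(file.path(exdir, "hit_data.tsv"), col_names = headers)
  }
  out
}
```

A call such as `VOA <- read_hit_data("C:/Users/Me/Desktop/JUNETEST/", Headers)` followed by `VOA.df <- do.call(rbind, VOA)` would then combine everything once, after the loop has finished. If `Headers` is actually a one-row data frame, as the commented-out `Headers[1,]` in the question suggests it might be, converting it first with `as.character(unlist(Headers[1, ]))` should give `col_names` the character vector it expects.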

pe-perry
  • Thanks kitman0804 for the advice, but according to the error I received, col_names only takes TRUE and FALSE, not other objects. I tried column names = FALSE; still no solution. – Reric Jul 11 '17 at 04:19
  • Oops, I still only read in one column, but at least I don't get an error for the header. I will read the documentation more to see what's wrong. – Reric Jul 11 '17 at 04:31
  • @Reric If `col_names` is a character vector, the values will be used as the names of the columns, and the first row of the input will be read into the first row of the output data frame. – pe-perry Jul 11 '17 at 06:51