1) I have 48 tar.gz files in a directory "C:/Users/Me/Desktop/JUNETEST/". The file names all follow the same format and differ only in a number that represents the hour of the day (e.g. voa_20170601-110000.tar.gz as opposed to voa_20170601-120000.tar.gz). I need to untar all 48 files and pull out just the data file, which has the same name in every ".tar.gz": "hit_data.tsv". I want to store each "hit_data.tsv" as an element in a list of data frames. My code appears to untar all 48 files correctly.
However, the problem is in reading the hit_data.tsv file. It attempts to read in every column, but has problems because the file has no header row, so it ends up reading only the first column. If I were reading each file individually, I would assign colnames(hit_data.tsv) the value of a vector called Headers. Each file has the same column names, so the Headers vector can apply to every hit_data.tsv. My question is, how do I assign the column names to each file during the loop? Or how do I correct my code to read in all columns of the "hit_data.tsv" file?
The code in its current state is below:
files <- list.files(path = "C:/Users/Reginald/Desktop/JUNETEST/", pattern = "tar.gz")
VOA <- length(files)
for (i in files) {
  eval(parse(text = paste0("untar(\"C:/Users/Reginald/Desktop/JUNETEST/", i, "\",files=\"hit_data.tsv\")", sep = "")))
  VOA[i] <- read_tsv("~/hit_data.tsv")
  #VOA[i]<- as.data.frame(VOA[i])
  #colnames(VOA[i])<-Headers[1,]
  VOA.df <- do.call(rbind, VOA[i])
}
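
For comparison, here is a minimal sketch of one way the loop could be restructured. It is an illustration under stated assumptions, not a tested fix for the real data. It uses base R's read.delim() with header = FALSE and col.names = Headers so the column names are applied while each file is read, and it stores each data frame with [[ ]] rather than [ ]. Assigning a whole data frame with VOA[i] <- ... most likely keeps only its first column, which may be what is happening above. The Headers values, the demo_dir folder and the two small archives are hypothetical stand-ins so the example runs on its own. With readr, read_tsv(path, col_names = Headers) should play the same role as read.delim() here.

```r
# Hypothetical column names; the real ones would come from the Headers vector.
Headers <- c("hour", "page", "hits")

# Build two small demo archives (assumption: stand-ins for the 48 real files).
demo_dir <- file.path(tempdir(), "JUNETEST_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_wd <- setwd(demo_dir)
for (hour in c("110000", "120000")) {
  # Write a headerless, tab-separated file, as described in the question.
  write.table(data.frame(hour, c("a", "b"), c(1, 2)), "hit_data.tsv",
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
  tar(paste0("voa_20170601-", hour, ".tar.gz"), files = "hit_data.tsv",
      compression = "gzip")
}
file.remove("hit_data.tsv")
setwd(old_wd)

# The loop itself: one list element per archive.
files <- list.files(demo_dir, pattern = "\\.tar\\.gz$", full.names = TRUE)
VOA <- vector("list", length(files))
names(VOA) <- basename(files)

for (i in seq_along(files)) {
  # Extract into a fresh folder so one archive's file never overwrites another's.
  out_dir <- tempfile("untar_")
  dir.create(out_dir)
  untar(files[i], files = "hit_data.tsv", exdir = out_dir)

  # header = FALSE keeps the first row as data; col.names supplies the names.
  VOA[[i]] <- read.delim(file.path(out_dir, "hit_data.tsv"),
                         header = FALSE, col.names = Headers)
}

# Combine once, after the loop, instead of on every iteration.
VOA.df <- do.call(rbind, VOA)
```

For the real files, demo_dir would be replaced by the JUNETEST folder and the archive-building block at the top dropped. Calling untar() directly with full.names = TRUE paths also removes the need for eval(parse(...)).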