0

I have a folder that includes hundreds of txt files with specific financial data for each company code/year/month combination (for specific companies that are designated by the four digit number in the front of the file). I need a way to loop through all these files and create an xlsx for each one for audit purposes.

For example, the folder will include txt files like this:

8800-2015-01.txt
8800-2015-02.txt
......and many more like this......

All files have the exact same columns, column names, etc. I need to read them in and create xlsx files for each one and save them to the same folder.

In the example above, this folder would then include the txt and xlsx files like this:

8800-2015-01.txt
8600-2015-02.txt
8800-2015-01.xlsx
8800-2015-02.xlsx

I can write code to list out all the files for specific patters using this:

library(writexl)
setwd(path)

list = list.files(pattern = flob2rx('*8800*.txt*')

Then could do something like this:

txt_files <- lapply(list, function(x) read_delim(x,
                                                 "|",
                                                 escape_double = FALSE,
                                                 trim_ws = TRUE,
                                                 skip = 6))

But how would you then loop through keeping the same naming convention for each file and writing all the xlsx to the same folder?


Editing to show what I did. Solutions provided gave me ideas and lead to final of what I used. I also had to do some filtering and select specific columns which wasn't in original post.

library(writexl)

setwd(path)

file_list = list.files(pattern = glob2rx('*880*.TXT*'))

for (i in file_list){
  write_xlsx(
    filter(
      select(
        read_delim(i,
                   "|",
                   skip = 6,
                   escape_double = FALSE,
                   trim_ws = TRUE),
        "col1","col2",......),
      !is.na(col1) & col2 != 'something'),
    paste0(unlist(strsplit(i,"\\."))[1], ".xlsx"))
}
rgh_dsa
  • 107
  • 8

2 Answers2

2

I'm going to assume your files have the columns separated with tab, that is, '\t' in R.

For writing xlsx you're goint o need install.packages("xlsx"). I'm not familiar with writexl

then you can just:

library(xlsx)
setwd(path)
files=dir(pattern="*.txt")
for(i in files){
   write.xlsx(read.csv(i,sep="\t"),paste0(unlist(strsplit(i,"\\."))[1],".xlsx"))
}
1

Alternatively you can do everything at the same time. First create a data.frame containing all the necessary information about the files:

files_inp <- list.files(pattern ='.txt*')
files_out <- gsub(x = files_inp, pattern = ".txt", replacement = ".xlsx")

df_files <- data.frame(files_inp = files_inp, 
                       files_out = files_out, 
                       stringsAsFactors = FALSE)

Now you know all input names and also all output names. Then you can create a function to read a txt and save a xlsx at the same time:

read_and_save <-function(files_dir){
  file_inp <- files_dir[1]
  file_out <- files_dir[2]

  df <- read.table(file = file_inp, sep = "|")
  write_xlsx(x = df, path = file_out)
}

I am using read.table because I don't have your datasets. Now you can apply this function for the entire set of files that you have:

apply(X = df_files, MARGIN = 1, FUN = read_and_save)
Douglas Mesquita
  • 1,011
  • 7
  • 12