I am new to coding and am currently working on a project that requires me to parse NDJSON strings stored in .txt files. I have hundreds of .txt files, each containing up to 1 million NDJSON strings. The code below successfully parses a single file, as long as I explicitly state the name of the .txt input file and the name of the .csv output file:
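library('ndjson')
library('tidyverse')
# read the NDJSON file, flattening each JSON record into one row;
# as_tibble() replaces the deprecated tbl_df()
parsed_df <- as_tibble(ndjson::stream_in("test.txt"))
# keep only the columns of interest, selected by position
selected_df <- parsed_df[,c(3,26,30,51,54,57,76,93,99,125,143,169,173,246,
250,251,253,254,267,269,370,431,432,450)]
write.csv(selected_df, 'test_reduced.csv')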
In the example above, I simply set the working directory to the folder that contains the files.
I now want to repeat this process for all of the files in the folder, rather than manually typing the name of each input file and adjusting the output file name. Each file contains tweets about a specific disaster, so I'd like to give the files logical names such as Nepal01.txt, Nepal02.txt, HurricaneSandy01.txt, etc. The current file names are long, so I plan to rename them, and the process should keep working with the new names. For this reason, I need a dynamic way to select all files that end in .txt and to write each result to a .csv file with a related name, e.g. Nepal_reduced01.csv, Nepal_reduced02.csv, HurricaneSandy_reduced01.csv, etc.
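One way to derive those output names, sketched in base R under the assumption that every renamed file follows a `<Label><Number>.txt` pattern such as `Nepal01.txt`. `make_output_name` is a hypothetical helper introduced here, not part of ndjson or the tidyverse.

```r
# Hypothetical helper: turn "Nepal01.txt" into "Nepal_reduced01.csv".
# Assumes the label comes first and any trailing digits are the file number;
# a name without trailing digits (e.g. "test.txt") becomes "test_reduced.csv".
make_output_name <- function(input_file) {
  base <- tools::file_path_sans_ext(basename(input_file))  # e.g. "Nepal01"
  # insert "_reduced" just before the trailing run of digits (possibly empty)
  paste0(sub("([0-9]*)$", "_reduced\\1", base), ".csv")
}

make_output_name(c("Nepal01.txt", "HurricaneSandy01.txt", "test.txt"))
# returns "Nepal_reduced01.csv" "HurricaneSandy_reduced01.csv" "test_reduced.csv"
```

Because `basename()`, `sub()` and `paste0()` are vectorised, the helper works on a single file name or on the whole vector returned by `list.files()`.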
Below is my failed attempt so far:
library('ndjson')
library('tidyverse')
filenames= list.files(".", ".txt")
for( i in 1:length(filenames) )
parsed_df <- tbl_df(ndjson::stream_in(filenames[1]))
selected_df <- parsed_df[,c(3,26,30,51,54,57,76,93,99,125,143,169,173,246,
250,251,253,254,267,269,370,431,432,450)]
write.csv(selected_df, cbind(i,'.csv'))
})
Below is an image of the error message:
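A minimal sketch of one way the loop could be written, not a definitive fix. In the attempt above, the loop body is not wrapped in `{ }` (so only the `stream_in()` line repeats, and the final `})` closes a brace that was never opened), `filenames[1]` reads the first file on every pass instead of `filenames[i]`, and `cbind(i, '.csv')` builds a matrix rather than a file name. Also, the `pattern` argument of `list.files()` is a regular expression, so `".txt"` matches any name containing a character followed by "txt"; `"\\.txt$"` restricts it to names ending in .txt. The names `reduce_folder`, `keep_cols` and the `reader` argument are hypothetical; `reader` defaults to `ndjson::stream_in` and exists only so the loop can be tried without the ndjson package. Unlike the original, `row.names = FALSE` drops the row-number column that `write.csv()` adds by default.

```r
# Column positions copied from the single-file example above
keep_cols <- c(3, 26, 30, 51, 54, 57, 76, 93, 99, 125, 143, 169, 173, 246,
               250, 251, 253, 254, 267, 269, 370, 431, 432, 450)

# Hypothetical function: reduce every .txt file in `dir` to a CSV next to it
reduce_folder <- function(dir = ".", cols = keep_cols,
                          reader = ndjson::stream_in) {
  # regex anchored to the end of the name: only files ending in ".txt"
  filenames <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)
  outputs <- character(0)
  for (f in filenames) {  # iterate over the files themselves; safe if none exist
    parsed_df <- as.data.frame(reader(f))
    selected_df <- parsed_df[, cols, drop = FALSE]
    # "Nepal01.txt" -> "Nepal_reduced01.csv"; "test.txt" -> "test_reduced.csv"
    base <- tools::file_path_sans_ext(basename(f))
    out <- file.path(dir, paste0(sub("([0-9]*)$", "_reduced\\1", base), ".csv"))
    write.csv(selected_df, out, row.names = FALSE)
    outputs <- c(outputs, out)
  }
  invisible(outputs)  # paths of the CSV files written
}
```

With the working directory set to the folder of renamed files, calling `reduce_folder(".")` should write Nepal_reduced01.csv, HurricaneSandy_reduced01.csv and so on beside the inputs. The output files end in .csv, so rerunning the function does not pick them up as inputs.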