
I'm an R beginner and I'm trying to read in a number of CSV files, remove (or skip) the last 5 rows of each one, and then rbind them together. I can't figure out at which step to remove the rows, or which function to use. I tried readLines (below) and then tried to use nrow, but I'm pretty sure it's in the wrong place.

This was what I started with:

```r
alldata <- do.call(rbind, lapply(list.files(path = "./savedfiles", full.names = TRUE), read.csv))
```

I wasn't sure where to remove the rows in that code, so I split it up to understand it and tried to use readLines:

```r
files <- list.files(path = "./savedfiles", full.names = TRUE)
c <- lapply(files, readLines)  # to count the rows
alldata <- do.call(rbind, lapply(files, nrow = length(f) - 5, full.names = TRUE), read.csv)
```

This just throws an error saying that argument `FUN` is missing, so I know I'm not doing it right, but I'm not sure how to fix it.
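For reference, the readLines() idea from the attempt above can also work if the line count is used inside a function passed to lapply() and handed to read.csv()'s `nrows` argument. This is a minimal sketch, assuming each file has exactly one header line and no quoted fields that span several lines (either would throw the count off). The helper name `read_without_last` and the temporary demo files are hypothetical, made up only so the example runs on its own.

```r
# Hypothetical helper: count the lines with readLines(), then ask read.csv()
# for only the rows we want to keep via its nrows argument.
# Assumes one header line and no multi-line quoted fields.
read_without_last <- function(path, n_drop = 5) {
  n_lines <- length(readLines(path))  # total lines, including the header
  n_keep <- n_lines - 1 - n_drop      # data rows minus the rows to drop
  if (n_keep <= 0) {
    # read.csv() ignores non-positive nrows, so return an empty frame explicitly
    return(read.csv(path, nrows = 1)[0, , drop = FALSE])
  }
  read.csv(path, nrows = n_keep)
}

# Made-up demo files standing in for ./savedfiles
demo_dir <- file.path(tempdir(), "savedfiles_readlines_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:8, value = letters[1:8]),
          file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:10, value = letters[1:10]),
          file.path(demo_dir, "b.csv"), row.names = FALSE)

all_files <- list.files(path = demo_dir, full.names = TRUE)
alldata <- do.call(rbind, lapply(all_files, read_without_last))
nrow(alldata)  # 3 rows from a.csv + 5 rows from b.csv = 8
```

Reading every file twice (once for readLines(), once for read.csv()) is the price of this approach; reading the whole file and trimming it afterwards avoids the double read.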

Ronak Shah
  • I would do this in a couple steps. `alldata<-data.frame(); for (i in filelist){input<-read.csv(i); #todo cut off last five rows from input; alldata<-rbind(alldata, input);}` – M.Viking Sep 12 '19 at 00:20
  • It may be easier to read all the CSV files into a data frame with a column containing ID for each file. Then you can `dplyr::group_by` the ID and remove the last 5 lines using _e.g._ `dplyr::slice`. – neilfws Sep 12 '19 at 00:24
  • Also, you might like the enhanced functions `dplyr::bind_rows()` and `readr::read_csv()`. – M.Viking Sep 12 '19 at 00:26
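A minimal sketch of the "one data frame with a file ID column" suggestion, assuming dplyr is installed. `bind_rows(.id = ...)` turns the names of a list of data frames into an ID column, and `slice()` then runs once per file group. The column name `source_file` and the temporary demo files are made up for illustration; `max(n() - 5, 0)` is an added guard so a file with fewer than 5 rows yields no rows instead of an error.

```r
library(dplyr)

# Made-up demo files standing in for ./savedfiles
demo_dir <- file.path(tempdir(), "savedfiles_group_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:7), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(x = 1:9), file.path(demo_dir, "b.csv"), row.names = FALSE)

all_files <- list.files(path = demo_dir, full.names = TRUE)

alldata <- all_files %>%
  setNames(basename(all_files)) %>%       # list names become the file ID
  lapply(read.csv) %>%
  bind_rows(.id = "source_file") %>%      # adds an ID column from the names
  group_by(source_file) %>%
  slice(seq_len(max(n() - 5, 0))) %>%     # keep all but the last 5 rows per file
  ungroup()

nrow(alldata)  # 2 rows from a.csv + 4 rows from b.csv = 6
```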

2 Answers


Something like this should put you on the right track. It reads each file, removes its last 5 rows, and finally binds them together. I would also suggest not using variable names that might conflict with function names; `c`, for example, is a function in base R. Here, I am using `all_files` instead of `files`.

```r
all_files <- list.files(path = "./savedfiles", full.names = TRUE)

do.call(rbind, # assuming columns match 1:1; use dplyr::bind_rows() if not 1:1
  lapply(all_files, function(x) {
    head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -5) # change as per needs
  })
)
```
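To see the approach end to end without real data, here is a self-contained run against a few made-up files in a temporary folder (the folder and file contents are assumptions for illustration). `c.csv` deliberately has fewer than 5 rows: `head()` with a negative `n` simply returns a 0-row data frame for it, so `rbind()` still succeeds.

```r
# Made-up demo files standing in for ./savedfiles; c.csv has only 3 rows
demo_dir <- file.path(tempdir(), "savedfiles_head_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:8), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:6), file.path(demo_dir, "b.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3), file.path(demo_dir, "c.csv"), row.names = FALSE)

all_files <- list.files(path = demo_dir, full.names = TRUE)

alldata <- do.call(rbind, lapply(all_files, function(x) {
  # head() with a negative n keeps every row except the last 5;
  # for a file with 5 or fewer rows it returns a 0-row data frame
  head(read.csv(x, header = TRUE, stringsAsFactors = FALSE), -5)
}))

nrow(alldata)  # 3 (a.csv) + 1 (b.csv) + 0 (c.csv) = 4
```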
Shree
  • thank you!! The function in the lapply was the thing I was missing. This worked great, I wasn't sure if I could do negative in head() either (I actually tried tail() as well but that didn't work). Also thanks for calling out the lazy naming! – vuvuzelling Sep 12 '19 at 01:38
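On the head() vs tail() point in the comment above: both accept a negative `n`, but they trim opposite ends. A negative `n` in head() drops rows from the end, while in tail() it drops rows from the start, which is why tail() did not give the wanted result here. A small illustration on a made-up 8-row data frame:

```r
example_df <- data.frame(id = 1:8)

head(example_df, -5)$id  # drops the LAST 5 rows:  1 2 3
tail(example_df, -5)$id  # drops the FIRST 5 rows: 6 7 8
tail(example_df, 5)$id   # keeps only the last 5:  4 5 6 7 8
```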

Using tidyverse functions, you can do:

```r
library(purrr)
library(dplyr)

all_files <- list.files(path = "./savedfiles", full.names = TRUE)
map_df(all_files, ~read.csv(.x) %>% slice(seq_len(n()-5)))
```
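One caveat with `slice(seq_len(n()-5))`: if a file has fewer than 5 rows, `n() - 5` is negative and seq_len() stops with an error. A sketch of a variant that tolerates short files by filtering on row position instead, assuming purrr and dplyr are installed; the demo folder and its contents are made up for illustration.

```r
library(purrr)
library(dplyr)

# Made-up demo files; b.csv has fewer than 5 rows
demo_dir <- file.path(tempdir(), "savedfiles_purrr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:7), file.path(demo_dir, "a.csv"), row.names = FALSE)
write.csv(data.frame(id = 1:3), file.path(demo_dir, "b.csv"), row.names = FALSE)

all_files <- list.files(path = demo_dir, full.names = TRUE)

alldata <- map_df(all_files, ~ read.csv(.x) %>%
  # keep rows at positions up to n() - 5; for a file with 5 or fewer
  # rows this keeps nothing instead of raising an error
  filter(row_number() <= n() - 5))

nrow(alldata)  # 2 rows from a.csv + 0 rows from b.csv = 2
```

In purrr 1.0.0 and later, `map_df()` is superseded (though still available); `list_rbind(map(all_files, ...))` is the suggested replacement.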
Ronak Shah