
I am using read_bulk (from the readbulk package) to read in a large number of CSV files.

  dfc <- data.frame(read_bulk(directory = "C:/place/with/data", 
            subdirectories = FALSE, 
            extension = ".csv",
            data = NULL,
            verbose = TRUE, 
            fun = utils::read.csv, stringsAsFactors = FALSE, is.na(" ")))

  names(dfc) <- c("Headers", "I", "Want", "Instead")

  write_csv(dfc, path = paste("Data"," ",Sys.Date(),".csv"))

This works fine, but I'd like the headers to be removed. Passing headers = FALSE does not work in read_bulk. I thought this would be a simple fix by doing

  dfc %>%
     filter(Headers != "undesirable headers from read_bulk") 

after I assign the names, but this has not worked. I also tried str_extract_all on the "undesirable headers from read_bulk", but that hasn't worked either.

str() shows that all of the columns are character. However, after read_bulk the first column header of every file has stray characters before the column name. Is this an encoding problem? Is it causing my data not to be filtered?

Dummy data:

  CSV Dataset 1           CSV Dataset2              ...etc more datasets

  Facility ID Status      Facility ID Status
  abc      1  A           def      5  A
  efg      2  B           lmo      8  B
  hij      3  A           pqr      9  C
  abc      4  B           xyz      7  B

R output after read_bulk of the dummy data:

  Facility ID Status
  abc            1  A
  efg            2  B
  hij            3  A
  abc            4  B
  Facility ID Status
  def            5  A
  lmo            8  B
  pqr            9  C
  xyz            7  B

I would like to remove these repeated header rows from my data set.

Sean
  • Can you use `readr::read_csv` with either `col_names = FALSE` or `skip = 1`? – Tung Sep 19 '18 at 20:31
  • See this [answer](https://stackoverflow.com/a/48105838/786542). I updated it to include your case – Tung Sep 19 '18 at 20:43
  • `col_names` did not work; `skip = 1` worked for the first file but returned NAs for the rest of my data. I will try `fread()` and `purrr::map_df()` as suggested in the above answer. – Sean Sep 19 '18 at 20:51
  • There must be something else going on because I've just tested with the `csv` I have and it worked fine. The data frame had `fileName X1 X2 X3 X4 X5 X6 X7 ....` instead of normal header name – Tung Sep 19 '18 at 20:54
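
The `skip = 1` idea from the comments can be sketched in base R without read_bulk. This is only a minimal sketch under stated assumptions: the temporary directory, the two demo files, the `read_one` helper, and the three column names are all hypothetical stand-ins for the real data. Each file is read with its own header row skipped and the desired names supplied directly, so no header row (and nothing stuck to the front of it) can end up in the data.

```r
# Hypothetical demo data: two small CSV files in a fresh temporary directory.
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
writeLines(c("Facility,ID,Status", "abc,1,A", "efg,2,B"),
           file.path(demo_dir, "dataset1.csv"))
writeLines(c("Facility,ID,Status", "def,5,A", "lmo,8,B"),
           file.path(demo_dir, "dataset2.csv"))

# Collect the full paths of every .csv file in the directory.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: read one file, dropping its header line and
# assigning the column names we actually want.
read_one <- function(path) {
  utils::read.csv(path,
                  header = FALSE,            # do not treat the first line read as names
                  skip = 1,                  # skip the file's own header line
                  col.names = c("Facility", "ID", "Status"),
                  colClasses = "character",  # keep every column as character
                  stringsAsFactors = FALSE)
}

# Read every file and stack the results into one data frame.
dfc <- do.call(rbind, lapply(files, read_one))
print(dfc)
```

Because the header line is skipped per file rather than filtered out afterwards, this does not depend on the header text matching exactly. If it returns NAs on the real files, comparing the column count and separator of each file would be a reasonable next check.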
