I really just need another set of eyes on this code. As you can see, I am searching for files with the pattern "*45Fall...". But every time I run it, it pulls up the files "*45Sum..." and one time pulled up the "*45Win..." files. It seems totally random and the code clearly asks for Fall. I'm confused.

What I am doing is importing all files matching "Fall_2040301" (many other numbers are associated with "Fall", many other names with "*Fall_2040301", and there are also Win, Spr, and Sum files). I truncate each file to 56 lines by removing the last 84 lines, then bind them together so that I can write them out as a group.

fnames <- dir("~/Desktop/RprojPuddle/modified_files", pattern = "*45Fall_2040301.csv")
read_data <- function(z){
   dat <- fread(z, skip = 0, select = 1:3, )
   return(dat[1:(nrow(dat)-84),])
}

datalist <- lapply(fnames, read_data)
bigdata <- rbindlist(datalist, use.names = T)
datalist = do.call("rbind", bigdata)
datalist

splitByHUCs <- split(bigdata, f = bigdata$HUC8 , sep = "\n", lex.order = TRUE)

saveFun_WRITE <- function(splitByHUCs, name_i) {
   fwrite(splitByHUCs, file = paste0("~/Desktop/RprojPuddle/splitByHUCs/b02040301/splFall/", name_i, ".csv")) # save the file to the computer
}
mapply(FUN = saveFun_WRITE, splitByHUCs, name_i = names(splitByHUCs), SIMPLIFY = FALSE)

I used the same code used for the other seasons and it worked well. I know that it's Frankenstein code and I would welcome suggestions for tightening it up also, but really, it's working. Except for Fall. Thanks.
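
For anyone tightening up a pipeline like the one above, here is a minimal sketch of the same steps (list, read, truncate, bind, split, write). It is not the original code: the folder, file contents, and column names are made up for the demo, and base R readers and writers stand in for `fread`, `rbindlist`, and `fwrite` so it runs without extra packages. One thing worth checking in the original is that `dir()` returns bare file names unless `full.names = TRUE` is set, so the files are read relative to the current working directory.

```r
# Hypothetical demo folder with two small CSVs standing in for the real files
in_dir <- file.path(tempdir(), "modified_files")
dir.create(in_dir, showWarnings = FALSE)
for (m in c("bcc1", "ccsm4")) {
  demo <- data.frame(HUC8 = rep(c(2040301, 2040302), times = 70),
                     model = m, value = 1:140)
  write.csv(demo, file.path(in_dir, paste0(m, "_45Fall_2040301.csv")),
            row.names = FALSE)
}

# full.names = TRUE returns paths that can be read from any working directory
fnames <- list.files(in_dir, pattern = "45Fall_2040301\\.csv$", full.names = TRUE)

# Read one file and keep its first 56 rows
read_data <- function(path) head(read.csv(path), 56)

# Stack all truncated files into one data frame
bigdata <- do.call(rbind, lapply(fnames, read_data))

# Write one CSV per HUC8 value (output folder is also a demo assumption)
out_dir <- file.path(tempdir(), "splFall")
dir.create(out_dir, showWarnings = FALSE)
pieces <- split(bigdata, bigdata$HUC8)
for (nm in names(pieces)) {
  write.csv(pieces[[nm]], file.path(out_dir, paste0(nm, ".csv")), row.names = FALSE)
}
```

Printing `fnames` before reading anything is a cheap way to confirm which files will actually be opened.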

  • It sounds like you've isolated the problem to the filenames. Can you please share 2 or 3 example file names you are trying to match, and 2 or 3 that are matching that you do not want to match? It would be most helpful if you could put them in a single character vector. – Gregor Thomas Jun 08 '21 at 04:05
  • At a glance, the pattern `"*45Fall_2040301.csv"` looks like bad regex - `*` is a quantifier (0 or more) that must have something before it to quantify. (Regex is stricter and more powerful than basic wildcard syntax.) I'd suggest changing to `".*45Fall_..."`, using the `.` special character, which means "any character". – Gregor Thomas Jun 08 '21 at 04:08
  • What does `fnames` return? – Ronak Shah Jun 08 '21 at 04:46
  • @RonakShah - `fnames` returns this: `> fnames [1] "bcc1_45Fall_1020004.csv" "bcc1_M_45Fall_1020004.csv" "bnuesm_45Fall_1020004.csv" "canesm2_45Fall_1020004.csv" "ccsm4_45Fall_1020004.csv" "cnrmcm5_45Fall_1020004.csv" ...` and then some. – David Montana Jun 08 '21 at 22:57
  • @GregorThomas - I've shared (above) some of the names. I am new and don't know what you mean by putting them in a single character vector. I searched and found this `stringcsvs <- as.vector(str_split_fixed(fnames, pattern = "", n = nchar(fnames)))` (I put `fnames` in), but it returned more than 628 single characters in quotes. I can't put that in a comment. – David Montana Jun 08 '21 at 23:01
  • @GregorThomas - I tried `".*45Fall_..."` and it is still calling the "Sum" version of the files: `bcc1_45Sum_2040301, bcc1_M_45Sum_2040301, bnuesm_45Sum_2040301, canesm2_45Sum_2040301, ccsm4_45Sum_2040301, cnrmcm5_45Sum_2040301` I don't understand it. `fnames` calls the Fall version. BTW, the numbers are wrong in my comment above, I ran an earlier version. (Which *worked* for Fall at that time.) – David Montana Jun 08 '21 at 23:13
  • I am also getting this error, which I do not understand: `Error in [.data.table(dat, 1:(nrow(dat) - 84), ) : Item 3 of i is -1 and item 1 is 1. Cannot mix positives and negatives.` But now ANY section of the code that I run returns the same "...* 45Sum_2040301" files, no matter which numbers or season I call! – David Montana Jun 08 '21 at 23:29
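
To illustrate the regex point raised in the comments, here is a small sketch with hypothetical file names modelled on the ones quoted in the thread. It shows two ways to match only the Fall files: an explicit regular expression with the dot escaped and the end anchored, and `glob2rx()` from the `utils` package, which translates a wildcard pattern into a regex.

```r
# Hypothetical file names, modelled on those quoted in the comments
files <- c("bcc1_45Fall_2040301.csv",
           "bcc1_45Sum_2040301.csv",
           "bcc1_M_45Fall_2040301.csv",
           "bcc1_45Fall_1020004.csv")

# Explicit regex: "\\." is a literal dot, "$" anchors the end of the name
fall_regex <- "45Fall_2040301\\.csv$"
matched <- files[grepl(fall_regex, files)]

# Wildcard form translated to a regex by utils::glob2rx()
matched_glob <- files[grepl(glob2rx("*45Fall_2040301.csv"), files)]

print(matched)
print(matched_glob)
```

Either pattern can be passed as the `pattern` argument of `dir()` or `list.files()`.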

1 Answer

Ok, it doesn't seem to matter whether I use ".45Fall_2222" or "*45Fall_2222", both return the same result. The problem turned out to be with the read_data function. I had originally tried this:

read_data <- function(z){
   dat <- fread(z, skip = 0, select = 1:3, )
   return(dat[1:(nrow(dat)-84),])
}

When I changed it to be a positive number (below) it now works fine, for all inputs.

read_data <- function(z){
  dat <- fread(z, skip = 0, select = 1:3)
  return(dat[1:56])  # keep the first 56 rows
}
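
A likely reason the first version failed, sketched under assumptions: the error message says item 3 of the index is -1 and item 1 is 1, which is exactly what `1:(nrow(dat) - 84)` produces when a file has only 83 rows, because `1:(-1)` counts down through 1, 0, -1. The toy data frame below is a hypothetical stand-in for such a short file, and base R is used so the sketch runs without `data.table`. Since the error stops the script, objects left over from an earlier run may be what kept showing the "Sum" results, though that part is a guess.

```r
# Toy stand-in for a file with only 83 rows (hypothetical short file)
short <- data.frame(id = 1:83)

# 83 - 84 is -1, so the "range" counts down: 1, 0, -1
idx <- 1:(nrow(short) - 84)
print(idx)

# Mixing positive and negative row indices is an error
res <- tryCatch(short[idx, , drop = FALSE], error = function(e) "error")
print(res)

# head() handles both intents without building an index by hand
first56 <- head(short, 56)      # keep the first 56 rows
all_but_84 <- head(short, -84)  # drop the last 84 rows; none remain here
print(nrow(first56))
print(nrow(all_but_84))
```

`head()` also works on a `data.table`, so `head(dat, 56)` should be a drop-in alternative inside `read_data`.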

Thanks all.