1

This code reads a xlsx file and creates individualy named files based on sheet number and a value found at a particular location (in this case temp[2,1]). However because each file and sheet is slightly different the names are inconsistant.

sheet_to_read <- c(11,12,13,14)
for( excelsheet in files) {
for (sheet in sheet_to_read) {
temp <- read_excel( path = excelsheet, sheet = sheet, col_names = FALSE)
write.csv( temp, file = paste0( "./", gsub("./", "", excelsheet), temp[2,1], sheet,".csv") )
}}

I would like is a way of naming the files with TRUE or FALSE if a specific character string is present any where within the sheet, in this case 'vivax'. So in this case:

GBD2016_2_915_Boletin Epidemiologico_2016_37sheet21true.xls
GBD2016_2_915_Boletin Epidemiologico_2016_37sheet22false.xls`

Example file: https://drive.google.com/file/d/1p4HAuFl7Codine1Vvb8SzA7OHTzraaHz/view?usp=sharing

oguz ismail
  • 1
  • 16
  • 47
  • 69
Bohnston
  • 69
  • 10

1 Answers1

1

Since you have a tibble, and it is not known which column has which type, I created this:

isWordInTibble <- function(word, tibble) {
  # case insensitive
  any(unlist(
    sapply(1:ncol(tibble), 
           function(i) tolower(word) %in% tolower(as.character(unlist(tibble[, i]))))))
}

It looks whether the word is in any of the column vectors - looks through all columns.

Replace your file argument in your write.csv command by:

file = gsub(".xls", 
            paste0(substr(temp[2, 1], 
                          1, 
                          5), # just first 5 letters
                   gsub("sheet", "", sheet), 
                   substr(tolower(as.character(isWordInTibble("vivax", tmp))),
                          1, 
                          1), # just first letter ("t" or "f") 
                   ".csv"), 
            excelsheet)

Then it might work!

I tried to shorten the names using substr(, start, end) and gsub().

Appendix

Since you asked how to print then only the files containing the word but not those which don't contain it:

In your example, instead of the write.csv() command, replace it with:

if (isWordInTibble("vivax", tmP)) {
  write.csv(temp, 
            file = gsub(".xls", 
                        paste0(substr(temp[2, 1], 
                                      1, 
                                      5), # just first 5 letters
                               gsub("sheet", "", sheet), 
                               substr(tolower(as.character(isWordInTibble("vivax", temp))),
                                      1, 
                                      1), # just first letter ("t" or "f") 
                               ".csv"), 
                        excelsheet))
}

Then it prints out only if the isWordInTibble returns TRUE.

Gwang-Jin Kim
  • 9,303
  • 17
  • 30
  • I will do some testing with more files but it seems to work. – Bohnston Oct 18 '18 at 16:34
  • Thanks! It looks more complex than the other solution - but it is because I wanted to catch the cases where the word is in a column typed as `factor`. – Gwang-Jin Kim Oct 18 '18 at 16:37
  • The only problem I have with it, is it tries to save really really long file names, is there a way to limit this but still keep the sheet number and true or false? – Bohnston Oct 18 '18 at 16:46
  • you can replace things and shorten things by `gsub()` since it can do regex. What you want to shorten? - can you give some examples? – Gwang-Jin Kim Oct 18 '18 at 16:48
  • 1
    If you want to limit `tmp[2, 1]` to let's say 10 characters, you could do: `substr(temp[2, 1], 1, 10)` – Gwang-Jin Kim Oct 18 '18 at 16:50
  • How about the version now? – Gwang-Jin Kim Oct 18 '18 at 16:53
  • Sometime temp[2,1] is a really long string so when it tries to save it as a file name it has an error, I guess becuase there is a cahracter limit? – Bohnston Oct 18 '18 at 16:57
  • 1
    yes, file names hould not be too long. I limited in the code above to 7 charcters. You can adjust it as you want. And with `gsub` I remove "sheet". try the new `file=` definition above!) – Gwang-Jin Kim Oct 18 '18 at 16:59
  • Where do I put arguments to ignore case etc? – Bohnston Oct 22 '18 at 15:00
  • `tolower()` makes that everything is lower case and since `"vivax"` is lower case, this in endeffect is the case ignoring. – Gwang-Jin Kim Oct 22 '18 at 16:21
  • how can I stop it creatingfiles that end in false? ie only make files with the word in them? – Bohnston Nov 21 '18 at 12:03
  • 1
    @Bohnston: see my modified answer. – Gwang-Jin Kim Nov 22 '18 at 16:02
  • it works. I don't think the `tolower` works as if i search any other term its case sensitive. Ie it wouldn't find Vivax – Bohnston Nov 23 '18 at 09:15
  • @Bohnston: I added `tolower` in `isWordInTibble` definition to make it case insensitive. The `tolower` in `write.csv` was to make `T` and `F` to lower case and not for the search. – Gwang-Jin Kim Nov 23 '18 at 12:30
  • Oh that makes sense! Is it possible to make the search not case sensitive? – Bohnston Nov 23 '18 at 14:05
  • isn't it not case sensitive? if you compare two strings which were made lowercase by `tolower` - this is how case insensitivity is realized actually - you see that trick quite often. The tolower from before was not for the comparison but for the output - that is why it had no influence. – Gwang-Jin Kim Nov 24 '18 at 21:14