2

I've got a dataset with feedback comments on multiple criteria from a customer survey conducted on many sites, where each row represents a single response.

For simplicity's sake, I have simplified the original dataset and produced a reproducible dataframe with comments for only three sites.

The criteria are listed from columns 4 - 10.

comments = data.frame(RESPONDENT_ID=c(1,2,3,4,5,6,7,8),
             REGION=c("ASIA","ASIA","ASIA","ASIA","ASIA","EUROPE","EUROPE","EUROPE"),
             SITE=c("Tokyo Center","Tokyo Center","Tokyo Center","PB Tower","PB Tower","Rome Heights","Rome Heights","Rome Heights"),
             Lighting=c("Dim needs to be better","","Good","I don't like it","Could be better","","",""),
             Cleanliness=c("","very clean I'm happy","great work","","disappointed","I like the work","","nice"),
             Hygiene=c("","happy","needs improvement","great","poor not happy","nice!!","clean as usual i'm never disappointed",""),
             Service=c("great service","impressed","could do better","","","need to see more","cant say","meh"),
             Punctuality=c("always on time","","loving it","proper and respectful","","","punctual as always","delays all the time!"),
             Efficiency=c("generally efficient","never","cannot comment","","","","","happy with this"),
             Motivation=c("always very motivated","driven","exceeds expectations","","poor service","ok can do better","hmm","motivated"))

I've got a second dataset, which contains the bottom 3 scoring criteria for each of the three sites.

bottom = data.frame(REGION=c("ASIA","ASIA","EUROPE"),
             SITE=c("Tokyo Center","PB Tower","Rome Heights"),
             BOTTOM_1=c("Lighting","Cleanliness","Motivation"),
             BOTTOM_2=c("Hygiene","Service","Lighting"),
             BOTTOM_3=c("Motivation","Punctuality","Cleanliness"))                 

My Objective:

1) From the comments dataframe, for each SITE, I'd like to filter the bottom dataframe, and extract the comments for the bottom 3 criteria per site only.

2) Based on this extraction, for each unique SITE, I'd like to create an Excel file with three sheets, each sheet named after the bottom 3 criteria for that given site.

3) Each Sheet would contain a list of comments extracted for that particular site.

4) I'd like all Excel files saved in the format:

REGION_SITE_Comments2017.xlsx

Desired Final Output:

3 Excel files (or as many files as there are unique sites), each Excel file having three tabs named after their bottom 3 criteria, and each sheet with a list of comments corresponding to the given criterion for that site.

So as an example, one of the three files generated would look like this:

  • The file name would be ASIA_TokyoCenter_Comments2017.xlsx
  • The file would contain 3 sheets, "Lighting","Hygiene" & "Motivation" (based on the three bottom criteria for this site)
  • Each of these sheets would contain their respective site-level comments.

My Methodology:

I tried using a for loop on the comments dataframe, and filtering the bottom dataframe for each site listed.

Then using the write.xlsx function from the xlsx package to generate the Excel files, with the sheetName argument set to each of the bottom three citeria per site.

However I cannot seem to get the desired results. I have searched on Stackoverflow for similar solutions, but haven't found anything yet.

Any help with this would be highly appreciated!

Varun
  • 1,211
  • 1
  • 14
  • 31

1 Answers1

1

This probably can be formatted better... But for each level in Region and Site, for each 'bottom', we extract each independent combination and write to file.

bottom <- sapply(bottom, as.character) # Get out of factors.
sp <- split(comments, comments$REGION) # Split data into a list format for ease.
for(i in unique(bottom[,1])){
   for(j in unique(bottom[,2])){
       x <- sp[[1]][sp[[i]][,3]==j,]
       y <-  x[,colnames(x)%in%bottom[bottom[,1]==i& bottom[,2]==j,3:5]]
       for(q in colnames(y)){
       if(nrow(x) > 0) {
         write.xlsx(x=y[,q],
                    file=paste(i,j, 'Comments2017.xlsx', sep='_'),
                    sheetName=q, append=T)
       }
     }
   }
 }

Is this what you were looking for?

TJGorrie
  • 386
  • 3
  • 13
  • This seems quite close, but the final files do not seem to be in the right format. After testing your code, I see that each sheet is not named after a single 'bottom' criterion. And each sheet seems to contain 3 criteria instead of one. And these criteria do not seem to correspond to those listed in the bottom dataframe for the given site. – Varun Nov 29 '17 at 10:29
  • @Varun Your question (regarding how the results should look) wasn't exactly clear in the first place. I have updated my answer so that the code now produces spreadsheet according to bottom_3 and separates each question into its own sheet. I am afraid you will either need to be more clear regarding what you want. Or use the frame-work I have provided to generate the output you want. – TJGorrie Nov 29 '17 at 10:46
  • Thanks @TJGorrie, and apologies if I wasn't clear in my explanation. I've updated the original question to include an example under the Desired Output part. I tested your updated code, now I have the three tabs named after their respective tabs. This is exactly what I wanted. However the three criteria don't seem to correspond exactly to those listed in the 'bottom' dataframe for a given site, i.e. I seem to have some criteria that don't belong in the 'bottom' dataframe for a given site. – Varun Nov 29 '17 at 10:52
  • it seems to be giving me an error message: Error: $ operator is invalid for atomic vectors – Varun Nov 29 '17 at 11:53
  • @Varun I have fixed the code. It runs on my end perfectly. – TJGorrie Nov 29 '17 at 12:06
  • One last question for you @TJGorrie. What does this line signify x <- sp[[1]][sp[[i]][,3]==j,] and how would I adapt the code if I were to increase the 'bottom' criteria to 5, instead of 3? – Varun Nov 29 '17 at 15:11
  • @Varun if you want the bottom five you will change this line: y <- `x[,colnames(x)%in%bottom[bottom[,1]==i& bottom[,2]==j,3:5]]` to 3:n (e.g. 3:7). The line you have quoted is simply extracting the rows that satisfy Region and Site. – TJGorrie Nov 29 '17 at 15:38
  • Thanks again @TJGorrie. Each sheet contains the comments I need, but also contains rows with empty cells (no comments). Is there a way the code can be modified to exclude those rows that contain blanks? – Varun Nov 29 '17 at 17:45