I'm trying to generate sentences from a data frame. Below is the data frame:

# Code
mycode <- c("AAABBB", "AAABBB", "AAACCC", "AAABBD")
mycode <- sample(mycode, 20, replace = TRUE)

# Date
mydate <- c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20")
mydate <- sample(mydate, 20, replace = TRUE)

# Resort
myresort <- c("GB", "IE", "GR", "DK")
myresort <- sample(myresort, 20, replace = TRUE)

# Number of holidaymakers
HolidayMakers <- sample(1000, 20, replace = TRUE)

mydf <- data.frame(mycode,
                  mydate,
                  myresort,
                  HolidayMakers)

So, taking mycode as an example, I want to create a sentence like "For the code mycode, the biggest destinations are myresort where the top days of visiting were mydate with a total of HolidayMakers".

Assume there are multiple rows per code. Instead of one sentence per mydate and myresort, I want a single sentence per code, something like:

"For the code AAABBB, the biggest destinations are GB,GR,DK,IE where the top days of visiting were 2016-10-17,2016-10-18,2016-10-19 with a total of 650"

The 650 would be the sum of all the holidaymakers for all those countries on those days, per mycode.

Can anyone help?

Thank you for your time.

John Smith
  • When you say the "top days of visiting", do you perform any kind of calculation? Top 3 days by HolidayMakers, most frequent days...? Also, for `AAABBB`, why isn't 2016-10-20 included? – Steven Beaupré Oct 26 '16 at 10:55
  • Hi @StevenBeaupré. The dataframe is the end result of the calculation, which contains the top 3. For the purposes of explanation I thought it would be better just to have a general solution, as days change and coupon numbers do too, and not all days will be present. – John Smith Oct 26 '16 at 11:07

1 Answer

You could try:

library(dplyr)
res <- mydf %>%
  group_by(mycode) %>%
  summarise(d = toString(unique(mydate)), 
            r = toString(unique(myresort)), 
            h = sum(HolidayMakers)) %>%
  mutate(s = paste("For the code", mycode, 
                   "the biggest destinations are", r, 
                   "where the top days of visiting were", d, 
                   "with a total of", h))

Which gives:

> res$s

#[1] "For the code AAABBB the biggest destinations are GB, GR, IE, DK 
#     where the top days of visiting were 2016-10-17, 2016-10-18, 
#     2016-10-20, 2016-10-19 with a total of 6577"
#[2] "For the code AAABBD the biggest destinations are IE 
#     where the top days of visiting were 2016-10-17, 2016-10-18 
#     with a total of 1925"                                    
#[3] "For the code AAACCC the biggest destinations are IE, GR, DK 
#     where the top days of visiting were 2016-10-20, 2016-10-17, 
#     2016-10-19, 2016-10-18 with a total of 2878"    

Note: Since you didn't provide any guidance as to how you intend to calculate the "top visiting days", I simply included all days. You could easily edit the above to fit your actual situation.
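
As one possible way to adapt this, here is a minimal sketch that keeps only the top days per code before building the sentence. It assumes "top days" means the 3 dates with the highest summed HolidayMakers within each mycode, which is only a guess at the intended rule. The names top_n_days, day_totals, top_days and res_top, and the set.seed() call, are hypothetical additions for illustration. It also assumes dplyr 1.0.0 or later for slice_max() and the .groups argument.

```r
library(dplyr)

set.seed(42)  # hypothetical seed, only so the sample data is reproducible

# Sample data, built the same way as in the question
mydf <- data.frame(
  mycode        = sample(c("AAABBB", "AAABBB", "AAACCC", "AAABBD"), 20, replace = TRUE),
  mydate        = sample(c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20"),
                         20, replace = TRUE),
  myresort      = sample(c("GB", "IE", "GR", "DK"), 20, replace = TRUE),
  HolidayMakers = sample(1000, 20, replace = TRUE)
)

top_n_days <- 3  # assumed cut-off; change to suit the real rule

# Total holidaymakers per code and date
day_totals <- mydf %>%
  group_by(mycode, mydate) %>%
  summarise(day_total = sum(HolidayMakers), .groups = "drop")

# Keep at most top_n_days dates per code, ranked by that total
top_days <- day_totals %>%
  group_by(mycode) %>%
  slice_max(day_total, n = top_n_days, with_ties = FALSE) %>%
  ungroup()

# Restrict the original rows to those dates, then build one sentence per code
res_top <- mydf %>%
  semi_join(top_days, by = c("mycode", "mydate")) %>%
  group_by(mycode) %>%
  summarise(d = toString(sort(unique(mydate))),
            r = toString(unique(myresort)),
            h = sum(HolidayMakers)) %>%
  mutate(s = paste("For the code", mycode,
                   "the biggest destinations are", r,
                   "where the top days of visiting were", d,
                   "with a total of", h))

res_top$s
```

With this variant, the resorts and the total only cover rows that fall on the retained dates, so h can be smaller than the per-code total from the version above.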

Steven Beaupré