Building Sentences from a dataframe in R

Question

I'm trying to generate sentences from a dataframe. Below is the dataframe:

# Code
mycode <- c("AAABBB", "AAABBB", "AAACCC", "AAABBD")
mycode <- sample(mycode, 20, replace = TRUE)

# Date
mydate <- c("2016-10-17", "2016-10-18", "2016-10-19", "2016-10-20")
mydate <- sample(mydate, 20, replace = TRUE)

# Resort
myresort <- c("GB", "IE", "GR", "DK")
myresort <- sample(myresort, 20, replace = TRUE)

# Number of holidaymakers
HolidayMakers <- sample(1000, 20, replace = TRUE)

mydf <- data.frame(mycode,
                  mydate,
                  myresort,
                  HolidayMakers)

So if we take mycode as an example, I want to create a sentence like "For the code mycode, the biggest destinations are myresort where the top days of visiting were mydate with a total of HolidayMakers".

There are multiple rows per code. What I want is a single sentence per code: instead of having one sentence per mydate and myresort, I would like to say something like

"For the code AAABBB, the biggest destinations are GB,GR,DK,IE where the top days of visiting were 2016-10-17,2016-10-18,2016-10-19 with a total of 650"

The 650 would be the sum of all the holidaymakers across those countries and those days for that mycode.

Can anyone help?

Thank you for your time

When you say the "top days of visiting", do you perform any kind of calculation? Top 3 days by HolidayMakers, most frequent days...? Also, for `AAABBB`, why isn't 2016-10-20 included? — Steven Beaupré, Oct 26 '16 at 10:55
Hi @StevenBeaupré. The dataframe is the end result of the calculation, which already contains the top 3. For the purposes of explanation I thought it would be better to have a general solution, as days change and coupon numbers do too, and not all days will be present — John Smith, Oct 26 '16 at 11:07

score 2 · Accepted Answer · answered Oct 26 '16 at 11:06

You could try:

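library(dplyr)

res <- mydf %>%
  # one output row (and so one sentence) per code
  group_by(mycode) %>%
  # collapse the distinct dates and resorts into comma-separated strings
  summarise(d = toString(unique(mydate)),
            r = toString(unique(myresort)),
            h = sum(HolidayMakers)) %>%
  # assemble the sentence from the collapsed pieces
  mutate(s = paste("For the code", mycode,
                   "the biggest destinations are", r,
                   "where the top days of visiting were", d,
                   "with a total of", h))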

Which gives:

> res$s

#[1] "For the code AAABBB the biggest destinations are GB, GR, IE, DK 
#     where the top days of visiting were 2016-10-17, 2016-10-18, 
#     2016-10-20, 2016-10-19 with a total of 6577"
#[2] "For the code AAABBD the biggest destinations are IE 
#     where the top days of visiting were 2016-10-17, 2016-10-18 
#     with a total of 1925"                                    
#[3] "For the code AAACCC the biggest destinations are IE, GR, DK 
#     where the top days of visiting were 2016-10-20, 2016-10-17, 
#     2016-10-19, 2016-10-18 with a total of 2878"

Note: Since you didn't provide any guidance as to how you intend to calculate the "top visiting days", I simply included all days. You could easily edit the above to fit your actual situation.
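
As one possible adaptation, here is a minimal sketch that assumes "top days" means the three days with the highest total HolidayMakers per code. That definition is a guess, not something stated in the question. The small `mydf` below uses made-up values so the result is reproducible, and `top_n_days` and `day_total` are names introduced only for this illustration.

```r
library(dplyr)

# Made-up data with fixed values (no sampling), for illustration only
mydf <- data.frame(
  mycode        = c("AAABBB", "AAABBB", "AAABBB", "AAABBB", "AAACCC"),
  mydate        = c("2016-10-17", "2016-10-18", "2016-10-19",
                    "2016-10-20", "2016-10-17"),
  myresort      = c("GB", "IE", "GB", "DK", "GR"),
  HolidayMakers = c(100, 400, 300, 200, 50),
  stringsAsFactors = FALSE
)

top_n_days <- 3  # assumed cut-off; change to suit

res_top <- mydf %>%
  # total holidaymakers per code and day
  group_by(mycode, mydate) %>%
  mutate(day_total = sum(HolidayMakers)) %>%
  # keep only the rows belonging to the highest-ranked days of each code
  group_by(mycode) %>%
  filter(dense_rank(desc(day_total)) <= top_n_days) %>%
  # list the busiest days first within each code
  arrange(desc(day_total), .by_group = TRUE) %>%
  summarise(d = toString(unique(mydate)),
            r = toString(unique(myresort)),
            h = sum(HolidayMakers)) %>%
  mutate(s = paste("For the code", mycode,
                   "the biggest destinations are", r,
                   "where the top days of visiting were", d,
                   "with a total of", h))

res_top$s
```

Because `dense_rank()` gives tied days the same rank, a code can end up with more than three days when totals tie. The resorts and the total are computed only from the rows that fall on the retained days.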

Thank you so much. This is exactly what I was looking for, and far more elegant than what I was trying — John Smith, Oct 26 '16 at 11:08
