I have a collection of text messages scraped from a forum and stored in a data frame. Here's a reproducible example:
example.df <- data.frame(author = c("Mikey", "Donald", "Mikey", "Daisy", "Minnie", "Daisy"),
                         message = c("Hello World! Mikey Mouse",
                                     "Quack Quack! Donald Duck",
                                     "I was born in 1928. Mikey Mouse",
                                     "Quack Quack! Daisy Duck",
                                     "The quick fox jump over Minnie Mouse",
                                     "Quack Quack! Daisy Duck"))
My idea is to find, for each author who has written more than one message, the longest suffix common to all of that author's messages. For everyone else, I'll find a regex approach that degrades gracefully.
I found the Bioconductor package RLibstree, which looks promising thanks to its function getLongestCommonSubstring, but I don't know how to apply the function to all the messages from the same author, one group per author.
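To make the goal concrete, here is a minimal base R sketch of the grouping I have in mind. It does not use getLongestCommonSubstring. Instead it assumes a hypothetical helper of my own, longest_common_suffix, which compares the messages character by character from the end. The data frame is repeated so the snippet runs on its own.

```r
example.df <- data.frame(
  author  = c("Mikey", "Donald", "Mikey", "Daisy", "Minnie", "Daisy"),
  message = c("Hello World! Mikey Mouse",
              "Quack Quack! Donald Duck",
              "I was born in 1928. Mikey Mouse",
              "Quack Quack! Daisy Duck",
              "The quick fox jump over Minnie Mouse",
              "Quack Quack! Daisy Duck"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of any package): longest suffix shared
# by every string in the character vector x.
longest_common_suffix <- function(x) {
  # Reverse each string so that a common suffix becomes a common prefix.
  rev_chars <- lapply(strsplit(x, ""), rev)
  n <- min(lengths(rev_chars))
  k <- 0
  while (k < n) {
    # Character at position k + 1 (counted from the end) of every message.
    ch <- vapply(rev_chars, function(v) v[k + 1], character(1))
    if (length(unique(ch)) > 1) break
    k <- k + 1
  }
  if (k == 0) return("")
  paste(rev(rev_chars[[1]][seq_len(k)]), collapse = "")
}

# One character vector of messages per author.
by_author <- split(as.character(example.df$message), example.df$author)

# Keep only the authors with more than one message.
multi <- by_author[lengths(by_author) > 1]

# Named character vector: author -> longest common suffix.
suffixes <- vapply(multi, longest_common_suffix, character(1))
```

With the example data this gives " Mikey Mouse" (with a leading space) for Mikey. For Daisy it gives the whole message, because her two messages are identical, so that case would still need separate handling.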