I'm trying to analyse qualitative survey responses using tidy text mining in R. I have tokenised my data into sentences. In some cases, participants have reported multiple behaviours within a single sentence, and I want to analyse these separately (e.g. "apples and oranges"). Is it possible to recode the initial data so that they are separated at the tokenisation stage? I have tried adding a full stop between the behaviours using the following code, but it has not worked:

    data <- data %>%
      mutate(behaviour = recode(column, "apples and oranges" = "apples. Oranges"))

    tidy_text_data <- data %>%
      unnest_tokens(output = "sentences", input = behaviour, token = "sentences")

Any suggestions?

  • It would be easier to help if you create a small reproducible example along with expected output. Read about [how to give a reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Sep 07 '21 at 01:05
  • You might want to try [an approach like this](https://stackoverflow.com/questions/57303849/how-to-include-select-2-word-phrases-as-tokens-in-tidytext/57341620#57341620) with `str_replace_all()`. – Julia Silge Sep 08 '21 at 22:02
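
A minimal sketch of the `str_replace_all()` idea from the comment above, not a confirmed solution. One likely reason the original attempt fails is that `recode()` only replaces a value when the whole string equals "apples and oranges", whereas `str_replace_all()` replaces the phrase wherever it occurs inside a longer response. The example responses and the `id` column are invented for illustration; `column` and `behaviour` follow the names used in the question.

```r
library(dplyr)
library(stringr)
library(tidytext)

# Hypothetical example data (assumption: one free-text response per row).
data <- tibble(
  id = 1:2,
  column = c("I usually eat apples and oranges", "I go for a walk")
)

tidy_text_data <- data %>%
  # Insert a full stop inside the phrase so the sentence tokeniser
  # sees two sentences; fixed() matches the phrase literally.
  mutate(
    behaviour = str_replace_all(
      column,
      fixed("apples and oranges"),
      "apples. Oranges"
    )
  ) %>%
  # unnest_tokens() lowercases the output by default (to_lower = TRUE).
  unnest_tokens(output = sentences, input = behaviour, token = "sentences")

tidy_text_data
```

The capital "O" in the replacement is kept deliberately, since sentence boundary detection generally does not split after a full stop that is followed by a lowercase letter. If several phrases need splitting, `str_replace_all()` also accepts a named character vector of pattern and replacement pairs.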
