Simple question here, perhaps a duplicate of this?

I'm trying to figure out how to count the number of times a word appears in a vector. I know I can count the number of rows a word appears in, as shown here:

library(tidyverse)

temp <- tibble(idvar = 1:3,
               response = c("This sounds great",
                            "This is a great idea that sounds great",
                            "What a great idea"))
temp %>% count(grepl("great", response)) # lots of ways to do this line
# answer = 3 (rows where grepl() is TRUE)

The answer in the code above is 3 since "great" appears in three rows. However, the word "great" appears four times in total across the vector "response". How do I count that instead?

Daniel
  • Are you planning to provide a specific word and get the count for it? Or do you want that count for every word that appears across all sentences? – AntoniosK Aug 29 '18 at 15:35
  • Just planning to provide a specific word and get the number. I can use `tidytext` unnest to split sentences into tokens and then count the words. (But if you have recommendations for a different way to do it, I'm all ears!) – Daniel Aug 29 '18 at 15:40
  • I had `tidytext` in mind as well :) – AntoniosK Aug 29 '18 at 15:42
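
The `tidytext` route mentioned in these comments is not spelled out in the thread, so here is a minimal sketch of how it might look. It assumes the `tidytext` package is installed; `word_counts` and `great_total` are illustrative names, not from the original post.

```r
library(dplyr)
library(tibble)
library(tidytext)

temp <- tibble(idvar = 1:3,
               response = c("This sounds great",
                            "This is a great idea that sounds great",
                            "What a great idea"))

# Split each response into one lowercase word per row, then tally every word.
word_counts <- temp %>%
  unnest_tokens(word, response) %>%
  count(word, sort = TRUE)

# Look up the total for a single word of interest.
great_total <- word_counts %>%
  filter(word == "great") %>%
  pull(n)
great_total
```

For the sample data this should give 4. Because `unnest_tokens` lowercases and tokenizes by default, it counts whole words only, and `word_counts` also holds the totals for every other word.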

2 Answers

We can use `str_count` from stringr to count the occurrences of 'great' in each row, and then sum those counts:

library(tidyverse)
temp %>% 
   mutate(n = str_count(response, 'great')) %>%
   summarise(n = sum(n))
# A tibble: 1 x 1
#      n
#   <int>
#1     4

Or, using `regmatches`/`gregexpr` from base R:

sum(lengths(regmatches(temp$response, gregexpr('great', temp$response))))
#[1] 4
akrun
  • Thanks for the addition of `base R` - that may actually be simpler in some of my use cases. – Daniel Aug 29 '18 at 15:46
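
One caveat with the plain pattern `'great'` is that it also matches inside longer words such as "greatest". If only whole words should count, the base R approach can be wrapped in a small helper. This is a sketch under assumptions: `count_word` and its `ignore_case` argument are hypothetical names, and `word` is assumed to contain no regex metacharacters.

```r
# Hypothetical helper (not from the answers above): counts whole-word matches only.
count_word <- function(x, word, ignore_case = TRUE) {
  # \\b anchors the match at word boundaries, so "greatest" does not count as "great".
  # Assumes `word` contains no regex metacharacters.
  pattern <- paste0("\\b", word, "\\b")
  matches <- regmatches(x, gregexpr(pattern, x, ignore.case = ignore_case, perl = TRUE))
  sum(lengths(matches))
}

response <- c("This sounds great",
              "This is a great idea that sounds great",
              "What a great idea")
count_word(response, "great")
count_word(c("The greatest idea", "Great, just great"), "great")
```

On the sample vector this should return 4, matching the answers above. The second call should return 2, since "greatest" is skipped and "Great" is matched case-insensitively.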

Off the top of my head, this should solve your problem:

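library(tidyverse)
temp$response %>% 
  str_extract_all('great') %>%  # list with one character vector of matches per row
  unlist() %>%                  # flatten into a single vector of matches
  length()                      # total number of matches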
Vlad C.