How to extract counts of adjacent strings in an R dataframe?

Question

I have sentences containing keywords, and I want to identify which keywords occur together most often. Suppose I have the following dataset:

library(tibble)

df <- tibble(word1 = c("director", "John", "Peter", "financial", "setting", "board"),
             word2 = c("board", "seat", "outsider", "independent", "irrelevant", "dissident"),
             word3 = c("independent", "director", "flaw", "yes", "oversight", "John"),
             word4 = c("outsider", "independent", "dependence", "poorly", "material", "seat"),
             n = c(6, 3, 2, 2, 1, 1))

I want to identify which words appear (columns "word1" to "word4") and how often (column "n") in every row that contains the keyword "director". The result should look something like this:

director_analysis <- tibble(test1 = c("director", "director", "director", "director", "director", "director"),
                            test2 = c("board", "independent", "outsider", "John", "seat", "independent"),
                            n = c(6, 6, 6, 3, 3, 3))

For each row that contains "director" (in any column), this last dataframe lists the strings found in the other columns together with that row's "n" value.

What is the most concise way of doing this?

score 0 · Answer 1 · answered Aug 17 '22 at 18:09

Here's a tidyverse method:

library(tidyverse)

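df %>% 
  # keep only the rows where any column equals "director"
  filter(apply(., 1, function(x) "director" %in% x)) %>%
  # stack the word columns into a single column named test2
  pivot_longer(-n, values_to = "test2") %>%
  select(test2, n) %>%
  # add the keyword as the first column
  mutate(test1 = "director", .before = 1) %>%
  # drop the rows where the keyword is paired with itself
  filter(test2 != "director")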
#> # A tibble: 6 x 3
#>   test1    test2           n
#>   <chr>    <chr>       <dbl>
#> 1 director board           6
#> 2 director independent     6
#> 3 director outsider        6
#> 4 director John            3
#> 5 director seat            3
#> 6 director independent     3

Created on 2022-08-17 by the reprex package (v2.0.1)
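
If what you ultimately want is a ranking of the words that most often accompany the keyword, one possible extension is to sum "n" per co-occurring word. The following is only a sketch and was not part of the reprex above. It assumes a dplyr version that provides if_any() (1.0.4 or later), and the names keyword and co_occurrences are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same example data as in the question
df <- tibble(
  word1 = c("director", "John", "Peter", "financial", "setting", "board"),
  word2 = c("board", "seat", "outsider", "independent", "irrelevant", "dissident"),
  word3 = c("independent", "director", "flaw", "yes", "oversight", "John"),
  word4 = c("outsider", "independent", "dependence", "poorly", "material", "seat"),
  n = c(6, 3, 2, 2, 1, 1)
)

keyword <- "director"  # hypothetical name; change to search for another word

co_occurrences <- df %>%
  # keep rows where at least one word column equals the keyword
  filter(if_any(starts_with("word"), ~ .x == keyword)) %>%
  # stack the word columns into one column
  pivot_longer(starts_with("word"), values_to = "test2") %>%
  # drop the keyword itself
  filter(test2 != keyword) %>%
  # total n per co-occurring word, largest first
  group_by(test2) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  arrange(desc(n))

co_occurrences
```

With the example data, "independent" appears in both matching rows, so its total is 6 + 3 = 9 and it ranks first. Selecting the word columns with starts_with("word") instead of apply() avoids converting the whole data frame to a character matrix, but it does rely on the columns following that naming pattern.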
