Counting the variables that occur most often next to each other

Question

I'm working with a large database and I want to see which variables most often occur together, i.e. in the same row. A row can contain more than 20 variables, and I don't know all of them in advance.

Here is an example:

  var1 <- c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y")
  var2 <- c("a", "a", "b", "b", "c", "d", "e", "e", "g", "h")
  data <- data.frame(var1, var2)

The result should look like this:

|frequent contacts|n|
|---|---|
|x&a|2|
|x&b|2|
|y&e|2|
|x&c|1|

etc.

score 1 · Accepted Answer · answered Sep 28 '22 at 11:08

Are you looking for this?

with(data, table(paste(var1, var2, sep = "&")))  |>
  as.data.frame()

#   Var1 Freq
# 1  x&a    2
# 2  x&b    2
# 3  x&c    1
# 4  x&d    1
# 5  y&e    2
# 6  y&g    1
# 7  y&h    1
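
The output above is ordered alphabetically, while the question asks for the most frequent pairs first. A minimal sketch of one way to sort the counts, assuming the example `data` from the question (the names `counts`, `pair`, and `n` are only illustrative):

```r
# Example data from the question
var1 <- c("x", "x", "x", "x", "x", "x", "y", "y", "y", "y")
var2 <- c("a", "a", "b", "b", "c", "d", "e", "e", "g", "h")
data <- data.frame(var1, var2)

# Count each var1&var2 combination
counts <- as.data.frame(
  table(paste(data$var1, data$var2, sep = "&")),
  stringsAsFactors = FALSE
)
names(counts) <- c("pair", "n")

# Most frequent first; ties are broken alphabetically by pair
counts <- counts[order(-counts$n, counts$pair), ]
rownames(counts) <- NULL
counts
```

With this ordering, the three pairs that occur twice (`x&a`, `x&b`, `y&e`) come first, followed by the pairs that occur once.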

s_baldur

This seems a great solution. Can it be made to work with any number of variables (without NAs)? – onlyjust17 Sep 28 '22 at 11:19

Yes, you can have more variables: `table(paste(var1, var2, var3, sep = "&"))` – s_baldur Sep 28 '22 at 11:24
I tried it with five variables and unfortunately it also reports NAs. (It would be important to omit them.) And one more question: is it possible to count not only combinations of all twenty variables at once, but also combinations of two, three, or four of them? – onlyjust17 Sep 28 '22 at 11:51
It's easiest to remove them from the data first. – s_baldur Sep 28 '22 at 12:39
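
The two follow-up points (dropping NAs, and counting combinations of two, three, or four variables rather than all of them at once) could be sketched roughly as follows. This is not from the answer itself: the helper `count_combos`, its parameter `m`, and the small data frame `df` are hypothetical, and the sketch assumes that a row should simply be skipped for a given set of columns when any of those columns is NA.

```r
# Hypothetical example data with three columns and some missing values
df <- data.frame(
  var1 = c("x", "x", "y", "y"),
  var2 = c("a", "a", NA, "e"),
  var3 = c("k", NA, "k", "k")
)

# Count value combinations over every set of `m` columns,
# ignoring rows that have an NA in any of those `m` columns
count_combos <- function(df, m = 2) {
  col_sets <- combn(names(df), m, simplify = FALSE)
  keys <- unlist(lapply(col_sets, function(cols) {
    sub <- df[cols]
    sub <- sub[complete.cases(sub), , drop = FALSE]
    # Paste the selected columns row by row, e.g. "x&a"
    do.call(paste, c(unname(as.list(sub)), sep = "&"))
  }))
  out <- as.data.frame(table(keys), stringsAsFactors = FALSE)
  names(out) <- c("combo", "n")
  out <- out[order(-out$n, out$combo), ]
  rownames(out) <- NULL
  out
}

pairs   <- count_combos(df, m = 2)  # all pairs of columns
triples <- count_combos(df, m = 3)  # all triples of columns
```

Note that this sketch does not record which columns a combination came from, so the same value appearing in two different columns is counted under the same label. The number of column sets also grows quickly with many columns, so for 20+ variables it may be worth restricting `m` to small values.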
