2

I'm new to this platform and quite new to R, so if you see any improvements in posting my questions, please feel free to advise me.

I'm currently working on evaluating a survey using R.

Question: What is the best way to compare the values in all but the first columns of a Data Frame to a vector and count the number of occurrences of the variables in the vector?

In Excel I would use countif for my problem, in R I tried the combination of different filter, group_by and loop functions - unfortunately with no success. I searched Stack Overflow for a solution, but didn't find one so far. Maybe I looked with the wrong keywords.

Example:

df<-data.frame(c("A","B","A","B","C"),c(-1,1,3,2,3),c(-3,-1,2,-1,2))
colnames(df)<- c("Participant-Type","Item 1","Item 2")

 Participant-Type Item 1 Item 2
1                A    -1     -3
2                B     1     -1
3                A     3      2
4                B     2     -1
5                C     3      2

vec <- c(-3,-2,-1,0,1,2,3)
[1] -3 -2 -1  0  1  2  3

I want to query the Data Frame to know how many times the different Participant-Groups choose the values in the Vector for Item 1 and Item 2. The Result should look something like the following for Participant-Type A:

TypeA <- data.frame(c("Item 1","Item 2"),c(0,1), c(0,0),c(1,0),c(0,0),c(0,0),c(0,1),c(1,0), row.names=1)
colnames(TypeA)<-c("-3","-2","-1","0","1","2","3")

       -3 -2 -1 0 1 2 3
Item 1  0  0  1 0 0 0 1
Item 2  1  0  0 0 0 1 0

In the original Data-Set there are also a lot of NA values.

I know how to select only the rows from a certain Participant-Type. I also found, that table(df$Item1) gives me the count of the variables of Column Item 1 in the Data Frame, but values which where not used are not part of the result.

I assume that there is no solution to query the data frame as a whole, so that I must go Column by Column and merge the results to a DataFrame.

But so far I didn't find out how to compare the Column with the vector respectively how to get the count for the variables of the Vector per Column of the DataFrame. Counting the unique Values per Column would show any count (= 0) for the variables in the Vector which were not used by the Participants.

Any help or advise is really appreciated!

Edit:

The original DataFrame has 75 observations (Rows) of 22 variables (Columns), if that is relevant.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
thuettel
  • 165
  • 1
  • 11

2 Answers2

3

One dplyr and tidyr option could be:

df %>%
 pivot_longer(-1) %>%
 count(name, value) %>%
 complete(value = vec, nesting(name), fill = list(n = 0)) %>%
 pivot_wider(names_from = "value", values_from = "n")

  name    `-3`  `-2`  `-1`   `0`   `1`   `2`   `3`
  <chr>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Item 1     0     0     1     0     1     1     2
2 Item 2     1     0     2     0     0     2     0

Or if aggregation at Participant-Type is needed:

df %>%
 pivot_longer(-1) %>%
 count(name, `Participant-Type`, value) %>%
 complete(value = vec, nesting(name, `Participant-Type`), fill = list(n = 0)) %>%
 pivot_wider(names_from = "value", values_from = "n")

  name   `Participant-Type`  `-3`  `-2`  `-1`   `0`   `1`   `2`   `3`
  <chr>  <fct>              <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Item 1 A                      0     0     1     0     0     0     1
2 Item 1 B                      0     0     0     0     1     1     0
3 Item 1 C                      0     0     0     0     0     0     1
4 Item 2 A                      1     0     0     0     0     1     0
5 Item 2 B                      0     0     2     0     0     0     0
6 Item 2 C                      0     0     0     0     0     1     0
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • This is exactly what I wanted and works fine for the example. Unfortunately with the original Data it doesn't work. Already pivot_longer(-1) throws an error message: Fehler: Can't combine `FW03_01` and `FW03_02` . x Some attributes are incompatible. ℹ The author of the class should implement vctrs methods. ℹ See . FW03_01 and FW03_02 are column names and represent data like Item 1 and Item 2 in the example. – thuettel Jul 31 '20 at 14:10
  • Perhaps try to update to the newest version of `dplyr` and `tidyr`. – tmfmnk Jul 31 '20 at 14:14
  • It seems like there is no binary for the newest version of dplyr and tidyr and I don't know how to set it up from source =/ – thuettel Jul 31 '20 at 14:28
  • Finally I made it work. I used the method described in https://stackoverflow.com/a/53627541/14027466 to remove all attributes from the DataFrame and then your approach did work on the original data. Thanks a lot! – thuettel Jul 31 '20 at 16:04
1

In base R you could do:

grps <- interaction(df$`Participant-Type`, col(df[-1]))# Create the groupings   
use_vec <- factor(unlist(df[-1]), vector) # Unlist your data
do.call(rbind, tapply(use_vec, grps, table)) #Resulting table
    -3 -2 -1 0 1 2 3
A.1  0  0  1 0 0 0 1
B.1  0  0  0 0 1 1 0
C.1  0  0  0 0 0 0 1
A.2  1  0  0 0 0 1 0
B.2  0  0  2 0 0 0 0
C.2  0  0  0 0 0 1 0
Onyambu
  • 67,392
  • 3
  • 24
  • 53
  • Works for the Example! For my real data interaction() returns a warning: "In ans * length(l) + if1 : Length of the longer object is not a multiplicate of the length of the shorter object" (translated from german). d.call() then returns an error: "error in tapply(use_vec, grps, table): Arguments must have the same length". The Original DataFrame has 75 observations (Rows) of 22 variables (Columns), if that is relevant. – thuettel Jul 31 '20 at 14:51
  • since the data is a dataframe, `length(unlist(df[-1]))` should be indeed a multiple of `length(df[,1])` – Onyambu Jul 31 '20 at 14:55
  • For the original Data length(unlist(df[-1])) returns 1575 and length(df[,1]) returns 75, so it is a multiple. Do you have an idea what is causing the warning / error? – thuettel Jul 31 '20 at 15:01
  • @thuettel could you try rerunning `interaction(df[,1], col(df[-1]))`? Ensure that you do have the correct arguments, ie `df` is indeed your whole dataset – Onyambu Jul 31 '20 at 15:03
  • I'm sorry, you were totally right. I must have forgotten to substitute df with the right name at one point. I doublechecked and now it works! Thanks a lot. Just for my understanding, what the different parts do, is this right: 1. interaction() combines the column "Participant-Type" with the Index-No of the remaining Columns – thuettel Jul 31 '20 at 15:22
  • Is this right? 1. interaction() combines the column "Participant-Type" with the Index-No of the remaining Columns for grouping 2. factor() encodes the data frame as a factor with the levels of vector 3. do.call( calls function rbind() to bind together the rows received by tapply(). Tapply() produces a table out of the factoriseåd data frame with index of the groupings – thuettel Jul 31 '20 at 15:31
  • @thuettel that is correct. Take your time to understand each line of code slowly. – Onyambu Jul 31 '20 at 15:32