
Suppose I have the following data:

A <- c(4,4,4,4,4)
B <- c(1,2,3,4,4)
C <- c(1,2,4,4,4)
D <- c(3,2,4,1,4)

filt <- c(1,1,10,8,10)


data <- as.data.frame(rbind(A,B,C,D,filt))
data <- t(data)
data <- as.data.frame(data)

> data
    A B C D filt
 V1 4 1 1 3    1
 V2 4 2 2 2    1
 V3 4 3 4 4   10
 V4 4 4 4 1    8
 V5 4 4 4 4   10

I want to count the occurrences of 1, 2, 3, and 4 in each variable after filtering. My attempt below fails with Error: length(rows) == 1 is not TRUE.

  data %>%
     dplyr::filter(filt ==1) %>%
      plyr::summarize(A_count = count(A),
                      B_count = count(B))

I get the error because some of my columns do not contain all of the values 1-4. Is there a way to specify which values it should look for and return 0 for any that are not found? I'm not sure how to do this, or whether there is a different workaround.

Any help is VERY appreciated!!!

Ellie
  • It's not the part `data %>% dplyr::filter(filt ==1)` that raises the error, so you can get rid of it and simplify the question, making it more to the point (smaller sample data, a single function call, etc.). This will increase the chances of you getting an answer. – byouness May 16 '18 at 20:12

1 Answer


This was a bit of a weird one. I didn't use classical plyr, but I think this is roughly what you're looking for. I removed the filtering column, filt, so that it isn't counted:

library(dplyr)

data %>% 
  filter(filt == 1) %>% 
  select(-filt) %>%
  purrr::map_df(function(a_column){
    purrr::map_int(1:4, function(num) sum(a_column == num))
    })

# A tibble: 4 x 4
      A     B     C     D
  <int> <int> <int> <int>
1     0     1     1     0
2     0     1     1     1
3     0     0     0     1
4     2     0     0     0
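
For comparison, here is a minimal base R sketch of the same idea, assuming the values of interest are always 1 through 4. Converting each column to a factor with explicit levels makes table() report 0 for values that never occur, which is what the question asks for. The helper name count_levels is hypothetical, and the data frame is rebuilt directly rather than through rbind/t as in the question.

```r
# Rebuild the example data from the question
data <- data.frame(
  A = c(4, 4, 4, 4, 4),
  B = c(1, 2, 3, 4, 4),
  C = c(1, 2, 4, 4, 4),
  D = c(3, 2, 4, 1, 4),
  filt = c(1, 1, 10, 8, 10)
)

# Hypothetical helper: count occurrences of each value in `levels`,
# returning 0 for values that never appear in `x`
count_levels <- function(x, levels = 1:4) {
  as.integer(table(factor(x, levels = levels)))
}

# Keep rows where filt == 1, drop the filter column, then count per column
filtered <- data[data$filt == 1, setdiff(names(data), "filt")]
counts <- sapply(filtered, count_levels)
rownames(counts) <- 1:4
counts
```

This should give the same counts as the tibble above, as a matrix with one row per value and one column per variable.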
zack
  • Thank you!! I've never used purrr before, can you just explain what the function(a_column) is doing? @zack – Ellie May 16 '18 at 20:41
  • No worries. It's generally a package that fits functional programming into the tidyverse style of coding in R. The `function(a_column)...` portion of my code applies the defined function to each column in the data.frame. I then define an anonymous function to map over the values you're concerned with (1:4), and check how often they occur in each column. I've also edited the answer with curly brackets to make it more clear, hopefully that helps. – zack May 16 '18 at 21:14
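
To make the explanation above concrete, here is a small sketch of what happens for a single column. It assumes column B after filtering on filt == 1 (values 1 and 2, taken from the question's data). The names b_filtered, count_value, and b_counts are made up for illustration, and vapply is used as a rough base R stand-in for purrr::map_int.

```r
# Column B after filtering on filt == 1 (values taken from the question's data)
b_filtered <- c(1, 2)

# Inner step: for one candidate value, count how many entries match it
count_value <- function(num) sum(b_filtered == num)

# Apply the inner step to each value of interest;
# roughly what purrr::map_int(1:4, count_value) does
b_counts <- vapply(1:4, count_value, integer(1))
b_counts
```

The outer map_df call repeats this for every column and binds the resulting vectors together as the columns of a tibble.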