
I have a dataset in the following format:

Item Year
A    2018
B    2018
B    2019
A    2017
Z    2019

I select items only from 2018 using:

library(dplyr)
data2 <- data %>% filter(Year == "2018")

Now, when I get the counts of items using table(), there's a problem. The output looks like this:

table(data2$Item)

A B Z
1 1 0

I don't understand why Z is included here. There are no Z items in data2. It messes up summary statistics.

Is there any way to prevent items from the original dataset being included? I tried filtering the original dataset without dplyr, but table() still returns the same output.

davey_j

1 Answer


If 'Item' is a factor, the cause is most likely unused levels: filtering removes the rows but keeps the factor's full set of levels. If we check the levels, Z still exists:

levels(data2$Item)
#[1] "A" "B" "Z"
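To see why this happens, here is a minimal self-contained sketch. The names `items`, `years`, and `kept` are illustrative and not from the question. Subsetting a factor keeps all of its original levels, and table() reports a count for every level, including those with no remaining observations.

```r
# Illustrative vectors mirroring the Item and Year columns from the question
items <- factor(c("A", "B", "B", "A", "Z"))
years <- c(2018L, 2018L, 2019L, 2017L, 2019L)

# Subsetting keeps every original level, even ones no longer present
kept <- items[years == 2018L]
levels(kept)
#[1] "A" "B" "Z"

# table() tabulates over the levels, so Z shows up with a zero count
table(kept)
```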

We can either drop the unused levels with droplevels

table(droplevels(data2$Item))
#A B
#1 1

Or use count() from dplyr, where .drop = TRUE (the default) omits levels that have no rows

library(dplyr)
data %>%
    filter(Year == "2018") %>% 
    count(Item, .drop = TRUE)
#  Item n
#1    A 1
#2    B 1
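If later summary statistics also need a clean factor, one possible approach is to drop the unused levels once, right after filtering, rather than at each call to table(). The sketch below rebuilds the example data with data.frame() for brevity, on the assumption that it matches the data shown in this answer. It relies on droplevels() accepting a whole data frame.

```r
library(dplyr)

# Same values as the example data, rebuilt here so the sketch is self-contained
data <- data.frame(
  Item = factor(c("A", "B", "B", "A", "Z")),
  Year = c(2018L, 2018L, 2019L, 2017L, 2019L)
)

# droplevels() on a data frame drops unused levels from every factor column
data2 <- data %>%
  filter(Year == 2018) %>%
  droplevels()

levels(data2$Item)
#[1] "A" "B"

# Base R alternative: re-creating the factor also discards unused levels
table(factor(data2$Item))
```

After this step, table(data2$Item) and other summaries no longer include Z.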

data

data <- structure(list(Item = structure(c(1L, 2L, 2L, 1L, 3L), .Label = c("A", 
"B", "Z"), class = "factor"), Year = c(2018L, 2018L, 2019L, 2017L, 
2019L)), row.names = c(NA, -5L), class = "data.frame")
akrun