
I would like to remove certain people from my dataset if a condition is fulfilled. I have panel data and would ideally like to count the number of completions for every person and delete a person from my dataset if they have never completed anything.

people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
df <- data.frame(people, activity, completion)

For `completion`, 0 indicates no completion and 1 indicates completion.

So, in this case I need to detect that person 4 never completed activity 5, and therefore all rows for person 4 should be removed from the dataset. However, my attempt only tells me which rows record an activity as not completed, even though some of those activities are eventually completed in a later row. I would then remove them as shown below. I have tried running the `ifelse` condition:

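# flag rows (not people) where the activity was not completed
df$nevercompleted <- ifelse(df$completion == 0, 1, 0)
# keeps only the non-completed rows
df <- subset(df, completion == 0)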
Lisam
  • Something like `df[which(df$completion == 0),]` would get you the rows that were not completed. Your subset should work just the same. Could you give your desired output? – at80 Jun 06 '20 at 15:07
  • Hey, I would like to delete those that were never completed, but since I have panel data, it is crucial to find the ones that were never completed by the user. – Lisam Jun 06 '20 at 15:13
  • Do you mean delete *all rows* for a user if the user never completed *any* activities? Or do you want to delete only rows for activities that were never completed? Could you add an appropriate desired output (one that covers corner cases)? – dario Jun 06 '20 at 15:18
  • @dario I have edited my post. If the person never completed any activities, all rows for that person should be deleted. – Lisam Jun 06 '20 at 15:21
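The two readings dario distinguishes can be made concrete with a minimal base R sketch on the example data. It is an illustration, not code from the thread; the names `person_any`, `activity_any`, `drop_people` and `drop_activities` are hypothetical. The first reading drops every row of a person with no completions at all; the second drops only the rows of (person, activity) pairs that were never completed.

```r
people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
df <- data.frame(people, activity, completion)

# Reading 1: drop all rows of a person who never completed anything.
# ave() gives, for each row, the max completion within that row's person.
person_any <- ave(df$completion, df$people, FUN = max)
drop_people <- df[person_any > 0, ]

# Reading 2: drop only rows of (person, activity) pairs never completed.
# Same idea, but grouping by both person and activity.
activity_any <- ave(df$completion, df$people, df$activity, FUN = max)
drop_activities <- df[activity_any > 0, ]
```

On the example data both readings keep the same nine rows, because person 4's only activity (5) is never completed. They would differ for a person who completed one activity but never another.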

2 Answers


A dplyr solution.

library(dplyr)

## Create the dataframe
df <- tibble(
    people = c(1,1,1,2,2,3,3,4,4,5,5),
    activity = c(1,1,1,2,2,3,4,5,5,6,6),
    completion = c(0,0,1,0,1,1,1,0,0,0,1))

df %>% 
    ## Group observations by people
    group_by(people) %>% 
    ## Create total completions per individual
    mutate(tot_completion = sum(completion)) %>% 
    ## Keep only people with strictly positive number of completions
    filter(tot_completion > 0)
Roland
  • Wow, this looks good. How would I delete the ones that were never completed from the df? Or are they removed by that last line of code? Because nothing in my df changes, which is quite good. – Lisam Jun 06 '20 at 15:13
  • The current code removes people that have `completion==0` for all their observations. Is that what you mean by "never completed"? In which case, this should work. In your example, the script removes all observations of `people==4` and none of the other people. – Roland Jun 06 '20 at 15:16
  • Perfect, exactly what I needed and wanted. I was just unsure because nothing changes in my original df, but that is good so all my users completed everything at some point. Huge thank you! This comes in handy. – Lisam Jun 06 '20 at 15:18
  • @Lisam: Just fyi: Your statement *that is good so all my users completed everything at some point.* is not tested with the code above! It tests whether the users completed *anything* (i.e. whether at least one activity was completed). – dario Jun 06 '20 at 15:21
  • @Lisam To further illustrate my point made in the comment above: 'user' 3 in your example data has entries for activities 3 and 4. With the code above as it is now, this user would be kept even if only one activity was completed (i.e. the user did not complete everything). – dario Jun 06 '20 at 15:27
  • @dario You are right, thank you for bringing up that point. But completing at least one activity is enough to keep them in my df. – Lisam Jun 06 '20 at 15:28
  • @Lisam In that case you could even use `df[df$people %in% df[df$completion != 0, ]$people, ]` – dario Jun 06 '20 at 15:31
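dario's point about *anything* versus *everything* can be checked directly. Below is a minimal base R sketch, assuming "completed everything" means that each of a person's activities has a 1 in at least one row. The helpers `keep_any` and `keep_all` and the modified data `df2` are hypothetical. `keep_any` follows the same idea as the `%in%` one-liner in the comment above; `keep_all` first marks each (person, activity) pair as done and then keeps only people whose pairs are all done.

```r
people <- c(1,1,1,2,2,3,3,4,4,5,5)
activity <- c(1,1,1,2,2,3,4,5,5,6,6)
completion <- c(0,0,1,0,1,1,1,0,0,0,1)
df <- data.frame(people, activity, completion)

# Hypothetical helper: keep people with at least one completion.
keep_any <- function(d) d[d$people %in% d$people[d$completion != 0], ]

# Hypothetical helper: keep people who completed every one of their
# activities in at least one row.
keep_all <- function(d) {
  # 1 if the (person, activity) pair was ever completed, else 0
  done <- aggregate(completion ~ people + activity, data = d, FUN = max)
  # TRUE for people whose pairs are all completed
  all_done <- tapply(done$completion, done$people, min) == 1
  d[d$people %in% as.numeric(names(all_done)[all_done]), ]
}

# Hypothetical tweak: person 3 never completes activity 4.
df2 <- df
df2$completion[df2$people == 3 & df2$activity == 4] <- 0

nrow(keep_any(df2))  # 9 rows: person 3 is still kept
nrow(keep_all(df2))  # 7 rows: persons 3 and 4 are dropped
```

On the original example data both helpers return the same nine rows; the tweak shows where the two rules diverge.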

An option with base R

df[with(df, ave(completion, people, FUN = sum)) > 0,]
akrun
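Here is why the `ave()` one-liner works. `ave(completion, people, FUN = sum)` returns a vector as long as `completion` in which each element is replaced by the total for that row's person, so comparing it with 0 gives a row-level keep/drop flag. A short sketch on the question's data (the names `totals` and `kept` are hypothetical):

```r
df <- data.frame(
  people     = c(1,1,1,2,2,3,3,4,4,5,5),
  activity   = c(1,1,1,2,2,3,4,5,5,6,6),
  completion = c(0,0,1,0,1,1,1,0,0,0,1)
)

# Per-row total number of completions for that row's person
totals <- with(df, ave(completion, people, FUN = sum))
totals
# 1 1 1 1 1 2 2 0 0 1 1

# Keep only the rows whose person has at least one completion
kept <- df[totals > 0, ]
unique(kept$people)
# 1 2 3 5
```

Because `ave()` keeps the original row order, the logical vector lines up with the rows of `df`, and no merge or join is needed.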