How to select only first non NA value of each group in R?

Question

I have a data frame like

mydata <- data.frame(Id=c(01,01,01,01,01,01,02,02,02,02),
                     VISIT=c("Screeing","Baseline","Baseline","Baseline","Week 9","Week 9","Baseline","Week 2",
                             "Week 2","Week 2"),
                    Score=c(1,2,4,5,78,9,5,NA,3,4))

> mydata
   Id    VISIT Score
1   1 Screeing     1
2   1 Baseline     2
3   1 Baseline     4
4   1 Baseline     5
5   1   Week 9    78
6   1   Week 9     9
7   2 Baseline     5
8   2   Week 2     NA
9   2   Week 2     3
10  2   Week 2     4

What I am trying to do is to group by Id and VISIT and choose the first non NA value of each group as

> mydata

      Id VISIT    Score 
   <dbl> <fct>    <dbl> 
 1     1 Screeing     1     
 2     1 Baseline     2     
 5     1 Week 9      78    
 7     2 Baseline     5     
 9     2 Week 2       3

This came to my mind

mydata<-mydata %>%
 group_by(Id,VISIT) %>% 
 mutate(first = dplyr::first(na.omit(Score)))

But it does not remove other rows, and it just create a new column with repeated values of first non NA of each group.

You have a typo in your code. `dplyr::first(na.omit(SCORE)))` Score should not be in Capital, but Score. The same as the name of the column. — Bloxx, Nov 30 '21 at 23:35

score 4 · Accepted Answer · answered Nov 30 '21 at 23:44

4

A dplyr alternative. Assuming that by "first" you simply mean the first row, in the order given, by group.

Note that (Id, VISIT) in your example data gives 2 groups for Baseline.

library(dplyr)

mydata %>% 
  group_by(Id, VISIT) %>% 
  filter(!is.na(Score)) %>% 
  slice(1) %>% 
  ungroup()

Result:

# A tibble: 5 x 3
     Id VISIT    Score
  <dbl> <chr>    <dbl>
1     1 Baseline     2
2     1 Screeing     1
3     1 Week 9      78
4     2 Baseline     5
5     2 Week 2       3

answered Nov 30 '21 at 23:44

neilfws

32,751
5
50
63

Any idea how to do that if there are NAs in different columns? E.g., `tibble(id = c(rep(1:3, each = 2), 4), x1 = c(1, NA, NA, 4, 5, NA, 7), x2 = c(NA, 100, 30, NA, 3, NA, NA))` So that the output should look like `tibble(id = 1:4, x1 = c(1, 4, 5, 7), x2 = c(100, 30, 3, NA))`, combining all the first rows per single column which is not NA? – moremo May 02 '23 at 16:31

Andre Wildberg · Answer 2 · 2021-11-30T23:57:23.567

3

If base R is ok try this. NAs are omitted by default.

aggregate( Score ~ Id + VISIT, mydata, function(x) x[1] )

  Id    VISIT Score
1  1 Baseline     2
2  2 Baseline     5
3  1 Screeing     1
4  2   Week 2     3
5  1   Week 9    78

edited Nov 30 '21 at 23:57

answered Nov 30 '21 at 23:40

Andre Wildberg

12,344
3
12
29

2

@Katie VISIT has not been repeated. Note that ID changes. – Onyambu Nov 30 '21 at 23:44
@Onyambu Thanks for your hint. Trying to build an example and find a fix. – Andre Wildberg Nov 30 '21 at 23:48
I deleted the comment. OP editted the question and it seems that the NA are only present in `Score` column, hence your code is correct. If NA's were present in other columns, then we would have to restrict ourselves to only the Score column. – Onyambu Nov 30 '21 at 23:49
@Onyambu Gotcha. Thanks anyways! – Andre Wildberg Nov 30 '21 at 23:51

How to select only first non NA value of each group in R?

2 Answers2