Filling in missing NA based upon factor level

Question

I have been trying to convert my data from a vertical configuration to a horizontal configuratoin. Using spread (), I have managed to get it in the following structure:

plot, x1, x2, x3, x4, x5, x6, x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA
7 rows

However, obviously I want to fill in the NAs with the information from the other rows. Then my data would look like

plot , x1, x2, x3, x4, x5, x6, x7
1   0.06011071  0.09756118  -0.1422974NA    0.1809486   0.143701     0.1584451      0.1151581

I have tried multiple approaches but thus far unsuccesfull. Does anyone know how I could achieve this?

This issue was likely preventable by adjusting your call to 'spread()' or 'pivot_wider' — GuedesBF, Nov 22 '21 at 15:56

score 1 · Accepted Answer · answered Nov 22 '21 at 13:44

This is a solution using dplyr. It assumes that you want to keep the first non-NA value from the top for each plot.

library(dplyr)

dat2 <- dat %>%
  group_by(plot) %>%
  summarize(
    across(
      everything(),
      .fns = ~first(.x[!is.na(.x)])
    )
  ) %>%
  ungroup()
dat2
# # A tibble: 1 x 8
#    plot     x1     x2     x3    x4    x5    x6    x7
#   <int>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
# 1     1 0.0601 0.0976 -0.142 0.181 0.144 0.158 0.115

DATA

dat <- read.table(text = "plot  x1  x2  x3  x4  x5  x6  x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA",
                  header = TRUE)

Thank you, this just did actually what I wanted. The code you over is way above my proficiency, but I will try to understand it. It is the first time I have used the community and absolutely amazed by the help everyone is offering. — McPauwl, Nov 22 '21 at 14:13
@McPauwl You can let me know if you have questions for the codes. Glad to help. — www, Nov 22 '21 at 14:21

score 1 · Answer 2 · answered Nov 22 '21 at 13:44

Or:

library(tidyverse)

d <- read.table(text = "plot, x1, x2, x3, x4, x5, x6, x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA", header = TRUE)

d %>%
  summarise(across(starts_with("x"), sum, na.rm = TRUE))

#x1.        x2.        x3.       x4.      x5.       x6.        x7
#1 0.06011071 0.09756118 -0.1422974 0.1809486 0.143701 0.1584451 0.1151581

score 1 · Answer 3 · answered Nov 22 '21 at 15:06

1

We can also use na.omit with dplyr:

df %>% group_by(plot) %>% summarise(across(matches('x\\d+'), na.omit))

answered Nov 22 '21 at 15:06

GuedesBF

8,409
5
19
37

score 0 · Answer 4 · answered Nov 22 '21 at 13:43

0

something like this?

df <- data.frame(x1 = c(1,NA,NA,NA,NA),
                 x2 = c(NA,2,NA,NA,NA),
                 x3 = c(NA,NA,3,NA,NA),
                 x4 = c(NA,NA,NA,4,NA),
                 x5 = c(NA,NA,NA,NA,5))
> df
  x1 x2 x3 x4 x5
1  1 NA NA NA NA
2 NA  2 NA NA NA
3 NA NA  3 NA NA
4 NA NA NA  4 NA
5 NA NA NA NA  5

for (i in 1:ncol(df)){
  df[,i][is.na(df[,i])] <- df[,i][!is.na(df[,i])]
}

> df
  x1 x2 x3 x4 x5
1  1  2  3  4  5
2  1  2  3  4  5
3  1  2  3  4  5
4  1  2  3  4  5
5  1  2  3  4  5

answered Nov 22 '21 at 13:43

BillyBouw

314
2
10

OP's expected output has only one row. You have several, so I guess you are close but not what OP wants. – www Nov 22 '21 at 13:44
Although OP did not provide a reproducible code to create the example data frame, he or she did provide an example. I have spent some time to create a reproducible code to create OP's example. If you want, you can take my example for your answer. There is no need to invent another example that may not be what OP needs. – www Nov 22 '21 at 13:49
@www is right, I wanted only one line output. Thank you for your time anyway, I am amazed by the help everyone is offering on so short notice. – McPauwl Nov 22 '21 at 14:16

jpdugo17 · Answer 5 · 2021-11-22T16:04:59.453

Maybe:

library(tidyverse)

df <- read.table(text = "plot, x1, x2, x3, x4, x5, x6, x7
1   0.06011071  NA  NA  NA  NA  NA  NA
1   NA  0.09756118  NA  NA  NA  NA  NA
1   NA  NA  NA  NA  0.143701    NA  NA
1   NA  NA  NA  NA  NA  0.1584451   NA
1   NA  NA  NA  0.1809486   NA  NA  NA
1   NA  NA  NA  NA  NA  NA  0.1151581
1   NA  NA  -0.1422974  NA  NA  NA  NA", header = TRUE)


#if every column has only one value and the rest are NA's

tibble(plot = 1, map_dfc(df[, -1], ~.[!is.na(.)]))
#> # A tibble: 1 × 8
#>    plot    x1.    x2.    x3.   x4.   x5.   x6.    x7
#>   <dbl>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 0.0601 0.0976 -0.142 0.181 0.144 0.158 0.115

#or gathering and spreading approach

pivot_longer(df,cols = matches('^x\\d'), names_to = 'x') %>% 
  filter(!is.na(value)) %>% 
  arrange(x) %>% 
  pivot_wider(names_from = 'x', values_from = 'value')
#> # A tibble: 1 × 8
#>   plot.    x1.    x2.    x3.   x4.   x5.   x6.    x7
#>   <int>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1     1 0.0601 0.0976 -0.142 0.181 0.144 0.158 0.115

^{Created on 2021-11-22 by the reprex package (v2.0.1)}

Filling in missing NA based upon factor level

5 Answers5