
I have a dataset with many NA values, but the rows that contain them also hold important data, so I cannot remove those rows. I also cannot replace the NAs with 0, because I want to compute averages. What can I do?

Example R script:

data_week1 <- data_calc %>%
  dplyr::group_by(room, pen, block, treatment)%>%
  dplyr::filter(week == '1')%>%
  dplyr::summarise(adfi_week1 = sum(feedintake/7),
                   ph = mean(ph), 
                   ds = mean(ds),
                   fecalscore = mean(fecalscore))

ADFI is measured every day, while pH is measured only once a week.

Update: I tried the code below, but it's not working:

data_calc <- data_ruw %>% 
               dplyr::group_by(datum, afdeling, hok, blok, behandeling) %>%
               dplyr::mutate(voerinname_ochtend_kgbrij = voerinname_ochtend_kgbrij*3,
                             voerinname_middag_kgbrij = voerinname_middag_kgbrij*3,
                             voeropname_dag_brij = voerinname_ochtend_kgbrij + voerinname_middag_kgbrij - voer_uit_ochtend - voer_uit_middag, na.rm = TRUE,
                             voeropname_kgdroogvoer = voeropname_dag_brij * 0.25/0.88, na.rm = TRUE,
                             mest_ds = mest_gewicht_in - mest_gewicht_uit - 5.6 - 7, na.rm = TRUE)
Kristel
  • I'm not clear on why you couldn't add `na.rm = TRUE` in your `sum()` and `mean()` calls. – Phil Jul 04 '23 at 16:02
  • not working: data_calc <- data_ruw %>% dplyr::group_by(datum, afdeling, hok, blok, behandeling)%>% dplyr::mutate(voerinname_ochtend_kgbrij = voerinname_ochtend_kgbrij*3)%>% dplyr::mutate(voerinname_middag_kgbrij = voerinname_middag_kgbrij*3)%>% dplyr::mutate(voeropname_dag_brij = voerinname_ochtend_kgbrij + voerinname_middag_kgbrij - voer_uit_ochtend - voer_uit_middag, na.rm = TRUE)%>% dplyr::mutate(voeropname_kgdroogvoer = voeropname_dag_brij * 0.25/0.88, na.rm = TRUE) %>% dplyr::mutate(mest_ds = mest_gewicht_in - mest_gewicht_uit - 5.6 - 7, na.rm = TRUE) – Kristel Jul 04 '23 at 16:12
  • The `na.rm` arguments need to be inside `sum()` and `mean()`, e.g. `sum(feedintake/7, na.rm = TRUE)`. – Phil Jul 04 '23 at 16:23

1 Answer

Here's a simple example to illustrate:

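library(dplyr)  # provides tibble(), rowwise(), mutate(), ungroup()

# y contains NA values that should be ignored, not dropped or zeroed
df <- tibble(
    x = 1:5,
    y = c(1, NA, 2, NA, 3)
)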

df |>
  rowwise() |>
  mutate(mean = mean(c(x, y), na.rm = TRUE),
         sum = sum(c(x, y), na.rm = TRUE)) |>
  ungroup()

# Output:
# A tibble: 5 × 4
      x     y  mean   sum
  <int> <dbl> <dbl> <dbl>
1     1     1   1       2
2     2    NA   2       2
3     3     2   2.5     5
4     4    NA   4       4
5     5     3   4       8

In this example, I'm finding the mean and sum of x and y within each row. You could instead group by another column, or not group at all, to get the mean and sum across all of x and y.
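Applied to a grouped weekly summary like the one in the question, the same idea could look roughly like the sketch below. The `toy` data frame and its values are made up for illustration, and the column names `adfi_sum7` and `adfi_mean` are hypothetical. Note that `na.rm` is an argument of `sum()` and `mean()`; when it is passed directly to `mutate()` or `summarise()`, dplyr treats it as a new column named `na.rm` rather than as an option.

```r
library(dplyr)

# Hypothetical toy data: feed intake recorded daily (with gaps),
# pH recorded only once per week, so most pH rows are NA
toy <- tibble(
  pen        = rep(c("A", "B"), each = 7),
  week       = 1,
  feedintake = c(2, 3, NA, 4, 3, 2, 4,  5, 5, 4, NA, NA, 6, 5),
  ph         = c(6.5, rep(NA, 6),       7.1, rep(NA, 6))
)

week1 <- toy |>
  filter(week == 1) |>
  group_by(pen) |>
  summarise(
    # na.rm goes inside the aggregating function, as suggested in the comments
    adfi_sum7 = sum(feedintake / 7, na.rm = TRUE),
    # alternative: average over the days that actually have a value
    adfi_mean = mean(feedintake, na.rm = TRUE),
    ph        = mean(ph, na.rm = TRUE),
    .groups   = "drop"
  )

print(week1)
```

The two feed-intake columns differ when days are missing: `sum(feedintake / 7, na.rm = TRUE)` still divides by 7, so missing days effectively count as zero, while `mean(feedintake, na.rm = TRUE)` divides by the number of recorded days. Which one is appropriate depends on what a missing day means in your data. Also be aware that `mean(..., na.rm = TRUE)` returns `NaN` for a group in which every value is NA.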

Mark