Why does my lead-function generate NA values?

Question

I have had to construct a reference table to keep track of how many academic credits our students should have taken by the current date. I have one row per admission round and course.

I want to code a finished-variable that takes the value 1 for the last course of each admission round and 0 for every other row (this will let me deal with students who should have finished their programmes already).

I write

ekon_program<-ekon_program%>%mutate(finished=ifelse(lead(kull)=kull,0,1))

Here kull is my admission round variable, which increases by 1 in the row directly succeeding the last course of the current admission round. Strangely enough, the last course of each admission round is now coded as "NA", while all other rows are coded as 0.

I could easily correct this by converting all NA-values to 1, but why is this happening in the first place?

Excerpt of data:

ekon_program <- structure(list(sd = structure(c(17042, 17042, 17042, 17042, 17042, 
17042, 17042, 17042, 17042, 17042, 17042, 17042, 17042, 17042, 
17406, 17406, 17406, 17406, 17406, 17406), class = "Date"), points_ekon = c(15, 
15, 15, 15, 7.5, 7.5, 15, 7.5, 7.5, 15, 15, 15, 30, 0, 15, 15, 
15, 15, 7.5, 7.5), summer_break_ekon = c(0, 0, 0, 0, 1, 1, 1, 
1, 1, 1, 2, 2, 2, 2, 0, 0, 0, 0, 1, 1), weeks_course = c(10, 
10, 10, 10, 5, 5, 10, 5, 5, 10, 10, 10, 20, 0, 10, 10, 10, 10, 
5, 5), points_expected = c(0, 15, 30, 45, 60, 67.5, 75, 90, 97.5, 
105, 120, 135, 150, 180, 0, 15, 30, 45, 60, 67.5), order = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 1L, 
2L, 3L, 4L, 5L, 6L), starttermin = c(1, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0), kull = c(1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2), start_date = structure(c(17041, 
17041, 17041, 17041, 17041, 17041, 17041, 17041, 17041, 17041, 
17041, 17041, 17041, 17041, 17405, 17405, 17405, 17405, 17405, 
17405), class = "Date"), start_date_points = structure(c(17041, 
17132, 17202, 17272, 17342, 17461, 17496, 17566, 17601, 17636, 
17706, 17860, 17930, 18070, 17405, 17496, 17566, 17636, 17706, 
17825), class = "Date"), end_date_points = structure(c(17131, 
17201, 17271, 17341, 17460, 17495, 17565, 17600, 17635, 17705, 
17859, 17929, 18069, 18069, 17495, 17565, 17635, 17705, 17824, 
17859), class = "Date"), finished_date = structure(c(18070, 18070, 
18070, 18070, 18070, 18070, 18070, 18070, 18070, 18070, 18070, 
18070, 18070, 18070, 18434, 18434, 18434, 18434, 18434, 18434
), class = "Date")), class = c("grouped_df", "tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -20L), groups = structure(list(
    start_date = structure(c(17041, 17405), class = "Date"), 
    .rows = list(1:14, 15:20)), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE))

score 2 · Accepted Answer · answered Dec 02 '19 at 12:57

One issue is that = is not ==. Second, lead by default returns NA at the end; if we need a different value there, we change the default argument. Also, we don't need ifelse to coerce the result; that can be done with as.integer

library(dplyr)
ekon_program %>%
   mutate(finished = as.integer(lead(kull, default = last(kull)) != kull))
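
The NA can be reproduced in isolation. The following is a minimal sketch on a made-up vector (the toy `kull` below is hypothetical and ungrouped, not the asker's table), showing how the default fill of `lead` propagates through `==` and `ifelse`:

```r
library(dplyr)

# Hypothetical toy vector standing in for the kull column
kull <- c(1, 1, 1, 2, 2)

# lead() shifts values one step forward; the last position has no
# successor, so it is filled with the default, which is NA
shifted <- lead(kull)
shifted
# [1]  1  1  2  2 NA

# Any comparison with NA yields NA, and ifelse() passes that NA through
flag_na <- ifelse(shifted == kull, 0, 1)
flag_na
# [1]  0  0  1  0 NA

# Supplying a default that never occurs in kull (0 here, an assumption
# about the data) turns the trailing NA into a 1
flag_fixed <- as.integer(lead(kull, default = 0) != kull)
flag_fixed
# [1] 0 0 1 0 1
```

On a grouped data frame the same shift is applied separately within each group, so every group's last row receives the default.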

akrun

If I interpret this answer correctly the lead function will (for some unknown reason) consider the last value in a group (admissionround in this case) as NA unless otherwise specified? I don't understand the meaning of this rule but it "does" seem to work when I set default to 0 instead. – Magnus Dec 02 '19 at 13:14
@Magnus `lead(1:5)#[1] 2 3 4 5 NA` The last value is `NA` because of `default = NA`. I changed the `NA` to another value because `==` returns `NA` when there is a comparison with `NA`, i.e. `NA==2` – akrun Dec 02 '19 at 13:16
Sure, if the lead value does not exist (such as the lead value to 5 in 1:5) I could understand that but....now we really "have" lead values in the very next group? I'm marking this as the correct answer, thank you for your time! – Magnus Dec 02 '19 at 13:20
@Magnus If you need this to be applied within each group, you need a `group_by` before applying that. `lead` drops the first observation and shifts the remaining values up, while `lag` does the reverse – akrun Dec 02 '19 at 13:22
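
The `dput` excerpt in the question has class `grouped_df` with groups on `start_date`, which would explain why `lead` did not see the first row of the next admission round. The sketch below illustrates that per-group behaviour on a hypothetical toy table (`toy`, `next_kull` and `res` are made-up names), and flags the last row of each group with `row_number() == n()` so that no comparison with `NA` is involved:

```r
library(dplyr)

# Hypothetical toy table: two admission rounds with 3 and 2 courses
toy <- tibble(
  kull  = c(1, 1, 1, 2, 2),
  order = c(1L, 2L, 3L, 1L, 2L)
)

res <- toy %>%
  group_by(kull) %>%
  mutate(
    # lead() restarts in every group, so each group's last row gets NA
    next_kull = lead(kull),
    # flag the last row of each group directly by its position
    finished = as.integer(row_number() == n())
  ) %>%
  ungroup()

res$next_kull
# [1]  1  1 NA  2 NA
res$finished
# [1] 0 0 1 0 1
```

This assumes the rows are already sorted by course order within each admission round; if they are not, an `arrange(kull, order)` before the `group_by` would be needed.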
