Recode data as a function of other data

Question

I want to recode my data according to decision rules.

An example of the rules:

The data have more than 3 year variables.

First rule: correct the data if there is only one error, where y, y+1 and y+2 are consecutive years:
if y ≤ y+2 and y+1 < y, then y+1 = y

After the previous correction, apply the second rule:

If more than 2 years are identical, keep the most frequent value.
If the frequencies are equal, keep the higher value.

Maybe an example makes it a little clearer:

ID  y1  y2  y3  y4  y5
1   6   7   6   8   
2   6   7   7   6   8
3   6   7   8   7   8
4   6   7   8   6   7
6   3   4   5   6   3

The corrected data:

ID  y1  y2  y3  y4  y5
1   6   7   7   8   
2   6   7   7   7   8
3   6   7   8   8   8
4   6   7   7   7   7
6   3   4   5   6   3

If you have any idea how to correct a variable as a function of other variables, many thanks.

If I have an ID with 8 years of data, row 4 doesn't work. Do you know why? Is it a problem with the large number of NAs? Before the code:

ID  y1  y2  y3  y4  y5  y6  y7  y8
1   6   7   6   8   NA  NA  NA  NA
2   6   7   7   6   8   NA  NA  NA
3   6   7   8   7   8   NA  NA  NA
4   6   7   8   6   7   NA  NA  NA
5   3   4   5   6   3   NA  NA  NA
6   7   7   8   8   7   8   7   8

After the code:

    y1  y2  y3  y4  y5  y6  y7  y8
1   6   7   7   8   NA  NA  NA  NA
2   6   7   7   7   8   NA  NA  NA
3   6   7   8   8   8   NA  NA  NA
4   6   7   8   6   7   NA  NA  NA
5   3   4   5   6   3   NA  NA  NA
6   7   7   8   8   8   8   8   8

If you have a solution, great; otherwise I will select rows according to the number of non-empty fields.

Row 4 follows the rule "More than 2 identical years, keep the most frequent", and row 2 follows the same rule as row 1 (rule 1): y3 ≤ y5 and y4 < y2 then y4 = y3 — Nic, Aug 09 '21 at 10:36
Excuse me, I made a mistake. The rule is: Equality of frequencies: keep the higher — Nic, Aug 09 '21 at 10:41
Yes, you could omit y1. I will look at the results in detail to see the overlapping conditions — Nic, Aug 09 '21 at 11:33
I made a slight modification to my solution so that the problem with row 4 of the second data set with 8 `y` values has been fixed. — Anoushiravan R, Aug 09 '21 at 16:50

Anoushiravan R · Accepted Answer · 2021-08-09T19:48:06.907

Updated solution: I made a slight modification so that it can be used with observations containing NA values:

I used the pmap_df function from the purrr package, which performs row-wise operations on data frames by passing each row's values as multiple arguments.
c(...) captures all values in each row, including ID; the ID is dropped with [-1], so x holds only the y values.
For your y + 2, I omitted the first two values of every row since they cannot be y + 2. Since we are also checking y + 1 for every y, the two expressions must have the same length, so I only chose those y + 1 for which there is a y + 2.
For the other rule, I created a vector called z, which simply omits y1 from x. The code checks whether z has 3 unique values (with four values left, that means 2 are the same) and, if so, sets all values of z to the repeated value.
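
To make the index arithmetic in these notes concrete, here is a minimal single-row trace in base R. It uses row 2 of the question's first table with the ID already removed. The name y_0 is only introduced here for readability and is not part of the solution.

```r
# Row 2 of the question's first table, ID already removed: y1..y5
x <- c(6, 7, 7, 6, 8)

y_2 <- x[-c(1, 2)]               # values that can play the role of y + 2
y_1 <- x[2:(length(y_2) + 1)]    # the y + 1 value in front of each y + 2
y_0 <- x[seq_along(y_2)]         # the y value in front of each y + 1 (hypothetical name)

# Rule 1: y <= y + 2 and y + 1 < y
ids <- which(y_0 <= y_2 & y_1 < y_0)

# Overwrite each offending y + 1 with its y
x[ids + 1] <- x[ids]
x
```

Here only the triple (y3, y4, y5) = (7, 6, 8) satisfies the rule, so y4 is replaced by y3 and the row becomes 6 7 7 7 8, as in the expected output.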

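library(dplyr)
library(purrr)

df %>%
  pmap_df(~ {
    # Values of the current row without NAs; [-1] drops ID
    x <- c(...)[!is.na(c(...))][-1]
    y_2 <- x[-c(1, 2)]               # candidates for y + 2
    y_1 <- x[2:(length(y_2) + 1)]    # the matching y + 1 values
    # Rule 1: positions of y where y <= y + 2 and y + 1 < y
    ids <- which((x[seq_along(y_2)] <= y_2) & (y_1 < x[seq_along(y_1)]))
    x[ids + 1] <- x[ids]
    # Rule 2: leave y1 out and look for a repeated value
    z <- x[-1]
    if (length(unique(z)) == 3 && sum(is.na(z)) == 0) {
      z[1:length(z)] <- z[duplicated(z)]
    }
    c(x[1], z)
  })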

# A tibble: 5 x 5
     y1    y2    y3    y4    y5
  <int> <int> <int> <int> <int>
1     6     7     7     8    NA
2     6     7     7     7     8
3     6     7     8     8     8
4     6     7     7     7     7
5     3     4     5     6     3
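
The construction of df is not shown above. Assuming it mirrors the first table in the question (integer columns, with the missing y5 of ID 1 stored as NA), it could be recreated roughly like this:

```r
# Assumed reconstruction of the question's first table; the original df is not shown
df <- data.frame(
  ID = c(1L, 2L, 3L, 4L, 6L),
  y1 = c(6L, 6L, 6L, 6L, 3L),
  y2 = c(7L, 7L, 7L, 7L, 4L),
  y3 = c(6L, 7L, 8L, 8L, 5L),
  y4 = c(8L, 6L, 7L, 6L, 6L),
  y5 = c(NA, 8L, 8L, 7L, 3L)
)
df
```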

Second data sample

# A tibble: 6 x 8
     y1    y2    y3    y4    y5    y6    y7    y8
  <int> <int> <int> <int> <int> <int> <int> <int>
1     6     7     7     8    NA    NA    NA    NA
2     6     7     7     7     8    NA    NA    NA
3     6     7     8     8     8    NA    NA    NA
4     6     7     7     7     7    NA    NA    NA
5     3     4     5     6     3    NA    NA    NA
6     7     7     8     8     8     8     8     8
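
The check on 3 unique values covers the samples above. If you need the second rule exactly as stated in the question (keep the most frequent value, and the higher one when frequencies are equal), a small base R helper is one possible starting point. This is only a sketch: the name most_frequent_high is made up for illustration, and how it plugs into the pipeline depends on your other rules.

```r
# Hypothetical helper: most frequent value, ties resolved in favour of the higher one
most_frequent_high <- function(v) {
  v <- v[!is.na(v)]                  # ignore missing years
  counts <- table(v)                 # frequency of each distinct value
  candidates <- as.numeric(names(counts)[counts == max(counts)])
  max(candidates)                    # on a tie, keep the higher value
}

most_frequent_high(c(7, 8, 6, 7))     # 7 occurs twice
most_frequent_high(c(6, 7, 8, 6, 7))  # 6 and 7 tie, so the higher one (7) is kept
```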

Thank you. Would it be possible to explain your code a little so that I can extend it with my other rules? — Nic, Aug 09 '21 at 11:57
Sure, some edits might be needed for rules 2 and 3, but so far it gives your desired results. I will add some notes now. — Anoushiravan R, Aug 09 '21 at 12:04
Check my updates please and let me know if I need to explain more. — Anoushiravan R, Aug 09 '21 at 12:11
Thank you for the explanations. The line with ids is there to be sure we have more than 3 y variables. And many thanks for the code and the quick reply. — Nic, Aug 09 '21 at 12:35
