1

I have a data recoding puzzle. Here is how my sample data looks like:

df <- data.frame(
  id = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
  scores = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
  position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
  cat = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))

   id scores position cat
1   1      0        1   1
2   1      1        2   1
3   1      1        3   1
4   1      0        4   1
5   1      0        5   1
6   1     -1        6   0
7   1     -1        7   0
8   2      0        1   1
9   2      0        2   1
10  2      1        3   1
11  2     -1        4   0
12  2     -1        5   0
13  2     -1        6   0
14  3      0        1   1
15  3      1        2   1
16  3      0        3   1
17  3      1        4   1
18  3      1        5   1
19  3      0        6   1
20  3      1        7   1

There are three ids in the dataset and rows were ordered by a positon variable. For each id, the first row after the scores start by -1 needs to be 0, and the cat variable needs to be 1. For example, for id=1, the first row would be 6th position and in that row, score should be 0 and the cat variable needs to 1. For those ids do not have scores=-1, I keep them as they are.

The desired output should look like below:

   id scores position cat
1   1      0        1   1
2   1      1        2   1
3   1      1        3   1
4   1      0        4   1
5   1      0        5   1
6   1      0        6   1
7   1     -1        7   0
8   2      0        1   1
9   2      0        2   1
10  2      1        3   1
11  2      0        4   1
12  2     -1        5   0
13  2     -1        6   0
14  3      0        1   1
15  3      1        2   1
16  3      0        3   1
17  3      1        4   1
18  3      1        5   1
19  3      0        6   1
20  3      1        7   1

Any recommendations?? Thanks

amisos55
  • 1,913
  • 1
  • 10
  • 21

4 Answers4

1

You could do something along these lines using the dplyr package:

library(dplyr)

df = mutate(df, cat = ifelse(scores == -1, 1, cat),
                scores = ifelse(scores == -1, 0, scores))

Using the mutate() function, I am re-assigning the values for the scores and cat fields according to ifelse() conditional statements. For scores, if the score is -1, the value is replaced by 0, otherwise it keeps the score as is. For cat, it also checks if scores is equal to -1, but would assign a value of 1 when the condition is met, or the already existing value of cat when the condition is not met.

EDIT

After our discussion in the comments, I think something along these lines should be helpful (you may have to modify the logic since I don't exactly follow what the desired output is here):

for(i in 1:nrow(df)){
    # Check if score is -1
    if(df[i, 'scores'] == -1){
        # Update values for the next row
        df[i+1, 'scores'] <- 0
        df[i+1, 'cat'] <- 1
    }
}

Sorry that I don't really follow the desired output, hopefully this is helpful in getting you to your answer!

Ricky
  • 1,005
  • 1
  • 12
  • 15
  • Hi @Ricky, thanks for your response. your solution recode all `-1` scores to `0` and did not change the `cat` variable. Are you getting the desired output? On my end, I could not. Should be aware of any other library? – amisos55 Feb 21 '21 at 17:25
  • @amisos55 whoops, sorry about that. This was because I was overwriting the `scores` variable before `cat`. I adjusted the answer to change the order, and the output now looks correct to me, let me know if it's still not! – Ricky Feb 21 '21 at 17:32
  • `6th` row is correctly modified. But it did not keep `7th` row, for example. The `7th` row should have scores `-1` and `cat` `0`. It should stay the same. It should only modify the `6th` row, which the first row after that has the first score `-1`. – amisos55 Feb 21 '21 at 17:37
  • 1
    Sorry, I misunderstood your question. If you need to offset things relatively to each other this way, I would recommend a for loop approach instead, give me a minute to update my answer – Ricky Feb 21 '21 at 17:48
  • I updated the answer with a for loop option. Might not be what you need yet, but hopefully you should be able to adapt it for what you need, sorry I don't entirely follow – Ricky Feb 21 '21 at 17:56
  • this solution modifies the second score `-1` instead of the first one for each `id` but thanks for your input and time, really appreciated :) – amisos55 Feb 21 '21 at 18:01
1

This may be what you are after

df %>% 
group_by(id) %>%
mutate(i = which(scores == -1)[1]) %>% # find the first row == -1
mutate(scores = case_when(position == i & scores !=0 ~ 0, T ~ scores), # update the score using position & i
cat = ifelse(scores == -1,0,1)) %>% # then update cat
select (-i) # remove I

e.matt
  • 836
  • 1
  • 5
  • 12
  • Hi @e.matt, thanks for your response. this worked well for the first case but your solution recoded the last row for the second case. – amisos55 Feb 21 '21 at 18:23
  • which id and values are incorrect, pretty sure this matches your desired output – e.matt Feb 21 '21 at 18:27
  • in row 13, this procedure recodes the score as `0` but it should recode row 11's `score` as 0, which is the first condition of case 2. – amisos55 Feb 21 '21 at 18:30
1

After trying a few things and getting ideas from @Ricky and @e.matt, I came up with a solution.

df %>%
  filter(scores == -1) %>%                         # keep cases where var = 1
  distinct(id, .keep_all = T) %>%           # keep distinct cases based on group
  mutate(first = 1) %>%                        # create first column
  right_join(df, by=c("id","scores","position","cat")) %>% # join back original dataset
  mutate(first = coalesce(first, 0)) %>%       # replace NAs with 0
mutate(scores = case_when(
  first == 1 ~ 0,
  TRUE~scores)) %>%
mutate(cat = case_when(
    first == 1 ~ 1,
    TRUE~cat))

This provides my desired output.

   id scores position cat first
1   1      0        1   1     0
2   1      1        2   1     0
3   1      1        3   1     0
4   1      0        4   1     0
5   1      0        5   1     0
6   1      0        6   1     1
7   1     -1        7   0     0
8   2      0        1   1     0
9   2      0        2   1     0
10  2      1        3   1     0
11  2      0        4   1     1
12  2     -1        5   0     0
13  2     -1        6   0     0
14  3      0        1   1     0
15  3      1        2   1     0
16  3      0        3   1     0
17  3      1        4   1     0
18  3      1        5   1     0
19  3      0        6   1     0
20  3      1        7   1     0
amisos55
  • 1,913
  • 1
  • 10
  • 21
1

here is a data.table oneliner

library( data.table )
setDT(df)
df[ df[, .(cumsum( scores == -1 ) == 1), by = .(id)]$V1, `:=`( scores = 0, cat = 1) ]

#     id scores position cat
#  1:  1      0        1   1
#  2:  1      1        2   1
#  3:  1      1        3   1
#  4:  1      0        4   1
#  5:  1      0        5   1
#  6:  1      0        6   1
#  7:  1     -1        7   0
#  8:  2      0        1   1
#  9:  2      0        2   1
# 10:  2      1        3   1
# 11:  2      0        4   1
# 12:  2     -1        5   0
# 13:  2     -1        6   0
# 14:  3      0        1   1
# 15:  3      1        2   1
# 16:  3      0        3   1
# 17:  3      1        4   1
# 18:  3      1        5   1
# 19:  3      0        6   1
# 20:  3      1        7   1
Wimpel
  • 26,031
  • 1
  • 20
  • 37