I am new to R after using Stata for several years. I am trying to write a loop, with a set of conditions inside it, that recodes some data. I am truly lost, so any suggestions are welcome. Please help!

Specifically, I am using dplyr and trying to set up conditions so that R performs an action when a condition is met. As the code below shows, I load my libraries, simulate minimal data, and then try to run a for loop in which R examines multiple conditions. To set up the conditions, I want R to capture the minimum and maximum of each variable, then go through the conditions and, when one is TRUE, perform the recode that creates a new variable in the data frame. Code is nice, but understanding is better, so please help me understand what I am doing wrong here. I am new to R.

library(tidyverse)
library(dplyr)

#my simulated data

v1 <- sample(c(0,1), 200, replace = TRUE)
v2 <- sample(c(0,1), 200, replace = TRUE)
v3 <- sample(c(1:7), 200, replace = TRUE)
v4 <- sample(c(1:5), 200, replace = TRUE)
v5 <- sample(c(1:10), 200, replace = TRUE)
v6 <- sample(c(0:200), 200, replace = TRUE)
dat<-data.frame(v1, v2, v3, v4, v5, v6)

for(m in c("v1", "v2", "v3", "v4", "v5", "v6")){
  
z <-get(m)
j <-min(z)
k <-max(z)

if (j == 0 & k == 1) {
dat <- dat %>% mutate(across(everything()), ifelse (z =="1", 1, 2))
} else if (j>= 1 & k <=5 {
dat <- dat %>% mutate(v4_1 = recode(v4, "1" = 2, "2" = 2, "3" = 2, "4" = 2, "5" = 2)) 
}
}
View(dat)  

To give some perspective on the result I am after: I haven't set a seed, so the numbers below are only illustrative. After running this code, three new variables should be created in the data frame dat. It should look like the following:

v1 v2 v3 v4 v5 v6 v1_1 v2_1 v4_1
1  1  1  5  10  100 1   1     2
0  1  2  3  9   80  2   1     2
1  0  3  2  8   70  1   2     2
0  1  3  1  7   20  2   1     2

Please help! I am lost with this and am open to any and all suggestions. This should get me on my way, but please include some explanation of the code so that I can modify it to handle other situations.

Thank you in advance.

George
  • akrun, thank you. I tried with the problem fixed, but I still get two other issues. It is not giving my v1 recoded in its new column. In other words, it only provides recodes for v2 and v4. How do I get v1 in a new column with the proper name? Also, the v2 column is not named v2_1 but named "ifelse (z =="1", 1, 2)". How do I get it to be named v2_1? I understand now that v4 will always work because I invoked it within the mutate statement, but I wonder how it will work if it is not invoked? Thank you – George Oct 08 '22 at 18:52
  • akrun, this is the problem that I am having. I do not know how to create v1_1 or v2_1. Oh, I see what you mean by I am looping across all of the columns and updating those columns. How would I change this? Thank you – George Oct 08 '22 at 18:54

1 Answer


The code in the OP's post loops over all the column names and then loops again inside across, reassigning (<-) the data on each iteration, so the columns are updated multiple times. Instead, we only need to loop once, in across, and apply either a predefined function containing the if/else or a lambda function created on the fly. The new columns are created with .names by appending _1 to the original column names ({.col}).

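library(dplyr)
dat1 <- dat %>%
  mutate(across(c(v1, v2, v4),
    ~ if (min(.x) == 0 && max(.x) == 1) {
      # binary 0/1 column: keep 1 as 1, recode 0 to 2
      ifelse(.x == 1, 1, 2)
    } else {
      # otherwise (here v4, values 1 to 5): recode every value to 2
      recode(.x, "1" = 2, "2" = 2, "3" = 2, "4" = 2, "5" = 2)
    },
    .names = "{.col}_1"))  # name each new column <original>_1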

-output

> head(dat1)
   v1 v2 v3 v4 v5  v6 v1_1 v2_1 v4_1
1  1  1  7  4 10 147    1    1    2
2  0  1  4  1  9 184    2    1    2
3  1  0  2  4  6 135    1    2    2
4  0  0  5  2 10 194    2    2    2
5  1  1  1  5  1  72    1    1    2
6  0  0  3  2  9 181    2    2    2
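
For comparison, the same recode can be written as an explicit for loop, which may be easier to follow coming from Stata. This is only a sketch under a few assumptions: the seed and the three-column data frame are illustrative, and dat_loop and new_name are names introduced here. The key difference from the loop in the question is that each column is read from the data frame with dat_loop[[m]] rather than with get(m), which fetches the free-standing vector from the workspace, and that the result is assigned to a new column whose name is built with paste0.

```r
# Illustrative data; the seed is an assumption added for reproducibility
set.seed(1)
dat <- data.frame(
  v1 = sample(c(0, 1), 200, replace = TRUE),
  v2 = sample(c(0, 1), 200, replace = TRUE),
  v4 = sample(1:5, 200, replace = TRUE)
)

dat_loop <- dat
for (m in c("v1", "v2", "v4")) {
  z <- dat_loop[[m]]            # one column at a time, taken from the data frame
  new_name <- paste0(m, "_1")   # e.g. "v1" becomes "v1_1"
  if (min(z) == 0 && max(z) == 1) {
    # binary 0/1 column: 1 stays 1, 0 becomes 2
    dat_loop[[new_name]] <- ifelse(z == 1, 1, 2)
  } else if (min(z) >= 1 && max(z) <= 5) {
    # values between 1 and 5: every value becomes 2
    dat_loop[[new_name]] <- rep(2, length(z))
  }
}
head(dat_loop)
```

Each pass through the loop touches exactly one column and adds exactly one new column, which is what across with .names does in a single call.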
akrun
  • akrun, I tried it, and it is close. I will use the first 4 lines of your output to further explain my intention. line 1 for v1 = 0 so v1_1 = 2; line 2 v1=0 so v1_1 = 2; line 3 v1 = 0 so v1_1=2; line 4 v1 = 1 so v1_1 = 1. The same should happen for v2 and v2_1. In other words, my intention is to, truly, recode 0 to 2 and 1 to 1 for v1 and v2 in new variables v1_1 and v2_1. Then, I would tackle v4. – George Oct 08 '22 at 19:03
  • If I am understanding you correctly, the code is looping over all the columns at once. How would I coerce z to loop over each column 1:1 instead of all of the columns at once? In other words, take one variable from the list, work through the recode, create a new column with the new name and then do the recode? The code you provided is very very close to what I am looking for it to do, but it isn't really exact. – George Oct 08 '22 at 19:16
  • How would the statement look? How would that look within the conditional statements? I would have to take out z, wouldn't I? I am trying to make this something general that will help with the multiple data frames I will be dealing with. – George Oct 08 '22 at 19:26
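
One possible way to make this general across data frames, offered as a sketch rather than as part of the answer above: let across pick the columns with where() and a predicate, so that no column names are hard-coded. The helper names is_binary01 and recode_binary are hypothetical, the seed and example data are illustrative, and the predicate assumes the columns contain no missing values.

```r
library(dplyr)

# Hypothetical predicate: TRUE for numeric columns whose min is 0 and max is 1
# (assumes no NA values in the column)
is_binary01 <- function(x) is.numeric(x) && min(x) == 0 && max(x) == 1

# Hypothetical helper: adds a <name>_1 column for every 0/1 column in df
recode_binary <- function(df) {
  df %>%
    mutate(across(where(is_binary01),
                  ~ ifelse(.x == 1, 1, 2),   # 1 stays 1, 0 becomes 2
                  .names = "{.col}_1"))
}

# Illustrative data; the seed is an assumption added for reproducibility
set.seed(1)
dat <- data.frame(
  v1 = sample(c(0, 1), 200, replace = TRUE),
  v3 = sample(1:7, 200, replace = TRUE)
)
out <- recode_binary(dat)
names(out)
```

Because the column selection happens inside where(), the same function can be applied to any data frame, and columns that do not meet the condition (v3 here) are left alone.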