0

I want to research group A's effect on B regarding certain dependent variables I dubbed "target_n". Due to the way in which the data was generated I have "layers" of information in my dataset that are ordered by group. That means, in rows for which Group=="B" I have information on B's values on "target_n" and for rows where Group=="A", I have information on the A's values on "X_n". Group "C" is basically a "other"-category but I would need to have them in the same row as A and B as well to make sure that A's effects are on B and not on C. The following should add some clarity:

My data (df) are structured like this:

df<-data.frame(
"Date"=c(1990-03,2000-01,2010-09,1990-03,2000-01,2010-09,1990-03,2000-01,2010-09),
"Group"=c("A","A","A","B","B","B","C","C","C"),
"X_1_A"=c(9,4,7,NA,NA,NA,NA,NA,NA),
"X_2_A"=c(1,2,6,NA,NA,NA,NA,NA,NA),
"target_1_B"=c(NA,NA,NA,0,2,9,NA,NA,NA),
"target_2_B"=c(NA,NA,NA,9,2,1,NA,NA,NA),
"target_1_C"=c(NA,NA,NA,NA,NA,NA,5,3,1),
"target_2_C"=c(NA,NA,NA,NA,NA,NA,1,9,2)
)

What I want is to compute new variables both for group "A" and group "C" so that everything falls within the same rows. If I were to do that manually,I would take A's column "X_1" score at date "1990-03" and assign it to B's place in A's column for the same date.

So in the end, my data would look like this:

df<-data.frame(
"Date"=c(1990,2000,2010,1990,2000,2010,1990,2000,2010),
"Group"=c("A","A","A","B","B","B","C","C","C"),
"X_1_A"=c(9,4,7,NA,NA,NA,NA,NA,NA),
"X_2_A"=c(1,2,6,NA,NA,NA,NA,NA,NA),
"target_1_B"=c(NA,NA,NA,0,2,9,NA,NA,NA),
"target_2_B"=c(NA,NA,NA,9,2,1,NA,NA,NA),
"target_1_C"=c(NA,NA,NA,NA,NA,NA,5,3,1),
"target_2_C"=c(NA,NA,NA,NA,NA,NA,1,9,2),
"NEW_X_1_A"=c(NA,NA,NA,9,4,7,NA,NA,NA),
"NEW_X_2_A"=c(NA,NA,NA,1,2,6,NA,NA,NA),
"NEW_target_1_C"=c(NA,NA,NA,5,3,1,NA,NA,NA),
"NEW_target_2_C"=c(NA,NA,NA,1,9,2,NA,NA,NA)
)

(I have a number of these "X_"s and exactly the same number of "target_" variables. I also do not just have this group of A, B and C, but A1,A2,A3,C1,C2,C3 and even more Bs. For each set of A1,B1,C1 I also have a "set" of dates that does not match anothers "set". But that would be less of a problem as I could simply slice my dataset horizontally into sets, do the trick for all of them separately and merge them again.)

But how would I bring A's and C's values into B's rows based on Group=="B" and based on date?

Dr. Fabian Habersack
  • 1,111
  • 12
  • 30
  • Having trouble following this question. Your desired result is additional columns on your old dataset? if not can you replace the old columns instead of adding columns that say NEW? And you mention target values and target variables like they are separate things... but I only see one in the data. And the moving down of values from A (9,4,7) to B (0,2,9) that you mention... I don't understand the numerical change. Can you make some edits to make your question clearer? – Evan Friedland Sep 05 '18 at 01:55
  • Sorry, those are valid points. I edited my question and hope to have added some clarity to it. Let me know if I succeeded. In asking the question yesterday I was kind of sharing my thought process with you guys. – Dr. Fabian Habersack Sep 05 '18 at 06:40

1 Answers1

1

Using data.table you can try

df<-data.frame(
  "Date"=c("1990-03","2000-01","2010-09","1990-03","2000-01","2010-09","1990-03","2000-01","2010-09"),
  "Group"=c("A","A","A","B","B","B","C","C","C"),
  "X1_A"=c(9,4,7,NA,NA,NA,NA,NA,NA),
  "X2_A"=c(1,2,6,NA,NA,NA,NA,NA,NA),
  "target_value_1_B"=c(NA,NA,NA,0,2,9,NA,NA,NA),
  "target_value_2_B"=c(NA,NA,NA,9,2,1,NA,NA,NA),
  "target_value_1_C"=c(NA,NA,NA,NA,NA,NA,5,3,1),
  "target_value_2_C"=c(NA,NA,NA,NA,NA,NA,1,9,2)
)

library(data.table)
setDT(df)[,`:=` (NEW_X1 = ifelse(Group=="B",X1_A[Group=="A"],NA),
                 NEW_X2 = ifelse(Group=="B",X2_A[Group=="A"],NA),
                 NEW_target_value_1_C =ifelse(Group=="B",target_value_1_C[Group=="C"],NA),
                 NEW_target_value_2_C =ifelse(Group=="B",target_value_2_C[Group=="C"],NA)
                 )]

Which results in:

df
      Date Group X1_A X2_A target_value_1_B target_value_2_B target_value_1_C target_value_2_C NEW_X1 NEW_X2 NEW_target_value_1_C NEW_target_value_2_C
1: 1990-03     A    9    1               NA               NA               NA               NA     NA     NA                   NA                   NA
2: 2000-01     A    4    2               NA               NA               NA               NA     NA     NA                   NA                   NA
3: 2010-09     A    7    6               NA               NA               NA               NA     NA     NA                   NA                   NA
4: 1990-03     B   NA   NA                0                9               NA               NA      9      1                    5                    1
5: 2000-01     B   NA   NA                2                2               NA               NA      4      2                    3                    9
6: 2010-09     B   NA   NA                9                1               NA               NA      7      6                    1                    2
7: 1990-03     C   NA   NA               NA               NA                5                1     NA     NA                   NA                   NA
8: 2000-01     C   NA   NA               NA               NA                3                9     NA     NA                   NA                   NA
9: 2010-09     C   NA   NA               NA               NA                1                2     NA     NA                   NA                   NA
Chriss Paul
  • 1,101
  • 6
  • 19
  • Impressive. But don't we need to specify some `date`-condition? I mean it worked properly, but is this by coincidence? -- To explain what I mean: Don't we have to make sure that the "9" of A's X1-variable gets assigned to a random date in the B-rows? – Dr. Fabian Habersack Sep 05 '18 at 06:24
  • If you want to get a sample just add `sample` to the assignation of the `if` clause, so for instance for `NEW_X1` it could be `NEW_X1 = ifelse(Group=="B",sample(X1_A[Group=="A"]),NA)`, change accordingly the rest. – Chriss Paul Sep 05 '18 at 20:19