Create new column depending on multiple other column character strings in R

Question

I'm working on food consumption among mother-infant dyads and my data shows who is in proximity to my individual of interest when the eating behaviour is recorded. The data structure looks like this (very simplified):

Individual	Food Consumed	In.Contact	In.2m	In.5m	DyadID
Ap	A		Aap	Re	1
Ap	B				1
Ap	A		Aap	Re	1
Re	C	Red	Aap		2
Aap	A	Ap	Red		1
Red	C	Re	Aap		2
Red	A		Aap	Ap	2

Here, Ap-Aap and Re-Red are two infant-mother dyads. Each dyad has a DyadID that links the two individuals together. I want R to recognize whether Ap or Aap (and likewise Re or Red for the second dyad) is in proximity to its dyad partner when it eats, and to store the result in a binary column where 1 = in proximity (the partner appears in 'In.Contact', 'In.2m' or 'In.5m') and 0 = not in proximity:

Individual	Food Consumed	In.Contact	In.2m	In.5m	DyadID	Dyad.Proximity
Ap	A		Aap	Re	1	1
Ap	B				1	0
Ap	A		Aap	Re	1	1
Re	C	Red	Aap		2	1
Aap	A	Ap	Red		1	1
Red	C	Re	Aap		2	1
Red	A		Aap	Ap	2	0

My real data has many more proximity distance columns, so I need an approach that does not require naming each column every time. I also have 12 different dyads (compared to the 2 in this example), and the only methods I found (all of them unsuccessful) would mean repeating everything for each dyad.

As of now, I tried using the 'mutate' function:

data1 <- data %>% 
    mutate(Dyad.Proximity = ifelse(Individual == "Ap" & 
                          find(c_across(In.Contact:In.5m) = "Aap"),
                       "1", "0"))

I've also found this alternative:

data1 <- data %>% mutate(Dyad.Proximity = c("0", "1")[(find(across(In.Contact:In.5m)) == "Aap" &
                                 Individual == "Ap")])

There is a syntax error in the first one, and the second one gives me this error message:

'Error in across(): ! Must be used inside dplyr verbs.'

As I was saying, these methods (once I figure out what is wrong with my syntax) are problematic because they do not let me look at every dyad at the same time, so I would need to repeat this operation for all 24 of my individuals.

I feel like there should be an easy way to do this, but I simply cannot find it. Can anyone please help me?

Thank you!

score 0 · Answer 1 · answered Apr 25 '22 at 21:09

If you set up a small dyad dictionary, like this:

dyad_dict = list(c("Ap","Aap"), c("Re", "Red"))

then you can use data.table like this:

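library(data.table); setDT(df)  # df must be a data.table
# ind: focal individual; pcols: names in proximity on this row; dyad: integer DyadID
f <- function(ind, pcols, dyad, dyad_dict) {
  1 * any(setdiff(dyad_dict[[dyad]], ind) %in% pcols)
}
# by = 1:nrow(df) evaluates f one row at a time
df[, Dyad.Proximity := f(Individual, c(In.Contact, In.2m, In.5m), DyadID, dyad_dict), by = 1:nrow(df)]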

Output:

   Individual Food Consumed In.Contact  In.2m  In.5m DyadID Dyad.Proximity
       <char>        <char>     <char> <char> <char>  <int>          <num>
1:         Ap             A               Aap     Re      1              1
2:         Ap             B                               1              0
3:         Ap             A               Aap     Re      1              1
4:         Re             C        Red    Aap             2              1
5:        Aap             A         Ap    Red             1              1
6:        Red             C         Re    Aap             2              1
7:        Red             A               Aap     Ap      2              0
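
For readers who would rather avoid the row-by-row `by = 1:nrow(df)` call and not list each proximity column by hand, here is a minimal base R sketch of the same check. It is not part of the original answer. It assumes that every proximity column name starts with `In.` and that each individual has exactly one dyad partner; `partner`, `prox_cols` and `partner_of` are names introduced here purely for illustration.

```r
# Example data from the question (empty strings mean "nobody recorded").
df <- data.frame(
  Individual    = c("Ap", "Ap", "Ap", "Re", "Aap", "Red", "Red"),
  Food.Consumed = c("A", "B", "A", "C", "A", "C", "A"),
  In.Contact    = c("", "", "", "Red", "Ap", "Re", ""),
  In.2m         = c("Aap", "", "Aap", "Aap", "Red", "Aap", "Aap"),
  In.5m         = c("Re", "", "Re", "", "", "", "Ap"),
  DyadID        = c(1L, 1L, 1L, 2L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: each individual's name maps to its dyad partner.
partner <- c(Ap = "Aap", Aap = "Ap", Re = "Red", Red = "Re")

# Assumption: all proximity columns share the "In." prefix.
prox_cols <- grep("^In\\.", names(df), value = TRUE)

# Partner of the focal individual on each row.
partner_of <- unname(partner[df$Individual])

# The comparison recycles partner_of down each column of the matrix,
# so every cell is checked against the partner for its own row.
hits <- as.matrix(df[prox_cols]) == partner_of
df$Dyad.Proximity <- as.integer(rowSums(hits, na.rm = TRUE) > 0)

print(df$Dyad.Proximity)
# 1 0 1 1 1 1 0
```

Because the lookup is keyed by name rather than by position, this sketch does not depend on DyadID being an integer index into a list. With more dyads, only `partner` needs extending.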

answered Apr 25 '22 at 21:09

langtang


Thank you for your answer! I'm encountering a problem with my results: I only get 0 in my Dyad.Proximity column. At first I was getting this error message: `Error in dyad_dict[[dyad]] : no such index at level 1`. I realized that 'dyad' was not an integer, so I converted it inside the function: `f <- function(ind, pcols, dyad, dyad_dict) { dyad = as.integer(dyad); 1*any(setdiff(dyad_dict[[dyad]], ind) %in% pcols) }`. The code ran, but it doesn't seem to detect anyone in proximity (all cells = 0 in the Dyad.Proximity column). What do you think I could've done wrong? – Juliette Apr 26 '22 at 18:18
Hmm. Let's see if we can figure out the problem. I (perhaps dangerously) assumed that the DyadID column in df is an integer; it looks like you've figured that out. Rather than changing the function, you might want to just do `df$DyadID = as.integer(df$DyadID)`. Second, the dictionary might not be set up correctly. For example, DyadID=1 must correspond to the first element of dyad_dict, DyadID=2 to the second element, and so on. – langtang Apr 26 '22 at 18:36
I made sure that my dyad_dict elements corresponded properly to the DyadID assigned in the column. I'm confused with what the 'ind' and 'dyad' terms represent in the function, was I supposed to change those for my variable names 'Individual' and 'DyadID'? Sorry if I misunderstood that part! However, I tried these as well and it still gives the same result (0 everywhere) – Juliette Apr 26 '22 at 19:42
You are using `library(data.table)` and `setDT(df)`, right (i.e. you have converted your df to a data.table)? I think you must be, otherwise the last line of my solution won't run at all. As to why you are getting all zeros, I can't say. Any chance you have embedded spaces around the values in the Individual, In.Contact, In.2m, etc. columns (i.e. " Ap " instead of "Ap")? – langtang Apr 26 '22 at 19:49
No, you don't have to change the column names in your df. You just pass the Individual column to the "ind" parameter of the function, the proximity columns to "pcols", etc. – langtang Apr 26 '22 at 19:52
Indeed, initially I was not working with data.table, and I fixed all of this so that the code could run. I've double-checked and didn't find any extra spaces present in the code. I will try to reduce my dataset and the number of columns, as in my example here, and see if it works; it might help bring something I did wrong to the surface! I will let you know :) – Juliette Apr 26 '22 at 20:32
I'm getting an error message when I install data.table `This installation of data.table has not detected OpenMP support. It should still work but in single-threaded mode.` Do you think this could be the issue? Because apart from this, the code you suggested runs perfectly but doesn't give me the results I need. I've tried with a smaller dataset and I still get a whole column filled with 0. – Juliette Apr 27 '22 at 14:59
No, that shouldn't be the case. However, your continued problems point to the importance of providing the EXACT structure of your input, rather than the way you have provided it. Can you at least confirm that the code I provided runs on your machine with the data you provided in the example? – langtang Apr 27 '22 at 19:42
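
The thread does not say what caused the all-zero column. Going only on the possibilities raised in the comments (a non-integer DyadID, stray spaces around names), a clean-up step like the one below might be worth trying before running the proximity check. This is a guess at a fix, not a diagnosis. `normalise_dyad_data` is a hypothetical helper written for a plain data.frame, and the factor handling is an extra assumption in case the columns were read in as factors.

```r
# Hypothetical helper: normalise the columns the proximity check relies on.
normalise_dyad_data <- function(df, prox_cols) {
  cols <- c("Individual", prox_cols)
  # Factors become plain strings; leading/trailing spaces are stripped.
  df[cols] <- lapply(df[cols], function(x) trimws(as.character(x)))
  # as.character() first, so a factor DyadID yields its labels, not level codes.
  df$DyadID <- as.integer(as.character(df$DyadID))
  df
}

# Deliberately messy toy input: padded names, factor columns, factor DyadID.
messy <- data.frame(
  Individual = factor(c(" Ap", "Aap ")),
  In.Contact = factor(c("Aap ", "")),
  In.2m      = c("", " Ap"),
  DyadID     = factor(c("1", "1"))
)

clean <- normalise_dyad_data(messy, c("In.Contact", "In.2m"))
str(clean)  # inspect the resulting column types and values
```

Running `str()` on the real data before and after such a step shows the exact column types, which is the kind of structural detail the last comment asks for.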
