I have the following two DataFrames:
dataframe_a
+----------------+---------------+
| user_id| domain|
+----------------+---------------+
| josh| wanadoo.fr|
| samantha| randomn.fr|
| bob| eidsiva.net|
| dylan| vodafone.it|
+----------------+---------------+
dataframe_b
+----------------+---------------+
| user_id| domain|
+----------------+---------------+
| josh| oldwebsite.fr|
| samantha| randomn.fr|
| dylan| oldweb.it|
| ryan| chicks.it|
+----------------+---------------+
I want to do a full outer join, but when a single user_id has two different domains, I want to keep the domain value from dataframe_a. So my desired DataFrame would look like this:
desired_df
+----------------+---------------+
| user_id| domain|
+----------------+---------------+
| josh| wanadoo.fr|
| samantha| randomn.fr|
| bob| eidsiva.net|
| dylan| vodafone.it|
| ryan| chicks.it|
+----------------+---------------+
I think I can do something like this:
desired_df = dataframe_a.join(dataframe_b, ["user_id"], how="full_outer").drop(dataframe_b.domain)
But I'm not sure whether this will keep ryan in my desired DataFrame. Is this the right way?
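To reason about the ryan case, here is a minimal plain-Python model of two join strategies. It is not Spark code: dicts mapping user_id to domain stand in for the DataFrames, and the helper names are hypothetical. It assumes the usual full outer join behaviour, where a row that exists on only one side gets null (here `None`) for the other side's columns. Under that assumption, dropping dataframe_b.domain still keeps ryan's row, but only dataframe_a's domain column survives, so ryan's domain comes out null. Preferring dataframe_a's value and falling back to dataframe_b's when it is null produces the desired table.

```python
# Plain-Python model of the two DataFrames (user_id -> domain); NOT Spark code.
dataframe_a = {"josh": "wanadoo.fr", "samantha": "randomn.fr",
               "bob": "eidsiva.net", "dylan": "vodafone.it"}
dataframe_b = {"josh": "oldwebsite.fr", "samantha": "randomn.fr",
               "dylan": "oldweb.it", "ryan": "chicks.it"}


def full_outer_keys(a, b):
    """Every user_id present on either side, in first-seen order."""
    return list(dict.fromkeys([*a, *b]))


def join_then_drop_b(a, b):
    """Hypothetical model of join(..., how="full_outer").drop(dataframe_b.domain):
    only a's domain survives, so users missing from a end up with None."""
    return {uid: a.get(uid) for uid in full_outer_keys(a, b)}


def join_then_coalesce(a, b):
    """Hypothetical model of "first non-null of a.domain, b.domain":
    prefer a's value and fall back to b's when a has none."""
    return {uid: a.get(uid) if a.get(uid) is not None else b.get(uid)
            for uid in full_outer_keys(a, b)}


dropped = join_then_drop_b(dataframe_a, dataframe_b)
merged = join_then_coalesce(dataframe_a, dataframe_b)
print(dropped["ryan"])  # None: the row is kept, but its domain is lost
print(merged["ryan"])   # chicks.it
```

In PySpark, the "first non-null" step corresponds to `pyspark.sql.functions.coalesce`. For example, after the same full outer join you could select `"user_id"` together with `F.coalesce(dataframe_a.domain, dataframe_b.domain).alias("domain")` (with `from pyspark.sql import functions as F`) instead of dropping dataframe_b's column. Treat this as a sketch to verify against your own data.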