Comparing two items from two different datasets and mutating a variable accordingly

Summary

Dear Stack Overflow community, I'm trying to compare a column/variable (item1) from one dataset (data1) with a column/variable (item1) from a different dataset (data2). Where the values match, I would like to replace the value of item1 in data1 with the corresponding value of a third variable (letter) from data2.

Unfortunately, my code produces the error message "Error in UseMethod("mutate_") : inapplicable method for 'mutate_' applied to object of class "logical"."

I've created two example datasets, plus a third dataset showing the output that I'm trying to generate with R. You will find all of them at the Dropbox link below.

Download link for the example datasets (+ visualization of desired output)

https://www.dropbox.com/sh/eido04eiocuw06l/AABiCr2EpRf4PPsb1HYLLGFna?dl=0

My Code

data1 <- read.csv2("data 1.csv")
data2 <- read.csv2("data 2.csv")

attach(data1)
attach(data2)

data1 <- as.data.frame(data1)
data2 <- as.data.frame(data2)

if(data1$item.1 = data2$item.1) %>%
  mutate(data1$item.1 == data2$letter)

Background

I downloaded a big dataset from Moodle and I need to transform it in order to do my analyses. My colleague and I have spent way too long on this today, and now we are hoping for some advice (as we have only just started with R).

Thanks in advance and have a great day!

Karla

  • 1
    There are several problems with your code. Up front, I suggest you go through more of the tutorials at https://dplyr.tidyverse.org/. – r2evans Jan 12 '21 at 23:04
  • 1
    (1) `if(data1$item.1 = data2$item.1)` is doing an *assignment*, not a *comparison*; see https://stackoverflow.com/q/28176650/3358272. Also, since I'm inferring that `data1` has more than one row, this will fail, since `if` **requires** length 1, but a comparison here will return a vector as long as the number of rows in the frame. – r2evans Jan 12 '21 at 23:04
  • 1
    (2) `if (...) %>% mutate(...)` makes no sense, I have no idea what you are trying to do here. Typically, you'll see `if (...) { some_code; } else { other_code; }`, and `somedata %>% mutate(...)`. (3) `mutate` requires a named set of expressions, so `mutate(expression)` should be `mutate(newvar = expression)`. – r2evans Jan 12 '21 at 23:04
  • 1
    (4) Don't `attach(data1)` and then `data1 = as.data.frame(data1)`. In your case here, just remove the `attach` calls, you aren't using them. (In fact, if a tutorial you are reading is recommending `attach` and you have the choice of using another tutorial, I recommend it. The use of `attach` is sloppy, enhances very little, has risks, and encourages bad habits.) – r2evans Jan 12 '21 at 23:05
  • Maybe check out `?inner_join()` instead of `mutate()` if you want to conditionally add a variable based on another data.frame (and as @r2evans suggests, pipe it to a data.frame, e.g. `data1 %>% inner_join(data2, by = "item.1")`). – CzechInk Jan 12 '21 at 23:17
  • @r2evans: Thanks for your advice! In our statistics course at university we learned to attach data before working with them, but I guess I'd better not do this in the future ;) – Karla Jan 13 '21 at 11:07
  • @Revans: Thank you for your comment, I did not know this function before. – Karla Jan 13 '21 at 11:08
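
The points raised in the comments above can be tried out on a small example. This is only a sketch under assumptions: the toy data frames below are made up (the real files may have different columns and values), and it assumes the goal is to replace each `item.1` value in `data1` with the `letter` that `data2` assigns to that value. It uses `left_join()` rather than the `inner_join()` suggested above so that rows without a match are kept.

```r
library(dplyr)

# Hypothetical toy data; the real column names and values may differ.
data1 <- data.frame(person = c("p1", "p2", "p3"),
                    item.1 = c(10, 20, 30),
                    stringsAsFactors = FALSE)
data2 <- data.frame(letter = c("A", "B", "C"),
                    item.1 = c(30, 10, 20),
                    stringsAsFactors = FALSE)

# `==` compares element-wise and returns one logical value per row,
# so the result cannot be used directly as an `if` condition.
same_position <- data1$item.1 == data2$item.1

# A join matches rows by value rather than by row position.
joined <- data1 %>%
  left_join(data2, by = "item.1")

# mutate() needs a data frame on the left of the pipe and a named expression.
result <- joined %>%
  mutate(item.1 = letter) %>%
  select(-letter)
```

Here `result` keeps the `person` column and holds the matched letters in `item.1`. Rows of `data1` without a match in `data2` would get `NA`.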

1 Answer

data1 <- read.csv2("stackoverflow/data_1.csv")
data2 <- read.csv2("stackoverflow/data_2.csv")

# Get data in format where there are only two columns
long_data1 <- tidyr::gather(data1, key = "key", value = "value", -person) 
long_data2 <- tidyr::gather(data2, key = "key", value = "value", -letter)

# Merge on those two columns
merged_data <- merge(long_data1, long_data2, by = c("key", "value"))

# Tidy up the results
merged_data <- subset(merged_data, select = c(person, letter, key))

final_data <- tidyr::spread(merged_data, key = key, value = letter)

The cleanest solution I can come up with is to get the data into long format, where each observation has its own row, and then merge on the shared columns. The tidyr package does this best; if you don't have it installed already, install it with install.packages("tidyr").
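
For anyone without the CSV files, the same long-format idea can be tried on made-up data. The sketch below is an illustration, not the code from this answer: the two data frames are hypothetical stand-ins with an assumed layout (`person` plus item columns, `letter` plus item columns), and it uses `pivot_longer()` and `pivot_wider()`, the newer tidyr functions that supersede `gather()` and `spread()`.

```r
library(tidyr)

# Made-up stand-ins for the two CSV files (assumed layout).
data1 <- data.frame(person = c("p1", "p2"),
                    item.1 = c(10, 20),
                    item.2 = c(5, 6),
                    stringsAsFactors = FALSE)
data2 <- data.frame(letter = c("A", "B"),
                    item.1 = c(20, 10),
                    item.2 = c(5, 6),
                    stringsAsFactors = FALSE)

# Long format: one row per (person, item) and one row per (letter, item).
long1 <- pivot_longer(data1, cols = -person, names_to = "key", values_to = "value")
long2 <- pivot_longer(data2, cols = -letter, names_to = "key", values_to = "value")

# Match each person's item value to the letter carrying the same value.
merged <- merge(long1, long2, by = c("key", "value"))

# Back to wide format: one row per person, one column per item.
wide <- pivot_wider(merged[, c("person", "key", "letter")],
                    names_from = key, values_from = letter)
```

With these toy values, person `p1` ends up with `"B"` for `item.1` and `"A"` for `item.2`, and `p2` with the reverse.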

ashetty