0

I'm just going to apologize in advance for anything confusing and/or dumb about this question. I am completely new to R but because of larger project restrictions, I am currently forced to use it for this task.

Right now I have two tables that I would like to join, RMS1 and RMS2. RMS1 is larger, and I only want to carry over matching columns from RMS2 (left join). For the most part, RMS1 and RMS2 are separate data sets with a unique ID for every entry, but there are a few overlapping IDs between the two tables, and in that case, I would like to get a weighted average of the columns they share in common when I do a join.

For example, I have columns (ID, sev1, freq1, score1, count1) in both tables, and if there are two of the same IDs in both tables, the counts will be different, so I want a new table with the weighted average of sev1, freq1, and score1 based on the counts.

I found this old question which I could probably make work for me, but since I would need to do this calculation 13*3 times and I do not have any experience with vectors in R, I thought I would ask and see if there was a more efficient way to get what I want.

Basically, at the end of the day, I am looking to make a new table with all the exact same columns as RMS1, but with sev1, freq1, score1, etc. being weighted averages, if necessary.

EDITS: My bad, looks like I want a full join. Doesn't really matter in the context of this question though, I'm assuming I can tweak the kind of join later, I just need to know how to do the weighted average. I guess to make it more clear, I'll write out a simplified table example:

RMS1:   id  freq1   sev1    score1  count1
        W123    1   5   3   40
        F456    2   2   4   55
        Y789    0   3   6   25

  RMS2: id  freq1   sev1    score1  count1
        S012    3   3   6   25
        Y789    3   0   3   50

Joined: id      freq1   sev1    score1  
        W123    1   5   3   
        F456    2   2   4   
        Y789    2*  1*  4*  
        S012    3   3   6

So the starred values are the weighted averages of id Y789 (weighted on the counts) because it appears in both RMS tables. Otherwise I just take the raw values from either table. Hope this helps. Again, new to all this, sorry for bad formatting.

www
  • 38,575
  • 12
  • 48
  • 84
  • 3
    Please provide a reproducible example. – www Mar 01 '18 at 01:35
  • You misunderstand what a left join does - it keeps all _rows_ from the left table and all _columns_ from the left AND right tables after the join. So if your id variable in the right table takes a value in some row that is not in the left table, that row will not be in the joined table. – Calum You Mar 01 '18 at 01:48
  • Added some updates, hope it helps – Michelle Nguyen Mar 01 '18 at 16:14

1 Answers1

0

A solution using . We can combine the two data frames by rows, and then calculated the weighted mean by each id. The last as.data.frame() is not required if you are fine to work on the tibble.

library(dplyr)

Joined <- bind_rows(RMS1, RMS2) %>%
  group_by(id) %>%
  summarise_at(vars(-count1), funs(weighted.mean(., count1))) %>%
  as.data.frame()
Joined
#     id freq1 sev1 score1
# 1 F456     2    2      4
# 2 S012     3    3      6
# 3 W123     1    5      3
# 4 Y789     2    1      4

DATA

RMS1 <- read.table(text = "id  freq1 sev1 score1 count1
        W123    1   5   3   40
        F456    2   2   4   55
        Y789    0   3   6   25",
                   header = TRUE, stringsAsFactors = FALSE)

RMS2 <- read.table(text = "id  freq1 sev1 score1 count1
        S012    3   3   6   25
        Y789    3   0   3   50",
                   header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
  • Thanks! I assume binding the rows keeps all columns from both tables? Also, my mistake not clarifying it in my example, but I will have 12 more sets of sev,freq, and score that will need to weighted with different count values. Is it still possible to use this kind of code and separately apply the weights? – Michelle Nguyen Mar 02 '18 at 17:01
  • You can put the columns you want in `vars`, such as `vars(freq1, sev1, score1)`. – www Mar 02 '18 at 17:05
  • If I want freq1 to be a weighted mean on count1, but freq2 is a weighted mean on count2, is this possible to do at the same time? – Michelle Nguyen Mar 02 '18 at 17:13
  • Possible but that would be a different question. If my post solves your current example, you may consider accept my post as the answer for this question, and then start a new question with updated examples and the desired output. – www Mar 02 '18 at 17:16
  • I posted a new question if you're interested https://stackoverflow.com/questions/49074570/joining-data-with-weighted-averages-and-multiple-weights-in-r – Michelle Nguyen Mar 02 '18 at 17:44