Match rows and columns of two dataframes/tibble and copy values

Question

I have two dataframes/tibble.

The first one (initialized with zeros except for the first column):

> df_
# A tibble: 33 x 101
   sample_id  SOM1  SOM2  SOM3  SOM4  ..... SOM100
   <fct>     <dbl> <dbl> <dbl> <dbl> 
 1 1             0     0     0     0 
 2 2             0     0     0     0
 3 3             0     0     0     0
 4 4             0     0     0     0     
 5 5             0     0     0     0     
.
.
.  33

The second one:

> sum
# A tibble: 3,300 x 3
# Groups:   sample_id [33]
   sample_id cluster_id count
   <fct>     <chr>      <int>
 1 1         SOM1         875
 2 1         SOM2        1102
 3 1         SOM3        1028
 4 1         SOM4         925
 5 1         SOM5       10164
 6 1         SOM6        5425
 7 1         SOM7        1593
 8 1         SOM8         290
 9 1         SOM9        6180
10 1         SOM10      10850
# … with 3,290 more rows

I want to match "sample_id" from sum to "sample_id" from df_ and also match "cluster_id" from sum to colnames from df_ and if the match is true for both the conditions then copy the corresponding value in the "count" column from sum to the cell in df_ where the condition is found to be true. In the end, I want to replace all the zeros in df_ with a value from the count cloumn in sum.

I want my end result to look like the dataframe below.

> df_
# A tibble: 33 x 101
   sample_id  SOM1  SOM2  SOM3  SOM4  ..... SOM100
   <fct>     <dbl> <dbl> <dbl> <dbl> 
 1 1           875  1102  1028   925 
 2 2             0     0     0     0
 3 3             0     0     0     0
 4 4             0     0     0     0     
 5 5             0     0     0     0     
.
.
.  33

Any help or suggestion would be much appreciated.

You can just use tidyverse, then `sum %>% spread(cluster_id, count, fill = 0)` — David Klotz, Aug 09 '19 at 20:20
Note that you probably want to give your tibble a different name, since `sum` is a base function — David Klotz, Aug 09 '19 at 20:31
@DavidKlotz btw after spread my column names are not sorted. It's like SOM1 SOM10 instead of SOM1 SOM2 ... — Rohit Farmer, Aug 09 '19 at 20:40
That's because R is ordering the columns by string, not integers. You can manually set factor levels to override that: `sum %>% mutate(cluster_id = as_factor(cluster_id)) %>% spread(cluster_id, count, fill = 0)` — David Klotz, Aug 12 '19 at 14:18

score 1 · Answer 1 · answered Aug 09 '19 at 20:32

If I understand right, you are trying to take the sum dataframe and transform it into the same view as _df (never call a dataframe sum by the way as it is one of the most common R functions).

You should be able to use this:

library(tidyverse)
sum %>% spread(key = 'cluster_id', value = 'count')

Match rows and columns of two dataframes/tibble and copy values

1 Answers1