
First, here's the context. I have data from multiple samples that were all analysed with 3 instruments for the same measurements. One of the instruments is the reference, and we want to plot (and analyse) the measurements from the other instruments relative to it.

The dataframe looks like this:

| Sample | Instrument | M1  | M2  | M3  | ... |
| ------ | ---------- | --- | --- | --- | --- |
| 1      | Ref        |     |     |     |     |
| 2      | Ref        |     |     |     |     |
| ...    | ...        |     |     |     |     |
| 1      | Test1      |     |     |     |     |
| 2      | Test2      |     |     |     |     |
| ...    | ...        |     |     |     |     |

Here M1, M2, M3 (and so on) are the different measurements.

So, is there an easy way in R to make a plot where X is the value of a variable Mx from the Ref group and Y is the Mx value from one of the other groups?

The group_by function is fine to get statistics but doesn't seem very useful for plots (although it really could be me not knowing how to use it).

Everything I find for scatter plots (geom_point) with a linear regression line (lm) asks for X and Y to be in different columns. Although pivot_wider can be used to get these multiple columns (M1_Ref, M2_Ref, M1_Test1, ...), the resulting dataframe is an ugly mess that creates its own problems (mostly that the variables are now ungrouped). So I'm wondering if there's a more elegant way to achieve this.

Something like:

ggplot(df, aes(X, Y, color = Instrument)) + geom_point() + geom_smooth(method = "lm")

The idea is to get an X-Y scatter plot with points and a regression line for each tested instrument as a function of the reference instrument, for a specified measurement.

Thanks for any help!

Carl S

1 Answer

You could get all your plots on the same page for comparison using facets, but you need to reshape your data first.

Suppose your data look something like this:

set.seed(1)

df <- data.frame(Sample = rep(1:10, 3),
                 Instrument = rep(c('ref', 'Test1', 'Test2'), each = 10),
                 M1 = c(1:10, 1:10 + rnorm(10), 1:10 + rnorm(10, 2)),
                 M2 = c(11:20, 11:20 + rnorm(10), 11:20 + rnorm(10, 2)),
                 M3 = c(21:30, 21:30 + rnorm(10), 11:20 + rnorm(10, 2)))

So that we have

df
#>    Sample Instrument         M1       M2       M3
#> 1       1        ref  1.0000000 11.00000 21.00000
#> 2       2        ref  2.0000000 12.00000 22.00000
#> 3       3        ref  3.0000000 13.00000 23.00000
#> 4       4        ref  4.0000000 14.00000 24.00000
#> 5       5        ref  5.0000000 15.00000 25.00000
#> 6       6        ref  6.0000000 16.00000 26.00000
#> 7       7        ref  7.0000000 17.00000 27.00000
#> 8       8        ref  8.0000000 18.00000 28.00000
#> 9       9        ref  9.0000000 19.00000 29.00000
#> 10     10        ref 10.0000000 20.00000 30.00000
#> 11      1      Test1  0.3735462 11.91898 20.83548
#> 12      2      Test1  2.1836433 12.78214 21.74664
#> 13      3      Test1  2.1643714 13.07456 23.69696
#> 14      4      Test1  5.5952808 12.01065 24.55666
#> 15      5      Test1  5.3295078 15.61983 24.31124
#> 16      6      Test1  5.1795316 15.94387 25.29250
#> 17      7      Test1  7.4874291 16.84420 27.36458
#> 18      8      Test1  8.7383247 16.52925 28.76853
#> 19      9      Test1  9.5757814 18.52185 28.88765
#> 20     10      Test1  9.6946116 20.41794 30.88111
#> 21      1      Test2  4.5117812 14.35868 13.39811
#> 22      2      Test2  4.3898432 13.89721 13.38797
#> 23      3      Test2  4.3787594 15.38767 15.34112
#> 24      4      Test2  3.7853001 15.94619 14.87064
#> 25      5      Test2  8.1249309 15.62294 18.43302
#> 26      6      Test2  7.9550664 17.58501 19.98040
#> 27      7      Test2  8.9838097 18.60571 18.63278
#> 28      8      Test2 10.9438362 19.94069 18.95587
#> 29      9      Test2 11.8212212 22.10003 21.56972
#> 30     10      Test2 12.5939013 22.76318 21.86495

Then we can reshape your data so that there are separate columns for the reference and test values of each Mx:

library(tidyverse)

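plot_df <- df %>% 
  # One row per sample, one column per measurement/instrument pair (M1_ref, M1_Test1, ...)
  pivot_wider(names_from = Instrument, values_from = M1:M3) %>%
  # Stack the test columns and split their names into measurement and instrument
  pivot_longer(contains('Test'), values_to = 'test_val') %>%
  separate(name, into = c('Measurement', 'Instrument')) %>%
  # Stack the reference columns the same way
  pivot_longer(contains('ref'), values_to = 'ref_val') %>%
  separate(name, into = c('Measurement_ref', 'ref')) %>%
  # Keep only rows where test and reference refer to the same measurement
  filter(Measurement == Measurement_ref) %>%
  select(-ref, -Measurement_ref)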

This gives us:

plot_df
#> # A tibble: 60 x 5
#>    Sample Measurement Instrument test_val ref_val
#>     <int> <chr>       <chr>         <dbl>   <dbl>
#>  1      1 M1          Test1         0.374       1
#>  2      1 M1          Test2         4.51        1
#>  3      1 M2          Test1        11.9        11
#>  4      1 M2          Test2        14.4        11
#>  5      1 M3          Test1        20.8        21
#>  6      1 M3          Test2        13.4        21
#>  7      2 M1          Test1         2.18        2
#>  8      2 M1          Test2         4.39        2
#>  9      2 M2          Test1        12.8        12
#> 10      2 M2          Test2        13.9        12
#> # ... with 50 more rows
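The same table can probably also be built with a single pivot and a join, which some readers may find easier to follow. The following is only a sketch under that assumption, not part of the original reprex: it stacks the measurements once, splits off the reference rows, and joins them back onto the test rows by sample and measurement. The names `long`, `ref` and `plot_df2` are made up for illustration, and the row order may differ from `plot_df` above.

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Same example data as above
df <- data.frame(Sample = rep(1:10, 3),
                 Instrument = rep(c('ref', 'Test1', 'Test2'), each = 10),
                 M1 = c(1:10, 1:10 + rnorm(10), 1:10 + rnorm(10, 2)),
                 M2 = c(11:20, 11:20 + rnorm(10), 11:20 + rnorm(10, 2)),
                 M3 = c(21:30, 21:30 + rnorm(10), 11:20 + rnorm(10, 2)))

# One row per sample / instrument / measurement
long <- df %>%
  pivot_longer(M1:M3, names_to = 'Measurement', values_to = 'value')

# Reference values only, keyed by sample and measurement
ref <- long %>%
  filter(Instrument == 'ref') %>%
  select(Sample, Measurement, ref_val = value)

# Attach the matching reference value to every test row
plot_df2 <- long %>%
  filter(Instrument != 'ref') %>%
  rename(test_val = value) %>%
  inner_join(ref, by = c('Sample', 'Measurement'))
```

The result should have the same 60 rows as `plot_df`, with columns `Sample`, `Instrument`, `Measurement`, `test_val` and `ref_val`, so it can be passed to the same plotting code.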

We can now facet on two dimensions (instrument and measurement) and see how the values compare to the reference using some simple plotting code:

ggplot(plot_df, aes(ref_val, test_val)) +
  geom_point() +
  geom_smooth(method = lm) +
  facet_grid(Measurement ~ Instrument, scales = 'free') +
  theme_bw()

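[Figure: grid of scatter plots of test_val against ref_val with a fitted lm line, one panel per Measurement (rows) and Instrument (columns)]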

Created on 2022-12-22 with reprex v2.0.2
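Since the question also mentions analysing the measurements, the reshaped data can be used to get the regression numbers behind each panel. This is a rough sketch that goes beyond the reprex above: it rebuilds `plot_df` with the same pipeline, then fits `lm(test_val ~ ref_val)` separately for each measurement and instrument using `group_modify()`. The name `fits` and its column names are only illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(1)

# Same example data and reshaping as above
df <- data.frame(Sample = rep(1:10, 3),
                 Instrument = rep(c('ref', 'Test1', 'Test2'), each = 10),
                 M1 = c(1:10, 1:10 + rnorm(10), 1:10 + rnorm(10, 2)),
                 M2 = c(11:20, 11:20 + rnorm(10), 11:20 + rnorm(10, 2)),
                 M3 = c(21:30, 21:30 + rnorm(10), 11:20 + rnorm(10, 2)))

plot_df <- df %>%
  pivot_wider(names_from = Instrument, values_from = M1:M3) %>%
  pivot_longer(contains('Test'), values_to = 'test_val') %>%
  separate(name, into = c('Measurement', 'Instrument')) %>%
  pivot_longer(contains('ref'), values_to = 'ref_val') %>%
  separate(name, into = c('Measurement_ref', 'ref')) %>%
  filter(Measurement == Measurement_ref) %>%
  select(-ref, -Measurement_ref)

# One linear fit per facet: intercept, slope and R-squared
fits <- plot_df %>%
  group_by(Measurement, Instrument) %>%
  group_modify(~ {
    fit <- lm(test_val ~ ref_val, data = .x)
    tibble(intercept = coef(fit)[[1]],
           slope     = coef(fit)[[2]],
           r_squared = summary(fit)$r.squared)
  }) %>%
  ungroup()

fits
```

Each row of `fits` corresponds to one panel of the faceted plot, so a slope near 1 and an intercept near 0 would suggest that instrument agrees closely with the reference for that measurement.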

Allan Cameron
  • Thank you so much! It works just fine and even better since I was first expecting to have to draw all the plots one by one. – Carl S Dec 22 '22 at 17:33