1

My first dataset (MSdata1) looks like this

m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition 
301.14093   7646        100.00      -0.34     5.5       C16 H22 O4 Na
149.02331   4083458.5   23.60       -0.08     6.5       C8 H5 O3
279.15908   33256       18.64       -0.03     5.5       C16 H23 O4

My second dataset (MSdata2) looks like this

m.z       Intensity    Relative Delta..ppm.  RDB.equiv. Composition 
331.11233   4324         94.00      -0.33    6.5        C17 H26 O5 Na
149.02331   4083458.5    23.60      -0.08    6.5        C8 H5 O3
279.15908   42315        18.64      -0.03    5.5        C16 H23 O4

I am trying to subtract the the two datasets in the $Intensity column, based on the "ID" of each row, which is the $Composition column. (MSdata2$Intensity minus MSdata1$Intensity)

My attempt is

!(data33$Composition %in% data3$Composition)

but this and its variations either give me TRUE/FALSE output or it unfortunately subtracts everything, which is not my end goal. I have found similar questions, but they either do not deal with strings -- or their two datasets are of very similar size and content. My two datasets will have a lot of different information with similarities throughout - it's those similarities that I want to subtract to understand their differences better.

Which is,

m.z       Intensity    Relative Delta..ppm.  RDB.equiv. Composition 
301.14093   -7646       100.00      -0.34    5.5        C16 H22 O4 Na
149.02331   0            23.60      -0.08    6.5        C8 H5 O3
279.15908   9059         18.64      -0.03    5.5        C16 H23 O4

Thank you for any help you have to offer! :(

EDIT - I have changed my output so that $Composition strings not found in the first dataset, results in a negative output ( example, 0-100 = -100)

Ragstock
  • 55
  • 8

2 Answers2

1

I would suggest to use dplyr::left_join with by="Composition". Based on expected output expressed by OP it seems he is interested subtract MSdata1 from MSdata2

library(dplyr)

MSdata2 %>% left_join(select(MSdata1, Intensity, Composition), by="Composition") %>%
mutate(Intensity = ifelse(is.na(Intensity.y), Intensity.x, Intensity.x - Intensity.y)) %>%
select(-Intensity.y, -Intensity.x )

#        m.z Relative Delta..ppm. RDB.equiv.   Composition Intensity
# 1 331.1123    94.00       -0.33        6.5 C17 H26 O5 Na      4324
# 2 149.0233    23.60       -0.08        6.5      C8 H5 O3         0
# 3 279.1591    18.64       -0.03        5.5    C16 H23 O4      9059

Edited: The answer has been added to include rows from MSdata1 which doesn't has matching row in MSdata2. Those rows should get Intensity = -Intensity. Once can use anti_join to find such rows and then use bind_rows to merge all rows.

MSdata2 %>% left_join(select(MSdata1, Intensity, Composition), by="Composition") %>%
mutate(Intensity = ifelse(is.na(Intensity.y), Intensity.x, Intensity.x - Intensity.y)) %>%
select(-Intensity.y, -Intensity.x ) %>%
bind_rows(anti_join(MSdata1, MSdata2, by="Composition") %>% mutate(Intensity = -Intensity))

#        m.z Relative Delta..ppm. RDB.equiv.   Composition Intensity
# 1 331.1123    94.00       -0.33        6.5 C17 H26 O5 Na      4324
# 2 149.0233    23.60       -0.08        6.5      C8 H5 O3         0
# 3 279.1591    18.64       -0.03        5.5    C16 H23 O4      9059
# 4 301.1409   100.00       -0.34        5.5 C16 H22 O4 Na     -7646

Edit#2: If all records from MSdata1 is needed then once can try:

MSdata1 %>% left_join(select(MSdata2, Intensity, Composition), by="Composition") %>%
mutate(Intensity = ifelse(is.na(Intensity.y), -Intensity.x, Intensity.y - Intensity.x)) %>%
  select(-Intensity.y, -Intensity.x )

#        m.z Relative Delta..ppm. RDB.equiv.   Composition Intensity
# 1 301.1409   100.00       -0.34        5.5 C16 H22 O4 Na     -7646
# 2 149.0233    23.60       -0.08        6.5      C8 H5 O3         0
# 3 279.1591    18.64       -0.03        5.5    C16 H23 O4      9059

Data:

MSdata1 <- read.table(text = 
"m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition 
301.14093   7646        100.00      -0.34     5.5       'C16 H22 O4 Na'
149.02331   4083458.5   23.60       -0.08     6.5       'C8 H5 O3'
279.15908   33256       18.64       -0.03     5.5       'C16 H23 O4'",
header = TRUE, stringsAsFactors = FALSE)

MSdata2 <- read.table(text = 
"m.z       Intensity    Relative Delta..ppm.  RDB.equiv. Composition 
331.11233   4324         94.00      -0.33    6.5        'C17 H26 O5 Na'
149.02331   4083458.5    23.60      -0.08    6.5        'C8 H5 O3'
279.15908   42315        18.64      -0.03    5.5        'C16 H23 O4'",
header = TRUE, stringsAsFactors = FALSE)
MKR
  • 19,739
  • 4
  • 23
  • 33
  • @Ragstock I think you had accepted this answer and then probably acceptance has been removed accidentally. But in case it didn't matched your expectation then let me know. I think it would be better for future user if questions got clear accepted answer. – MKR May 08 '18 at 20:02
  • Question - so my plan is to highlight the differences in MSdata1, therefore whatever I see in MSdata2$Composition that does not exist in MSdata1$Composition - should show up as negative (for example zero - 100 = - (negative) 100). With this code, is this already being done? If not, I'm not sure how I would go about this. – Ragstock May 08 '18 at 20:04
  • @Ragstock That was not mentioned in your expected output hence my current solution doesnt consider it. That can be done easily. Please update your question with expected output. I'll update my answer in a while – MKR May 08 '18 at 20:09
  • You are entirely correct, I did not realize I missed a negative in one of my rows for the example. It has been fixed. Thank you – Ragstock May 08 '18 at 20:12
  • Just a note, my example (0-100 = -100) is just an example. I just need unmatched rows to give a negative intensity (instead of remaining unaltered) – Ragstock May 08 '18 at 20:27
  • You are correct. I had left it as taken from example. Answer has been updated now – MKR May 08 '18 at 20:30
0
MSdata1 = MSdata1 = data.frame(m.z = c(301.14093, 149.02331, 279.15908), Intensity = c(7647, 4083458.5, 33256), Composition = c("C16 H22 O4 Na", "C8 H5 O3", "C16 H23 O4"))
MSdata2 = data.frame(m.z = c(331.11233, 149.02331, 279.15908), Intensity = c(4324, 4083458, 42315), Composition = c("C17 H26 O5 Na", "C8 H5 O3", "C16 H23 O4"))

MSdata2 %>% merge(MSdata1, by="m.z", all.x = TRUE) %>% select(-Composition.y) %>% replace(., is.na(.), 0) %>% mutate(Intensity = Intensity.x - Intensity.y) 
FENG QI
  • 110
  • 1
  • 8