I have some code that takes two data frames and subtracts the values of a column named "Intensity" for molecules with the same Composition. However, if a molecule is not present in the other data frame, its row is dropped entirely, and I'm not sure why.

blankdata3 and data3 are the two data frames I am subtracting. I want to subtract each molecule's Intensity, i.e.

(data3 - blankdata3) = datasubtracted

The code below subtracts intensities for rows that share the same Composition. However, if data3 has a composition that is not found in blankdata3, that row disappears when I print datasubtracted. I'm not sure why: shouldn't it just be subtracting zero if the composition is not found in blankdata3?

#data3 looks like this but with more rows
m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition 
301.14093   7646        100.00      -0.34     5.5       C16 H22 O4 Na
149.02331   4083458.5   23.60       -0.08     6.5       C8 H5 O3
279.15908   33256       18.64       -0.03     5.5       C16 H23 O4

#blankdata3 looks like this but with more rows
m.z       Intensity    Relative Delta..ppm.  RDB.equiv. Composition 
331.11233   4324         94.00      -0.33    6.5        C17 H26 O5 Na
149.02331   3056982.3    23.60      -0.08    6.5        C8 H5 O3
279.15908   20000        18.64      -0.03    5.5        C16 H23 O4

#This is the current code I have for subtraction
datasubtracted <- blankdata3 %>%
  left_join(select(data3, Intensity, Composition), by = "Composition") %>%
  mutate(Intensity = ifelse(is.na(Intensity.y), -Intensity.x, Intensity.y - Intensity.x)) %>%
  select(-Intensity.y, -Intensity.x) %>%
  bind_rows(anti_join(data3, blankdata3, by = "Composition") %>%
              mutate(Intensity = -Intensity))

#I expect to see something like this
m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition 
301.14093   7646        100.00      -0.34     5.5       C16 H22 O4 Na
331.11233   -4324       94.00       -0.33     6.5       C17 H26 O5 Na
149.02331   1026476.2   23.60       -0.08     6.5       C8 H5 O3
279.15908   13256       18.64       -0.03     5.5       C16 H23 O4

When I ran your code, it gave me this:

m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition 
301.14093   7646        100.00      -0.34     5.5       C16 H22 O4 Na
149.02331   4083458.5   23.60       -0.08     6.5       C8 H5 O3
279.15908   33256       18.64       -0.03     5.5       C16 H23 O4
331.11233   -4324       94.00       -0.33     6.5       C17 H26 O5 Na
149.02331   -3056982.3  23.60       -0.08     6.5       C8 H5 O3
279.15908   -20000      18.64       -0.03     5.5       C16 H23 O4

It looks like it kept the data3 intensities intact and made the blankdata3 intensities negative. So it just combined both data frames without subtracting the intensities of rows with the same Composition.

An exact replica of my data is shown below

#data3
m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition    C  H  O  N  Na S
301.14093   7646        100.00      -0.34     5.5       C16 H22 O4 Na  16 22 4  0  1  0
149.02331   3056982.3    23.60      -0.08    6.5        C8 H5 O3       8  5  3  0  0  0
279.15908   33256       18.64       -0.03     5.5       C16 H23 O4     16 23 4  0  0  0

#blankdata3
m.z       Intensity   Relative  Delta..ppm. RDB.equiv.  Composition    C  H  O  N  Na S
331.11233   4324         94.00      -0.33    6.5        C17 H26 O5 Na  17 26 5  0  1  0
149.02331   4083458.5   23.60       -0.08     6.5       C8 H5 O3       8  5  3  0  0  0
279.15908   13256       18.64       -0.03     5.5       C16 H23 O4     16 23 4  0  0  0

David

1 Answer

Since you're only operating on Intensity, I suggest something simpler than the multiple joins and anti-joins:

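library(dplyr)
library(tidyr)

# Tag each row with its source so spread() can turn the tags into columns
data3$index <- "y"
blankdata3$index <- "x"

bind_rows(blankdata3, data3) %>% 
  spread(key = index, value = Intensity, fill = 0) %>% # fill = 0 replaces NA values
  mutate(Intensity = y - x) %>% # data3 intensity minus blankdata3 intensity
  select(-y, -x)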
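
Note that spread() only merges two rows when every column other than Intensity and index matches exactly. As a hedged alternative, here is a minimal sketch that matches on Composition alone with full_join() and treats a missing intensity as 0. The data frames are cut-down copies of the first sample tables in the question, subtract_blank is a hypothetical helper name (not part of any package), and the sketch assumes each Composition appears at most once per data frame.

```r
library(dplyr)

# Cut-down copies of the sample tables from the question
data3 <- tibble(
  m.z = c(301.14093, 149.02331, 279.15908),
  Intensity = c(7646, 4083458.5, 33256),
  Composition = c("C16 H22 O4 Na", "C8 H5 O3", "C16 H23 O4")
)
blankdata3 <- tibble(
  m.z = c(331.11233, 149.02331, 279.15908),
  Intensity = c(4324, 3056982.3, 20000),
  Composition = c("C17 H26 O5 Na", "C8 H5 O3", "C16 H23 O4")
)

# Hypothetical helper: sample minus blank, matched on Composition only
subtract_blank <- function(sample, blank) {
  full_join(sample, blank, by = "Composition",
            suffix = c(".sample", ".blank")) %>%
    mutate(
      # A composition missing from one side contributes an intensity of 0
      Intensity = coalesce(Intensity.sample, 0) - coalesce(Intensity.blank, 0),
      # Keep whichever m.z is available, preferring the sample's value
      m.z = coalesce(m.z.sample, m.z.blank)
    ) %>%
    select(m.z, Intensity, Composition)
}

datasubtracted <- subtract_blank(data3, blankdata3)
print(datasubtracted)
```

Any other descriptive columns (Relative, RDB.equiv., the element counts) would need the same coalesce() treatment as m.z if you want to keep them in the result.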
David Klotz
  • running this code combines both data sets but does not subtract the intensities from each other – David Jul 31 '19 at 18:31
  • I get the exact same values you showed in the "expected results" above. However, the sample data sets might not be so helpful, because in the rows that match, every value except Intensity is identical. That might not be the case with your full data set. – David Klotz Jul 31 '19 at 18:49
  • Do you have other columns you're not showing above? I get something completely different. This would be easier to confirm if you made a reproducible example of your sample data, rather than cutting and pasting. It's possible what you've posted has decimal values rounded for display, or some other issue. – David Klotz Aug 01 '19 at 00:40
  • The only thing I left out was the individual element columns, which count how many atoms of each element are in the molecule. I have edited the question above to show an exact replica of the columns. – David Aug 01 '19 at 04:33
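
One way to check the possibility raised in the comments above (matching rows that differ in some column other than Intensity) is to join the shared compositions and compare the remaining columns directly. This is only a sketch: the values below are hypothetical, with a tiny m.z difference introduced on purpose, and the column selection is an assumption about which columns are worth comparing.

```r
library(dplyr)

# Hypothetical data: the second m.z differs slightly between the two frames
data3 <- tibble(
  Composition = c("C8 H5 O3", "C16 H23 O4"),
  m.z = c(149.02331, 279.15908),
  Relative = c(23.60, 18.64)
)
blankdata3 <- tibble(
  Composition = c("C8 H5 O3", "C16 H23 O4"),
  m.z = c(149.02331, 279.15911),  # hypothetical small difference
  Relative = c(23.60, 18.64)
)

# Keep only compositions present in both frames, then flag differing columns
mismatch <- inner_join(data3, blankdata3, by = "Composition",
                       suffix = c(".y", ".x")) %>%
  transmute(
    Composition,
    m.z_differs = m.z.y != m.z.x,
    Relative_differs = Relative.y != Relative.x
  )

print(mismatch)
```

Any TRUE in the result marks a row that spread() would keep as two separate rows instead of merging into one.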