Hi, I have two large datasets and I'm trying to identify changes in molecular composition. I have two data frames: one from a sample before exposure to a reagent (NaNO3) and one from the sample after exposure.
The formulas have the base structure CxHyOzNd (x, y, z, and d being variables). Going from the exposed sample back to the pre-exposed sample, a matching formula shows a loss of N1O2 and a gain of H1 (the C count should be the same within matching formulas). The code should be able to find this change through the elemental columns (C, H, O, N, S, P) or through the formula strings themselves (e.g. C13H26O2N).
Ex. Pre-Exposed Sample
Composition C H O N S P
C11H13O2 11 13 2 0 0 0
C7H9O 7 9 1 0 0 0
C4H8 4 8 0 0 0 0
.....
Ex. Exposed Sample
Composition C H O N S P
C11H12O4N 11 12 4 1 0 0
C7H7O5N2 7 7 5 2 0 0
C3H6O 3 6 1 0 0 0
.....
As seen in the data frames, the molecular compositions changed by the addition of NO2 (+N1O2) and the loss of one hydrogen (-H1) for each group added. I want to match a pre-exposed formula to an exposed formula when they differ by exactly this change, and build a new data frame that connects the two formulas. An example is shown below. The NO2 column shows the number of NO2 groups added and the H column shows the number of H lost. (It's possible that more than one NO2 group could be added.)
Ex. New dataframe showing the transformations
Pre-Comp Exposed-Comp NO2 H
C11H13O2 C11H12O4N 1 1
C7H9O C7H7O5N2 2 2
If this type of analysis can be done in R, please let me know. I'm guessing that the code would be some sort of if/then statement that goes through the data sets. I'm relatively new to R, so any code would be helpful. Thanks!
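For reference, the matching rule described above can be sketched in base R using the elemental columns rather than an explicit if/then loop. This is only a minimal sketch under some assumptions: both data frames use the column names shown in the examples, C, S, and P must be identical in a matching pair, and k added NO2 groups means +k N, +2k O, and -k H. The function name `match_nitration` is made up for illustration.

```r
# Example data copied from the tables in the question.
pre <- data.frame(
  Composition = c("C11H13O2", "C7H9O", "C4H8"),
  C = c(11, 7, 4), H = c(13, 9, 8), O = c(2, 1, 0),
  N = c(0, 0, 0), S = c(0, 0, 0), P = c(0, 0, 0)
)
exposed <- data.frame(
  Composition = c("C11H12O4N", "C7H7O5N2", "C3H6O"),
  C = c(11, 7, 3), H = c(12, 7, 6), O = c(4, 5, 1),
  N = c(1, 2, 0), S = c(0, 0, 0), P = c(0, 0, 0)
)

# Hypothetical helper: pair formulas that differ by k * (+NO2, -H), k >= 1.
match_nitration <- function(pre, exposed) {
  # Join on the elements assumed to stay unchanged (C, S, P).
  # Shared non-key columns get the suffixes ".pre" and ".exp".
  pairs <- merge(pre, exposed, by = c("C", "S", "P"),
                 suffixes = c(".pre", ".exp"))

  k <- pairs$N.exp - pairs$N.pre              # number of NO2 groups added
  keep <- k >= 1 &
    (pairs$O.exp - pairs$O.pre) == 2 * k &    # two O gained per NO2
    (pairs$H.pre - pairs$H.exp) == k          # one H lost per NO2

  data.frame(
    Pre.Comp     = pairs$Composition.pre[keep],
    Exposed.Comp = pairs$Composition.exp[keep],
    NO2          = k[keep],
    H            = k[keep],                   # H lost equals NO2 added
    stringsAsFactors = FALSE
  )
}

result <- match_nitration(pre, exposed)
print(result)
```

On the example data this pairs C11H13O2 with C11H12O4N and C7H9O with C7H7O5N2, while C4H8 and C3H6O stay unmatched because their C counts differ. Note that `merge()` builds every pre/exposed pair that shares the same C, S, and P before filtering, so on very large datasets with many formulas per carbon number the intermediate table could get big.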