
My data set (MSdata) looks something like this:

m.z        Intensity   Relative  Delta..ppm.  RDB.equiv.  Composition
301.14093  NA          100.00    -0.34        5.5         C16 H22 O4 Na
149.02331  4083458.5   23.60     -0.08        6.5         C8 H5 O3
279.15908  NA          18.64     -0.03        5.5         C16 H23 O4
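To make the example reproducible, here is a sketch that rebuilds the three sample rows as a data frame. The column names are copied from the printed header above; the exact names and column types in the real MSdata may differ.

```r
# Rebuild the three sample rows shown above (values copied from the question).
MSdata <- data.frame(
  m.z         = c(301.14093, 149.02331, 279.15908),
  Intensity   = c(NA, 4083458.5, NA),
  Relative    = c(100.00, 23.60, 18.64),
  Delta..ppm. = c(-0.34, -0.08, -0.03),
  RDB.equiv.  = c(5.5, 6.5, 5.5),
  Composition = c("C16 H22 O4 Na", "C8 H5 O3", "C16 H23 O4"),
  stringsAsFactors = FALSE  # keep Composition as character, not factor
)
str(MSdata)
```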

and I would like it to look like this:

m.z        Intensity   Relative  Delta..ppm.  RDB.equiv.  C   H   O   Na
301.14093  NA          100.00    -0.34        5.5         16  22  4   1
149.02331  4083458.5   23.60     -0.08        6.5         8   5   3   0
279.15908  NA          18.64     -0.03        5.5         16  23  4   0

I have gotten as far as this:

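library(stringr)  # not actually needed here: regmatches() and gregexpr() are base R

# Pull out every run of digits (optionally with a decimal part) from a string
numextract <- function(string) {
  unlist(regmatches(string, gregexpr("[[:digit:]]+\\.*[[:digit:]]*", string)))
}
MScomp <- numextract("C14 H18 O4 Na")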

However, this gives me

'14' '18' '4'
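The pattern in numextract only matches digits, so an element written without a count (Na) contributes nothing, and the numbers lose their link to the element symbols. A minimal base-R sketch that matches each symbol together with its count instead, assuming every element is one upper-case letter, optionally followed by one lower-case letter and then an optional count (a missing count meaning 1). parse_formula is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: turn one formula string into a named integer vector
# of element counts. Assumes symbols are one upper-case letter plus an
# optional lower-case letter, followed by an optional count.
parse_formula <- function(formula) {
  tokens  <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  symbols <- sub("[0-9]+$", "", tokens)      # "C" from "C14", "Na" from "Na"
  counts  <- sub("^[A-Z][a-z]?", "", tokens) # "14" from "C14", "" from "Na"
  counts[counts == ""] <- "1"                # a bare symbol means one atom
  setNames(as.integer(counts), symbols)
}

parse_formula("C14 H18 O4 Na")
# returns c(C = 14L, H = 18L, O = 4L, Na = 1L)
```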

I need the 'Na' entry to give me a count of 1 when it is present and 0 (or NA) when it is absent. I'm new to coding and a lot of this is beyond me; I have been using this website to help me. Additionally, I have no idea how to merge these new columns (if this works) into my current matrix. The website I linked previously uses a newcol() function? Thanks for any help you might have to offer!
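For the merge step, one possible base-R approach (not the newcol() function from the linked site) is to parse every formula, collect the set of elements that occur, build a count matrix with one column per element, and bind it onto the data frame. This sketch repeats a hypothetical parse_formula helper so that it runs on its own, and uses a cut-down stand-in for MSdata with only the columns it needs:

```r
# Hypothetical helper (not from the question): formula string -> named counts.
parse_formula <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  counts <- sub("^[A-Z][a-z]?", "", tokens)
  counts[counts == ""] <- "1"            # bare symbol such as "Na" means one atom
  setNames(as.integer(counts), sub("[0-9]+$", "", tokens))
}

# Cut-down stand-in for MSdata (only the columns needed here).
MSdata <- data.frame(
  m.z         = c(301.14093, 149.02331, 279.15908),
  Composition = c("C16 H22 O4 Na", "C8 H5 O3", "C16 H23 O4"),
  stringsAsFactors = FALSE
)

parsed   <- lapply(MSdata$Composition, parse_formula)
elements <- unique(unlist(lapply(parsed, names)))  # elements in order first seen
template <- setNames(integer(length(elements)), elements)

# One row per compound, one column per element; absent elements stay 0.
counts <- t(vapply(parsed, function(x) {
  row <- template
  row[names(x)] <- x
  row
}, template))

# Drop the text column and bind the new count columns on.
MSdata <- cbind(MSdata[setdiff(names(MSdata), "Composition")], counts)
MSdata
```

Absent elements come out as 0 here; building template from rep(NA_integer_, length(elements)) instead would give NA for them.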

  • See `CHNOSZ::makeup` "Count the elements [...] in a chemical formula" – Henrik Feb 16 '18 at 16:58
  • Ah, that looks interesting! Question: I'm using Jupyter (via Anaconda) to write my R code since I'm new to this. How would I go about installing this package in Anaconda so I can use it in Jupyter? (With install.packages("CHNOSZ")?) I'm not sure how to make this work. – Ragstock Feb 16 '18 at 17:23
  • Sorry, I'm not familiar with Jupyter or anaconda. Good luck! – Henrik Feb 16 '18 at 17:28

1 Answer


I have edited the code as needed:

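library(tidyverse)  # attaches dplyr, tidyr and stringr
library(stringr)

# dat is the data frame from the question (MSdata)
dat %>%
  # append a literal "1" to any element written without a count ("Na" -> "Na1")
  mutate(Composition = gsub("\\b([A-Za-z]+)\\b", "\\11", Composition),
         name  = str_extract_all(Composition, "[A-Za-z]+"),  # element symbols
         value = str_extract_all(Composition, "\\d+")) %>%   # matching counts
  unnest(cols = c(name, value)) %>%   # one row per element
  mutate(value = as.integer(value)) %>%
  spread(name, value, fill = 0L)      # one column per element, 0 if absent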
       m.z Intensity Relative Delta..ppm. RDB.equiv.    Composition  C  H Na O
1 149.0233   4083459    23.60       -0.08        6.5       C8 H5 O3  8  5  0 3
2 279.1591        NA    18.64       -0.03        5.5     C16 H23 O4 16 23  0 4
3 301.1409        NA   100.00       -0.34        5.5 C16 H22 O4 Na1 16 22  1 4
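The key trick in this answer is the gsub() step. The base-R sketch below runs it on the three compositions from the question to show what it does; the variable names are illustrative only. The pattern needs a word boundary on both sides of a run of letters, and in "C16" there is no boundary between "C" and "1" (both are word characters), so only stand-alone symbols such as "Na" match. In the replacement, "\\1" is the matched symbol and the trailing "1" is a literal digit, because R back-references only go from \\1 to \\9.

```r
compositions <- c("C16 H22 O4 Na", "C8 H5 O3", "C16 H23 O4")

# Append "1" to any symbol that has no count of its own
padded <- gsub("\\b([A-Za-z]+)\\b", "\\11", compositions)
padded
# returns c("C16 H22 O4 Na1", "C8 H5 O3", "C16 H23 O4")

# After padding, every symbol has a count, so the two extracted lists line up
# element for element (this is what lets unnest() pair name with value).
name  <- regmatches(padded, gregexpr("[A-Za-z]+", padded))
value <- regmatches(padded, gregexpr("[0-9]+", padded))
stopifnot(identical(lengths(name), lengths(value)))
```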