
I have a file with the structure:

N1H3O1 C2H2
C1H4 H201
C1H1N1 N1H3
C2N1O1P1H3 P5

What I am trying to do is compute the sum of the coefficients in each formula. Thus, the desired output is:

1+3+1 5 2+2 4
1+4 5 2+1 3
1+1+1 3 1+3 4
2+1+1+1+3 8 5 5

What I did was simply replace each letter with "+" and then delete the first " +".
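For reference, the substitution described above can be mirrored outside sed. The following is only a minimal Python sketch of that idea, not the asker's actual command (which is not shown); the function name `to_expression` is hypothetical. It also shows what happens to `H201`, where a zero stands in for the letter O.

```python
import re

def to_expression(formula: str) -> str:
    """Replace each run of letters with '+' and drop the leading '+'."""
    with_plus = re.sub(r"[A-Za-z]+", "+", formula)
    return with_plus.lstrip("+")

for formula in ["N1H3O1", "C2H2", "H201"]:
    # "H201" contains no letter O, so its digits stay glued together as "201".
    print(formula, "->", to_expression(formula))
```

With the input as posted, `H201` therefore yields `201` rather than `2+1`.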

However, I would like to know how to do this in a more proper way in sed, using branching and flow-control commands.

Inian
MirrG
  • the least amount of work is the best way :-) ... It sounds like you have a perfectly `sed`ish solution. Don't go looking for trouble with sed, you'll soon find it. Also, if you really expect help, you'll need to include your code in the body of your Q. Good luck. – shellter May 09 '19 at 20:16
  • H201 should be H2O1? (Oh not zero) – al76 May 10 '19 at 03:13

1 Answer


The problem with your input is the 0 (zero) that appears in place of the letter O in H201, which makes it harder to design a regular expression for the data, as you can see with this expression:

([^A-Z]+)*([0-9]+)

Other than that, you might be able to capture the numbers by simply using ([^A-Z]+).
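To see what that second pattern captures, here is a small Python sketch, assuming Python's `re` syntax (the answer does not say which regex flavor it targets). Because the character class only excludes uppercase letters, it matches whitespace as well as digits, and `201` comes back as a single number.

```python
import re

# The pattern from the answer: any run of characters that are not A-Z.
PATTERN = re.compile(r"([^A-Z]+)")

print(PATTERN.findall("N1H3O1"))     # ['1', '3', '1']
print(PATTERN.findall("C1H4 H201"))  # ['1', '4 ', '201']
```

On a whole line the captured pieces may therefore need trimming, or the class could be narrowed to `[0-9]+`.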

However, you may not want to do this task with a regular expression at all: apart from that 0, your data is well structured, so you could write a short script instead.
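Such a script might look like the sketch below. It is written in Python rather than sed, and it assumes the sample has already been corrected so that `H201` reads `H2O1`; the helper names `summarize` and `summarize_line` are made up for illustration.

```python
import re

def summarize(formula: str) -> str:
    """Return the coefficients joined by '+' followed by their sum,
    e.g. '1+3+1 5' for 'N1H3O1'."""
    counts = re.findall(r"[0-9]+", formula)
    total = sum(int(c) for c in counts)
    return "+".join(counts) + " " + str(total)

def summarize_line(line: str) -> str:
    """Apply summarize() to every whitespace-separated formula on a line."""
    return " ".join(summarize(f) for f in line.split())

# Assumption: the zero in H201 has been replaced by the letter O.
sample = """N1H3O1 C2H2
C1H4 H2O1
C1H1N1 N1H3
C2N1O1P1H3 P5"""

for line in sample.splitlines():
    print(summarize_line(line))
```

If the zero is left in place, `H201` is read as the single coefficient 201, so fixing the input first is the simpler route.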

Emma