
I have some code...

var userArray=userIn.match(/(?:[A-Z][a-z]*|\d+|[()])/g);

...that separates the user input of a chemical formula into its components.

For example, entering Cu(NO3)2N3 will yield

Cu , ( , N , O , 3 , ) , 2 , N , 3.

To find the percentage of each element in the total weight, I need to count how many times each element occurs.

So in the example above,

Cu: 1
N: 5
O: 6

Any suggestions of how I should go about doing this?

Caponera
Rygh2014
  • Does the quantifier _always_ come right after the element? Also, is nesting allowed? Are two-digit numbers allowed? – Benjamin Gruenbaum Jun 28 '13 at 22:50
  • This is much more than just counting occurrences. This is parsing and multiplying. – Barmar Jun 28 '13 at 22:51
  • @Barmar Yes, this requires an actual parser - not a particularly hard one though. Tokens are letters, numbers (quantifiers) and brackets. I don't mind giving the OP a good answer on _how_ to implement it but it's not very clear yet. – Benjamin Gruenbaum Jun 28 '13 at 22:53
  • Yes, the quantifier will be right after the element, and two-digit numbers ARE allowed. So entering H12 will give H, 12. The only exception is parentheses, where the following number multiplies everything inside the parentheses. – Rygh2014 Jun 28 '13 at 22:57
  • @TGH The `g` modifier makes it return all occurrences in an array. – Barmar Jun 28 '13 at 22:59

2 Answers


You need to build a parser

There is no simple way around that. Formulas can nest, so you need memory; a regular expression can't handle that very well (and a regular expression in the strict CS sense can't handle arbitrary nesting at all).

First, run the regex you already have. This step is called tokenization, and its result is the list of tokens.
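As a rough illustration, the tokenization step in JavaScript might look like this. The `tokenize` name is hypothetical; the regex is the one from the question. With the `g` flag, `String.prototype.match` returns all matches, or `null` when there are none, hence the fallback to an empty array. Characters the regex does not cover, such as spaces, are silently dropped.

```javascript
// Hypothetical helper: split a chemical formula into element symbols,
// numbers and brackets, using the regex from the question.
function tokenize(formula) {
  // With the g flag, match() returns every match, or null if there are none.
  return formula.match(/(?:[A-Z][a-z]*|\d+|[()])/g) || [];
}

console.log(tokenize("Cu(NO3)2N3").join(", ")); // Cu, (, N, O, 3, ), 2, N, 3
console.log(tokenize("H12").join(", "));        // H, 12
```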

Now, you have to actually parse that.

I suggest the following approach. I will give you pseudocode rather than finished code, because I think you will learn more by working out the details yourself. If you have any questions about it, let me know:

method chemistryExpression(tokens): #Tokens is the result of your regex

  1. Create an empty map called map

  2. While there are tokens left and the next token is not ), consume the next token (remove it from the tokens):

    2.1 If it is an element symbol such as Cu, num = the following token if it is a number (consume it), otherwise 1. Add num to the element's occurrences in the map

    2.2 If it is (: # Deal with nesting

      2.2.1 inner = chemistryExpression(tokens) (note, tokens changed)

      2.2.2 Remove the ) you've just encountered

      2.2.3 num = the following token if it is a number (consume it), otherwise 1

      2.2.4 Add the occurrences in inner, each multiplied by num, to the map

  3. Return the map

Implementation suggestion

  • The map can just be an object.

    • Adding to the map means checking whether the key is there: if it is not, set it to the amount being added; if it is, increase its value by that amount.

    • Multiplying can be done using a for... in loop.

  • This solution is recursive: chemistryExpression calls itself to parse each parenthesized group. It is a very basic example of a recursive descent parser, and it handles nesting well.

  • Common sense and good practice suggest two helper methods:

    • peek - what is the next token in the tokens, this is tokens[0]
    • next - remove and return the next token from tokens, this is tokens.shift()
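Putting the pseudocode and these suggestions together, a minimal JavaScript sketch could look like the following. It assumes tokens come from the question's regex, that an element's count comes right after it, and that a number after ) multiplies the whole group (as described in the comments on the question). The helpers `tokenize`, `readCount`, `add` and `countElements` are hypothetical names introduced here; `peek` and `next` follow the description above, with `next` using `tokens.shift()`.

```javascript
// Sketch of a recursive descent parser for formulas such as "Cu(NO3)2N3".
// Assumptions: tokens come from the question's regex, an element's count
// directly follows it, and a number after ")" multiplies the whole group.

// Hypothetical tokenizer wrapping the question's regex.
function tokenize(formula) {
  return formula.match(/(?:[A-Z][a-z]*|\d+|[()])/g) || [];
}

// Parse tokens until the input ends or an unmatched ")" is next, returning an
// object that maps element symbols to counts. Consumes tokens as it goes.
function chemistryExpression(tokens) {
  const peek = () => tokens[0];      // look at the next token
  const next = () => tokens.shift(); // remove and return the next token
  const isNumber = (t) => t !== undefined && /^\d+$/.test(t);

  // Read an optional count after an element or group; default is 1.
  const readCount = () => (isNumber(peek()) ? parseInt(next(), 10) : 1);

  // Add `amount` to map[key], creating the key if it is missing.
  const add = (map, key, amount) => {
    map[key] = (map[key] || 0) + amount;
  };

  const map = {};
  while (tokens.length > 0 && peek() !== ")") {
    const token = next();
    if (token === "(") {
      const inner = chemistryExpression(tokens); // recurse into the group
      if (next() !== ")") throw new Error("Missing closing parenthesis");
      const factor = readCount();
      for (const element in inner) {
        add(map, element, inner[element] * factor);
      }
    } else if (/^[A-Z]/.test(token)) {
      add(map, token, readCount());
    } else {
      throw new Error("Unexpected token: " + token);
    }
  }
  return map;
}

// Hypothetical top-level entry point: also rejects a stray ")" left over.
function countElements(formula) {
  const tokens = tokenize(formula);
  const counts = chemistryExpression(tokens);
  if (tokens.length > 0) throw new Error("Unexpected token: " + tokens[0]);
  return counts;
}

console.log(countElements("Cu(NO3)2N3")); // { Cu: 1, N: 5, O: 6 }
```

Because each parenthesized group is handled by a recursive call, nesting such as `K4(ON(SO3)2)2` works without extra bookkeeping; the top-level wrapper only has to check that no stray `)` is left over.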
Benjamin Gruenbaum

For each value in userArray, check whether there is a next element and whether that next element is a number. If so, add this number to the count of the current element type; otherwise add 1. You can use an object as a map to store a count for each distinct element type:

var map = {};
// i: index of an element token, n: its count (illustrative names)
map[userArray[i]] = (map[userArray[i]] || 0) + n;

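EDIT: if you have numbers longer than one digit and your tokens split them into single digits, then, while the next token is a digit, concatenate the digits into a string and call parseInt() on it. (With the \d+ in the question's regex, a multi-digit number such as 12 already arrives as a single token.)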
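A sketch of this flat approach, assuming tokens from the question's regex (so `12` is already one token). The function name `countFlat` is hypothetical. As the comments on the question point out, this does not handle parentheses, so the sketch throws instead of silently producing wrong counts for a formula like `Cu(NO3)2N3`.

```javascript
// Flat counting: an element's count is the number token right after it,
// or 1 if there is none. Parentheses are NOT supported by this approach.
function countFlat(formula) {
  // Tokens from the question's regex; "12" arrives as a single token.
  const userArray = formula.match(/(?:[A-Z][a-z]*|\d+|[()])/g) || [];
  const map = {};
  for (let i = 0; i < userArray.length; i++) {
    const token = userArray[i];
    if (token === "(" || token === ")") {
      throw new Error("Parentheses need a real parser");
    }
    if (/^\d+$/.test(token)) continue; // counts are read with their element
    const nextToken = userArray[i + 1];
    const n = nextToken !== undefined && /^\d+$/.test(nextToken)
      ? parseInt(nextToken, 10)
      : 1;
    map[token] = (map[token] || 0) + n;
  }
  return map;
}

console.log(countFlat("C6H12O6")); // { C: 6, H: 12, O: 6 }
```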

Virus721