So I am trying to write a script that will turn a into an (when necessary), and it is harder than I thought.

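var txt = "This is a apple.";

// Find the first "a" followed by whitespace and a vowel letter.
var pos = txt.search(/a\s[aeiou]/i);
// If found, insert "n" right after that "a".
txt = pos !== -1 ?
      txt.substring(0, pos + 1) + "n" + txt.substring(pos + 1) :
      txt;
// "This is an apple."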

It works, but when I try "There are 60 minutes in a hour.", it does not change a into an, because the regex only checks for a vowel letter. So I changed it:

var pos = txt.search(/a\s([aeiou]|hour)/i);

Now it works (at least for "hour"). But if I put in "There are people in a university.", it changes it into an university, which is not correct.

So, is there a regular expression that can cover the rules of using a and an in the English language? Thanks!

Derek 朕會功夫

1 Answer

There was a very good thread about this on Stack Overflow a while ago: How can I correctly prefix a word with "a" and "an"?

Basically the consensus was that the best way involves a large dataset from which to learn, and the second-best way involves a pronunciation dictionary such as the CMU dict designed for speech synthesis.

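To give an example from the CMU dict:

University comes out as:
Y UW N AH V ER S AH T IY

Umbrella is rendered as:
AH M B R EH L AH

University starts with the consonant sound Y, so it takes a; umbrella starts with the vowel sound AH, so it takes an.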
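
The dictionary approach can be sketched in a few lines: look up the word's phonemes and pick "an" only when the first phoneme is a vowel sound. This is a minimal sketch, not a full implementation. `PHONES` is a hypothetical hand-built excerpt holding just the two entries quoted above (a real script would load the whole dictionary), `articleFromPhones` is a made-up name, and `VOWEL_PHONES` lists the CMU dict vowel symbols with stress digits removed.

```javascript
// Hypothetical excerpt of a pronunciation dictionary (stress digits removed).
// Only the two entries quoted above; a real script would load the full data.
const PHONES = new Map([
  ["university", "Y UW N AH V ER S AH T IY"],
  ["umbrella", "AH M B R EH L AH"],
]);

// Vowel symbols in the CMU dict phoneme set.
const VOWEL_PHONES = new Set([
  "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
  "EY", "IH", "IY", "OW", "OY", "UH", "UW",
]);

// Returns "a" or "an", or null when the word is not in the dictionary.
function articleFromPhones(word) {
  const phones = PHONES.get(word.toLowerCase());
  if (phones === undefined) return null;
  const first = phones.split(" ")[0];
  return VOWEL_PHONES.has(first) ? "an" : "a";
}
```

With these two entries, `articleFromPhones("university")` gives `"a"` and `articleFromPhones("umbrella")` gives `"an"`. Words missing from the table return `null`, so the caller has to decide on a fallback.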
Soz
  • This really complicates what I intended to do. – Derek 朕會功夫 Jun 08 '12 at 23:36
  • Yes - but because it has to do with the way people speak, it really is a complicated problem :) Could be reduced to a web service call per word, though, with the CMU dict approach. – Soz Jun 08 '12 at 23:37
  • There's always AJAX -- but yes, I think it would probably benefit from some server-side support if you were after a high level of accuracy. But if you really need to stick to client-side code, you might get to a somewhat acceptable level of accuracy with a relatively small number of rules. There's a SoundEx implementation in JavaScript (http://creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm#JavaScriptCode), after all. It all depends on what accuracy you need. – Soz Jun 08 '12 at 23:45
  • How do you use that SoundEx thing? `"Ha" --> "H000000000"` and `"Haha" --> "H000000000"` give me the same code, but they don't sound the same. – Derek 朕會功夫 Jun 08 '12 at 23:50
  • It's designed to find loose matches between pronunciations, e.g. 'enjoy' and 'enchoi' or 'uni' and 'youknee' (see http://en.wikipedia.org/wiki/Soundex). I don't think it's precise enough for your purposes, since it generally ignores vowels in favour of consonants. That's because English dialects tend to vary widely in the pronunciation of vowels, so it's a more difficult problem. – Soz Jun 09 '12 at 00:03
  • I tried to process the pronunciation [here](http://jsfiddle.net/DerekL/rZ25L/), but the result is not very good. – Derek 朕會功夫 Jun 09 '12 at 00:12
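
The "relatively small number of rules" idea from the comments could look something like the following. This is a rough client-side sketch under stated assumptions, not a complete solution: the names `articleFor` and `fixArticles` and both exception lists are made up for illustration, the lists are far from exhaustive, and any letter-based approach will still get some words wrong.

```javascript
// Illustrative, incomplete exception lists (assumptions, not a real dataset).
// Words that start with a silent "h": these take "an".
const VOWEL_SOUND = /^(hour|honest|honou?r|heir)/i;
// Words that start with a vowel letter but a consonant sound: these take "a".
const CONSONANT_SOUND =
  /^(univers|uniq|unit|unif|union|unicorn|us[eu]|util|eu|one\b|once\b)/i;

// Picks "a" or "an" for a single word: exceptions first, then the first letter.
function articleFor(word) {
  if (VOWEL_SOUND.test(word)) return "an";
  if (CONSONANT_SOUND.test(word)) return "a";
  return /^[aeiou]/i.test(word) ? "an" : "a";
}

// Rewrites every standalone "a"/"an" according to the word that follows it.
// The \b keeps the pattern from matching an "a" at the end of another word.
function fixArticles(text) {
  return text.replace(/\b(an?)(\s+)([a-z]+)/gi, (match, article, space, word) => {
    let fixed = articleFor(word);
    // Preserve a capitalised article at the start of a sentence.
    if (article[0] === "A") fixed = "A" + fixed.slice(1);
    return fixed + space + word;
  });
}
```

For the three sentences from the question, `fixArticles("This is a apple.")` returns `"This is an apple."`, `fixArticles("There are 60 minutes in a hour.")` returns `"There are 60 minutes in an hour."`, and `"There are people in a university."` comes back unchanged. It still treats a standalone letter "A" (as in "vitamin A") as an article, which is one of the cases a sketch like this does not handle.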