0

I have a table with a field that contains a bunch of neighborhood names. Some of these neighborhoods have names with 2 or more words. How can I get a list of words that are 3 or less characters and occur in the middle of name with 3 or more words?

For example:

Lake = Do nothing, only 1 word
Golden Lake = Do nothing, only 2 words
Lakes of Gold = Extract "of"

In essence I want to make a list of 'garbage' words to remove when I build metaphone sentences.

RS7
  • 2,341
  • 8
  • 34
  • 57
  • This will be difficult in MySQL. Its regex implementation only allows selecting rows where a certain field matches a certain regex. So you can't extract parts of a field to generate a new list in MySQL. And even if you could, you'd need lookaround to do exactly what you want, and POSIX regexes don't support that, either. – Tim Pietzcker Dec 27 '10 at 17:19

2 Answers2

3
SELECT  'Lake of gold' RLIKE '[[:<:]].+[[:>:]].+[[:<:]].{1,3}[[:>:]].+[[:<:]].+[[:>:]]'

Unfortunately, MySQL can only match the regexps, not extract the patterns. You will have to do the filtering in MySQL and extraction on the script side.

Quassnoi
  • 413,100
  • 91
  • 616
  • 614
0
SELECT * FROM mytable WHERE mycolumn REGEXP "[[:alnum:]]+[[:space:]]+[[:alnum:]]{1,3}[[:space:]]+[[:alnum:]]+";

will find all entries that contain at least one word of up to 3 characters in between two other words.

You can't extract the words in MySQL directly, but this will filter the relevant rows. You have to do the extraction in a separate step.

Tim Pietzcker
  • 328,213
  • 58
  • 503
  • 561