I want to use DoubleMetaphone to get a phonetic encoding of a given string. For example:

import org.apache.commons.codec.language.DoubleMetaphone;

String s1 = "computer";
// Keep the primary encoding instead of discarding the return value
String code = new DoubleMetaphone().doubleMetaphone(s1);

Result: computer -> KMPT

The issue arises when I try to encode longer strings.

import org.apache.commons.codec.language.DoubleMetaphone;

String s1 = "dustinhoffmanisanactor";
// Same call as before, only with a longer input
String code = new DoubleMetaphone().doubleMetaphone(s1);

Result: dustinhoffmanisanactor -> TSTN

The encoder appears to stop after the first four encoded characters. In this case only "dustin" is encoded, giving TSTN.

I used the Python implementation of Double Metaphone and it works as expected.

>>> from metaphone import doublemetaphone
>>> doublemetaphone("dustinhoffmanisanactor")[0]
"TSTNFMNSNKTR"
Ian
  • Which version of `org.apache.commons.codec.language.DoubleMetaphone` are you using? – Progman Nov 14 '20 at 23:00
  • I'm using 1.9. Think the issue is solved now :) – Ian Nov 14 '20 at 23:02
  • The default size is 4, see https://github.com/apache/commons-codec/blob/fe8b24cb8b9aca990adb7a9623c7db35f0bff75c/src/main/java/org/apache/commons/codec/language/DoubleMetaphone.java#L59 – Progman Nov 14 '20 at 23:03

1 Answer

It turns out I needed to set the maximum code length, which defaults to 4.

String s1 = "dustinhoffmanisanactor";
DoubleMetaphone dm = new DoubleMetaphone();
dm.setMaxCodeLen(100);
dm.doubleMetaphone(s1);

This gives the expected TSTNFMNSNKTR.
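
For anyone who wants to compare the two behaviours side by side, here is a minimal sketch, assuming Commons Codec is on the classpath. `PhoneticEncoder` and `encode` are hypothetical names I made up for illustration; `getMaxCodeLen`, `setMaxCodeLen`, and `doubleMetaphone` are the library's own methods.

```java
import org.apache.commons.codec.language.DoubleMetaphone;

class PhoneticEncoder {
    // Hypothetical helper (not part of Commons Codec): returns the primary
    // Double Metaphone code of `text`, limited to at most `maxLen` characters.
    static String encode(String text, int maxLen) {
        DoubleMetaphone dm = new DoubleMetaphone();
        dm.setMaxCodeLen(maxLen);
        return dm.doubleMetaphone(text);
    }

    public static void main(String[] args) {
        String s = "dustinhoffmanisanactor";
        // The limit a freshly constructed encoder starts with
        System.out.println(new DoubleMetaphone().getMaxCodeLen()); // 4
        System.out.println(encode(s, 4));   // TSTN
        System.out.println(encode(s, 100)); // TSTNFMNSNKTR
    }
}
```

The value 100 is arbitrary. Any limit at least as long as the full encoding should behave the same for this input.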

Ian