I can suggest an information-retrieval technique for doing so, but it requires a large collection of documents to work properly.
Index your data using standard IR techniques. Lucene is a good open-source library that can help with this.
Once you get a name (Obaama, for example), retrieve the set of documents in which the word Obaama appears. Let this set be D1.
Now, for each word w that appears in the documents of D1 [1], search for Obaama AND w using your IR system. Let the resulting set be D2.
The score |D2|/|D1| estimates how strongly w is connected to Obaama, and will most likely be close to 1 for w = Obama [2].
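The scoring steps above can be sketched in a few lines. This is only a toy illustration, assuming a small in-memory inverted index in place of a real IR system such as Lucene; the helper names `build_index` and `cooccurrence_scores` and the sample documents are made up for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def cooccurrence_scores(index, docs, name):
    """Return {w: |D2| / |D1|} for every word w occurring in the documents of D1."""
    name = name.lower()
    d1 = index.get(name, set())          # D1: documents containing the name
    if not d1:
        return {}
    candidates = set()
    for doc_id in d1:
        candidates.update(docs[doc_id].lower().split())
    candidates.discard(name)
    # D2 for a word w is the result of "name AND w", i.e. D1 intersected with docs(w).
    return {w: len(d1 & index[w]) / len(d1) for w in candidates}

# Hypothetical toy corpus.
docs = [
    "president obaama met barack obama supporters",
    "obaama obama speech",
    "obama visited chicago",
    "weather in chicago",
]
index = build_index(docs)
scores = cooccurrence_scores(index, docs, "Obaama")
```

On this toy corpus `scores["obama"]` is higher than `scores["barack"]`, but with so few documents the numbers mean little; the approach only becomes useful on a large collection.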
You can manually label a set of examples and use them to find the threshold score above which a word should be treated as a likely correction.
Using a standard lexicographical similarity technique (edit distance, for example), you can choose to filter out words that are definitely not spelling mistakes (like Barack).
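As a rough sketch of such a filter, the snippet below uses Levenshtein edit distance. The choice of edit distance, the `max_distance=2` cutoff, and the function names are assumptions made for illustration; any other string-similarity measure could be substituted.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

def similar_enough(name, candidates, max_distance=2):
    """Keep only candidates within max_distance edits of the name (case-insensitive)."""
    name = name.lower()
    return [w for w in candidates if edit_distance(name, w.lower()) <= max_distance]

kept = similar_enough("Obaama", ["Obama", "Barack", "president"])
```

Here only Obama survives the filter; Barack co-occurs with Obaama often but is lexically too far away to be a misspelling of it.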
Another solution that is often used requires a query log: look for correlations between searched words. If obaama correlates with obama in the query log, they are connected.
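One simple way to measure such a correlation is sketched below, assuming the log is already grouped into user sessions, each a list of query strings. That log format, the sample data, and the name `query_log_score` are hypothetical; real query logs usually need sessionization and cleaning first.

```python
def query_log_score(sessions, misspelled, candidate):
    """Fraction of sessions containing `misspelled` that also contain `candidate`."""
    misspelled, candidate = misspelled.lower(), candidate.lower()
    with_misspelled = 0
    with_both = 0
    for session in sessions:
        # All words the user typed anywhere in this session.
        words = {w for query in session for w in query.lower().split()}
        if misspelled in words:
            with_misspelled += 1
            if candidate in words:
                with_both += 1
    return with_both / with_misspelled if with_misspelled else 0.0

# Hypothetical sessions: users often retype a query after a misspelling.
sessions = [
    ["obaama speech", "obama speech"],
    ["obaama", "obama biography"],
    ["obaama news"],
    ["chicago weather"],
]
score = query_log_score(sessions, "obaama", "obama")
```

The idea is that a user who misspells a name tends to correct it in the next query, so the misspelling and the intended word show up in the same session more often than chance would predict.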
1: You can improve performance by applying the second filter (lexicographic similarity) first, and checking only candidates that are "similar enough" lexicographically.
2: Usually a normalization is also applied, because more frequent words are more likely to appear in the same documents as any other word, whether or not they are related.
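The note on normalization does not name a specific formula. One common choice, shown here purely as an assumption, is pointwise mutual information (PMI), which divides |D2|/|D1| by the overall fraction of documents containing w and takes the logarithm. The counts below are invented for illustration.

```python
import math

def pmi(d1_size, d2_size, df_w, n_docs):
    """PMI between the name and w, estimated from document counts.

    d1_size: |D1|, documents containing the name
    d2_size: |D2|, documents containing both the name and w
    df_w:    documents containing w anywhere in the collection
    n_docs:  total number of documents
    """
    if d2_size == 0:
        return float("-inf")
    p_w_given_name = d2_size / d1_size
    p_w = df_w / n_docs
    return math.log(p_w_given_name / p_w)

# Hypothetical counts: 'the' appears almost everywhere, 'obama' is much rarer.
pmi_the = pmi(d1_size=50, d2_size=49, df_w=9800, n_docs=10000)
pmi_obama = pmi(d1_size=50, d2_size=45, df_w=300, n_docs=10000)
```

With these counts the raw score |D2|/|D1| is actually higher for "the" (0.98) than for "obama" (0.90), yet the PMI of "the" is about zero while that of "obama" is clearly positive, which is the effect the normalization is meant to achieve.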