I doubt it is practical to completely code this in an automated fashion, but I would suggest a two step approach.
First, identify possible matches. You can use a number of potential solutions; this is far more complex than a StackOverflow solution, but you have some suggestions already, and you can look at papers on the internet, such as this paper which explains many of the SAS functions and call routines (COMPGED, SPEDIS, COMPLEV, COMPCOST, SOUNDEX, COMPARE).
Use this approach with a fairly broad stroke - ie, prefer false positives to false negatives. Simply focus on identifying words one to one; build a dataset of original, translation
, such as
Delli, Delhi
Deli, Delhi
Dalhi, Delhi
etc.
Then visually inspect the file and make corrections as needed (ie, remove false positives).
Once you have this dataset, you have a few options for utilizing the results. If you already have the city name as a separate field, or if you can put it in a separate field or work with it using scan
easily to identify just the city, you can use a format solution.
data for_fmt;
set translations;
start=original;
label=translation;
fmtname='$CITYF';
*no hlo=o record as we want to preserve nonmatches as is;
run;
proc format cntlin=for_Fmt;
quit;
data want;
set have;
city_fixed=put(city,$CITYF.);
run;
If you cannot easily identify the city in the address (ie, your address field is something like "10532 NELSON DRIVE DELHI" with no commas or such), then the TRANWRD solution is probably best. You can code a hash-based or array-based solution to implement it (rather than a lot of if statements); if your data does have this problem post a comment and I'll add to the solution later.