Problem with incorrect spelling: When you build an application that accepts user input, you should expect misspelled input. Spell-check libraries can handle common words, but user-defined data may not be present in their dictionaries. For example, suppose you are building a chat bot where the user enters a location name to search for a restaurant.
-
It sounds like you would need to build your own dictionary and add those values to whatever library you're using – Davy M Apr 08 '18 at 19:30
-
It sounds like there should be a question mark somewhere in your post. But if you end up asking for a library: library recommendations are off-topic on Stack Overflow. – Jongware Apr 08 '18 at 19:37
-
Thanks Davy, I was just doing entity extraction, and thought of sharing the use of soundex api in python. – Bidya Apr 08 '18 at 19:37
-
You could use one or more cloud services for the particular kinds of searches you want to do. For example, Bing Entity Search is specifically for searching for places like restaurants, and will find things if the spelling is close enough. (There are similar services from other providers, I just happen to be the most familiar with this one.) – kindall Apr 08 '18 at 19:37
-
Hi kindall, I will definitely look at it. Does it support adding custom entities? For example, I may want to search for a department, which is specific to each organization. – Bidya Apr 08 '18 at 19:43
-
No, nothing like that, unfortunately. – kindall Apr 08 '18 at 19:45
1 Answer
Solution in database layer:
To handle this, you can use a Soundex function. Soundex implementations are available as small libraries for most languages, and several databases expose one as a built-in SQL function.
Below is a valid query for a MySQL database: select distinct r_name from restuarant where area = 'South' and SOUNDEX(cost) = SOUNDEX('cheep')
In the above example the database has an entry for 'cheap', but the user has entered 'cheep'. Both words produce the same Soundex code, so the query returns the record with cost = 'cheap'.
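To make the idea concrete, here is a minimal sketch of the classic American Soundex rules in Python, wired into an in-memory SQLite database as a stand-in for MySQL. The function name PY_SOUNDEX and the sample rows are assumptions made for illustration, the table and column names follow the query above, and MySQL's built-in SOUNDEX may differ in details such as code length.

```python
import sqlite3

# Digit assigned to each consonant group in classic American Soundex.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels (and y) have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, keep four characters


print(soundex("cheap"), soundex("cheep"))  # C100 C100

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.create_function("PY_SOUNDEX", 1, soundex)
conn.execute("CREATE TABLE restuarant (r_name TEXT, area TEXT, cost TEXT)")
conn.executemany(
    "INSERT INTO restuarant VALUES (?, ?, ?)",
    [("Spice Hut", "South", "cheap"),
     ("Golden Fork", "South", "expensive"),
     ("Curry Leaf", "North", "cheap")],
)
rows = conn.execute(
    "SELECT DISTINCT r_name FROM restuarant "
    "WHERE area = ? AND PY_SOUNDEX(cost) = PY_SOUNDEX(?)",
    ("South", "cheep"),
).fetchall()
print(rows)  # [('Spice Hut',)]
```

Passing the user's text as a bound parameter (the ? placeholders) rather than concatenating it into the SQL string also keeps the query safe from injection.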
Solution in python layer:
The Fuzzy library provides Soundex as well as DMetaphone (Double Metaphone) implementations.
Steps to set up Fuzzy:
a. Make sure Python 3 is installed and that its Scripts folder (for example 'C:\Program Files\Python36\Scripts') is on the PATH.
b. Download the Fuzzy-1.2.2.tar.gz library from https://pypi.python.org/pypi/Fuzzy
c. Extract the archive into a folder.
d. Run python setup.py install from that folder.
Import and test in Python:
import fuzzy
dmtfn = fuzzy.DMetaphone(4)
print(dmtfn('Hyderaabaad'), dmtfn('Hyderabad'))
>> [b'HTRP', None] [b'HTRP', None]
print(dmtfn('Hyderaabaad')[0], dmtfn('Hyderabad')[0])
>> b'HTRP' b'HTRP'
A real use case (entity extractor in a chat bot):
Suppose you are building a chat bot for restaurant search and need to find a valid location, where the valid locations are predefined as an entity list. The location in the user input should be recognized as an entity in the Python layer before it is passed to the database. In this case we can use the Soundex or DMetaphone API.
The code snippet below reads entities from a folder (all locations could be in a file cities.txt) and builds the valid entity lists. Each entity is converted to its DMetaphone code, the input location is converted to DMetaphone codes as well, and the two sets of codes are compared.
import os

# uinput (the user's message), dmtfn (the DMetaphone encoder created above)
# and attributes (a dict for the current intent) are assumed to exist already

# read all entities from the entities folder
# store them as a dictionary, where the key is the filename without extension
files = os.listdir('./entities/')
entities = {}
for fil in files:
    with open('./entities/' + fil) as f:
        # strip newlines and skip blank lines
        lines = [line.strip() for line in f if line.strip()]
    entities[os.path.splitext(fil)[0]] = '|'.join(lines)

# now convert each word of the user input into its code
codes = [dmtfn(w)[0] for w in uinput.lower().split()]

# If the code of the input location matches a valid entity code
# then store the location as a valid attribute for the intent
for entity in entities:
    for i in entities[entity].split('|'):
        # entity extraction using the sound code, to tolerate spelling mistakes
        # (the Python-layer counterpart of using soundex in the database layer)
        currCode = dmtfn(i.lower())[0]
        if currCode is not None and currCode in codes:
            attributes[entity] = i
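The matching loop above re-encodes every entity for each user message. One possible variation, sketched here under stated assumptions, is to encode the entity list once into a lookup table and then look up each input word. The names consonant_key, build_index and extract_attributes are hypothetical, and consonant_key is only a crude stand-in encoder (it is neither Soundex nor Double Metaphone) so that the sketch runs without Fuzzy installed; with Fuzzy available, something like lambda w: dmtfn(w)[0] could be passed instead.

```python
import re


def consonant_key(word):
    """Crude stand-in encoder (an assumption, not Soundex or DMetaphone):
    lowercase, keep the first letter, drop later vowels, collapse repeats."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if not letters:
        return None
    key = letters[0]
    for ch in letters[1:]:
        if ch in "aeiou":
            continue
        if ch != key[-1]:
            key += ch
    return key


def build_index(entities, encode):
    """Map each code to (entity_type, canonical_value), computed once."""
    index = {}
    for entity_type, values in entities.items():
        for value in values:
            code = encode(value)
            if code is not None:
                # keep the first entity seen for a given code
                index.setdefault(code, (entity_type, value))
    return index


def extract_attributes(user_input, index, encode):
    """Return {entity_type: canonical_value} for words whose code is known."""
    attributes = {}
    for word in user_input.split():
        hit = index.get(encode(word))
        if hit is not None:
            entity_type, value = hit
            attributes[entity_type] = value
    return attributes


# Hypothetical entity list; in the answer these come from ./entities/*.txt
entities = {"cities": ["Hyderabad", "Bangalore", "Chennai"]}
index = build_index(entities, consonant_key)
print(extract_attributes("find food in Hyderaabaad", index, consonant_key))
# {'cities': 'Hyderabad'}
```

Like the original loop, this matches word by word, so multi-word entity names would need extra handling.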
