
I would like to use Python to convert all synonyms and plural forms of words to the base version of the word.

e.g. Babies would become baby and so would infant and infants.

I tried writing a naive plural-to-root converter, but it does not always work correctly and misses a large number of cases.

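contents = ["buying", "stalls", "responsibilities"]
stemmed = []
for token in contents:
    # Slice off only the trailing suffix; str.replace would also change matches mid-word.
    if token.endswith("ies"):
        token = token[:-3] + "y"
    elif token.endswith("s"):
        token = token[:-1]
    elif token.endswith("ed"):
        token = token[:-2]
    elif token.endswith("ing"):
        token = token[:-3]
    # Rebinding the loop variable does not change the list, so collect the results.
    stemmed.append(token)

print(stemmed)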
Dan Harmon
  • This is the sort of function that large teams spend thousands of hours working on. How naive is your solution supposed to be here? – JacobIRR Jul 12 '19 at 19:32
  • It's going to be tough. How did you plan on handling plural words like "geese" or "cacti"? Or other words like "sling", "bed", "glass"? You should focus on searching for an external linguistics library to do it rather than trying to make general rules yourself. – Jason K Lai Jul 12 '19 at 19:35
  • @JacobIRR I agree with both of you. It was mainly meant as a simple implementation to see how quickly and effectively cobbling something together would cover some bases, and to provide an example. It wasn't really meant as a solution. – Dan Harmon Jul 12 '19 at 19:43

2 Answers


I have not used this library before, so take this with a grain of salt. However, NodeBox Linguistics seems to be a reasonable set of scripts that will do exactly what you are looking for if you are on macOS. Check the link here: https://www.nodebox.net/code/index.php/Linguistics

Based on their documentation, it looks like you will be able to use lines like so:

print(en.noun.singular("people"))
# person

print(en.verb.infinitive("swimming"))
# swim

etc.

In addition to the example above, another option to consider is a natural language processing library like NLTK. The reason I recommend using an external library is that English has a lot of exceptions. As mentioned in my comment, consider words like class, fling, red, and geese, which would trip up the rules in the original question.
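To make the point about exceptions concrete, here is a minimal sketch of what a hand-rolled approach ends up needing: a table of irregular forms, a list of words the suffix rules must leave alone, and a synonym map for the "infant" to "baby" case. All three tables and the name `to_base` are assumptions made up for illustration, and only noun plurals are handled. A real library ships far larger tables than this.

```python
# Hand-written tables for illustration only (assumptions, not from any library).
IRREGULAR = {"geese": "goose", "cacti": "cactus", "people": "person", "men": "man"}
UNCHANGED = {"class", "glass", "fling", "sling", "red", "bed"}
SYNONYMS = {"infant": "baby"}


def to_base(word):
    """Return a rough singular form of `word`, mapped to a canonical synonym."""
    word = word.lower()
    if word in IRREGULAR:
        base = IRREGULAR[word]          # irregular plurals need a lookup table
    elif word in UNCHANGED:
        base = word                     # words the suffix rules would mangle
    elif word.endswith("ies") and len(word) > 4:
        base = word[:-3] + "y"          # babies -> baby
    elif word.endswith("s") and not word.endswith("ss"):
        base = word[:-1]                # stalls -> stall
    else:
        base = word
    # Collapse synonyms onto one chosen word after the plural is removed.
    return SYNONYMS.get(base, base)


words = ["Babies", "infants", "geese", "class", "red", "stalls"]
print([to_base(w) for w in words])
# ['baby', 'baby', 'goose', 'class', 'red', 'stall']
```

Every new exception means another table entry, which is why a library backed by a real dictionary is the more practical route.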

Jason K Lai
  • That seems like a great solution, thanks. I'll look into implementing it now. Is there any way to handle synonyms though? I would also like to convert all words with the same meaning to one word e.g. babies and infants would both become baby. I'll go use linguistics to implement the main issue now though, thanks. – Dan Harmon Jul 12 '19 at 19:45
  • Again, I have never used that library before, but the link I shared in my answer also has a section for glossary, synonyms, antonyms, etc. – Jason K Lai Jul 12 '19 at 20:05
  • The library seems to only be for macOS and only runs inside the NodeBox environment, so it isn't very usable. – Dan Harmon Jul 13 '19 at 10:32
  • Sorry, it wasn't appropriate for your situation. However, I think you get my point. There are complications in writing this yourself, so it's probably best to find an external library. NLTK, which is used for natural language processing, is another option to look up. It's also dependent on the WordNet database, so it should give the same results as what I described above. – Jason K Lai Jul 13 '19 at 14:38
  • I found my solution in [pattern.en](https://www.clips.uantwerpen.be/pages/pattern-en), just for future reference for anyone. – Dan Harmon Jul 13 '19 at 14:49

I built a Python library - Plurals and Countable, which is open source on GitHub. The main purpose is to get plurals (yes, multiple plurals for some words), but it also solves this particular problem.

import plurals_counterable as pluc
pluc.pluc_lookup_plurals('men', strict_level='dictionary')

This returns the following dictionary:

{
    'query': 'men', 
    'base': 'man', 
    'plural': ['men'], 
    'countable': 'countable'
}

The base field is what you need.

The library actually looks up the words in dictionaries, so it takes some time to request, parse and return. Alternatively, you might use the REST API provided by Dictionary.video. You'll need to contact admin@dictionary.video to get an API key. The call will look like this:

import logging

import requests


def lookup_base(word, api_key):
    # Wrapped in a function so the return statements are valid Python.
    # Returns the 'base' field for the word, or None if the request fails.
    url = 'https://dictionary.video/api/noun/plurals/%s?key=%s' % (word, api_key)
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()['base']
    logging.error('%s response: status_code[%d]', url, response.status_code)
    return None
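Since each lookup takes time, it may help to cache results so that a word repeated in the text is only looked up once. The sketch below is only an illustration of that idea using the standard library's `functools.lru_cache`. `fetch_base` and `_FAKE_DICTIONARY` are hypothetical stand-ins for a real dictionary call, not part of the library or the API.

```python
from functools import lru_cache

# Stand-in for a slow dictionary lookup (hypothetical); swap in a real call.
_FAKE_DICTIONARY = {"men": "man", "babies": "baby"}
calls = []


def fetch_base(word):
    """Pretend to query a remote dictionary; records each call for the demo."""
    calls.append(word)
    return _FAKE_DICTIONARY.get(word)


@lru_cache(maxsize=None)
def cached_base(word):
    """Return the base form, falling back to the word itself when unknown."""
    base = fetch_base(word)
    return base if base is not None else word


tokens = ["men", "babies", "men", "stall", "men"]
print([cached_base(t) for t in tokens])
# ['man', 'baby', 'man', 'stall', 'man']
print(len(calls))
# 3
```

Five tokens trigger only three lookups here, because the repeated "men" is served from the cache.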
wholehope