Python beginner here. I wrote this function to find the 10 most frequent words in a dictionary called "Counts". I have to exclude every word in the englPrep, englConj, englPronouns and specialWords lists from "Counts", and then return the 10 most frequent remaining words as a dictionary. In other words, "getMostFrequent()" should take the "Counts" dictionary and the lists of excluded words as input and output a new dictionary containing the 10 most frequent words.

I have tried for hours but I can't get this to work. The expected output should be along the lines of {'river': 755, 'party': 527, 'water': 472, etc...}, but I just get {'the': 16517, 'of': 8550, 'and': 6390, 'to': 5471, 'a': 3508, 'in': 3298, 'was': 2371, 'on': 2094, 'that': 1893, 'he': 1557}, which contains words that I specified should not be included. I would really appreciate some help or a possible solution. Thanks in advance to anyone willing to help.

PS: I use Python 3.8.

def countWords():

    Counts = {}
              
    for x in wordList:
        if not x in Counts:      
            Counts[x] = wordList.count(x)
     
    return Counts

def getMostFrequent():

    exclWordList = tuple(englConj), tuple(englPrep), tuple(englPronouns), tuple(specialWords)
    topNumber = 10
    topFreqWords =  dict(sorted(Counts.items(), key=lambda x: x[1], reverse=True)[:topNumber])

    new_dict = {}

    for key, value in topFreqWords.items():      
        for index in exclWordList:
            for y in index:
                if value is not y:
                    new_dict[key] = value
        
    topFreqWords = new_dict

    return topFreqWords

if __name__ == "__main__":
    Counts = countWords()

    englPrep = ['about', 'beside', 'near', 'to', 'above', 'between', 'of', 
            'towards', 'across', 'beyond', 'off', 'under', 'after', 'by',
            'on', 'underneath', 'against', 'despite', 'onto', 'unlike', 
            'along', 'down', 'opposite', 'until', 'among', 'during', 'out', 
            'up', 'around', 'except', 'outside', 'along', 'as', 'for', 
            'over', 'via', 'at', 'from', 'past', 'with', 'before', 'in', 
            'round', 'within', 'behind', 'inside', 'since', 'without', 
            'below', 'into', 'than', 'beneath', 'like', 'through']

    englConj = ['for', 'and', 'nor', 'but', 'or', 'yet', 'so']

    englPronouns = ['you', 'he', 'she', 'him', 'her', 'his', 'hers', 'yours']

    specialWords = ['the']
    
    topFreqWords = getMostFrequent()
Vladislav Povorozniuc
uio2000
  • Does this answer your question? [Using global variables in a function](https://stackoverflow.com/questions/423379/using-global-variables-in-a-function) – Ken Y-N May 11 '21 at 14:42
  • While @KenY-N's suggestion will get you back on track, you really want to check out `collections.Counter()` and potentially `set()` as well – JonSG May 11 '21 at 15:06
  • I looked into it, but I don't understand how it relates to my for loop not working properly. – uio2000 May 11 '21 at 16:11

2 Answers

Try passing the Counts dictionary as an argument, i.e. getMostFrequent(Counts). The function should take it as a parameter unless Counts is declared in the global scope.
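
A minimal sketch of what passing the dictionary explicitly could look like. The name `top_words`, the `top_number` parameter, and the sample counts are made up for illustration, and this only shows the parameter passing; it does not filter out any excluded words yet.

```python
def top_words(counts, top_number=10):
    """Return the top_number most frequent entries of counts as a dict."""
    # Sort (word, count) pairs by count, highest first
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    # Keep only the first top_number pairs
    return dict(ranked[:top_number])


# Hypothetical sample data, passed in as an argument instead of read from a global
sample_counts = {"river": 755, "party": 527, "water": 472}
print(top_words(sample_counts, top_number=2))
```

Because the dictionary arrives as a parameter, the function no longer depends on a variable named Counts existing at module level when it is called.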

wts
  • Thanks for the suggestion. I tried it but I just got the same result. Edit: I think the issue may be in my for loop. – uio2000 May 11 '21 at 15:58

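In your code you take the top 10 most frequent words while the stopwords are still included, and only then try to filter. The check `value is not y` also compares a count with a word, so it is always true and nothing gets filtered out. You need to remove the stopwords from Counts before sorting the dict by value.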

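def getMostFrequent(Counts, englConj, englPrep, englPronouns, specialWords):
    # Merge all stopword lists into one set
    exclWordList = set(englConj + englPrep + englPronouns + specialWords)
    # Stopwords that actually occur in Counts
    popitems = exclWordList.intersection(Counts.keys())
    # Note: this removes the stopwords from the caller's dictionary in place
    for i in popitems:
        Counts.pop(i)

    topNumber = 10
    # Sort by count, highest first, and keep the first topNumber entries
    topFreqWords = dict(sorted(Counts.items(), key=lambda x: x[1], reverse=True)[:topNumber])
    return topFreqWords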
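
The `collections.Counter()` and `set()` suggestion from the comments could look roughly like the sketch below. It is an alternative to the function above, not part of it: `most_frequent`, `word_list`, `stopwords`, and the sample sentence are made-up names and data for illustration.

```python
from collections import Counter


def most_frequent(word_list, stopwords, top_number=10):
    """Count the words that are not stopwords and return the top_number most common."""
    # Skip stopwords while counting, so they never reach the ranking step
    counts = Counter(word for word in word_list if word not in stopwords)
    # most_common() returns (word, count) pairs sorted by count, highest first
    return dict(counts.most_common(top_number))


# Hypothetical sample input
word_list = "the river and the party by the river".split()
stopwords = {"the", "and", "by"}
print(most_frequent(word_list, stopwords, top_number=2))
```

Keeping the stopwords in a set makes each membership test fast, and counting with Counter avoids calling `list.count()` once per distinct word.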
wts