This is similar to, but not the same as, a previous question. Rather, it builds upon it (although technically it is easier).
I'll link it here so you can get an idea:-
Creating a simple searching program
Still using my two nested dictionaries:-
wordFrequency = {'bit':{1:3,2:4,3:19,4:0},'red':{1:0,2:0,3:15,4:0},'dog':{1:3,2:0,3:4,4:5}}
search = {1:{'bit':1},2:{'red':1,'dog':1},3:{'bit':2,'red':3}}
The first dictionary links each word to a file number and the number of times the word appears in that file. The second contains the searches, linking each word to the number of times it appears in that search.
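To make the layout concrete, here is a small sketch of how the two dictionaries are read. The data is copied from above; the variable names `bit_in_file_3` and `red_in_search_3` are only illustrative.

```python
# Data copied from the question.
wordFrequency = {'bit': {1: 3, 2: 4, 3: 19, 4: 0},
                 'red': {1: 0, 2: 0, 3: 15, 4: 0},
                 'dog': {1: 3, 2: 0, 3: 4, 4: 5}}
search = {1: {'bit': 1}, 2: {'red': 1, 'dog': 1}, 3: {'bit': 2, 'red': 3}}

# wordFrequency[word][file_number] -> times the word appears in that file
bit_in_file_3 = wordFrequency['bit'][3]

# search[search_number][word] -> times the word appears in that search
red_in_search_3 = search[3]['red']

print(bit_in_file_3)
print(red_in_search_3)
```

Note that a stored count of 0 (for example `wordFrequency['red'][1]`) still produces an entry for that file, so the mere existence of a key does not mean the word is present.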
I want to use the Binary Independence Model; a good explanation is here:-
http://en.wikipedia.org/wiki/Binary_Independence_Model
It is simpler than my previous model because the number of times a particular word appears in a search or file is irrelevant, only the presence or absence is important i.e. it is boolean. Therefore it is similar, but if a word appears more than once in a search or file it is still treated as just 1.
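As a rough illustration of what "boolean" means here, the sketch below collapses every count to 1 (present) or 0 (absent). It assumes a count of 0 means the word does not occur; `to_presence` is a hypothetical helper name, not part of my program.

```python
# Data copied from the question.
wordFrequency = {'bit': {1: 3, 2: 4, 3: 19, 4: 0},
                 'red': {1: 0, 2: 0, 3: 15, 4: 0},
                 'dog': {1: 3, 2: 0, 3: 4, 4: 5}}
search = {1: {'bit': 1}, 2: {'red': 1, 'dog': 1}, 3: {'bit': 2, 'red': 3}}


def to_presence(nested_counts):
    """Hypothetical helper: turn every count into 1 if it is > 0, else 0."""
    return {
        outer_key: {inner_key: int(count > 0)
                    for inner_key, count in inner.items()}
        for outer_key, inner in nested_counts.items()
    }


file_presence = to_presence(wordFrequency)
search_presence = to_presence(search)

print(file_presence['red'])   # only file 3 keeps a 1
print(search_presence[3])     # 'bit' and 'red' both become 1
```

The important detail is that a 0 stays 0: only counts greater than zero are treated as "present".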
The expected output is again a dictionary:-
{1: [3, 2, 1, 4], 2: [3, 4, 1, 2], 3: [3, 2, 1, 4]}
This is the output from the previous vector space model program, so the output for the Binary Independence Model may be different.
My code so far:-
from collections import Counter

def retrieve():
    wordFrequency = {'bit':{1:3,2:4,3:19,4:0},'red':{1:0,2:0,3:15,4:0},'dog':{1:3,2:0,3:4,4:5}}
    search = {1:{'bit':1},2:{'red':1,'dog':1},3:{'bit':2,'red':3}}
    results = {}
    for search_number, words in search.iteritems():
        file_relevancy = Counter()
        for word, num_appearances in words.iteritems():
            # treat the word as simply present in the search
            num_appearances = 1
            for file_id, appear_in_file in wordFrequency.get(word, {}).iteritems():
                # treat the word as simply present in the file
                appear_in_file = 1
                file_relevancy[file_id] += num_appearances * appear_in_file
        # files ordered from highest to lowest score
        results[search_number] = [file_id for (file_id, count) in file_relevancy.most_common()]
    return results

print retrieve()
However, I am only getting an output of
{1: [1, 2, 3, 4], 2: [1, 2, 3, 4], 3: [1, 2, 3, 4]}
This is not correct. Is it just returning the files in numeric order?
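One possible explanation, offered as a guess rather than a confirmed diagnosis: `appear_in_file` is overwritten with 1 even when the stored count is 0, so every file listed for a word scores the same and the ties fall back to numeric order. The sketch below is a minimal presence-based ranking that keeps 0 counts as "absent". It only counts how many search words occur in each file, which is a simplification and not the full probabilistic weighting of the Binary Independence Model. `rank_files` and its tie-break by file number are assumptions, and `.items()` is used so that it runs on Python 2.7 and Python 3.

```python
def rank_files(word_frequency, searches):
    """Rank files per search by how many distinct search words they contain."""
    results = {}
    for search_number, words in searches.items():
        scores = {}
        for word, count_in_search in words.items():
            if count_in_search <= 0:
                continue  # word is not really part of this search
            for file_id, count_in_file in word_frequency.get(word, {}).items():
                # Presence is 1 only if the word actually occurs in the file.
                present = 1 if count_in_file > 0 else 0
                scores[file_id] = scores.get(file_id, 0) + present
        # Highest score first; ties broken by file number (an assumption).
        results[search_number] = sorted(scores, key=lambda f: (-scores[f], f))
    return results


wordFrequency = {'bit': {1: 3, 2: 4, 3: 19, 4: 0},
                 'red': {1: 0, 2: 0, 3: 15, 4: 0},
                 'dog': {1: 3, 2: 0, 3: 4, 4: 5}}
search = {1: {'bit': 1}, 2: {'red': 1, 'dog': 1}, 3: {'bit': 2, 'red': 3}}

ranking = rank_files(wordFrequency, search)
print(ranking)
# {1: [1, 2, 3, 4], 2: [3, 1, 4, 2], 3: [3, 1, 2, 4]}
```

With presence-only scoring many files tie, so the order within a tie depends entirely on the tie-break rule chosen.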