I need to calculate both the union and the intersection of the weighted scores of the elements in one column of two different files.
Input file 1 and input file 2 have the same format: three tab-separated columns. Here is an example:
input1
abc with-1-rosette-n 8.1530
abc with-1-tyre-n 6.3597
abc with-1-weight-n 4.8932
input2
deg about-article-n 3.2917
deg with-1-tyre-n 3.2773
deg about-bit-n 3.4527
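A minimal sketch of reading one of these files into a nested dictionary, mapping the class in column 1 to a {feature: weight} dictionary built from columns 2 and 3. The helper name load_scores is hypothetical, not from the original code. It splits on any whitespace, like the original line.split(), and accepts any iterable of text lines, so it works on an open file or on a list of strings:

```python
from collections import defaultdict


def load_scores(lines):
    """Parse 'class feature weight' lines into {class: {feature: weight}}.

    Hypothetical helper: splits on any run of whitespace (tabs included),
    skips blank lines, and converts the third column to float.
    """
    scores = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore empty lines, e.g. a trailing newline
        target_class, feature, weight = fields
        scores[target_class][feature] = float(weight)
    return scores


sample = [
    "abc\twith-1-rosette-n\t8.1530",
    "abc\twith-1-tyre-n\t6.3597",
    "abc\twith-1-weight-n\t4.8932",
]
print(dict(load_scores(sample)))
```

With a real file this would be used as `with open(classA_infile) as f: scores_a = load_scores(f)`. Opening in text mode ("r") rather than "rb" keeps the keys as str under both Python 2 and Python 3.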
For each pair of classes (here abc and deg), I want to compute two sums over the scores in Col3. The intersection takes every Col2 feature that appears in both classes and sums the smaller (min) of its two scores. The union is the sum of the Col3 scores of every Col2 feature of abc and deg. The final score is the intersection divided by the union: score = sum(intersection) / sum(union).
In this case the only shared feature is with-1-tyre-n, so intersection = min(6.3597, 3.2773) = 3.2773, and union = 29.4276 (the sum of all six scores). So, for this sample dataset, the desired output is:
abc deg 0.1114
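Written out as a formula, assuming the union is the plain sum of every score in both classes (so a shared feature such as with-1-tyre-n contributes both of its scores), with F_A and F_B the Col2 feature sets of classes A and B and w_A, w_B their Col3 weights, the sample data works out as:

```latex
\[
\mathrm{score}(A, B) =
\frac{\sum_{f \in F_A \cap F_B} \min\bigl(w_A(f),\, w_B(f)\bigr)}
     {\sum_{f \in F_A} w_A(f) + \sum_{f \in F_B} w_B(f)}
\]
\[
\mathrm{score}(\mathrm{abc}, \mathrm{deg}) =
\frac{\min(6.3597,\, 3.2773)}
     {8.1530 + 6.3597 + 4.8932 + 3.2917 + 3.2773 + 3.4527}
= \frac{3.2773}{29.4276} \approx 0.1114
\]
```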
I have been working on the script for a while and keep running into problems. I have already tried the suggestions from here, here and here, but I have not been able to solve it.
Here is the relevant part of the code I am working with:
def polyCalc(a_dict, b_dict):
    intersect = min(classA & classB)
    union = classA | classB
    score = sum(intersect) / sum(union)
    return score

def calculate_polyCalc(classB_infile, classA_infile, outfile):
    targetContext_polyCalc_A = defaultdict(dict) # { target_lemma : {feat1 : weights, feat2: weights} ...}
    with open(classA_infile, "rb") as opened_infile_A:
        for line_A in opened_infile_A:
            target_class_A, featureA, weight = line_A.split()
            targetContext_polyCalc_A[target_class_A][featureA] = float(weight)
    targetContext_polyCalc_B = defaultdict(dict)
    with open(classB_infile, "rb") as opened_infile_B:
        for line_B in opened_infile_B:
            target_class_B, featureB, weight = line_B.split()
            targetContext_polyCalc_B[target_class_B][featureB] = float(weight)
    classA = set(targetContext_polyCalc_A[featureA])
    classB = set(targetContext_polyCalc_B[featureB])
    with open(outfile, "wb") as output_file:
        poly = polyCalc(targetContext_polyCalc_A[target_class_A], targetContext_polyCalc_B[target_class_B], score)
        outstring = "\t".join([classA, classB, str(poly)])
        output_file.write(outstring + "\n")
I have followed the documentation and various websites, and the code above still raises errors. Besides the errors from how I compute the union inside polyCalc, I also seem to have a problem with how I have defined the dictionaries themselves.
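The errors in polyCalc come from working on sets of feature names instead of on the weights: classA and classB are not defined inside the function (they are locals of calculate_polyCalc), min() on a set of strings would return the alphabetically smallest feature name rather than a score, and sum() cannot add strings. polyCalc is also called with three arguments although it takes two. In calculate_polyCalc, targetContext_polyCalc_A[featureA] indexes the class-keyed dictionary with a feature name, which on a defaultdict silently yields an empty dict, and "\t".join() fails on sets. A minimal sketch that works on the {feature: weight} dictionaries directly, assuming the union is the plain sum of every score in both classes, might look like this; poly_calc and pair_score_lines are hypothetical names:

```python
from itertools import product


def poly_calc(a_dict, b_dict):
    """Return sum(intersection) / sum(union) for two {feature: weight} dicts.

    Assumption: the intersection sums min(weight_a, weight_b) over features
    present in both dicts, and the union is the plain sum of every weight in
    both dicts (a shared feature contributes both of its weights).
    """
    shared = set(a_dict) & set(b_dict)  # feature names, not weights
    intersection = sum(min(a_dict[f], b_dict[f]) for f in shared)
    union = sum(a_dict.values()) + sum(b_dict.values())
    if union == 0:
        return 0.0  # avoid ZeroDivisionError for empty classes
    return intersection / union


def pair_score_lines(scores_a, scores_b):
    """Yield 'classA<TAB>classB<TAB>score' for every pair of classes.

    scores_a and scores_b are {class: {feature: weight}} dicts, e.g. as
    built by reading each input file into a nested dictionary.
    """
    for class_a, class_b in product(sorted(scores_a), sorted(scores_b)):
        score = poly_calc(scores_a[class_a], scores_b[class_b])
        yield "%s\t%s\t%.4f" % (class_a, class_b, score)


abc = {"with-1-rosette-n": 8.1530, "with-1-tyre-n": 6.3597,
       "with-1-weight-n": 4.8932}
deg = {"about-article-n": 3.2917, "with-1-tyre-n": 3.2773,
       "about-bit-n": 3.4527}
for line in pair_score_lines({"abc": abc}, {"deg": deg}):
    print(line)  # prints: abc<TAB>deg<TAB>0.1114
```

Writing the results is then one output_file.write(line + "\n") per yielded line, with the output file opened in text mode ("w") so that str can be written under Python 3 as well as Python 2.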
Can anyone with more experience suggest how to fix this so that I get the desired output?
Thank you in advance.
PS: this was written with Python 2 in mind.