What's wrong with my code to get the weights of a MaxEnt classifier?

Question

I have 8 features for a MaxEnt classifier and want to know the weight of each one, because I need to know how important each feature is.

for i in range(len(list)):
    features = {}
    features['a'] = 0
    features['b'] = 0
    features['c'] = 0
    features['d'] = 0
    features['e'] = 0
    features['f'] = 0
    features['g'] = 0
    features['h'] = 0

    for j in range(len(list[i])):
        first, second = list[i][j].split('+')
        first_lexical, first_morph = first.split('/')
        second_lexical, second_morph = second.split('/')

        if first_lexical == second_lexical:
            features['a'] += 1
        if first_morph == second_morph:
            features['b'] += 1

            if "JC" in first_morph:
                features['d'] += 1
            elif first_lexical == second_lexical:
                if "EF" in first_morph:
                    features['d'] += 1
                elif "EP" in first_morph:
                    features['e'] += 1
                elif "XS" in first_morph:
                    features['f'] += 1
                elif "JX" in first_morph:
                    features['g'] += 1
                elif "JC" in first_morph:
                    features['h'] += 1

I use Maximum Entropy to calculate the structural similarity between two sentences, so my features are counts of matching morphemes. That is why the feature values are not just 0 or 1.

When I run this code:

print(classifier.weights())

it prints a list of 64 elements. I expected it to print only 8 elements (one weight per feature), but it returns this:

[ 1.74089048  2.66009496  1.42702806  0.14474766  0.14210167  0.15642977
  0.07329622  0.19233666  0.30679333  1.05599702  1.60007152 -0.17416653
  0.09417338  0.16386887  0.27088739 -0.72500181 -8.48476894  0.2924295
  0.29734346  0.28692798  1.24685007  1.13583538  0.34032173  0.97472507
  1.21521307  1.31532032  1.57745202  0.5204001   0.76549421  1.79209505
  0.44465357  0.73647553 -1.08840863  7.89243891  1.08035386 10.01641604
  1.12682947  0.37774782  0.85929749  0.16311825  0.45568935 -0.04190585
 -0.06698004 -0.08507122 -0.02308924 -0.10700906  0.10775206  0.66603408
 -0.39178407  0.13196092  0.09278365  0.36485199  0.64181725 -3.63790857
  2.32751187 -0.87754617  0.63697054 -3.16749379 -8.87589551  0.1192744
 -2.68618694 -3.6713022  -3.79744038 -1.1949963 ]

I want to know what each element means and how I can get the weight of each feature.

score 0 · Answer 1 · edited Jun 20 '20 at 09:12


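The maximum-entropy model gives the probability that a document x belongs to class y as

P(y | x) = exp(sum_i w_i * f_i(x, y)) / sum_y' exp(sum_i w_i * f_i(x, y'))

where w_i is the weight attached to the i-th indicator function f_i.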

The document x is represented through the indicator functions f(x, y). In maxent models, these indicator functions can only be binary valued. However, your features are counts rather than booleans, so maxent will not work as intended.
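A minimal sketch of how such a model scores a document, assuming the standard conditional maxent form P(y | x) = exp(sum_i w_i * f_i(x, y)) / Z(x). The feature names, labels, and weights below are made up for illustration and are not taken from the question's classifier:

```python
import math

# Hypothetical weights, one per joint indicator (feature name, feature value, label).
weights = {
    ("a", 1, "similar"): 1.2,
    ("a", 0, "different"): 0.8,
    ("b", 1, "similar"): 0.5,
}


def indicator(fs, label, key):
    """f_i(x, y): 1 if the document has this (name, value) and y is this label."""
    name, value, key_label = key
    return 1 if fs.get(name) == value and label == key_label else 0


def maxent_probs(fs, labels):
    """Normalise exp(sum_i w_i * f_i(x, y)) over all candidate labels."""
    scores = {
        y: math.exp(sum(w * indicator(fs, y, key) for key, w in weights.items()))
        for y in labels
    }
    z = sum(scores.values())  # the normaliser Z(x)
    return {y: s / z for y, s in scores.items()}


probs = maxent_probs({"a": 1, "b": 1}, ["similar", "different"])
```

Each weight here belongs to one binary indicator, which is why a count-valued feature does not map onto a single weight.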

Convert your features to boolean, for example:

f1 = 1 if 0 <= features['a'] < 10 else 0
f2 = 1 if 10 <= features['a'] < 20 else 0
f3 = 1 if features['a'] >= 20 else 0
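The same bucketing could be applied to every count at once. This is only a sketch: the helper names (bucket_indicators, binarize) and the generated feature names are hypothetical, and the bin edges 10 and 20 simply repeat the example above rather than values tuned for this data:

```python
def bucket_indicators(name, count, edges=(10, 20)):
    """Return one binary feature per bin: [0, 10), [10, 20), [20, inf)."""
    bounds = [0, *edges, float("inf")]
    out = {}
    for lo, hi in zip(bounds, bounds[1:]):
        # Exactly one bin is set to 1 for a non-negative count.
        out[f"{name}_{lo}_to_{hi}"] = 1 if lo <= count < hi else 0
    return out


def binarize(features, edges=(10, 20)):
    """Replace every count-valued feature with its bin indicators."""
    binary = {}
    for name, count in features.items():
        binary.update(bucket_indicators(name, count, edges))
    return binary


example = binarize({"a": 12, "b": 3})
```

With 8 count features and 3 bins each, this yields 24 binary features, so the number of features the classifier sees grows accordingly.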

Reference: link

answered May 16 '19 at 08:14 by mujjiga · edited Jun 20 '20 at 09:12 by Community

I added " for key in features: features[key] = 1 if features[key] >= 3 else 0 " but I got 21 weights. I want to know why I got more than 8 weights, although I gave only 8 features. – seKim May 16 '19 at 09:40
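One possible explanation for the extra weights, assuming the classifier is NLTK's MaxentClassifier with its default binary encoding (the question does not say which library is used): that encoding keeps one weight per (feature name, feature value, label) combination seen in training, not one per feature name. The sketch below only mimics that counting on made-up data and does not call NLTK:

```python
def joint_features(train_toks):
    """Collect distinct (name, value, label) triples in first-seen order."""
    mapping = {}
    for fs, label in train_toks:
        for name, value in fs.items():
            key = (name, value, label)
            if key not in mapping:
                mapping[key] = len(mapping)  # index of this triple's weight
    return mapping


# Hypothetical training data: two feature names, two labels.
train_toks = [
    ({"a": 1, "b": 0}, "similar"),
    ({"a": 0, "b": 0}, "different"),
    ({"a": 1, "b": 1}, "different"),
]

mapping = joint_features(train_toks)
n_weights = len(mapping)  # 6 here, although there are only 2 feature names
```

Under that assumption, 8 binary features and 2 labels allow up to 8 * 2 * 2 = 32 triples, so 21 weights would mean some combinations never occurred in training. If NLTK is in use, classifier.show_most_informative_features() should print weights next to a description of the joint feature each one belongs to.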
