
I have 8 features for a maxent classifier and want to know the weight of each one, because I need to know how important each feature is.

# data stands for the input list (originally named list, which shadows the builtin).
# Each element is a list of strings of the form "lexical/MORPH+lexical/MORPH".
featuresets = []
for pairs in data:
    features = dict.fromkeys('abcdefgh', 0)

    for pair in pairs:
        first, second = pair.split('+')
        first_lexical, first_morph = first.split('/')
        second_lexical, second_morph = second.split('/')

        if first_lexical == second_lexical:
            features['a'] += 1
        if first_morph == second_morph:
            features['b'] += 1

            # Note: feature 'c' is never incremented anywhere in this code.
            if "JC" in first_morph:
                features['d'] += 1
            elif first_lexical == second_lexical:
                if "EF" in first_morph:
                    features['d'] += 1
                elif "EP" in first_morph:
                    features['e'] += 1
                elif "XS" in first_morph:
                    features['f'] += 1
                elif "JX" in first_morph:
                    features['g'] += 1
                elif "JC" in first_morph:  # unreachable: "JC" is already caught above
                    features['h'] += 1

    featuresets.append(features)  # one featureset per sentence pair

I use Maximum Entropy to calculate the structural similarity between two sentences, so each feature is a count of matching morphemes. That is why the feature values are not just 0 or 1.

When I run this code:

print(classifier.weights())

it prints a list of 64 elements. I expected only 8 elements (one weight per feature), but it returns this:

[ 1.74089048  2.66009496  1.42702806  0.14474766  0.14210167  0.15642977
  0.07329622  0.19233666  0.30679333  1.05599702  1.60007152 -0.17416653
  0.09417338  0.16386887  0.27088739 -0.72500181 -8.48476894  0.2924295
  0.29734346  0.28692798  1.24685007  1.13583538  0.34032173  0.97472507
  1.21521307  1.31532032  1.57745202  0.5204001   0.76549421  1.79209505
  0.44465357  0.73647553 -1.08840863  7.89243891  1.08035386 10.01641604
  1.12682947  0.37774782  0.85929749  0.16311825  0.45568935 -0.04190585
 -0.06698004 -0.08507122 -0.02308924 -0.10700906  0.10775206  0.66603408
 -0.39178407  0.13196092  0.09278365  0.36485199  0.64181725 -3.63790857
  2.32751187 -0.87754617  0.63697054 -3.16749379 -8.87589551  0.1192744
 -2.68618694 -3.6713022  -3.79744038 -1.1949963 ]

What does each element mean, and how can I get the weight of each feature?
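To tell which weight belongs to which feature, each weight index has to be mapped back to the (feature name, feature value, label) triple it stands for. NLTK's MaxentClassifier provides show_most_informative_features(n), which prints weights next to such descriptions, and explain(featureset), which breaks down the score for a single input. The sketch below does the same bookkeeping by hand on a hypothetical index and made-up weights, then sums absolute weights per feature name as one rough importance score. That aggregation is an assumption of this sketch, not something maxent or NLTK defines.

```python
from collections import defaultdict

# Hypothetical joint-feature index: (feature name, value, label) -> weight index.
index = {
    ("a", 0, "similar"): 0,
    ("a", 3, "similar"): 1,
    ("b", 2, "similar"): 2,
    ("a", 1, "different"): 3,
    ("b", 0, "different"): 4,
}
weights = [0.5, 1.75, 2.0, -0.25, -1.5]  # made-up numbers


def describe_weights(index, weights):
    """Return (weight, fname, fval, label) rows sorted by |weight|, largest first."""
    rows = [(weights[fid], fname, fval, label)
            for (fname, fval, label), fid in index.items()]
    return sorted(rows, key=lambda row: abs(row[0]), reverse=True)


def importance_by_feature(index, weights):
    """Sum |weight| over every joint feature that shares a feature name."""
    totals = defaultdict(float)
    for (fname, _fval, _label), fid in index.items():
        totals[fname] += abs(weights[fid])
    return dict(totals)


for w, fname, fval, label in describe_weights(index, weights):
    print(f"{w:8.3f}  {fname}=={fval!r} and label is {label!r}")
print(importance_by_feature(index, weights))
```

A large absolute weight only says that one particular value of a feature, under one label, pushes the prediction strongly; summing per name is just one way to get a single number per feature.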

seKim

1 Answer


Under the maximum entropy model, the probability that a document x belongs to class y is given by:

$$P(y \mid x) = \frac{\exp\left(\sum_i w_i f_i(x, y)\right)}{\sum_{y'} \exp\left(\sum_i w_i f_i(x, y')\right)}$$
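As a worked instance of this formula, the sketch below scores each label by summing the weights of the indicators f_i(x, y) that fire for that label, then normalizes with exp over all labels. It assumes each indicator is identified by a (feature name, value, label) triple; the index, weights and label names are hypothetical toy values.

```python
import math

# Hypothetical joint-feature index: (feature name, value, label) -> weight index.
index = {
    ("a", 1, "similar"): 0,
    ("b", 1, "similar"): 1,
    ("a", 1, "different"): 2,
}
weights = [1.0, 0.5, -0.5]
labels = ["similar", "different"]


def prob_dist(featureset, index, weights, labels):
    """P(y | x) = exp(sum_i w_i f_i(x, y)) / sum_y' exp(sum_i w_i f_i(x, y'))."""
    scores = {}
    for label in labels:
        # f_i(x, y) is 1 exactly when (name, value, label) is in the index.
        scores[label] = sum(
            weights[index[(fname, fval, label)]]
            for fname, fval in featureset.items()
            if (fname, fval, label) in index
        )
    z = sum(math.exp(s) for s in scores.values())
    return {label: math.exp(s) / z for label, s in scores.items()}


print(prob_dist({"a": 1, "b": 1}, index, weights, labels))
```

Here "similar" scores 1.0 + 0.5 = 1.5 and "different" scores -0.5, so P(similar | x) = 1 / (1 + e^-2), roughly 0.88.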

The document x is represented by the indicator functions f_i(x, y). In maxent models the indicator functions can only be binary valued. However, you are using features that are not boolean valued, so maxent will not work as you intend.
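This also explains the 64 weights, assuming the classifier is NLTK's MaxentClassifier with its default encoding (BinaryMaxentFeatureEncoding). That encoding creates one binary indicator per distinct (feature name, feature value, label) triple seen in the training data, and classifier.weights() holds one weight per indicator. A count-valued feature therefore produces one weight for every distinct count under every label, which is how 8 features can yield 64 weights. The plain-Python sketch below imitates that indexing on made-up toy data; it is a simplification, not NLTK's code, and build_joint_feature_index is a hypothetical name.

```python
def build_joint_feature_index(train_toks):
    """Assign one index per distinct (feature name, value, label) triple.

    train_toks is a list of (featureset, label) pairs, the same shape
    NLTK's trainers take. Each new triple gets the next free index.
    """
    index = {}
    for featureset, label in train_toks:
        for fname, fval in featureset.items():
            key = (fname, fval, label)
            if key not in index:
                index[key] = len(index)
    return index


# Hypothetical toy data: two count-valued features, two labels.
train_toks = [
    ({"a": 0, "b": 2}, "similar"),
    ({"a": 3, "b": 2}, "similar"),
    ({"a": 1, "b": 0}, "different"),
    ({"a": 0, "b": 0}, "different"),
]

index = build_joint_feature_index(train_toks)
print(len(index))  # one weight per triple, not one per feature name
```

With only 2 feature names this toy data already needs 6 weights, because "a" takes three different counts and 0 appears under both labels.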

Convert your features to boolean ones, for example:

f1 =  1 if 0 <= features['a'] < 10 else 0
f2 =  1 if 10 <= features['a'] < 20 else 0
f3 =  1 if features['a'] >= 20 else 0
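The same bucketing can be applied to all eight features at once. A minimal sketch, assuming one shared set of bucket edges for every feature: the edges 10 and 20 come from the example above and are placeholders to tune, and the generated names such as a_0 are hypothetical.

```python
def bucketize(features, edges=(10, 20)):
    """Turn each count feature into one 0/1 feature per bucket.

    With edges (10, 20), feature 'a' becomes 'a_0' (0 <= a < 10),
    'a_1' (10 <= a < 20) and 'a_2' (a >= 20); exactly one of them is 1.
    """
    bounds = [0, *edges, float("inf")]
    binary = {}
    for name, value in features.items():
        for k in range(len(bounds) - 1):
            binary[f"{name}_{k}"] = 1 if bounds[k] <= value < bounds[k + 1] else 0
    return binary


print(bucketize({"a": 12, "b": 3}))
# {'a_0': 0, 'a_1': 1, 'a_2': 0, 'b_0': 1, 'b_1': 0, 'b_2': 0}
```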

Reference: link

mujjiga
  • I added " for key in features: features[key] = 1 if features[key] >= 3 else 0 ", but I got 21 weights. Why do I get more than 8 weights when I gave only 8 features? – seKim May 16 '19 at 09:40
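A hedged reading of the 21, assuming NLTK's default binary joint-feature encoding: binarizing does not collapse a feature to a single weight, because 0 and 1 are still two different values, and each (name, value, label) triple seen in training keeps its own weight. With 8 features, 2 values and L labels the ceiling is 8 × 2 × L, and combinations that never occur in the training data get no weight. The sketch counts, on made-up binarized data, how many joint features each feature name contributes; joint_features_per_name is a hypothetical helper.

```python
from collections import Counter


def joint_features_per_name(train_toks):
    """Count distinct (value, label) pairs observed for each feature name."""
    seen = {(fname, fval, label)
            for featureset, label in train_toks
            for fname, fval in featureset.items()}
    return Counter(fname for fname, _fval, _label in seen)


# Hypothetical binarized training data: two features, two labels.
train_toks = [
    ({"a": 1, "b": 0}, "similar"),
    ({"a": 1, "b": 1}, "similar"),
    ({"a": 0, "b": 0}, "different"),
]

per_name = joint_features_per_name(train_toks)
print(per_name, sum(per_name.values()))
```

Here the ceiling is 2 × 2 × 2 = 8, but only 5 triples occur: "a" never takes value 0 under "similar" or value 1 under "different", so those combinations get no weight.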