I am currently working on a project that focuses on relation extraction from a corpus of Wikipedia text, and I plan to use an SVM to extract these relations. To model this, I plan to use Word features, POS Tag features, Entity features, Mention features and so on as mentioned in the following paper - https://gate.ac.uk/sale/eswc06/eswc06-relation.pdf (Page 6 onwards)
Now I have set up the feature-extraction pipeline and had the corpus annotated, and I wish to use a package like SVM-Light for the project. According to the SVM-Light documentation, each input line must follow this format:
<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>
Example (from the SVM-Light webpage) -
In classification mode, the target value denotes the class of the example: +1 marks a positive example and -1 a negative one. So, for example, the line
-1 1:0.43 3:0.12 9284:0.2 # abcdef
specifies a negative example for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. In addition, the string abcdef is stored with the vector, which can serve as a way of providing additional information for user defined kernels.
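To make sure I am reading the format correctly, here is a small sketch (function name and signature are my own) that serializes a sparse {index: value} dict into one such line:

```python
def to_svmlight_line(target, features, comment=""):
    """Serialize a sparse {feature_index: value} dict into one
    SVM-Light input line. Indices are emitted in ascending order,
    as the package requires."""
    pairs = " ".join(f"{i}:{v}" for i, v in sorted(features.items()))
    line = f"{target} {pairs}"
    if comment:
        line += f" # {comment}"
    return line

# Reproduces the example line from the SVM-Light page:
print(to_svmlight_line(-1, {1: 0.43, 3: 0.12, 9284: 0.2}, "abcdef"))
# -1 1:0.43 3:0.12 9284:0.2 # abcdef
```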
Now, I would like to know how to encode the features I am using, whose values are words, POS tags, and entity types and subtypes, into the real-valued feature vectors that SVM-Light accepts, where each feature is an index paired with a real number. How is the mapping from my categorical features to these indices and values done?
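My current guess is that each distinct (feature name, value) pair, e.g. ('pos', 'NNP'), gets its own dimension with a binary value of 1.0, the index table being built incrementally over the corpus. A minimal sketch of that idea (all names are mine, not from any package):

```python
class FeatureIndexer:
    """Maps categorical (feature_name, value) pairs to unique
    1-based SVM-Light feature indices, assigned on first sight."""

    def __init__(self):
        self.index = {}

    def encode(self, named_features):
        """named_features: dict like {'word': 'Obama', 'pos': 'NNP'}.
        Returns a sparse one-hot dict {feature_index: 1.0}."""
        vec = {}
        for name, value in named_features.items():
            key = (name, value)
            if key not in self.index:
                self.index[key] = len(self.index) + 1  # 1-based indices
            vec[self.index[key]] = 1.0
        return vec

indexer = FeatureIndexer()
v = indexer.encode({"word": "Obama", "pos": "NNP", "entity": "PERSON"})
```

Is something along these lines the standard way of doing it, or is there a better scheme (hashing, precomputed vocabularies, etc.)?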
It would be of great help if someone who has worked on a similar problem before could just prod me in the right direction.
Thanks.