I have a dataset of 10K, and I created the following ten features:
- Distance - (0 or 1)
- IsPronoun - (True or False)
- String Match - (True or False)
- Demonstrative NP - (True if i and j are demonstrative pronouns)
- Number Agreement - (check whether i or j is a singular or plural pronoun)
- Semantic compatibility - (whether i and j are semantically compatible)
- Gender agreement - (check whether i or j is male/female)
- IsProperNoun - (whether i or j is a proper noun)
- Appositive - (whether i is an appositive of j)
- Alias - (whether i is an alias of j or vice versa)
Each of the features has an output from the dataset. Now I want to make the tree. But first, how should I calculate the entropy and information gain?
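To make the question concrete, here is a minimal sketch of the standard ID3-style definitions as I understand them. It assumes each feature is a column of discrete values and that each (i, j) pair has a class label (for example 1 = coreferent, 0 = not). The function names `entropy` and `information_gain` and the toy data are hypothetical and not taken from my dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    # H(S) = -sum_k p_k * log2(p_k), over the classes that actually occur
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    total = len(labels)
    # Group the labels by the value the feature takes on each example
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the subsets produced by the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four mention pairs with made-up labels
labels = [1, 1, 0, 0]
string_match = [True, True, False, False]
is_pronoun = [True, False, True, False]

print("H(labels)        =", entropy(labels))
print("IG(string_match) =", information_gain(string_match, labels))
print("IG(is_pronoun)   =", information_gain(is_pronoun, labels))
```

On this toy data the labels have 1 bit of entropy, `string_match` separates them perfectly (gain of 1 bit), and `is_pronoun` tells nothing about them (gain of 0). If this is right, the root of the tree would be the feature with the highest gain, and the same calculation would be repeated on each resulting subset.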