Questions tagged [information-gain]
34 questions
12 votes, 1 answer
result of rpart is a root, but data shows Information Gain
I have a dataset with an event rate of less than 3% (i.e. there are about 700 records with class 1 and 27000 records with class 0).
ID V1 V2 V3 V5 V6 Target
SDataID3 161 ONE 1 FOUR 0 0
SDataID4 11 TWO 2 …

Rachit Jain
10 votes, 1 answer
Feature importance 'gain' in XGBoost
I want to understand how the feature importance in xgboost is calculated by 'gain'. From https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7:
‘Gain’ is the improvement in accuracy brought by…

nellng
3 votes, 1 answer
Python Information gain implementation
I am currently using scikit-learn for text classification on the 20ng dataset. I want to calculate the information gain for a vectorized dataset. It has been suggested to me that this can be accomplished using mutual_info_classif from sklearn.…

Roman Purgstaller
2 votes, 1 answer
What should I do if I have a dominant feature in an XGB model?
I've recently come across a "strange" observation in my dataset. After fitting an XGB model with 20 features, I plotted the top 10 features with the highest gain values. The result is shown below:
F1 140027.061202
F2 11242.470370
F3 9957.161039
F4 …

Galilej25
2 votes, 0 answers
Different results - Weka.infogain vs sklearn.mutual_info_classif
My data looks like this:
DATA | FEATURE1 | FEATURE2 | ...
I | 0.3213 | 1.231 | ...
A | 5.0945 | 0.923 | ...
I | 0.3213 | 0.761 | ...
... | ... | .... | ...
I'm using this code:
import csv
import numpy as np
from…

0xhido
2 votes, 0 answers
Information Gain in R
I found packages that calculate "Information Gain" for selecting the main attributes in a C4.5 decision tree, and I tried using them to calculate "Information Gain".
But the results from each package are different, like the code…

Archimpressom
2 votes, 1 answer
What is Weka's InfoGainAttributeEval formula for evaluating Entropy with continuous values?
I'm using Weka's attribute selection function for Information Gain and I'm trying to figure out the specific formula Weka uses when dealing with continuous data.
I understand the usual formula for Entropy is this for when the values in the data…

edthealchemist
2 votes, 1 answer
Information gain vs minimizing entropy
In what scenario is maximizing information gain not equivalent to minimizing entropy? The broader question is why we need the concept of information gain. Is it not sufficient to work only with entropy to decide the next optimal attribute…
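
As a rough illustration of the relationship this question asks about: information gain is the parent entropy minus the weighted entropy of the child nodes, and the parent entropy is the same for every candidate attribute at a given node. The sketch below uses hypothetical toy data and helper names (`entropy`, `conditional_entropy`, `information_gain`) that are not from the question; under those assumptions, picking the attribute with the largest gain and picking the one with the smallest weighted child entropy give the same answer.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def conditional_entropy(feature, labels):
    """Weighted average entropy of the labels within each feature value."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def information_gain(feature, labels):
    """Parent entropy minus the weighted entropy of the children."""
    return entropy(labels) - conditional_entropy(feature, labels)

# Hypothetical toy data, for illustration only.
labels = ["yes", "yes", "no", "no", "yes", "no"]
features = {
    "outlook": ["sun", "sun", "rain", "rain", "sun", "rain"],
    "windy": ["t", "f", "t", "f", "t", "f"],
}

# entropy(labels) is a constant here, so both criteria rank attributes identically.
best_by_gain = max(features, key=lambda k: information_gain(features[k], labels))
best_by_entropy = min(features, key=lambda k: conditional_entropy(features[k], labels))
print(best_by_gain, best_by_entropy)
```

The two criteria can only diverge when the quantity being compared is normalized differently, for example when a gain ratio is used instead of raw gain.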

Pradeep Vairamani
1 vote, 0 answers
I got error message 'Boolean array expected for the condition, not int64'. Can anybody help me solve this problem?
I have a dataset in .csv format. It has 136 columns and 15036 rows. I want to compute the entropy and information gain of each column of the dataset. Here is my code:
def calc_entropy(column):
    counts = np.bincount(column)
    prob = counts/(len(column))
    entropy =…

Anastasia Harlow
1 vote, 1 answer
How to Use TF-IDF and combine it with Information Gain for feature selection in text classification?
I don't understand how to combine TF-IDF results with information gain mathematically.
Can someone explain it to me, please?

victorxu2
1 vote, 0 answers
Visualizing decision jungle in Azure Machine Learning Studio
I have trained a decision jungle model on Azure Machine Learning, and now I want to visualize the trees, to see if I can identify the root nodes that are the most determinant in the decision.
When I right-click and click Visualize on the Train…

Dee
1 vote, 2 answers
Calculating the entropy of a specific attribute?
This is super simple but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful and I was following everything about entropy and information gain until I got to
I don't understand how the entropy for each…

ribs2spare
1 vote, 0 answers
Calculating Information Gain Ratio
I was searching for a piece of code that computes the Information Gain Ratio (IGR) in R or Python. I found a handy R package, but it is no longer maintained and has been removed from CRAN. However, I found an old version and I took the liberty and…
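
For reference, a minimal Python sketch of the gain ratio as it is usually defined for C4.5: information gain divided by the split information, which is the entropy of the attribute's own value distribution. This is not the R package mentioned in the question; the function names and the toy data are assumptions made for illustration, and it only handles discrete attributes.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain divided by split information (C4.5-style definition)."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the labels after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(feature)
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy data: both attributes separate the classes perfectly,
# but the ID-like attribute is penalized for having one value per row.
labels = [1, 1, 0, 0]
two_valued = ["a", "a", "b", "b"]
id_like = ["r1", "r2", "r3", "r4"]
print(gain_ratio(two_valued, labels), gain_ratio(id_like, labels))
```

Returning 0.0 when the split information is zero (a constant attribute) is a design choice made for this sketch; other implementations may handle that case differently.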

striatum
0 votes, 0 answers
I don’t get the information gain equation in XGBoost
I’m currently studying XGBoost, and I learned that information gain in XGBoost is computed like this:
[image: XGBoost information gain]
What I’m curious about is that I previously learned that information gain is computed as (entropy of parent node - sum…
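
The equation image from this question is not available in the listing. As a hedged sketch, the split gain given in the XGBoost paper and in the "Introduction to Boosted Trees" tutorial is built from sums of first-order gradients (G) and second-order gradients (H) of the loss in each child, not from class entropies. The function name `split_gain` and the example numbers below are assumptions for illustration; `reg_lambda` and `gamma` stand for the L2 regularization and minimum split loss terms.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Split gain as described in the XGBoost paper:

    0.5 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda)
           - (G_L+G_R)^2/(H_L+H_R+lambda)] - gamma
    """
    def score(g, h):
        # Structure score of a leaf holding gradient sum g and hessian sum h.
        return g * g / (h + reg_lambda)

    children = score(g_left, h_left) + score(g_right, h_right)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (children - parent) - gamma

# Hypothetical example: squared-error loss, where each row contributes
# gradient (prediction - target) and hessian 1.
gain = split_gain(g_left=-2.0, h_left=2.0, g_right=2.0, h_right=2.0, reg_lambda=0.0)
print(gain)
```

Like entropy-based information gain, this has the form "score of the children minus score of the parent"; the difference is that the score comes from a second-order approximation of the training loss rather than from class proportions.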

em seoyk
0 votes, 0 answers
Feature importance 'gain' in XGBoost for multiclass classification tasks
I'd like to ask what the formula is for the gain in XGBoost models for multiclass classification tasks.
I know that for regression tasks it's calculated as SIMILARITY_LEFT_CHILD + SIMILARITY_RIGHT_CHILD - SIMILARITY_PARENT
and that for binary…

ohadbh