Questions tagged [information-gain]

34 questions
12 votes, 1 answer

rpart result is only a root node, but the data shows information gain

I have a dataset with an event rate of less than 3% (i.e. there are about 700 records with class 1 and 27000 records with class 0). ID V1 V2 V3 V5 V6 Target SDataID3 161 ONE 1 FOUR 0 0 SDataID4 11 TWO 2 …
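One plausible explanation, offered as an assumption because the excerpt is truncated: with roughly 700 positives against 27000 negatives, a split can lower entropy (positive information gain) while both children still predict class 0, so the number of misclassified rows does not change. As far as I know, rpart's complexity parameter `cp` (default 0.01) is checked against the misclassification-based relative error, so such splits can be dropped and leave only the root. The child counts below are made up to show the effect.

```python
from math import log2


def entropy(pos, neg):
    """Binary entropy (bits) of a node with `pos` class-1 and `neg` class-0 rows."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * log2(p)
    return h


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * entropy(*child) for child in children)
    return entropy(*parent) - weighted


def misclassified(node):
    """Rows misclassified when a node predicts its majority class."""
    return min(node)


parent = (700, 27000)                   # (class 1, class 0), as in the question
children = [(500, 5000), (200, 22000)]  # hypothetical split; class 0 is the majority in both

ig = information_gain(parent, children)
errors_before = misclassified(parent)
errors_after = sum(misclassified(child) for child in children)
print(f"information gain: {ig:.4f} bits")
print(f"misclassified before: {errors_before}, after: {errors_after}")
```

The gain is positive, yet the error count is identical before and after the split, which is the situation an error-based stopping rule cannot reward.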
10 votes, 1 answer

Feature importance 'gain' in XGBoost

I want to understand how the feature importance in xgboost is calculated by 'gain'. From https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7: ‘Gain’ is the improvement in accuracy brought by…
nellng
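For reference, the XGBoost documentation for `Booster.get_score` describes `importance_type="gain"` as the average gain across all splits that use the feature, `total_gain` as the sum of those gains, and `weight` as the number of such splits. Strictly, each per-split gain is the reduction in the regularized training loss rather than an accuracy improvement. The sketch below aggregates a hypothetical list of per-split gains the same way; the `splits` data is made up and not read from a real model.

```python
from collections import defaultdict


def importance_from_splits(splits):
    """Aggregate per-split gains into weight / total_gain / gain per feature.

    `splits` is a hypothetical list of (feature_name, split_gain) pairs,
    one entry per internal node across all trees.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for feature, split_gain in splits:
        total[feature] += split_gain
        count[feature] += 1
    return {
        f: {"weight": count[f], "total_gain": total[f], "gain": total[f] / count[f]}
        for f in total
    }


# Made-up split gains, for illustration only.
splits = [("f0", 10.0), ("f1", 4.0), ("f0", 2.0), ("f1", 6.0), ("f2", 1.5)]
scores_by_feature = importance_from_splits(splits)
for feature, scores in sorted(scores_by_feature.items()):
    print(feature, scores)
```

Because "gain" is an average, a feature used in only a few very productive splits can outrank a feature used in many modest ones.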
3 votes, 1 answer

Python Information gain implementation

I am currently using scikit-learn for text classification on the 20ng dataset. I want to calculate the information gain for a vectorized dataset. It has been suggested to me that this can be accomplished using mutual_info_classif from sklearn.…
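Information gain of a feature with respect to the class is the same quantity as the mutual information between the two, which is why `mutual_info_classif` is suggested. To my understanding, for features it treats as discrete (for example binary term-presence indicators with `discrete_features=True`) it returns the plug-in estimate in nats, while textbook information gain is usually quoted in bits. A stdlib-only sketch on a made-up term-presence column:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - H(Y | X) in bits for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += count / n * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical corpus: does each document contain the term (1) or not (0)?
contains_term = [1, 1, 1, 0, 0, 0, 1, 0]
category = ["sci", "sci", "sci", "rec", "rec", "rec", "rec", "sci"]
ig = information_gain(contains_term, category)
print(ig)
```

Multiplying a result in bits by ln 2 gives the equivalent value in nats, which is the scale to expect from scikit-learn.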
2 votes, 1 answer

What should I do in case I have dominant feature in XGB model?

I've recently come across a "strange" observation in my dataset. After fitting an XGB model with 20 features, I plotted the top 10 features by gain. The result is shown below: F1 140027.061202 F2 11242.470370 F3 9957.161039 F4 …
2 votes, 0 answers

Different results - Weka.infogain vs sklearn.mutual_info_classif

My data looks like this: DATA | FEATURE1 | FEATURE2 | ... I | 0.3213 | 1.231 | ... A | 5.0945 | 0.923 | ... I | 0.3213 | 0.761 | ... ... | ... | .... | ... I'm using this code: import csv import numpy as np from…
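Two differences that may explain the mismatch (assumptions, not checked against the asker's data): Weka's InfoGainAttributeEval reports bits and, to my understanding, discretizes numeric attributes before computing the gain, whereas `mutual_info_classif` reports nats and uses a nearest-neighbour estimator for features it treats as continuous. The sketch below isolates only the unit difference, using a plug-in estimate on made-up discrete data.

```python
from collections import Counter
from math import e, log


def mutual_information(x, y, base=2.0):
    """Plug-in mutual information of two discrete sequences in the given log base."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
        mi += c / n * log(c * n / (px[a] * py[b]), base)
    return mi


# Hypothetical discretized feature and class labels.
x = ["low", "low", "high", "high", "high", "low"]
y = ["I", "I", "A", "A", "I", "I"]
bits = mutual_information(x, y)
nats = mutual_information(x, y, base=e)
print(f"{bits:.4f} bits = {nats:.4f} nats")
```

Even with units aligned, the two tools can still disagree on continuous features, because a discretization and a k-NN estimate are different estimators.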
2 votes, 0 answers

Information Gain in R

I found packages used to calculate "Information Gain" for selecting the main attributes in a C4.5 decision tree, and I tried using them to calculate "Information Gain". But each package gives a different result, like the code…
Archimpressom
2 votes, 1 answer

What is Weka's InfoGainAttributeEval formula for evaluating Entropy with continuous values?

I'm using Weka's attribute selection function for Information Gain, and I'm trying to figure out what specific formula Weka uses when dealing with continuous data. I understand the usual formula for Entropy is this for when the values in the data…
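To my understanding, InfoGainAttributeEval first discretizes numeric attributes (supervised discretization by default, or a single binary split with its `-B` option) and then applies the ordinary discrete entropy formula. The sketch below shows only the simpler binary-threshold idea used by C4.5-style trees: try the midpoints between consecutive sorted values and keep the one with the largest information gain. It illustrates the idea and is not Weka's exact procedure.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits); 0 for an empty list."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def best_threshold_gain(values, labels):
    """Best binary split `value <= t` of a numeric attribute by information gain (bits)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        t = (lo + hi) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


# Made-up numeric attribute and class labels.
values = [1.2, 3.4, 2.2, 5.0, 4.1, 0.7]
labels = ["no", "yes", "no", "yes", "yes", "no"]
threshold, gain = best_threshold_gain(values, labels)
print(threshold, gain)
```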
2 votes, 1 answer

Information gain vs minimizing entropy

In what scenario is maximizing information gain not equivalent to minimizing entropy? The broader question is: why do we need the concept of information gain at all? Is it not sufficient to work only with entropy to decide the next optimal attribute…
Pradeep Vairamani
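For a single node the two criteria rank candidate attributes identically, because the parent entropy does not depend on the attribute being tested. With the standard ID3 definition, where S is the set of examples at the node and S_v the subset with value v of attribute A:

```latex
\mathrm{IG}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
\quad\Longrightarrow\quad
\arg\max_A \mathrm{IG}(S, A) = \arg\min_A \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)
```

The two stop being interchangeable when gains are compared across different nodes or subsets, where H(S) differs, or when the gain is normalized, as in C4.5's gain ratio.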
1 vote, 0 answers

I got the error message 'Boolean array expected for the condition, not int64'. Can anybody help me solve this problem?

I have a dataset in .csv format. It has 136 columns and 15036 rows. I want to compute the entropy and information gain of each column of the dataset. Here is my code: def calc_entropy(column): counts = np.bincount(column) prob = counts/(len(column)) entropy =…
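A minimal stdlib-only sketch of per-column entropy and information gain, assuming the class is stored in a column named `target` (a hypothetical name). Unlike `np.bincount`, which only accepts non-negative integers, `collections.Counter` works for any hashable values such as strings or floats. The excerpt does not show where the error is raised, so this is not a diagnosis of it.

```python
from collections import Counter
from math import log2


def calc_entropy(column):
    """Shannon entropy (bits) of any hashable column values."""
    n = len(column)
    return -sum(c / n * log2(c / n) for c in Counter(column).values())


def information_gain(column, target):
    """Entropy of the target minus its weighted entropy within each column value."""
    n = len(target)
    groups = {}
    for value, label in zip(column, target):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * calc_entropy(g) for g in groups.values())
    return calc_entropy(target) - remainder


# Hypothetical columns keyed by name; "target" is the class column.
data = {
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "target": [0, 0, 1, 1],
}
gains = {
    name: information_gain(col, data["target"])
    for name, col in data.items()
    if name != "target"
}
print(gains)
```

Column `a` determines the target exactly (gain of 1 bit), while `b` carries no information about it (gain of 0).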
1 vote, 1 answer

How to Use TF-IDF and combine it with Information Gain for feature selection in text classification?

I don't understand how to combine TF-IDF results with information gain mathematically. Can someone explain it to me, please?
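One common recipe, sketched here as an assumption rather than the only option: treat a term as present in a document when its TF-IDF weight is above zero, rank terms by the information gain of that presence indicator with respect to the class, keep the top k, and then train on the TF-IDF values of the kept terms. The matrix, vocabulary, and `k` below are made up.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits); 0 for an empty list."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values()) if n else 0.0


def term_information_gain(weights, labels):
    """IG of term presence (TF-IDF weight > 0) with respect to the class labels."""
    present = [y for w, y in zip(weights, labels) if w > 0]
    absent = [y for w, y in zip(weights, labels) if w <= 0]
    n = len(labels)
    remainder = len(present) / n * entropy(present) + len(absent) / n * entropy(absent)
    return entropy(labels) - remainder


def select_terms(tfidf, vocabulary, labels, k):
    """Return the k terms with the highest IG; `tfidf` is a list of document rows."""
    scores = {}
    for j, term in enumerate(vocabulary):
        column = [row[j] for row in tfidf]
        scores[term] = term_information_gain(column, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up TF-IDF matrix: 4 documents x 3 terms.
vocabulary = ["goal", "the", "orbit"]
tfidf = [
    [0.8, 0.1, 0.0],
    [0.5, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.0, 0.3, 0.6],
]
labels = ["sport", "sport", "space", "space"]
selected = select_terms(tfidf, vocabulary, labels, k=2)
print(selected)
```

"the" appears in every document, so its presence says nothing about the class and it is dropped, even though it has non-zero TF-IDF weights.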
1 vote, 0 answers

Visualizing decision jungle in Azure Machine Learning Studio

I have trained a decision jungle model on Azure Machine Learning, and now I want to visualize the trees, to see if I can identify the root nodes that are the most determinant in the decision. When I right-click and click Visualize on the Train…
1 vote, 2 answers

Calculating the entropy of a specific attribute?

This is super simple, but I'm learning about decision trees and the ID3 algorithm. I found a very helpful website and was following everything about entropy and information gain until I got to the following: … I don't understand how the entropy for each…
ribs2spare
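In ID3, the "entropy of an attribute" is the weighted average of the class entropy inside each subset that the attribute creates, and the information gain is the parent entropy minus that average. Since the website in the question is not shown, the counts below are those of the widely used PlayTennis example (attribute Outlook), used purely as an illustration.

```python
from math import log2


def entropy(counts):
    """Entropy (bits) of a node given its per-class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def attribute_entropy(partition):
    """Weighted class entropy after splitting on one attribute.

    `partition` maps each attribute value to its class counts, e.g. [yes, no].
    """
    total = sum(sum(counts) for counts in partition.values())
    return sum(sum(counts) / total * entropy(counts) for counts in partition.values())


outlook = {"sunny": [2, 3], "overcast": [4, 0], "rain": [3, 2]}
parent = [9, 5]  # class counts over all 14 examples
h_outlook = attribute_entropy(outlook)
gain = entropy(parent) - h_outlook
print(f"H(S|Outlook) = {h_outlook:.3f}, Gain = {gain:.3f}")
```

Each value's entropy is weighted by its share of the examples (5/14, 4/14, 5/14); the pure "overcast" subset contributes zero.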
1 vote, 0 answers

Calculating Information Gain Ratio

I was searching for a piece of code that computes the Information Gain Ratio (IGR), in R or Python. I have found a handy R package, but it is not maintained and has been removed from CRAN. However, I have found an old version, and I took the liberty and…
striatum
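Gain ratio, as defined for C4.5, divides the information gain by the split information (the entropy of the partition sizes), which penalizes attributes with many distinct values. A short Python sketch, not a port of the removed R package, using the widely used PlayTennis Outlook counts as illustrative input:

```python
from math import log2


def entropy(counts):
    """Entropy (bits) of a list of counts; zero counts are skipped."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)


def gain_ratio(partition):
    """C4.5-style gain ratio for one attribute.

    `partition` maps attribute values to per-class counts,
    with the same class order in every list.
    """
    subsets = list(partition.values())
    parent = [sum(col) for col in zip(*subsets)]
    total = sum(parent)
    sizes = [sum(s) for s in subsets]
    info_gain = entropy(parent) - sum(n / total * entropy(s) for n, s in zip(sizes, subsets))
    split_info = entropy(sizes)  # entropy of the subset sizes themselves
    return info_gain / split_info if split_info else 0.0


outlook = {"sunny": [2, 3], "overcast": [4, 0], "rain": [3, 2]}
ratio = gain_ratio(outlook)
print(round(ratio, 3))
```

The guard returns 0 when an attribute has a single value, since its split information is zero.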
0 votes, 0 answers

I don’t get information gain equation in XGBoost

I’m currently studying XGBoost, and I learned that information gain in XGBoost is computed like this: [image: XGBoost information gain formula]. What I’m curious about is that, previously, I learned that information gain is computed as (entropy of parent node - sum…
em seoyk
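For comparison, the split gain given in the XGBoost documentation ("Introduction to Boosted Trees") has the same parent-versus-children structure as entropy-based information gain, but each node's score comes from the regularized loss, using first-order gradients g_i and second-order gradients h_i summed over the rows I in that node, with λ the L2 leaf penalty and γ the per-leaf penalty:

```latex
\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma,
\qquad G = \sum_{i \in I} g_i,\quad H = \sum_{i \in I} h_i
```

Read as "children score minus parent score", the bracketed term plays the role that "parent entropy minus weighted children entropy" plays in ID3, with the order reversed because a higher score is better here. For squared-error loss written as (1/2)(y - ŷ)², g_i is the prediction minus the target and h_i = 1, so G²/(H+λ) becomes (sum of residuals)²/(n+λ).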
0 votes, 0 answers

Feature importance 'gain' in XGBoost for multiclass classification tasks

I'd like to ask what the formula for the gain in XGBoost models is for multiclass classification tasks. I know that for regression tasks it's calculated as SIMILARITY_LEFT_CHILD + SIMILARITY_RIGHT_CHILD - SIMILARITY_PARENT and that for binary…
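A sketch of the similarity-based gain the question describes, written in terms of gradients and hessians so the same function covers other losses. As far as I know, for multiclass objectives XGBoost grows one tree per class in each boosting round, and each of those trees scores splits with this same structure using the softmax gradients and hessians for its class; that multiclass detail is an assumption and is not implemented here. The residuals are made up, and `reg_lambda` stands in for λ.

```python
def similarity(gradients, hessians, reg_lambda=1.0):
    """Similarity score G^2 / (H + lambda) of one node."""
    return sum(gradients) ** 2 / (sum(hessians) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0):
    """SIMILARITY_LEFT + SIMILARITY_RIGHT - SIMILARITY_PARENT, as stated in the question."""
    parent = similarity(g_left + g_right, h_left + h_right, reg_lambda)
    return (similarity(g_left, h_left, reg_lambda)
            + similarity(g_right, h_right, reg_lambda)
            - parent)


# Squared error: gradient = prediction - target, hessian = 1 per row.
residual_left = [-2.0, -1.0]  # hypothetical residuals (target - prediction)
residual_right = [1.5, 2.5]
g_left = [-r for r in residual_left]
g_right = [-r for r in residual_right]
h_left = [1.0] * len(g_left)
h_right = [1.0] * len(g_right)
gain = split_gain(g_left, h_left, g_right, h_right)
print(gain)
```

With unit hessians, the denominator is the row count plus λ, which recovers the regression similarity score from the question.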