Highest Voted 'information-gain' Questions

12

votes

1 answer

result of rpart is a root, but data shows Information Gain

I have a dataset with an event rate of less than 3% (i.e. there are about 700 records with class 1 and 27000 records with class 0).

ID        V1   V2   V3  V5    V6  Target
SDataID3  161  ONE  1   FOUR  0   0
SDataID4  11   TWO  2   …

asked Oct 31 '17 at 06:39

Rachit Jain

123
5

10

votes

1 answer

Feature importance 'gain' in XGBoost

I want to understand how the feature importance in xgboost is calculated by 'gain'. From https://towardsdatascience.com/be-careful-when-interpreting-your-features-importance-in-xgboost-6e16132588e7: ‘Gain’ is the improvement in accuracy brought by…

python scikit-learn xgboost boosting information-gain

asked Aug 05 '19 at 14:30

nellng

103
1
1
5

3

votes

1 answer

Python Information gain implementation

I am currently using scikit-learn for text classification on the 20ng dataset. I want to calculate the information gain for a vectorized dataset. It has been suggested to me that this can be accomplished using mutual_info_classif from sklearn.…

python machine-learning scikit-learn information-gain

asked Nov 11 '17 at 18:42

Roman Purgstaller

910
3
11
24

2

votes

1 answer

What should I do in case I have dominant feature in XGB model?

I've recently come across a "strange" observation in my dataset. After XGB modeling with 20 features, I plotted the top 10 features with the highest gain values. The result is shown below:

F1 140027.061202
F2 11242.470370
F3 9957.161039
F4 …

python data-science xgboost feature-selection information-gain

asked Dec 23 '19 at 13:45

Galilej25

128
5

2

votes

0 answers

Different results - Weka.infogain vs sklearn.mutual_info_classif

My data looks like:

DATA | FEATURE1 | FEATURE2 | ...
I | 0.3213 | 1.231 | ...
A | 5.0945 | 0.923 | ...
I | 0.3213 | 0.761 | ...
... | ... | .... | ...

I'm using this code:

import csv
import numpy as np
from…

machine-learning scikit-learn weka feature-selection information-gain

asked Nov 26 '17 at 19:03

0xhido

104
9

2

votes

0 answers

Information Gain in R

I found packages used for calculating "Information Gain" to select the main attributes in a C4.5 decision tree, and I tried using them to calculate "Information Gain". But the results calculated by each package are different, like the code…

r weka decision-tree c4.5 information-gain

asked Jan 10 '17 at 01:30

Archimpressom

41
1
5

2

votes

1 answer

What is Weka's InfoGainAttributeEval formula for evaluating Entropy with continuous values?

I'm using Weka's attribute selection function for Information Gain and I'm trying to figure out what specific formula Weka uses when dealing with continuous data. I understand the usual formula for Entropy is this for when the values in the data…

machine-learning formula weka entropy information-gain

asked Feb 27 '16 at 03:29

edthealchemist

31
7

2

votes

1 answer

Information gain vs minimizing entropy

In what scenario is maximizing information gain not equivalent to minimizing entropy? The broader question is: why do we need the concept of information gain? Is it not sufficient to work only with entropy to decide the next optimal attribute…

math decision-tree entropy information-gain

asked Nov 21 '15 at 11:08

Pradeep Vairamani

4,004
3
36
59
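The relationship asked about in the question above can be checked on a toy example. The following is a minimal sketch, not taken from the question or its answer: the data and the function names (`entropy`, `weighted_child_entropy`, `information_gain`) are hypothetical. It assumes the usual decision-tree definition, where information gain is the parent entropy minus the size-weighted entropy of the subsets produced by a split.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_child_entropy(values, labels):
    """Entropy of the labels after splitting on an attribute,
    with each subset weighted by its share of the records."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(values, labels):
    """Parent entropy minus weighted child entropy."""
    return entropy(labels) - weighted_child_entropy(values, labels)


# Hypothetical toy data: attr_a separates the classes perfectly, attr_b barely helps.
labels = ["yes", "yes", "no", "no", "yes", "no"]
attributes = {
    "attr_a": ["x", "x", "y", "y", "x", "y"],
    "attr_b": ["p", "q", "p", "q", "p", "q"],
}

best_by_gain = max(attributes, key=lambda a: information_gain(attributes[a], labels))
best_by_entropy = min(attributes, key=lambda a: weighted_child_entropy(attributes[a], labels))
print(best_by_gain, best_by_entropy)
```

Because the parent entropy is the same for every candidate attribute at a given node, ranking attributes by gain and ranking them by weighted child entropy select the same attribute in this sketch. The two criteria can differ once the gain is normalized, for example by the split information used in gain ratio.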

1

vote

0 answers

I got error message 'Boolean array expected for the condition, not int64'. Can anybody help me solve this problem?

I have a dataset in .csv format. It has 136 columns and 15036 rows. I want to compute the entropy and information gain of each column of the dataset. Here is my code:

def calc_entropy(column):
    counts = np.bincount(column)
    prob = counts/(len(column))
    entropy =…

python-3.x multiple-columns entropy information-gain

asked Mar 07 '21 at 03:32

Anastasia Harlow

11
2

1

vote

1 answer

How to Use TF-IDF and combine it with Information Gain for feature selection in text classification?

I don't understand how to combine the TF-IDF result with information gain mathematically. Can someone explain it to me, please?

text-classification information-retrieval tf-idf feature-selection information-gain

asked Sep 03 '19 at 05:16

victorxu2

494
1
5
10

1

vote

0 answers

Visualizing decision jungle in Azure Machine Learning Studio

I have trained a decision jungle model on Azure Machine Learning, and now I want to visualize the trees, to see if I can identify the root nodes that are the most determinant in the decision. When I right-click and click Visualize on the Train…

decision-tree azure-machine-learning-service information-gain

asked Nov 13 '18 at 03:55

Dee

199
4
17

1

vote

2 answers

Calculating the entropy of a specific attribute?

This is super simple but I'm learning about decision trees and the ID3 algorithm. I found a website that's very helpful and I was following everything about entropy and information gain until I got to I don't understand how the entropy for each…

decision-tree id3 entropy information-gain

asked Jun 15 '16 at 15:36

ribs2spare

229
1
4
12

1

vote

0 answers

Calculating Information Gain Ratio

I was searching for a piece of code that does Information Gain Ratio (IGR), in R or Python. I have found a handy R package, but it is not maintained, and has been removed from CRAN. However, I have found some old version and I took the liberty and…

r entropy information-gain

asked Jul 24 '13 at 20:47

striatum

1,428
3
14
31

0

votes

0 answers

I don’t get information gain equation in XGBoost

I’m currently studying XGBoost, and I learned that information gain in XGBoost is computed like this: [image: XGBoost information gain]. What I’m curious about is that I previously learned that information gain is computed as (entropy of parent node - sum…

xgboost decision-tree entropy information-gain

asked Dec 17 '22 at 13:11

em seoyk

1
1
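The formula image from the question above is not reproduced in this listing. For reference only, here are the two definitions the question appears to contrast: the entropy-based information gain used in classical decision trees, and the split gain as given in the XGBoost paper. The symbols are assumptions of this sketch: $n_k$ is the number of records in child $k$, $G$ and $H$ are the sums of the first and second derivatives of the loss over the records in a node, $\lambda$ is the L2 regularization term and $\gamma$ is the per-leaf complexity penalty.

```latex
% Entropy-based information gain (classical decision trees)
\[
\mathrm{IG} = \mathrm{Entropy}(\text{parent})
  - \sum_{k} \frac{n_k}{n}\,\mathrm{Entropy}(\text{child}_k)
\]

% Split gain in XGBoost (L = left child, R = right child)
\[
\mathrm{Gain} = \frac{1}{2}\left[
  \frac{G_L^{2}}{H_L + \lambda}
  + \frac{G_R^{2}}{H_R + \lambda}
  - \frac{(G_L + G_R)^{2}}{H_L + H_R + \lambda}
\right] - \gamma
\]
```

Both have the same shape, a score for the children minus a score for the parent, but the second measures the reduction in a regularized loss and does not involve entropy.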

0

votes

0 answers

Feature importance 'gain' in XGBoost for multiclassification tasks

I'd like to ask what the formula for the gain is in XGBoost models for multiclass classification tasks. I know that for regression tasks it's calculated as SIMILARITY_LEFT_CHILD + SIMILARITY_RIGHT_CHILD - SIMILARITY_PARENT and that for binary…

r xgboost multiclass-classification information-gain

asked Dec 13 '22 at 16:44

ohadbh

39
4
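The regression formula quoted in the question above can be sketched as follows. This is a minimal illustration, not XGBoost's implementation and not an answer to the multiclass case. It assumes squared-error loss, where the similarity score of a node is the squared sum of its residuals divided by the number of residuals plus the regularization term. The names `similarity`, `split_gain` and `reg_lambda`, and the residual values, are hypothetical.

```python
def similarity(residuals, reg_lambda=1.0):
    """Similarity score of a node under squared-error loss (assumed form):
    (sum of residuals)^2 / (number of residuals + lambda)."""
    total = sum(residuals)
    return total * total / (len(residuals) + reg_lambda)


def split_gain(left, right, reg_lambda=1.0):
    """Gain of a split as quoted in the question:
    similarity(left) + similarity(right) - similarity(parent)."""
    parent = left + right
    return (
        similarity(left, reg_lambda)
        + similarity(right, reg_lambda)
        - similarity(parent, reg_lambda)
    )


# Hypothetical residuals falling into the two children of a candidate split.
left = [-10.0, -8.0]
right = [6.0, 8.0]

gain = split_gain(left, right, reg_lambda=0.0)
print(gain)
```

The formulation in the XGBoost paper additionally multiplies the bracketed term by 1/2 and subtracts a complexity penalty, and for other losses the residual sums are replaced by sums of gradients and Hessians, so the values reported by the library need not match this sketch exactly.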
