Questions tagged [gini]

The Gini coefficient (also known as the Gini index or Gini ratio) (/dʒini/ jee-nee) is a measure of statistical dispersion intended to represent the income distribution of a nation's residents, and is the most commonly used measure of inequality.
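As a rough illustration of the definition above, the sketch below computes the Gini coefficient of a list of non-negative values using the sorted-values formula G = 2·Σ(i·x_i) / (n·Σx_i) − (n + 1)/n. The function name `gini_coefficient` and the sample inputs are assumptions made for this example; they are not part of the tag description.

```python
def gini_coefficient(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)          # the formula assumes ascending order
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0               # convention chosen for this sketch
    # Rank-weighted sum: smallest value gets rank 1, largest gets rank n.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical inputs: equal incomes, then one person holding everything.
equal = gini_coefficient([1, 1, 1, 1])
concentrated = gini_coefficient([0, 0, 0, 10])
```

With this formula a fully concentrated sample of size n gives (n − 1)/n rather than exactly 1, which is why small samples never reach the upper bound.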

72 questions
1
vote
0 answers

Is there any possibility to display the gini-impurity (or information gain) in rpart.plot?

I would like to see the Gini impurity on each node in rpart.plot, like in Python: https://miro.medium.com/max/2408/1*aBIvTfp5gZ2F0ZSHbd3DSQ.png
chaed
  • 125
  • 7
1
vote
0 answers

Accuracy Ratio (Gini coef) computation in Python by Definition and ROC Method

Why do the following methods of computing the accuracy ratio give different results? Approach 1: Cumulative Accuracy Profile (CAP) curve The accuracy ratio is computed from definition as the difference between the area under curve of the CAP of the…
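For reference, the ROC-based route named in the title is commonly written as AR = 2·AUC − 1. The sketch below is a minimal pure-Python version of that relation, assuming binary labels coded as 0/1 and computing the AUC as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half. The names `auc_from_scores` and `accuracy_ratio` and the sample data are hypothetical; this is not the asker's code and does not address why the two approaches in the question disagree.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ordered correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy_ratio(labels, scores):
    """Accuracy ratio (Gini) via the relation AR = 2 * AUC - 1."""
    return 2.0 * auc_from_scores(labels, scores) - 1.0


# Hypothetical data: two negatives and two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
ar = accuracy_ratio(labels, scores)
```

The pairwise loop is quadratic in the number of observations, so it is only meant for checking small cases by hand.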
1
vote
0 answers

Can gini and entropy accuracy values be the same in decision trees?

I created a decision tree in code I wrote in a Jupyter notebook (with the gini and entropy criteria). Then I calculated the accuracy and created a report. However, in my case, the report and the accuracy were exactly the same. Is it possible…
1
vote
0 answers

Gini Coefficient parallel/streaming implementation with unsorted input

Is there a streaming implementation to compute the Gini Coefficient (not to be confused with the Gini Impurity used in decision tree induction) of an unsorted input? Currently, I am aware of two implementations for the Gini Coefficient: One…
Marsellus Wallace
  • 17,991
  • 25
  • 90
  • 154
1
vote
0 answers

Why is Gini impurity greater than the weighted sum of Gini impurities of subnodes

Say the decision tree has k classes (c1, c2, ..., ck) to classify and the dataset of the parent node is D. Pi denotes the proportion of elements labelled with class ci. And the Gini impurity is: 1 − Σ Pi² (summed over i = 1, ..., k). If one partitions the node to subnodes with subsets D1 and…
YOLOv4
  • 53
  • 8
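A small numeric check of the claim in this question's title can be sketched as follows, assuming the usual definition of Gini impurity as 1 − Σ Pi². Because that function is concave in the class proportions, and the parent's proportions are the size-weighted average of the children's, the parent impurity cannot be smaller than the weighted child impurity. The helper names and the toy labels are made up for this illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i ** 2) over the class proportions p_i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_child_impurity(children):
    """Size-weighted average of the children's Gini impurities."""
    total = sum(len(child) for child in children)
    return sum(len(child) / total * gini_impurity(child) for child in children)


# Hypothetical split of a six-element parent node into two subnodes.
parent = ["a", "a", "a", "b", "b", "b"]
left, right = ["a", "a", "b"], ["a", "b", "b"]

parent_gini = gini_impurity(parent)
children_gini = weighted_child_impurity([left, right])
gini_decrease = parent_gini - children_gini  # non-negative for any split
```

The difference `gini_decrease` is the quantity a tree learner tries to maximise when choosing a split under the Gini criterion.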
1
vote
0 answers

Performance numpy vs for loop with Mean Difference (Gini)

My goal is to find a fast implementation of the mean difference (Gini) for qualitative data. Since some of the arrays may have millions of values, I am looking for the fastest implementation. While programming, I am wondering why a for loop is…
MaGarb
  • 43
  • 6
1
vote
1 answer

How can I get Gini index after doing a grid search in hyper parameter tuning of GBM for a tweedie loss function?

I am doing hyperparameter tuning for a GBM model in H2O, and since my loss function is Tweedie I don't want to use MSE as my model selection criterion. The H2O documentation says that the Gini index can be calculated for both regression and…
Rio
  • 398
  • 2
  • 15
1
vote
0 answers

How is the MeanGiniDecrease for each feature calculated in randomForest package?

As I understand it, the Gini decrease can be calculated in a straightforward manner by subtracting the Gini impurity of the child nodes from that of the parent node. How are all these calculations aggregated per feature across the forest? For example I have…
brucezepplin
  • 9,202
  • 26
  • 76
  • 129
1
vote
1 answer

R: apply function to subsets based on column value

I have a data frame called income.df that looks something like this:

ID  region  income
1   rot     3700
2   ams     2500
3   utr     3300
4   utr     5300
5   utr     4400
6   ams     3100
8   ams     3000
9   rot     4000
10  rot     4400
12  rot     2000

I want to use the Gini function to compute the…
Abdel
  • 5,826
  • 12
  • 56
  • 77
1
vote
2 answers

Find 3 subsample with the same (approximately) Gini coefficient

Let's say I have a sample of N individuals and a random variable X which represent their annual income in a foreign currency. An example of X could be the…
toyo10
  • 121
  • 1
  • 14
1
vote
1 answer

How to amend the splitting criteria (gini/entropy) in a decision tree algorithm in Scikit-Learn?

I work with a decision tree algorithm on a binary classification problem and the goal is to minimise false positives (maximise positive predicted value) of the classification (the cost of a diagnostic tool is very high). Is there a way to introduce…
Arnold Klein
  • 2,956
  • 10
  • 31
  • 60
1
vote
1 answer

R dataframe not creating properly

I have used the following code to obtain mean decrease in accuracy for random forest:

    AAA <- randomForest(CPercentage ~ ., data = data, importance = T)
    BBB <- as.data.frame(importance(AAA))

I have created the following dataframe by the above process …
Raghavan vmvs
  • 1,213
  • 1
  • 10
  • 29
1
vote
0 answers

Formula of computing the Gini Coefficient in fastgini

I use the fastgini package for Stata (https://ideas.repec.org/c/boc/bocode/s456814.html). I am familiar with the classical formula for the Gini coefficient reported for example in Karagiannis & Kovacevic (2000)…
1
vote
1 answer

Computing Gini index in tensorflow

I'm trying to write down the gini index calculation as a tensorflow cost function. Gini index is: https://en.wikipedia.org/wiki/Gini_coefficient a numpy solution would be def ginic(actual, pred): n = len(actual) a_s =…
Ilya
  • 561
  • 2
  • 17
1
vote
1 answer

Is there any function that calculates the Gini index for CART (decision tree algorithm) in R?

When using CART, I would like to select the primary attributes from the full set of attributes using the Gini index, but I couldn't find any functions or packages that provide it. If there are any functions or packages that calculate the Gini index, please let me know.
Archimpressom
  • 41
  • 1
  • 5