Questions tagged [gini]

The Gini coefficient (also known as the Gini index or Gini ratio) (/dʒini/ jee-nee) is a measure of statistical dispersion intended to represent the income distribution of a nation's residents, and is the most commonly used measure of inequality.
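
As a rough illustration of the measure described above, here is a minimal Python sketch that computes a Gini coefficient from a list of non-negative values using the sorted-rank formula. The function name `gini` and the choice to return 0.0 for empty or all-zero input are assumptions made for this example; they do not come from any of the questions listed here.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Assumption for this sketch: empty or all-zero input counts as "equal".
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i runs from 1 to n over the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    print(gini([1, 1, 1, 1]))   # everyone holds the same amount
    print(gini([0, 0, 0, 1]))   # one member holds everything
```

With this finite-sample formula, the maximum for n values is (n - 1) / n rather than exactly 1.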

72 questions
0
votes
0 answers

Decision Tree in R is not splitting

I am trying to split my data into categories to understand which groups are more likely to be "Default", so I want to use a decision tree. My data has 809054 observations and 8 variables. If I consider just a small sample of my…
Mariana da Costa
  • 173
  • 2
  • 12
0
votes
1 answer

How to get the feature importance of ranklib generated random forests model?

Training a learning-to-rank random forest with ranklib generates an XML-like model. Ranklib has a tool that reports feature frequencies, which cannot necessarily be treated as feature importance. How can I get the Gini feature importance or Gini index…
Soroosh Sorkhani
  • 66
  • 1
  • 3
  • 15
0
votes
1 answer

Improve precision of my predictive technique in Python

I am using the following Python code to make output predictions depending on some values using decision trees based on entropy/gini index. My input data is contained in the file:…
0
votes
1 answer

Compute change in Gini coefficient through event

I have a grouped data structure of different households answering a weekly poll and I observe them over 52 weeks (in the example below four weeks). Now I want to use the Gini coefficient to quantify the degree of (in-)equality of poll answers across…
Scijens
  • 541
  • 2
  • 11
0
votes
1 answer

Gini coefficient in panel data

I have a grouped data structure (different households answering a weekly opinion poll) and I observe them over 52 weeks (in the example below four weeks). Now I want to indicate the value of a household at a given point in time using the gini…
Scijens
  • 541
  • 2
  • 11
0
votes
2 answers

adding new pandas df column based on operations row-wise

I have a Dataframe like this: Interesting genre_1 probabilities 1 no Empty 0.251306 2 yes Empty 0.042043 3 no Alternative 5.871099 4 yes Alternative…
Javiss
  • 765
  • 3
  • 10
  • 24
0
votes
1 answer

How to get the points for a Lorenz curve plot with SQL?

I'm working with BigQuery, and I'm interested in plotting a Lorenz curve (for inequality, related to the Gini coefficient). How can I produce the data for a plot like this with SQL? The curve is a graph showing the proportion of overall income or…
Felipe Hoffa
  • 54,922
  • 16
  • 151
  • 325
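
The question above asks for SQL. As a language-neutral illustration of what the Lorenz curve points are, here is a hedged Python sketch: sort the values ascending, then pair the cumulative share of the population with the cumulative share of the total. The function name `lorenz_points` is hypothetical, and this is not the BigQuery solution from that thread.

```python
def lorenz_points(values):
    """Return (population share, cumulative value share) pairs, starting at (0, 0)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a non-zero total")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        # x-axis: share of members so far; y-axis: share of the total they hold
        points.append((i / n, running / total))
    return points


if __name__ == "__main__":
    for share_of_people, share_of_value in lorenz_points([1, 1, 2, 4]):
        print(share_of_people, share_of_value)
```

In SQL the same idea is typically expressed with a running sum over the ordered values divided by the grand total, though the exact query depends on the table layout.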
0
votes
1 answer

Decision Tree Splitting strategy

I have a dataset with 4 categorical features (Cholesterol, Systolic Blood pressure, diastolic blood pressure, and smoking rate). I use a decision tree classifier to find the probability of stroke. I am trying to verify my understanding of the…
0
votes
1 answer

Feature importance from benchmark experiment using nested cross-validation

I am using the mlr package in R to compare two learners, i.e. random forest and lasso classifier, on a binary classification task. I would like to extract the feature importance for the best classifier, random forest in this case, in a similar way to…
FcmC
  • 143
  • 9
0
votes
0 answers

Which predictors are associated with specifically 1 of the binary outcomes on randomForest on R?

When using the importance() function in R's randomForest, you can get a list of the most important predictors. I was wondering how to tell which predictors are associated with one specific binary outcome? (i.e. which predictors are associated…
Alicia
  • 57
  • 1
  • 9
0
votes
1 answer

SAS: Proc freq by group for all variables at once?

I use Proc freq to calculate the Somers' D between the dependent variable (log salary) and the independent variables (crhits, crhome, etc.). Is there a way to get all the results in one proc freq statement? The code I use currently is DATA…
78282219
  • 85
  • 1
  • 8
0
votes
1 answer

Calculate monthly returns from a dataframe

I have been asked to calculate the Gini coefficient (dispersion of allocation weighting) in 18 sectoral ETFs with historical data available since 2000. Here is an excerpt: > head(df) Date .SXQR .SXTR .SXNR .SXMR .SXAR .SX3R .SX6R …
Revolucion for Monica
  • 2,848
  • 8
  • 39
  • 78
0
votes
0 answers

How to interpret MeanDecreaseGini, what do the numbers mean on the scale?

I have to interpret the most important variables by speaking about the MeanDecreaseGini in randomForest. For example, I have variable x, with a value of 250. How can you explain that 250? What does that 250 mean?
Looz
  • 377
  • 2
  • 14
0
votes
0 answers

Gini coefficient and Lorenz curve for credit scoring data in R

I have a problem calculating the Gini coefficient and drawing the Lorenz curve in R for credit scoring data. My raw data is in the following columns: client number (Col A), scoring points (Col B), bad/good (0 - good or 1 - bad) (Col C). Which package…
Asteme
  • 1
0
votes
0 answers

Calculating Gini and AUC in R, results depending on number of variables

I tried to find R functions to calculate the Gini coefficient and the AUC in R. I found the packages ROCR and MLmetrics. Usually you can switch between AUC and Gini via Gini = 2 * AUC - 1. In the following example this is true for the case of 2…
Richi W
  • 3,534
  • 4
  • 20
  • 39
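
The relation Gini = 2 * AUC - 1 mentioned in the last question can be sketched in plain Python. This is only an illustration under stated assumptions: labels are 0/1, AUC is computed by brute-force comparison of every positive/negative score pair (ties count as one half), and the function names `auc` and `gini_from_auc` are hypothetical. It does not use ROCR or MLmetrics and is not the asker's code.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive scored above negative
            elif p == q:
                wins += 0.5      # ties count as half a win
    return wins / (len(pos) * len(neg))


def gini_from_auc(labels, scores):
    """Gini of a scoring model, derived from AUC via Gini = 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(auc(y, s), gini_from_auc(y, s))
```

The pairwise loop is quadratic in the number of observations, so it is meant for checking small examples rather than for large scoring datasets.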