Questions tagged [gini]

The Gini coefficient (also known as the Gini index or Gini ratio) (/dʒini/ jee-nee) is a measure of statistical dispersion intended to represent the income distribution of a nation's residents, and is the most commonly used measure of inequality.
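
As a rough illustration of the tag description above, here is a minimal Python sketch of the coefficient for a list of non-negative values (for example, incomes). The function name `gini_coefficient` is hypothetical, and the formula is the common sorted-rank form without any small-sample ("unbiased") correction or weights; it is not the implementation of any package mentioned in the questions below.

```python
def gini_coefficient(values):
    """Gini coefficient of non-negative values (no bias correction, no weights)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one value and a positive total")
    if xs[0] < 0:
        raise ValueError("values must be non-negative")
    # Sorted-rank form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    # where i = 1..n indexes the values in ascending order.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

With this form, equal values give 0, and a sample in which one of n members holds everything gives (n - 1) / n rather than exactly 1.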

72 questions
1
vote
1 answer

Integer overflow error using Gini function of package DescTools

I want to calculate Gini coefficients using Gini() from DescTools (because it offers an easy way to calculate "unbiased" Gini coefficients with weights, confidence intervals, etc.), but I get some errors when I use this function with "big" samples.…
1
vote
1 answer

Calculating Gini by Row in R

I'm trying to calculate the Gini coefficient within each row of my dataframe, which has 1326 rows and 6 columns (1326 x 6). My current code... attacks$attack_gini <- gini(x =…
Worville11
  • 93
  • 1
  • 1
  • 5
1
vote
0 answers

Why doesn't a decision tree result in the same significance as a Chi-square test?

I’m fairly new to decision trees. When I do a Chi-square test between a binary, categorical variable and family size, I get the following p-value and subsequent pairwise p-values from post-hoc analysis using the Bonferroni control method in the…
Naim
  • 31
  • 7
0
votes
0 answers

What loss criterion does CARET's random forest use?

What is the default loss criterion that CARET's random forest uses for classification (e.g. gini, entropy, log-loss)? In scikit-learn, gini is the default loss function of the random forest (source). However, I cannot find what the default loss…
E. Turok
  • 106
  • 1
  • 7
0
votes
0 answers

Extracting and Plotting tree from the RF model

I have developed a random forest model in R using randomForestSRC package. There are a total of 450 trees. I can extract and plot a single tree from the model using get.tree function of ggRandomForest package. I want to extract and plot a single…
0
votes
1 answer

Somersd function from scipy.stats produces type error in Python

I would like to calculate Somers' D. I have the following input cross table in Python as a data frame called crosstab: 0 1 2 3 4 100 80 100 4 500 50 3 2 0 38 40 0 4 0 40 2000 100 100 4 400 I try to calculate Somers' D and…
dika
  • 63
  • 4
0
votes
0 answers

How to construct an imbalanced MNIST-dataset based on a pre-defined gini-coefficient?

My goal is to make different versions of the MNIST dataset with different pre-defined levels of imbalancedness. A gini-coefficient (range: 0-1) is a measure of imbalancedness of a dataset where 0 represents perfect equality and 1 represents perfect…
0
votes
0 answers

Parallelize a loop task

I have a function 'GiniLib' with 3 input arguments. I'd like to apply this function to many columns of my PySpark dataframe. Since it's very slow, I'd like to parallelize it with either pool from multiprocessing or with parallel from…
Sonny
  • 1
  • 1
0
votes
0 answers

How to get variable/model gini or gains table or...?

I'm working with a colleague concurrently between R and MS Excel looking at credit risk scorecard modelling. In Excel he has calculated what he says is the gini coefficient for certain variables, which he has calculated by ranking the variable from…
StMatthias
  • 19
  • 5
0
votes
0 answers

Quicker Model Gini

I've calculated the model gini for a regression that I have run. I've done this using the method below: ###Model gini library(MLmetrics) library(pROC) # Full Model predicted <- predict(mylogit, my_data, type="response") #calculate AUC aucc <-…
StMatthias
  • 19
  • 5
0
votes
0 answers

Confidence intervals (bootstrap) for CAP Curve

Please see if everything is correct with this code for calculating confidence intervals for the CAP Curve. `alpha <- 0.9 B <- 2000 boot_samples <- t(replicate(B, sample(p_hat, length(p_hat), replace = TRUE))) boot_means <- apply(boot_samples, 2,…
Olesia
  • 1
  • 1
0
votes
0 answers

How do you manually calculate the CART algorithm and Gini index for a decision tree using Excel?

I've been learning about the CART algorithm, and now I want to work through it manually using Excel, but I still don't know how to do it. Can someone help me with manually calculating the CART algorithm and Gini index using Excel? I…
fera fani
  • 11
  • 4
0
votes
0 answers

How to calculate Gini in a dataset using Excel?

I'm trying to manually calculate Gini for my 300 datasets, with two columns and 5 indexes, using Excel, but I still don't know how to calculate it in Excel. It's so confusing. data1: data2 data3 In my data the next columns are 0,1,2,3,4 and each index has 30 data points. I…
fera fani
  • 11
  • 4
0
votes
1 answer

How can I improve this Python code to calculate Information Gain from Gini impurity?

The following code is intended to calculate info gain from a dataset, using Gini impurity. I thought the code that I wrote is functional and should perform successfully in all cases, but there are several hidden test cases on Sololearn that it fails…
Jimmy T.
  • 1
  • 2
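
For reference on the entry above, a minimal Python sketch of information gain measured with Gini impurity for a binary split. The function names `gini_impurity` and `gini_gain` are hypothetical, and this is not the asker's code or the Sololearn reference solution; it only assumes that the two children together contain every parent label.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty node counts as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    if n == 0 or len(left) + len(right) != n:
        raise ValueError("children must partition a non-empty parent")
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted
```

A split that separates the classes perfectly, such as parent [1, 1, 0, 0] into [1, 1] and [0, 0], recovers the whole parent impurity as gain.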
0
votes
0 answers

Comparing Gini indices and Hill numbers between groups

Also posted on CrossValidated but I think I might get more traction here. I'm looking for R packages that can test for differences in Gini indices and Hill numbers between two groups. The R package simboot's sbdiv function does that for the Shannon…
dan
  • 6,048
  • 10
  • 57
  • 125