I use the fastgini package for Stata (https://ideas.repec.org/c/boc/bocode/s456814.html).

I am familiar with the classical formula for the Gini coefficient, reported for example in Karagiannis & Kovacevic (2000) (http://onlinelibrary.wiley.com/doi/10.1111/1468-0084.00163/abstract).

Formula I:

$$G = \frac{1}{2 \mu N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} |y_i - y_j|$$

Here G is the Gini coefficient, µ the mean value of the distribution, N the sample size and y_i the income of the ith sample unit. Hence, the Gini coefficient computes the difference between all available income pairs in the data and calculates the total of all absolute differences.

This total is then normalized by dividing it by twice the squared sample size times the mean income, that is, by 2N²µ.

The Gini coefficient ranges between 0 and 1, where 0 means perfect equality (all individuals earn the same) and 1 refers to maximum inequality (1 person earns all the income in the country).
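To make formula I concrete, here is a minimal Python sketch of the all-pairs computation, assuming unweighted data and the normalization by 2N²µ. The function name `gini_pairwise` is made up for illustration and is not part of fastgini or any Stata package.

```python
def gini_pairwise(y):
    """Gini coefficient via formula I: sum of |y_i - y_j| over all ordered pairs.

    Hypothetical helper for illustration; runs in O(N^2) time.
    """
    n = len(y)
    mu = sum(y) / n  # mean of the distribution
    # Every unordered pair is counted twice (i,j and j,i); the 2 in the
    # denominator compensates for that double counting.
    total = sum(abs(a - b) for a in y for b in y)
    return total / (2 * n * n * mu)


print(gini_pairwise([1, 2, 3, 4]))
print(gini_pairwise([0, 0, 0, 10]))
```

For `[1, 2, 3, 4]` this gives 0.25. For `[0, 0, 0, 10]`, where one unit holds everything, it gives 0.75, which is (N-1)/N: under formula I the upper bound of 1 is only approached as N grows.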

However the fastgini package refers to a different formula (http://fmwww.bc.edu/repec/bocode/f/fastgini.html):

Formula II:

fastgini uses formula:
                  i=N      j=i
                  SUM W_i*(SUM W_j*X_j - W_i*X_i/2)
                  i=1      j=1
      G = 1 - 2* ----------------------------------
                       i=N             i=N
                       SUM W_i*X_i  *  SUM W_i
                       i=1             i=1

where observations are sorted in ascending order of X.
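Read literally, formula II can be sketched in Python as below. This is only my transcription of the help-file formula, not the actual fastgini source; the name `gini_sorted` and the default of unit weights are assumptions for illustration.

```python
def gini_sorted(x, w=None):
    """Gini coefficient via formula II from the fastgini help file.

    Hypothetical transcription for illustration; weights default to 1.
    """
    if w is None:
        w = [1.0] * len(x)
    # Formula II requires observations in ascending order of X.
    pairs = sorted(zip(x, w))
    cum = 0.0  # running SUM_{j<=i} W_j * X_j
    num = 0.0  # outer sum in the numerator
    for xi, wi in pairs:
        cum += wi * xi
        num += wi * (cum - wi * xi / 2)
    sum_wx = cum      # SUM W_i * X_i over all observations
    sum_w = sum(w)    # SUM W_i
    return 1 - 2 * num / (sum_wx * sum_w)


print(gini_sorted([4, 1, 3, 2]))
print(gini_sorted([1, 2], w=[2, 1]))
```

The first call gives 0.25, the same value the all-pairs formula yields for the incomes 1, 2, 3, 4. The second call gives 1/6, which is what the unweighted data `[1, 1, 2]` gives, consistent with reading W as a frequency weight. Apart from the sort, the loop is a single pass over the data.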

Here W seems to be the weight, which I don't use, therefore it should be 1 (?). I’m not sure whether formula I and formula II are the same. There are no absolute differences and the result is subtracted from 1 in formula II. I have tried to transform the equations but I don’t get any further.

Could someone give me a hint whether both ways of computing (formula I + formula II) are equivalent?
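For what it is worth, here is a sketch of how far the algebra seems to go if I assume unit weights (W_i = 1) and values sorted so that x_1 ≤ … ≤ x_N, with µ the mean. The rank-weighted intermediate form is my own step and does not appear in either source quoted above.

$$
\begin{aligned}
\text{Formula I:}\quad
\sum_{i=1}^{N}\sum_{j=1}^{N} |x_i - x_j|
  &= 2\sum_{i=1}^{N} (2i - N - 1)\,x_i , \\
G_{I} &= \frac{1}{2\mu N^2}\cdot 2\sum_{i=1}^{N}(2i - N - 1)\,x_i
  = \frac{2\sum_{i=1}^{N} i\,x_i}{N^2\mu} - \frac{N+1}{N} . \\[6pt]
\text{Formula II:}\quad
\sum_{i=1}^{N}\Bigl(\sum_{j=1}^{i} x_j - \frac{x_i}{2}\Bigr)
  &= \sum_{i=1}^{N}\Bigl(N - i + \tfrac{1}{2}\Bigr)x_i , \\
G_{II} &= 1 - \frac{2\sum_{i=1}^{N}\bigl(N - i + \tfrac12\bigr)x_i}{N\mu\cdot N}
  = 1 - \frac{2N+1}{N} + \frac{2\sum_{i=1}^{N} i\,x_i}{N^2\mu}
  = \frac{2\sum_{i=1}^{N} i\,x_i}{N^2\mu} - \frac{N+1}{N} .
\end{aligned}
$$

The first line uses the fact that, once the data are sorted, x_i enters the pairwise differences with a plus sign i - 1 times and with a minus sign N - i times, which is how the absolute values disappear. The third line uses that x_i appears in the cumulative sums for every index from i to N, that is, N - i + 1 times. Under these assumptions both expressions reduce to the same rank-weighted form.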

  • If we're talking about _definitions_ then Gini isn't defined as being about incomes of individuals; that just is a very common application. The coefficient makes sense as characterising any distribution which is strictly positive. There is also no mystery about weights: weights are supplied for use with grouped data, so as you say weights would be identical (conveniently 1) for individuals. You're asking for a computational formula, so I agree that this isn't a programming question. – Nick Cox Oct 18 '17 at 08:10
  • If I can find a source quickly for the derivation, I will post later. If someone is quicker, well and good. – Nick Cox Oct 18 '17 at 08:12
  • Income is just the common example, which I wanted to refer to. Thank you for your quick first answer Nick, I would be glad if you could give me a source for the derivation. – user3617195 Oct 18 '17 at 08:48
  • Lambert, P.J. 2001. _The Distribution and Redistribution of Income._ Manchester University Press is one source. It's not to hand right now, but my recollection from some years back is that Nygård, F. and Sandström, A. 1981. _Measuring income inequality._ Stockholm: Almqvist & Wiksell is really detailed on formulas. – Nick Cox Oct 18 '17 at 12:15
  • Not an answer to your question, but is it possible to compute the Gini coefficient without either 1) sorting the inputs (N log N?) or 2) doing an all-pairs comparison (N^2?)? I have a huge dataset and was wondering whether the algorithm could be parallelized in any way as well. – Marsellus Wallace Dec 18 '20 at 22:44
