I'm having some difficulty calculating the Gini coefficient from binned census data, and would really appreciate any help.
My data looks a little something like this (but with 14,000 observations of 13 variables).
location <- c('A','B','C', 'D', 'E', 'F')
no_income <- c(20, 1, 40, 79, 12, 2)
income1 <- c(13, 4, 56, 17, 9, 4)
income2 <- c(27, 39, 49, 12, 19, 0)
income3 <- c(0, 1, 4, 3, 27, 0)
df <- data.frame(location, no_income, income1, income2, income3)
So for each observation there is a location given, and then a series of columns indicating how many households in the area earn within the given income bracket (so for location A, 20 households earn $0, 13 earn income1, 27 income2, and 0 income3).
I've created an empty column to return the results to:
df$gini = 0
I've then created a numeric vector (x) containing the income amount I want to use for each income bin:
x <- c(0, 300, 1000, 2000)
I've been trying to use the gini function within the reldist package, and have written the following for loop to cycle through each row of the data, apply the gini function and return the output to a new column.
for (i in 1:nrow(df)){
  w <- df[i,2:5]
  df$gini <- gini(x, w=rep(1, length=length(x)))
}
The problem is that the output returned is currently identical for every row, which is obviously not correct. I'm relatively new to this, though, and not sure what I'm doing wrong...
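
For reference, here is a minimal sketch of what the loop seems to be aiming for. It assumes the reldist package is installed and that its gini function accepts the household counts through a weights argument. The bins vector is a helper name introduced here for illustration. Two things differ from the loop above: the counts in w are actually passed to gini as weights (instead of a vector of ones), and the result is written to row i only (instead of overwriting the whole column on every pass).

```r
# Sketch only: assumes reldist is installed and that gini() takes the
# income values first and the household counts via `weights`.
library(reldist)

location  <- c('A', 'B', 'C', 'D', 'E', 'F')
no_income <- c(20, 1, 40, 79, 12, 2)
income1   <- c(13, 4, 56, 17, 9, 4)
income2   <- c(27, 39, 49, 12, 19, 0)
income3   <- c(0, 1, 4, 3, 27, 0)
df <- data.frame(location, no_income, income1, income2, income3)

# Representative income assigned to each bin, in column order
x <- c(0, 300, 1000, 2000)

# Hypothetical helper: the names of the columns holding the bin counts
bins <- c("no_income", "income1", "income2", "income3")

df$gini <- NA_real_
for (i in seq_len(nrow(df))) {
  # Household counts for this location, as a plain numeric vector
  w <- as.numeric(df[i, bins])
  # Weight each bin's income by its count and store the result in row i only
  df$gini[i] <- gini(x, weights = w)
}
```

Note that this treats every household in a bin as earning exactly the representative amount in x, so any inequality within a bin is ignored.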