I have grouped data: households answer a weekly poll, and I observe them over 52 weeks (four weeks in the example below). I want to use the Gini coefficient to quantify how (un)equally poll answers are distributed across households in a given week (0 = all households have answered the same number of polls; 1 = one household has answered all polls).

Example data:

da_poll <- data.frame(household = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), week = c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4), participation = c(1,1,1,1,1,0,0,1,1,1,0,1,1,1,1,0))

da_poll
       household week participation
    1          1    1             1
    2          1    2             1
    3          1    3             1
    4          1    4             1
    5          2    1             1
    6          2    2             0
    7          2    3             0
    8          2    4             1
    9          3    1             1
    10         3    2             1
    11         3    3             0
    12         3    4             1
    13         4    1             1
    14         4    2             1
    15         4    3             1
    16         4    4             0

As a first step, I computed the Gini coefficient of the cumulative number of answered polls for every week:

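library(dplyr)      # group_by(), mutate(), %>%
library(DescTools)  # Gini()

da_poll <- da_poll %>%
   # cumulative number of polls answered by each household up to each week
   group_by(household) %>%
   mutate(n_polls = cumsum(participation == 1)) %>%
   # Gini coefficient of those cumulative counts across households, per week
   group_by(week) %>%
   mutate(gini_polls = Gini(n_polls))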

da_poll
# A tibble: 16 x 5
# Groups:   week [4]
   household  week participation n_polls gini_polls
       <dbl> <dbl>         <dbl>   <int>      <dbl>
 1         1     1             1       1      0    
 2         1     2             1       2      0.143
 3         1     3             1       3      0.259
 4         1     4             1       4      0.167
 5         2     1             1       1      0    
 6         2     2             0       1      0.143
 7         2     3             0       1      0.259
 8         2     4             1       2      0.167
 9         3     1             1       1      0    
10         3     2             1       2      0.143
11         3     3             0       2      0.259
12         3     4             1       3      0.167
13         4     1             1       1      0    
14         4     2             1       2      0.143
15         4     3             1       3      0.259
16         4     4             0       3      0.167
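
As a check on where these numbers come from, here is the week-2 value worked by hand. This assumes that `Gini()` from DescTools applies its small-sample correction by default (`unbiased = TRUE`); the symbols n, x_i and x-bar are introduced here for illustration. In week 2 the cumulative counts are x = (2, 1, 2, 2), so n = 4 and the mean is 1.75.

```latex
G \;=\; \frac{n}{n-1}\cdot\frac{\sum_{i=1}^{n}\sum_{j=1}^{n}\lvert x_i - x_j\rvert}{2\,n^{2}\,\bar{x}}
  \;=\; \frac{4}{3}\cdot\frac{6}{2\cdot 16\cdot 1.75}
  \;=\; \frac{4}{3}\cdot\frac{6}{56}
  \;=\; \frac{1}{7}\;\approx\; 0.143
```

The double sum is 6 because only the three pairs involving household 2 differ (by 1 each), and each pair is counted twice. With this correction the coefficient reaches exactly 1 when a single household has answered all polls, which matches the interpretation given above.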

Now I want to add a second variable indicating the change in the Gini coefficient caused by a household participating in the poll in a given week (Gini after household h fills out the poll at week w, minus Gini before household h fills out the poll at week w). How can I compute this?

Scijens
  • What do you mean by "... through a household ... in the poll"? Do you want the week-to-week change if the household participated in the poll in both weeks? In every week? In at least one week, or conditional on the participation column being 1? – Manuel R May 23 '20 at 13:34
  • I want the change in the Gini coefficient caused by the household's participation in the poll in that week. I think there are two options: a) with or b) without considering the participation of other households in that week. It would be great to have both. I hope this answers your question. – Scijens May 23 '20 at 16:01

1 Answer

I can't reproduce your work, because you used a Gini function without saying which package it comes from. So I'm just starting from your result.

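library(readr)  # read_table()
library(dplyr)  # select(), group_by(), mutate(), lag()

# "C" is a dummy header for the printed row numbers; it is dropped below
da_poll2 <- read_table("C  household  week participation n_polls gini_polls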
 1         1     1             1       1      0    
 2         1     2             1       2      0.143
 3         1     3             1       3      0.259
 4         1     4             1       4      0.167
 5         2     1             1       1      0    
 6         2     2             0       1      0.143
 7         2     3             0       1      0.259
 8         2     4             1       2      0.167
 9         3     1             1       1      0    
10         3     2             1       2      0.143
11         3     3             0       2      0.259
12         3     4             1       3      0.167
13         4     1             1       1      0    
14         4     2             1       2      0.143
15         4     3             1       3      0.259
16         4     4             0       3      0.167") %>% 
  select(- C)

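da_poll2 %>% 
  arrange(household, week) %>%   # lag() relies on rows being in week order
  group_by(household) %>% 
  # week-over-week change in the overall Gini coefficient
  mutate(prevGini = lag(gini_polls),
         deltaGini = gini_polls - prevGini ) %>%
  ungroup()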

This gives us

# A tibble: 16 x 7
   household  week participation n_polls gini_polls prevGini deltaGini
       <dbl> <dbl>         <dbl>   <dbl>      <dbl>    <dbl>     <dbl>
 1         1     1             1       1      0       NA        NA    
 2         1     2             1       2      0.143    0         0.143
 3         1     3             1       3      0.259    0.143     0.116
 4         1     4             1       4      0.167    0.259    -0.092
 5         2     1             1       1      0       NA        NA    
 6         2     2             0       1      0.143    0         0.143
 7         2     3             0       1      0.259    0.143     0.116
 8         2     4             1       2      0.167    0.259    -0.092
 9         3     1             1       1      0       NA        NA    
10         3     2             1       2      0.143    0         0.143
11         3     3             0       2      0.259    0.143     0.116
12         3     4             1       3      0.167    0.259    -0.092
13         4     1             1       1      0       NA        NA    
14         4     2             1       2      0.143    0         0.143
15         4     3             1       3      0.259    0.143     0.116
16         4     4             0       3      0.167    0.259    -0.092
David T
  • Hi David, thanks for your solution. The package I am using to calculate the Gini coefficient is DescTools. I think the way you have coded this, the new variable captures the change in the Gini coefficient through the participation of all households, but I want the incremental effect of one particular household's participation on the Gini coefficient. So the question is: what would the Gini coefficient be if a particular household had not participated in a given week (although in fact it did)? We then take the difference from the actual Gini to get deltaGini. – Scijens May 23 '20 at 16:04
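
The counterfactual described in this last comment could be sketched roughly as follows. This is only one possible reading (option a from the question's comments: the other households' participation in that week is kept as observed), not a confirmed solution. The names `da`, `gini_unbiased` and `gini_without` are made up for illustration, and `gini_unbiased` is a base-R stand-in that is assumed to match `DescTools::Gini()` with its default `unbiased = TRUE`. The participation values are the ones shown in the result table.

```r
# Sample-corrected Gini coefficient (assumed equivalent to DescTools::Gini(x)
# with unbiased = TRUE); returns NA when it is undefined.
gini_unbiased <- function(x) {
  n <- length(x)
  if (n < 2 || sum(x) == 0) return(NA_real_)
  g <- sum(abs(outer(x, x, "-"))) / (2 * n^2 * mean(x))
  g * n / (n - 1)
}

# Participation as it appears in the result table (households 1-4, weeks 1-4)
da <- data.frame(
  household     = rep(1:4, each = 4),
  week          = rep(1:4, times = 4),
  participation = c(1,1,1,1, 1,0,0,1, 1,1,0,1, 1,1,1,0)
)

# Cumulative polls per household and the actual Gini per week
da$n_polls    <- ave(da$participation, da$household, FUN = cumsum)
da$gini_polls <- ave(da$n_polls, da$week, FUN = gini_unbiased)

# Hypothetical helper: Gini in row i's week if that household had skipped the
# poll in that week, with all other households left as observed
gini_without <- function(i) {
  counts <- da$n_polls
  counts[i] <- counts[i] - da$participation[i]
  gini_unbiased(counts[da$week == da$week[i]])
}

da$gini_without <- vapply(seq_len(nrow(da)), gini_without, numeric(1))
da$deltaGini    <- da$gini_polls - da$gini_without
```

Rows with `participation == 0` get a `deltaGini` of 0, since nothing changes in the counterfactual. For option b (ignoring the other households' participation in that week), the same idea should work with the previous week's counts as the baseline and only the one household's count incremented; note that the Gini coefficient is undefined when all counts are 0, as happens before week 1.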