0

I have got a data frame like the following:

region        group        mid_pop   
  1             2            1146      
  2             4            1682        
  3             3            2891       
  4             1            7654       
  5             1            3289       
  6             2            1128       
  7             3            2121       
  8             4            3217      
  9             3            1616       
 10             1            1717              

I ran a multinomial regression and got the probability of belonging to each group as follows:

 mlogit <- multinom(group ~ mid_pop)
 probs <- predict(mlogit, type="probs")  

   probs1    probs2   probs3    probs4     
     0.2       0.3     0.4       0.1        
     0.3       0.4     0.15      0.15       
     0.4       0.1     0.3       0.2        
     0.7       0.1     0.1       0.1        
     0.2       0.3     0.4       0.1        
     0.6       0.1     0.1       0.2        
     0.7       0.1     0.1       0.1        
     0.3       0.2     0.1       0.4        
     0.2       0.1     0.1       0.6        
     0.1       0.2     0.1       0.6   

Then I created a weight for each region. The weight is "the probability of belonging to group one divided by the probability of belonging to the current group the region is in". The weight then was multiplied by mid_pop.

region        group        mid_pop    weight      mid_pop(weighted)
  1             2            1146      0.66           756.36
  2             4            1682       2              3364
  3             3            2891       2              5782
  4             1            7654       0.7            5357.8
  5             1            3289       0.2            657.8
  6             2            1128       0.3            338.4
  7             3            2121       0.7            1484.7
  8             4            3217       0.75           2412.75
  9             3            1616       0.33           533.28
 10             1            1717       0.16           274.72     

Now I would like to do a standardized mean difference for groups and see the difference between the mean of mid_pop before and after weighting. The result will be something like this:

SDM (group 1 vs. group 2)=....
SDM (group 1 vs. group 3)=....
SDM (group 1 vs. group 4)= ....

Anyone can help us to do it? Thanks in advance.

Eshmel
  • 49
  • 2
  • 9

1 Answers1

1

Use group_by of tidyverse library

library(tidyverse)

df <-
  tibble(
    region = 1:10,
    group = c(2, 4, 3, 1, 1, 2, 3, 4, 3, 1),
    mid_pop = c(1146, 1682, 2891, 7654, 3289, 1128, 2121, 3217, 1616, 1717)
  ) # your data set

weight <- c(.66, 2, 2, .7, .2, .3, .7, .75, .33, .16)

df_wt <-
  df %>%
  bind_cols(weight = weight) %>%
  mutate(weighted = mid_pop * weight) %>% # your second data set: mid_pop(weighted)
  group_by(group) %>%
  summarise(pop = mean(weighted)) # average

## > df_wt
## # A tibble: 4 x 2
##   group   pop
##   <dbl> <dbl>
## 1     1 2097.
## 2     2  547.
## 3     3 2600.
## 4     4 2888.

outer function with "-" operation might give pairwise difference

wt_pop <- df_wt %>% select(pop) %>% pull()

outer(wt_pop, wt_pop, "-") # symmetric matrix for the answer

##            [,1]     [,2]       [,3]       [,4]
## [1,]     0.0000 1549.393  -503.2200  -791.6017
## [2,] -1549.3933    0.000 -2052.6133 -2340.9950
## [3,]   503.2200 2052.613     0.0000  -288.3817
## [4,]   791.6017 2340.995   288.3817     0.0000

Alternatively, you can consecutively apply outer.
You need to change it to data frame with function such as as.data.frame()

df %>%
  bind_cols(weight = weight) %>%
  mutate(weighted = mid_pop * weight) %>%
  group_by(group) %>%
  summarise(pop = mean(weighted)) %>%
  do(outer(.$pop, .$pop, "-") %>% as_tibble())

## # A tibble: 4 x 4
##       V1    V2     V3     V4
##    <dbl> <dbl>  <dbl>  <dbl>
## 1     0  1549.  -503.  -792.
## 2 -1549.    0  -2053. -2341.
## 3   503. 2053.     0   -288.
## 4   792. 2341.   288.     0 
younggeun
  • 923
  • 1
  • 12
  • 19