
I'm trying to use the srvyr package in R. The data comes from these links:

Expenses: https://cdn.bancentral.gov.do/documents/estadisticas/encuesta-de-gastos-e-ingresos/documents/Cuadros_Gastos.xlsx?v=1689283267553

Income: https://cdn.bancentral.gov.do/documents/estadisticas/encuesta-de-gastos-e-ingresos/documents/Cuadros_Ingresos.xlsx?v=1689283267553

Since this is a household survey, I merge these two datasets.

library(readxl)
library(dplyr)
library(srvyr)

Ingresos <- read_excel("Sociodemograficas_e_Ingresos.xlsx", sheet = "Base")

gastos <- read_excel("Gasto_consumo_final_mensual.xlsx")

The expansion factor is supposed to give the population estimate, which for the Dominican Republic is over 10,000,000.

If I sum the FACTOR_EXPANSION variable, I get exactly that amount. However, when I create my survey object, I don't get the population estimate.

Ingresos_Filtrado <- Ingresos %>%
  select(A204, A303, A302, A402, A404, A405, A410, GRUPO_RAMA, GRUPO_OCUPACION,
         GRUPO_CATEGORIA, GRUPO_EDAD, GRUPO_EDUCACION, GRUPO_SECTOR, GRUPO_EMPLEO, ESCOLARIDAD,
         SALARIO_PRINCIPAL, A201, A202A, A202B, A202C, A202D, A207, A208, A212, A221, GRUPO_REGION,
         DES_PROVINCIA, DES_MUNICIPIO, A206, A213, A218, A219, A224, A303, A309, CALLES_ASFALTADAS, ESTRATO,
         ALUMBRADO_PUBLICO, FACTOR_EXPANSION, VIVIENDA, HOGAR, MIEMBRO, UPM, PET, PEA, QUINTIL, TRIMESTRE, REPLICA, ORDEN_REGION, A102, A401, A401A)


Union_Ingresos_Gastos <- merge(
  x = Ingresos_Filtrado,
  y = gastos,
  by = c("TRIMESTRE", "REPLICA", "UPM", "FACTOR_EXPANSION", "VIVIENDA", "HOGAR", "ORDEN_REGION", "QUINTIL"),
  all.x = TRUE
)

survey_2 <- Union_Ingresos_Gastos %>% 
        as_survey_design(ids=UPM,strata=ESTRATO,weigths=FACTOR_EXPANSION,nest=TRUE)

survey_2  %>% group_by(QUINTIL) %>%
    summarise(total = survey_total(A401A,level=0.95,na.rm=TRUE)) %>% mutate(Total = sum(total))

The result is:

# A tibble: 5 × 4
  QUINTIL   total total_se   Total
    <dbl>   <dbl>    <dbl>   <dbl>
1       1 2213294   82873. 8513968
2       2 2127726   80353. 8513968
3       3 1765902   65479. 8513968
4       4 1483914   71456. 8513968
5       5  923132   54371. 8513968

With this code, the estimated population total is 8,513,968.
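As a point of comparison, here is a minimal base-R sketch on made-up data. The data frame `toy` and its columns `quintil`, `w`, and `y` are hypothetical and do not come from the ENGIH files; `w` stands in for FACTOR_EXPANSION and `y` for the variable being totalled. It illustrates that a weighted total of a variable is `sum(w * y)`, which matches the population estimate `sum(w)` only when `y` equals 1 on every row, and that dropping rows with missing `y` also drops their weight.

```r
# Hypothetical toy data: w plays the role of FACTOR_EXPANSION,
# y plays the role of the variable whose total is estimated.
toy <- data.frame(
  quintil = c(1, 1, 2, 2, 3),
  w       = c(100, 200, 150, 250, 300),
  y       = c(1, NA, 1, 0, 1)
)

# Population estimate: the sum of the weights over all rows.
pop_total <- sum(toy$w)

# Weighted total of y, skipping rows where y is missing.
keep <- !is.na(toy$y)
weighted_total <- sum(toy$w[keep] * toy$y[keep])

# The same weighted total, split by quintile.
by_quintil <- tapply(toy$w[keep] * toy$y[keep], toy$quintil[keep], sum)

print(pop_total)
print(weighted_total)
print(by_quintil)
```

In this toy case the two numbers differ because one row has a missing `y` and another has `y = 0`, so neither contributes to the weighted total.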

I need help, because when I sum the weights directly, without the survey object, I get the expected population total:

sum(Ingresos$FACTOR_EXPANSION)
[1] 10,299,551

Is the problem merging the two datasets?
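One way to check this, sketched in base R on made-up data: compare the row count and the sum of the weights before and after the merge, and look for duplicated keys on the right-hand side. The data frames `personas` and `hogares` and the key `hogar` are hypothetical stand-ins for `Ingresos_Filtrado`, `gastos`, and the real key columns. With `all.x = TRUE`, `merge` keeps every left row, repeats it once per matching right row, and fills unmatched rows with `NA`.

```r
# Hypothetical stand-ins: personas ~ Ingresos_Filtrado, hogares ~ gastos.
personas <- data.frame(hogar = c(1, 1, 2, 3), w = c(100, 100, 200, 300))
hogares  <- data.frame(hogar = c(1, 2, 2), gasto = c(50, 70, 30))

unido <- merge(x = personas, y = hogares, by = "hogar", all.x = TRUE)

# Row count and weight total before and after the merge.
rows_before <- nrow(personas)
rows_after  <- nrow(unido)
w_before    <- sum(personas$w)
w_after     <- sum(unido$w)

# Keys that appear more than once on the right-hand side duplicate left rows.
dup_keys <- unique(hogares$hogar[duplicated(hogares$hogar)])

# Left rows with no match on the right get NA in the right-hand columns.
unmatched <- sum(is.na(unido$gasto))

print(c(rows_before = rows_before, rows_after = rows_after))
print(c(w_before = w_before, w_after = w_after))
print(dup_keys)
print(unmatched)
```

If the row count and the weight sum are identical before and after, the merge is neither duplicating nor dropping records, and the discrepancy has to come from somewhere else.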

Or perhaps I need another argument in as_survey_design?

Help!

  • looking at your `Union_Ingresos_Gastos` object, what records are missing on one side of the merge? – Anthony Damico Jul 14 '23 at 18:38
  • could you point me to the place in https://cdn.bancentral.gov.do/documents/estadisticas/encuesta-de-gastos-e-ingresos/documents/ENGIH_2018.pdf?v=1689359502661 that you got the structure of your `as_survey_design` ? – Anthony Damico Jul 14 '23 at 18:38
  • If that's really your code, you have misspelled `weights` in the declaration of the survey design – Thomas Lumley Jul 18 '23 at 00:53

0 Answers