Frequency tables by groups with weighted data in R

Question

I wish to calculate two kind of frequency tables by groups with weighted data.

You can generate reproducible data with the following code :

Data <- data.frame(
     country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
     migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
     gender = sample (c("men", "women"), 100, replace = TRUE),
     wgt = sample(100),
     year = sample(2006:2007)
     )

Firstly, I try to calculate a frequency table of migrant status (Native VS Foreign-born) by country and year. I wrote the following code using the packages questionr and plyr :

db2006 <- subset (Data, year == 2006)
db2007 <- subset (Data, year == 2007)

result2006 <- as.data.frame(cprop(wtd.table(db2006$migrant, db2006$country, weights=db2006$wgt),total=FALSE))
result2007 <- as.data.frame(cprop(wtd.table(db2007$migrant, db2007$country, weights=db2007$wgt),total=FALSE))

result2006<-rename (result2006, c(Freq = "y2006"))
result2007<-rename (result2007, c(Freq = "y2007"))

result <- merge(result2006, result2007, by = c("Var1","Var2"))

In my real database, I have 10 years so it takes times to apply this code for all the years. Does anyone know a faster way to do it ?

I also wish to calculate the share of women and men among migrant status by country and year. I am looking for something like :

Var1            Var2     Var3     y2006   y2007
Foreign born    France   men        52     55
Foreign born    France   women      48     45
Native          France   men        51     52
Native          France   women      49     48
Foreign born    UK       men        60     65
Foreign born    UK       women      40     35
Native          UK       men        48     50
Native          UK       women      52     50

Does anyone have an idea of how I can get these results?

ulfelder · Answer 1 · 2016-10-19T18:20:13.830

1

You could do this by: making a function with the code you've already written; using lapply to iterate that function over all years in your data; then using Reduce and merge to collapse the resulting list into one data frame. Like this:

# let's make your code into a function called 'tallyho'
tallyho <- function(yr, data) {

  require(dplyr)
  require(questionr)

  DF <- filter(data, year == yr)

  result <- with(DF, as.data.frame(cprop(wtd.table(migrant, country, weights = wgt), total = FALSE)))

  # rename the last column by year
  names(result)[length(names(result))] <- sprintf("y%s", year)

  return(result)

}

# now iterate that function over all years in your original data set, then 
# use Reduce and merge to collapse the resulting list into a data frame
NewData <- lapply(unique(Data$year), function(x) tallyho(x, Data)) %>%
  Reduce(function(...) merge(..., all=T), .)

edited Oct 19 '16 at 18:20

answered Oct 19 '16 at 12:19

ulfelder

5,305
1
22
40

TIL about `Reduce()` – roman Oct 19 '16 at 13:06
Many thanks @ulfelder for the answer, but I had some trouble with it. When I run the code, I get exactly the same result for 2006 and 2007 which is not correct....Do you know how I can improve it ? Do you know how I can add the information on gender ? – David Marguerit Oct 19 '16 at 15:55
Sorry, try the edited version I just posted. I think I was confusing `dplyr` by giving the function input the same name as a column. Unfortunately, I don't think you can add gender to this approach, as `wtd.table` appears only to allow two-way crosstabs. And I don't know enough about what those weights are doing to suggest an alternative solution. – ulfelder Oct 19 '16 at 18:23

Frequency tables by groups with weighted data in R

1 Answers1