
I am trying to run a Oaxaca decomposition using the oaxaca package, but the inclusion of certain variables seems to trigger the error "non-conformable arguments." As far as I can tell, the error seems to only arise with the inclusion of certain factor/categorical variables, but not all factor/categorical variables.

Here is a minimal reproducible example of my dataset, wvs_reduc:

wvs_reduc <- structure(list(emp = c(1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 
1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0), education = structure(c(4L, 3L, 2L, 2L, 3L, 
3L, 2L, 6L, 4L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 4L, 4L, 1L, 2L, 4L, 
4L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 3L, 2L, 4L, 4L, 4L, 3L, 
2L, 4L, 3L), .Label = c("No Formal Education", "Primary or Less", 
"Incomplete Secondary", "Secondary", "Incomplete University", 
"University or More"), class = "factor"), marital = structure(c(1L, 
1L, 3L, 3L, 1L, 3L, 3L, 1L, 1L, 3L, 3L, 1L, 3L, 4L, 3L, 1L, 1L, 
4L, 3L, 1L, 3L, 4L, 1L, 3L, 3L, 3L, 3L, 1L, 3L, 4L, 4L, 4L, 4L, 
3L, 3L, 4L, 3L, 3L, 4L, 3L), .Label = c("single", "cohabiting", 
"married", "previously married"), class = "factor"), Arab = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-40L), class = c("tbl_df", "tbl", "data.frame"))

When I run the command:

library(oaxaca)
oaxaca(emp ~ education + marital | Arab, 
       data = wvs_reduc, group.weights = 0, R = 10)

I get the error message: Error in t(x.mean.A) %*% delta.A : non-conformable arguments.

In case it is relevant, when I run the command on my larger dataset, I instead get a similar but not-identical error with the inclusion of the variable "marital" but not "education" or other factor variables:

Error in t(x.mean.A - x.mean.B) %*% beta.B : non-conformable arguments

Ella Wind
  • Hmm, OK, the error occurs because one of your factors ended up with only one observation in the bootstrap. So the error occurs in a really buggy part of the source code, which assumes a matrix, but if you have n=1, it's a vector. – StupidWolf Mar 25 '20 at 20:45
  • This is the underlying wrapper, oaxaca:::.oaxaca.wrap, and the part that fails is this bunch of lines: E <- as.numeric(t(x.mean.A - x.mean.B) %*% beta.B) ... – StupidWolf Mar 25 '20 at 20:47
  • It's unlikely you can get around this. The question for you now is: do you need the bootstrap? – StupidWolf Mar 25 '20 at 20:48
  • Hmm... so I set it to not do bootstrapping. With the reduced dataset I posted here, that did not solve the problem, but with my larger dataset it let me add one of my previously failing variables, though not the other one. If the problem is having n=1, do you think collapsing some of the categories of my categorical variables could help? – Ella Wind Mar 26 '20 at 14:07
  • Yes, it would work without the bootstrap. Set R=1. For example, in the example you provided, Arab is all one, so it will not work. You can always randomly sample your variables to confirm that nothing else is wrong with your data. – StupidWolf Mar 26 '20 at 14:21

1 Answer

In the underlying code, oaxaca:::.oaxaca.wrap, the part that fails is this bunch of lines:

E <- as.numeric(t(x.mean.A - x.mean.B) %*% beta.B)
C <- as.numeric(t(x.mean.B) %*% (beta.A - beta.B))
I <- as.numeric(t(x.mean.A - x.mean.B) %*% (beta.A - beta.B))
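For context, `%*%` raises "non-conformable arguments" whenever the dimensions of its two operands do not line up. The base R sketch below shows that behaviour with made-up numbers. The names only mirror those in the package code, the values are hypothetical, and it does not reproduce what oaxaca does internally:

```r
# Hypothetical values: three regressor means and three coefficients.
x_mean <- c(1, 0.5, 0.2)
beta_ok <- c(0.1, 0.2, 0.3)

# Lengths match, so the 1 x 3 row times the length-3 vector gives a scalar.
ok <- as.numeric(t(x_mean) %*% beta_ok)

# Assumption for illustration: one coefficient is missing (for example
# because a factor level was absent), so the lengths no longer match.
beta_short <- c(0.1, 0.2)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  as.numeric(t(x_mean) %*% beta_short),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The point is only that a mismatch between the number of means and the number of coefficients is enough to trigger this message.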

If any of these terms, such as x.mean.A, is a vector rather than the matrix the code expects, the product throws an error. Looking at the design in this example dataset:

table(wvs_reduc$education, wvs_reduc$Arab)

                         0  1
  No Formal Education    0  2
  Primary or Less        2 10
  Incomplete Secondary   4  3
  Secondary             14  4
  Incomplete University  0  0
  University or More     0  1

The levels with all-zero cells will be dropped, so I would say you need to ensure that the levels are distributed across both groups of your grouping variable. We can confirm this by simulating the variables:

set.seed(111)
wvs_reduc$test_education <- sample(levels(wvs_reduc$education), nrow(wvs_reduc), replace = TRUE)
wvs_reduc$test_marital <- sample(levels(wvs_reduc$marital), nrow(wvs_reduc), replace = TRUE)

We run this and turn off bootstrap:

oaxaca(emp ~ test_education + test_marital | Arab, data = wvs_reduc, R = NULL)

If we turn the bootstrap on, it crashes, because a bootstrap resample can run into the same problem:

oaxaca(emp ~ test_education + test_marital | Arab, data = wvs_reduc, R = 2)
oaxaca: oaxaca() performing analysis. Please wait.

Bootstrapping standard errors:
1 / 2 (50%)
Error in t(x.mean.A) %*% delta.A : non-conformable arguments
In addition: There were 11 warnings (use warnings() to see them)

So for it to work on your whole data frame, you need to check whether any factor level has n=1 (or n=0) within either group.
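One way to run that check is sketched below. `sparse_cells()` is a hypothetical helper, not part of the oaxaca package. The toy data frame is rebuilt from the education-by-Arab counts in the table above, and the two-level recoding at the end is only an illustration of collapsing categories, not a recommendation for your analysis:

```r
# Hypothetical helper (not from the oaxaca package): cross-tabulate each
# factor against the grouping variable and list the cells that contain
# at most `min_n` observations.
sparse_cells <- function(data, factors, group, min_n = 1) {
  out <- lapply(factors, function(f) {
    tab <- as.data.frame(table(level = data[[f]], group = data[[group]]),
                         stringsAsFactors = FALSE)
    tab <- tab[tab$Freq <= min_n, , drop = FALSE]
    if (nrow(tab) == 0) return(NULL)
    data.frame(variable = f, tab, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, out)
  if (is.null(out)) {
    # Nothing flagged: return an empty data frame with the same columns.
    return(data.frame(variable = character(0), level = character(0),
                      group = character(0), Freq = integer(0)))
  }
  rownames(out) <- NULL
  out
}

# Toy data rebuilt from the education-by-Arab counts shown in the table.
lev <- c("No Formal Education", "Primary or Less", "Incomplete Secondary",
         "Secondary", "Incomplete University", "University or More")
n0 <- c(0, 2, 4, 14, 0, 0)   # counts for Arab == 0
n1 <- c(2, 10, 3, 4, 0, 1)   # counts for Arab == 1
toy <- data.frame(
  education = factor(c(rep(lev, n0), rep(lev, n1)), levels = lev),
  Arab = rep(c(0, 1), c(sum(n0), sum(n1)))
)

# Cells with zero or one observation.
flagged <- sparse_cells(toy, "education", "Arab")
print(flagged)

# Illustrative recoding (an assumption, chosen only to fill every cell):
# collapse the six levels into two broader ones.
toy$education2 <- toy$education
levels(toy$education2) <- list(
  "Primary or Less" = c("No Formal Education", "Primary or Less"),
  "Secondary or More" = c("Incomplete Secondary", "Secondary",
                          "Incomplete University", "University or More")
)
flagged_after <- sparse_cells(toy, "education2", "Arab")
print(table(toy$education2, toy$Arab))
```

On your real data you would call something like `sparse_cells(wvs_reduc, c("education", "marital"), "Arab")`. Passing this check does not guarantee that the bootstrap will run, since a resample can still leave a small cell empty, so a larger `min_n` may be the safer threshold.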

StupidWolf