Manage categorical variables with NA in R

Question

I am using a national survey to run my regression. The data frame is based on demographic and economic variables, and sometimes there are missing values, which R represents as "NA".

I have categorical variables, but sometimes I run into problems. For example, I have a variable q which takes the value 1 if the individual is an employee, 2 if he/she is a worker but not an employee, and 3 if the person doesn't work.

I also know that employees can work in the private or public sector; the problem is that sometimes I don't know whether an employee works in the private or the public sector (I have NA).

I want to construct a categorical variable that takes into account whether the employee is in the private or public sector:

d.d$q2 <- ifelse(d.d$q=="3",1,
          ifelse(d.d$q=="2",2,
          ifelse(d.d$q=="1" & d.d$priv=="1",3,
          ifelse(d.d$q=="1" & d.d$pubbl=="1",4,
          0))))

d.d$q2 <- as.factor(d.d$q2)
levels(d.d$q2)
# "0" "1" "2" "3" "4"

I suppose level 0 refers to employees for whom I don't know the working sector (private or public).

My desired output is to get only levels 1, 2, 3, 4 and drop level 0; I searched the web, but the only solution I found is to drop observations.
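For illustration, here is a minimal sketch of one way to end up with only levels 1 to 4. It assumes the unknown-sector employees should become NA rather than 0, and it uses a made-up toy data frame in place of the survey. Since `NA == "1"` evaluates to NA (and `ifelse` passes that NA through), the tests below use `%in%`, which returns FALSE for missing values.

```r
# Toy data standing in for the survey (values are invented for illustration)
d.d <- data.frame(
  q     = c("1", "1", "1", "2", "3"),
  priv  = c("1", "0", NA,  NA,  NA),
  pubbl = c("0", "1", NA,  NA,  NA)
)

# %in% never returns NA, so every test is strictly TRUE or FALSE
d.d$q2 <- ifelse(d.d$q %in% "3", 1,
          ifelse(d.d$q %in% "2", 2,
          ifelse(d.d$q %in% "1" & d.d$priv  %in% "1", 3,
          ifelse(d.d$q %in% "1" & d.d$pubbl %in% "1", 4,
                 NA))))   # anything else (e.g. unknown sector) becomes NA, not 0

# Declare the levels explicitly so no stray level can appear
d.d$q2 <- factor(d.d$q2, levels = 1:4)
levels(d.d$q2)
```

With this coding, the third toy row (an employee with missing priv and pubbl) is NA in q2 instead of forming a separate level 0.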

Just one more question: if I create four dummies from the variable q2:

d.d$not_worker <- ifelse(d.d$q2=="1",1,0)
d.d$public_employee <- ifelse(d.d$q2=="4",1,0)
d.d$private_employee <- ifelse(d.d$q2=="3",1,0)
d.d$worker_not_employee <- ifelse(d.d$q2=="2",1,0)

and then convert all of them to factors with as.factor(), would running a regression that omits the variable d.d$not_worker be a solution?

d.d$not_worker <- as.factor(d.d$not_worker)
d.d$public_employee <- as.factor(d.d$public_employee)
d.d$private_employee <- as.factor(d.d$private_employee)
d.d$worker_not_employee <- as.factor(d.d$worker_not_employee)

eq1 <- lm(PIP ~ public_employee + private_employee + worker_not_employee, data=d.d)
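As a point of comparison, the sketch below passes the factor q2 to lm() directly instead of building the dummies by hand; lm() expands a factor into indicator columns itself and leaves out the reference level. The toy data and the level labels are assumptions made for illustration, not taken from the survey.

```r
set.seed(1)

# Toy data: PIP is random noise, q2 has the four categories of interest
d.d <- data.frame(
  PIP = rnorm(8),
  q2  = factor(c(1, 2, 3, 4, 1, 2, 3, 4), levels = 1:4,
               labels = c("not_worker", "worker_not_employee",
                          "private_employee", "public_employee"))
)

# Make "not_worker" the omitted (reference) category
d.d$q2 <- relevel(d.d$q2, ref = "not_worker")

# One term in the formula; lm() creates the three contrasts automatically
eq1 <- lm(PIP ~ q2, data = d.d)
names(coef(eq1))
```

The coefficient names show one intercept plus one coefficient per non-reference category, which should match what the three hand-made dummies estimate when the same category is omitted.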

Thank you in advance

Why do you want to drop the NA category? Include it in your model. — Roland, Nov 08 '16 at 14:30
I wouldn't drop them but I got the 0 level which is useless for me since I want: not worker / worker not employee / public employee / private employee. — Laura R., Nov 08 '16 at 14:31
OK, thank you; so I should add na.action = na.omit in my lm regression — Laura R., Nov 08 '16 at 14:33
I never said that (although if you don't have any covariates you could do so). However, checking if the NA category is different from or similar to other categories might be useful. — Roland, Nov 08 '16 at 14:35
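A rough sketch of what keeping the NA category in the model could look like, assuming that NA in q2 means "employee, sector unknown". The data are invented and the label `unknown_sector` is a hypothetical name chosen for this example.

```r
# Toy data: two observations have a missing category
d.d <- data.frame(
  PIP = c(10, 12, 11, 15, 14, 13, 9, 16, 12, 11),
  q2  = factor(c(1, 2, 3, 4, NA, 1, 2, 3, 4, NA), levels = 1:4)
)

# Turn NA into a real factor level, then give it a readable label
d.d$q2_full <- addNA(d.d$q2)
levels(d.d$q2_full)[is.na(levels(d.d$q2_full))] <- "unknown_sector"

# How many observations fall into each category, including the unknown one
table(d.d$q2_full)

# All rows are kept, and the unknown group gets its own coefficient
eq_full <- lm(PIP ~ q2_full, data = d.d)
nobs(eq_full)
```

The coefficient on the unknown-sector level can then be compared with those of the other categories to see whether that group looks similar to one of them.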
