I am using a national survey to run my regression. The data frame is based on demographic and economic variables, and sometimes there are missing values, which R represents as NA.
I have categorical variables, but sometimes I run into problems. For example, I have a variable q which takes the value 1 if the individual is an employee, 2 if he/she works but is not an employee, and 3 if the person doesn't work.
I also know that employees can work in the private or the public sector; the problem is that sometimes I don't know which sector an employee works in (the value is NA).
I want to construct a categorical variable that takes into account whether the employee is in the private or the public sector:
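# 1 = not working, 2 = worker but not employee,
# 3 = private-sector employee, 4 = public-sector employee, 0 = none of the above
d.d$q2 <- ifelse(d.d$q=="3",1,
          ifelse(d.d$q=="2",2,
          ifelse(d.d$q=="1" & d.d$priv=="1",3,
          ifelse(d.d$q=="1" & d.d$pubbl=="1",4,
          0))))
d.d$q2 <- as.factor(d.d$q2)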
levels(d.d$q2)
[1] "0" "1" "2" "3" "4"
I suppose level 0 refers to employees for whom I don't know the working sector (private or public).
My desired output is to get only levels 1, 2, 3, 4 and drop level 0; I searched the web, but the only solution I found is to drop the observations.
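To make the desired result concrete, here is a minimal sketch on made-up data. The `toy` data frame and its values are invented for illustration and are not the survey. It assumes that treating level 0 as missing is acceptable, which may or may not be the right choice here:

```r
# Toy data (invented for illustration; not the survey)
toy <- data.frame(q2 = factor(c(0, 1, 2, 3, 4, 3, 0)))

# Recode level "0" as missing instead of deleting the rows
toy$q2[toy$q2 == "0"] <- NA

# Remove the now-unused level "0" from the factor
toy$q2 <- droplevels(toy$q2)

levels(toy$q2)  # "1" "2" "3" "4"
nrow(toy)       # 7, every row is still in the data frame
```

Note that under R's default settings lm() would still leave out the rows where q2 is NA, so this only cleans up the factor levels.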
Just one more question: if I create four dummies from the variable q2:
d.d$not_worker <- ifelse(d.d$q2=="1",1,0)
d.d$public_employee <- ifelse(d.d$q2=="4",1,0)
d.d$private_employee <- ifelse(d.d$q2=="3",1,0)
d.d$worker_not_employee <- ifelse(d.d$q2=="2",1,0)
and then convert all of them to factors with as.factor() and run a regression omitting the variable d.d$not_worker, could that be a solution?
d.d$not_worker <- as.factor(d.d$not_worker)
d.d$public_employee <- as.factor(d.d$public_employee)
d.d$private_employee <- as.factor(d.d$private_employee)
d.d$worker_not_employee <- as.factor(d.d$worker_not_employee)
eq1 <- lm(PIP ~ public_employee + private_employee + worker_not_employee, data=d.d)
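For comparison, here is a minimal sketch on made-up data (the `toy` data frame and the random PIP values are invented here, and q2 is assumed to contain only levels 1 to 4). It illustrates that passing the factor itself to lm(), with level 1 as the reference category, should give the same coefficients as the hand-made dummies with not_worker omitted:

```r
set.seed(1)
# Toy data (invented for illustration; not the survey)
toy <- data.frame(
  q2  = factor(rep(1:4, each = 5)),
  PIP = rnorm(20)
)

# Make "1" (not working) the reference category explicitly
toy$q2 <- relevel(toy$q2, ref = "1")
fit_factor <- lm(PIP ~ q2, data = toy)

# Same model with hand-made numeric 0/1 dummies, omitting not_worker
toy$worker_not_employee <- ifelse(toy$q2 == "2", 1, 0)
toy$private_employee    <- ifelse(toy$q2 == "3", 1, 0)
toy$public_employee     <- ifelse(toy$q2 == "4", 1, 0)
fit_dummies <- lm(PIP ~ worker_not_employee + private_employee + public_employee,
                  data = toy)

# Compare the two sets of coefficients, ignoring their names
all.equal(unname(coef(fit_factor)), unname(coef(fit_dummies)))
```

In this sketch the dummies are left as numeric 0/1 columns rather than converted with as.factor().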
Thank you in advance