I am using data from a national survey to run a regression. The data frame contains demographic and economic variables, and some values are missing, which R represents as "NA".

Some of my categorical variables cause problems. For example, the variable q takes the value 1 if the individual is an employee, 2 if he/she works but is not an employee, and 3 if the person does not work.

Employees can work in either the private or the public sector (variables priv and pubbl), but for some employees I don't know which sector it is (the value is NA).

I want to construct a categorical variable that also records whether an employee works in the private or the public sector:

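```r
# 1 = not working, 2 = worker but not employee,
# 3 = private-sector employee, 4 = public-sector employee, 0 = no condition matched
d.d$q2 <- ifelse(d.d$q == "3", 1,
          ifelse(d.d$q == "2", 2,
          ifelse(d.d$q == "1" & d.d$priv == "1", 3,
          ifelse(d.d$q == "1" & d.d$pubbl == "1", 4,
          0))))

d.d$q2 <- as.factor(d.d$q2)
levels(d.d$q2)
#> [1] "0" "1" "2" "3" "4"
```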

I suppose level 0 refers to employees whose sector (private or public) I don't know.
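A minimal sketch on made-up data (the toy values of q, priv and pubbl below are hypothetical) shows which rows the nested ifelse() maps to 0. In R a comparison with NA returns NA, and ifelse() returns NA wherever its test is NA, so an employee whose priv value is NA ends up as NA rather than 0, even when pubbl is "1". Level 0 therefore collects employees for whom neither priv nor pubbl equals "1" without either being NA.

```r
# Hypothetical toy data: q = 1 employee, 2 worker not employee, 3 not working;
# priv / pubbl flag the sector for employees and may be NA
d.d <- data.frame(
  q     = c("3", "2", "1", "1", "1", "1"),
  priv  = c(NA,  NA,  "1", "0", NA,  "0"),
  pubbl = c(NA,  NA,  "0", "1", "1", "0")
)

d.d$q2 <- ifelse(d.d$q == "3", 1,
          ifelse(d.d$q == "2", 2,
          ifelse(d.d$q == "1" & d.d$priv == "1", 3,
          ifelse(d.d$q == "1" & d.d$pubbl == "1", 4,
          0))))

# Row 5: q == "1" but priv is NA, so the third test is NA and the result is NA.
# Row 6: q == "1" with priv and pubbl both "0", so every test is FALSE and it gets 0.
print(d.d$q2)
table(factor(d.d$q2), useNA = "ifany")
```

Counting with useNA = "ifany" is a quick way to see how many observations fall into level 0 versus NA before deciding how to treat them.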

My desired output has only levels 1, 2, 3 and 4, without level 0. I searched the web, but the only solution I found is to drop those observations.
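Two common ways to get rid of level 0 without deleting rows from the data frame are sketched below on a hypothetical vector: recode "0" to NA and call droplevels(), or keep those rows as an explicitly labelled category. Note that lm() with its default na.action (na.omit) excludes rows with NA in any model variable, so the first option still leaves those observations out of that particular regression. The label names in the second option are illustrative only.

```r
# Hypothetical q2 values, coded as in the question (0 = employee, sector unknown)
q2 <- factor(c(1, 2, 3, 4, 0, 3, 1))
levels(q2)

# Option A: treat the unknown sector as missing and drop the now-empty level
q2_na <- q2
q2_na[q2_na == "0"] <- NA
q2_na <- droplevels(q2_na)
levels(q2_na)

# Option B: keep those rows as their own, explicitly labelled category
# (label names are illustrative)
q2_lab <- factor(q2,
                 levels = c("1", "2", "3", "4", "0"),
                 labels = c("not_worker", "worker_not_employee",
                            "private_employee", "public_employee",
                            "employee_unknown_sector"))
table(q2_lab)
```

Option B keeps every observation in the model at the cost of an extra coefficient for the unknown-sector group; which one is appropriate depends on the analysis.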

One more question: suppose I create four dummies from the variable q2:

```r
d.d$not_worker          <- ifelse(d.d$q2 == "1", 1, 0)
d.d$public_employee     <- ifelse(d.d$q2 == "4", 1, 0)
d.d$private_employee    <- ifelse(d.d$q2 == "3", 1, 0)
d.d$worker_not_employee <- ifelse(d.d$q2 == "2", 1, 0)
```

and then convert each of them to a factor with as.factor(). Would running a regression that omits d.d$not_worker be a solution?

```r
d.d$not_worker          <- as.factor(d.d$not_worker)
d.d$public_employee     <- as.factor(d.d$public_employee)
d.d$private_employee    <- as.factor(d.d$private_employee)
d.d$worker_not_employee <- as.factor(d.d$worker_not_employee)

eq1 <- lm(PIP ~ public_employee + private_employee + worker_not_employee, data = d.d)
```
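Leaving out one dummy makes not_worker the reference category. The sketch below uses simulated data (the outcome PIP here is random, not survey data) to illustrate that this gives the same fitted values and coefficients as entering q2 as a single factor, whose first level "1" is the baseline by default. One caveat: rows with q2 equal to 0 are 0 on all three included dummies, so in the dummy version they would be pooled with the not_worker baseline unless they are recoded or removed first.

```r
set.seed(1)
n <- 200
# Hypothetical data: q2 coded 1-4 as in the question, PIP simulated
d.d <- data.frame(q2 = factor(sample(1:4, n, replace = TRUE), levels = 1:4))
d.d$PIP <- rnorm(n, mean = as.numeric(d.d$q2))

# Dummy approach from the question (not_worker is the omitted category)
d.d$public_employee     <- as.factor(ifelse(d.d$q2 == "4", 1, 0))
d.d$private_employee    <- as.factor(ifelse(d.d$q2 == "3", 1, 0))
d.d$worker_not_employee <- as.factor(ifelse(d.d$q2 == "2", 1, 0))
eq1 <- lm(PIP ~ public_employee + private_employee + worker_not_employee, data = d.d)

# Single-factor approach: level "1" (not_worker) is the baseline
eq2 <- lm(PIP ~ q2, data = d.d)

# Match coefficients by meaning, since the two models name them differently
b_dummy  <- coef(eq1)[c("(Intercept)", "worker_not_employee1",
                        "private_employee1", "public_employee1")]
b_factor <- coef(eq2)[c("(Intercept)", "q22", "q23", "q24")]

all.equal(fitted(eq1), fitted(eq2))
all.equal(unname(b_dummy), unname(b_factor))
```

Using the factor directly avoids building the dummies by hand; relevel() can change the baseline if another category is preferred.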

Thank you in advance

Laura R.