I want to estimate the effect of height on earnings separately for each gender. I split my data into male and female subsets, but when I run lm(earnings~height+education+age, data = data_female) I get this error: Error in model.frame.default(formula = earnings ~ height + education + : variable lengths differ (found for 'education')
Could you either suggest a better way to refine my model or help me fix this particular error?
setwd("~/Google Drive/R Data")
data <- read.csv('data_ass5.csv')
height <- data$height
earnings <- data$earnings
gender <- data$sex
age <- data$age
education <- data$educ
multiple_regression <- lm(earnings~height+age+gender+education,data = data)
lm(earnings~height+age+gender+education,data = data)
summary(multiple_regression)
# summary(linear_regression)  # linear_regression is not defined in this snippet
multiple_regression_redefined <- lm(earnings~age+gender+education,data = data)
# Now I want to assess the impact of gender on earnings in particular,
# so I tried to refine my model as follows,
# but the final lm() line throws an error. Could you advise
# whether this is the correct way to refine it and/or why I am getting the error?
# I also tried adding na.rm=TRUE to the lm() call, but the error persists.
data_female <- subset(data,gender==0)
data_male <- subset(data,gender==1)
lm(earnings~height+education+age, data = data_female)
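
A likely cause, sketched here with simulated data because data_ass5.csv is not available: the subsets contain columns named educ and sex, not education and gender. When lm() cannot find education in data_female, it falls back to the global vector education, which still has one value per row of the full data set, so its length no longer matches the subsetted columns. The sketch below reproduces the error under that assumption and then refits using the data frame's own column names. The simulated values, the sample size n, and the object names fit_female, fit_male and fit_interaction are illustrative assumptions, not part of the original script. Note also that lm() handles missing values through its na.action argument, not na.rm.

```r
# Simulated stand-in for data_ass5.csv (hypothetical values, not real data).
# Column names height, earnings, sex, age, educ are taken from the script above.
set.seed(1)
n <- 40
data <- data.frame(
  height = rnorm(n, mean = 170, sd = 10),
  age    = sample(20:60, n, replace = TRUE),
  educ   = sample(8:18, n, replace = TRUE),
  sex    = rep(c(0, 1), each = n / 2)
)
data$earnings <- 1000 + 50 * data$height + 300 * data$educ + rnorm(n, sd = 500)

# Global copies, as in the original script.
education <- data$educ
gender    <- data$sex

data_female <- subset(data, gender == 0)

# Reproduce the problem: data_female has no 'education' column, so lm() picks
# up the global vector, which still has all n values instead of n / 2.
err <- tryCatch(
  lm(earnings ~ height + education + age, data = data_female),
  error = function(e) conditionMessage(e)
)
print(err)

# Possible fix: refer only to columns that exist in the data frame, so every
# variable is looked up inside the subset passed via 'data'.
data_female <- subset(data, sex == 0)
data_male   <- subset(data, sex == 1)
fit_female  <- lm(earnings ~ height + educ + age, data = data_female)
fit_male    <- lm(earnings ~ height + educ + age, data = data_male)
summary(fit_female)
summary(fit_male)

# Alternative sketch: keep one model on the full data and let the height
# slope differ by sex through an interaction term.
fit_interaction <- lm(earnings ~ height * factor(sex) + educ + age, data = data)
summary(fit_interaction)
```

Fitting separate models gives one height coefficient per group, while the interaction model estimates the difference between the two height slopes directly, which may be more convenient if the goal is to compare them.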