I respectfully disagree with all of the t-test-like answers above. The OP mentions he is interested in the difference in weight between domestic and foreign cars and wants to determine weight:
"...based on some variables in a given dataset"
The question is thus about weight differences between domestic and foreign cars, controlling for other car characteristics. A t-test does not allow for that, while regression (or ANOVA) does.
Let's use the mtcars dataset and assume that cars with V-shaped engines (vs == 0) are US cars and cars with straight engines (vs == 1) are European ('foreign') cars.
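If you prefer readable labels in the output, a minimal sketch recodes vs as a labelled factor. The `cars_df` and `origin` names are hypothetical additions for illustration; mtcars has no real origin variable, so this only relabels the vs assumption above.

```r
# Hypothetical relabelling for this example only:
# vs == 0 (V-shaped) -> "US", vs == 1 (straight) -> "European"
cars_df <- mtcars
cars_df$origin <- factor(cars_df$vs, levels = c(0, 1),
                         labels = c("US", "European"))
print(table(cars_df$origin))

# With a factor, the coefficient is named originEuropean instead of vs,
# but its value is the same as in the 0/1 coding
m1_f <- lm(wt ~ origin, data = cars_df)
print(coef(m1_f))
```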
# mtcars ships with base R, so it can be passed to lm() directly
m1 <- lm(formula = wt ~ vs, data = mtcars)
summary(m1)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6886 0.1950 18.913 < 2e-16 ***
vs -1.0773 0.2949 -3.654 0.00098 ***
The abridged output shows that, when not controlling for other characteristics, European cars weigh less on average than US cars: 3.6886 + 1 × (-1.0773) = 2.6113 versus 3.6886 + 0 × (-1.0773) = 3.6886 (wt is measured in 1000 lbs).
However, this difference may well be attributable to differences in how European and US cars are made. For example, US cars may be more likely to be automatic rather than manual and may have on average more gears and carburettors than European cars, all of which contribute to the weight of a car. Let's add these factors to the model and see whether the US/European difference in weight still exists.
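The arithmetic can be checked directly: with a single 0/1 predictor, the intercept alone and the intercept plus the coefficient reproduce the raw group means. A quick sketch:

```r
# Single-predictor model as above
m1 <- lm(wt ~ vs, data = mtcars)
b <- coef(m1)

# Predicted mean weight (1000 lbs) for vs == 0 ("US") and vs == 1 ("European")
pred_us <- unname(b["(Intercept)"] + 0 * b["vs"])
pred_eu <- unname(b["(Intercept)"] + 1 * b["vs"])
print(c(US = pred_us, European = pred_eu))

# The same numbers are simply the mean weight within each vs group
group_means <- tapply(mtcars$wt, mtcars$vs, mean)
print(group_means)
```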
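Before adding them, a quick cross-tabulation sketch can show whether the two engine groups actually differ on these traits. This still relies on the vs-as-origin assumption, so it describes engine shape rather than true country of make.

```r
# Cross-tabulate the vs groups against transmission, gears and carburettors
print(with(mtcars, table(vs, am)))
print(with(mtcars, table(vs, gear)))
print(with(mtcars, table(vs, carb)))

# Share of manual transmissions (am == 1) within each vs group
prop_manual <- tapply(mtcars$am, mtcars$vs, mean)
print(prop_manual)
```

In mtcars, manual transmissions are more common in the vs == 1 group than in the vs == 0 group, which is in line with the reasoning above.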
m2 <- lm(formula = wt ~ am + as.factor(carb) + as.factor(gear) + vs, data = mtcars)
summary(m2)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.5658 0.4283 8.325 3.03e-08 ***
am -0.8585 0.4378 -1.961 0.0627 .
as.factor(carb)2 0.1250 0.3871 0.323 0.7499
as.factor(carb)3 0.2942 0.5257 0.560 0.5813
as.factor(carb)4 0.9034 0.4714 1.916 0.0684 .
as.factor(carb)6 0.7693 0.7966 0.966 0.3446
as.factor(carb)8 1.5693 0.7966 1.970 0.0615 .
as.factor(gear)4 -0.4427 0.5015 -0.883 0.3869
as.factor(gear)5 -0.7066 0.6228 -1.135 0.2688
vs -0.3322 0.4237 -0.784 0.4413
The last line of the abridged output shows that, once these car characteristics are taken into account, the vs coefficient shrinks from -1.08 to -0.33 and is no longer statistically significant (p = 0.44), so the difference in weight can no longer be attributed to US or European make. It also illustrates how this answer differs substantively from the recommended t-test (or, equivalently, the single-variable regression in model m1).
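To see the equivalence between the t-test and model m1, and the shrinking vs coefficient, side by side, here is a sketch. It assumes Student's equal-variance t-test (`var.equal = TRUE`); R's default Welch test would give a slightly different statistic.

```r
# Student's two-sample t-test (equal variances assumed)
tt <- t.test(wt ~ vs, data = mtcars, var.equal = TRUE)

# Single-predictor and covariate-adjusted regressions from above
m1 <- lm(wt ~ vs, data = mtcars)
m2 <- lm(wt ~ am + as.factor(carb) + as.factor(gear) + vs, data = mtcars)

# Same |t| and p-value for the t-test and m1; the sign only reflects
# which group is subtracted from which
print(unname(tt$statistic))
print(summary(m1)$coefficients["vs", "t value"])

# The vs row of each model: estimate, standard error, t value, p-value
print(rbind(m1 = summary(m1)$coefficients["vs", ],
            m2 = summary(m2)$coefficients["vs", ]))
```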
"Also, I'm curious about the difference between lm(weight~foreign + cylinders + ...) vs lm(formula= ...)"
There is no substantive difference: the former is shorthand for the latter. R matches unnamed arguments by position, so when you leave out the names, the arguments (formula, data, etc.) must be supplied in the order lm expects (see ?lm). Named arguments can be given in any order.
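A minimal check that a positional call and a named call fit the same model:

```r
# Positional matching: formula first, data second (the order documented in ?lm)
fit_pos <- lm(wt ~ vs, mtcars)

# Named matching: the order of named arguments does not matter
fit_named <- lm(data = mtcars, formula = wt ~ vs)

# Both calls yield identical coefficients
print(all.equal(coef(fit_pos), coef(fit_named)))
```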