How to use a subset of multiple columns with some conditions in lm function?

Question

I would like to study the effect of party identification on attitudes towards an issue. I'm using a large poll dataset, Democracy Checkup 2020. My regression call is as follows:

mod1 <- lm(dc20_pos_jobs ~ dc20_party_id + dc20_education, data,
           subset = ("dc20_pos_jobs" < 5 && "dc20_party_id" != c("6","7","8")))

I can't get the subset to work, although I don't get any errors. These are the conditions I need in my subset:

dc20_pos_jobs: use all values except 5. It's an ordinal variable (1, 2, 3, 4, 5).
dc20_party_id: use all values except 6, 7 and 8. It's a nominal variable (1 through 8).

And, if possible:

Add a different control variable to lm depending on the value of dc20_party_id. For example, if dc20_party_id equals 1, add dc20_party_rating_23 as a control variable; if it equals 2, add dc20_party_rating_24; and so on, up to dc20_party_id equals 5, add dc20_party_rating_24.

All the party_rating variables are interval variables ranging from 0 to 100.

I've tried rearranging the arguments and using different operators.
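One possible way to handle the party-specific control, sketched here on made-up toy data (not the Democracy Checkup file): a single lm() formula applies every term to every row, so instead of adding a different column per party you could first build one column, here called own_party_rating (a hypothetical name), holding each respondent's rating of the party they identify with, and then use it as a single control. The lookup rating_col is also hypothetical: only the pairs for party IDs 1 and 2 are taken from the question, so the entries for 3 to 5 would still need to come from the codebook. dc20_party_id is wrapped in factor() here because the question describes it as nominal.

```r
# Made-up toy data standing in for the Democracy Checkup 2020 file.
df <- data.frame(
  dc20_pos_jobs        = c(1, 2, 3, 4, 5, 2, 3, 1),
  dc20_party_id        = c(1, 2, 1, 2, 1, 6, 2, 1),
  dc20_education       = c(3, 5, 2, 4, 1, 3, 6, 2),
  dc20_party_rating_23 = c(80, 10, 70, 20, 90, 50, 15, 60),
  dc20_party_rating_24 = c(30, 85, 25, 75, 40, 50, 65, 35)
)

# Hypothetical lookup: party ID -> name of that party's rating column.
# Only IDs 1 and 2 are spelled out in the question; extend it for 3 to 5.
rating_col <- c("1" = "dc20_party_rating_23", "2" = "dc20_party_rating_24")

# For each respondent, pick the rating of the party they identify with
# (NA when their party ID has no entry in the lookup).
own_rating <- function(data, lookup) {
  cols <- unname(lookup[as.character(data$dc20_party_id)])
  vapply(seq_len(nrow(data)), function(i) {
    if (is.na(cols[i])) NA_real_ else as.numeric(data[[cols[i]]][i])
  }, numeric(1))
}
df$own_party_rating <- own_rating(df, rating_col)

# One control column for everyone, plus the row filter from the question.
mod2 <- lm(dc20_pos_jobs ~ factor(dc20_party_id) + dc20_education + own_party_rating,
           data = df,
           subset = dc20_pos_jobs < 5 & !(dc20_party_id %in% c(6, 7, 8)))
summary(mod2)
```

Building the column first keeps the model to one lm() call; fitting a separate model per party, each with its own rating column, would be the other obvious route.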

This might answer your question: https://stackoverflow.com/questions/33113544/how-to-subset-a-range-of-values-in-lm. It looks like you need to remove the quotes around your variable names. – Harrison Jones, May 01 '23 at 13:26

Answer


(P.S.: you could also try this in Python.)

I think the problem is the quotation marks around the variable names. Try this:

mod1 <- lm(dc20_pos_jobs ~ dc20_party_id + dc20_education, data = data,
           subset = (dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8))))

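It keeps all values of dc20_pos_jobs except 5, and all values of dc20_party_id except 6, 7, and 8. Note that it uses & rather than &&: & compares element by element and gives one TRUE/FALSE per row, while && only looks at a single value.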
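A minimal sketch on made-up data (the toy data frame and its values are invented for illustration) shows why the quotes were the problem and that the corrected condition produces one TRUE/FALSE per row; nobs() reports how many rows lm() actually used.

```r
# Made-up toy data; the real Democracy Checkup 2020 file is much larger.
toy <- data.frame(
  dc20_pos_jobs  = c(1, 5, 3, 4, 2, 5, 2, 3),
  dc20_party_id  = c(1, 2, 7, 3, 8, 4, 5, 1),
  dc20_education = c(2, 4, 3, 5, 1, 2, 6, 4)
)

# Quoted names are plain strings: this compares the text "dc20_pos_jobs"
# with 5 and returns a single value, never looking at the column.
length("dc20_pos_jobs" < 5)

# Unquoted names inside with() refer to the columns: one TRUE/FALSE per row.
keep <- with(toy, dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8)))
print(keep)

# The same condition passed to lm() through its subset argument.
mod1 <- lm(dc20_pos_jobs ~ dc20_party_id + dc20_education, data = toy,
           subset = dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8)))
nobs(mod1)  # number of rows actually used in the fit
```

Because dc20_pos_jobs only takes the values 1 to 5, dc20_pos_jobs < 5, as the question asks, keeps exactly the same rows as dc20_pos_jobs != 5.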

answered May 01 '23 at 14:27 by Ulrich, edited May 02 '23 at 14:55 by L Tyrone


The OP wants dc20_pos_jobs < 5, an easy fix. – L Tyrone May 01 '23 at 20:50
