
I would like to study the effect of party identification on attitudes towards an issue. I'm using a large poll dataset, the Democracy Checkup 2020. My regression model is as follows:

mod1 <- lm(dc20_pos_jobs ~ dc20_party_id + dc20_education, data, 
           subset = ("dc20_pos_jobs" < 5 && "dc20_party_id" != c("6","7","8")))

I can't seem to get the subset to work, although I don't get any errors. Here are the specs I need in my subset:

  1. dc20_pos_jobs: use all values except 5. It's an ordinal variable (1, 2, 3, 4, 5).
  2. dc20_party_id: use all values except 6, 7 and 8. It's a nominal variable (1 through 8).
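A quick way to check each condition before it goes into lm() is to try it on plain vectors. The two toy vectors below are made-up stand-ins for dc20_pos_jobs and dc20_party_id, not survey data; the sketch contrasts the vectorized forms with the quoted and recycled forms used in the question.

```r
# Made-up stand-ins for the two survey columns (one element per respondent)
jobs  <- c(1, 5, 3, 4, 5, 2)
party <- c(1, 6, 2, 8, 3, 7)

# Spec 1: keep every value of jobs except 5 (one TRUE/FALSE per respondent)
keep_jobs <- jobs != 5                     # TRUE FALSE TRUE TRUE FALSE TRUE

# Spec 2: drop parties 6, 7 and 8; %in% tests membership element by element
keep_party <- !(party %in% c(6, 7, 8))     # TRUE FALSE TRUE FALSE TRUE FALSE

# != against a vector recycles c(6, 7, 8) position by position instead
# (party[1] vs 6, party[2] vs 7, party[3] vs 8, party[4] vs 6, ...),
# so with these values it drops nobody
wrong_party <- party != c(6, 7, 8)         # TRUE TRUE TRUE TRUE TRUE TRUE

# Combine with the vectorized &; && expects single TRUE/FALSE values
keep <- keep_jobs & keep_party             # TRUE FALSE TRUE FALSE FALSE FALSE

# Quoted names are just strings: this compares the text "dc20_party_id"
# with "6", "7" and "8", never the column itself
quoted <- "dc20_party_id" != c("6", "7", "8")  # TRUE TRUE TRUE

print(keep)
```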

and if possible

  1. Add a party-specific control variable to lm depending on the value of dc20_party_id. For example: if dc20_party_id equals 1, add dc20_party_rating_23 as a control variable; if it equals 2, add dc20_party_rating_24; ... up to dc20_party_id equals 5, add dc20_party_rating_24.

All the different party_ratings are interval variables going from 0 to 100.
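One possible way to handle the party-specific control (a sketch, not the only option) is to build a single "own-party rating" column holding, for each respondent, the rating of the party they identify with, and add that one column to the model. The data below are simulated, not the real survey. The lookup assumes party k uses dc20_party_rating_(22 + k), i.e. 23 through 27; the question lists dc20_party_rating_24 for party 5, so adjust the lookup to the real mapping. The column name own_party_rating is hypothetical.

```r
set.seed(1)
n <- 200

# Simulated stand-in for the survey (NOT real Democracy Checkup data)
data <- data.frame(
  dc20_pos_jobs  = sample(1:5, n, replace = TRUE),
  dc20_party_id  = sample(1:8, n, replace = TRUE),
  dc20_education = sample(1:11, n, replace = TRUE)
)

# Assumed mapping: party 1 -> rating_23, party 2 -> rating_24, ..., party 5 -> rating_27
rating_cols <- paste0("dc20_party_rating_", 23:27)
for (col in rating_cols) data[[col]] <- runif(n, 0, 100)  # 0-100 interval ratings

# Which rating column applies to each respondent (NA for parties 6-8)
rating_mat <- as.matrix(data[rating_cols])
j <- match(data$dc20_party_id, 1:5)

# Matrix indexing picks one cell per row; rows with NA in j give NA
data$own_party_rating <- rating_mat[cbind(seq_len(nrow(data)), j)]

# factor() treats the nominal party id as categories; parties 6-8 are
# excluded by the subset, so their NA ratings never enter the fit
mod2 <- lm(dc20_pos_jobs ~ factor(dc20_party_id) + dc20_education + own_party_rating,
           data = data,
           subset = dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8)))
summary(mod2)
```

With this setup the own-party rating gets one slope shared by all parties; if each party should get its own slope, an interaction such as factor(dc20_party_id):own_party_rating is an alternative.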

I have tried moving the arguments around and using different operators.

L Tyrone
    This might answer your question: https://stackoverflow.com/questions/33113544/how-to-subset-a-range-of-values-in-lm. Looks like you need to remove the quotes around your variable names. – Harrison Jones May 01 '23 at 13:26

1 Answer


(P.S. You could also try this in Python.)

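The main problem is the quotation marks: inside lm(), "dc20_pos_jobs" is just a character string, not the column, so the condition compares text instead of your data. Two other issues: && works on single TRUE/FALSE values, while subset needs one value per row, so use the vectorized &; and != c(6, 7, 8) recycles the vector position by position instead of testing membership, so use %in%. Try this: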

mod1 <- lm(dc20_pos_jobs ~ dc20_party_id + dc20_education, data,
           subset = dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8)))

It will include all values of dc20_pos_jobs except for 5, and all values of dc20_party_id except for 6, 7, and 8.
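To confirm the subset really drops the intended rows, the sketch below fits the model on simulated data (not the survey) and compares it with filtering first via base R's subset(). It also wraps dc20_party_id in factor(), an addition not in the answer above, because the question describes it as nominal; left numeric, lm() would treat parties 1 to 5 as a single linear trend.

```r
set.seed(42)
n <- 300

# Simulated stand-in data; the real survey columns may be coded differently
data <- data.frame(
  dc20_pos_jobs  = sample(1:5, n, replace = TRUE),
  dc20_party_id  = sample(1:8, n, replace = TRUE),
  dc20_education = sample(1:11, n, replace = TRUE)
)

# Rows that should survive the filter
keep <- data$dc20_pos_jobs != 5 & !(data$dc20_party_id %in% c(6, 7, 8))

# Unquoted column names inside lm()'s subset argument
mod1 <- lm(dc20_pos_jobs ~ factor(dc20_party_id) + dc20_education,
           data = data,
           subset = dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8)))

# Equivalent: filter first with base R's subset(), then fit
kept  <- subset(data, dc20_pos_jobs != 5 & !(dc20_party_id %in% c(6, 7, 8)))
mod1b <- lm(dc20_pos_jobs ~ factor(dc20_party_id) + dc20_education, data = kept)

# Sanity checks: same number of rows used, no excluded values left, same estimates
nobs(mod1) == sum(keep)
sort(unique(kept$dc20_party_id))
all.equal(coef(mod1), coef(mod1b))
```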

Ulrich