Remove rows in a dataframe containing values outside multiple intervals

Question

I have been searching your forum for answers to my question without results. The nearest I came was "subset data.frame union multiple intervals", but I don't see how that can help me. Maybe my reading comprehension in English is too weak.

So, I have a data frame study_a containing 246 individuals, each given a specific code to maintain anonymity. The code is one of 14 columns; the rest are answers to a questionnaire. An example:

My wish is to find the values outside the intervals 1-2999, 4251-4599 and 7000-7500 and in the end erase these rows (i.e. individuals), since they are not supposed to be included in this study.

Besides the link, I've also tried:

    in.interval.lo(study_a$"code", 1, 2999) #in.interval.lo(x, lo, hi)

But this function doesn't seem to work for multiple intervals. Furthermore, I get an error message when I try to specify which column's values I want to check (since there are 13 columns in addition to "code").

Thanks in advance for your help. If this question has already been solved, I would be most thankful for a link to it, since I've failed to find it.

Maybe `study_a[ !study_a$code %in% c(1:2999, 4251:4599, 7000:7500), ]` ? — zx8754, Oct 13 '16 at 10:01
possible match also: http://stackoverflow.com/a/39977463/3871924 — agenis, Oct 13 '16 at 10:17

score 2 · Accepted Answer · edited Oct 13 '16 at 10:06


You should take a look at indexing in R using square brackets and logicals. E.g.

    # `|` (OR) rather than `&`: no value can be both <= 2999 and >= 4251
    study_a[(study_a$code <= 2999) | (study_a$code >= 4251), ]

Edit:

Using `%in%` and the `!` (NOT) operator to select the rows whose code falls outside the intervals:

    study_a[!study_a$code %in% c(1:2999, 4251:4599, 7000:7500), ]
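To make the effect of the `!` concrete, here is a minimal sketch on made-up data. The toy `study_a` below and the helper names `valid_codes`, `outside` and `inside` are assumptions for illustration only; the real data frame is not shown in the question.

```r
# Hypothetical toy data standing in for study_a (the real data is not shown)
study_a <- data.frame(
  code = c(15, 2999, 3000, 4300, 4600, 7250, 8000),
  q1   = c(1, 2, 3, 4, 5, 1, 2)
)

# All whole-number codes that belong to the study
valid_codes <- c(1:2999, 4251:4599, 7000:7500)

# Rows whose code lies outside every interval (the ones to inspect or drop)
outside <- study_a[!study_a$code %in% valid_codes, ]

# Rows whose code lies inside one of the intervals (the ones to keep)
inside <- study_a[study_a$code %in% valid_codes, ]

print(outside)
print(inside)
```

With `!` the result is the set of rows to be excluded; without it, the result is the cleaned data frame. Assigning `inside` back to `study_a` would complete the removal.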

edited Oct 13 '16 at 10:06 by zx8754
answered Oct 13 '16 at 09:59 by tobiasegli_te

Thanks for your help! I used the function above, which resulted in: – Hampus Persson Oct 14 '16 at 07:22
<0 rows> (or 0-length row.names), which I take to mean that there were no values/rows outside the given intervals? Thank you very much for helping me! – Hampus Persson Oct 14 '16 at 07:25
Use `str()` or `class()` to check if `study_a$Code` is numeric – tobiasegli_te Oct 14 '16 at 09:05
Changed it from factor to numeric, but it still gives the same result/message as above (<0 rows> (or 0-length row.names)). – Hampus Persson Oct 15 '16 at 17:53
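One possible explanation for the empty result after the conversion (an assumption, since the actual conversion code is not shown) is that calling `as.numeric()` directly on a factor returns the internal level codes rather than the original numbers. The sketch below uses a made-up factor `code_factor` to show the difference:

```r
# Hypothetical codes stored as a factor, e.g. after reading them from a file
code_factor <- factor(c("15", "3000", "4300", "8000"))

valid_codes <- c(1:2999, 4251:4599, 7000:7500)

# as.numeric() on a factor returns the level indices, not the labels
wrong <- as.numeric(code_factor)

# Going through character first recovers the original numbers
right <- as.numeric(as.character(code_factor))

# Count how many codes fall outside the intervals under each conversion
n_outside_wrong <- sum(!wrong %in% valid_codes)
n_outside_right <- sum(!right %in% valid_codes)

print(wrong)
print(right)
print(c(n_outside_wrong, n_outside_right))
```

Because level indices are small integers, they all land in the first interval, so nothing appears to be outside. Checking `head(study_a$code)` after the conversion would show whether this is what happened.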

score 0 · Answer 2 · answered Oct 13 '16 at 10:20


Use the `subset` function:

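    # all codes that fall inside the allowed intervals
    xx <- c(1:2999, 4251:4599, 7000:7500)
    # keeps the rows whose code is NOT in xx; drop the `!` to keep the rows inside the intervals
    study_a <- subset(study_a, !code %in% xx)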
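Enumerating every integer with `%in%` only matches whole-number codes. If the codes could be non-integer (an assumption; the question does not say), a range comparison against the interval bounds is an alternative. The vectors `lo` and `hi` and the toy `study_a` below are hypothetical names and data for this sketch:

```r
# Hypothetical toy data; 2999.5 would never match an enumerated integer
study_a <- data.frame(code = c(15, 2999.5, 3000, 4300, 4600, 7250, 8000))

# Lower and upper bounds of the allowed intervals (inclusive)
lo <- c(1, 4251, 7000)
hi <- c(2999, 4599, 7500)

# TRUE when a code lies within at least one [lo, hi] interval
in_any <- vapply(
  study_a$code,
  function(x) any(x >= lo & x <= hi),
  logical(1)
)

kept    <- study_a[in_any, , drop = FALSE]   # rows inside an interval
removed <- study_a[!in_any, , drop = FALSE]  # rows outside every interval

print(kept)
print(removed)
```

Keeping the bounds in two vectors also makes it easy to add or change intervals without rewriting the condition.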

answered Oct 13 '16 at 10:20 by Arun kumar mahesh
