To filter a data.frame down to the columns of interest, I need to find the columns that contain data outside a specific range. Let the data.frame and the ranges (one column of lower and upper bounds per column of the data.frame) be

df<-data.frame(x1=c(1,5,9),x2=c(10,20,30),x3=c(20,100,1000))
ranges<-data.frame(y1=c(3,8),y2=c(10,20), y3=c(15,1250))

As output, I'd like the column names "x1" and "x2", since those columns contain values outside their ranges (y1 and y2), while every value in x3 lies within y3.

I tried the following, but the code only works if "ranges" lists every number explicitly, as below, and it flags a column when a value is found in that list. That's unfortunately not what I need.

ranges<-c(15:300,10:20)
df.l<-colnames(df)[sapply(df,function(x) any(x %in% ranges))]

Any ideas? Thanks!

aynber
Juan

1 Answer

If 'ranges' is a data.frame or list, one option is

names(which(unlist(Map(function(x, y) any(!(x >= y[1] & x <= y[2])), df, ranges))))
#[1] "x1" "x2"

Or use the reverse logic

names(which(unlist(Map(function(x, y) any(x < y[1]| x > y[2]), df, ranges))))
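A possibly more compact base R variant of the same check uses `mapply`, which returns a named logical vector directly. This is only a sketch: it assumes 'df' and 'ranges' have the same number of columns in matching order, and that each column of 'ranges' holds the lower bound first and the upper bound second. The names `outside` and `cols_outside` are made up for illustration.

```r
# Example data from the question
df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))

# TRUE for each column of 'df' that has at least one value outside its range;
# the result is named after the columns of 'df'
outside <- mapply(function(x, y) any(x < y[1] | x > y[2]), df, ranges)

# Keep only the names of the flagged columns
cols_outside <- names(outside)[outside]
cols_outside
#[1] "x1" "x2"
```

Note that a missing value in a column can make `any()` return `NA` for that column, so real data with `NA`s may need `na.rm = TRUE` inside `any()`.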

Or in tidyverse,

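library(purrr)
library(dplyr)
library(tibble)
library(tidyr)  # needed for unnest()
# for each pair of columns, check whether any value falls outside the range
map2(df, ranges, ~ between(.x, .y[1], .y[2]) %>% `!` %>% any) %>%
    enframe %>%
    unnest(cols = value) %>%
    filter(value) %>%
    pull(name)
#[1] "x1" "x2"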

data

df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))
akrun
  • Thanks for the quick response. It is a data.frame, indeed. For the sample it works nicely. I'm just checking how the code functions and whether it works with my "real" data. – Juan Nov 09 '19 at 16:07
  • @Juan If it is not working, then please do provide an example that mimics your real data – akrun Nov 09 '19 at 16:08
  • May I ask: y[1] and y[2] return col y1 and y2. How come this is enough to check also for x3 / y3? – Juan Nov 09 '19 at 16:29
  • @Juan When we use `map2` or `Map`, it loops through the corresponding columns of 'df' and 'ranges' in parallel. So the 'x' in the anonymous function is each column of 'df' in turn (x1, x2, x3) and the 'y' is the matching column of 'ranges' (y1, y2, y3). The indexing `y[1]`, `y[2]` then selects the first and second element of that column, i.e. the lower and upper bound. – akrun Nov 09 '19 at 16:31
  • @Juan If you want to understand how it works, check `Map(function(x, y) y, df, ranges)` and `Map(function(x, y) x, df, ranges)` – akrun Nov 09 '19 at 16:32
  • @Juan It means that even if you have 100s of columns in each of the datasets, this would work without any change in the code. – akrun Nov 09 '19 at 16:34
  • I was just working through ?map, but obviously did not get this part. Thanks again. – Juan Nov 09 '19 at 16:35
  • It does work for a larger data frame, thus my confusion in the first place. – Juan Nov 09 '19 at 16:38
  • @Juan `map` is different from `map2`. Here we used `map2`; `map` takes only a single input. – akrun Nov 09 '19 at 20:44
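
The pairing discussed in the comments can be made visible with a small base R sketch. It assumes the example 'df' and 'ranges' from the question; the name `pairs` and the list fields `values`, `lower` and `upper` are made up here purely for illustration.

```r
# Example data from the question
df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))

# Map() walks both data frames column by column, in parallel:
# x is one column of 'df', y is the matching column of 'ranges'
pairs <- Map(function(x, y) list(values = x, lower = y[1], upper = y[2]),
             df, ranges)

names(pairs)     # results are named after the columns of 'df'
pairs$x3$lower   # first element of y3, the lower bound for x3
pairs$x3$upper   # second element of y3, the upper bound for x3
```

So `y[1]` and `y[2]` never refer to the columns y1 and y2 themselves; they index into whichever column of 'ranges' is currently paired with the column of 'df'.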