
The data set is this:

badData <- list(c(296,310), c(330,335), c(350,565))
df <- data.frame(wavelength = seq(300,360,5.008667),
                  reflectance = seq(-1,-61,-5.008667))
df    
   wavelength reflectance
   300.0000   -1.000000
   305.0087   -6.008667
   310.0173  -11.017334
   315.0260  -16.026001
   320.0347  -21.034668
   325.0433  -26.043335
   330.0520  -31.052002
   335.0607  -36.060669
   340.0693  -41.069336
   345.0780  -46.078003
   350.0867  -51.086670
   355.0953  -56.095337

The original question was how to identify whether wavelength fell in any of the ranges given in badData. The solution offered is this: https://stackoverflow.com/a/52070363/1012249

My question is: using a similar syntax, how does one identify which badData bin each wavelength falls into? Let's say badData were structured like this, and the bins are non-overlapping.

badData <- data.frame(bin=c('a','b','c'),start= c(296,330,350),end=c(310.01,335,565))
ashleych

2 Answers


Here is an example using fuzzy join:

library(fuzzyjoin)
library(dplyr) # provides %>%

df %>%
  fuzzy_left_join(badData, # join badData to df
                  by = c("wavelength" = "start", # variables to join by
                         "wavelength" = "end"),
                  # one function per pair of variables:
                  # wavelength >= start and wavelength <= end
                  match_fun = list(`>=`, `<=`))
#output
   wavelength reflectance  bin start    end
1    300.0000   -1.000000    a   296 310.01
2    305.0087   -6.008667    a   296 310.01
3    310.0173  -11.017334 <NA>    NA     NA
4    315.0260  -16.026001 <NA>    NA     NA
5    320.0347  -21.034668 <NA>    NA     NA
6    325.0433  -26.043335 <NA>    NA     NA
7    330.0520  -31.052002    b   330 335.00
8    335.0607  -36.060669 <NA>    NA     NA
9    340.0693  -41.069336 <NA>    NA     NA
10   345.0780  -46.078003 <NA>    NA     NA
11   350.0867  -51.086670    c   350 565.00
12   355.0953  -56.095337    c   350 565.00
missuse
  • Thanks. But I was looking for a solution based on lapply, similar to the link that I had shared – ashleych Aug 29 '18 at 09:03
  • @ashleych "*But I was looking for a solution based on lapply*" Why? This is a very elegant and succinct solution. You don't need `lapply`! – Maurits Evers Aug 29 '18 at 09:05
  • Agreed, and I’ve upvoted it too. But my motivation for the question was to understand if the lapply construct referred to can be extended to solve this. – ashleych Aug 29 '18 at 09:43
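
For readers with the same motivation, here is a minimal sketch of an `lapply`-based lookup. It is not taken from the linked answer, so the construct may differ from the one referred to; it assumes the data-frame form of badData from the question, closed intervals on both sides, and non-overlapping bins.

```r
# Data from the question
badData <- data.frame(bin = c("a", "b", "c"),
                      start = c(296, 330, 350),
                      end = c(310.01, 335, 565),
                      stringsAsFactors = FALSE)
df <- data.frame(wavelength = seq(300, 360, 5.008667),
                 reflectance = seq(-1, -61, -5.008667))

# For each wavelength, find the row of badData whose [start, end] contains it.
# Assumption: bins do not overlap, so at most one row matches.
df$bin <- unlist(lapply(df$wavelength, function(w) {
  i <- which(w >= badData$start & w <= badData$end)
  if (length(i) == 0) NA_character_ else as.character(badData$bin[i[1]])
}))

df
```

Wavelengths outside every bin get NA, which matches the result of the fuzzy join above. For large data the vectorised approaches in the answers are likely to be faster, since this calls the function once per row.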

You don't need a loop. You can simply use cut:

badData <- data.frame(bin=c('a','b','c'),start= c(296,330,350),end=c(310.01,335,565))
df <- data.frame(wavelength = seq(300,360,5.008667),
                 reflectance = seq(-1,-61,-5.008667))

df$bins <- cut(df$wavelength, t(badData[, c("start", "end")]), 
               labels = head(c(t(cbind(as.character(badData$bin), "good"))), -1))
#   wavelength reflectance bins
#1    300.0000   -1.000000    a
#2    305.0087   -6.008667    a
#3    310.0173  -11.017334 good
#4    315.0260  -16.026001 good
#5    320.0347  -21.034668 good
#6    325.0433  -26.043335 good
#7    330.0520  -31.052002    b
#8    335.0607  -36.060669 good
#9    340.0693  -41.069336 good
#10   345.0780  -46.078003 good
#11   350.0867  -51.086670    c
#12   355.0953  -56.095337    c

You haven't said which side of the intervals should be open or closed, but this can be adjusted.
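
As a sketch of that adjustment: `cut()` takes a `right` argument (`TRUE`, the default, gives intervals closed on the right; `FALSE` gives intervals closed on the left). The variant below also gives each gap between bins its own temporary label and renames those to "good" afterwards, so no duplicated labels are passed to `cut()`; some R versions reject duplicated labels with a "factor level is duplicated" error. The `gap1`, `gap2` names are placeholders made up for this example and assume no real bin is named that way.

```r
badData <- data.frame(bin = c("a", "b", "c"),
                      start = c(296, 330, 350),
                      end = c(310.01, 335, 565),
                      stringsAsFactors = FALSE)
df <- data.frame(wavelength = seq(300, 360, 5.008667),
                 reflectance = seq(-1, -61, -5.008667))

# Interleave start and end values: 296, 310.01, 330, 335, 350, 565
breaks <- c(t(badData[, c("start", "end")]))

# Interleave bin names with unique placeholder names for the gaps,
# then drop the last placeholder: a, gap1, b, gap2, c
n <- nrow(badData)
labels <- head(c(rbind(as.character(badData$bin),
                       paste0("gap", seq_len(n)))), -1)

# right = TRUE: intervals are (start, end]; use right = FALSE for [start, end)
raw <- cut(df$wavelength, breaks = breaks, labels = labels, right = TRUE)

# Collapse the placeholder gap labels into a single "good" label
bins <- as.character(raw)
bins[grepl("^gap", bins)] <- "good"
df$bins <- bins

df
```

Wavelengths below the first start or above the last end are not covered by any break and come out as NA rather than "good".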

Roland
  • It throws an error 'Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, : factor level [4] is duplicated' – ashleych Aug 29 '18 at 09:35