I have actually solved this problem, but my solution takes two steps that live in separate functions: the first step is inside one function and the second step is inside another. This would force me to return H as an output of the first function.

First, a reproducible example:

library(Hmisc) # provides cut2()
RN = rnorm(n=1000, mean=10, sd=20)
H = cut2(RN, g=4, onlycuts=FALSE) # Step 1: The intervals are generated (factor)
H2 = cut2(RN, g=4, onlycuts=TRUE) # Step 1: Numeric cut points only (this would be useful if Step 1 and 2 were not separated)
new_number = 10.53 # Step 2: New number
interval_new_number = cut2(new_number, cuts=H) # Step 2: Interval for new number

I would like a solution that can be written along the lines of:

new_number %in% H

Give me your opinion.

Uwe

1 Answer

I think the request is to determine the interval number of a new value relative to a factor vector constructed with cut2. If that is what is needed, extract the first of the two cuts in each factor level with gsub, convert the result with as.numeric, and pass it to findInterval:

H = cut2(RN,g=4,onlycuts=FALSE)
attributes(H)
#----
$class
[1] "factor"

$levels
[1] "[-66.7,-2.4)" "[ -2.4,10.3)" "[ 10.3,23.7)" "[ 23.7,75.9]"

findInterval( 10.53, as.numeric( gsub( "\\[|\\,.+$","", levels(H) ) ) )
[1] 3
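
The one-liner above can be unpacked step by step. This is a minimal sketch that uses only base R; the level labels are copied by hand from the `attributes(H)` output, and the variable names `lv`, `lower_txt`, `lower` and `idx` are chosen here for illustration only.

```r
# Level labels copied from the attributes(H) output shown above
lv <- c("[-66.7,-2.4)", "[ -2.4,10.3)", "[ 10.3,23.7)", "[ 23.7,75.9]")

# Step 1: delete the opening "[" and everything from the comma to the end,
# which leaves only the lower bound of each interval (as text)
lower_txt <- gsub("\\[|\\,.+$", "", lv)

# Step 2: convert to numbers; as.numeric() ignores the leading blanks
# that remain in strings such as " -2.4"
lower <- as.numeric(lower_txt)

# Step 3: findInterval() returns the index of the last lower bound
# that is <= the new value
idx <- findInterval(10.53, lower)
idx      # 3
lv[idx]  # "[ 10.3,23.7)"
```

Note that the bounds recovered this way are the rounded values printed in the labels, not the exact cut points.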

I had never seen the onlycuts parameter used before, but it would make the code even easier, since the as.numeric( gsub(...)) calls would not be needed:

> (H2 = cut2(RN,g=4,onlycuts=TRUE) )
[1] -66.687208  -2.397688  10.334926  23.659386  75.887076
> findInterval( 10.53, H2 )
[1] 3
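
If the two steps live in separate functions, one option is to pass the numeric cut points between them rather than the factor. The following is a sketch under that assumption: the cut points are hard-coded from the `H2` output above, `interval_index` is a hypothetical helper name, and `rightmost.closed = TRUE` is my choice so that the maximum falls in the last interval, matching the closed `]` in the last level label.

```r
# Cut points copied from the H2 output above
cuts <- c(-66.687208, -2.397688, 10.334926, 23.659386, 75.887076)

# Hypothetical helper for step 2: it needs only the numeric cut points
interval_index <- function(x, cuts) {
  # rightmost.closed = TRUE puts x == max(cuts) into the last interval
  findInterval(x, cuts, rightmost.closed = TRUE)
}

interval_index(10.53, cuts)      # 3
interval_index(75.887076, cuts)  # 4 (it would be 5 without rightmost.closed)
interval_index(-100, cuts)       # 0, below the lowest cut
interval_index(100, cuts)        # 5, above the highest cut
```

A result of 0 or `length(cuts)` signals a value outside the range of the original data, which the calling code may want to handle separately.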
IRTFM
  • Thank you! I was looking for this – Juan Esteban de la Calle Oct 10 '16 at 19:37
  • It is not a big mistake, but the result is not the same: if I use as.numeric(gsub(...)) I get c(-66.687208, -2.397688, 10.334926, 23.659386), so the 75.887076 is missing. I solved it, but I wanted to let you know about it. – Juan Esteban de la Calle Oct 11 '16 at 12:49
  • That value would be missing, but `findInterval` would still give the correct result for any numeric value in the interval `[ 23.7,75.9]`. The only case where findInterval would give a different result is a test value at or above 75.9. If you need to keep the maximum value from the cuts, then use the `onlycuts=TRUE` strategy. – IRTFM Oct 11 '16 at 15:52
  • How can I get the last number? I thought I had it but I didn't. Sorry. – Juan Esteban de la Calle Oct 11 '16 at 16:54
  • It's the last value in the vector returned by `cut2(RN,g=4,onlycuts=TRUE)` – IRTFM Oct 11 '16 at 17:06
  • I mean when I use onlycuts=FALSE – Juan Esteban de la Calle Oct 11 '16 at 17:34
  • You would need to write another gsub call that "erases" everything up to the comma and also drops the trailing "]", and then apply `as.numeric` to it. – IRTFM Oct 11 '16 at 22:34
  • `as.numeric( gsub( ".+\\,|\\]","", tail(levels(H),1) ) )` – IRTFM Oct 11 '16 at 23:21
  • How can I thank you? This saved me hours of thinking. Thank you – Juan Esteban de la Calle Oct 11 '16 at 23:45
  • How to thank me? You should go track down a bunch of answers by G.Grothendieck that involve regex questions and upvote them. I learned regex by pulling apart and playing with his answers on Rhelp. – IRTFM Oct 11 '16 at 23:48
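
Putting the two regular expressions from the answer and the comments together, the full vector of breaks can be rebuilt from the level labels alone. This is a sketch using base R only; the labels are copied by hand from the `attributes(H)` output in the answer, and `lv`, `lower`, `upper` and `breaks` are illustrative names.

```r
# Level labels copied from the attributes(H) output in the answer
lv <- c("[-66.7,-2.4)", "[ -2.4,10.3)", "[ 10.3,23.7)", "[ 23.7,75.9]")

# Lower bound of every interval: drop "[" and everything after the comma
lower <- as.numeric(gsub("\\[|\\,.+$", "", lv))

# Upper bound of the last interval: drop everything up to the comma
# and the closing "]"
upper <- as.numeric(gsub(".+\\,|\\]", "", tail(lv, 1)))

# All five breaks, as printed in the labels
breaks <- c(lower, upper)
breaks  # -66.7  -2.4  10.3  23.7  75.9

findInterval(10.53, breaks)  # 3
```

Because the labels show rounded values, `breaks` only approximates the exact cut points (75.9 here versus 75.887076 from `onlycuts=TRUE`), so a value very close to a boundary could land in a different interval.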