I ran into the following inconsistent behaviour of `cut`, which is giving me a headache:

x <- 0.2316
cut(x, c(0, 0.2315, 10)) # displays 0.232 as the cut point and chooses the second interval
## [1] (0.232,10]
## Levels: (0,0.232] (0.232,10]
cut(x, c(0, 0.232, 10)) # chooses the first interval when given the cut point it just displayed (0.232)
## [1] (0,0.232]
## Levels: (0,0.232] (0.232,10]

The problem is that `cut` seems to choose the interval before it formats (rounds) the cut points for the labels. In the example, this means it picks the second interval, although the cut point it displays would have put the value into the first interval (as the last call shows).
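To make the mechanism visible, here is a minimal sketch that asks `cut` for integer interval codes (`labels = FALSE`) instead of factor labels. The variable names `code_exact` and `code_rounded` are only illustrative. The comparison is done against the unrounded breaks, while the label text is produced separately by formatting the breaks to `dig.lab` significant digits.

```r
x <- 0.2316
breaks <- c(0, 0.2315, 10)

# Integer codes instead of labels: binning compares x with the unrounded breaks.
code_exact <- cut(x, breaks, labels = FALSE)              # x > 0.2315, so interval 2
code_rounded <- cut(x, c(0, 0.232, 10), labels = FALSE)   # x <= 0.232, so interval 1

# The label text is a separate, rounded rendering of the same breaks
# (3 significant digits, the default dig.lab).
label_text <- formatC(0 + breaks, digits = 3, width = 1L)
print(label_text)
```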

This is a problem for me because my package has two functions: one calculates the cut points, and the other assigns new data points to the right intervals. In the example above, the same data point lands in the second interval in the first function but in the first interval in the second function, although both display the exact same cut points! That can lead to some strange behaviour in my package.

My question
Is this a known issue? And if so, are there any workarounds? Thank you.

Edit
I know that you can change the number of digits with `dig.lab`, yet the same problem would occur if you had cut points with more decimal places. The above example is just a demonstration of a more general problem!

vonjd
  • Do you want more digits for the cutpoints? That would be `cut(x, c(0, 0.2315, 10), dig.lab = 4)`. – lukeA Jun 18 '16 at 17:06
  • @lukeA: I know, yet the same problem would occur one decimal place further down when you had a number with more decimal places as the cutpoint. The above is just an illustrative example! – vonjd Jun 18 '16 at 17:09
  • @lukeA: Please see my edit. – vonjd Jun 18 '16 at 17:12
  • As lukeA points out, `cut` is using the correct point for partitioning. However, the print out is rounding. One fix for this if you are building functions for other users is to include a "dig.labs" argument in your function that will allow the user to choose the precision of the printed intervals. – lmo Jun 18 '16 at 17:13
  • @lmo: Thank you. Unfortunately this would not solve my problem because the cutpoints that are printed are used in my function No. 2 to choose the right interval, and then again the wrong ones will be chosen in those unfortunate cases (see also my edit). – vonjd Jun 18 '16 at 17:18
  • Since you are setting the cutpoints, can you grab these separately? – Richard Telford Jun 18 '16 at 17:26
  • @RichardTelford: Yes, I think I found a way myself by having a look at the source code of `cut.default` - see my answer. – vonjd Jun 18 '16 at 17:30

1 Answer

I just had a look at the source code of `cut.default`, and I think one workaround is to apply the same formatting that is used for the printed labels to the breaks before calling `cut`:

breaks <- as.numeric(formatC(0 + c(0, 0.2315, 10), digits = 3, width = 1L))
cut(x, breaks = breaks)
## [1] (0,0.232]
## Levels: (0,0.232] (0.232,10]

Then at least everything is consistent (here, the first interval is chosen in both cases).
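The one-liner above could be wrapped in a small helper. The following is only a sketch: `round_breaks_like_labels` is a hypothetical name, and the loop that raises the number of digits until adjacent labels differ is my reading of what `cut.default` does when it builds its labels, so it is worth checking against the source of your R version.

```r
# Hypothetical helper: round explicit breaks the way the labels are formatted,
# so that binning and the displayed cut points agree.
round_breaks_like_labels <- function(breaks, dig.lab = 3L) {
  breaks <- sort(as.numeric(breaks))
  nb <- length(breaks)
  # Assumption: like the label code, increase the digits (up to 12)
  # until neighbouring formatted breaks are distinct.
  for (dig in dig.lab:max(12L, dig.lab)) {
    ch.br <- formatC(0 + breaks, digits = dig, width = 1L)
    if (all(ch.br[-1L] != ch.br[-nb])) break
  }
  as.numeric(ch.br)
}

x <- 0.2316
rounded <- round_breaks_like_labels(c(0, 0.2315, 10))
bin <- cut(x, breaks = rounded)
print(bin)
```

Because the rounded breaks are used for both binning and labelling, the displayed cut point and the chosen interval can no longer disagree. The price is that the bin boundaries move slightly away from the originally computed values.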

However, this only works when you pass the cut points explicitly as a vector, not when `breaks` is a single number and `cut` computes the cut points itself.
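An alternative that avoids parsing the printed labels altogether, along the lines of the comment about grabbing the cut points separately, is to compute the breaks yourself, keep them as a numeric vector, and pass that same vector to both functions. This is a rough sketch under that assumption: `compute_breaks` and `assign_bins` are hypothetical stand-ins for the two package functions, and equal-width bins are used purely for illustration.

```r
# Hypothetical function 1: derive breaks from the data and keep them as numbers.
compute_breaks <- function(train, n_bins = 4L) {
  rng <- range(train, na.rm = TRUE)
  seq(rng[1L], rng[2L], length.out = n_bins + 1L)
}

# Hypothetical function 2: bin new data against the stored numeric breaks.
# labels = FALSE returns integer interval codes, so no rounded label is involved.
assign_bins <- function(new_x, breaks) {
  cut(new_x, breaks = breaks, labels = FALSE, include.lowest = TRUE)
}

train <- c(0.05, 0.2315, 0.4, 0.9, 1.3)
breaks <- compute_breaks(train)
bins_train <- assign_bins(train, breaks)
bins_new <- assign_bins(c(0.2316, 1.0), breaks)
```

The labels can still be generated for display with a regular `cut` call on the same breaks; they just never feed back into the binning.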

vonjd