
I'm having an issue where a boolean flag doesn't evaluate a numeric column the way I expect in R. Everything is only to 1 decimal place, so I'm confused about how the comparison could be numerically unstable.

Here's some code to reproduce:

library(ggplot2)  # not actually used in the snippet below

# Squared distance from the point (1, 1)
myfunction <- function(x, y){
  t1 <- (x - 1)^2
  t2 <- (y - 1)^2
  t1 + t2
}

# Evaluate myfunction over a 21 x 21 grid of (x, y) values
input_vector_full <- seq(-1, 1, 0.1)
x_full <- input_vector_full
y_full <- input_vector_full
result_vector_full <- c()
x_vect <- c()
y_vect <- c()
for(i in x_full){
  for(j in y_full){
    temp <- myfunction(i, j)
    x_vect <- c(x_vect, i)
    y_vect <- c(y_vect, j)
    result_vector_full <- c(result_vector_full, temp)
  }
}
df <- data.frame("x" = x_vect, "y" = y_vect, "z" = result_vector_full)
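As an aside, the same grid can be built without growing vectors inside a loop. A base-R sketch that produces the same data (rows in a different order):

df_alt <- expand.grid(x = input_vector_full, y = input_vector_full)
df_alt$z <- myfunction(df_alt$x, df_alt$y)  # myfunction is already vectorised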

# Creating a new data frame here for clarity

df2 <- df
df2$sum_absolute <- abs(df2$x) + abs(df2$y)
df2$bool_flag <- df2$sum_absolute <= 1

If you run the above and go to the row where df2$x == 0.1 and df2$y == 0.9, you'll see a value of 1.0 in the "sum_absolute" column but FALSE in the "bool_flag" column.
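The displayed "1.0" is rounded for printing; showing the stored double with more digits reveals it is actually slightly above 1. A quick check (output is what standard IEEE-754 doubles give; `row` is just a helper index):

row <- which(abs(df2$x - 0.1) < 1e-9 & abs(df2$y - 0.9) < 1e-9)
print(df2$sum_absolute[row], digits = 17)
#> [1] 1.0000000000000002
df2$sum_absolute[row] <= 1
#> [1] FALSE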

Version info:

> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          4                           
minor          0.3                         
year           2020                        
month          10                          
day            10                          
svn rev        79318                       
language       R                           
version.string R version 4.0.3 (2020-10-10)
nickname       Bunny-Wunnies Freak Out 

Structure (output of str(df2)):

'data.frame':   441 obs. of  5 variables:
 $ x           : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ y           : num  -1 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1 ...
 $ z           : num  8 7.61 7.24 6.89 6.56 6.25 5.96 5.69 5.44 5.21 ...
 $ sum_absolute: num  2 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 ...
 $ bool_flag   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...

Even this doesn't work:

library(dplyr)
df2 %>% filter(x == 0.1)

[1] x            y            z            sum_absolute bool_flag   
<0 rows> (or 0-length row.names)
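The empty result appears to be the same effect: the values produced by seq(-1, 1, 0.1) come from repeated floating-point arithmetic, so the element that prints as 0.1 is not the literal 0.1 (output below assumes standard IEEE-754 doubles):

v <- seq(-1, 1, 0.1)
print(v[12], digits = 17)  # the element that displays as 0.1
#> [1] 0.10000000000000009
v[12] == 0.1
#> [1] FALSE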
  • Also, I did try "df2$sum_absolute <= 1.0"; it made no difference. – Jamalan Sep 02 '21 at 21:24
  • Computers have limitations when it comes to floating-point numbers (aka `double`, `numeric`, `float`). This is a fundamental limitation of computers in general, in how they deal with non-integer numbers. This is not specific to any one programming language. There are some add-on libraries or packages that are much better at arbitrary-precision math, but I believe most main-stream languages (this is relative/subjective, I admit) do not use these by default. Refs: https://stackoverflow.com/q/9508518, https://stackoverflow.com/q/588004, and https://en.wikipedia.org/wiki/IEEE_754 – r2evans Sep 02 '21 at 21:24
  • In order to test for numbers that are "effectively" what you are seeking, compare within a tolerance. For instance, `filter(df2, abs(x - 0.1) < 1e-9)` will return rows where `x` is *really close to* `0.1`. The choice of `1e-9` is somewhat arbitrary and is informed by the expected precision of your dataset. If you are expecting precision on the order of 1-2 decimal places, then 1e-9 is more than enough; if you have higher levels of precision, you can likely go down to `1e-16`, though in this example `1e-17` does not produce any rows. – r2evans Sep 02 '21 at 21:26
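A runnable sketch of the tolerance-based approach from the comments; near() is dplyr's built-in helper for approximate equality, and the 1e-9 tolerance is the arbitrary-but-reasonable value suggested above:

library(dplyr)

df2 %>% filter(abs(x - 0.1) < 1e-9)  # explicit tolerance
df2 %>% filter(near(x, 0.1))         # near() defaults to tol = .Machine$double.eps^0.5

# The same idea fixes the flag: allow a little headroom for rounding error
df2$bool_flag <- df2$sum_absolute <= 1 + 1e-9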
