2

I would like to find the number of significant digits in a vector of numbers that can have very different scales.

For example, the number 1000 has 1 significant digit; the number 100 also has 1. The number 1300 has 2.

This is not to be confused with the number of digits after the decimal point, which is 0 in all three cases.

Omry Atia
  • What have you tried so far? And you might want to read [this](https://perfdynamics.blogspot.com.es/2010/04/significant-figures-in-r-and-info-zeros.html) – phiver Apr 09 '18 at 10:47
  • I think you'll need to give a better definition of what "significant digits" means to you. It looks like you're interested in removing trailing zeros, and counting all other digits. Is this correct? Do you want to be able to accommodate decimal numbers? (note also that trailing zeros are not necessarily non-significant. It is impossible to know if a trailing zero is significant without knowledge about the precision of the measurement) – Benjamin Apr 09 '18 at 11:16
  • I would like to remove trailing zeros, and count all other digits as you say. If a number has a decimal part, then its trailing zeros should not be counted. So for example 1003.20 has 5 significant digits – Omry Atia Apr 09 '18 at 11:22

4 Answers

4

This function converts each value in the vector to a character value, removes the decimal point and all leading and trailing zeros, and counts the number of characters remaining. Its performance appears to be comparable to phiver's answer.

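sigfigs <- function(x){
  # Temporarily discourage scientific notation; restore the option on exit
  orig_scipen <- getOption("scipen")
  options(scipen = 999)
  on.exit(options(scipen = orig_scipen))

  x <- as.character(x)
  x <- sub("\\.", "", x)           # drop the decimal point
  x <- gsub("(^0+|0+$)", "", x)    # strip leading and trailing zeros
  nchar(x)                         # count the characters that remain
}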

x <- c(1000,100,1300, 1200.1, 12345.67, 12345.670)

sigfigs(x)
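A variant of the same idea that leaves the global options untouched could look like the sketch below. It assumes that `format()` with `scientific = FALSE` is an acceptable stand-in for the `as.character()` call; the name `sigfigs_fmt` is made up for this illustration, and signs and missing values are not handled.

```r
# Hypothetical variant of sigfigs(); a sketch, not the original answer's code.
sigfigs_fmt <- function(x) {
  # Format each element on its own, in fixed notation with at most 15
  # significant digits, so values of very different scales do not end up
  # sharing a common number of decimals.
  s <- vapply(x, function(v) format(v, scientific = FALSE, digits = 15),
              character(1))
  s <- sub(".", "", s, fixed = TRUE)   # drop the decimal point
  s <- gsub("^0+|0+$", "", s)          # strip leading and trailing zeros
  nchar(s)                             # count the digits that remain
}

x <- c(1000, 100, 1300, 1200.1, 12345.67, 0.00001)
sigfigs_fmt(x)
```

Formatting element by element is a deliberate choice here: calling `format()` once on the whole vector pads every value to the same number of decimals, which can expose floating-point noise when the scales differ a lot.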

A note of caution:

This function returns the number of digits that are neither leading nor trailing zeros. This is not exactly the same thing as the number of significant digits. While leading zeros are never significant, trailing zeros may or may not be significant; deciding whether they are requires some knowledge about the precision of the measurement. I recommend reading the Wikipedia article on "Significant Figures" for more detail.
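To illustrate the caveat: once a value is stored as a double, `1003.20` and `1003.2` are the same number, so the written precision can only be recovered from text. The sketch below works on character input and applies one common textbook convention (leading zeros never count, zeros between non-zero digits count, trailing zeros count only when a decimal point is written). The name `sigfigs_written` and the choice to treat trailing zeros of integers as not significant are assumptions made for this example.

```r
# Hypothetical helper: count significant figures of numbers given as text.
sigfigs_written <- function(s) {
  s <- sub("^[+-]", "", trimws(s))            # drop surrounding blanks and sign
  has_point <- grepl(".", s, fixed = TRUE)    # was a decimal point written?
  digits <- gsub(".", "", s, fixed = TRUE)    # keep the digits only
  digits <- sub("^0+", "", digits)            # leading zeros never count
  # Without a decimal point, trailing zeros are ambiguous; this sketch
  # assumes they are not significant.
  digits <- ifelse(has_point, digits, sub("0+$", "", digits))
  nchar(digits)
}

sigfigs_written(c("1003.20", "1003.2", "1300", "1300.", "0.00100", "01230"))
```

Under this convention `"1003.20"` counts 6 digits, whereas the definition used in the question counts 5, which is exactly the ambiguity described above.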

Benjamin
  • Thanks Benjamin, this is almost it except 0.00001 should be one digit (sorry for not specifying this earlier) – Omry Atia Apr 09 '18 at 11:32
  • My apologies. Needed to remove the decimal point before removing the trailing and leading zeros. It is now fixed. – Benjamin Apr 09 '18 at 11:35
0

I think this could work. If you have numbers like 100000, you need to prevent R from using scientific notation such as 1e5 by setting options(scipen = 999). Also, in your post you say you do not care about the digits after the decimal point. Here I assume you do not have numbers with decimal points, but if you do, you could apply floor(x) first.

x <- c(1000,100,1300, 1234,54334,324,1,1,546,12140465,0,100000,10203,20003,20,102030405060,20)

options(scipen = 999)

sapply(x, function(x) {
  # Split the number into single characters, then count those that are 1-9
  digits <- substring(x, 1:nchar(x), 1:nchar(x))
  sum(as.numeric(digits) %in% 1:9)
})

That gives: [1] 1 1 2 4 5 3 1 1 3 7 0 1 3 2 1 6 1

Lennyy
  • `10203` would, conventionally, be read to have five significant figures. Omry seems interested in counting digits excluding trailing zeros. – Benjamin Apr 09 '18 at 11:13
  • I am not sure I understood you correctly. My function did give 3 as a result for 10203. What I did was splitting 10203 into, "1", "0", "2", "0", "3", making these numeric again, and then checking how many of these substrings of length 1 are in a vector of the digits c(1:9). So that does give TRUE, FALSE, TRUE, FALSE, TRUE, I summed over these, and it returned 3. So that should be ok right? – Lennyy Apr 09 '18 at 11:28
  • `10203` has five significant digits, because the zeros are both between non-zero digits. @OmryAtia is using a working definition that only leading and trailing zeros are non-significant. All other zeros should be counted. – Benjamin Apr 09 '18 at 11:33
  • Ah thanks, I get it now. When I read the original question I thought TS did not want to include the zeroes in 10203 in the summation. – Lennyy Apr 09 '18 at 11:38
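The two readings discussed in these comments can be put side by side. The sketch below takes the digits as character strings to keep number formatting out of the picture; `count_nonzero` and `count_trimmed` are made-up names for this comparison only.

```r
# Reading 1: count only the non-zero digits.
count_nonzero <- function(s) lengths(regmatches(s, gregexpr("[1-9]", s)))

# Reading 2: strip leading and trailing zeros, then count every remaining digit.
count_trimmed <- function(s) nchar(gsub("^0+|0+$", "", s))

s <- c("10203", "20003", "1300")
count_nonzero(s)
count_trimmed(s)
```

The two agree for `"1300"` but differ whenever zeros sit between non-zero digits, as in `"10203"`.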
0

You can try

library(tidyverse)
library(stringr)
a <- c(1000,100,1300, 1234,1,0,12140465,1003.02,1003.20,1003.22,0.00001)
tibble(a) %>% 
  mutate(b=format(a, scientific = FALSE)) %>% 
  separate(b, into = c("b1", "b2"), sep = "[.]", remove = FALSE) %>% 
  mutate(b1 = case_when(str_sub(b1, str_length(b1),str_length(b1)) == "0" ~ str_count(b1, "[1-9]"),
                      TRUE ~ str_count(b1, "[0-9]"))) %>% 
  mutate(b2 = str_count(b2, "[1-9]")) %>% 
  mutate(res=b1+b2)
# A tibble: 11 x 5
         a b                   b1    b2   res
     <dbl> <chr>            <int> <int> <int>
 1 1.00e+3 "    1000.00000"     1     0     1
 2 1.00e+2 "     100.00000"     1     0     1
 3 1.30e+3 "    1300.00000"     2     0     2
 4 1.23e+3 "    1234.00000"     4     0     4
 5 1.00e+0 "       1.00000"     1     0     1
 6 0.      "       0.00000"     0     0     0
 7 1.21e+7 12140465.00000       8     0     8
 8 1.00e+3 "    1003.02000"     4     1     5
 9 1.00e+3 "    1003.20000"     4     1     5
10 1.00e+3 "    1003.22000"     4     2     6
11 1.00e-5 "       0.00001"     0     1     1
Roman
  • No idea what the conventional reading of significant figures is. Nevertheless I edited my answer. – Roman Apr 09 '18 at 11:24
  • Conventionally, any zeros that occur between non-zero digits are considered significant. So `10203` would have five significant figures; `01230` would have three. https://en.wikipedia.org/wiki/Significant_figures – Benjamin Apr 09 '18 at 11:26
0

I have adjusted the function from [this article](https://perfdynamics.blogspot.com.es/2010/04/significant-figures-in-r-and-info-zeros.html) slightly to get it working again. All credit goes to the author of the article. The function can probably be improved further.

code:

x <- c(1000,100,1300, 1200.1, 12345.67, 12345.670)
sapply(x, FUN = sigdigs)
[1] 1 1 2 5 7 7

function:

sigdigs <- function(n) {
  i <- 0
  sigfigs <- 0  # stays 0 when every digit is zero (e.g. n = 0)
  # Fixed notation, so that values like 1e+05 or 1e-05 are written out in full
  nstr <- format(n, scientific = FALSE, digits = 15)
  # Check whether a decimal point is present
  if(length(grep("\\.", nstr)) > 0) { # real number
    # Separate integer and fractional parts
    intfrac <- unlist(strsplit(nstr, "\\."))
    digstring <- paste(intfrac[1], intfrac[2], sep = "")
    numfigs <- nchar(digstring)
    while(i < numfigs) {
      # Find index of 1st non-zero digit from LEFT
      if(substr(digstring,i+1,i+1) == "0") {
        i <- i + 1
        next
      } else {
        sigfigs <- numfigs - i
        break
      }
    }   
  } else {  # must be an integer      
    digstring <- nstr
    numfigs <- nchar(digstring)
    while(i < numfigs) {
      # Find index of 1st non-zero digit from RIGHT
      if(substr(digstring, numfigs-i, numfigs-i) == "0") {
        i <- i + 1
        next
      } else {
        sigfigs <- numfigs - i
        break
      }
    }   
  }   
  return(sigfigs)
}
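The function above scans the digit string one character at a time. A purely numeric alternative, which is a different technique from the one in this answer, is to look for the smallest `n` for which `signif(x, n)` reproduces `x`. The sketch below is only that, a sketch: the name `sigdigs_num`, the cap of 15 digits and the relative tolerance are assumptions chosen for the example.

```r
# Hypothetical helper: smallest n such that rounding to n significant
# digits leaves the value (numerically) unchanged.
sigdigs_num <- function(x, max_digits = 15L) {
  tol <- 4 * .Machine$double.eps          # assumed relative tolerance
  vapply(x, function(v) {
    if (is.na(v)) return(NA_integer_)     # propagate missing values
    if (v == 0) return(0L)                # zero has no non-zero digits
    for (n in seq_len(max_digits)) {
      if (abs(signif(v, n) - v) <= tol * abs(v)) return(n)
    }
    as.integer(max_digits)                # give up at the cap
  }, integer(1))
}

x <- c(1000, 100, 1300, 1200.1, 12345.67, 0.00001, 1003.2)
sigdigs_num(x)
```

Because it never converts to text, this approach sidesteps scientific notation entirely, at the cost of relying on a floating-point tolerance.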
phiver