Is parse_number supposed to fail when there are multiple periods in the string?

Question

In the readr package in R, the parse_number function fails when there is more than one period in the string. Is this a bug, or is this by design?

Examples follow:

> library(readr)
> parse_number("asf125")
[1] 125
> parse_number("asf.125")
[1] 0.125
> parse_number(".asf.125")
Warning: 1 parsing failure.
row # A tibble: 1 x 4 col     row   col expected actual expected   <int> <int>    <chr>  <chr> actual 1     1    NA a number      .

[1] NA
attr(,"problems")
# A tibble: 1 x 4
    row   col expected actual
  <int> <int>    <chr>  <chr>
1     1    NA a number      .

I guess it "tries" to parse, but it can't possibly cover all edge cases. Why not clean up with a regex before parsing? A number can have only one dot `"."`, but it can have multiple commas `","` as thousands separators. — zx8754, Dec 04 '18 at 14:22
Thanks @zx8754. I'm using regex as a workaround, but I wanted to see if this was the expected behavior for `parse_number`. — Jake Fisher, Dec 04 '18 at 14:28
For future readers who are trying to do this with a regex, it'll be something like `regmatches(x = ".asf.125", m = regexpr(text = ".asf.125", pattern = "[[:digit:]]+"))`. — Jake Fisher, Dec 04 '18 at 14:41

score 0 · Answer 1 · answered Dec 05 '18 at 01:27

This extracts the first number found in each string, allowing an optional sign and an optional decimal part:

as.numeric(stringi::stri_match_first_regex(
  c("asf125", "asf.125", ".asf.125"), 
  "[-+]?[[:digit:]]*\\.?[[:digit:]]+"
)[,1])
## [1] 125.000   0.125   0.125

This one extracts only positive or negative "integers", ignoring any decimal point:

as.numeric(stringi::stri_match_first_regex(
  c("asf125", "asf.125", ".asf.125"), 
  "[-+]?[[:digit:]]+"
)[,1])
## [1] 125 125 125

Relying on a generic function like readr::parse_number() seems fraught with peril.
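
If you would rather not depend on stringi, the same pattern can be applied with base R's `regexpr()` and `regmatches()`. This is only a minimal sketch: `extract_first_number` is a hypothetical helper name, not part of readr or stringi, and it assumes a period is the decimal mark and that strings with no digits should give `NA`.

```r
# Hypothetical helper (not from readr or stringi): pull the first signed
# decimal number out of each string using base R only.
extract_first_number <- function(x) {
  # Optional sign, optional integer part, optional period, then digits.
  pattern <- "[-+]?[[:digit:]]*\\.?[[:digit:]]+"
  m <- regexpr(pattern, x)

  # regexpr() returns -1 where nothing matched; keep those entries as NA.
  hit <- !is.na(m) & m > -1L
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)

  as.numeric(out)
}

extract_first_number(c("asf125", "asf.125", ".asf.125", "no digits"))
# returns 125, 0.125, 0.125, NA
```

Unlike `parse_number()`, this does not handle thousands separators or locale-specific decimal marks, so strip or translate those first if your data has them.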
