In the readr package in R, the parse_number function fails when there is more than one period in the string. Is this a bug, or is this by design?

Examples follow:

> library(readr)
> parse_number("asf125")
[1] 125
> parse_number("asf.125")
[1] 0.125
> parse_number(".asf.125")
Warning: 1 parsing failure.
row # A tibble: 1 x 4 col     row   col expected actual expected   <int> <int>    <chr>  <chr> actual 1     1    NA a number      .

[1] NA
attr(,"problems")
# A tibble: 1 x 4
    row   col expected actual
  <int> <int>    <chr>  <chr>
1     1    NA a number      .
Jake Fisher
  • I guess it "tries" to parse, but it can't possibly cover all edge cases. Why not clean up with a regex before parsing? A number can have only one dot `"."` as the decimal mark, but multiple commas `","` as thousands separators. – zx8754 Dec 04 '18 at 14:22
  • Thanks @zx8754. I'm using regex as a workaround, but I wanted to see if this was the expected behavior for `parse_number`. – Jake Fisher Dec 04 '18 at 14:28
  • For future readers who are trying to do this with a regex, it'll be something like `regmatches(x = ".asf.125", m = regexpr(text = ".asf.125", pattern = "[[:digit:]]+"))`. – Jake Fisher Dec 04 '18 at 14:41
  • Please add as answer for future readers. – zx8754 Dec 04 '18 at 14:45

1 Answer

This extracts the first number found in each string, with an optional sign and an optional decimal point:

as.numeric(stringi::stri_match_first_regex(
  c("asf125", "asf.125", ".asf.125"), 
  "[-+]?[[:digit:]]*\\.?[[:digit:]]+"
)[,1])
## [1] 125.000   0.125   0.125

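This extracts only the first run of digits, with an optional sign, so a leading decimal point is ignored and the result is always an integer value: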

as.numeric(stringi::stri_match_first_regex(
  c("asf125", "asf.125", ".asf.125"), 
  "[-+]?[[:digit:]]+"
)[,1])
## [1] 125 125 125

Relying on a generic function like readr::parse_number() seems fraught with peril.
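
If stringi is not available, the same idea can be sketched in base R with regexpr() and regmatches(). This is only a minimal sketch: the helper name extract_first_number is hypothetical (it is not part of readr or stringi), and it assumes "." is the decimal mark and that only the first number in each string matters. Strings with no digits yield NA.

```r
# Hypothetical helper, not part of readr or stringi.
# Extracts the first optionally signed decimal number from each string.
extract_first_number <- function(x) {
  # optional sign, optional integer part, optional dot, at least one digit
  pattern <- "[-+]?[0-9]*\\.?[0-9]+"
  m <- regexpr(pattern, x)

  # regexpr() returns -1 where nothing matched (and NA for NA input)
  hit <- !is.na(m) & m > -1L

  out <- rep(NA_character_, length(x))
  # regmatches() returns one element per successful match, in order
  out[hit] <- regmatches(x, m)
  as.numeric(out)
}

extract_first_number(c("asf125", "asf.125", ".asf.125", "no digits"))
# values: 125, 0.125, 0.125, NA
```

Like the stringi version, this takes the first match only, so a string such as "a1b2" gives 1 rather than an error.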

hrbrmstr