0
a <- "\n\t\t\t\n\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"

How can I isolate only the number 95 from this string? I tried gsub and str_replace, but they remove the 95 too. I scraped this string from a site with the rvest package.

3 Answers

2

We can use gsub from base R to remove all characters that are not digits:

gsub("\\D+", "", a)
#[1] "95"

Or, as commented by @G Grothendieck, without the + quantifier:

gsub("\\D", "", a)

Or with str_remove_all from stringr:

library(stringr)
str_remove_all(a, "\\D+")
#[1] "95"
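One caveat with the removal approach: deleting every non-digit keeps all digits in the string, so a string containing more than one number would have them glued together. A minimal sketch, using a hypothetical string b that is not from the question:

```r
# Hypothetical input with two separate numbers (not from the question)
b <- "\n\t12\n\t - \n\t95\n\tdays\n"

# Deleting all non-digits concatenates the remaining digits
glued <- gsub("\\D+", "", b)
glued
# "1295"

# Matching runs of digits keeps the numbers separate
separate <- regmatches(b, gregexpr("[0-9]+", b))[[1]]
separate
# "12" "95"
```

For the string in the question, which holds a single number, both approaches give the same result.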
akrun
0

The previous answers approach the desired output negatively, by defining a pattern for what is to be removed, namely anything that is not a digit (hence \\D with an uppercase D). Here's a positive solution that defines what is to be kept and extracts it via a self-defined function, extract:

Define the function, including the pattern to be matched, \\d{2} (i.e., two contiguous digits):

extract <- function(x) unlist(regmatches(x, gregexpr("\\d{2}", x, perl = TRUE)))

Apply the function to the data a:

extract(a)
#[1] "95"
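Note that \\d{2} matches exactly two digits, so it only fits when the number is known to be two digits long. If the length can vary, a variant with \\d+ may be safer. A sketch, where extract_all and the input strings are hypothetical names and data used only for illustration:

```r
# Variant of extract() that matches runs of one or more digits
extract_all <- function(x) unlist(regmatches(x, gregexpr("\\d+", x, perl = TRUE)))

# Hypothetical inputs (not from the question)
two_digits   <- extract_all("\n\t\t95\n\t\tdays\n")   # "95"
three_digits <- extract_all("\n\t\t195\n\t\tdays\n")  # "195"

# For comparison, the fixed-width pattern truncates a three-digit number
truncated <- unlist(regmatches("195 days", gregexpr("\\d{2}", "195 days", perl = TRUE)))
truncated
# "19"
```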
Chris Ruehlemann
-1

I was going to suggest readr::parse_number, but then I learned that it fails on the - character, so additional work is needed, as explained here.
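If the goal is a numeric value rather than a string, one base R route that sidesteps the - character is to pull out the first run of digits and convert it. This is only a sketch, not the approach from the linked explanation, and the string s is a hypothetical stand-in shaped like the one in the question:

```r
# Hypothetical scraped string in the same shape as the question's (not the original)
s <- "\n\t\t\tNew\n\t\t\t - \n\t\t\t95\n\t\t\tdays\n\t\t"

# Take the first run of digits and convert it to an integer
first_number <- as.integer(regmatches(s, regexpr("[0-9]+", s)))
first_number
# 95
```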

novica