How to remove the characters from a string and leave only the numbers in R?

Question

a <- "\n\t\t\t\n\t\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"

How can I isolate only the number 95 from this string? I tried gsub and str_replace, but they remove the 95 too. I scraped this string from a website with the rvest package.

score 2 · Answer 1 · answered Dec 25 '19 at 20:24

We can use gsub from base R to remove all characters that are not digits (\\D matches any non-digit):

gsub("\\D+", "", a)
#[1] "95"

Or, as commented by @G Grothendieck, without the +, so that each non-digit character is removed individually:

gsub("\\D", "", a)

Or with str_remove_all from stringr:

library(stringr)
str_remove_all(a, "\\D+")
#[1] "95"
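All three calls above keep every digit in the string and join them together, so they only give a clean result when the text contains a single number. The sketch below illustrates this with a made-up second string, b ("95 days, 3 hours"), which is an assumption for demonstration and not from the question. It also shows that, for the question's string, \\D and \\D+ give the same result: the + only changes whether runs of non-digits are removed in one replacement or character by character.

```r
# The scraped string from the question (with the "\tNew" escape repaired)
a <- "\n\t\t\t\n\t\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"

# \\D+ removes whole runs of non-digits; \\D removes them one at a time.
# Both leave the same digits behind.
gsub("\\D+", "", a)
#[1] "95"
gsub("\\D", "", a)
#[1] "95"

# Hypothetical string with two numbers: the remaining digits are glued together
b <- "95 days, 3 hours"
gsub("\\D+", "", b)
#[1] "953"
```

If the scraped text might ever contain more than one number, an extraction-based approach that matches the digits themselves avoids this problem.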

answered Dec 25 '19 at 20:24 by akrun, edited Dec 25 '19 at 21:28

score 0 · Answer 2 · answered Dec 25 '19 at 21:38

The previous answers approach the desired output negatively, by defining a pattern for what should be removed, namely anything that is not a digit (hence \\D with an uppercase D). Here is a positive solution that defines what should be kept and extracts it with a self-defined function, extract:

Define the function, including the pattern to be matched, \\d{2} (i.e., exactly two consecutive digits):

# Return every match of exactly two consecutive digits in x
extract <- function(x) unlist(regmatches(x, gregexpr("\\d{2}", x, perl = TRUE)))

Apply the function to a:

extract(a)
#[1] "95"
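The pattern \\d{2} matches exactly two digits, so it fits this particular string but would cut off longer numbers. A minimal variation, assuming you want every whole number in the text, is to match \\d+ (one or more digits) instead. The helper name extract_numbers and the test string "123 days" are hypothetical, chosen for illustration. The default regex engine handles \\d here, so perl = TRUE is optional for this pattern.

```r
# Scraped string from the question (with the "\tNew" escape repaired)
a <- "\n\t\t\t\n\t\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"

# Hypothetical helper: return every run of one or more digits in x.
# gregexpr() locates all matches; regmatches() pulls out the matched text.
extract_numbers <- function(x) {
  unlist(regmatches(x, gregexpr("\\d+", x)))
}

extract_numbers(a)
#[1] "95"

# With \\d{2}, a three-digit number is cut to its first two digits
unlist(regmatches("123 days", gregexpr("\\d{2}", "123 days")))
#[1] "12"
extract_numbers("123 days")
#[1] "123"

# The result is character; convert it if a number is needed
as.numeric(extract_numbers(a))
#[1] 95
```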

score -1 · Answer 3 · answered Dec 25 '19 at 21:27


I was going to suggest readr::parse_number, but then I learned that it fails on the "-" character, so additional work is needed, as explained here.
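If the goal is a numeric value rather than a string, one base R route that does not depend on readr is sketched below. It is an alternative technique, not the parse_number workaround referred to above. regexpr() finds the first run of digits in each string, regmatches() extracts it, and strings with no digits get NA. Because the pattern only matches digits, the stray "-" in the scraped text is never considered. The helper name first_number is hypothetical.

```r
# Scraped string from the question (with the "\tNew" escape repaired)
a <- "\n\t\t\t\n\t\t\t\tNew\n\t\t\t\n\t\t\t\n\t\t\t\t\n\t\t\t\t\t - \n\t\t\t\t\n\t\t\t\t95\n\t\t\t\tdays\n\t\t\t\n\t\t"

# Hypothetical helper: first run of digits in each element of x, as a number.
first_number <- function(x) {
  m <- regexpr("\\d+", x)          # start of first digit run, -1 if none
  out <- rep(NA_real_, length(x))  # NA for strings without any digits
  # regmatches() returns only the elements that matched, in order
  out[m != -1] <- as.numeric(regmatches(x, m))
  out
}

first_number(a)
#[1] 95
first_number(c("95 days", "no number", "3 hours"))
#[1] 95 NA  3
```

Filling an NA vector first keeps the output the same length as the input, which matters when the strings come from a vector of scraped nodes.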

answered by novica
