0

Pretend I have a vector:

testVector <- c("I have 10 cars", "6 cars", "You have 4 cars", "15 cars")

Is there a way to parse this vector so that I can store just the numerical values?

10, 6, 4, 15

If the problem were just "15 cars" and "6 cars", I know how to parse that, but I'm having difficulty with the strings that have text in front too! Any help is greatly appreciated.

Sheila

3 Answers

5

For this particular common task, there's a nice helper function in tidyr called extract_numeric (later tidyr releases deprecate it in favor of readr::parse_number):

library(tidyr)

extract_numeric(testVector)
## [1] 10  6  4 15
alistaire
3

We can use str_extract with the pattern \\d+, which matches one or more digits. It can also be written as [0-9]+.

library(stringr)
as.numeric(str_extract(testVector, "\\d+"))
#[1] 10  6  4 15

If there are multiple numbers in a string, we use str_extract_all, which will return a list output.
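
As a minimal sketch of the multiple-number case, assuming a made-up input vector (multiVector is hypothetical and not part of the question), the list returned by str_extract_all can be converted element by element:

```r
library(stringr)

# Hypothetical input: the first string contains more than one number
multiVector <- c("I have 10 cars and 2 bikes", "6 cars")

# str_extract_all returns a list with one character vector per input string
matches <- str_extract_all(multiVector, "\\d+")

# Convert each list element from character to numeric
nums <- lapply(matches, as.numeric)
nums
```

Each element of nums holds all the numbers found in the corresponding string, so the first element has two values and the second has one.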


This can also be done with base R (no external packages used):

as.numeric(regmatches(testVector, regexpr("\\d+", testVector)))
#[1] 10  6  4 15

Or using gsub from base R:

as.numeric(gsub("\\D+", "", testVector))
#[1] 10  6  4 15
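
One thing to keep in mind with the gsub approach: it deletes everything that is not a digit, so a string holding several numbers ends up as a single number. A small sketch, assuming a made-up input x that is not from the question:

```r
# Hypothetical input with two numbers in one string
x <- "I have 10 cars and 2 bikes"

# Removing every run of non-digits glues the remaining digits together
glued <- as.numeric(gsub("\\D+", "", x))
glued
#[1] 102

# regmatches with gregexpr keeps the numbers separate
separate <- as.numeric(regmatches(x, gregexpr("\\d+", x))[[1]])
separate
#[1] 10  2
```

For the vector in the question this makes no difference, since every string contains exactly one number.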

BTW, some helper functions are just thin wrappers around gsub. This is the source of extract_numeric:

function (x) 
 {
   as.numeric(gsub("[^0-9.-]+", "", as.character(x)))
 }
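
To illustrate how the pattern "[^0-9.-]+" differs from "\\D+", here is a small sketch. The input y is made up for illustration and is not part of the question:

```r
# Hypothetical input with a minus sign and a decimal point
y <- "temperature: -3.5 degrees"

# Keeping digits, "." and "-" preserves the sign and the decimal point
keepSign <- as.numeric(gsub("[^0-9.-]+", "", y))
keepSign
#[1] -3.5

# Stripping every non-digit drops both of them
digitsOnly <- as.numeric(gsub("\\D+", "", y))
digitsOnly
#[1] 35
```

The more permissive pattern has its own limits: a string such as "1.2.3" is left as "1.2.3", which as.numeric turns into NA with a warning.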

So, if we need a function, we can create one (without using any external packages):

ext_num <- function(x) {
  as.numeric(gsub("\\D+", "", x))
}
ext_num(testVector)
#[1] 10  6  4 15
akrun
1

This might also come in handy. Note that the result is a character vector, so wrap it in as.numeric if you need numbers.

testVector <- gsub("[[:alpha:]]","",testVector)
testVector <- gsub(" ","",testVector)

> testVector
[1] "10" "6"  "4"  "15"
Pankaj Kaundal