
I have data with percent signs (%) that I want to convert to numeric values, but converting a percentage string to a number fails. For example, I want to convert the string "10%" into the number it represents, but

as.numeric("10%")

returns NA. Do you have any ideas?

Gregor Thomas
Frank Wang

6 Answers


"10%" is, by definition, not a number, so NA is the correct result. You can convert a character vector containing such values to numeric like this:

percent_vec = paste(1:100, "%", sep = "")
as.numeric(sub("%", "", percent_vec))

This works by using sub() to replace the % character with an empty string. Note that the result is in percentage points (10, not 0.1).
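A step-by-step sketch of the same sub() approach on a small illustrative vector. It passes fixed = TRUE so "%" is matched as a literal string, and it shows that a value which is still not a number after stripping comes back as NA. The sample values are made up for illustration.

```r
# Illustrative input; the values are made up for this example
percent_vec <- c("10%", "2.5%", "100%")

# Step 1: remove the "%" sign; fixed = TRUE matches it literally, not as a regex
stripped <- sub("%", "", percent_vec, fixed = TRUE)
print(stripped)   # "10", "2.5", "100"

# Step 2: the remaining text parses as numbers (still in percentage points)
values <- as.numeric(stripped)
print(values)     # 10, 2.5, 100

# Anything that is not a plain number once "%" is gone still yields NA
# (as.numeric() would normally warn; the warning is suppressed here)
bad <- suppressWarnings(as.numeric(sub("%", "", "ten%", fixed = TRUE)))
print(bad)        # NA
```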

Paul Hiemstra

Remove the "%", convert to numeric, then divide by 100.

x <- c("10%","5%")
as.numeric(sub("%","",x))/100
# [1] 0.10 0.05
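The same idea wrapped in a small reusable function. The name pct_to_fraction is hypothetical, and the extra trimws() call is an assumption about messy input such as "5 %"; missing values simply pass through as NA.

```r
# Hypothetical helper: convert strings like "10%" to fractions like 0.10
pct_to_fraction <- function(x) {
  no_pct <- sub("%", "", x, fixed = TRUE)  # drop the percent sign (NA stays NA)
  as.numeric(trimws(no_pct)) / 100         # trim stray spaces, parse, rescale
}

pct_to_fraction(c("10%", " 5 %", NA))
# [1] 0.10 0.05   NA
```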
Joshua Ulrich

If you're a tidyverse user (and actually also if not) there's now a parse_number function in the readr package:

readr::parse_number("10%")

The advantage is that it generalizes to other common string formats, such as:

library(readr)
parse_number("10.5%")      # 10.5
parse_number("$1,234.5")   # 1234.5
Giora Simchoni
  • I really enjoy all the old SO questions that now have sexy Tidyverse solutions. – Andrew Brēza Dec 11 '18 at 01:20
  • readr::parse_number("10%") produces '10' - the number here is 0.1 - tidyverse might be sexy but would help if it really works too:) – sen_saven Jun 09 '22 at 17:11
  • This also doesn't work when `x` is already a number. I need my function to handle percents or numeric columns (depending on which column is being plotted). `readr::parse_number("-10.5")` yields `-10.5` (should be `-0.105`) and `readr::parse_number(-10.5)` returns an error. Two downvotes for this solution. – Dannid Apr 03 '23 at 15:59
  • @Dannid (a) why should `readr::parse_number("-10.5")` return `-0.105` and not `-10.5`, it is a general function for parsing numbers (b) the function takes a "Character vector of values to parse", so `readr::parse_number(-10.5)` results in a bad input error, makes sense. – Giora Simchoni Apr 04 '23 at 13:31
  • Hi @GioraSimchoni it looks like I mistyped. For the first example I should have included the percent sign: `readr::parse_number("-10.5%")` should return `-0.105` rather than `-10.5` – Dannid Apr 27 '23 at 02:56
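Following up on the comment thread: a minimal base R sketch of a helper that, as Dannid asks, leaves numeric input untouched and divides only strings ending in "%" by 100. The name as_fraction and the rule that strings without "%" are returned as plain numbers are assumptions; unlike parse_number(), it does not handle currency symbols or grouping commas.

```r
# Hypothetical helper: numeric input passes through; "-10.5%" becomes -0.105
as_fraction <- function(x) {
  if (is.numeric(x)) return(x)                 # already numbers: nothing to do
  x <- trimws(as.character(x))                 # also accepts factors
  is_pct <- endsWith(x, "%")                   # NA where x is NA
  num <- suppressWarnings(as.numeric(sub("%$", "", x)))  # unparseable -> NA
  ifelse(!is.na(is_pct) & is_pct, num / 100, num)
}

as_fraction(c("-10.5%", "3", NA))
as_fraction(c(0.2, 0.5))
```

The first call gives -0.105, 3 and NA; the second returns its input unchanged.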

Get rid of the extraneous characters first:

topct <- function(x) { as.numeric( sub("\\D*([0-9.]+)\\D*","\\1",x) )/100 }
my.data <- paste(seq(20)/2, "%", sep = "")
> topct( my.data )
 [1] 0.005 0.010 0.015 0.020 0.025 0.030 0.035 0.040 0.045 0.050 0.055 0.060 0.065 0.070 0.075 0.080
[17] 0.085 0.090 0.095 0.100

(Thanks to Paul for the example data).

This function now handles leading and trailing non-numeric characters, and keeps the decimal point if present.
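One caveat: because \\D matches any non-digit, a leading minus sign is stripped along with the other characters, so topct("-5%") returns 0.05. A minimal sketch of a variant that keeps an optional minus sign; the name topct_signed and the exact pattern are assumptions, not part of the original answer.

```r
# Hypothetical variant of topct() that keeps a leading minus sign
topct_signed <- function(x) {
  # Skip leading characters that are not digits, "." or "-",
  # capture an optional "-" followed by digits/decimal points, drop the rest
  num <- sub("^[^0-9.-]*(-?[0-9.]+).*$", "\\1", x)
  as.numeric(num) / 100
}

topct_signed(c("-5%", "about 12.5 %", "7%"))
```

This gives -0.05, 0.125 and 0.07. Strings that contain no digits at all are left unchanged by sub() and therefore become NA (with a coercion warning).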

Ari B. Friedman
  • It's more complex because it strips out anything non-numeric that follows the numbers.... – Ari B. Friedman Nov 30 '11 at 16:30
  • Edited to make it handle preceding characters as well, and to make it a function that you can reuse. – Ari B. Friedman Nov 30 '11 at 16:36
  • @PaulHiemstra Thanks. I was a bit hesitant to make it too general, and would still probably prefer your solution, since having any non-"%", non-digit characters might be a sign that something isn't really a percentage after all. Thus having an NA returned might be preferable to having it return something sensible. – Ari B. Friedman Nov 30 '11 at 16:38
  • As you said, for a more general function your solution would be preferable. But then it would be called percentChar2numeric() or something, and the OP would have no problem with the complexity (which would be hidden in the function). – Paul Hiemstra Nov 30 '11 at 16:40

I wanted to convert an entire column, so I combined the above answers.

pct_to_number <- function(x) {
  x_replace_pct <- sub("%", "", x)
  as.numeric(x_replace_pct)  # last expression is the (visible) return value
}
df[['ColumnName']] <- pct_to_number(df[['ColumnName']])
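If several columns hold percentage strings, the same function can be applied to each of them. A minimal sketch, assuming a made-up data frame and treating a character column as a percentage column when all of its non-missing values end in "%" (that detection rule is an assumption). As in the answer, the values stay in percentage points.

```r
# Made-up example data; column names are illustrative
df <- data.frame(
  region = c("North", "South"),
  growth = c("10%", "2.5%"),
  share  = c("40%", "60%"),
  stringsAsFactors = FALSE
)

pct_to_number <- function(x) as.numeric(sub("%", "", x, fixed = TRUE))

# TRUE for character columns whose non-missing values all end in "%"
is_pct_col <- vapply(df, function(col) {
  is.character(col) && all(endsWith(col[!is.na(col)], "%"))
}, logical(1))

# Convert only the detected columns, leaving the others untouched
df[is_pct_col] <- lapply(df[is_pct_col], pct_to_number)
str(df)
```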
nanselm2

Try with:

> x = "10%"
> as.numeric(substr(x, 1, nchar(x) - 1))
[1] 10

This also works with decimals:

> x = "10.1232%"
> as.numeric(substr(x, 1, nchar(x) - 1))
[1] 10.1232

This relies on the % sign always being the last character of the string.
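If that assumption might not hold, a guarded version can check for the trailing % first. A minimal sketch (the function name is hypothetical) that returns NA for strings without a trailing %, instead of silently chopping off their last character (the unguarded code turns "10" into 1).

```r
# Hypothetical helper: strip a trailing "%" only when one is actually present
strip_trailing_pct <- function(x) {
  ok <- which(endsWith(x, "%"))    # indices ending in "%" (NA inputs are skipped)
  out <- rep(NA_real_, length(x))  # default result: NA
  out[ok] <- as.numeric(substr(x[ok], 1, nchar(x[ok]) - 1))
  out
}

strip_trailing_pct(c("10%", "10.1232%", "10"))
```

This returns 10, 10.1232 and NA.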

Galled