I have a table called LOAN containing a column named RATE in which the observations are given as percentages, for example 14.49%. How can I format the table so that the % is removed from every entry in RATE and I can use the plot function on it? I tried using strsplit:

strsplit(LOAN$RATE,"%")

but got the error "non-character argument".

3 Answers

Items that appear to be character when printed, but which R treats otherwise, are generally objects of class factor. I'm also guessing that you are not going to be happy with the list output that strsplit will return. Try:

gsub("%", "", as.character(LOAN$RATE))

Factors that appear numeric can be a source of confusion as well:

> factor("14.9%")
[1] 14.9%
Levels: 14.9%
> as.character(factor("14.9%"))
[1] "14.9%"
> gsub("%", "", as.character(factor("14.9%")) )
[1] "14.9"

This is especially confusing since print.data.frame removes the quotes:

> data.frame(z=factor("14.9%"), zz=factor(14.9))
      z   zz
1 14.9% 14.9
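
Putting these pieces together for the column in the question might look like the following. This is only a minimal sketch: the three RATE values are made up for illustration, and the column is deliberately built as a factor to reproduce the situation described above.

```r
# Hypothetical stand-in for the asker's table; RATE is deliberately a factor
LOAN <- data.frame(RATE = factor(c("14.49%", "7.9%", "11.25%")))

# factor -> character, strip the percent sign, then parse as numbers
LOAN$RATE <- as.numeric(gsub("%", "", as.character(LOAN$RATE)))

str(LOAN$RATE)   # now a numeric vector
plot(LOAN$RATE)  # plot() treats the values as numbers rather than factor levels
```

The `as.numeric()` step is what makes the column usable for plotting; `gsub()` on its own still returns character strings.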
IRTFM
  • Or `gsub("%", "", paste(LOAN$RATE))` for us lazy-typists. Did I say lazy? I meant efficient. – Joshua Ulrich Feb 05 '13 at 22:33
  • Me, virtuous? I don't think that's the word you meant to use. Unless by "virtuous" you meant "lazy" and/or "sarcastic". – Joshua Ulrich Feb 05 '13 at 22:44
  • This didn't work for me. First of all, the last "n" in your code block produces an unexpected symbol error. Then I tried @JoshuaUlrich's suggestion but nothing changed. The column I am trying to modify is type `character`. Help? – zsad512 Jan 20 '18 at 14:16
  • I used `data$fundraising_goal <- as.numeric(gsub("\\$", "", data$fundraising_goal))` but this produces "Warning: NAs introduced by coercion" and I'm not sure where/why... – zsad512 Jan 20 '18 at 14:22
  • Also, is there a way to apply this to multiple columns at once? – zsad512 Jan 20 '18 at 14:22
  • Without a specific example, it's not possible to say exactly why you got the warning. Why not use `is.na` on the result to generate a logical vector that identifies which of the original items were the problems? And using `lapply` with a helper function is always available as a strategy. There are many worked examples. A better strategy than a downvote would be to ask a question. – IRTFM Jan 20 '18 at 15:58
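
The two suggestions in the last comment (checking the result with `is.na`, and using `lapply` with a helper function) could be sketched roughly as follows. The data frame, the second column `amount_raised`, and the helper name `strip_currency` are all hypothetical. The thousands separator in "$1,200" is just one assumed cause of the coercion warning, not necessarily what happened in the commenter's data.

```r
# Hypothetical data: "$1,200" still contains a comma after "$" is removed
data <- data.frame(
  fundraising_goal = c("$500", "$1,200", "$75"),
  amount_raised    = c("$450", "$300", "$80"),
  stringsAsFactors = FALSE
)

# Reproduce the conversion from the comment (warning silenced for the demo)
cleaned <- suppressWarnings(as.numeric(gsub("\\$", "", data$fundraising_goal)))

# Original strings that turned into NA
problems <- data$fundraising_goal[is.na(cleaned)]
print(problems)

# Illustrative helper: drop "$" and "," before converting to numeric
strip_currency <- function(x) as.numeric(gsub("[$,]", "", as.character(x)))

# Apply the helper to several columns at once
cols <- c("fundraising_goal", "amount_raised")
data[cols] <- lapply(data[cols], strip_currency)
```

Inspecting `problems` first shows which characters the pattern needs to cover before the helper is applied to every column.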

This can be achieved using the mutate verb from dplyr (part of the tidyverse), which in my opinion is more readable. To illustrate, I create a dataset called LOAN with a RATE column that mimics the problem above.

library(tidyverse)
LOAN <- data.frame("SN" = 1:4, "Age" = c(21,47,68,33), 
                   "Name" = c("John", "Dora", "Ali", "Marvin"),
                   "RATE" = c('16%', "24.5%", "27.81%", "22.11%"), 
                   stringsAsFactors = FALSE)
head(LOAN)
  SN Age   Name   RATE
1  1  21   John    16%
2  2  47   Dora  24.5%
3  3  68    Ali 27.81%
4  4  33 Marvin 22.11%

In what follows, mutate alters the column content, gsub does the desired substitution (of % with "") and as.numeric() converts the RATE column to numeric values, which keeps the data-cleaning flow easy to follow.

LOAN <- LOAN %>% mutate(RATE = as.numeric(gsub("%", "", RATE)))
head(LOAN)
  SN Age   Name  RATE
1  1  21   John 16.00
2  2  47   Dora 24.50
3  3  68    Ali 27.81
4  4  33 Marvin 22.11
odunayo12
  • What if, instead of a single character, each element of column RATE is associated with different characters? Hypothetically speaking, if RATE contained the elements 16%, 24.5?, 27.81=, 22.11: how can one remove the different characters associated with each element? – Debjyoti Oct 23 '20 at 11:39
  • @Debjyoti use `str_replace_all(RATE, "%|=|:", "")` instead of `gsub()`. To keep adding patterns, all you need to do is append `|` followed by the unwanted character. Special regex characters such as `*`, `$` and `?` need to be escaped, which inside an R string takes a double backslash. For example, `$` becomes `"\\$"`, and the pattern above then looks something like `str_replace_all(RATE, "%|=|:|\\$", "")`. – odunayo12 Oct 24 '20 at 13:08
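
For the mixed-character case raised in the comments, a base R alternative is to list the unwanted characters in a bracket expression, where `?` and `$` are literal and need no escaping. This is a minimal sketch using the hypothetical values from the comment. The second variant, which keeps only digits and the decimal point, is an additional assumption about what counts as valid input.

```r
# Values taken from the hypothetical example in the comment above
RATE <- c("16%", "24.5?", "27.81=", "22.11:")

# Variant 1: remove any of the listed characters
rate_listed <- as.numeric(gsub("[%?=:$]", "", RATE))

# Variant 2: remove everything that is not a digit or a decimal point
rate_digits <- as.numeric(gsub("[^0-9.]", "", RATE))

print(rate_listed)
print(rate_digits)
```

Variant 2 avoids having to enumerate every stray symbol, but it would also silently drop a minus sign, so it only suits columns known to be non-negative.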

Try:

LOAN$RATE <- sapply(LOAN$RATE, function(x) gsub("%", "", x))
AlSub