28

I have a whole column of numbers that use a dot as the thousands separator and a comma instead of a dot as the decimal separator. When I try to create a numeric column out of them, I lose all data.

var1 <- c("50,0", "72,0", "960,0", "1.920,0", "50,0", "50,0", "960,0")
df <- cbind(var1, var2 = as.numeric(gsub(".", "", as.character(var1))))

and wound up with:

 var1      var2
[1,] "50,0"    NA  
[2,] "72,0"    NA  
[3,] "960,0"   NA  
[4,] "1.920,0" NA  
[5,] "50,0"    NA  
[6,] "50,0"    NA  
[7,] "960,0"   NA 

What am I doing wrong?

Nils Olve

3 Answers

63

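You need to escape the "." in your regular expression, and you need to replace the commas with a "." before you can convert to numeric. An unescaped "." matches any character, so your gsub() call deleted every character and as.numeric() was left with empty strings, which convert to NA.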

> as.numeric(gsub(",", ".", gsub("\\.", "", var1)))
[1]   50   72  960 1920   50   50  960
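
To make the failure and the fix concrete, here is a minimal sketch in base R. The helper name parse_eu_number() is made up for illustration and is not part of any package. It also builds the result with data.frame() rather than cbind(), on the assumption that a numeric column is wanted: cbind() on a character vector and a numeric vector produces a character matrix.

```r
var1 <- c("50,0", "72,0", "960,0", "1.920,0", "50,0", "50,0", "960,0")

# An unescaped "." is a regex wildcard, so every character gets deleted ...
stripped <- gsub(".", "", var1)
# ... and as.numeric("") is NA, which explains the all-NA column
all_na <- all(is.na(as.numeric(stripped)))

# parse_eu_number() is a hypothetical helper name, not a base R function
parse_eu_number <- function(x) {
  x <- gsub(".", "", as.character(x), fixed = TRUE)  # drop thousands separators
  x <- sub(",", ".", x, fixed = TRUE)                 # decimal comma -> decimal point
  as.numeric(x)
}

var2 <- parse_eu_number(var1)

# data.frame() keeps var1 as character and var2 as numeric
df <- data.frame(var1, var2)
```

Using fixed = TRUE sidesteps the escaping problem entirely, because the pattern is then treated as a literal string rather than a regular expression.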
Joshua Ulrich
  • ``format(var1, decimal.mark = '.')`` is an alternative way to change commas into dots. Can't say about pros and cons, it was just a side comment. – PatrickT May 17 '16 at 07:42
  • When I pass this function a numeric vector it returns the error `Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('double', 'numeric')"` – d8aninja May 19 '16 at 22:25

10

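For things like these I like scan() the most, because it is easy to understand. Its dec argument handles the decimal comma. The thousands dots have to be stripped first, because sep="." would make scan() treat the dot as a field separator and split "1.920,0" into the two values 1 and 920. Just use

scan(text=gsub(".", "", var1, fixed=TRUE), dec=",")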

Alas, it's not faster than gsub(), which on the other hand seems overpowered when each value holds at most one dot and one comma. Hence another, fast option is sub(), which replaces only the first match:

as.numeric(sub(",", ".", sub(".", "", var1, fixed=TRUE), fixed=TRUE))
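
Because sub() replaces only the first match, the line above assumes at most one thousands separator per value. A minimal sketch of that caveat, using made-up sample values that are not from the question:

```r
# Made-up sample values; the second one has two thousands separators
x <- c("1.920,0", "1.234.567,5")

# sub() removes only the first dot, so "1.234.567,5" becomes "1234.567.5",
# which as.numeric() cannot parse (it yields NA with a warning)
first_only <- suppressWarnings(
  as.numeric(sub(",", ".", sub(".", "", x, fixed = TRUE), fixed = TRUE))
)

# gsub() removes every dot; sub() is still enough for the single decimal comma
all_dots <- as.numeric(sub(",", ".", gsub(".", "", x, fixed = TRUE), fixed = TRUE))
```

If the data can contain values of a million or more, the gsub() variant for the dots is the safer choice.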

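And just in case: when you're reading var1 from a file directly, specify the decimal mark at read time, e.g. read.table("file.txt", dec=","). Note that sep is the field separator, so sep="." would split values at the thousands dots. A column that contains thousands separators is still read as text and needs the dots removed afterwards.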
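
As a rough illustration of reading such data from a file, the sketch below writes a small temporary file and reads it back. The file contents, the column names price and amount, and the ";" field separator are all assumptions made for the example.

```r
# Hypothetical file contents: ";" separates fields, "," is the decimal mark
path <- tempfile(fileext = ".txt")
writeLines(c("price;amount", "50,5;1.920,0", "72,0;960,0"), path)

dat <- read.table(path, header = TRUE, sep = ";", dec = ",",
                  stringsAsFactors = FALSE)

# price has no thousands separators, so dec = "," is enough to make it numeric
price_is_numeric <- is.numeric(dat$price)

# amount contains a thousands dot, so read.table() leaves it as character
amount_was_character <- is.character(dat$amount)

# Strip the dots and swap the decimal comma, then convert
dat$amount <- as.numeric(
  sub(",", ".", gsub(".", "", dat$amount, fixed = TRUE), fixed = TRUE)
)

unlink(path)
```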

MERose
1

You can use the type_convert function from the readr package. I am reading an ODS file (Portuguese locale) and converting the numbers:

library('readODS')
library('tidyverse')  # attaches readr, which provides type_convert()
# col_types = NA makes read_ods() return every column as character
data <- read_ods('mod-preditivo.ods', sheet = 1, col_names = TRUE, range = 'a1:b30', col_types = NA)
df <- type_convert(data, trim_ws = TRUE,
                   col_types = cols(Pesos = col_integer(), Alturas = col_double()),
                   locale = locale(decimal_mark = ","))
str(df)
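
For a plain character vector like the one in the question, readr can also be applied directly. This is a sketch assuming readr is installed; it uses parse_number() with an explicit locale, which is a different function from the type_convert() call above. The variable name eu is chosen for illustration.

```r
library(readr)

var1 <- c("50,0", "72,0", "960,0", "1.920,0", "50,0", "50,0", "960,0")

# Locale with "," as decimal mark and "." as grouping (thousands) mark
eu <- locale(decimal_mark = ",", grouping_mark = ".")

# parse_number() drops grouping marks and honours the locale's decimal mark
var2 <- parse_number(var1, locale = eu)
```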
cleuton