1

I have some temperature measurements in .csv format and am trying to analyse them in R. For some reason the data files contain the temperature with "°C" following the numeric value. Is there a way to remove the "°C" and return the numeric value? I thought of producing an example here but did not know how to generate a degree symbol in a string in R. Anyhow, this is what the data looks like:

> head(mm)
             dateTime Temperature
1 2009-04-23 17:01:00   15.115 °C
2 2009-04-23 17:11:00   15.165 °C
3 2009-04-23 17:21:00   15.183 °C

where the class of mm[,2] is 'factor'

Can anyone suggest a method for converting the second column to 15.115 etc?

Emma Tebbs

3 Answers

2

You can remove the unwanted part and convert the rest to numeric in one step with scan(). By default scan() reads numeric values separated by whitespace; setting flush = TRUE makes it discard the rest of each line after the value it has read, so the trailing °C is dropped as if it were a comment.

mm <- read.table(text = "dateTime Temperature
1 '2009-04-23 17:01:00'  '15.115 °C'
2 '2009-04-23 17:11:00'   '15.165 °C'
3 '2009-04-23 17:21:00'   '15.183 °C'", header = TRUE)     

replace(mm, 2, scan(text = as.character(mm$Temperature), flush = TRUE))
#              dateTime Temperature
# 1 2009-04-23 17:01:00      15.115
# 2 2009-04-23 17:11:00      15.165
# 3 2009-04-23 17:21:00      15.183

Or you can use a Unicode general category to match the degree symbol.

type.convert(sub("\\p{So}C", "", mm$Temperature, perl = TRUE), as.is = TRUE)
# [1] 15.115 15.165 15.183

Here, the regular expression \p{So} matches various symbols that are not math symbols, currency signs, or combining characters. C matches the character C literally (case sensitive). And type.convert() takes care of the extra whitespace.
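As a complementary sketch (not part of the original answer), the degree sign can also be written explicitly with the escape "\u00b0", which also gives a way to build a reproducible example containing the symbol. The vector names temp and temp_num are made up for illustration, and the pattern assumes every value ends in "°C".

```r
# Build a small example; "\u00b0" is the Unicode escape for the degree sign.
temp <- factor(paste(c("15.115", "15.165", "15.183"), "\u00b0C"))

# Drop any whitespace, the degree sign, and the trailing C, then convert.
# as.character() is used first because the column in the question is a factor.
temp_num <- as.numeric(sub("[[:space:]]*\u00b0C$", "", as.character(temp)))
temp_num
```

Calling as.numeric() directly on a factor would return the internal level codes rather than the temperatures, which is why the conversion goes through as.character().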

Rich Scriven
0

If all of your temperature values have the same number of digits, you can write left and right functions (similar to those in Excel) to select the characters that you want, as in this answer from a different post: https://stackoverflow.com/a/26591121/4459730

First make the left function:

left <- function(string, char) {
  substr(string, 1, char)
}

Then rebuild the Temperature column from just the first six characters and convert the result to numeric:

mm$Temperature <- as.numeric(left(mm$Temperature, 6))
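The fixed width of six characters only holds while every reading looks like "15.115". A variation that does not depend on the width is sketched below, assuming each value starts with a number; leading_number is a hypothetical helper, not something from the linked answer.

```r
# Hypothetical helper: keep only the number at the start of each string.
leading_number <- function(x) {
  x <- as.character(x)
  # Match an optional minus sign, digits, and an optional decimal part.
  m <- regexpr("^-?[0-9]+(\\.[0-9]+)?", x)
  hit <- m != -1
  # Entries without a leading number stay NA.
  out <- rep(NA_real_, length(x))
  out[hit] <- as.numeric(regmatches(x, m))
  out
}

leading_number(c("15.115 \u00b0C", "9.5 \u00b0C", "-2.25 \u00b0C"))
```

It could then be applied as mm$Temperature <- leading_number(mm$Temperature).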
NMc
  • This seems like a very fragile way to solve the problem. The regular expression suggestions in the comments will be much more robust. – Gregor Thomas Feb 18 '15 at 18:40
  • @Gregor True - just giving another option. I've found the left and right functions to be helpful with stuff I have worked on. – NMc Feb 18 '15 at 18:44
0

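The degree symbol is represented as \u00b0. If you are working in Python with pandas rather than R, the following should work; it removes the trailing "°C", strips the leftover space, and converts the column to float:

df['Temperature'] = df['Temperature'].str.replace('\u00b0C', '', regex=False).str.strip().astype(float)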
Milind Deore