3

I've got a df just like this:

df = data.frame(longitude = c('-235.969', 
                       '-23.596.244', 
                       '-2.359.186'))

It´s an example of one column of geocode I'm trying to convert to something like this

new_df = data.frame(longitude = c('-23.5969', '-23.596244', '-23.59186'))

The main purpose is to use the geocode in an leaflet application.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • 3
    There are ways to do this with regex, but, I would first ask how did you get your original `df` in the first place? Maybe something in the creation of `df` could change – SymbolixAU Aug 24 '18 at 02:26
  • 2
    `as.numeric(gsub('(-?\\d{2})(.*)', '\\1.\\2', gsub('\\.', '', df$longitude)))` maybe, but agreed that this should probably get fixed upstream. – alistaire Aug 24 '18 at 02:58
  • I received from a company case, but I put here the 3 different shapes the numbers appeared. I thought about looking in the documentation of the read.csv, but I couldn´t find anything. In the Excel, it opened just like this, but its reader interpreted properly. Maybe using some function from the excel's openning packages could manage this in a easier way? – Rafael Bicudo Aug 24 '18 at 15:53
  • 1
    @RafaelBicudo Excel might be the culprit. I would ty opening the csv file in a text editor in see it's logic and/or trying imporiting directly to R with `data.table::fread()`. – s_baldur Aug 25 '18 at 10:31
  • @snoram same error. Probably the error is at the orginal source from the company that sent me. – Rafael Bicudo Aug 25 '18 at 19:45

1 Answers1

1

If really necessary, I would do this in two steps:

library(magrittr)
gsub(".", "", df$longitude, fixed = TRUE) %>%
  sub("(\\d{2})", "\\1\\.", .)

[1] "-23.5969"   "-23.596244" "-23.59186" 

First drop any . then replace the first two digits with the first two digits + .

PS. without pipes you could do:

sub("(\\d{2})", "\\1\\.", gsub(".", "", df$longitude, fixed = TRUE))

EDIT: Important caveat:

As Matt points out this only works if your longitude ALWAYS consist of if your longitude degree is two digits (10-99).

s_baldur
  • 29,441
  • 4
  • 36
  • 69
  • 1
    Doesn't this only work if you can be sure that all your longitude values are between –99º and +99º? Otherwise your regex would replace -122.41º (e.g. San Francisco) with -12.241 (at the same latitude, a point in the Atlantic ~200km west of Lisbon). – Matt Ashby Aug 24 '18 at 08:52
  • Correct. But can you read anything more into the logic of the original post? – s_baldur Aug 24 '18 at 09:27
  • No, I can't work out what the second . might represent. My first thought was degrees.minutes.seconds, but the numbers are too large. As @SymbolixAU commented above, the solution here is going to come from understanding how the data ended up that way in the first place. – Matt Ashby Aug 24 '18 at 10:17
  • As I said above, I received this from a business case. It is a csv file, but the excel read it better than the read.csv function (literally, it was like I put here with this pattern). Maybe, using an excel file reader function, could I get a better result? – Rafael Bicudo Aug 24 '18 at 16:14