
I have some GPS coordinate data, e.g.

38 41'13.2"N
96 30'23.4"E 

How can I check that it has a consistent format? Sometimes the data looks like this:

38 41.2342
96 30.1211

I tried using re, but the punctuation inside the string makes it difficult to parse.

Ideal format is XX XX'XX.X"(E or N)

I tried

import re
r = re.compile(".* .*'.*..*"N")
if r.match('48 46'55.3"N') is not None:
   print 'matches'

taken from here

William Baker Morrison
  • Punctuation should not be a problem. What did you try? – Jongware Nov 19 '18 at 17:02
  • This will work: `[NE]$`. If not, more details please. – Jongware Nov 19 '18 at 17:04
  • What exactly is the "correct" format? Do you expect values to be zero-padded? How many decimal places do you accept, and where? Are integers required to have zeroed decimal places? What values are acceptable for the cardinal direction at the end (`S`, `SW`, `SSW`, etc.)? – Patrick Haugh Nov 19 '18 at 17:04

2 Answers


You haven't escaped your quotes in your example. Notice the \" on line 2, and the \' on line 3. This is important so that Python knows the quote is part of the string rather than the end of it.

I have also used a slightly more explicit pattern.

import re
r = re.compile(r"^\d{2} \d{2}'\d{2}\.\d\"[EN]$")  # raw string; \. matches only a literal decimal point
if r.match('48 46\'55.3"N') is not None:
    print('matches')
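To check a whole batch of values rather than a single string, something like the following might work. It is only a sketch: the helper name `find_nonconforming` and the sample list are made up for illustration, and the pattern assumes the "ideal" format from the question (two-digit degrees, minutes and seconds, one decimal place, N or E).

```python
import re

# Assumed target format from the question: DD MM'SS.S"H, where H is N or E.
# Triple-quoted raw string, so neither quote character needs escaping.
DMS = re.compile(r"""^\d{2} \d{2}'\d{2}\.\d"[NE]$""")

def find_nonconforming(values):
    """Return the entries that do not match the assumed DMS format.

    Hypothetical helper; surrounding whitespace is stripped before matching.
    """
    return [v for v in values if DMS.match(v.strip()) is None]

samples = ['38 41\'13.2"N', '96 30\'23.4"E ', '38 41.2342', '96 30.1211']
bad = find_nonconforming(samples)
print(bad)
```

An empty result would mean every value follows the same format; anything returned is a value that needs attention.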
Jim Wright

Your punctuation problems are (1) you need to escape the . with a backslash when you want to match the actual decimal point, otherwise it matches any character; and (2) you need to escape the double-quote or otherwise prevent it from terminating your string.
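A quick way to see point (1) is to compare an unescaped and an escaped dot on the seconds part alone. The two pattern names and the test strings below are purely illustrative:

```python
import re

# Unescaped: "." matches any single character, so "55x3" slips through.
loose = re.compile(r"^\d{2}.\d$")
# Escaped: "\." matches only a literal decimal point.
strict = re.compile(r"^\d{2}\.\d$")

candidates = ("55.3", "55x3")
loose_hits = [s for s in candidates if loose.match(s)]
strict_hits = [s for s in candidates if strict.match(s)]

print(loose_hits)   # ['55.3', '55x3']
print(strict_hits)  # ['55.3']
```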

The best way to write this as a readable, debuggable regex is to use a Python "raw" string r"like this", which allows backslashes without escaping, and furthermore to triple-quote it, which lets you use both ' and " inside it without escaping. And since triple-quoted strings allow multi-line expressions, you could even compile in VERBOSE mode, allowing whitespace and comments. Debuggability of your subsequent matching/extraction code is also improved if you use the (?P<...>) named-group syntax in your regex: groups will then be accessible by meaningful names in the match object's groupdict() output. Taken all together, that gives us:

import re

PATTERNS = [ # a list of alternative acceptable formats

    re.compile( r"""              
        ^\s*                      # beginning of string (optional whitespace)
        (?P<degrees>\d+)[\s]      # integer number of degrees (NB: might be desirable to insert the degree symbol into the square brackets here, to allow that as a possibility?)
        (?P<minutes>\d+)'         # integer number of minutes
        (?P<seconds>\d+(\.\d*)?)" # seconds, with optional decimal point and decimal places
        (?P<axis>[NE]?)           # optional 'N' or 'E' character (remove '?' to make it compulsory)
        \s*$                      # end of string (optional whitespace)
    """, re.VERBOSE ),            

    re.compile( r"""
        ^\s*                      # beginning of string (optional whitespace)
        (?P<degrees>\d+)[\s]      # integer number of degrees (NB: might be desirable to insert the degree symbol into the square brackets here, to allow that as a possibility?)
        (?P<minutes>\d+(\.\d*)?)  # minutes, with optional decimal point and decimal places
        (?P<axis>[NE]?)           # optional 'N' or 'E' character (remove this line if this is never appropriate in this format)
        \s*$                      # end of string (optional whitespace)
    """, re.VERBOSE ),            

]
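As a rough sketch of how such a list might be used, the snippet below tries each pattern in turn and reports which one matched along with its named groups. The function name `classify` is hypothetical, and the patterns are compact, non-VERBOSE copies of the two formats above so that the snippet runs on its own.

```python
import re

# Compact copies of the two alternative formats:
# index 0 = degrees, minutes, seconds; index 1 = degrees and decimal minutes.
PATTERNS = [
    re.compile(r"""^\s*(?P<degrees>\d+)\s(?P<minutes>\d+)'(?P<seconds>\d+(?:\.\d*)?)"(?P<axis>[NE]?)\s*$"""),
    re.compile(r"""^\s*(?P<degrees>\d+)\s(?P<minutes>\d+(?:\.\d*)?)(?P<axis>[NE]?)\s*$"""),
]

def classify(text):
    """Return (index of first matching pattern, its named groups), or (None, {})."""
    for index, pattern in enumerate(PATTERNS):
        match = pattern.match(text)
        if match is not None:
            return index, match.groupdict()
    return None, {}

print(classify('38 41\'13.2"N'))
print(classify('38 41.2342'))
print(classify('not a coordinate'))
```

If every value in the data set comes back with the same index, the format is consistent; a `None` index flags a value that fits neither format.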
jez