Python - Downsampling of a list of points, with given coordinates and distance between two adjacent points in the list

Question

I really hope someone can help me with this! I'm trying to make this script run as I want, but I cannot seem to get the hang of it.

The file with the data to be handled is input from a GPS and looks like this:

Line 20081002-1119.nmea
$GPGGA,094406.00,5849.40174,N,01738.15828,E,2,08,00.9,00003.26,M,0024.93,M,005,0734*62
$GPGGA,094407.00,5849.40177,N,01738.15827,E,2,08,00.9,00003.22,M,0024.93,M,005,0734*6B
$GPGGA,094408.00,5849.40174,N,01738.15826,E,2,08,00.9,00003.00,M,0024.93,M,006,0734*65
$GPGGA,094409.00,5849.40171,N,01738.15831,E,2,08,00.9,00003.24,M,0024.93,M,005,0734*62
$GPGGA,094410.00,5849.40176,N,01738.15833,E,2,08,00.9,00003.29,M,0024.93,M,006,0734*61
$GPGGA,094411.00,5849.40172,N,01738.15831,E,2,08,00.9,00003.31,M,0024.93,M,004,0734*6D
$GPGGA,094412.00,5849.40172,N,01738.15830,E,2,08,00.9,00003.15,M,0024.93,M,005,0734*68
$GPGGA,094413.00,5849.40175,N,01738.15834,E,2,08,00.9,00003.18,M,0024.93,M,005,0734*67
$GPGGA,094414.00,5849.40173,N,01738.15835,E,2,08,00.9,00003.16,M,0024.93,M,006,0734*6A
EOL

My output file should look like this (now with made up distances just to show what I want):

Line 20081002-1119.nmea
58.853952   17.643113   102.15 
58.853946   17.643243   101.63 
58.853939   17.643372   105.93 
58.853933   17.643503   104.01 
58.853927   17.643633   104.25 
...
EOL

The columns are: longitude, latitude, distance to the point above.

How do I do to downsample this to a given interval between two points (100 meters in my case)?

The script I've manged so far:`

indata=open('C:/nav.nmea', 'r')
outdata=open('C:/nav_out.txt', 'w')

from math import *

coords_list=[]
coords=[]

def distance(coords_list):
    for (longi2,lati2) in coords_list:
        for (longi1,lati1) in coords_list:
            a = sin(lati1) * sin(lati2)+cos(longi1-longi2)*cos(lati1) * cos(lati2)
            c= 2* asin(sqrt(a))

            s= (6367* c)/100000 # For results in meters

        if s<100:
            # Here I want to discard current line if not s<100 and jump to the next line
        else:
            "Return the valid lines"
    return s


for line in indata:

    if line.startswith('$GPGGA'):

        data=line.split(",")
        # Import only coordinates from input file

        LON=float(data[2])/100

        LAT=float(data[4])/100

        # Convert coordinates from DDMM.MMMM to DD.DDDDDD

        lon=((LON-int(LON))/60)*100+int(LON)

        lat=((LAT-int(LAT))/60)*100+int(LAT)

        coords_list.append((lon,lat))

        outdata.writelines("%0.6f\t" %lon)

        outdata.writelines("%0.6f\t" %lat)
        outdata.writelines("%s \n" %distance(coords_list))


    elif line.startswith('EOL'):

        outdata.writelines("EOL")

    elif line.startswith('Line'):

        LineID=line

        outdata.writelines('\n%s' %LineID)


indata.close()     

outdata.close()

`

Mine is not really an answer, more of a contribution. If you are interested you can find [here](https://github.com/quasipedia/admiral/tree/experimental/src) a command utility that I wrote to process the logs of my sailing robot. It does more of what you asked for [but _also_ what you asked]... but maybe you might find there stuff of some use for you. Keep in mind that the only part that truly works is the commandline utility `logman.py` and the connected libs. The rest is residual code from a previous iteration and it is probably broken. — mac, Nov 22 '11 at 13:32

Silas Parker · Answer 1 · 2022-05-03T12:28:05.960

4

For point reduction in a curve, you probably want to use the Ramer–Douglas–Peucker algorithm. This maintains the overall shape of the curve, but removes the detail, the amount removed can be controlled by a parameter.

Note that the code on the linked Wikipedia page is pseudo code, and not python.

I've found a python implementation of the DP line-simplication algorthim, I haven't tested it and can't vouch for its correctness.

edited May 03 '22 at 12:28

answered Nov 22 '11 at 13:38

Silas Parker

8,017
1
28
43

Ok! Seems like that is kind of what I'm looking for. But how do I use it in my script? I'm quite new to this whole programming world, so I can say I feel totally lost atm.. – user1059822 Nov 22 '11 at 13:53
I've found an implementation that is freely available, I'll link to it in my answer. – Silas Parker Nov 22 '11 at 14:30
I found that one too, but is there no easier way to solve this? This script is part of an introductory course to digital processing of data, and I cannot imagine we're supposed to do things as advanced as your link. My initial plan was to use iteration over the lines with somethin like "if distance < 100, don't use that line, jump to the next line". Can I do like that? And in that case, how to tell the script to discard the unwanted lines? – user1059822 Nov 22 '11 at 14:38

unutbu · Answer 2 · 2011-11-22T15:05:01.243

Here is a fairly simple way to downsample the data: coord1 is used to store the previous coordinate. Each time through the loop, the distance between coord1 and the new coordinate, coord2, is computed. When the distance is <100, the rest of the loop is skipped.

import math
import itertools

def distance(coord1,coord2):
    longi1,lati1=coord1
    longi2,lati2=coord2
    a = (math.sin(lati1)*math.sin(lati2)
         +math.cos(longi1-longi2)*math.cos(lati1)*math.cos(lati2))
    c = 2*math.asin(math.sqrt(a))
    s = (6367*c)/100000 # For results in meters
    return s

with open('nav.nmea', 'r') as indata:
    with open('nav_out.txt', 'w') as outdata:
        coords=[]
        coord1=None
        for line in indata:
            if line.startswith('$GPGGA'):
                data=line.split(",")
                # Import only coordinates from input file
                LON=float(data[2])/100
                LAT=float(data[4])/100
                # Convert coordinates from DDMM.MMMM to DD.DDDDDD
                lon=((LON-int(LON))/60)*100+int(LON)
                lat=((LAT-int(LAT))/60)*100+int(LAT)
                coords.append((lon,lat))
                if coord1 is None:
                    # The first time through the loop `coord1` is None
                    outdata.write('%0.6f\t%0.6f\t%s \n'%(lon,lat,0))
                    coord1=(lon,lat)
                else:
                    coord2=(lon,lat)
                    dist=distance(coord1,coord2)
                    if dist<100:
                        # go back to the beginning of the loop
                        continue
                    outdata.write('%0.6f\t%0.6f\t%s \n'%(lon,lat,dist))
                    coord1=coord2
            elif line.startswith('EOL'):
                outdata.writelines("EOL")
            elif line.startswith('Line'):
                LineID=line
                outdata.writelines('\n%s' %LineID)

This is definitely what I am looking for!! Is there also something to add to keep the last line in each of the different nmea-lines? Also, I didn't get it to work properly... This is the result I got in the outdata: ` Line 20081002-1119.nmea 58.853952 17.643113 0 EOL Line 20081002-1119.nmea EOL Line 20081002-1119.nmea EOL Line 20081003-0710.nmea EOL Line 20081003-0834.nmea EOL` — user1059822, Nov 22 '11 at 16:09
Check your formula for distance. When I run my script on the data you posted, the distances were all around 0.2. Yet you say you want to filter out all distances < 100. The problem is most likely in the `distance` function. Is your formula correct? If you need help, post a sample of your input data **and** the desired output. — unutbu, Nov 22 '11 at 16:13
Now looking at `distance` more closely, I see `math.asin` has a range of `[-pi/2, +pi/2]`. So `c` ranges from `[-pi,+pi]`. So `s` ranges over (roughly) `[-0.1,0.1]`. So the return value from `distance` will always be far less than 100. Something appears to be wrong with the formula. — unutbu, Nov 22 '11 at 16:21
This is what my assignment says: The navigation points you get from the supervisor will be very closely located, so they have to be filtered to steps of approximately 100 m (so-called downsampling). All other points should be discarded. The first and last points of every line should be maintained, though. Which means that about 999 out of 1000 lines needs to be discarded. The total amount of lines in the indata-file is about 19k. And the distance formula should be correct, I tried it out manually and it worked just fine. — user1059822, Nov 22 '11 at 16:24

Python - Downsampling of a list of points, with given coordinates and distance between two adjacent points in the list

2 Answers2