TSP in Python: Importing a .csv-file in code for randomized data points

Question

I'm quite new to coding. I am currently trying to code a solution for a traveling salesman problem. I already found working code for a set of random values (shown below, or at https://nbviewer.jupyter.org/url/norvig.com/ipython/TSP.ipynb#Random-Sets-of-Cities). Now I want to use data from a .csv file that I have created, containing cities in a coordinate system (x and y values). However, I cannot figure out what exactly I have to change in the code.

I tried changing the body of the Cities() function to `return frozenset(City() for row in csv.reader(staedtekoordinaten.csv, delimiter=";"))`, but this does not work either.
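A few likely reasons this attempt fails: `csv.reader` expects an open file (or any iterable of text lines) rather than a bare name, the file name would have to be a string passed to `open()`, and `City()` needs the coordinates taken from each row. Here is a minimal sketch under those assumptions. It uses an in-memory `io.StringIO` with made-up rows instead of the real file, and `points_from_lines` is a hypothetical helper name, not part of the original code.

```python
import csv
import io


class Datenpunkt(complex):
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)


City = Datenpunkt


def points_from_lines(lines, delimiter=";"):
    """Build a frozenset of City objects from an iterable of text lines."""
    # csv.reader yields every row as a list of strings, so convert explicitly.
    return frozenset(City(float(row[0]), float(row[1]))
                     for row in csv.reader(lines, delimiter=delimiter))


# Made-up sample data standing in for the real CSV file.
sample = io.StringIO("0;0\n3;4\n")
cities = points_from_lines(sample)
print(sorted(cities, key=abs))
```

With a real file, the same function could be called as `points_from_lines(open_file)` inside a `with open(...)` block.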

It is probably quite a stupid question, but I would highly appreciate it if someone could help me. Thanks in advance!

import matplotlib.pyplot as plt
import random
import time
import itertools
import urllib
import csv
import functools
from statistics import mean, stdev


def alle_touren_tsp(cities):
    # Generate all possible tours of the cities and pick the shortest one
    # (the selection itself is done with min() in minimale_tour).
    return minimale_tour(alle_touren(cities))


def minimale_tour(tours):
    # Select the shortest tour.
    return min(tours, key=laenge_tour)  # the tour in "tours" with the smallest laenge_tour


# Improvement: redundant tours are avoided (reduces computation)
def alle_touren(cities):
    # Return a list of tours, each a permutation of the cities; all permutations start
    # with the same city (avoids redundant tours)
    start = first(cities)
    return [[start] + Tour(rest)
            for rest in itertools.permutations(cities - {start})]

def first(collection):
    # Start iterating over the collection and return its first element
    return next(iter(collection))

Tour = list  # Tours are implemented as lists of cities


def laenge_tour(tour):
    # Total length of a tour.
    # Adds up the distances travelled between consecutive data points (= locations)
    return sum(distance(tour[i], tour[i - 1])  # for i=0, i-1 is the last data point
               for i in range(len(tour)))
    # For every element in tour (count = len(tour)), the distance from the previous location (i-1)
    # to the current one (i) is summed


# Solution with a subclass of complex: every data point (i.e. every place) is stored with two coordinates,
# for simplicity as a complex number (every complex number has two components)
class Datenpunkt(complex):
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)


City = Datenpunkt


def distance(A, B):
    # The distance to travel between points A and B => Euclidean distance
    return abs(A - B)


# Test data: randomized cities with seed 42
def Cities(n, width=900, height=600, seed=42):
    # Set of n data points with random coordinates, shown in a width x height area (900x600 because that is the Python default)
    random.seed(seed * n)
    return frozenset(City(random.randrange(width), random.randrange(height))
                     # frozenset, so that no algorithm can simply delete data points
                     # (in the sense that the shortest route would be not to travel at all)
                     for c in range(n))

alle_touren_tsp(Cities(8))

def plot_tour(tour):
    # "Plot the cities as circles and the tour as lines between them."
    plot_lines(list(tour) + [tour[0]])


def plot_lines(points, style='bo-'):
    # "Plot lines to connect a series of points."
    plt.plot([p.x for p in points], [p.y for p in points], style)
    plt.axis('scaled')
    plt.axis('off')


plot_tour(alle_touren_tsp(Cities(8)))


def plot_tsp(algorithm, cities):
    # "Apply a TSP algorithm to cities, plot the resulting tour, and print information."
    # Find the solution and time how long it takes
    t0 = time.process_time()
    tour = algorithm(cities)
    t1 = time.process_time()
    assert valid_tour(tour, cities)
    plot_tour(tour)
    plt.show()
    print("{} city tour with length {:.1f} in {:.3f} secs for {}"
          .format(len(tour), laenge_tour(tour), t1 - t0, algorithm.__name__))


def valid_tour(tour, cities):
    # "Is tour a valid tour for these cities?"
    return set(tour) == set(cities) and len(tour) == len(cities)


plot_tsp(alle_touren_tsp, Cities(8))

As for the new code (if I understood the remarks correctly):

import csv


# Solution with a subclass of complex: every data point (i.e. every place) is stored with two coordinates,
# for simplicity as a complex number (every complex number has two components)
class Datenpunkt(complex):
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)


class City(Datenpunkt):

    # this is a subclass of `complex` which sets itself up in `__new__`
    def __new__(cls, x, y, name):
        self = super().__new__(cls, float(x), float(y))
        self.name = name
        return self

    def __str__(self):
        return "{} {}".format(super().__str__(), self.name)

    def __repr__(self):
        return str(self)

    @classmethod
    def from_csv(cls, row):
        """Create a City from a CSV row. Row is assumed to be a sequence with
        x at index 0, y at index 1 and the city name at index 2."""
        return cls(*row[0:3])


def Cities(filename):
    with open(filename, newline='') as fp:
        return frozenset(City.from_csv(row) for row in csv.reader(fp, delimiter=";"))
print(Cities("testfile.csv"))

This gives the following error:

Traceback (most recent call last):
  File "filepath/Test.py", line 35, in <module>
    print(Cities("testfile.csv"))
  File "filepath/Test.py", line 34, in Cities
    return frozenset(City.from_csv(row) for row in csv.reader(fp, delimiter=";"))
  File "filepath/Test.py", line 34, in <genexpr>
    return frozenset(City.from_csv(row) for row in csv.reader(fp, delimiter=";"))
  File "filepath/Test.py", line 29, in from_csv
    return cls(*row[0:3])
  File "filepath/Test.py", line 15, in __new__
    self = super().__new__(cls, float(x), float(y))
ValueError: could not convert string to float: 'ï»¿x'

Process finished with exit code 1
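The three odd characters in front of `x` in the error message look like a UTF-8 byte order mark (the bytes `EF BB BF`) that was decoded with a single-byte Windows code page. That the file was read as Windows-1252 is an assumption based on the message, not something the traceback states. The sketch below decodes the same four bytes in three ways to show the difference.

```python
# First bytes of a UTF-8 file that starts with a BOM, followed by the letter "x".
BOM_PLUS_X = b"\xef\xbb\xbfx"

as_cp1252 = BOM_PLUS_X.decode("cp1252")       # each BOM byte becomes its own character
as_utf8 = BOM_PLUS_X.decode("utf-8")          # the BOM survives as U+FEFF
as_utf8_sig = BOM_PLUS_X.decode("utf-8-sig")  # the BOM is stripped

print(repr(as_cp1252), repr(as_utf8), repr(as_utf8_sig))
```

Even with the BOM removed, the remaining `x` is header text, so `float("x")` would still fail until the header row is skipped.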

The lines printed in binary mode, as you suggested, are:

b'\xef\xbb\xbfx;y;name\r\n'
b'0;0;Duisburg\r\n'
b'455,56;120,87;Berlin\r\n'
b'218,86;235,59;Hamburg\r\n'
b'345,6;-366,75;Muenchen\r\n'
b'13,97;-55,24;Koeln\r\n'
b'135,12;-147,2;Frankfurt\r\n'
b'190,25;-13,17;Kassel\r\n'
b'297,02;-51,3;Erfurt\r\n'
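These bytes show three things that `float()` cannot handle as-is: a BOM in front of the first field, a header row (`x;y;name`), and decimal commas such as `455,56`. Here is a minimal sketch of parsing such data, using only the first rows from the output above. `to_float` and `parse_cities` are hypothetical helper names introduced for illustration.

```python
import csv
import io

# First rows of the posted bytes (UTF-8 with BOM, ';' delimiter, decimal commas).
RAW = (b"\xef\xbb\xbfx;y;name\r\n"
       b"0;0;Duisburg\r\n"
       b"455,56;120,87;Berlin\r\n"
       b"218,86;235,59;Hamburg\r\n")


def to_float(text):
    """Convert a number written with a decimal comma, e.g. '455,56'."""
    return float(text.replace(",", "."))


def parse_cities(raw_bytes):
    """Return a list of (name, complex point) pairs."""
    text = raw_bytes.decode("utf-8-sig")  # drops the BOM if present
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    next(reader)                          # skip the header row "x;y;name"
    return [(name, complex(to_float(x), to_float(y))) for x, y, name in reader]


for name, point in parse_cities(RAW):
    print(name, point.real, point.imag)
```

The sketch assumes every data row has exactly three fields; rows with extra or missing columns would need additional handling.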

Three columns: the first two contain the x and y values respectively, the third contains the name of the city. Nine rows in total, the first of which holds the column headers. – twinks-, Jul 09 '20 at 14:44
Do you want to include the name in the City class? Personal ignorance here: is expressing coordinates as a single complex number a normal thing? Inheriting from complex, int and other fundamental types can be a challenge. – tdelaney, Jul 09 '20 at 14:51
Including the city names would be nice, but it's not mandatory by any means. When I found the code I posted, I was also a little dumbfounded by the complex number thing. I have never seen that before, but because I don't really have enough knowledge to find another solution, I decided to roll with it. – twinks-, Jul 09 '20 at 14:54
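On the complex-number question raised in these comments: a complex number bundles two floats, and the absolute value of the difference of two complex numbers is the Euclidean distance between the corresponding points. That is what the `distance` function in the question relies on. A small sketch with made-up coordinates:

```python
class Datenpunkt(complex):
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)


def distance(A, B):
    # abs() of a complex difference is the Euclidean distance between the points
    return abs(A - B)


a = Datenpunkt(0, 0)
b = Datenpunkt(3, 4)
print(b.x, b.y, distance(a, b))  # 3.0 4.0 5.0
```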

score 0 · Answer 1 · answered Jul 09 '20 at 15:26

You can make City inherit from Datenpunkt and add whatever specializations you want there. Since complex numbers initialize themselves in the __new__ method, you have to implement one yourself if your object's construction parameters differ from those of complex. I decided to add the city name and to allow coordinates as strings, as an example of what you can do.

I also added a class method that knows the CSV row format. You could argue that it's a rather narrow specialization of City and that this factory should be an external function instead, but, hey, I did it anyway just to show the option.

This example runs, and you can add it to your code as you see fit.

import csv

# Solution with a subclass of complex: every data point (i.e. every place) is stored with two coordinates,
# for simplicity as a complex number (every complex number has two components)
class Datenpunkt(complex):
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)

class City(Datenpunkt):

    # this is a subclass of `complex` which sets itself up in `__new__`
    def __new__(cls, x, y, name):
        self = super().__new__(cls, float(x), float(y))
        self.name = name
        return self
        
    def __str__(self):
        return "{} {}".format(super().__str__(), self.name)

    def __repr__(self):
        return str(self)

    @classmethod
    def from_csv(cls, row):
        """Create a City from a CSV row. Row is assumed to be a sequence with
        x at index 0, y at index 1 and the city name at index 2."""
        return cls(*row[0:3])
        

def Cities(filename):
    with open(filename, newline='') as fp:
        return frozenset(City.from_csv(row) for row in csv.reader(fp, delimiter=";"))


# test
print(City.from_csv(["111", "222", "Far City"]))
with open('testcities.csv', 'w') as fp:
    fp.write("""\
111;222;Far City
4.55;66;Near City
""")

cities = Cities('testcities.csv')
print(cities)
for city in cities:
    print("{}: {}, {}".format(city.name, city.x, city.y))
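The answer mentions that the factory could also live outside the class. As a rough sketch of that alternative, `city_from_row` below is a hypothetical module-level function that does the same job as `City.from_csv`. The `City` class is repeated in shortened form so the block runs on its own.

```python
class Datenpunkt(complex):
    x = property(lambda p: p.real)
    y = property(lambda p: p.imag)


class City(Datenpunkt):
    # complex sets itself up in __new__, so the extra parameter is handled there
    def __new__(cls, x, y, name):
        self = super().__new__(cls, float(x), float(y))
        self.name = name
        return self


def city_from_row(row):
    """Hypothetical external factory: row is a sequence [x, y, name, ...]."""
    x, y, name = row[:3]
    return City(x, y, name)


city = city_from_row(["111", "222", "Far City"])
print(city.name, city.x, city.y)
```

Keeping the CSV knowledge outside `City` means the class does not need to change if the file layout does; whether that is worth it for a small script is a matter of taste.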

answered Jul 09 '20 at 15:26

tdelaney


Thank you very much! However, I am still quite lost. I got rid of `City = Datenpunkt` and the definition of the Cities function, and added your code instead. I also defined `filename` to correspond to my file and passed it in the three places where the `Cities` function is called. It gives me multiple errors, including "ValueError: could not convert string to float: 'ï»¿x'". I am probably making some rookie mistakes again; I have never programmed anything beyond the very basics before. – twinks- Jul 09 '20 at 16:03
You could narrow the problem down by trying your CSV files with my test code. Remove my tests and instead print each filename as you call `Cities(filename)`. Then you know the offending file and can post it (or a small sample that also fails) in your original question. This may be a problem with the file's character encoding, so reading and printing it in binary mode (`open(filename, 'rb').read()`) to post here may help. – tdelaney Jul 09 '20 at 16:13
Added the new code to the initial question. I re-created the same file manually and also took care of some good old German umlauts that were in it. Still having the same issues. – twinks- Jul 09 '20 at 16:26
The CSV file causing the problem would help. There seem to be two problems: 1) the CSV file encoding: you could see what `sys.getdefaultencoding()` says and compare that with the CSV encoding (it could be that your default encoding is 'latin-1' but the CSV itself is utf-8, for instance); 2) it seems like the file is not "x;y;name" as assumed here. It is still critical to post what this CSV is. You say you fixed the umlauts: you could find a file with umlauts, pick a few lines containing them and post their binary representation here. – tdelaney Jul 09 '20 at 16:56
Let's say that lines 3-6 in file "foo.csv" have non-ASCII characters. You could do `for line in open("foo.csv", "rb").readlines()[3:7]: print(line)`. That would be a good bit of data. – tdelaney Jul 09 '20 at 16:58
The default encoding according to PyCharm is utf-8, and the .csv encoding is also utf-8. I have added the printed lines to the original question. Even if I open the file with the editor, I get the clean data without any issues. – twinks- Jul 09 '20 at 17:35
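One possible explanation for the mismatch discussed in these comments: `sys.getdefaultencoding()` and the editor's file encoding setting are not necessarily what `open()` uses. Without an `encoding` argument, `open()` typically falls back to the locale's preferred encoding, which on many Windows setups is a code page such as cp1252. The sketch below prints both values and then reads a small stand-in file with an explicit `encoding="utf-8-sig"`. The temporary file and its contents are made up for the demonstration.

```python
import locale
import os
import sys
import tempfile

# The str/bytes default versus the locale default that open() typically uses
# when no encoding argument is given.
print("sys default:   ", sys.getdefaultencoding())
print("locale default:", locale.getpreferredencoding(False))

# Write a small UTF-8 file with a BOM as a stand-in for the real CSV.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfx;y;name\r\n0;0;Duisburg\r\n")
    path = tmp.name

try:
    # An explicit encoding makes the result independent of the platform default;
    # "utf-8-sig" also strips the BOM.
    with open(path, newline="", encoding="utf-8-sig") as fp:
        first_line = fp.readline()
finally:
    os.remove(path)

print(repr(first_line))
```

Passing the same `encoding="utf-8-sig"` to the `open()` call inside `Cities(filename)` would be one way to get rid of the BOM; the header row and the decimal commas in the file would still need separate handling.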
