
Let me start by saying I don't have any real experience with multithreading. The script I wrote reads ~4,400 addresses from a text file, then cleans and geocodes each address. My brother mentioned using multithreading to speed it up. I read online that multithreading doesn't make much of a difference if you're just reading a single text file. Would it work if I split the text file into 2 files? Anyway, I'd really appreciate it if someone could show me how to add multithreading or multiprocessing to this script to make it faster. If that's not possible, could you tell me why? Thanks!

from geopy.geocoders import Bing
from geopy.exc import GeocoderTimedOut
geolocator = Bing('vadrPcGdNLSX5bPNL7tw~ySbwhthllg7rNA4VSJ-O4g~Ag28cbu9Slxp5Sh_AsBDuQ9WypPuEhl9pHVPCAkiPf4A9FgCBf3l0KyQTKKsLCHw')
import tkinter as tk
from tkinter import filedialog

root = tk.Tk()
root.withdraw()


def cleanAddress(dirty):
    try:
        clean = geolocator.geocode(dirty)
        x = clean.address
        address, city, zipcode, country = x.split(",")
        address = address.lower()
        if 'first' in address:
            address = address.replace('first', '1st')
        elif 'second' in address:
            address = address.replace('second', '2nd')
        elif 'third' in address:
            address = address.replace('third', '3rd')
        elif 'fourth' in address:
            address = address.replace('fourth', '4th')
        elif 'fifth' in address:
            address = address.replace('fifth', '5th')
        elif 'sixth' in address:
            address = address.replace('ave', '')
            address = address.replace('avenue', '')
            address = address.replace('sixth', 'avenue of the americas')
        elif '6th' in address:
            address = address.replace('ave', '')
            address = address.replace('avenue', '')
            address = address.replace('6th', 'avenue of the americas')
        elif 'seventh' in address:
            address = address.replace('seventh', '7th')
        elif 'fashion' in address:
            address = address.replace('fashion', '7th')
        elif 'eighth' in address:
            address = address.replace('eighth', '8th')
        elif 'ninth' in address:
            address = address.replace('ninth', '9th')
        elif 'tenth' in address:
            address = address.replace('tenth', '10th')
        elif 'eleventh' in address:
            address = address.replace('eleventh', '11th')
        zipcode = zipcode[3:]
        print(address + ",", zipcode.lstrip() + ",", str(clean.latitude) + ",", str(clean.longitude))
    except (AttributeError, ValueError, GeocoderTimedOut):
        # No geocoding result, an unexpected address format, or a timeout
        print('Can not be cleaned')


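def main():
    root.update()
    fpath = filedialog.askopenfilename()
    # The with block closes the file even if an error occurs
    with open(fpath) as f:
        for line in f:
            # Strip the trailing newline before appending the city hint
            dirty = line.strip() + " nyc"
            cleanAddress(dirty)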

if __name__ == '__main__':
    main()
Harrison

1 Answer


Short answer: no, you cannot.

Python's multiprocessing library lets you reduce the time needed for your calculations by distributing them over several processes. It can speed up the whole run of your script, but only when there is a lot for the CPU to calculate.

In your example, most of the time is spent connecting to the web service that does the geolocation for you, so the total execution time depends on the speed of your internet connection or of the service rather than on your computer.
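
If you want to check where the time goes, one rough way is to compare wall-clock time with CPU time. The sketch below is only an illustration: `fake_geocode` is a hypothetical stand-in that sleeps instead of calling the real service, and the 0.05 second delay is an assumed value, not a measurement of Bing.

```python
import time


def fake_geocode(address, delay=0.05):
    """Hypothetical stand-in for geolocator.geocode(): it only waits."""
    time.sleep(delay)  # simulated network round trip; no CPU work happens here
    return address.strip().lower()


def measure(addresses):
    """Return (wall_seconds, cpu_seconds) spent processing all addresses."""
    wall_start = time.perf_counter()  # wall-clock timer, includes waiting
    cpu_start = time.process_time()   # CPU timer, excludes time spent asleep
    for address in addresses:
        fake_geocode(address)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure(["350 Fifth Ave nyc"] * 20)
    print(f"wall-clock: {wall:.2f}s, CPU: {cpu:.2f}s")
```

If you time your real loop the same way and the CPU figure is a small fraction of the wall-clock figure, the script is mostly waiting on the network rather than calculating.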

Patryk Perduta
  • Thank you for the explanation. This code takes roughly 45 minutes to run on the whole text file, but that's while I'm using an iPhone as a wireless hotspot (the work network currently doesn't allow Python libraries to make outside connections, but that will be fixed soon). Do you have any idea how much faster this program would be on an average-speed internet connection? – Harrison Jun 24 '16 at 12:47
  • The best "thank you" I can get is an upvote on my answer and the question marked as answered. I can't tell you, though, how much faster this program would be on an average internet connection, because I have no idea what an "average" speed is, or whether the remote geolocation service can handle requests any faster than it does now. – Patryk Perduta Jun 24 '16 at 12:51
  • This may be a stupid question, but would it work if I just split the text file into 2 parts and opened up 2 instances of Python and ran the program on each half at the same time? – Harrison Jun 24 '16 at 13:06
  • There are no stupid questions while you are learning. This wouldn't help: as I explained before, it depends more on your computer's and the server's internet connections than on "script quickness". The only optimization I can find is to split the work between several internet connections and geolocation services. – Patryk Perduta Jun 24 '16 at 13:09