
I'm looking to write a function that takes input in the form:

123 1st street APT 32S or
320 Jumping Alien Road
555 Google Ave

and outputs, as a dictionary / JSON, all the information parsed from the input string.

The dictionary would look something like:

output = {
   "streetNum" : "123",
   "roadName" : "1st",
   "suffix" : "street",
   "enders" : "APT", #or None /null
   "room" : "32S" #or None / null
}

I'm trying to think of the logic, but the best I can come up with is something along the lines of address.split(' ') and then taking the road name, suffix, and street number from where they would typically be located in the string. Obviously these things aren't always going to be in that order, and road names that contain spaces would break the function as well.

def addressParser(addressString):
    # Split once, then pad with None so short addresses don't raise IndexError.
    # Still assumes a one-word road name in a fixed position.
    parts = addressString.split()
    parts += [None] * (5 - len(parts))
    return {
       "streetNum" : parts[0], #prob need regex help
       "roadName" : parts[1],
       "suffix" : parts[2],
       "enders" : parts[3],
       "room" : parts[4]
    }
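One way around the fixed-position problem is to work inward from both ends of the token list: take the street number off the front, take a unit designator and room off the back, then a suffix, and treat whatever remains as the road name. The following is only a minimal sketch under that assumption. The function name `parse_address` and the two lookup sets are hypothetical and far from complete; they are not taken from any library.

```python
# Hypothetical lookup tables; extend them to match your own data.
UNIT_DESIGNATORS = {"APT", "STE", "RM", "UNIT"}
STREET_SUFFIXES = {"STREET", "ST", "ROAD", "RD", "AVENUE", "AVE",
                   "BLVD", "DRIVE", "DR", "LANE", "LN"}


def parse_address(address_string):
    """Split a simple US street address into parts, working inward from both ends."""
    tokens = address_string.split()
    result = {"streetNum": None, "roadName": None, "suffix": None,
              "enders": None, "room": None}

    # A leading token that starts with a digit is taken to be the street number.
    if tokens and tokens[0][0].isdigit():
        result["streetNum"] = tokens.pop(0)

    # A known unit designator in second-to-last position ends the address.
    if len(tokens) >= 2 and tokens[-2].upper() in UNIT_DESIGNATORS:
        result["room"] = tokens.pop()
        result["enders"] = tokens.pop()

    # The last remaining token is the suffix if it is a known one
    # and at least one token is left over for the road name.
    if len(tokens) >= 2 and tokens[-1].upper() in STREET_SUFFIXES:
        result["suffix"] = tokens.pop()

    # Whatever is left, spaces included, is the road name.
    if tokens:
        result["roadName"] = " ".join(tokens)
    return result
```

With this approach "320 Jumping Alien Road" keeps "Jumping Alien" together as the road name, and the unit fields stay None when no designator is present. It will still misparse anything the lookup sets don't cover.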

Edit: found exactly what I needed here: https://pypi.org/project/address/

  • You'll need to be more specific with what's in your dataset. Pick a small handful of examples that are representative of the data that you're working with. Include them here as an edit to your original question, along with the resulting dictionary output that you want. If this isn't feasible, then you'll need to perform data cleaning, which we can't immediately help with (as we don't have access to your dataset, nor do we know the exact format that you want). – BrokenBenchmark Apr 12 '22 at 05:28
  • updated to give a better showing of the data inputted and expected output, does this help? I think I got what you were asking. – user3684785 Apr 12 '22 at 05:34
  • Looks good, just two more questions: 1. Is the `or` part of the original input? 2. Are apartment numbers always denoted by APT? – BrokenBenchmark Apr 12 '22 at 05:37
  • Also, just to be clear, most answerers (including myself) are going to assume that those three entries capture the format of all or nearly all of your output -- if you have a whole bunch of P.O. boxes that you need to parse as well, make sure that's in the representative input as well. – BrokenBenchmark Apr 12 '22 at 05:38
  • No, could be STE, RM, anything that could be used as a valid address. That's the tough one. Also the literal 'or' characters are not expected as input. – user3684785 Apr 12 '22 at 05:39
  • Got it. Are those indicators always in all-caps? – BrokenBenchmark Apr 12 '22 at 05:40
  • I did not think about the PO boxes, I will include that in the input. I have no data to go off of right now as this is a personal project that I plan to use in my local life. – user3684785 Apr 12 '22 at 05:40
  • I can parse the entire string to uppercase in the first line so capitalization isn't required one way or the other. – user3684785 Apr 12 '22 at 05:41
  • Are you getting this data from an external source? – BrokenBenchmark Apr 12 '22 at 05:42
  • Just did some googling on the PO Box situation and I'll ignore that for now, doesn't seem immediately relevant. Thanks for the heads up though, definitely something I hadn't thought of. – user3684785 Apr 12 '22 at 05:43
  • It will be inputted in a web form. – user3684785 Apr 12 '22 at 05:44
  • There are existing libraries for this. Are you looking for only street addresses using US conventions? – tripleee Apr 12 '22 at 05:45
  • Here's what I would do in that case: if you have a use case that's very small, define a bunch of rules for valid input, and reject those that don't conform to the input. For larger use cases, I'm sure there's also some sort of library for parsing addresses; if this is part of some larger application, I would leverage that rather than trying to reinvent the wheel. – BrokenBenchmark Apr 12 '22 at 05:46
  • Yes, just using US conventions. – user3684785 Apr 12 '22 at 05:46
  • https://pypi.org/project/usaddress/ – tripleee Apr 12 '22 at 05:47
  • Thanks @tripleee, found exactly what I needed lol. https://pypi.org/project/address/ – user3684785 Apr 12 '22 at 05:48
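
For a very small use case, the "define a bunch of rules and reject what doesn't conform" suggestion from the comments could be sketched with a single regular expression, as below. This is an illustration only: the name `parse_or_reject`, the pattern, and the short lists of suffixes and unit designators are all assumptions, and real US addresses have many more variants. The input is uppercased first, as proposed in the comments.

```python
import re

# Hypothetical rule: number, road name, known suffix, optional designator + unit.
ADDRESS_RULE = re.compile(
    r"""
    (?P<streetNum>\d+)\s+
    (?P<roadName>.+?)\s+
    (?P<suffix>STREET|ST|ROAD|RD|AVENUE|AVE)
    (?:\s+(?P<enders>APT|STE|RM)\s+(?P<room>\w+))?
    """,
    re.VERBOSE,
)


def parse_or_reject(address_string):
    """Return the parsed fields, or raise ValueError if the rule does not match."""
    # Uppercase and collapse whitespace so the rule only has to handle one spelling.
    normalized = " ".join(address_string.upper().split())
    match = ADDRESS_RULE.fullmatch(normalized)
    if match is None:
        raise ValueError(f"address does not match the expected format: {address_string!r}")
    # Optional groups that did not take part in the match come back as None.
    return match.groupdict()
```

Because the whole string has to match, input such as "PO Box 12" is rejected with a ValueError rather than being parsed into the wrong fields. For anything larger than a personal project, one of the libraries linked in the comments is likely the better choice.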
