The full title would be: Convert a tuple containing an OrderedDict with tagged parts to a table with columns named from the tagged parts (variable number of tagged parts and variable number of occurrences of each tag).
I know more about address parsing than Python, which is probably the underlying source of the problem. How to do this might be obvious. The usaddress library intentionally returns results in this manner, which is presumably useful.
I'm using usaddress, which "is a python library for parsing unstructured address strings into address components, using advanced NLP methods," and it seems to work very well. Here is the usaddress source and website.
So I run it on a file like:
2244 NE 29TH DR
1742 NW 57TH ST
1241 NE EAST DEVILS LAKE RD
4239 SW HWY 101, UNIT 19
1315 NE HARBOR RIDGE
4850 SE 51ST ST
1501 SE EAST DEVILS LAKE RD
1525 NE REGATTA WAY
6458 NE MAST AVE
4009 SW HWY 101
814 SW 9TH ST
1665 SALMON RIVER HWY
3500 NE WEST DEVILS LAKE RD, UNIT 18
1912 NE 56TH DR
3334 NE SURF AVE
2734 SW DUNE CT
2558 NE 33RD ST
2600 NE 33RD ST
5617 NW JETTY AVE
I want to convert the parsed results into something more like a table (CSV or, eventually, a database).
I was not sure what data types are returned. Reading the docs tells me that the tag method returns a tuple containing an OrderedDict with tagged parts. The parse method seems to return a slightly different type. This question helped me determine that the results are a list (from parse) and a tuple (from tag), apparently with tags. Searching for how to convert a Python list with tagged parts to a table was unsuccessful.
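For reference, the two return shapes can be written out by hand. This is only a sketch: the literals below are copied from the output further down for one address rather than produced by calling usaddress, and the address-type string "Street Address" is an assumption about the second element of the tuple that tag returns.

```python
from collections import OrderedDict

# Shape of usaddress.parse(...): a list of (token, label) tuples,
# one per token, so the same label can appear more than once.
parsed = [
    ("1665", "AddressNumber"),
    ("SALMON", "StreetName"),
    ("RIVER", "StreetName"),
    ("HWY", "StreetNamePostType"),
]

# Shape of usaddress.tag(...): a 2-tuple of (OrderedDict, address type).
# The "Street Address" string is assumed here for illustration.
tagged = (
    OrderedDict([
        ("AddressNumber", "1665"),
        ("StreetName", "SALMON RIVER"),
        ("StreetNamePostType", "HWY"),
    ]),
    "Street Address",
)

# Unpack the tuple to get at the OrderedDict of tagged parts.
components, address_type = tagged
print(type(parsed).__name__, type(parsed[0]).__name__)
print(type(tagged).__name__, type(components).__name__)
print(components["StreetName"])
```

Seen this way, tagged[0] is already a mapping from column name to value, which is the shape most table-building tools accept.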
Searching for how to convert a tuple containing an OrderedDict doesn't turn up much. This is the closest that I found. I also found that pandas is good at various formatting tasks, although it was not clear to me how to apply pandas to this. Many of the closest questions I've found, like the opposite question or one with named tuples, have very low scores.
I also tried some exploratory attempts to see if it would just work (below). I was able to see a few ways to access the data, and using zip from this Matrix Transpose question got a little closer to a table, since the data and the named tags are now separate, although not uniform. Is there a way to turn these results (tagged lists, or tuples containing an OrderedDict with tagged parts) into a table? Is there a fairly direct way from the returned results?
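To make the target concrete, here is a minimal sketch of the kind of conversion being asked about, using only the standard library. It assumes each address has already been reduced to an OrderedDict like tagged[0]; the three rows are hand-written stand-ins taken from the sample output, not the result of running usaddress, and the names rows, columns, and csv_text are illustrative.

```python
import csv
import io
from collections import OrderedDict

# Hand-written stand-ins for tagged[0] from three of the addresses.
rows = [
    OrderedDict([("AddressNumber", "2244"), ("StreetNamePreDirectional", "NE"),
                 ("StreetName", "29TH"), ("StreetNamePostType", "DR")]),
    OrderedDict([("AddressNumber", "4009"), ("StreetNamePreDirectional", "SW"),
                 ("StreetNamePreType", "HWY"), ("StreetName", "101")]),
    OrderedDict([("AddressNumber", "1665"), ("StreetName", "SALMON RIVER"),
                 ("StreetNamePostType", "HWY")]),
]

# Collect every tag seen, in first-seen order, to use as column names.
columns = []
for row in rows:
    for key in row:
        if key not in columns:
            columns.append(key)

# DictWriter fills any tag missing from a row with restval.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Taking the union of tags as the header is one possible design choice; a fixed, predefined column list would work the same way and would keep the column order stable across input files.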
Here is the parse method:
## Get a library
import usaddress
## Open the file with read-only permission
f = open('address_sample.txt')
## Read the first line
line = f.readline()
## If the file is not empty, keep reading one line at a time
## until the end of the file
while line:
    ## Try the parse method
    parsed = usaddress.parse(line)
    ## See what the parse results look like
    zippy = [list(i) for i in zip(*parsed)]
    print(zippy)
    ## Read the next line
    line = f.readline()
## Close the file
f.close()
And here are the results produced (notice that when a tag has multiple parts, the tag is repeated):
[['2244', 'NE', '29TH', 'DR'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['1742', 'NW', '57TH', 'ST'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['1241', 'NE', 'EAST', 'DEVILS', 'LAKE', 'RD'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetName', 'StreetName', 'StreetNamePostType']]
[['4239', 'SW', 'HWY', '101,', 'UNIT', '19'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetNamePreType', 'StreetName', 'OccupancyType', 'OccupancyIdentifier']]
[['1315', 'NE', 'HARBOR', 'RIDGE'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['4850', 'SE', '51ST', 'ST'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['1501', 'SE', 'EAST', 'DEVILS', 'LAKE', 'RD'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetName', 'StreetName', 'StreetNamePostType']]
[['1525', 'NE', 'REGATTA', 'WAY'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['6458', 'NE', 'MAST', 'AVE'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['4009', 'SW', 'HWY', '101'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetNamePreType', 'StreetName']]
[['814', 'SW', '9TH', 'ST'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['1665', 'SALMON', 'RIVER', 'HWY'], ['AddressNumber', 'StreetName', 'StreetName', 'StreetNamePostType']]
[['3500', 'NE', 'WEST', 'DEVILS', 'LAKE', 'RD,', 'UNIT', '18'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetName', 'StreetName', 'StreetNamePostType', 'OccupancyType', 'OccupancyIdentifier']]
[['1912', 'NE', '56TH', 'DR'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['3334', 'NE', 'SURF', 'AVE'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['2734', 'SW', 'DUNE', 'CT'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['2558', 'NE', '33RD', 'ST'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['2600', 'NE', '33RD', 'ST'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
[['5617', 'NW', 'JETTY', 'AVE'], ['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType']]
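The repeated tags in the parse output could be collapsed by hand into one value per tag. The sketch below is not how usaddress implements tag; it is a hypothetical helper (merge_repeated_labels) that joins tokens sharing a label with a space, and stripping trailing commas is an assumption made to match the cleaner values wanted in a table.

```python
from collections import OrderedDict

def merge_repeated_labels(pairs):
    """Join tokens that share a label into one space-separated value."""
    merged = OrderedDict()
    for token, label in pairs:
        token = token.rstrip(",")  # assumption: trailing commas are not wanted
        if label in merged:
            merged[label] += " " + token
        else:
            merged[label] = token
    return merged

# (token, label) pairs written out by hand from the sample output above.
parsed = [("3500", "AddressNumber"), ("NE", "StreetNamePreDirectional"),
          ("WEST", "StreetName"), ("DEVILS", "StreetName"), ("LAKE", "StreetName"),
          ("RD,", "StreetNamePostType"), ("UNIT", "OccupancyType"),
          ("18", "OccupancyIdentifier")]

row = merge_repeated_labels(parsed)
print(row["StreetName"])
print(list(row.items()))
```

Note that this helper also merges repeats that are not adjacent, which may or may not be desirable for addresses where the same tag shows up in two separate places.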
Here is the tag method:
## Get a library
import usaddress
## Open the file with read-only permission
f = open('address_sample.txt')
## Read the first line
line = f.readline()
## If the file is not empty, keep reading one line at a time
## until the end of the file
while line:
    ## Try the tag method
    tagged = usaddress.tag(line)
    ## See what the tag results look like
    items_ = list(tagged[0].items())
    zippy2 = [list(i) for i in zip(*items_)]
    print(zippy2)
    ## Read the next line
    line = f.readline()
## Close the file
f.close()
This produces the following output, which better handles combining multiple parts with the same tag:
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['2244', 'NE', '29TH', 'DR']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['1742', 'NW', '57TH', 'ST']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['1241', 'NE', 'EAST DEVILS LAKE', 'RD']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetNamePreType', 'StreetName', 'OccupancyType', 'OccupancyIdentifier'], ['4239', 'SW', 'HWY', '101', 'UNIT', '19']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['1315', 'NE', 'HARBOR', 'RIDGE']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['4850', 'SE', '51ST', 'ST']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['1501', 'SE', 'EAST DEVILS LAKE', 'RD']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['1525', 'NE', 'REGATTA', 'WAY']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['6458', 'NE', 'MAST', 'AVE']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetNamePreType', 'StreetName'], ['4009', 'SW', 'HWY', '101']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['814', 'SW', '9TH', 'ST']]
[['AddressNumber', 'StreetName', 'StreetNamePostType'], ['1665', 'SALMON RIVER', 'HWY']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType', 'OccupancyType', 'OccupancyIdentifier'], ['3500', 'NE', 'WEST DEVILS LAKE', 'RD', 'UNIT', '18']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['1912', 'NE', '56TH', 'DR']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['3334', 'NE', 'SURF', 'AVE']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['2734', 'SW', 'DUNE', 'CT']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['2558', 'NE', '33RD', 'ST']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['2600', 'NE', '33RD', 'ST']]
[['AddressNumber', 'StreetNamePreDirectional', 'StreetName', 'StreetNamePostType'], ['5617', 'NW', 'JETTY', 'AVE']]