So I've built a function that looks through all the XML files in a folder, looks for a node attribute (the speaker name), and writes a row to a CSV file for each match. Note that at the moment it writes them all to the same CSV file; I'm planning to have it change the file name once I've figured out the next step.
The next step I'm trying to do is to supply those speaker names from a list in a text file (I've also tried a CSV file and a list of dictionaries) and have the function applied to each of those speaker names individually.
I'm doing it with a function because I figured a for-loop iterating through one set of items inside another for-loop iterating through a different set of items was kind of chancy, and a preliminary test I did with that approach didn't prove that worry wrong.
When I paste any of the items in this list individually as the argument to the function, it works. When I print the list after accessing it in any of the ways I've tried, that works too. I just can't seem to get the two to talk.
I've tried to apply the function to each of the items in the following way, but all it does is print the error message from my except statement and write the header row to the CSV (so I know it's at least calling the function):
    speaker_list = open("UAS_Speakers.csv","r").readlines()
    for item in speaker_list:
        look_for_speaker_in_files(item)
or
    with open("speaking.txt","r") as f:
        for x in f:
            look_for_speaker_in_files(x)
For the heck of it, I even tried to open it as a list of dictionaries, since the data already had curly brackets around it. No change.
    speaker_list = open("speaking.py","r")
    for x in speaker_list:
        look_for_speaker_in_files(x)
I also tried this, modeled on a script of mine that took URLs from a list and ran a couple of urllib functions on them:
    def main():
        with open("speaking.py","r") as speaker_list:
            for x in speaker_list:
                look_for_speaker_in_files(x)

    if __name__ == "__main__":
        main()
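A quick way to see what each of these loops actually hands to the function is to print the type and repr of every item. The sketch below is only an illustration: it uses an in-memory `io.StringIO` as a stand-in for `speaking.txt` (an assumption, so that it runs without the real file), loaded with two of the sample lines from the list. If the real files behave the same way, each item would be a plain string that still carries its trailing newline, not a dictionary.

```python
import io

# Stand-in for speaking.txt (assumption: one dictionary-looking entry per line).
fake_file = io.StringIO(u"{'name': 'Mr. BEGICH'}\n{'name': 'Mr. WAXMAN'}\n")

items = []
for x in fake_file:
    # %r shows the trailing newline; the type name shows this is text, not a dict.
    print("%s %r" % (type(x).__name__, x))
    items.append(x)
```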
I'm not sure if the issue is that the whole list is being fed into the function at once when I do any of these, but in case there's something wrong with the function itself that's preventing this from working, here it is:
    def look_for_speaker_in_files(speakerAttrib):
        c = csv.writer(open("allspeakers.csv","w"))
        c.writerow(["Name", "Filename", "Text"])
        for cr_file in glob.iglob('parsed/*.xml'):
            try:
                tree = etree.parse(cr_file)
                for node in tree.iter('speaking'):
                    if node.attrib == speakerAttrib:
                        c.writerow([node.attrib, cr_file, node.text])
                    else:
                        continue
            except:
                print "bad string " + cr_file
                continue
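The comparison `node.attrib == speakerAttrib` only succeeds when both sides are dictionaries with the same contents, which may be one reason a string read from a file never matches. Below is a minimal sketch of that comparison in isolation, not a drop-in replacement for the function above. Several things in it are assumptions: the helper name `find_speaker_rows` is hypothetical, the XML layout (`<speaking name="...">`) is guessed from the sample list, a temporary folder stands in for `parsed/`, and the standard library's `xml.etree.ElementTree` is used as `etree`.

```python
import glob
import os
import tempfile
import xml.etree.ElementTree as etree  # assumption: stand-in for the original etree import


def find_speaker_rows(speaker_attrib, pattern):
    """Hypothetical variant that returns the matching rows instead of writing them."""
    rows = []
    for cr_file in glob.iglob(pattern):
        try:
            tree = etree.parse(cr_file)
        except etree.ParseError:
            # Only parse failures are reported; other errors are left to surface.
            print("bad file " + cr_file)
            continue
        for node in tree.iter('speaking'):
            # Dictionary compared against dictionary; a string never equals a dict.
            if dict(node.attrib) == speaker_attrib:
                rows.append([node.attrib.get('name'), cr_file, node.text])
    return rows


# Tiny made-up XML file so the sketch runs without the real 'parsed' folder.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "sample.xml"), "w") as f:
    f.write('<doc><speaking name="Mr. BEGICH">Hello.</speaking>'
            '<speaking name="Mr. WAXMAN">Hi.</speaking></doc>')

pattern = os.path.join(folder, "*.xml")
matches = find_speaker_rows({'name': 'Mr. BEGICH'}, pattern)       # a real dict
no_matches = find_speaker_rows("{'name': 'Mr. BEGICH'}\n", pattern)  # a raw file line
print("%d %d" % (len(matches), len(no_matches)))
```

Returning the rows rather than writing them keeps the check separate from the CSV output, so the matching logic can be tested on its own before any file is written.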
Any help on this would be greatly appreciated. Otherwise I'll be stuck sorting this out by hand in OpenRefine or copying and pasting from a spreadsheet by the hundreds, and the thought of that makes my eyeballs burn.
Sample list items:
{'name': 'Mr. BEGICH'}
{'name': 'The SPEAKER pro tempore (Mr. Miller of Florida)'}
{'name': 'The Acting CHAIR'}
{'name': 'Mr. McKINLEY'}
{'quote': 'true', 'speaker': 'recorder'}
{'name': 'Mr. WAXMAN'}
{'name': 'Mr. MORAN'}
{'name': 'Mr. McKEON'}
{'quote': 'true', 'speaker': 'The Acting CHAIR'}
{'name': 'Mr. RIGELL'}
{'name': 'Mr. SMITH of Washington'}
{'name': 'Mr. KILMER'}
{'name': 'Mr. LAMBORN'}
{'name': 'Mr. CLEAVER'}
{'name': 'Mr. MICA'}
{'name': 'Ms. SPEIER'}
{'name': 'Mrs. ELLMERS'}
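Since each entry above is written as a Python dictionary literal, one possible way to turn a line of the file back into a real dictionary is `ast.literal_eval` from the standard library. The following is a hedged sketch, not a confirmed fix: the helper name `parse_speaker_line` is hypothetical, and the two raw lines are copied from the sample list with a trailing newline added to imitate what reading from a file would produce.

```python
import ast

# Two raw lines as they might come out of the text file (trailing newline included).
raw_lines = ["{'name': 'Mr. BEGICH'}\n", "{'quote': 'true', 'speaker': 'recorder'}\n"]


def parse_speaker_line(line):
    """Hypothetical helper: turn one line of the list into a dictionary."""
    # strip() drops the newline; literal_eval evaluates a Python literal safely.
    return ast.literal_eval(line.strip())


# Skip blank lines so an empty trailing line does not raise an error.
speakers = [parse_speaker_line(line) for line in raw_lines if line.strip()]
print(speakers)
```

Each element of `speakers` could then be passed to `look_for_speaker_in_files` in place of the raw line.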
Sample files are in this folder: https://drive.google.com/folderview?id=0B7lGA34vOZItREhRbmF6Z3YtTnM&usp=sharing