I'm new to Python, so I apologize if this question has already been answered. I've used this script before and it worked, so I'm not at all sure what is wrong.
I'm trying to transform a MALLET output document into a long list of file, topic, weight rows rather than a wide list of documents, topics, and weights.
Here's what the original file I'm trying to convert looks like; there are 30 topics in it (it's a text file called mb_composition.txt):
0 file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Abizaid.txt 6.509147794508226E-6 1.8463345214533957E-5 3.301298069640119E-6 0.003825178550032757 0.15240841618294929 0.03903974304065183 0.10454783676528623 0.1316719812119471 1.8018057013225344E-5 4.869261713020613E-6 0.0956868156114931 1.3521101623203115E-5 9.514591058923748E-6 1.822741355900598E-5 4.932324961835634E-4 2.756817586271138E-4 4.039186874601744E-5 1.0503346606335033E-5 1.1466132458804392E-5 0.007003443189848799 6.7094360963952E-6 0.2651753488982284 0.011727025879070194 0.11306132549594633 4.463460490946615E-6 0.0032751230536005056 1.1887304822238514E-5 7.382714572306351E-6 3.538808652077042E-5 0.07158823129977483
1 file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Jeffrey,%20Jim%20-%20Chk5-%20ASC%20-%20FINAL%20-%20Sept%202017.docx.txt 4.296636200313062E-6 1.218750594272488E-5 1.5556725986514498E-4 0.043172816021532695 0.04645757277949794 0.01963429696910822 0.1328206370818606 0.116826297071711 1.1893574776047563E-5 3.2141605637859693E-6 0.10242945223692496 0.010439315937573735 0.2478814493196687 1.2031769351093548E-5 0.010142417179693447 2.858721603853616E-5 2.6662348272204834E-5 6.9331747684835E-6 7.745091995495631E-4 0.04235638910274044 4.428844900369446E-6 0.0175105406405736 0.05314379308820005 0.11788631730736487 2.9462944350793084E-6 4.746133386282654E-4 7.846714475661223E-6 4.873270616886766E-6 0.008919869163605806 0.02884824479155971
And here's the Python script I'm trying to use to convert it:
infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')
outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    #outfile.write(fn[46:] + ",")
    for i in range(0,59):
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
I'm running this in the terminal with python reshape.py and I get this error:
Traceback (most recent call last):
  File "reshape.py", line 12, in <module>
    outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
IndexError: list index out of range
Any idea what I'm doing wrong here? I can't seem to figure it out, and I'm frustrated because I know I've used this script many times before with success! If it helps, I'm on Mac OS X with Python 2.7.10.
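One thing visible in the sample rows: after the document number and file path, each row holds 30 weights and no topic numbers. That makes `topics` a 30-item list, so `topics[i*2+1]` runs past the end once `i` reaches 15, which would produce this IndexError. The `i*2` indexing reads as if the script were written for rows of alternating topic/weight pairs. Below is a minimal sketch of the reshape under that reading, not a definitive fix. It assumes tab-separated rows of weights only, that topics should be numbered from 0 in column order, and that only the file name (not the full path) is wanted. The names `wide_to_long` and `reshape` are hypothetical, and `csv.writer` is used so that file names containing commas are quoted.

```python
import csv
import os


def wide_to_long(lines):
    """Yield (file name, topic number, weight) for every weight in every row.

    Assumption: each row is "doc id <TAB> file path <TAB> weight <TAB> weight ...",
    with one weight per topic and no topic numbers in between.
    """
    for line in lines:
        tokens = line.rstrip('\r\n').split('\t')
        if len(tokens) < 3:
            continue  # skip blank or malformed lines
        # Assumption: keep only the last path component instead of slicing fn[46:].
        fn = os.path.basename(tokens[1])
        # Loop over however many weights the row has, not a fixed range.
        for topicnum, weight in enumerate(tokens[2:]):
            yield fn, topicnum, weight


def reshape(in_path, out_path):
    """Write the long-format rows from in_path to a CSV file at out_path."""
    with open(in_path, 'r') as infile, open(out_path, 'w') as outfile:
        writer = csv.writer(outfile, lineterminator='\n')
        writer.writerow(['file', 'topicnum', 'weight'])
        for row in wide_to_long(infile):
            writer.writerow(row)
```

With the paths from the script above, the call would be `reshape('mallet_output_files/mb_composition.txt', 'mallet_output_files/weights.csv')`. Because the inner loop uses `enumerate` over the tokens that are actually present, it does not depend on the number of topics.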