Split txt file by Year and ID and rename each new txt file as "Year_ID.txt"

Question

I have a bunch of comma-separated txt files, and I want to split each file into separate text files using the common group identifiers in Column 1 (Year) and Column 3 (ID). I would also like to name each new file "Column1_Column3.txt". I do not want to keep any header in these files. I have tried many scripts and suggestions from other questions, but nothing seems to work. I am new to Python, and any suggestions would be very helpful. Thank you very much.

file format:

1.0,9.0,0.0,0.0,5.0,13.2,143.2,993.8529934630001,18.005554199200002,92.5999984741,0.0,0.0,159.882055791
1.0,9.0,0.0,1.0,5.0,13.3,142.8,992.4,19.0,91.5013544438,0.0,0.0,202.645072402
1.0,9.0,0.0,2.0,5.0,13.4,142.5,989.0,21.2,90.4027104135,0.0,0.0,235.39787781
1.0,9.0,0.0,3.0,5.0,13.5,142.2,986.5,22.7,89.3040663832,0.0,0.0,268.74681081200004
1.0,11.0,1.0,1.0,5.0,11.5,175.6,995.6,18.7,18.5200004578,0.0,0.0,680.61138846
1.0,11.0,1.0,5.0,5.0,12.2,174.1,988.9,23.4,18.5200004578,0.0,0.0,645.040646961
1.0,11.0,1.0,6.0,5.0,12.4,173.9,986.5,24.9,18.5200004578,0.0,0.0,654.7981628169999
1.0,9.0,2.0,4.0,5.0,10.7,146.8,986.0,23.2,68.3182237413,0.0,0.0,364.724300756
1.0,9.0,2.0,5.0,5.0,10.8,146.2,982.9,25.0,66.8777792189,0.0,0.0,317.156397048

So my output should be:

File1:

1.0,9.0,0.0,0.0,5.0,13.2,143.2,993.8529934630001,18.005554199200002,92.5999984741,0.0,0.0,159.882055791
1.0,9.0,0.0,1.0,5.0,13.3,142.8,992.4,19.0,91.5013544438,0.0,0.0,202.645072402
1.0,9.0,0.0,2.0,5.0,13.4,142.5,989.0,21.2,90.4027104135,0.0,0.0,235.39787781

File2:

1.0,11.0,1.0,1.0,5.0,11.5,175.6,995.6,18.7,18.5200004578,0.0,0.0,680.61138846
1.0,11.0,1.0,5.0,5.0,12.2,174.1,988.9,23.4,18.5200004578,0.0,0.0,645.040646961
1.0,11.0,1.0,6.0,5.0,12.4,173.9,986.5,24.9,18.5200004578,0.0,0.0,654.7981628169999

File3:

1.0,9.0,2.0,4.0,5.0,10.7,146.8,986.0,23.2,68.3182237413,0.0,0.0,364.724300756
1.0,9.0,2.0,5.0,5.0,10.8,146.2,982.9,25.0,66.8777792189,0.0,0.0,317.156397048
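One way to produce files like these is a single pass over the input that groups rows by their (Year, ID) pair in a dictionary. This is a minimal sketch, assuming one comma-separated row per line and no header; `split_by_year_id`, `input_path` and `out_dir` are hypothetical names. Because the Year and ID fields are kept exactly as they appear in the file, the sample data would produce names such as 1.0_0.0.txt.

```python
import csv
import os
from collections import defaultdict


def split_by_year_id(input_path, out_dir="."):
    """Write each (Year, ID) group of input_path to out_dir/Year_ID.txt.

    Assumes comma-separated rows with no header; Year is column 1 and
    ID is column 3 (list indices 0 and 2). Returns the paths written.
    """
    groups = defaultdict(list)
    with open(input_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 3:
                continue  # skip blank or truncated lines
            groups[(row[0], row[2])].append(row)

    written = []
    for (year, id_), rows in groups.items():
        path = os.path.join(out_dir, f"{year}_{id_}.txt")
        with open(path, "w", newline="") as out:
            # "\n" line endings; no header row is written
            csv.writer(out, lineterminator="\n").writerows(rows)
        written.append(path)
    return written
```

For example, `split_by_year_id("data.txt", "out")` (placeholder paths) would write one file per Year/ID pair into an existing `out` directory. Every row sharing a pair ends up in the same file, regardless of where it appears in the input.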

score 1 · Answer 1 · answered Jul 22 '21 at 02:39

Assumptions:

All entries are uniform (every row has the same field layout)
Entries are stored in a 2D list (one inner list of fields per row)
All entries have at least 3 fields (so both grouping columns, 1 and 3, exist)
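The assumptions above take the 2D list as given. A minimal sketch of building it from a comma-separated file, while also checking the "at least 3 fields" assumption, could look like this; `load_rows` is a hypothetical helper name.

```python
import csv


def load_rows(path):
    """Return the file at path as a list of rows, each a list of field strings.

    Raises ValueError if a non-empty row has fewer than 3 fields, since
    columns 1 (Year) and 3 (ID) are needed for grouping.
    """
    rows = []
    with open(path, newline="") as f:
        reader = csv.reader(f)
        for row in reader:
            if not row:
                continue  # ignore empty lines
            if len(row) < 3:
                # reader.line_num is the number of source lines read so far
                raise ValueError(
                    f"line {reader.line_num}: expected at least 3 fields, got {len(row)}"
                )
            rows.append(row)
    return rows
```

Failing loudly on a short row, rather than silently skipping it, makes malformed input easy to spot before any files are written.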

Slight concern:

In File1, is the second entry supposed to have '2055791 ' in front of it? If so, the entries are not as uniform as this code requires. In that case, I suggest scrubbing the data beforehand or extending this code so that it ignores such a prefix.
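If stray text like that does turn up, one way to scrub the data beforehand is to keep only rows that have the expected number of fields and whose fields all parse as numbers. This is a sketch, assuming every valid row has 13 numeric fields as in the sample; `clean_rows`, `is_number` and `expected_fields` are hypothetical names.

```python
def is_number(text):
    """Return True if text parses as a float (surrounding whitespace allowed)."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def clean_rows(rows, expected_fields=13):
    """Split rows into (kept, rejected) lists.

    A row is kept only if it has exactly expected_fields fields and every
    field parses as a number, so a field like '2055791 1.0' is rejected.
    """
    kept, rejected = [], []
    for row in rows:
        if len(row) == expected_fields and all(is_number(v) for v in row):
            kept.append(row)
        else:
            rejected.append(row)
    return kept, rejected
```

Returning the rejected rows, instead of discarding them, lets you inspect what was dropped before trusting the split.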

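import csv

# grab the full list: one list of field strings per row
# ("data.txt" is a placeholder for your input file)
with open("data.txt", newline="") as f:
    full_list = [row for row in csv.reader(f) if row]

# grab every unique value of column 1 (Year)
col_one_list = sorted(set(a[0] for a in full_list))

# grab every unique value of column 3 (ID)
col_three_list = sorted(set(b[2] for b in full_list))


# group the rows by each Year/ID pair
for i in col_one_list:
    for j in col_three_list:
        separate_list = []
        for entry in full_list:
            if entry[0] == i and entry[2] == j:
                separate_list.append(entry)
        if not separate_list:
            continue  # this Year/ID pair never occurs, so don't create an empty file
        with open(str(i) + "_" + str(j) + ".txt", "w") as file:
            for item in separate_list:
                # write the row back as comma-separated text, no header
                file.write(",".join(item) + "\n")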

This should be sufficient.
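With Year and ID values such as 1.0 and 0.0 (as in the sample), `str(i) + "_" + str(j) + ".txt"` produces names like 1.0_0.0.txt. If names like 1_0.txt are preferred, a small helper can drop the trailing .0 from whole numbers. This is a sketch; `year_id_filename` and `tidy` are hypothetical names, and fractional or non-numeric values are left unchanged.

```python
def tidy(value):
    """Turn '1.0' (or 1.0) into '1'; leave any other value as its string form."""
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        return text  # not numeric: keep as-is
    if number.is_integer():
        return str(int(number))
    return text


def year_id_filename(year, id_):
    """Build a 'Year_ID.txt' file name from the two grouping values."""
    return f"{tidy(year)}_{tidy(id_)}.txt"
```

In the loop above, `open(year_id_filename(i, j), "w")` could replace the string concatenation; the row contents written to each file stay the same.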

Hi dperry5910, thanks a lot for your feedback. I will try out the script. The 2055791 value is just a copy-paste error in this post; it actually belongs to row 1 in the file, so the format is uniform. — Ivy, Jul 23 '21 at 01:40
