Python3 how to combine 2 text files line by line conditionally

Question

I have two ASCII tables in text files containing information about stars. One has the headers

and the other has the headers

| ID | CLASS |

and I want to add the CLASS column to the first text file. The main problem is that the first text file has many rows for each star (e.g. Star 3_6588 has 20 entries in table a for different times), whereas the second text file has only one entry for each ID (as Star 3_6588 is always a Class I).

What I need to do is add the |CLASS| column to the first table so that every row for a given ID carries that ID's class. The text file has over 14 million rows, which is why I can't just do this manually.

Show your primary code and the tools you are using. – Alireza HI, May 28 '20 at 11:20

score 0 · Answer 1 · answered May 28 '20 at 14:27


Sounds like you should use the csv module to read the ID|CLASS file into a dictionary, then iterate over the first file line-by-line, lookup CLASS using the ID value, and output the resulting "row" to a new file.
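
A minimal sketch of the dictionary-building step with the csv module, assuming every row looks like `| Star 3_6588 | CLASS I |` (pipe-delimited, with a leading and trailing pipe). The function name `read_class_table` and the in-memory sample data are hypothetical, chosen only for illustration:

```python
import csv
import io

def read_class_table(lines):
    """Build {ID: CLASS} from pipe-delimited lines such as '| ID | CLASS |'."""
    reader = csv.reader(lines, delimiter='|')
    next(reader, None)  # skip the header row
    id2class = {}
    for row in reader:
        # The leading and trailing pipes produce empty first and last cells,
        # so drop them and strip the padding spaces from the rest.
        cells = [cell.strip() for cell in row[1:-1]]
        if len(cells) >= 2:  # ignore blank or malformed lines
            id2class[cells[0]] = cells[1]
    return id2class

# Hypothetical sample data standing in for the ID|CLASS file.
sample = io.StringIO(
    "| ID | CLASS |\n"
    "| Star 3_6588 | CLASS I |\n"
    "| Star 2_999 | CLASS II |\n"
)
print(read_class_table(sample))
```

For a real file, an open file object can be passed in place of `sample`; the csv documentation recommends opening it with `newline=''`.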


Terry Spotts


Sam · Accepted Answer · 2020-05-29T15:53:28.173

@Terry Spotts has the right idea. However, the leading and trailing | characters on each line make this a slightly tricky CSV: the delimiter is a pipe character, but sometimes with a leading space, a trailing space, or both. Here's an example that generates your ID-to-class dictionary:

> cat bigfile.txt
| ID | TIME | MAGNITUDE | ERROR |
| Star 3_6588 | 10 | 2 | 1.02 |
| Star 3_6588 | 15 | 4 | 1.2 |
| Star 2_999 | 20 | 6 | 1.4 |
| Star 2_999 | 25 | 8 | 1.6 |

> cat smallfile.txt
| ID | CLASS |
| Star 3_6588 | CLASS I |
| Star 2_999 | CLASS II |

Code:

id2class = {}
with open('/tmp/smallfile.txt', 'r') as classfile:
    classfile.readline()               # skip the header line
    for line in classfile:
        line = line.rstrip('\n')[2:-2] # strip the newline, then the leading '| ' and trailing ' |'
        fields = line.split(' | ')     # split on ' | '
        star_id = fields[0]            # named star_id to avoid shadowing the built-in id()
        starclass = fields[1]
        id2class[star_id] = starclass

Now you have a dict id2class that looks like:

{
    'Star 3_6588': 'CLASS I',
    'Star 2_999': 'CLASS II'
}

You can then parse the first file in a similar way, use the ID of each line to look up the Class in the dict, and write out the full data for the line to a new file. I'll leave that part to you :)
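
For readers who want a starting point, here is one possible sketch of that second step. It streams the big file one line at a time, so only the dictionary is held in memory, which matters with 14 million rows. The names `parse_row`, `add_class_column` and `merge_files`, the `'UNKNOWN'` placeholder for IDs missing from the dictionary, and the demo rows are all assumptions made for illustration:

```python
def parse_row(line):
    """Split '| a | b |' into ['a', 'b']."""
    return [cell.strip() for cell in line.strip().strip('|').split('|')]

def add_class_column(big_lines, id2class, missing='UNKNOWN'):
    """Yield each row of the big table with a CLASS cell appended."""
    lines = iter(big_lines)
    header = next(lines, None)
    if header is None:
        return  # empty input: nothing to yield
    yield '| ' + ' | '.join(parse_row(header) + ['CLASS']) + ' |'
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cells = parse_row(line)
        # The first cell is the ID; fall back to the placeholder if unknown.
        cells.append(id2class.get(cells[0], missing))
        yield '| ' + ' | '.join(cells) + ' |'

def merge_files(big_path, out_path, id2class):
    """Stream big_path into out_path, adding the CLASS column."""
    with open(big_path, 'r') as src, open(out_path, 'w') as dst:
        for row in add_class_column(src, id2class):
            dst.write(row + '\n')

# Small in-memory demo using rows shaped like the example files above.
demo = [
    "| ID | TIME | MAGNITUDE | ERROR |\n",
    "| Star 3_6588 | 10 | 2 | 1.02 |\n",
    "| Star 2_999 | 20 | 6 | 1.4 |\n",
]
for row in add_class_column(demo, {'Star 3_6588': 'CLASS I', 'Star 2_999': 'CLASS II'}):
    print(row)
```

With real files, a call such as `merge_files('/tmp/bigfile.txt', '/tmp/bigfile_with_class.txt', id2class)` would write the combined table; the output path here is only an example.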

Happy Coding!

Glad to help, feel free to upvote, and have a great Friday! – Sam, May 29 '20 at 15:54
