-1

I have two ascii tables in text files containing information about stars, one of which with the headers

| ID | TIME | MAGNITUDE | ERROR |

and the other has the headers

| ID | CLASS |

and I want to add the CLASS column to the first text file. The main problem here is that the first text file has got many rows for each star (I.E. Star 3_6588 has got 20 entries in table a for different times) whereas the second text file has only one entry for each ID (as Star 3_6588 is always a Class I).

What I need to do is to add the |CLASS| column to the first table where every instance of a specific ID has the required class. The text file has over 14 million rows in it which is why I can't just do this manually.

2 Answers2

0

Sounds like you should use the csv module to read the ID|CLASS file into a dictionary, then iterate over the first file line-by-line, lookup CLASS using the ID value, and output the resulting "row" to a new file.

Terry Spotts
  • 3,527
  • 1
  • 8
  • 21
0

@Terry Spotts has the right idea. However the leading and trailing | characters in the header line will make this a slightly tricky CSV, as the delimiter is a pipe char, but sometimes with a leading space, trailing space, or both. But here's an example to generate your ID: Class dictionary:

> cat bigfile.txt
| ID | TIME | MAGNITUDE | ERROR |
| Star 3_6588 | 10 | 2 | 1.02 |
| Star 3_6588 | 15 | 4 | 1.2 |
| Star 2_999 | 20 | 6 | 1.4 |
| Star 2_999 | 25 | 8 | 1.6 |

> cat smallfile.txt
| ID | CLASS |
| Star 3_6588 | CLASS I |

Code:

id2class = {}
with open('/tmp/smallfile.txt', 'r') as classfile:
    line = classfile.readline()        # Skip Header Line
    for line in classfile:
        line = line.rstrip('\n')[2:-2] # strip newline and the Pipe-Space / Space-Pipe and the start + end
        fields = line.split(' | ')     # Split on ' | '
        id = fields[0]
        starclass = fields[1]
        id2class[id] = starclass

Now you have a dict id2class that looks like:

{
    'Star 3_6588': 'CLASS I',
    'Star 2_999': 'CLASS II'
}

You can then parse the first file in a similar way, use the ID of each line to look up the Class in the dict, and write out the full data for the line to a new file. I'll leave that part to you :)

Happy Coding!

Sam
  • 1,406
  • 1
  • 8
  • 11