
I have quite a messy text file which I need to convert to a DataFrame to use as reference data. An excerpt is available at the link below:

http://amdc.in2p3.fr/nubase/nubase2016.txt

I've cleaned it up as best I can. In short, I would like to split most of each line on spaces and then treat the last column as fixed-width, i.e. ignore the spaces in the last section.

Cleaned Data Text File

Can anyone point me towards a resource which can do this? I'm not sure whether pandas can cope with this.

Kenny

P.S. I have found some great resources for collapsing the repeated whitespace and replacing the line breaks. Sorry, I can't find the original reference, so see the code below.

import re

# Collapse each run of spaces into a single space and trim every line
with open("Input.txt", "rt") as fin, open("Output.txt", "wt") as fout:
    for line in fin:
        fout.write(re.sub(' +', ' ', line).strip() + "\n")
KennyMcK
  • Welcome to Stack Overflow. It is always recommended to include an [MWE](https://stackoverflow.com/help/minimal-reproducible-example). Also, if you have found the solution to your own question, you might as well post it as an answer. – winkmal Apr 15 '20 at 12:55
  • Hi @rotton, I am still looking for something which can do this. – KennyMcK Apr 15 '20 at 13:13
  • [Similar question](https://stackoverflow.com/questions/7111690/python-read-formatted-string) – winkmal Apr 24 '20 at 09:05

2 Answers


What I would do is very simple: clean up the data as much as possible and then convert it to a CSV file, because those are easy to work with. Then I would load it into a pandas DataFrame step by step and change it where needed.

import csv

with open("NudatClean.txt") as f:
    text = f.readlines()

with open('dat.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for line in text:
        # split() with no argument splits on any run of whitespace,
        # so empty fields and the trailing newline are dropped
        row = line.split()
        print(row)
        writer.writerow(row)

That should do the job for a start. I don't know exactly what you want to remove, but the rest should be fairly clear from here.
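For the mixed delimiting asked about in the question, one possible approach is to limit the number of splits so that everything after the first few fields stays together as one column. This is only a sketch: the helper `split_mixed`, the field count of 4, and the sample lines are made up for illustration and are not taken from the actual file.

```python
def split_mixed(line, n_fields):
    """Split the first n_fields on whitespace; keep the remainder as one column."""
    # maxsplit stops after n_fields splits, so spaces in the tail are preserved
    parts = line.rstrip("\n").split(maxsplit=n_fields)
    # Pad short lines so every row has n_fields + 1 columns
    parts += [""] * (n_fields + 1 - len(parts))
    return parts

# Invented sample lines (assumption: four space-delimited fields, then free text)
sample = [
    "001 0000 1n 8071.3 n=100 decay info\n",
    "001 0010 1H 7288.9\n",
]
rows = [split_mixed(s, 4) for s in sample]
for row in rows:
    print(row)
```

Each resulting row can be passed to `writer.writerow(row)` in the loop above in place of the plain split.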

SebNik
  • Hi @SebNik, I have stripped it down as much as I can and added a link to the cleaned file. I'd like to produce the CSV file automatically, but I need mixed delimiting, i.e. space-delimited and fixed-width, to get a sensible CSV format. – KennyMcK Apr 15 '20 at 13:22
  • Okay, I will see what I can do. – SebNik Apr 15 '20 at 14:58
  • Should be done now, if that answers your question glad I could help. – SebNik Apr 15 '20 at 15:21
  • Thanks, that seems to write a csv file but only delimits on the spaces. I've decided to split the file into two as the application will allow different tables. – KennyMcK Apr 16 '20 at 17:07
  • Okay, sounds good. If your question is answered, please mark it as solved so that it is clear for the system. – SebNik Apr 17 '20 at 08:06

The way I managed to do this was to split the CSV into two parts by column and then recombine them. Not particularly elegant, but it did the job I needed.

Split by Column
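A minimal sketch of the split-and-recombine idea, not necessarily the exact steps used here: cut each line at a fixed character offset, split the left part on spaces, keep the right part whole, then join the two again. The offset `CUT`, the helper `split_at_column`, and the sample lines are assumptions made up for illustration.

```python
CUT = 19  # hypothetical character offset where the fixed-width last column starts

def split_at_column(line, cut=CUT):
    """Space-delimit everything before `cut`; keep the rest as a single field."""
    left = line[:cut].split()                 # space-delimited part
    right = line[cut:].rstrip("\n").strip()   # fixed-width part, inner spaces kept
    return left + [right]

# Invented sample lines whose last column begins at character 19
sample = [
    "001 0000 1n 8071.3 B-=100\n",
    "001 0010 1H 7288.9 IS=99.9885 70\n",
]
rows = [split_at_column(s) for s in sample]
for row in rows:
    print(row)
```

If the last column really does start at a fixed position, pandas also provides `read_fwf` for fixed-width files, which may be worth trying on the original data.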

KennyMcK