Read .txt data using python

Question

I have a .txt file like this:

# 经纬度
x1 = 11.21  
x2 = 11.51

y1 = 27.84  
y2 = 10.08

time: 201510010000  
变量名: val1  
[1.1,1.2,1.3]  
变量名: va2    
[1.0,1.01,1.02]  

time: 201510010100  
变量名: val1  
[2.1,2.2,2.3]  
变量名: va2  
[2.01,2.02,2.03]

time: 2015020000  
变量名: val1  
[3.0,3.1,3.2]  
变量名: val2  
[3.01,3.02,3.03]

time: 2015020100  
变量名: val1  
[4.0,4.1,4.2]  
变量名: val2    
[401,4.02,4.03]

I would like to read and parse it with Python. This is what I have so far:

with open('text.txt','r',encoding='utf-8') as f:
    lines = f.readlines()
    for line in lines:
        print(line, end='')

This reads and prints the file, but I have no idea what the next step should be.

How can I achieve this?

I'd personally export the data to a .csv or .asc file. Just a bunch of formatting parses. — tgikal, Jul 03 '18 at 12:40
The data structure is complex, so I am afraid that exporting the data to a .csv won't work. — user10025959, Jul 03 '18 at 12:46
You have `a.txt` but open `text.txt` to read?! Can you please clarify your question? — Ersel Er, Jul 03 '18 at 13:23
Thanks for pointing out the mistake; the file is named text.txt in my local folder. — user10025959, Jul 03 '18 at 15:04

score 0 · Answer 1 · answered Jul 03 '18 at 13:01

I advise you to convert the .txt file to an .ini or .csv file. Otherwise, you could use a dictionary.

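data = {}
file = open("file.txt", encoding="utf-8")
text = file.readlines()  # readlines() returns all lines; readline() returns only the first
file.close()
for i in range(len(text)):  # len(text) is the number of lines
    if text[i][0:5] == "time:":
        key = text[i].strip()
        # the two value lists sit 2 and 4 lines below each "time:" line
        data[key] = [text[i + 2].strip(), text[i + 4].strip()]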

That code might work for your file, but if you change the format it will be easier to store the data in the dict. I hope this helps.
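
As a rough sketch of the dictionary idea, assuming every "time:" line is followed by a name line, a list line, a second name line, and a second list line (as in the question's sample), the lists can be parsed with ast.literal_eval and stored per timestamp. The function name parse_blocks and the inline sample text are made up for illustration.

```python
import ast

def parse_blocks(lines):
    """Map each timestamp to its two value lists (hypothetical helper)."""
    data = {}
    for i, line in enumerate(lines):
        if line.startswith("time:"):
            timestamp = line.split(":", 1)[1].strip()
            # Assumption: the value lists sit 2 and 4 lines below the "time:" line.
            data[timestamp] = [
                ast.literal_eval(lines[i + 2].strip()),
                ast.literal_eval(lines[i + 4].strip()),
            ]
    return data

# One block copied from the question's sample file.
sample = """time: 201510010000
变量名: val1
[1.1,1.2,1.3]
变量名: va2
[1.0,1.01,1.02]
""".splitlines()

result = parse_blocks(sample)
print(result)
```

Because the lookup relies on fixed line offsets, it only holds while the file keeps exactly this layout.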

answered Jul 03 '18 at 13:01

cristian leonardi


Thanks for the help. I have some questions: 1) would it be better to use .readlines in line 3? 2) What does the attribute 'lenght' in line 5 mean? It raises AttributeError: 'list' object has no attribute 'lenght'. – user10025959 Jul 05 '18 at 03:14

iacob · Answer 2 · 2018-07-03T13:37:41.660

To get the data in the format you want, you could add the relevant parts to a dictionary and then convert it to a dataframe:

import ast
import pandas as pd

with open('text.txt','r', encoding='utf-8') as f:
    lines = f.readlines()
    d = {"time":[],
         "val1":[],
         "val2":[]}

    for i, line in enumerate(lines):
        if line[:5] == "time:":
            time = line.strip().split()[-1]

            #Reading string representations of lists as lists
            v1 = ast.literal_eval(lines[i+2].strip())
            v2 = ast.literal_eval(lines[i+4].strip())

            #Counting number of vals per date
            n1 = len(v1)
            n2 = len(v2)

            #Padding values if any are missing
            if n1 > n2:
                v2 += [None] * (n1 - n2)
            elif n2 > n1:
                v1 += [None] * (n2 - n1)

            d["time"].extend([time] * max(n1,n2))
            d["val1"].extend(v1)
            d["val2"].extend(v2)

df = pd.DataFrame(d)

print(df)

            time  val1    val2
0   201510010000   1.1    1.00
1   201510010000   1.2    1.01
2   201510010000   1.3    1.02
3   201510010100   2.1    2.01
4   201510010100   2.2    2.02
5   201510010100   2.3    2.03
6     2015020000   3.0    3.01
7     2015020000   3.1    3.02
8     2015020000   3.2    3.03
9     2015020100   4.0  401.00
10    2015020100   4.1    4.02
11    2015020100   4.2    4.03
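
The padding step can also be sketched with itertools.zip_longest from the standard library, which fills the shorter list with None. The helper name rows_for and the sample values are illustrative assumptions, not part of the answer above.

```python
from itertools import zip_longest

def rows_for(time, v1, v2):
    """Return (time, val1, val2) rows, padding the shorter list with None."""
    return [(time, a, b) for a, b in zip_longest(v1, v2, fillvalue=None)]

# The second list is one value short on purpose.
rows = rows_for("201510010000", [1.1, 1.2, 1.3], [1.0, 1.01])
for row in rows:
    print(row)
```

Rows built this way could then be collected across all timestamps and passed to pd.DataFrame with a columns argument.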

Thanks for your help. Very beautiful and simple code. Thanks. — user10025959, Jul 03 '18 at 15:02

score 0 · Accepted Answer · answered Jul 03 '18 at 13:50

I am learning Python and this is what I came up with :) If anyone reading the solution finds mistakes, please be kind enough to point them out.

time = ""
val1 = []
val2 = []
final_list = []
process_val1 = False
process_val2 = False
with open('read.txt', 'r', encoding='utf-8') as f:
    for line in f:
        try:
            line = line.strip()
            if process_val1:
                # this line holds the val1 list, e.g. [1.1,1.2,1.3]
                val1 = line.split('[')[1].split(']')[0].split(',')
                process_val1 = False
            elif process_val2:
                val2 = line.split('[')[1].split(']')[0].split(',')
                process_val2 = False
                # both lists are known now, so emit the rows immediately;
                # waiting for the next line would drop the last block
                # when the file ends right after the val2 list
                for v1, v2 in zip(val1, val2):
                    final_list.append([time, v1, v2])
                val1 = []
                val2 = []
            elif 'time:' in line:
                time = line.split(": ")[1]
            elif 'val1' in line:
                process_val1 = True
            elif 'val2' in line or 'va2' in line:
                # the sample file spells the name both ways
                process_val2 = True
        except IndexError:
            # the line did not have the expected "[...]" or "time: ..." shape; skip it
            pass
if final_list:
    with open('write.txt', 'w') as w:
        for row in final_list:
            w.write(", ".join(row) + '\n')

Thanks for the help. An amazing method that works well. Thanks again. — user10025959, Jul 05 '18 at 02:54
I have a question: is the try/except block necessary? The code works well when I remove it. — user10025959, Jul 05 '18 at 04:00
Thank you :). Try and except blocks are used to handle exceptions in the code. It might work well right now because I wrote the code based on the example you provided. But I wasn't sure how it would react if the format of the data changed, hence I added the try-except block. — javapyscript, Jul 05 '18 at 05:47
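
To illustrate the point about format changes, here is a small sketch of what happens when a value line lacks the expected brackets: the chained split raises IndexError, which a narrow except clause can catch without hiding unrelated errors. The function name parse_values is hypothetical.

```python
def parse_values(line):
    """Extract the comma-separated items between '[' and ']' (hypothetical helper)."""
    try:
        return line.split('[')[1].split(']')[0].split(',')
    except IndexError:
        # No '[' in the line, so split('[') returned a single element.
        return None

good = parse_values("[1.1,1.2,1.3]")   # well-formed line from the sample
bad = parse_values("1.1,1.2,1.3")      # brackets missing
print(good)
print(bad)
```

Catching only IndexError keeps other problems, such as a misspelled variable name, visible instead of silently swallowing them.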

Tianpeng. Xia · Answer 4 · 2018-07-03T20:37:05.077

First, from your description I assume x1, x2, y1 and y2 below "经纬度" (longitude and latitude) do not mean anything to you.

Let's suppose that the data in the picture you showed us is all you want and that the original data is formatted as in the example (e.g. there are only two data columns, namely val1 and val2; val1 and val2 always have 3 values per timestamp; val2 always comes after val1). Then the following solution should work:

import re

# define 4 patterns
p1 = r'time:\s*(\d+)'  # for time: 201510010000
p2 = r'\[([\d.]+),([\d.]+),([\d.]+)\]'  # for [1.1,2.1,3.1]
v1p = r'变量名:\s*val1'  # for val1
v2p = r'变量名:\s*val?2'  # for val2 (also matches the "va2" spelling in the sample)
inV1 = False  # flag: the next list line holds val1
inV2 = False  # flag: the next list line holds val2
time_column = ''

with open('text.txt', 'r', encoding='utf-8') as f, \
        open('output.csv', 'w', encoding='utf-8') as csv_f:
    csv_f.write('time,val1,val2')
    for line in f:
        m = re.match(p1, line)
        if m and time_column != m.group(1):
            time_column = m.group(1)
            # reset the flags
            inV1 = False
            inV2 = False
            continue
        if re.match(v1p, line):
            inV1, inV2 = True, False
            continue
        if re.match(v2p, line):
            # clear inV1 so the val2 list does not overwrite val1
            inV1, inV2 = False, True
            continue
        m = re.match(p2, line)
        if not m:
            continue
        if inV1:
            val1 = m.groups()
        elif inV2:
            # both val1 and val2 are ready now, so output all values for this timestamp
            val2 = m.groups()
            for i in range(3):
                csv_f.write("\n{0},{1},{2}".format(time_column, val1[i], val2[i]))

The code above parses the given text and writes the formatted output to a CSV file named "output.csv" in the same folder as "text.txt". You can open it directly with MS Excel or any other spreadsheet editor or viewer.

I used regex here because it is the most flexible option: you can always modify the patterns to suit your needs without changing the rest of the logic. Using flags also has the advantage of not being confused by possible duplicate lines in the text.
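
As an example of adapting the patterns, the sketch below drops the assumption of exactly three values per list by using re.findall to collect every number inside the brackets. The patterns and the helper name parse_list are assumptions for illustration; they only handle plain unsigned decimals like those in the sample.

```python
import re

LIST_LINE = re.compile(r'\[([\d.,\s]*)\]')  # a bracketed list of plain numbers
NUMBER = re.compile(r'\d+(?:\.\d+)?')       # unsigned integer or decimal

def parse_list(line):
    """Return the numbers in a line like "[1.1,1.2,1.3]" as strings, or None."""
    m = LIST_LINE.match(line.strip())
    if not m:
        return None
    return NUMBER.findall(m.group(1))

values = parse_list("[401,4.02,4.03]")
print(values)
```

Lines that are not bracketed lists, such as the "time:" lines, return None and can be skipped by the caller.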

Should you have further requirements, please leave a comment.
