Read a file, drop text fields, keep numeric ones

Question

I want to write a small Python script to plot some .dat files, so I need to process the files first. A .dat file looks like this:

(Real64
 (numDims 1)
 (size 513)
 (data 
   [ 90.0282291905089 90.94377050431068 92.31708247501335 93.38521400778211 94.60593575951782 95.67406729228657 97.04737926298925 97.96292057679104 ...]
 )
)

I want to remove the text parts and the 'normal' brackets (the parentheses). I only need the data between [.....].

I tried something like this:

from Tkinter import Tk
from tkFileDialog import askopenfilename

# just a small GUI to get the file
Tk().withdraw()
filename = askopenfilename()

import numpy as np

with open(filename) as f:
    temp = f.readlines(5) #this is the line in the .dat file

    for i in range(len(temp)-1):
        if type(temp[i]) == str:
            del temp[i]

However, this always raises an 'index out of bounds' error. Any help would be much appreciated.
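A minimal reproduction of why that loop fails, sketched with a hypothetical helper name (`drop_strings_by_index` is not from the question). `range(len(temp)-1)` is evaluated once, from the original length, but every line that `readlines()` returns is a `str`, so each pass deletes an element and the list shrinks underneath the index until it runs past the end. (The argument to `readlines()` is a size hint, not a line number.) The usual fix is to build a new list of what you want to keep rather than deleting in place; `is_number` below is likewise a hypothetical helper.

```python
def drop_strings_by_index(items):
    """Hypothetical copy of the question's loop, for illustration only."""
    for i in range(len(items) - 1):   # range() is built once, from the original length
        if type(items[i]) == str:     # every line read from a text file is a str...
            del items[i]              # ...so the list shrinks and later indices overshoot
    return items


def is_number(token):
    """Return True if token parses as a float."""
    try:
        float(token)
    except ValueError:
        return False
    return True


lines = ["(Real64", " (numDims 1)", " (size 513)", " (data"]
try:
    drop_strings_by_index(list(lines))
except IndexError as exc:
    print("IndexError:", exc)

# Safer: keep what you want in a new list instead of deleting while indexing.
tokens = "(data [ 90.0282291905089 90.94377050431068 ] )".split()
numbers = [float(t) for t in tokens if is_number(t)]
print(numbers)
```

Applied to a whole file, this keep-only-numbers filter depends on the layout: in the sample, tokens such as `1)` and `513)` fail to parse only because the closing parenthesis is glued on, which is why the header fields drop out.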

Where are you getting this `.dat` file from? Could you possibly have whatever is generating it give you another format (like JSON)? If not, you could possibly replace the spaces with commas and parse it as JSON. — gen_Eric, Mar 17 '17 at 17:06
What do you mean *"delete the text parts"*? Be clear. Show us the expected output for your given input. Should `(size 513)` -> `(513)`, or `513` or deleted entirely? You can do all this using a regex, but you haven't specified for us what exactly you want done. — smci, Mar 17 '17 at 17:06
What are you trying to accomplish with that `del`? And why are you checking if strings are strings? — pvg, Mar 17 '17 at 17:08
Btw, '.DAT' file is not a well-defined term, other than to imply some text or binary file. You might as well say 'read a text file'. Is it valid JSON? etc. — smci, Mar 17 '17 at 17:11
This sort of looks like a Lisp program... have you tried just running it in Lisp? (It might not be valid Lisp... I dunno, it's been a while since I messed with Lisp.) — Joran Beasley, Mar 17 '17 at 17:11
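gen_Eric's suggestion of replacing the spaces with commas and parsing the result as JSON can be sketched with the standard `json` module. This assumes the values sit between the first `[` and the next `]`, and that the `...` placeholder from the question is not in the real file (it is not valid JSON). `numbers_via_json` is a hypothetical name.

```python
import json


def numbers_via_json(text):
    """Parse the bracketed, whitespace-separated values by rewriting them as a JSON array."""
    # Take everything between the first '[' and the following ']'.
    inside = text.split('[', 1)[1].split(']', 1)[0]
    # "90.02 90.94" -> "[90.02,90.94]"; split() also copes with newlines and repeated spaces.
    return json.loads('[' + ','.join(inside.split()) + ']')


sample = '''(Real64
 (numDims 1)
 (size 513)
 (data
   [ 90.0282291905089 90.94377050431068 92.31708247501335 ]
 )
)'''

print(numbers_via_json(sample))
```

One side effect of going through JSON: a value written without a decimal point (e.g. `90`) comes back as an `int`, not a `float`.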

Joran Beasley · Answer 1 · 2017-03-17T18:03:30.683

import re
print(re.findall(r"\[([0-9. ]+)\]", f.read()))  # f: the file opened in the question

This is a regular expression. It says: find all the runs of digits, periods and spaces that sit between two square brackets.

\[        # literal left bracket
(         # capture the stuff in here
[0-9. ]   # accept 0-9 and . and space
+         # at least one ... probably more
)         # end capture group
\]        # literal close bracket
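The annotated breakdown above can be compiled directly with `re.VERBOSE`, which ignores whitespace and `#` comments in the pattern except inside a character class, so `[0-9. ]` still matches literal spaces. The sketch below (with a hypothetical `bracketed_floats` helper) also converts the captured text to floats. Like the original pattern, it assumes the values are on one line and contain no minus signs or exponents, since `-`, `e` and newlines are not in `[0-9. ]`.

```python
import re

# The same pattern as the breakdown above, written in verbose mode.
BRACKETED_NUMBERS = re.compile(r"""
    \[          # literal left bracket
    ([0-9. ]+)  # capture one or more digits, periods or spaces
    \]          # literal close bracket
""", re.VERBOSE)


def bracketed_floats(text):
    """Return the floats found between the first matching pair of square brackets."""
    match = BRACKETED_NUMBERS.search(text)
    if match is None:
        return []
    return [float(tok) for tok in match.group(1).split()]


# The question's sample, trimmed to three values as in the accepted answer.
sample = '''(Real64
 (numDims 1)
 (size 513)
 (data
   [ 90.0282291905089 90.94377050431068 92.31708247501335 ]
 )
)'''

print(bracketed_floats(sample))
```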

Alternatively, you could use a parsing library such as pyparsing:

inputdata = '''(Real64
 (numDims 1)
 (size 513)
 (data
   [ 90.0282291905089 90.94377050431068 92.31708247501335 93.38521400778211 94.60593575951782 95.67406729228657 97.04737926298925 97.96292057679104 ...]
 )
)
'''
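from pyparsing import OneOrMore, nestedExpr

# nestedExpr() turns each (...) group into a nested list of whitespace-separated tokens
data = OneOrMore(nestedExpr()).parseString(inputdata)
# data[0] is the (Real64 ...) group and [-1] its last child, (data [ ... ]);
# [2:-1] skips 'data' and '[' and drops the last token ('...]' here; a number glued to ']' would be dropped too)
print "GOT:", data[0][-1][2:-1]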
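If installing pyparsing is not an option, a small standard-library stand-in for the nested lists that `nestedExpr()` produces here might look like the sketch below. `parse_sexpr` is a hypothetical helper, not part of pyparsing. It assumes parentheses are the only grouping characters and that tokens are separated by whitespace, which matches the sample but is not a general Lisp reader.

```python
import re


def parse_sexpr(text):
    """Parse nested (...) groups into nested lists of string tokens (hypothetical helper)."""
    # Tokens are '(', ')' or any run of characters that is neither a paren nor whitespace.
    tokens = re.findall(r"[()]|[^()\s]+", text)
    stack = [[]]                        # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == "(":
            stack.append([])            # open a new nested group
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unexpected ')'")
            group = stack.pop()         # close the current group...
            stack[-1].append(group)     # ...and attach it to its parent
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced parentheses")
    return stack[0]


sample = '''(Real64
 (numDims 1)
 (size 513)
 (data
   [ 90.0282291905089 90.94377050431068 92.31708247501335 ]
 )
)'''

tree = parse_sexpr(sample)
# Same indexing as the pyparsing answer: last child of (Real64 ...), minus 'data', '[' and ']'.
values = [float(tok) for tok in tree[0][-1][2:-1]]
print(values)
```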

litepresence · Accepted Answer · 2017-03-17T19:18:24.653

I just need the data in between [.....]

# treat the whole thing as a string
temp = '''(Real64
 (numDims 1)
 (size 513)
 (data
   [ 90.0282291905089 90.94377050431068 92.31708247501335 ]
 )
)'''

# split() at open bracket; take everything right
# then split() at close bracket; take everything left
# strip() trailing / leading white space
number_string = temp.split('[')[1].split(']')[0].strip()

# convert to list of floats, because I expect you'll need to;
# split() with no argument splits on any run of whitespace (spaces or newlines)
number_list = [float(i) for i in number_string.split()]

print(number_string)
print(number_list)

Output:

90.0282291905089 90.94377050431068 92.31708247501335
[90.0282291905089, 90.94377050431068, 92.31708247501335]
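To apply the same split() idea to an actual file, such as the `filename` returned by `askopenfilename()` in the question, a small loader might look like this. Both function names are hypothetical. The size cross-check is an extra assumption: it presumes the `(size N)` field counts the values between the brackets, so it will (correctly) complain about the trimmed three-value samples in this thread, which still say `(size 513)`.

```python
import re


def parse_dat_text(text):
    """Return the values between the first '[' and the next ']' as floats.

    Also cross-checks the count against a '(size N)' field when one is present
    (an assumption about the format, not something the file is known to guarantee).
    """
    inside = text.split('[', 1)[1].split(']', 1)[0]
    values = [float(tok) for tok in inside.split()]   # any whitespace, incl. newlines
    size = re.search(r"\(size\s+(\d+)\)", text)
    if size is not None and int(size.group(1)) != len(values):
        raise ValueError("expected %s values, found %d" % (size.group(1), len(values)))
    return values


def load_dat_values(path):
    """Read one .dat file from disk and parse it with parse_dat_text()."""
    with open(path) as f:
        return parse_dat_text(f.read())
```

`values = load_dat_values(filename)` then gives a plain list of floats that can be handed to a plotting library, for example `matplotlib.pyplot.plot(values)`.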
