0

I have trouble parsing a txt file which contains lines like this:

50.0 0.1 [0.03, 0.05, 0.067, 1.003, ...]
50.0 0.134 [0.3465, 0.5476, 1.0, ....]
.
.
.

I don't need the beginning of each line, only the lists! The elements in the lists do not all have the same number of characters, and they are separated by a comma and a space.

What I want to do is to ignore whatever is in front of each list and jump to (for example) the 9th element of the list, read the value and save it. Then go to the next line and do the same.

My approach:

Find a way to parse each line of the text file as a list rather than a string, so I can process the elements of the list.

or

Manage to jump to the 9th value in the list and then read everything until the next value (the 10th in this case).

Any ideas how to do this?

Mat
chrizz

5 Answers

1

When you have each line in this form:

line = '50.0 0.1 [0.03, 0.05, 0.067, 1.003]\n'

First remove the unnecessary parts of the string. Find '[' and ']' and use slicing.

line[line.index('[')+1:line.index(']')]

Split the remaining string with the delimiter (now: ','). You get a list of strings.

line[line.index('[')+1:line.index(']')].split(',')

Take the n-th element and transform it with float() or eval().

float(line[line.index('[')+1:line.index(']')].split(',')[3])  

If you need more elements from the list, evaluate the substring including '[' and ']' and you get a list. (Note: eval() is slow and executes whatever code the string contains, so use it only on input you trust.)

eval('[0.03, 0.05, 0.067, 1.003]')  

The code will be similar to this:

with open('datas.txt') as f:
    n = 8
    for line in f:
        a = float(line[line.index('[')+1:line.index(']')].split(',')[n])  
        do_something_with(a)
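
If several elements of each list are needed, a safer alternative to eval() is ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch, assuming every line contains one complete bracketed list (the function name parse_list is made up for illustration):

```python
import ast

def parse_list(line):
    """Return the bracketed part of a line as a list of numbers."""
    start = line.index('[')
    end = line.index(']') + 1
    # literal_eval only evaluates literals, so arbitrary code in the file is rejected
    return ast.literal_eval(line[start:end])

values = parse_list('50.0 0.1 [0.03, 0.05, 0.067, 1.003]\n')
print(values[3])
```

Once the whole list is parsed, any element can be picked by index without re-slicing the line.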
0

As your question is not well formed, I will try to answer in a broader way.

  1. Read the file line by line.
  2. If the format of your data is uniform, i.e. (a) a square bracket at the beginning and at the end and (b) numbers separated by spaces:

    Use strip() to remove any trailing whitespace and/or newline, then slice the string to remove the first and last character, e.g.

    instr="[0.03 0.05 0.067 1.003]"[1:-1]

    Use split() to split the string into a list of number strings.

  3. Index the list to access the 9th element.
  4. Save it or do whatever calculation you want to do.
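
The steps above could look roughly like this, assuming each line is only a bracketed, space-separated list as in the example string (nth_element is a hypothetical helper name):

```python
def nth_element(line, n):
    """Return the n-th (1-based) number of a line like '[0.03 0.05 0.067]'."""
    inner = line.strip()[1:-1]   # drop surrounding whitespace, then the brackets
    numbers = inner.split()      # split on runs of whitespace
    return float(numbers[n - 1])

print(nth_element("[0.03 0.05 0.067 1.003]\n", 4))
```

The helper takes a 1-based position, so the 9th element is requested with n=9 rather than index 8.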
Abhijit
  • Sorry, I misled you... it's not just lists; each line has a beginning (which I don't need). I updated my question! – chrizz Apr 15 '12 at 17:24
0

If you need to read a file and extract the 9th element of every line, you need to do something like this:

with open('your_file.txt') as in_file:
    # index 8 is the 9th whitespace-separated item (indexing starts at 0)
    my_list = [line.split()[8] for line in in_file]
Akavall
  • my problem changed a bit, now i have two numbers at the beginning of each line. Any idea how to strip each line, so i have just the lists? thx – chrizz Apr 15 '12 at 17:30
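
For the updated format with two numbers in front of each list, one option is str.partition, which splits a string at the first occurrence of a separator. A sketch, assuming each line contains exactly one '[' ... ']' pair (extract_list_text is an illustrative name):

```python
def extract_list_text(line):
    """Return the text between the first '[' and the following ']'."""
    _, _, rest = line.partition('[')    # everything after the first '['
    inner, _, _ = rest.partition(']')   # everything before the next ']'
    return inner

inner = extract_list_text('50.0 0.134 [0.3465, 0.5476, 1.0]\n')
elements = [float(x) for x in inner.split(',')]
print(elements)
```

float() tolerates the leading space left after each comma, so no extra stripping is needed before converting.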
0

Assuming the text file structure is exactly as posted.

def openFile(file):
    "Usage: list = openFile(filename)"
    linesList = []
    try:
        # 'with' closes the file even if reading fails
        with open(file, "r") as inputFile:
            for line in inputFile:
                # drop brackets and commas, then split on whitespace
                cleaned = line.replace("[", "").replace("]", "").replace(",", "")
                linesList.append(cleaned.split())
    except OSError:
        print("Could not open file!")
    return linesList

def saveFile(file, data, element):
    "Usage: saveFile('text.txt',myList,9)"
    with open(file, "w") as outputFile:
        for line in data:
            outputFile.write(line[element-1] + "\n")

def main():
    myList = openFile("text.txt")
    # now you have a list of lists of strings
    # note: each inner list still starts with the two leading numbers
    print(myList)
    saveFile("text2.txt", myList, 2)

main()
Joshtech
0

To remove the characters up to the start of the list on a line, one method is to get a slice of the line beginning with the opening bracket character. This would look like the following:

line = line[line.index('['):]

You can then process the string using the split() method or the eval() function to convert it into a list and retrieve an element from that point. Index 8 gives the 9th element, since indexing starts at 0:

value = line[line.index('['):].split()[8]
  • sounds like a good solution and it is almost working, just one little problem left: if i do it as you suggested, i still have the comma of the end of an element! How do i get rid of that comma? – chrizz Apr 16 '12 at 06:35
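
One way to get rid of the trailing comma mentioned in the comment is str.strip with a set of characters to remove. A sketch, under the assumption that elements are separated by a comma and a space as in the question (clean_token is a made-up name):

```python
def clean_token(token):
    """Remove brackets and commas from both ends of a token, then convert."""
    return float(token.strip('[],'))

line = '50.0 0.1 [0.03, 0.05, 0.067, 1.003]\n'
tokens = line[line.index('['):].split()
print(clean_token(tokens[1]))
```

Because strip() removes the characters from both ends, the same helper also handles the first token (which starts with '[') and the last one (which ends with ']').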