Python Re-ordering the lines in a dat file by string

Question

Sorry if this is a repeat, but I couldn't find an existing answer.

Basically, I am opening and reading a dat file that contains a list of paths, which I need to loop through to get certain information.

Each line in the base.dat file contains an m followed by a number (m12.40, m12.50, ...). For example, some lines in the file might be:

Volumes/hard_disc/u14_cut//u14m12.40_all.beta/beta8
Volumes/hard_disc/u14_cut/u14m12.50_all.beta/beta8
Volumes/hard_disc/u14_cut/u14m11.40_all.beta/beta8

I need to be able to rewrite the dat file so that the lines are re-ordered from the largest m number to the smallest. Then, when I loop through PATH in database (shown in the code below), I will be looping through in decreasing m.

Here is the relevant part of the code:

import os
import re

import numpy as np

# Read every path listed in base8.dat (one path per line)
with open('base8.dat', 'r') as base:
    database = base.read().splitlines()

counter = 0
mu_list = np.array([])
delta_list = np.array([])
offset = 0.00136
beta = 0


for PATH in database:
    # Only process runs that have an optimal CHI spectral function
    if os.path.exists(PATH + '/CHI/optimal_spectral_function_CHI.dat'):

        n1_array = np.loadtxt(PATH + '/AVERAGES/av-err.n.dat')
        n7_array = np.loadtxt(PATH + '/AVERAGES/av-err.npx.dat')
        n1_mean = n1_array[0]
        delta = round(float(5.0 + offset - (n1_array[0] * 2. + 4. * n7_array[0])), 6)

        # Read mu from this run's params10 file
        with open(PATH + '/params10', 'r') as par:
            for line in par:
                counter = counter + 1
                if re.match('mu', line):
                    mioMU = re.findall(r'\d+', line.replace(';', ''))
                    # Third whitespace-separated field, minus its trailing character
                    mioMU2 = line.split()[2][:-1]
                    mu = mioMU2
                    print(mu, delta, PATH)

                    mu_list = np.append(mu_list, mu)
                    delta_list = np.append(delta_list, delta)

        optimal_counter = 0

print(delta_list, mu_list)

I have checked the possible duplicate that was flagged, but I can't get it to work for my case because my file doesn't contain separate strings and numbers. The 'number' I need to sort by is part of the string as a whole:

Volumes/data_disc/u14_cut/from_met/u14m11.40_all.beta/beta16

and I need to sort each entire line by just the m(somenumber) part.
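
One way to get the loop running in decreasing m without touching the file at all is to sort database in memory, right after splitlines(), using a key function that pulls the number out of each path. This is a minimal sketch under stated assumptions: the pattern assumes the number always follows an m and is immediately followed by an underscore (as in u14m12.40_all), and the helper name m_value plus the choice to send unmatched lines to the end are illustrative, not part of the original code.

```python
import re

# Assumption: the m number is the digits (with an optional decimal part)
# that directly follow an "m" and are followed by an underscore, as in
# ".../u14m12.40_all.beta/beta8". Adjust the pattern if the paths differ.
M_PATTERN = re.compile(r'm(\d+(?:\.\d+)?)_')


def m_value(path):
    """Return the m number in `path` as a float (hypothetical helper)."""
    match = M_PATTERN.search(path)
    if match is None:
        # Lines without an m number sort last when ordering largest-first.
        return float('-inf')
    return float(match.group(1))


database = [
    'Volumes/hard_disc/u14_cut//u14m12.40_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14m12.50_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14m11.40_all.beta/beta8',
]

# Largest m first; each line is kept intact, only its position changes.
database = sorted(database, key=m_value, reverse=True)

for PATH in database:
    print(PATH)
```

Because the key function only reads the number and never modifies the string, the full path is preserved and the mu and delta values are appended in the same decreasing-m order on every pass through database.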

Possible duplicate of [Python - Ordering Number Values in a List Containing Strings and Numbers](http://stackoverflow.com/questions/34288774/python-ordering-number-values-in-a-list-containing-strings-and-numbers) — Julien, Jul 21 '16 at 11:23
@HarshaBiyani The problem is that all the lines in base.dat need to be ordered numerically by a specific part of the line. If they aren't ordered, then when I loop through the paths the parameters mu and delta come out in the wrong order. I can't re-order them afterwards because later in my code I loop through the paths separately for a further parameter, and then the arrays no longer match up. — Ciara Mackellar, Jul 21 '16 at 11:31
What they do in this post is a bit more complex, but you should be able to simplify it to what you need. — Julien, Jul 21 '16 at 11:36
@JulienBernu Thanks, I checked it out. This method would work if I could access the m value separately, but the thing I need to sort by isn't a self-contained element (or number) in a list; it's a continuous part of a larger string. My dat file is filled with many lines, all of the form Volumes/hard_disc/u14_cut//u14m12.40_all.beta/beta8, so I have split these, but I need to access just the m12.40 part of the string (the only thing that differs between lines) and then sort by that value while keeping the line intact. — Ciara Mackellar, Jul 21 '16 at 11:55

albert · Accepted Answer · 2016-07-21T12:48:22.550

Assuming that the number part of your line has the form of a float, you can use a regular expression to match that part and convert it from a string to a float.

After that, you can use this information to sort all the lines read from your file. I added an invalid line to show how invalid data is handled.

As a quick example I would suggest something like this:

import re

# TODO: Read file and get list of lines

l = ['Volumes/hard_disc/u14_cut/u14**m12.40**_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14**m12.50**_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14**m11.40**_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14**mm11.40**_all.beta/beta8']

regex = r'^.+\*{2}m{1}(?P<criterion>[0-9\.]*)\*{2}.+$'
p = re.compile(regex)

criterion_list = []

for s in l:
    m = p.match(s)
    if m:
        crit = m.group('criterion')
        try:
            crit = float(crit)
        except ValueError:
            # the matched text is not a valid float, e.g. '' or '1.2.3'
            crit = 0
    else:
        crit = 0
    criterion_list.append(crit)


tuples_list = list(zip(criterion_list, l))
output = [element[1] for element in sorted(tuples_list, key=lambda t: t[0])]
print(output)

# TODO: Write output to new file or overwrite existing one.

Giving:

['Volumes/hard_disc/u14_cut/u14**mm11.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m11.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m12.40**_all.beta/beta8', 'Volumes/hard_disc/u14_cut/u14**m12.50**_all.beta/beta8']

This snippet starts after all lines have been read from the file and stored in a list (called l here). The regex group criterion captures the float part contained in **m12.50**, as you can see on regex101. Iterating through all the lines therefore gives you a new list containing all matched groups as floats. If the regex does not match a given string, or casting the group to a float fails, crit is set to zero so that those invalid lines end up at the very beginning of the sorted list.
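
One caveat: the regex above matches literal ** characters around the m number (the \*{2} parts), and the example list contains them too. The lines shown in the question (for example Volumes/hard_disc/u14_cut/u14m12.50_all.beta/beta8) contain no asterisks, so on the real file every line would fall through to crit = 0. A minimal variant without the asterisks might look like the following; the pattern assumes the number sits between u<digits>m and an underscore, as in all of the question's paths, and criterion() is a hypothetical helper name.

```python
import re

# Same idea as the answer's regex, but without the literal "**" markers.
# Assumption: the number sits between "u<digits>m" and an underscore,
# e.g. "u14m12.40_all.beta".
p = re.compile(r'u\d+m(?P<criterion>[0-9.]+)_')


def criterion(line):
    """Return the m number as a float, or 0 for lines that do not match."""
    m = p.search(line)
    if m:
        try:
            return float(m.group('criterion'))
        except ValueError:
            # e.g. "1.2.3" matches [0-9.]+ but is not a valid float
            return 0
    return 0


lines = [
    'Volumes/hard_disc/u14_cut/u14m12.40_all.beta/beta8',
    'Volumes/hard_disc/u14_cut/u14mm11.40_all.beta/beta8',  # invalid line
]
print([criterion(s) for s in lines])  # [12.4, 0]
```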

After that, zip() is used to get a list of tuples containing each extracted float and the corresponding string. Now you can sort this list of tuples by each tuple's first element and collect the corresponding strings into a new list, output.
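
The question asks for the largest m first, but sorted() in the answer produces ascending order. Passing reverse=True flips that, and passing the extraction function as key= replaces the separate zip/sort/unzip steps. The sketch below also fills in the final TODO by overwriting the file. The function names sort_file_descending and m_number are hypothetical, the file name base8.dat comes from the question's code, and the pattern assumes each number is written as u<digits>m<number>_ as in the question's paths. Since invalid lines still get 0, with reverse=True they end up last rather than first.

```python
import re

# Hypothetical helper: same idea as the answer's regex (without "**"),
# returning 0 for lines without a readable m number.
_M = re.compile(r'u\d+m([0-9.]+)_')


def m_number(line):
    match = _M.search(line)
    try:
        return float(match.group(1)) if match else 0
    except ValueError:
        return 0


def sort_file_descending(filename):
    """Rewrite `filename` with its lines ordered from largest to smallest m."""
    with open(filename) as f:
        lines = f.read().splitlines()
    # key= replaces the zip/sort/unzip steps; reverse=True puts the largest m first.
    lines = sorted(lines, key=m_number, reverse=True)
    # Overwrite the original file, one path per line.
    with open(filename, 'w') as f:
        f.writelines(line + '\n' for line in lines)
    return lines
```

Calling sort_file_descending('base8.dat') once before the loop in the question would let for PATH in database: run in decreasing m. If the original file must be preserved, writing to a new file name instead is the safer option.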

Thanks, this approach works perfectly! Great explanation of what each piece of the code is doing as well. I would give you an upvote, but I only just made this account, so I don't have enough reputation yet. — Ciara Mackellar, Jul 21 '16 at 12:36
If this solved your question, please tick the answer to mark your question as resolved. — albert, Jul 21 '16 at 12:47
