6

I didn't quite know how to ask this question, or even how to search for the answer on Google, so I will write it out here. I have a sorted list of integers that correspond to line numbers in a file. I would like to convert them into strings, but for runs of sequential numbers I want a single string with the first number of the run, a dash, and then the last number. Here is an example:

line_nums = [ 1, 2, 3, 5, 7, 8, 9, 10 ]

I want to turn that list into:

[ '1-3', '5', '7-10' ]

I wrote some code that works for the most part. On some sequences, it will put the same number in a string twice. On a recent execution of this code, the input was:

[ 10007, 10008, 10009, 10010, 10011, 10013, 10015, 10016, 10017, 10018, 10019 ]

But what I got back was:

[ '10007-10011', '10013-10013', '10015-10019' ]

Here is my code:

def get_line_numbers_concat(line_nums):
    seq = []
    final = []
    last = 0

    for index, val in enumerate(line_nums):

        if last + 1 == val or index == 0:
            seq.append(val)
            last = val
        else:
            final.append(str(seq[0]) + '-' + str(seq[len(seq)-1]))
            seq = []
            seq.append(val)
            last = val

        if index == len(line_nums) - 1:
            if len(seq) > 1:
                final.append(str(seq[0]) + '-' + str(seq[len(seq)-1]))
            else:
                final.append(str(seq[0]))

    final_str = ', '.join(map(str, final))
    return final_str
rgutierrez1014
  • 3
    Related: http://stackoverflow.com/questions/15276156/python-return-lists-of-continuous-integers-from-list. Once you get the list-of-lists described in this post, it's fairly trivial to format it into hyphenated strings. – Kevin Apr 02 '15 at 17:23

5 Answers

8

You're almost there. The problem is the case where seq[0] is the same element as seq[len(seq)-1], i.e. where len(seq) == 1. As shown below, if len(seq) > 1 you do your normal processing; otherwise just add the first element.

def get_line_numbers_concat(line_nums):
    seq = []
    final = []
    last = 0

    for index, val in enumerate(line_nums):

        if last + 1 == val or index == 0:
            seq.append(val)
            last = val
        else:
            if len(seq) > 1:
                final.append(str(seq[0]) + '-' + str(seq[len(seq)-1]))
            else:
                final.append(str(seq[0]))
            seq = []
            seq.append(val)
            last = val

        if index == len(line_nums) - 1:
            if len(seq) > 1:
                final.append(str(seq[0]) + '-' + str(seq[len(seq)-1]))
            else:
                final.append(str(seq[0]))

    final_str = ', '.join(map(str, final))
    return final_str
nvuono
2

You could probably rearrange the code a bit to not have to duplicate on the last case, but working with what's there:

Looking at the first if..else,

str(seq[len(seq)-1]) is the same as str(seq[-1]), and for a one-value sequence that is also the same as str(seq[0]). I think that's what is giving you "10013-10013".

Try adding an if len(seq) > 1: above that one too and see if that suppresses it. You will also need an else branch like the one you have below to handle the one-number case, otherwise the single number is dropped.
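One way the rearrangement mentioned above could look, as a rough sketch rather than a drop-in replacement: track only the start of the current run and the previous value, and put the one-number check in a small helper so it is written once. The helper name format_run is made up for this example, and returning an empty string for an empty list is an assumption about the desired behaviour.

```python
def format_run(start, end):
    # A run of one number is printed alone; longer runs as "start-end".
    return str(start) if start == end else '{}-{}'.format(start, end)


def get_line_numbers_concat(line_nums):
    if not line_nums:
        return ''  # assumption: empty input gives an empty string
    final = []
    start = prev = line_nums[0]
    for val in line_nums[1:]:
        if val != prev + 1:
            # The run ended at prev, so emit it and start a new one.
            final.append(format_run(start, prev))
            start = val
        prev = val
    # Emit the run that was still open when the loop finished.
    final.append(format_run(start, prev))
    return ', '.join(final)
```

There is still one call after the loop for the last run, but the if/else on the run length no longer has to be repeated.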

geoelectric
2

You can use an OrderedDict, using the start of each new sequence as the key and appending values as you go while the current value equals the previous one + 1. Then join the first and last elements of each sublist if it has more than one element, or else just add the single element:

from collections import OrderedDict

od = OrderedDict()

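# example input (the list from the question)
l = [10007, 10008, 10009, 10010, 10011, 10013, 10015, 10016, 10017, 10018, 10019]

# create iterator
it = iter(l)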

# get first element to use as starting key
key = next(it)

od[key] = [key]

# keep track of previous element
prev = key

for i in it:
    # if last element + 1 is equal to the current
    # add it to the current sequence
    if prev + 1 == i:
        od[key].append(i)
    else:
        # else start a new sequence adding key
        key = i
        od[key] = [i]
    # update prev 
    prev = i

# if a sublist had len > 1 we have a sequence so join first and last
# elements using str.format or else we just extract a single element 
print(["{}-{}".format(sub[0], sub[-1]) if len(sub) > 1 else str(sub[0]) for sub in od.values()])
['10007-10011', '10013', '10015-10019']

You could use key = l[0] and then for i in l[1:], but slicing creates a new list. Using iter instead lets us take the first element with next, which advances the iterator to the second element, so we can then loop over the rest without slicing.

In [7]: l = [1,2,3,4]
In [8]: it = iter(l)    
In [9]: next(it) # first element
Out[9]: 1    
In [10]: next(it) # second element ...
Out[10]: 2     
In [11]: next(it)
Out[11]: 3
In [12]: next(it)
Out[12]: 4

When you iterate over the iterator, each step is the same as calling next, so once we have taken the first element with next, the loop covers only the remainder.

In [13]: l = [1,2,3,4]    
In [14]: it = iter(l)    
In [15]: key = next(it)   
In [16]: key
Out[16]: 1   
In [17]: for i in it:
   ....:     print(i)
   ....:     
2
3
4
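One caveat with this pattern: next(it) raises StopIteration if the list is empty. A small sketch of how that could be guarded, using the optional default argument of next; the function name first_and_rest and the choice of None as the default are only for illustration.

```python
def first_and_rest(values):
    it = iter(values)
    # The second argument to next() is returned when the iterator is empty,
    # instead of raising StopIteration.
    first = next(it, None)
    # Whatever is left in the iterator is everything after the first element.
    return first, list(it)
```

With [1, 2, 3, 4] this gives 1 and [2, 3, 4]; with an empty list it gives None and [], so the caller can check for None before starting the loop.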

You can also do it without a dict, setting a flag to True when we have at least two in a sequence:

# fresh iterator; the one used above is already exhausted
it = iter(l)
key, out = next(it), []
prev, flag = key, False

for i in it:
    if prev + 1 == i:
        flag = True
    else:
        # if flag is set we have a sequence else just add the key
        out.append("{}-{}".format(key, prev) if flag else str(key))
        # reset flag
        flag = False
        key = i
    prev = i
# catch last element
out.append("{}-{}".format(key, prev) if flag else str(key))
Padraic Cunningham
  • 2
    Do you really need an explanation for a downvote on an answer with a `SyntaxError`? – Lukas Graf Apr 02 '15 at 17:35
  • @LukasGraf, so a downvote instead of pointing out a typo with a comment is the correct approach yes? – Padraic Cunningham Apr 02 '15 at 17:38
  • *testing your code* before you dump it as an answer (without any explanation whatsoever, initially) is the correct approach. – Lukas Graf Apr 02 '15 at 17:39
  • 2
    I did not downvote. Can you please include the line `it = iter(line_nums)` because it confused me what `it` was. – Shashank Apr 02 '15 at 17:40
  • @LukasGraf, the code did work, I had forgotten to cast the single element to a string and put the str around the wrong place by accident when editing. Also there is commentary, amazing how you saw the typo not the introduction which is three lines long – Padraic Cunningham Apr 02 '15 at 17:41
  • 2
    @PadraicCunningham the initial answer you posted did not have a single character of commentary, and you do that *a lot*. Sometimes you add an explanation later, sometimes you don't. So excuse me if I vote on an answer according to its current state (code only and incorrect), and then move on with my life. If you don't like that, don't [FGITW](http://meta.stackexchange.com/questions/9731/fastest-gun-in-the-west-problem) it but wait to post your answer until it's in a decent state. – Lukas Graf Apr 02 '15 at 17:44
  • @LukasGraf, again you are wrong, the very first post had the lines at the top. And as far as explaining my code goes it is very rare you will find any code dump from me. You might also want to take your own advice http://stackoverflow.com/questions/29415915/join-a-list-of-tuples/29416056#29416056 – Padraic Cunningham Apr 02 '15 at 17:50
  • @PadraicCunningham I don't use `iter()` or `next()` very often, so can you tell me the purpose of the lines `it = iter(line_nums)` and `key = next(it)`? If you wanted the first element of the list, couldn't you just store `line_nums[0]`? (assuming the line from Shashank's comment) – rgutierrez1014 Apr 02 '15 at 18:00
  • @rgutierrez1014, You could do `key = l[0]` and `for i in l[1:]` but that creates a new list so the iterator is an efficient way of extracting the first element and iterating over the rest, calling `next(it)` gets the first element moving the pointer to the next element. – Padraic Cunningham Apr 02 '15 at 18:04
0

I would like to offer an alternative solution, which to me, looks much simpler and easier to work with.

That's because this looks exactly like a problem that can be solved very easily with a left fold, which is what reduce is in Python (in Python 3 it lives in functools): http://en.wikipedia.org/wiki/Fold_%28higher-order_function%29

reduce(function, iterable[, initializer])

Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned.
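To make that description concrete, here is a simplified sketch of what reduce does. It is not the library's implementation: my_reduce is a made-up name, and treating None as "no initializer given" is a simplification that the real function does not make.

```python
def my_reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        # No initializer: the first item becomes the starting value.
        value = next(it)
    else:
        value = initializer
    for element in it:
        # Fold each remaining item into the accumulated value.
        value = function(value, element)
    return value
```

Called as my_reduce(lambda x, y: x + y, [1, 2, 3, 4, 5]) this walks the list from the left and returns 15.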

Simply put, I would process the iterable (line_nums) one value at a time with a function that decides whether the value belongs to the sequence built so far or starts a new one. That way I end up with a list of lists, each holding one run of consecutive numbers. Then I convert each list to a range string (xx-yy) or a single-value string (xx).

So my solution would look like this:

from functools import reduce  # reduce is a builtin only in Python 2

def make_sequences(sequences, val):
    # extend the last sequence if val continues it, else start a new one
    if sequences != [] and sequences[-1][-1] == val - 1:
        return sequences[:-1] + [sequences[-1] + [val]]
    return sequences + [[val]]

def sequence_to_string(s):
    return '%s-%s' % (s[0], s[-1]) if len(s) > 1 else str(s[0])

def get_line_numbers_concat(line_nums):
    return ', '.join(
        sequence_to_string(seq)
        for seq in reduce(make_sequences, line_nums, [])
    )

The sequence_to_string(..) and get_line_numbers_concat(..) functions are pretty straightforward, so I'll just explain what happens inside make_sequences(..):

def make_sequences(sequences, val):

On the first call, sequences will be [] (as this was passed to reduce in get_line_numbers_concat(..)). On subsequent calls, this is where the resulting list of sequences is built: the result of each make_sequences(..) call is passed as sequences to the next call. To make it clear, this is how it would get called using the original line_nums:

make_sequences([], 10007)
    ==> [[10007]]
make_sequences([[10007]], 10008)
    ==> [[10007, 10008]]
...
make_sequences([[10007, 10008, 10009, 10010, 10011]], 10013)
    ==> [[10007, 10008, 10009, 10010, 10011], [10013]]
...

Then we only have to decide whether val belongs to the last sequence in sequences:

    if sequences != [] and sequences[-1][-1] == val - 1:      # (1)

This makes sure that sequences is not empty (otherwise we would get an IndexError), and then checks whether the last number in the last sequence (i.e. sequences[-1][-1]) is equal to val - 1, in which case val should be appended to this last sequence.

This is done here:

        return sequences[:-1] + [sequences[-1] + [val]]

where we take all sequences except the last one (sequences[:-1]) and append to them a new sequence which is a result of appending val to the last sequence.

If, however, condition (1) is not true, then either there are no previous sequences (sequences == []) or the last number of the last sequence is not exactly one less than val. In that case we add a new sequence with only one value, val:

    return sequences + [[val]]
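One design note: building new lists with + at every step copies data each time, which can add up on long inputs. If that matters, a variant that mutates the accumulator in place might look like the sketch below. The names make_sequences_in_place and group_consecutive are made up for this example, and this gives up the side-effect-free style of the version above.

```python
from functools import reduce


def make_sequences_in_place(sequences, val):
    if sequences and sequences[-1][-1] == val - 1:
        # val continues the last run, so extend that run in place.
        sequences[-1].append(val)
    else:
        # Otherwise val starts a new run of its own.
        sequences.append([val])
    return sequences


def group_consecutive(line_nums):
    # A fresh [] is created on every call, so the mutated accumulator
    # is never shared between calls.
    return reduce(make_sequences_in_place, line_nums, [])
```

The result has the same shape as before, a list of lists, so sequence_to_string(..) can be applied to each element unchanged.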
Jan Spurny
0

I try to avoid:

  • Special handling in the beginning or end
  • Setting and checking flags
  • Repeating code (even single line)

Here's my solution:

# Split list into separate intervals
# i.e. [1,3,4,5,7] -> [[1], [3,4,5], [7]]
def split_list(lst):

  def is_linear(l):
    if len(l)<1: return False
    # list(...) is needed on Python 3, where range() is not a list
    return sorted(l) == list(range(min(l), max(l)+1))

  assert isinstance(lst, list)

  lst.sort()

  n = 0
  sub = lst
  out = []
  while len(sub):
    # Search for linear chunk
    m = 0
    while is_linear(sub[:m+1]) and m+n<len(lst):
      m += 1

    out.append(sub[:m])

    # Advance forward - skip found chunk
    n += len(sub[:m])
    sub = lst[n:]

  return out
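split_list returns a list of lists, so one more step is needed to get the strings the question asks for. A minimal sketch of that step, assuming every chunk is non-empty and already sorted; format_chunks is a name chosen for this example only.

```python
def format_chunks(chunks):
    parts = []
    for chunk in chunks:
        if len(chunk) > 1:
            # A run of several numbers becomes "first-last".
            parts.append('{}-{}'.format(chunk[0], chunk[-1]))
        else:
            # A single number is printed on its own.
            parts.append(str(chunk[0]))
    return parts
```

For the chunks [[1], [3, 4, 5], [7]] this gives ['1', '3-5', '7'], which can then be passed to ', '.join(..) if a single string is wanted.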
alexbk66