
I have a list of strings that end with numbers. I want to sort them in Python and then collapse them wherever consecutive numbers form a range.

Example input string:

ABC1/3, ABC1/1, ABC1/2, ABC2/3, ABC2/2, ABC2/1

Example output string:

ABC1/1-3, ABC2/1-3

How should I approach this problem in Python?

Martijn Pieters
Vibhu
    There are plenty of questions here on Stack Overflow covering the two parts of your problem: sorting, and collapsing into ranges. Did you try either yet? Where did you get stuck? – Martijn Pieters May 30 '14 at 08:40

3 Answers


There's no need to use a dict for this problem. You can simply parse the tokens into a list of [name, number] pairs and sort it. Python compares lists element by element, so the pairs are ordered by name first and then by number. After sorting, you only need to iterate once and record the indices where each run starts and ends. Try this:

# Data is a comma separated list of name/number pairs.

data = 'ABC1/3, ABC1/1, ABC1/2, ABC2/3, ABC2/2, ABC2/1'

# Split data on ', ' and split each token on '/'.

tokens = [token.split('/') for token in data.split(', ')]

# Convert token number to integer.

for index in range(len(tokens)):
    tokens[index][1] = int(tokens[index][1])

# Sort pairs, automatically orders lists by items.

tokens.sort()

prev = 0     # Index where the current run starts.
indices = [] # List of (start, end) index pairs for output.

for index in range(1, len(tokens)):

    # If name matches with previous position.

    if tokens[index][0] == tokens[prev][0]:

        # Check whether number is increasing sequentially.

        if tokens[index][1] != (tokens[index - 1][1] + 1):

            # If non-sequential increase then record the indices.

            indices.append((prev, index - 1))
            prev = index

    else:

        # If name changes then record the indices.

        indices.append((prev, index - 1))
        prev = index

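# After iterating the list, record the final run. Using len(tokens) - 1
# also works when there is only one token and the loop never executes.

indices.append((prev, len(tokens) - 1))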

# Print the ranges.

output = []
for start, end in indices:
    if start == end:
        args = (tokens[start][0], tokens[start][1])
        output.append('{0}/{1}'.format(*args))
    else:
        args = (tokens[start][0], tokens[start][1], tokens[end][1])
        output.append('{0}/{1}-{2}'.format(*args))

print(', '.join(output))

# Output:
# ABC1/1-3, ABC2/1-3
GrantJ

I wanted to speedhack this problem, so here is an almost complete solution for you, based on my own make_range_string and a stolen natsort.

import re
from collections import defaultdict

def sortkey_natural(s):
    return tuple(int(part) if re.match(r'[0-9]+$', part) else part
                for part in re.split(r'([0-9]+)', s))

def natsort(collection):
    return sorted(collection, key=sortkey_natural)

def make_range_string(collection):
    collection = sorted(collection)
    parts = []

    range_start = None
    previous = None

    def push_range(range_start, previous):
        if range_start is not None:
            if previous == range_start:
                parts.append(str(previous))
            else:
                parts.append("{}-{}".format(range_start, previous))

    for i in collection:
        if previous != i - 1:
            push_range(range_start, previous)
            range_start = i

        previous = i

    push_range(range_start, previous)
    return ', '.join(parts)

def make_ranges(strings):
    components = defaultdict(list)
    for i in strings:
        main, _, number = i.partition('/')
        components[main].append(int(number))

    rvlist = []
    for key in natsort(components):
        rvlist.append((key, make_range_string(components[key])))

    return rvlist

print(make_ranges(['ABC1/3', 'ABC1/1', 'ABC1/2', 'ABC2/5', 'ABC2/2', 'ABC2/1']))

The code prints a list of tuples:

[('ABC1', '1-3'), ('ABC2', '1-2, 5')]
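
If you need a single string in the format from the question, the tuples can be flattened afterwards. This is only a sketch: it assumes each range should be repeated with its prefix (so 'ABC2/1-2, ABC2/5'), which the question does not spell out, and it copies the printed result into a literal so the snippet runs on its own.

```python
# Result in the shape printed above, copied here so the snippet runs alone.
ranges = [('ABC1', '1-3'), ('ABC2', '1-2, 5')]

pieces = []
for prefix, range_string in ranges:
    # make_range_string joins its parts with ', ', so split on that.
    for part in range_string.split(', '):
        pieces.append('{}/{}'.format(prefix, part))

print(', '.join(pieces))  # ABC1/1-3, ABC2/1-2, ABC2/5
```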
Community

I would start by splitting the strings, and using the part that you want to match on as a dictionary key.

strings = ['ABC1/3', 'ABC1/1', 'ABC1/2', 'ABC2/3', 'ABC2/2', 'ABC2/1']
d = {}
for s in strings:
    a, b = s.split('/')
    # setdefault stores the new list in the dict; get() would not.
    d.setdefault(a, []).append(int(b))

That collects the number parts into a list for each prefix. Then you can sort the lists and look for adjacent numbers to join.
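
A minimal sketch of that last step might look like the following. The helper name collapse is made up for illustration, and the prefixes are sorted as plain strings, so 'ABC10' would sort before 'ABC2'; use a natural sort key if that matters.

```python
def collapse(numbers):
    """Collapse integers into 'a-b' range strings (hypothetical helper)."""
    ranges = []
    for n in sorted(numbers):
        if ranges and n == ranges[-1][1] + 1:
            ranges[-1][1] = n          # extend the current run
        else:
            ranges.append([n, n])      # start a new run
    return ['{}-{}'.format(a, b) if a != b else str(a) for a, b in ranges]

strings = ['ABC1/3', 'ABC1/1', 'ABC1/2', 'ABC2/3', 'ABC2/2', 'ABC2/1']
d = {}
for s in strings:
    a, b = s.split('/')
    d.setdefault(a, []).append(int(b))

result = ', '.join('{}/{}'.format(prefix, r)
                   for prefix in sorted(d)
                   for r in collapse(d[prefix]))
print(result)  # ABC1/1-3, ABC2/1-3
```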

otus