Find sublists with common starting elements - python

Question

I have a nested list:

lists = [['a','b','c'],
         ['a','b','d'],
         ['a','b','e'],
         ['с','с','с','с']]

I need to find sublists that share the same first elements (2 or more elements, occurring in 2 or more sublists), join those shared elements into a single string, and join any sublist that has no shared first elements into one single string. The sublists can appear in any order, so I suppose that just checking the next or previous sublist is the wrong approach. Desired output:

   [['a b','c'],
    ['a b','d'],
    ['a b','e'],
    ['с с с с']]

I tried some for loops, but with no success. I currently don't know where to start, so any help would be appreciated. Thank you for your time!

@NishantNawarkhede Thanks for your reply! Updated the post, tried some for loops, but I don't think that's the right way at all. — Alex Nikitin, Jun 13 '18 at 12:21
@Vinny Thanks for your reply! Sublists can appear in any order, so first elements count as common if they match those of any other sublist. — Alex Nikitin, Jun 13 '18 at 12:25
Is the `n` value fixed? The last line in your output seems to suggest that you can combine more than `n` substrings under some circumstances? — Vincent van der Weele, Jun 13 '18 at 12:29
How do you define what common is? 2 occurrences is common? 3? 4? — Chen A., Jun 13 '18 at 12:30
@VincentvanderWeele Thanks for your reply! All first common elements should form a string. Sublists with no matching first elements just should become a big single string. — Alex Nikitin, Jun 13 '18 at 12:32

score 1 · Accepted Answer · answered Jun 13 '18 at 13:06

Probably not the most efficient way, but you could try something like this:

def foo(l,n):
    #Get all of the starting sequences
    first_n = [list(x) for x in set([tuple(x[:n]) for x in l])]

    #Figure out which of those starting sequences occur in 2 or more sublists
    duplicates = []
    for starting_sequence in first_n:
        if len([x for x in l if x[:n] == starting_sequence]) >= 2:
            duplicates.append(starting_sequence)

    #make changes
    result = []
    for x in l:
        if x[:n] in duplicates:
            result.append([" ".join(x[:n])]+x[n:])
        else:
            result.append([" ".join(x)])

    return result

Sets have no repeats, but their elements must be hashable. Lists are unhashable, which is why I converted the starting sequences into tuples and then back into lists.
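If the repeated scans over the list become a concern, the same idea can be sketched with `collections.Counter`, which counts each starting sequence in a single pass. This is only a minimal sketch, not a replacement for the function above: the name `merge_common_prefixes` and the `min_count` parameter are my own, and it assumes "common" means the first `n` elements appear in at least `min_count` sublists. The sample data is modelled on the question.

```python
from collections import Counter


def merge_common_prefixes(lists, n=2, min_count=2):
    """Join the first n elements of sublists whose prefix is shared.

    Hypothetical helper: `n` is the prefix length and `min_count` is the
    assumed number of sublists that must share a prefix for it to count
    as common.
    """
    # Tuples are hashable, so they can be used as Counter keys.
    prefix_counts = Counter(tuple(sub[:n]) for sub in lists)

    result = []
    for sub in lists:
        if prefix_counts[tuple(sub[:n])] >= min_count:
            # Shared prefix: join it into one string, keep the rest as is.
            result.append([" ".join(sub[:n])] + sub[n:])
        else:
            # Prefix is not shared: collapse the whole sublist into one string.
            result.append([" ".join(sub)])
    return result


lists = [['a', 'b', 'c'],
         ['a', 'b', 'd'],
         ['a', 'b', 'e'],
         ['c', 'c', 'c', 'c']]

print(merge_common_prefixes(lists))
# [['a b', 'c'], ['a b', 'd'], ['a b', 'e'], ['c c c c']]
```

Because the counts are looked up by prefix, the order of the sublists does not matter, and the input list is left unmodified.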
