Python reverting bigrams and trigrams

Question

I have a list of bigrams and trigrams:

string = 'do not be sad'

a_list = ['do', 'not', 'do not', 'be', 'not be', 'do not be', 'sad', 'be sad', 'not be sad']

Is there a function to reverse the bigrams and trigrams in a_list? I know I could join all the strings and remove duplicates, but that loses the structure of the sentence. I'm looking for a way to revert a_list back to its original string.

Desired output would be:

b_list = ['do not be sad']

I don't follow the logic that you use to reach `b_list` from `a_list`. Can you elaborate? — gtlambert, Feb 18 '16 at 12:22

danidee · Answer 1 · 2016-02-18T20:34:14.233

1

Try this

string = 'do not be sad'
string = string.split()

a_list = ['do', 'not', 'do not', 'be', 'not be', 'do not be', 'sad', 'be sad', 'not be sad']

new = []

for a in string:
    for b in a_list:
        if a == b:
            new.append(b)

print([' '.join(new)])

Output

['do not be sad']

and we can make it into a nice one-liner

print([' '.join([b for a in string for b in a_list if a == b])])

EDIT: In response to zondo's comment I decided to edit my answer; I also found this problem very interesting.

# Three test cases; each assignment overwrites the previous one,
# so comment out the ones you don't want to run.
a_list = ['do', 'not', 'do not', 'be', 'not be', 'do not be', 'sad', 'be sad', 'not be sad']
a_list = ['This', 'is', 'This is', 'my', 'is my', 'This is my', 'car', 'my car', 'is my car']
a_list = ['i', 'am', 'i am', 'a', 'am a', 'i am a', 'boy', 'a boy', 'am a boy']

largest = max(a_list, key=len)  # the longest n-gram (by character count) in the list
largest_words = largest.split()

# find the n-gram that shares no words with the longest one, then join the two in list order
for elem in a_list:
    sp = elem.split()
    if all(i not in largest_words for i in sp):  # compare whole words, not substrings
        if a_list.index(elem) < a_list.index(largest):
            print([elem + ' ' + largest])
        else:
            print([largest + ' ' + elem])

I also created several test cases for my solution, and they all passed.
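
The test cases mentioned here could be run along the following lines. This is only a sketch, not the answerer's own test code: the function name `join_with_longest` is hypothetical, it returns the first match instead of printing every match, and it compares whole words rather than substrings.

```python
def join_with_longest(ngrams):
    """Join the longest n-gram with the n-gram that shares no words with it.

    Hypothetical helper for illustration; it relies on the list order to
    decide which part comes first, as the loop in the answer does.
    """
    largest = max(ngrams, key=len)  # longest n-gram by character count
    largest_words = largest.split()
    for elem in ngrams:
        if all(word not in largest_words for word in elem.split()):
            if ngrams.index(elem) < ngrams.index(largest):
                return elem + ' ' + largest
            return largest + ' ' + elem
    return largest  # nothing to add: the longest n-gram covers every word


test_cases = [
    ['do', 'not', 'do not', 'be', 'not be', 'do not be', 'sad', 'be sad', 'not be sad'],
    ['This', 'is', 'This is', 'my', 'is my', 'This is my', 'car', 'my car', 'is my car'],
    ['i', 'am', 'i am', 'a', 'am a', 'i am a', 'boy', 'a boy', 'am a boy'],
]

for case in test_cases:
    print([join_with_longest(case)])
```

Because the result depends on `ngrams.index`, this approach still leans on the order of the list when deciding which side the missing word goes on.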

edited Feb 18 '16 at 20:34

answered Feb 18 '16 at 12:26

danidee


1. This doesn't work; `a` will be defined as each *letter* of the string, not each word. 2. If he wanted that, he could simply say `new = [string]` – zondo Feb 18 '16 at 12:27
Also, if `a_list` is edited, it can't follow `string` – user47467 Feb 18 '16 at 12:31
I tested with a modified version of `a_list` and it comes out correct: `a_list = ['be', 'do', 'never', 'do not', 'not be', 'not to be', 'something', 'do not be', 'sad', 'be sad', 'not', 'not be sad']` – danidee Feb 18 '16 at 12:38
@zondo, you are incorrect. `string` is re-defined as a list of words here: `string = string.split()`, so `a` will, in fact, be each word. – SiHa Feb 18 '16 at 12:38
I'm sorry; I didn't notice. I don't think `string` is an appropriate name for a list of strings, though. My second point is still valid. – zondo Feb 18 '16 at 14:26
I've edited my answer; it now works even when the list is not sorted. – danidee Feb 18 '16 at 20:35

score 0 · Answer 2 · edited Feb 18 '16 at 23:22

0

Use a list comprehension:

a_sentence = [" ".join(word for word in a_list if len(word.split()) == 1)]
print(a_sentence)

# Output: ['do not be sad']

edited Feb 18 '16 at 23:22

Zizouz212


answered Feb 18 '16 at 12:21

zondo


This works because `a_list` is sorted. I think however that the OP wants an algorithm to repair the structure even if the list of *n*-grams is shuffled. – Willem Van Onsem Feb 18 '16 at 12:23
@WillemVanOnsem Sorry, yes, this is what I want. – user47467 Feb 18 '16 at 12:24
Well, I'll leave my answer there just in case it helps someone who, like me, misunderstood the question. – zondo Feb 18 '16 at 12:24
@zondo I'm sorry that 'I know I could join all the strings and remove duplicates, but that loses the structure of the sentence.' was not clear – user47467 Feb 18 '16 at 12:25
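
For the shuffled case raised in these comments, one possible approach is to ignore the order of the list entirely and chain the bigrams: each bigram says which word follows which, and the first word of the sentence is the one that never follows another. The sketch below is not taken from either answer; `rebuild_sentence` is a hypothetical name, and it assumes that every word occurs only once in the sentence and that all of its bigrams are present in the list.

```python
def rebuild_sentence(ngrams):
    """Rebuild a sentence from its n-grams, in any order.

    Assumptions (not guaranteed by the question): each word occurs once
    in the sentence, and every bigram of the sentence is in the list.
    """
    # Map each word to the word that follows it, using the bigrams only.
    successor = {}
    for gram in ngrams:
        words = gram.split()
        if len(words) == 2:
            first, second = words
            if successor.get(first, second) != second:
                raise ValueError("a word has two different successors")
            successor[first] = second

    # The first word is the only one that never appears as a successor.
    followers = set(successor.values())
    starts = [word for word in successor if word not in followers]
    if len(starts) != 1:
        raise ValueError("cannot determine a unique first word")

    # Follow the chain from the first word until no bigram continues it.
    sentence = [starts[0]]
    seen = {starts[0]}
    while sentence[-1] in successor:
        nxt = successor[sentence[-1]]
        if nxt in seen:
            raise ValueError("repeated word; the chain is ambiguous")
        seen.add(nxt)
        sentence.append(nxt)
    return ' '.join(sentence)


shuffled = ['be sad', 'not', 'do not be', 'sad', 'do',
            'not be sad', 'not be', 'be', 'do not']
print([rebuild_sentence(shuffled)])  # prints ['do not be sad']
```

Sentences with repeated words (for example 'to be or not to be') break the single-successor assumption, so the sketch raises an error rather than guessing.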
