Remove multiple occurrences except for particular values?

Question

I have a list of lines of text, textlines, which is a list of strings (each ending with '\n').

I would like to remove repeated occurrences of lines, except for lines that contain only spaces, line feeds and tabs.

In other words, if the original list is:

textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = "   \n"
textlines[3] = "First line\n"
textlines[4] = "   \n"

The output list would be:

textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = "   \n"
textlines[3] = "   \n"

How can I do that?

score 3 · Accepted Answer · answered Dec 08 '13 at 21:09

seen = set()
res = []
for line in textlines:
    if line not in seen:
        res.append(line)
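        # Remember only lines with visible content, so that
        # whitespace-only lines are never treated as duplicates.
        if line.strip():
            seen.add(line)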
textlines = res
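
A minimal sketch of the set-based approach as a reusable function, run against the sample from the question. The function name dedupe_keep_blank is made up for illustration. The key point is that only lines with visible content are recorded in seen, so whitespace-only lines are always kept:

```python
def dedupe_keep_blank(lines):
    """Drop repeated lines, but keep every whitespace-only line.

    Hypothetical helper name, used here only for illustration.
    """
    seen = set()
    result = []
    for line in lines:
        if line not in seen:
            result.append(line)
            # Record only lines with visible content; blank lines are
            # never added to `seen`, so they are never filtered out.
            if line.strip():
                seen.add(line)
    return result


# Sample input taken from the question.
textlines = [
    "First line\n",
    "Second line \n",
    "   \n",
    "First line\n",
    "   \n",
]
deduped = dedupe_keep_blank(textlines)
```

Membership tests on a set take roughly constant time, so this stays linear in the number of lines.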

Hugh Bothwell

score 1 · Answer 2 · answered Dec 08 '13 at 21:34

Because I can't resist a bit of code golf:

seen = set()

[x for x in textlines if (x not in seen or not x.strip()) and not seen.add(x)]
Out[29]: ['First line\n', 'Second line \n', '   \n', '   \n']

This is equivalent to @hughbothwell's answer, which you should use if you ever intend to have human beings read your code :-)
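
For anyone puzzled by the trick: set.add always returns None, so `not seen.add(x)` is always true and exists only for its side effect of recording x. Because `and` short-circuits, that side effect happens only for items that already passed the first test. A small sketch with made-up sample values:

```python
seen = set()

# set.add mutates the set in place and returns None.
added = seen.add("a")

# `not None` is True, so this expression is always True,
# and "b" is added to the set as a side effect.
always_true = not seen.add("b")

# `and` short-circuits: the right-hand side runs only when the
# left-hand side is truthy, so "c" is never added here.
skipped = False and not seen.add("c")
```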

score 0 · Answer 3 · answered Dec 08 '13 at 21:19

new = []
for line in textlines:
    if line in new and line.strip():
        continue
    new.append(line)
textlines = new

John1024
