0

I have a list of lines of text, textlines, which is a list of strings (each ending with '\n').

I would like to remove duplicate occurrences of lines, except for lines that contain only spaces, line feeds and tabs.

In other words, if the original list is:

textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = "   \n"
textlines[3] = "First line\n"
textlines[4] = "   \n"

The output list would be:

textlines[0] = "First line\n"
textlines[1] = "Second line \n"
textlines[2] = "   \n"
textlines[3] = "   \n"

How can I do that?

Vincent

3 Answers

3
seen = set()
res = []
for line in textlines:
    if line not in seen:
        res.append(line)
        # Remember only non-blank lines, so whitespace-only lines are never filtered out
        if line.strip():
            seen.add(line)
textlines = res
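
As a minimal sketch, the same idea can be wrapped in a function and run on the example from the question. The name dedupe_keep_blank is hypothetical, chosen only for illustration:

```python
def dedupe_keep_blank(lines):
    """Drop repeated lines, but keep every whitespace-only line."""
    seen = set()   # non-blank lines that were already emitted
    result = []
    for line in lines:
        if not line.strip():
            # Whitespace-only line: always keep it, never record it
            result.append(line)
        elif line not in seen:
            # First occurrence of a non-blank line
            seen.add(line)
            result.append(line)
    return result


textlines = ["First line\n", "Second line \n", "   \n", "First line\n", "   \n"]
print(dedupe_keep_blank(textlines))
# ['First line\n', 'Second line \n', '   \n', '   \n']
```

Set membership tests are constant time on average, so this stays linear in the number of lines.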
Hugh Bothwell
1

Because I can't resist some good code golfing:

seen = set()

[x for x in textlines if (x not in seen or not x.strip()) and not seen.add(x)]
Out[29]: ['First line\n', 'Second line \n', '   \n', '   \n']

This is equivalent to @hughbothwell's answer, which you should use if you ever intend to have human beings read your code :-)
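
The comprehension above leans on the fact that set.add mutates the set in place and returns None, so `not seen.add(x)` is always True and is there only for its side effect. A rough unrolled sketch of the same logic, using the example list from the question:

```python
textlines = ["First line\n", "Second line \n", "   \n", "First line\n", "   \n"]

seen = set()
result = []
for x in textlines:
    # Keep x if it has not been seen yet, or if it is whitespace-only
    if x not in seen or not x.strip():
        seen.add(x)       # returns None; in the one-liner this is `not seen.add(x)`
        result.append(x)

print(result)
# ['First line\n', 'Second line \n', '   \n', '   \n']
```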

roippi
0
new = []
for line in textlines:
    if line in new and line.strip():
        continue
    new.append(line)
textlines = new
John1024