How can I delete lines containing only spaces in python?

Question

I'm developing a simple tool that extracts relevant data from HTML files and writes it to TXT files. So far, I've achieved most of what I had in mind, but the final result is still unusable because (lots of) lines consisting only of whitespace keep getting written to the final TXT files.

Ideally, I'd want all lines containing text to be consecutive. How do I ignore all the lines containing ONLY spaces (i.e. containing no alphanumeric character) when reading the HTML file once I have removed the tags? (The spaces are what remains after deleting everything between "<" and ">" for the TXTs.)

hint: `str.strip()` will strip whitespace from the string. You can then check if a string is an empty string. — Green Cloak Guy, Apr 10 '22 at 21:41
Loop over the lines and add `if not line.strip(): continue` or something similar. — Klaus D., Apr 10 '22 at 21:42

score 3 · Answer 1 · answered Apr 10 '22 at 21:43


You should post some code, for instance showing how you write your TXT file. Anyway, if you process the text line by line, you can simply add a condition:

if len(line.strip()) > 0:
    f.write(line)
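
As a rough illustration of where that condition could sit, here is a minimal sketch. The function name `write_non_blank_lines` and the sample text are made up for the example, and `io.StringIO` objects stand in for the real input and output files, since the question does not show how they are opened.

```python
import io

def write_non_blank_lines(lines, out):
    """Write only the lines that contain at least one non-whitespace character."""
    for line in lines:
        # "\n".strip() and "   \n".strip() are both "", which is falsy
        if line.strip():
            out.write(line)

# Stand-ins for the real input and output files (hypothetical content)
source = io.StringIO("Title\n   \n\n\t\nFirst paragraph\n  \nSecond paragraph\n")
target = io.StringIO()
write_non_blank_lines(source, target)
print(target.getvalue())
```

Because the kept lines are written unchanged, any indentation inside lines that do contain text is preserved.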

Giovanni Tardini


Works even without `len(` and `) > 0`. – Klaus D. Apr 10 '22 at 21:48

score 0 · Answer 2 · answered Apr 10 '22 at 21:47

Use `str.strip` to strip the whitespace from each line, then use `filter` to remove the lines that are now empty:

example = """
  
     


AAA
    
f
ffifljlsehfshogfse
        
   
 
    
hello
"""

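def remove_blank_lines(s):
    # Strip each line so that whitespace-only lines become empty strings
    lines = map(str.strip, s.split("\n"))
    # filter(None, ...) drops the falsy (empty) strings
    lines = filter(None, lines)
    return "\n".join(lines)

# Or as a one-liner:
# remove_blank_lines = lambda s: "\n".join(filter(None, map(str.strip, s.split("\n"))))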

print(remove_blank_lines(example))
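
One detail worth checking when using `filter` this way: `filter(None, ...)` only drops falsy items, and a string made of spaces or tabs is still truthy. The short sketch below (the sample text is made up) compares it with passing `str.strip` as the predicate, which also drops whitespace-only lines and leaves the kept lines untouched.

```python
text = "AAA\n   \n\nhello\n\t\n"

# filter(None, ...) only drops empty strings; "   " and "\t" are truthy and survive
only_empty_removed = list(filter(None, text.split("\n")))

# With str.strip as the predicate, whitespace-only lines are dropped too,
# and the lines that are kept are not modified
blank_removed = list(filter(str.strip, text.split("\n")))

print(only_empty_removed)  # ['AAA', '   ', 'hello', '\t']
print(blank_removed)       # ['AAA', 'hello']
```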

score 0 · Answer 3 · answered Apr 10 '22 at 23:21


with open("<file_name>.txt") as f:
    data = list(filter(lambda x: x.strip(), f.readlines()))
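
If the goal is to clean a file that has already been written, the same filter can be followed by writing the kept lines back. This is only a sketch under assumptions: the helper name `remove_blank_lines_in_file`, the UTF-8 encoding, and the throwaway demo file are all hypothetical and not taken from the question.

```python
from pathlib import Path
import tempfile

def remove_blank_lines_in_file(path):
    """Rewrite the file at `path` without its whitespace-only lines."""
    path = Path(path)
    with path.open(encoding="utf-8") as f:
        # Iterating the file object yields one line at a time (newline included)
        kept = [line for line in f if line.strip()]
    path.write_text("".join(kept), encoding="utf-8")

# Demo on a throwaway file (hypothetical content)
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_text("AAA\n   \n\nhello\n", encoding="utf-8")
    remove_blank_lines_in_file(demo)
    result = demo.read_text(encoding="utf-8")

print(result)
```

Iterating over `f` directly avoids the extra `readlines()` call, although the kept lines are still collected in memory before being written back.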

MoRe
