Finding a string in a .txt file and deleting it

Question

I write the contents of a folder (files with .pdf, .doc and .xls extensions) to a small text file, one filename per line. That works fine. Now I want to remove all lines that refer to .pdf files. I already use the following code to remove unwanted entries (fail.png in this case):

def clean():
    with open("files.txt", "r") as f:
        lines = f.readlines()
        with open("files.txt", "w") as f:
            for line in lines:
                if line.strip("\n") != "fail.png":
                    f.write(line)

clean()

Is it possible to use some sort of wildcard (*.pdf) instead of a specific file name? Or is there a completely different way to solve this?

Thanks a lot

Why are you opening the file in read mode and then immediately opening it again in write mode? Can't you do it just once? — Joe Ferndz, Sep 02 '20 at 06:55
Good question. The code just grew this way. Is it enough to open the file in write mode to both read from and write to it? — 0x01_PH, Sep 02 '20 at 07:04
In general you don't want to read and write at the same time. Why don't you do this filtering at the time of the initial write? I mean, instead of writing all filenames to the file and then removing some, just don't write the bad ones in the first place — Tomerikoo, Sep 02 '20 at 07:18
After readlines() you can just close the file, i.e. just unindent the second `with` to top level. — alexis, Sep 02 '20 at 07:21
Thanks for all the advice! I am still learning, and you all helped a lot with good ideas — 0x01_PH, Sep 02 '20 at 07:26

andy meissner · Accepted Answer · 2020-09-02T07:41:26.540

There are multiple options:

You could check whether the line contains the string '.pdf':

if ".pdf" not in line.strip("\n"):
    f.write(line)

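You could also use a regular expression. This can be useful in other situations where you need more complex pattern matching.

import re

# Read the lines first; opening with "w" empties the file immediately.
with open("testdata.txt", "r") as f:
    lines = f.readlines()

with open("testdata.txt", "w") as f:
    for line in lines:
        # Test the stripped text, but write the original line (with its newline) back.
        if not re.match(r".+\.pdf$", line.strip()):
            f.write(line)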

.+ matches one or more of any character
\. matches a literal dot
pdf matches the literal characters 'pdf'
$ matches at the end of the line
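
To see how the pattern behaves, here is a small sketch that runs it against a few made-up lines. The sample filenames and the helper name `is_pdf` are assumptions for illustration only; they are not part of the original code.

```python
import re

# Same pattern as above, written as a raw string so that Python does not
# interpret the backslash before the regex engine sees it.
PDF_PATTERN = re.compile(r".+\.pdf$")

def is_pdf(line):
    """Return True if the line (ignoring surrounding whitespace) ends in .pdf."""
    return PDF_PATTERN.match(line.strip()) is not None

# Made-up sample lines, shaped like what readlines() returns.
samples = ["report.pdf\n", "notes.doc\n", "pdf_summary.xls\n", ".pdf\n"]
results = [is_pdf(s) for s in samples]
print(results)
```

Only the first sample is treated as a PDF. `pdf_summary.xls` is kept because the pattern requires `.pdf` at the end of the line, and the bare `.pdf` entry is kept because `.+` needs at least one character before the dot.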

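The whole code would look like this:

def clean():
    # Read everything first; the file is closed when this block ends.
    with open("files.txt", "r") as f:
        lines = f.readlines()
    # Reopen for writing and keep only the lines that don't mention .pdf.
    with open("files.txt", "w") as f:
        for line in lines:
            if ".pdf" not in line.strip("\n"):
                f.write(line)

clean()

Also, I fixed the indentation, because the second open (for writing) doesn't need to be nested inside the first one.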

does not work very well. program failed and i got an empty txt file. — 0x01_PH, Sep 02 '20 at 07:08

score 0 · Answer 2 · answered Sep 02 '20 at 07:26

You have lots of options:

Check if the string ends with ".pdf" (strip the trailing newline first, otherwise the check never succeeds):
```
  if not line.rstrip("\n").endswith(".pdf"):
```

Use the re module (most general pattern matching). Use re.search here, because re.match only matches at the start of the string:

  import re
  ...
  if not re.search(r"\.pdf$", line):

Use the fnmatch module for shell-style pattern matching (again on the line without its trailing newline):

  from fnmatch import fnmatch
  ...
  if not fnmatch(line.rstrip("\n"), "*.pdf"):
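
As a rough comparison, the following sketch applies all three checks to the same made-up lines. The sample data and the `keep_*` helper names are assumptions for illustration. Each helper strips the trailing newline before testing, since lines returned by readlines() still carry it.

```python
import re
from fnmatch import fnmatch

# Made-up lines, shaped like what readlines() returns.
lines = ["a.pdf\n", "b.doc\n", "c.xls\n", "pdf_notes.doc\n"]

def keep_endswith(line):
    # Plain suffix test on the filename.
    return not line.rstrip("\n").endswith(".pdf")

def keep_regex(line):
    # re.search scans the whole string; re.match would only try position 0.
    return re.search(r"\.pdf$", line.rstrip("\n")) is None

def keep_fnmatch(line):
    # Shell-style wildcard, as asked for in the question.
    return not fnmatch(line.rstrip("\n"), "*.pdf")

# Apply each check and collect the lines it would keep.
kept = {
    check.__name__: [line for line in lines if check(line)]
    for check in (keep_endswith, keep_regex, keep_fnmatch)
}

for name, result in kept.items():
    print(name, result)
```

All three keep the same lines here, so for a single fixed extension the choice is mostly a matter of taste. endswith is the simplest, fnmatch is closest to the `*.pdf` wildcard idea, and re pays off once the patterns get more involved.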

Oleksii Komarov · Answer 3 · 2020-09-02T07:35:17.860

You can replace your two steps (writing the folder content, then removing the unwanted files) with a single snippet like this:

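import os

folder = 'PUT_YOUR_FOLDER_PATH'
# Extensions to leave out of the list
extensions = ('.pdf', 'PUT_YOUR_OTHER_EXTENSIONS')

with open('test.txt', 'w') as f:
    for file_name in os.listdir(folder):
        # isfile() needs the full path; a bare name is resolved against the current directory
        full_path = os.path.join(folder, file_name)
        if os.path.isfile(full_path) and not file_name.endswith(extensions):
            f.write("%s\n" % file_name)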

It writes the names of all files in your folder to the text file, except those with an extension listed in extensions. You just need to put the extensions you don't want into that list. Enjoy!

Note: This only covers the single folder passed to os.listdir(). To include files from subfolders as well, walk the tree recursively with os.walk().
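
A minimal sketch of the recursive variant, assuming you want paths relative to the top folder in the log. The function names `list_files` and `write_log` are made up for this example, and the case-insensitive extension check is an extra assumption, not something the question asked for.

```python
import os

def list_files(root, skip_extensions):
    """Return sorted paths (relative to root) of all files below root,
    leaving out files whose name ends with one of skip_extensions."""
    # Assumption: compare case-insensitively, so ".PDF" is skipped as well.
    skip = tuple(ext.lower() for ext in skip_extensions)
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(skip):
                full_path = os.path.join(dirpath, name)
                found.append(os.path.relpath(full_path, root))
    return sorted(found)

def write_log(root, log_path, skip_extensions=(".pdf",)):
    """Write one relative path per line to log_path."""
    # Collect the names before opening the log, so a newly created log
    # inside root is not listed in itself.
    names = list_files(root, skip_extensions)
    with open(log_path, "w") as f:
        for name in names:
            f.write(name + "\n")

# Example call (placeholder path, as in the snippet above):
# write_log("PUT_YOUR_FOLDER_PATH", "test.txt")
```

os.walk() only yields regular entries in `filenames`, so no separate isfile() check is needed here.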
