I write the contents of a folder (files with .pdf, .doc and .xls extensions) to a small text file, one filename per line. This works fine. Now I want to remove all lines that refer to .pdf files. So far I use the following code to remove unwanted entries (fail.png in this case):

def clean():
    with open("files.txt", "r") as f:
        lines = f.readlines()
        with open("files.txt", "w") as f:
            for line in lines:
                if line.strip("\n") != "fail.png":
                    f.write(line)

clean()

Is it possible to use some sort of "wildcard" (*.pdf) instead of the specific file name? Or is there a completely different way to solve this?

Thanks a lot

0x01_PH
  • Why are you opening the file in read mode and then immediately opening it in write mode? Can't you do it just once? – Joe Ferndz Sep 02 '20 at 06:55
  • Good question. The code just grew this way. Is it enough to open the file in write mode to both read from and write to it? – 0x01_PH Sep 02 '20 at 07:04
  • In general you don't want to read and write at the same time. Why don't you do this filtering at the time of the initial write? I mean, instead of writing all filenames to the file and then removing some, just don't write the bad ones in the first place. – Tomerikoo Sep 02 '20 at 07:18
  • After readlines() you can just close the file, i.e. just unindent the second `with` to top level. – alexis Sep 02 '20 at 07:21
  • Thanks for all the advice! I am still learning and you all helped a lot with good ideas. – 0x01_PH Sep 02 '20 at 07:26

3 Answers

There are multiple options:

You could check whether the line contains the string '.pdf':

if ".pdf" not in line:
    f.write(line)

You could also use a regular expression. This is useful in other situations where you need more complex pattern matching.

import re

# `lines` is the list returned by f.readlines() on the original file
with open("testdata.txt", "w") as f:
    for line in lines:
        # test the stripped name, but write the original line so the newline is kept
        if not re.match(r".+\.pdf$", line.strip()):
            f.write(line)

  • .+ matches one or more of any character
  • \. matches the literal dot
  • pdf matches the literal chars 'pdf'
  • $ matches at the end of the line
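
To see what this pattern accepts and rejects, here is a small sketch run against a few made-up file names (the names in the list are only illustrative, they are not from the question):

```python
import re

# Hypothetical sample names, chosen to show the edge cases
names = ["report.pdf", "notes.doc", "table.xls", ".pdf", "archive.pdf.bak"]

pattern = re.compile(r".+\.pdf$")

# re.match anchors at the start of the string; ".+" needs at least one
# character before ".pdf", and "$" requires ".pdf" to be at the very end
pdf_names = [n for n in names if pattern.match(n)]
kept = [n for n in names if not pattern.match(n)]

print(pdf_names)
print(kept)
```

Note that a bare ".pdf" and a name that only contains ".pdf" in the middle are both kept, because the pattern requires something before the dot and nothing after "pdf".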

Whole code would look like this:

def clean():
    with open("files.txt", "r") as f:
        lines = f.readlines()
    with open("files.txt", "w") as f:
        for line in lines:
            if ".pdf" not in line:
                f.write(line)

clean()

I also unindented the second `with` block: the file only needs to stay open for reading until readlines() has returned.

andy meissner

You have lots of options:

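  • Check if the string ends with ".pdf" (strip the trailing newline first, otherwise the line ends with "\n" and never matches):

      if not line.rstrip("\n").endswith(".pdf"):

  • Use the re module (most general pattern matching). Use re.search here, because re.match only matches at the start of the string:

      import re
      ...
      if not re.search(r"\.pdf$", line):

  • Use the fnmatch module for shell-style pattern matching (again on the line without its newline):

      from fnmatch import fnmatch
      ...
      if not fnmatch(line.rstrip("\n"), "*.pdf"):
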
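
As a rough sketch of how the fnmatch option could be dropped into the question's function, the version below takes the file path and a shell-style wildcard as arguments. The name `remove_matching` and its return value are assumptions made for illustration, not part of the original code.

```python
from fnmatch import fnmatch


def remove_matching(path, pattern):
    """Rewrite the file at `path`, dropping every line that matches `pattern`.

    `remove_matching` is a hypothetical helper name; `pattern` is a
    shell-style wildcard such as "*.pdf".
    """
    with open(path, "r") as f:
        lines = f.readlines()
    # Match against the name without its newline, but keep the original line
    kept = [line for line in lines if not fnmatch(line.rstrip("\n"), pattern)]
    with open(path, "w") as f:
        f.writelines(kept)
    return len(lines) - len(kept)  # number of lines removed
```

Calling `remove_matching("files.txt", "*.pdf")` would then replace the hard-coded comparison with "fail.png". Keep in mind that fnmatch normalizes case according to the operating system, so whether "REPORT.PDF" matches "*.pdf" depends on the platform; fnmatchcase from the same module always compares case-sensitively.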
alexis

You can replace both of your functions (writing the folder contents and removing the unwanted files) with a single snippet like the one below:

import os

folder = 'PUT_YOUR_FOLDER_PATH'
extensions = ['.pdf', 'PUT_YOUR_OTHER_EXTENSIONS']

with open('test.txt', 'w') as f:
    for file_name in os.listdir(folder):
        # isfile() needs the full path, not just the bare file name
        full_path = os.path.join(folder, file_name)
        if os.path.isfile(full_path) and not file_name.endswith(tuple(extensions)):
            f.write("%s\n" % file_name)

This writes the name of every file in the folder to the text file, except those whose extension is in the `extensions` list. Just put the extensions you don't need into that list.

Note: This only covers the single folder passed to os.listdir(). To include files from subfolders, walk the directory tree recursively, for example with os.walk().
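
The recursive walk mentioned in the note could look roughly like this with os.walk(). The function name `list_files` and the `skip_extensions` parameter are made up for this sketch, and it returns paths relative to the starting folder, which is one possible choice rather than the only sensible one.

```python
import os


def list_files(root, skip_extensions=(".pdf",)):
    """Return the relative paths of all files under `root`, including
    subfolders, skipping names that end with one of `skip_extensions`.

    Hypothetical helper for illustration; the comparison is case-sensitive.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(tuple(skip_extensions)):
                continue
            full_path = os.path.join(dirpath, name)
            # Store the path relative to the starting folder
            found.append(os.path.relpath(full_path, root))
    return sorted(found)
```

The result can then be written out one name per line, as before, e.g. with `f.write("%s\n" % path)` for each entry.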