
I am trying to get a filtered list of all text and Python files, like below:

from walkdir import filtered_walk, dir_paths, all_paths, file_paths
vdir = raw_input("enter directory: ")

files = file_paths(filtered_walk(vdir, depth=0, included_files=['*.py', '*.txt']))

I want to:

  1. know the total number of files found in the given directory

    I have tried options like Number_of_files = len(files) and for n in files: n = n + 1, but they all fail because "files" is something called a "generator" object. I searched the Python docs but couldn't figure out how to use it.

  2. find a string, e.g. "import sys", in the files found above and store the names of the files containing my search string in a new file called "found.txt"
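To see why len() fails, here is a minimal sketch using a stand-in generator (the hypothetical gen_files() below, used instead of walkdir's file_paths() so it runs without walkdir installed). A generator has no length, but you can count its items by consuming it, or turn it into a list first if you still need the items afterwards.

```python
def gen_files():
    # hypothetical stand-in for file_paths(filtered_walk(...)) from the question
    for name in ["a.py", "b.txt", "c.py"]:
        yield name


files = gen_files()
len_failed = False
try:
    len(files)
except TypeError as exc:
    # generators have no __len__, so len() raises TypeError
    len_failed = True
    print("len() failed: %s" % exc)

# Option 1: count by consuming a generator (it is exhausted afterwards)
n = sum(1 for _ in gen_files())
print("counted %d files" % n)

# Option 2: turn the generator into a list once, then reuse the list
file_list = list(gen_files())
print("%d files: %s" % (len(file_list), file_list))
```

Option 2 is the one to use when you need both the count and the names for the string search.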

x0rcist

4 Answers


I believe this does what you want; if I misunderstood your specification, please let me know after you give it a test. I've hardcoded the directory in searchdir, so you'll have to add the prompt for it yourself.

import os

searchdir = r'C:\blabla'      # hardcoded; prompt for it with raw_input() if needed
searchstring = 'import sys'

def found_in_file(fname, searchstring):
    """Return True as soon as any line of fname contains searchstring."""
    with open(fname) as infp:
        for line in infp:
            if searchstring in line:
                return True
        return False

with open('found.txt', 'w') as outfp:
    count = 0          # number of .txt/.py files seen
    search_count = 0   # number of those files containing searchstring
    for root, dirs, files in os.walk(searchdir):
        for name in files:
            base, ext = os.path.splitext(name)
            if ext in ('.txt', '.py'):
                count += 1
                # only search the files that passed the extension filter
                full_name = os.path.join(root, name)
                if found_in_file(full_name, searchstring):
                    outfp.write(full_name + '\n')
                    search_count += 1

print('total number of files found %d' % count)
print('number of files with search string %d' % search_count)

Opening the files with a with statement also closes them automatically when the block exits, even if an error occurs while reading.
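To check the automatic closing yourself, a small sketch (the temporary file created with tempfile is only for illustration and is not part of the answer) reads the file object's closed attribute inside and after the with block:

```python
import os
import tempfile

# create a throwaway file to read (illustration only)
fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "w") as out:
    out.write("import sys\n")

with open(path) as infp:
    first_line = infp.readline()
    closed_inside = infp.closed   # False: still inside the with block

closed_after = infp.closed        # True: the with statement closed the file
print("closed inside: %s, closed after: %s" % (closed_inside, closed_after))
os.remove(path)
```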

Levon
  • Brilliant. That is what I was looking for. Thanks. Now let me understand it and see how I can add a regex to find all instances of the search string. – x0rcist May 18 '12 at 12:56
  • @x0rcist if you have questions about any part of the code let me know. – Levon May 18 '12 at 13:11
  • @acid_crucifix your solution was non-functional and didn't even run. I suggested a fix to the OP to get your program to at least run, but your code still missed all of the .txt files. I waited for a bit to see if you would fix your code, and then provided my own answer .. there's no "stealing" here and your comment is inappropriate and offensive. – Levon May 18 '12 at 13:29
  • This does not use the library OP is using... but it works :) so no comment. – jadkik94 May 18 '12 at 15:07
  • @jadkik94 :-) You are right, I hadn't even noticed that. Then again, that didn't seem to be a requirement, and OP seems happy with the solution. – Levon May 18 '12 at 15:12
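For the follow-up about using a regex to find every occurrence, one possible approach (a sketch, not part of Levon's answer) is re.finditer, which yields a match object for each non-overlapping hit. The helper find_all_matches() is hypothetical; it accepts an open file or any iterable of lines and reports the line number and column of each match.

```python
import re


def find_all_matches(lines, pattern):
    """Return (line_number, column, matched_text) for every regex match.

    Hypothetical helper: lines can be an open file or any iterable of strings.
    """
    regex = re.compile(pattern)
    hits = []
    for lineno, line in enumerate(lines, 1):
        for m in regex.finditer(line):
            hits.append((lineno, m.start(), m.group()))
    return hits


sample = [
    "import os\n",
    "import sys  # first\n",
    "x = 1; import sys\n",
]
for lineno, col, text in find_all_matches(sample, r"import\s+sys"):
    print("line %d, col %d: %s" % (lineno, col, text))
```

With a real file you would call it as find_all_matches(infp, r"import\s+sys") inside a with open(...) as infp: block.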

A Python generator is a special kind of iterator. It yields one item after another without knowing in advance how many items there are; you can only know that once it has been run to the end.

It is fine, though, to count the items while you iterate over them:

n = 0
for item in files:
    n += 1
    do_something_with(item)  # placeholder for your own per-file work
print("I had %d items." % n)
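One consequence worth knowing: once the counting loop has run, the generator is exhausted, so a second loop over the same object (for example, to search the files) sees nothing. A sketch with a stand-in generator (hypothetical, in place of walkdir's output) shows this, along with the usual fix of storing the items in a list first:

```python
def gen_files():
    # hypothetical stand-in for file_paths(filtered_walk(...))
    for name in ["a.py", "b.txt"]:
        yield name


files = gen_files()
n = 0
for item in files:
    n += 1
second_pass = [item for item in files]   # empty: the generator is used up
print("counted %d, second pass saw %d" % (n, len(second_pass)))

# Fix: keep the items in a list, which can be iterated as often as needed
files = list(gen_files())
print("count %d, still available: %s" % (len(files), files))
```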
glglgl

You can think of a generator (or, more generally, an iterator) as something like a list that gives you one item at a time. (No, it is not a list.) So you cannot count how many items it will give you without going through them all, because you have to take them one by one. (This is just the basic idea; now you should be able to understand the docs, and I'm sure there are lots of questions here about them too.)
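The "one item at a time" idea can be made visible with a small example generator (the hypothetical numbers() below, which records in a list when its body actually runs). Nothing happens when the generator is created; each value is computed only when it is requested.

```python
log = []


def numbers():
    # hypothetical example generator; log records when the body actually runs
    log.append("start")
    for i in range(3):
        log.append("yield %d" % i)
        yield i


gen = numbers()
before = list(log)   # creating the generator runs none of its body
first = next(gen)    # runs the body up to the first yield
rest = list(gen)     # runs the body to the end
print(before)
print(log)
print("%d %s" % (first, rest))
```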

Now, for your case, you used a not-so-wrong approach:

count = 0
for filename in files:
    count += 1

What you were doing wrong was incrementing the loop variable itself (n in for n in files: n = n + 1), but that variable is the filename! Incrementing a filename makes no sense, and it raises an exception (a TypeError) too.

Once you have these filenames, you have to open each individual file, read it, search it for your string, and keep the filename if it matches.

def contains(filename, match):
    # True as soon as any line of the file contains match
    with open(filename, 'r') as f:
        for line in f:
            if match in line:
                return True
    return False

match_files = []
for filename in files:
    if contains(filename, "import sys"):
        match_files.append(filename)

# or, instead of the loop above, as a one-liner:
match_files = [f for f in files if contains(f, "import sys")]

Now, as an example of a generator (don't read this before you read the docs):

def matching(filenames):
    for filename in filenames:
        if contains(filename, "import sys"):
            # feed the names one by one, you are not storing them in a list
            yield filename
# usage:
for f in matching(files):
    do_something_with(f)  # placeholder: handle each match without storing them all in a list
jadkik94

You should try os.walk

import os
dir = raw_input("Enter Dir:")
# endswith accepts a tuple, so extensions of any length are handled
files = [name for path, dirnames, filenames in os.walk(dir)
         for name in filenames if name.endswith((".py", ".txt"))]

nfiles = len(files)
print(nfiles)
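Another option, which reuses the same glob patterns as the question's included_files=['*.py', '*.txt'], is to test each name with fnmatch.fnmatch from the standard library. This avoids hand-written extension checks such as slicing, which break for extensions of different lengths. The helper name find_files() and the throwaway directory tree are illustrative assumptions.

```python
import fnmatch
import os
import shutil
import tempfile


def find_files(top, patterns):
    """Hypothetical helper: walk top recursively, yield paths matching any glob."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                yield os.path.join(dirpath, name)


# build a small throwaway tree to try it on
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for rel in ["a.py", "notes.txt", "image.png", os.path.join("sub", "b.py")]:
    with open(os.path.join(top, rel), "w") as fh:
        fh.write("import sys\n")

matches = sorted(find_files(top, ["*.py", "*.txt"]))
rel_matches = [os.path.relpath(p, top) for p in matches]
print(len(matches))
for rel in rel_matches:
    print(rel)
shutil.rmtree(top)
```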

For searching for a string in a file, see the Stack Overflow question "Search for string in txt file Python".

Combining the two, your code would look something like this:

import os

dir = raw_input("Enter Dir:")
print("Directory %s" % dir)
search_str = "import sys"
count = 0
search_count = 0
with open("found.txt", "w") as write_file:
    for dirpath, dirnames, filenames in os.walk(dir):
        for name in filenames:
            if name.split(".")[-1] in ["py", "txt"]:
                count += 1
                full_path = os.path.join(dirpath, name)
                print(full_path)
                # read the whole file, closing it again right away
                with open(full_path) as f:
                    contents = f.read()
                if search_str in contents:
                    search_count += 1
                    write_file.write(full_path + "\n")  # one name per line

print("Number of files: %d" % count)
print("Number of files containing string: %d" % search_count)
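If the files are large, the standard-library mmap module can search a file without first reading it into one string. A minimal sketch, assuming the search string is ASCII so it can be passed as bytes; note that mmap cannot map an empty file, so that case is handled separately. The helper name file_contains() and the temporary files are illustrative.

```python
import mmap
import os
import shutil
import tempfile


def file_contains(path, needle):
    """Return True if the bytes needle occur in the file at path (sketch)."""
    if os.path.getsize(path) == 0:
        return False  # mmap cannot map an empty file
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return mm.find(needle) != -1
        finally:
            mm.close()


# try it on three throwaway files
d = tempfile.mkdtemp()
hit = os.path.join(d, "hit.py")
miss = os.path.join(d, "miss.txt")
empty = os.path.join(d, "empty.txt")
with open(hit, "w") as fh:
    fh.write("import os\nimport sys\n")
with open(miss, "w") as fh:
    fh.write("nothing here\n")
open(empty, "w").close()

hit_result = file_contains(hit, b"import sys")
miss_result = file_contains(miss, b"import sys")
empty_result = file_contains(empty, b"import sys")
print("%s %s %s" % (hit_result, miss_result, empty_result))
shutil.rmtree(d)
```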
acid_crucifix
  • -1 your `file[-3:]` should only compare/look for extensions of length 3, currently it will *not* find **.txt** (but it would find "txt") and the count will be off. It works fine for ".py" – Levon May 18 '12 at 11:02
  • It gives an error for the last line: print "Number of files containing string" % (search_count) TypeError: not all arguments converted during string formatting – x0rcist May 18 '12 at 11:20
  • @x0rcist that line is missing the format directive. It should read like this: `print "Number of files containing string: %d" %(search_count)` - note the **%d**. (For that matter, I am not sure why the line above uses `%s` rather than `%d` to display a count). The solution as is now will not find/count ".txt" files, you should test to be sure of its behavior. – Levon May 18 '12 at 11:23
  • @Levon you are right, it's not working for .txt files, as you pointed out earlier. I am trying to figure out how to do it for .txt. If someone has a better approach or advice regarding walkdir, please share. Thanks – x0rcist May 18 '12 at 11:43
  • @x0rcist If you unmark the question as answered, you'll get others to come back and look at your question and try to provide answers, otherwise everyone will think this problem has been solved. I've posted a solution too – Levon May 18 '12 at 12:36
  • Okay, this works. Please test it and let me know. Also, I'd appreciate removing the downvotes. – acid_crucifix May 18 '12 at 13:14