Python script to count the number of lines in all files in a directory

Question

So I'm new to Python and I'm trying to write a script that iterates through all .txt files in a directory, counts the number of lines in each one (excluding lines that are blank or commented out), and writes the final output to a CSV file. The final output should look something like this:

agprices, avi, adp
132, 5, 8

I'm having trouble with the syntax for saving each count as a value in the dictionary. Here is my code:

#!/usr/bin/env python

import csv
import copy
import os
import sys

#get current working dir, set count, and select file delimiter
d = os.getcwd()
count = 0
ext = '.txt'

#parses through files and saves to a dict
series_dict = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext] 
 #selects all files with .txt extension
for f in txt_files:
    with open(os.path.join(d,f)) as file_obj:
        series_dict[f] = file_obj.read()

            if line.strip():                #Exclude blank lines
                continue
            else if line.startswith("#"):   #Exclude commented lines
                continue
            else
                count +=1
                #Need to save count as val in dict here

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
w = csv.DictWriter(f, series_dict.keys())
w.writeheader()
w.writerow(series_dict)

Here's the edited version:

#!/usr/bin/env python

import csv
import copy
import os
import sys
import glob

#get current working dir, set count, and select file delimiter
os.chdir('/Users/Briana/Documents/Misc./PythonTest')

#parses through files and saves to a dict
series = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        series[fn] = (1 for line in f if line.strip() and not line.startswith('#')) 

print series

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
    w = csv.DictWriter(f, series.keys())
    sum(names.values())

I'm getting an indentation error on the second-to-last line and I'm not sure why. I'm also not sure I've written the syntax of the last part correctly. Again, I'm simply trying to return a dictionary with the names of the files and the number of lines in each, like {a: 132, b: 245, c: 13}.
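For reference, the filter described above (count a line only if it is neither blank nor starts with `#`) can be written as a small predicate. This is a minimal Python 3 sketch; the helper name `is_countable` is hypothetical and does not appear in the question. Note that the blank-line test needs `not line.strip()`, because a non-empty `line.strip()` means the line has content.

```python
def is_countable(line):
    """Return True for lines that are neither blank nor comments.

    Hypothetical helper: a line counts only if it has non-whitespace
    content and does not start with '#'.
    """
    if not line.strip():        # blank (or whitespace-only) line: skip it
        return False
    if line.startswith('#'):    # comment line: skip it
        return False
    return True


lines = ["x = 1\n", "\n", "# a comment\n", "   \n", "print(x)\n"]
print(sum(1 for line in lines if is_countable(line)))  # → 2
```

Like the question's `startswith("#")` check, this only treats lines whose first character is `#` as comments.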

`if line.strip(): continue` will skip lines that _aren't_ blank. Isn't that the opposite of what you want? — TigerhawkT3, Jul 24 '15 at 17:18
Don't use `os.chdir`. The last line raises a `NameError`, and is not of much use either. – Daniel, May 31 '16 at 09:08
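Following that advice, one way to avoid `os.chdir` is to put the directory into the glob pattern instead. This is a minimal Python 3 sketch; `count_lines_in_dir` is a hypothetical helper, and `glob.escape` (Python 3.4+) is used so that a directory name containing characters such as `[` is not treated as part of the pattern.

```python
import glob
import os


def count_lines_in_dir(directory, pattern='*.txt'):
    """Map each matching file name to its count of non-blank, non-comment lines.

    Hypothetical helper: the directory is joined into the glob pattern
    (escaped so its own characters are matched literally), so the process's
    working directory is never changed.
    """
    counts = {}
    for path in glob.glob(os.path.join(glob.escape(directory), pattern)):
        with open(path) as f:
            counts[os.path.basename(path)] = sum(
                1 for line in f if line.strip() and not line.startswith('#')
            )
    return counts
```

Call it with the folder to scan, for instance the `d` from the question: `count_lines_in_dir(d)`. Because the working directory stays the same, relative paths elsewhere in the script (such as `'seriescount.csv'`) keep their meaning.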

dawg · Accepted Answer · score 8 · 2015-07-24T17:29:47.103

You can try something along these lines:

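import glob
import os

os.chdir(ur_directory)  # ur_directory: path to the folder holding the .txt files
names = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        # count lines that are neither blank nor start with '#'
        names[fn] = sum(1 for line in f if line.strip() and not line.startswith('#'))

print(names)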

That will print a dictionary similar to:

{'test_text.txt': 20, 'f1.txt': 3, 'lines.txt': 101, 'foo.txt': 6, 'dat.txt': 6, 'hello.txt': 1, 'f2.txt': 4, 'neglob.txt': 8, 'bar.txt': 6, 'test_reg.txt': 6, 'mission_sp.txt': 71, 'test_nums.txt': 8, 'test.txt': 7, '2591.txt': 8303}

You can then pass that dict to csv.DictWriter.
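A minimal Python 3 sketch of that step follows. The function name `write_counts_csv` and its `out_path` parameter are hypothetical; the file is opened in text mode with `newline=''`, as the Python 3 `csv` documentation recommends, whereas the `'wb'` mode used elsewhere in this thread is the Python 2 idiom.

```python
import csv


def write_counts_csv(counts, out_path='seriescount.csv'):
    """Write {filename: count} as a two-row CSV: a header of names, then counts.

    Hypothetical helper (Python 3): the dict's keys become the header row
    and its values become the single data row.
    """
    with open(out_path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(counts))
        writer.writeheader()        # first row: the file names
        writer.writerow(counts)     # second row: the line counts
```

For example, `write_counts_csv(names)` writes the file names on the first row and their counts on the second. If you want the header to read `agprices, avi, adp` as in the question, strip the extensions first, e.g. `{os.path.splitext(k)[0]: v for k, v in names.items()}`.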

If you want the sum of those, just do:

sum(names.values())

edited Jul 24 '15 at 17:29

answered Jul 24 '15 at 17:23


So the last section should then look like this? `with open('seriescount.csv', 'wb') as f: w = csv.DictWriter(f, series.keys()) sum(names.values())` – bvecc Jul 24 '15 at 17:52
Almost. It would be `with open('seriescount.csv', 'wb') as f: w = csv.DictWriter(f, names.keys()); w.writeheader(); w.writerow(names)`. Read [this example](https://docs.python.org/2/library/csv.html#csv.DictWriter). – dawg Jul 24 '15 at 20:41

score 0 · Answer 2 · answered Jul 24 '15 at 17:17

I think you should make two changes to your script:

Use `glob.glob()` to get the list of files matching your desired suffix.
Use `for line in file_obj` to iterate through the lines.

Other problem:

The indentation is wrong on your last few lines
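Applying both suggestions to the question's original loop, while keeping its explicit `if`/`elif` style, might look like this Python 3 sketch. `count_per_file` is a hypothetical name. The counter is reset for each file and stored in the dictionary once that file has been read, which is the step the question's `#Need to save count as val in dict here` comment was asking about.

```python
import glob


def count_per_file(pattern='*.txt'):
    """Return {filename: number of non-blank, non-comment lines}.

    Hypothetical helper applying both suggestions: glob.glob() selects the
    files and `for line in file_obj` walks each file line by line.
    Keys are the paths exactly as glob.glob() returns them.
    """
    series_dict = {}
    for fn in glob.glob(pattern):
        count = 0                           # reset the counter for every file
        with open(fn) as file_obj:
            for line in file_obj:
                if not line.strip():        # skip blank lines
                    continue
                elif line.startswith('#'):  # skip commented lines
                    continue
                else:
                    count += 1
        series_dict[fn] = count             # save this file's count as the dict value
    return series_dict
```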

Borealid

I know the indentation is wrong, but I'm still getting an indentation error when I try to indent the lines underneath the with open statement and I'm not sure why. How should I properly indent it? – bvecc Jul 24 '15 at 18:07

score 0 · Answer 3 · answered Jul 24 '15 at 17:22

You could count the lines in each file with this one-liner:

line_nums = sum(1 for line in open(f) if line.strip() and line[0] != '#')

That would shorten your code segment to:

for f in txt_files:
    count += sum(1 for line in open(os.path.join(d,f)) 
                 if line[0] != '#' and line.strip())
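The one-liner leaves closing the file to the garbage collector, and the loop adds every file into a single running total rather than keeping per-file counts. A minimal Python 3 variant that closes each file with `with` could look like this; `count_code_lines` and `total_lines` are hypothetical names. It also checks the stripped line for `#`, so comments indented with spaces or tabs are excluded as well. That is an assumption about what "commented out" should mean; the snippets above only check the first character.

```python
import os


def count_code_lines(path):
    """Count lines that are neither blank nor comments, closing the file promptly.

    The '#' test is applied to the stripped line, so an indented comment
    such as '    # note' is excluded too (an assumption about the desired rule).
    """
    with open(path) as f:
        return sum(1 for line in f
                   if line.strip() and not line.strip().startswith('#'))


def total_lines(directory, names):
    """Sum count_code_lines over the given file names inside directory."""
    return sum(count_code_lines(os.path.join(directory, n)) for n in names)
```

With the question's variables this would be `count = total_lines(d, txt_files)`.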

score 0 · Answer 4 · answered Jul 24 '15 at 17:31

It looks like you want to use a dictionary to keep track of the counts. You could create one at the top like this: `counts = {}`

Then (once you fix your tests) you can update it for each non-comment line:

import os

d = os.getcwd()   # directory to scan, as in the question
ext = '.txt'      # file extension to select, as in the question

counts = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext]  # selects all files with .txt extension
for f in txt_files:
    counts[f] = 0  # create an entry in the dictionary to keep track of one file's lines
    with open(os.path.join(d, f)) as file_obj:
        for line in file_obj:
            if line.startswith("#"):  # exclude commented lines
                continue
            elif line.strip():        # count only non-blank lines
                counts[f] += 1
