
I'm new to Python and I'm trying to write a script that iterates through all .txt files in a directory, counts the number of lines in each one (with the exception of lines that are blank or commented out), and writes the final output to a CSV. The final output should look something like this:

agprices, avi, adp
132, 5, 8 

I'm having trouble with the syntax to save each count as the value of the dictionary. Here is my code below:

#!/usr/bin/env python

import csv
import copy
import os
import sys

#get current working dir, set count, and select file delimiter
d = os.getcwd()
count = 0
ext = '.txt'

#parses through files and saves to a dict
series_dict = {}
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext] 
 #selects all files with .txt extension
for f in txt_files:
    with open(os.path.join(d,f)) as file_obj:
        series_dict[f] = file_obj.read()

            if line.strip():                #Exclude blank lines
                continue
            else if line.startswith("#"):   #Exclude commented lines
                continue
            else
                count +=1
                #Need to save count as val in dict here

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
w = csv.DictWriter(f, series_dict.keys())
w.writeheader()
w.writerow(series_dict)

So here's the edit:

#!/usr/bin/env python

import csv
import copy
import os
import sys
import glob

#get current working dir, set count, and select file delimiter
os.chdir('/Users/Briana/Documents/Misc./PythonTest')

#parses through files and saves to a dict
series = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        series[fn] = (1 for line in f if line.strip() and not line.startswith('#')) 

print series

#save the dictionary with key/val pairs to a csv
with open('seriescount.csv', 'wb') as f: 
    w = csv.DictWriter(f, series.keys())
    sum(names.values())

I'm getting an indentation error on the second-to-last line and I'm not sure why. I'm also not sure I have the syntax right in the last part. Again, I'm simply trying to build a dictionary that maps file names to the number of lines in each file, like {'a': 132, 'b': 245, 'c': 13}.

bvecc

4 Answers


You can try something along these lines:

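import glob
import os

os.chdir(ur_directory)  # ur_directory: path to the folder holding your .txt files
names = {}
for fn in glob.glob('*.txt'):
    with open(fn) as f:
        # count lines that are neither blank nor start with '#'
        names[fn] = sum(1 for line in f if line.strip() and not line.startswith('#'))

print(names)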

That will print a dictionary similar to:

{'test_text.txt': 20, 'f1.txt': 3, 'lines.txt': 101, 'foo.txt': 6, 'dat.txt': 6, 'hello.txt': 1, 'f2.txt': 4, 'neglob.txt': 8, 'bar.txt': 6, 'test_reg.txt': 6, 'mission_sp.txt': 71, 'test_nums.txt': 8, 'test.txt': 7, '2591.txt': 8303} 

And you can use that Python dict in csv.DictWriter.
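A minimal sketch of that last step, written for Python 3. The thread uses Python 2, where the csv docs say to open the output file in 'wb' mode; in Python 3 the file is opened in text mode with newline='' instead. The helper name write_counts_csv and the sample counts are hypothetical.

```python
import csv


def write_counts_csv(counts, path):
    """Write a {filename: line_count} dict (hypothetical helper) as two CSV rows:
    a header row of file names, then one row with the matching counts."""
    # Python 3: text mode with newline='' (the Python 2 equivalent was 'wb')
    with open(path, 'w', newline='') as f:
        w = csv.DictWriter(f, fieldnames=list(counts))
        w.writeheader()     # first row: the dict keys (file names)
        w.writerow(counts)  # second row: the counts, matched to the headers by key


if __name__ == '__main__':
    names = {'agprices.txt': 132, 'avi.txt': 5, 'adp.txt': 8}  # hypothetical counts
    write_counts_csv(names, 'seriescount.csv')
```

On Python 3.7+ the columns follow the dict's insertion order, so the sample above writes agprices.txt,avi.txt,adp.txt as the header and 132,5,8 as the data row.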

If you want the sum of those, just do:

sum(names.values())
dawg
  • So the last section then should look like this? 'with open('seriescount.csv', 'wb') as f: w = csv.DictWriter(f, series.keys()) sum(names.values())' – bvecc Jul 24 '15 at 17:52
  • Almost. It would be `with open('seriescount.csv', 'wb') as f: w = csv.DictWriter(f, names.keys()); w.writeheader(); w.writerow(names)`. Read [this example](https://docs.python.org/2/library/csv.html#csv.DictWriter) – dawg Jul 24 '15 at 20:41

I think you should make two changes to your script:

  • Use glob.glob() to get the list of files matching your desired suffix
  • Use for line in file_obj to iterate through the lines

Other problem:

  • The indentation is wrong on your last few lines
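A sketch combining both suggestions (glob.glob() for the file list, iterating over the file object line by line), assuming the counts should be keyed by bare file name. count_lines_per_file is a hypothetical helper name; it takes the directory as a parameter instead of calling os.chdir().

```python
import glob
import os


def count_lines_per_file(directory):
    """Return {filename: count} for every .txt file in `directory`, counting
    only lines that are neither blank nor start with '#'."""
    counts = {}
    # glob.escape() keeps special characters in the directory name literal
    pattern = os.path.join(glob.escape(directory), '*.txt')
    for path in glob.glob(pattern):
        count = 0
        with open(path) as file_obj:
            for line in file_obj:          # one line at a time, no read()
                if not line.strip():       # skip blank or whitespace-only lines
                    continue
                if line.startswith('#'):   # skip commented-out lines
                    continue
                count += 1
        counts[os.path.basename(path)] = count  # key by bare file name
    return counts


if __name__ == '__main__':
    print(count_lines_per_file('.'))
```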
Borealid
  • I know the indentation is wrong, but I'm still getting an indentation error when I try to indent the lines underneath the with open statement and I'm not sure why. How should I properly indent it? – bvecc Jul 24 '15 at 18:07
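The original whitespace isn't visible in the post, so this is only a guess: a frequent cause of an indentation error that persists after re-indenting is mixing tabs and spaces in the same block. Python 3 rejects that with TabError, a subclass of IndentationError. The sketch below only compiles two small source strings (nothing is executed or opened); indentation_problem is a hypothetical helper.

```python
# One line of the block is indented with four spaces, the next with a tab.
source = (
    "with open('seriescount.csv', 'w') as f:\n"
    "    w = 1\n"
    "\tw = 2\n"
)


def indentation_problem(src):
    """Return the name of the indentation exception compile() raises, or None."""
    try:
        compile(src, '<demo>', 'exec')
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None


print(indentation_problem(source))                          # prints TabError
print(indentation_problem(source.replace('\t', '    ')))    # prints None
```

Setting the editor to insert spaces for indentation usually avoids this.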

You could count the lines in each file with this one-liner:

line_nums = sum(1 for line in open(f) if line.strip() and line[0] != '#')

That would shorten your loop to:

for f in txt_files:
    count += sum(1 for line in open(os.path.join(d,f)) 
                 if line[0] != '#' and line.strip())
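The one-liner doesn't explicitly close the files it opens, and line[0] != '#' only skips comments that start in the first column. A hedged variation, assuming indented comments should also be skipped: strip the line before checking for '#', and open the file in a with block. is_counted and count_file are hypothetical names.

```python
def is_counted(line):
    """True if `line` is neither blank nor a comment.

    Assumption: a line whose first non-whitespace character is '#'
    counts as commented out, even if it is indented.
    """
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith('#')


def count_file(path):
    # the with block closes the file as soon as counting finishes
    with open(path) as f:
        return sum(1 for line in f if is_counted(line))


sample = ['data\n', '\n', '   \n', '# comment\n',
          '    # indented comment\n', 'more # trailing\n']
print(sum(1 for line in sample if is_counted(line)))                 # prints 2
print(sum(1 for line in sample if line.strip() and line[0] != '#'))  # prints 3
```

The second count is 3 because the original test treats '    # indented comment' as a real line.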
Syntactic Fructose

It looks like you want to use a dictionary to keep track of the counts. You could create one at the top like this: counts = {}

Then (once you fix your tests) you can update it for each non-comment line:

import os

d = os.getcwd()
ext = '.txt'
counts = {}  # filename -> number of non-blank, non-comment lines
# selects all files with .txt extension
txt_files = [i for i in os.listdir(d) if os.path.splitext(i)[1] == ext]
for f in txt_files:
    counts[f] = 0  # create an entry in the dictionary to keep track of one file's lines
    with open(os.path.join(d, f)) as file_obj:
        for line in file_obj:  # iterate over the lines instead of read()-ing the whole file
            if line.startswith("#"):  # exclude commented lines
                continue
            elif line.strip():  # count only non-blank lines
                counts[f] += 1
ate50eggs