The first part, where you get the total number of words and print the result, is fine.
Where you fall down is here:
words_par = 0
for words_par in lines:
    if words_par.startswith("P1" or "P2" or "P3") & words_par.endswith("P1" or "P2" or "P3"):
        words_par = line.split()
    print len(words_par)
    print words_par.replace('P1', '') #doesn't display it but still counts
else:
    print 'No words'
At first, words_par is a string containing a line from the file. Under a condition which will never be met, it is turned into a list with the
line.split()
expression. This, if the expression
words_par.startswith("P1" or "P2" or "P3") & words_par.endswith("P1" or "P2" or "P3")
were ever to return True, would always split the last line in your file, because line was last assigned in the first part of your program, where you did a full count of the number of words in the file. That should really be
words_par.split()
Also
words_par.startswith("P1" or "P2" or "P3")
will always be
words_par.startswith("P1")
since
"P1" or "P2" or "P3"
always evaluates to the first operand that is true, which in this case is the first string, because any non-empty string counts as true. Read http://docs.python.org/reference/expressions.html if you want to know more.
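If you want to see this for yourself, here is a small sketch. The sample line is made up for illustration and is not from your file. It also shows that startswith accepts a tuple of prefixes, which is probably closer to what you intended.

```python
# Hypothetical sample line, not taken from your data file.
sample = "P2 some words here"

# `or` returns its first true operand, so this is simply "P1".
prefix = "P1" or "P2" or "P3"

# Same as sample.startswith("P1"), so "P2" and "P3" are never checked.
naive = sample.startswith("P1" or "P2" or "P3")

# Passing a tuple makes startswith try each prefix in turn.
fixed = sample.startswith(("P1", "P2", "P3"))

print(prefix)
print(naive)
print(fixed)
```

The same tuple form works for endswith.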
While we are at it, unless you actually want a bitwise operation, avoid doing
something & something
instead do
something and something
The first will evaluate both expressions no matter what the result of the first is, whereas the second will only evaluate the second expression if the first is True. If you do this, your code will run a little more efficiently.
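A minimal sketch of the difference, using a made-up helper called check that records each time it is called:

```python
calls = []

def check(name, result):
    # Hypothetical helper: note that we were called, then return result.
    calls.append(name)
    return result

# `&` evaluates both sides even though the left side is already False.
bitwise_result = check("left", False) & check("right", True)
bitwise_calls = list(calls)

del calls[:]

# `and` stops as soon as the left side turns out to be False.
logical_result = check("left", False) and check("right", True)
logical_calls = list(calls)

print(bitwise_calls)
print(logical_calls)
```

Both results are False, but only the `&` version calls check a second time.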
The
print len(words_par)
on the next line is always going to count the number of characters in the line, since the if statement is always going to evaluate to False and words_par never gets split into a list of words.
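To make the difference concrete, here is a tiny sketch with a made-up line. len on a string counts characters, and len on the list returned by split counts words.

```python
# Hypothetical line, standing in for one line of your file.
line = "P1 the quick brown fox"

char_count = len(line)          # length of the string: characters
word_count = len(line.split())  # length of the list: words

print(char_count)
print(word_count)
```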
Also, the else clause on the for loop will always be executed, whether the sequence is empty or not, because the loop never hits a break. Have a look at http://docs.python.org/reference/compound_stmts.html#the-for-statement for more information.
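Here is a rough sketch of how the else clause on a for loop behaves. The function loop_outcome and its stop_at parameter are invented purely for this illustration.

```python
def loop_outcome(items, stop_at=None):
    # Hypothetical helper showing when a for loop's else clause runs.
    for item in items:
        if item == stop_at:
            break
    else:
        # Reached only when the loop finished without hitting break.
        return "else ran"
    return "else skipped"

print(loop_outcome([]))
print(loop_outcome([1, 2, 3]))
print(loop_outcome([1, 2, 3], stop_at=2))
```

The else clause runs for the empty list and for the full pass over the list, and is skipped only when break is hit.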
I wrote an example version of what I think you are after. I tried to keep it simple and avoided things like list comprehensions, since you say you are just starting to learn, so it is not optimal, but hopefully it is clear. Also note that I added no comments, so feel free to hassle me to explain things for you.
words = None
with open('data.txt') as f:
    words = f.read().split()
total_words = len(words)
print 'Total words:', total_words
in_para = False
para_count = 0
para_type = None
paragraph = list()
for word in words:
    if ('P1' in word or
        'P2' in word or
        'P3' in word ):
        if in_para == False:
            in_para = True
            para_type = word
        else:
            print 'Words in paragraph', para_type, ':', para_count
            print ' '.join(paragraph)
            para_count = 0
            del paragraph[:]
            para_type = word
    else:
        paragraph.append(word)
        para_count += 1
else:
    if in_para == True:
        print 'Words in last paragraph', para_type, ':', para_count
        print ' '.join(paragraph)
    else:
        print 'No words'
EDIT:
I actually just noticed some redundant code in the example. The variable para_count is not needed, since the words are being appended to the paragraph variable. So instead of
print 'Words in paragraph', para_type, ':', para_count
You could just do
print 'Words in paragraph', para_type, ':', len(paragraph)
One less variable to keep track of. Here is the corrected snippet.
in_para = False
para_type = None
paragraph = list()
for word in words:
    if ('P1' in word or
        'P2' in word or
        'P3' in word ):
        if in_para == False:
            in_para = True
            para_type = word
        else:
            print 'Words in paragraph', para_type, ':', len(paragraph)
            print ' '.join(paragraph)
            del paragraph[:]
            para_type = word
    else:
        paragraph.append(word)
else:
    if in_para == True:
        print 'Words in last paragraph', para_type, ':', len(paragraph)
        print ' '.join(paragraph)
    else:
        print 'No words'
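If you would rather try the logic without a file, the same idea can be sketched as a function that returns the paragraphs instead of printing them. The name split_paragraphs and the sample text are made up for illustration. It assumes, as above, that any word containing P1, P2 or P3 is a paragraph marker. Unlike the snippet above, it ignores any words that come before the first marker.

```python
def split_paragraphs(words):
    # Hypothetical helper: returns a list of (marker, words) pairs.
    paragraphs = []
    para_type = None
    paragraph = []
    for word in words:
        if 'P1' in word or 'P2' in word or 'P3' in word:
            # A marker closes the current paragraph, if there is one.
            if para_type is not None:
                paragraphs.append((para_type, paragraph))
            para_type = word
            paragraph = []
        elif para_type is not None:
            paragraph.append(word)
    # Store the last paragraph once the words run out.
    if para_type is not None:
        paragraphs.append((para_type, paragraph))
    return paragraphs

# Made-up sample text standing in for the contents of data.txt.
sample_words = "P1 one two P2 three four five".split()
result = split_paragraphs(sample_words)
for marker, para in result:
    print(marker + ": " + str(len(para)) + " words")
```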