-1

The prompt:

Write a program that categorizes each mail message by which day of the week the commit was done. To do this look for lines that start with "From", then look for the third word and keep a running count of each of the days of the week. At the end of the program print out the contents of your dictionary (order does not matter).

The code in Python 3:

fname = input('enter file name:')
fhand = None
days = dict()

try:
    fhand = open(fname)
except:
    print(fname, 'is not a file thank you have a nice day and stop trying to ruin my program\n')
    exit()

for line in fhand:
    sline = line.split()
    if line.startswith('From'):
        print (sline)
        day = sline[2]
        if day not in days:
            days[day] = 1
        else:
            days[day] += 1
print(days)

The problem:

['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
**['From:', 'stephen.marquard@uct.ac.za']**
Traceback (most recent call last):
  File "C:\Users\s_kestlert\Desktop\Programming\python\chap9.py", line 13, in <module>
    day = sline[2]
IndexError: list index out of range

The file: http://www.py4inf.com/code/mbox-short.txt

Why does the .split cut the line down to only [0] and [1]?

How can I circumvent this?

aaron
Thomas K.
  • Can you provide sample file content ? – Sandeep Lade Nov 03 '17 at 16:41
  • Are you sure this is your code as executed? Because the `print` seems to indicate `sline` has 7 elements and you index on the next line. – ryachza Nov 03 '17 at 16:43
  • It looks like you have a line consisting solely of `From: stephen.marquard@uct.ac.za`. When split, the resulting list contains two elements, so the valid indices are 0 and 1. You can't use index 2 on a list with only two elements. – Tom Karzes Nov 03 '17 at 16:44
  • "*Acting up*" is not really a technical term. – Stefan Falk Nov 03 '17 at 16:51

3 Answers

3

Looking at the file you linked, I think you need to change your `line.startswith('From')` to `line.startswith('From ')` (note the trailing space). The `From: ...` header lines are being matched (and only have 2 words), when I think you only want the `From ...` lines containing more information.
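A minimal sketch of that change, assuming the counting logic is moved into a function (the name `count_days` and the two sample lines are illustrative, not part of the original program):

```python
def count_days(lines):
    """Count the third word of every line that starts with 'From ' (trailing space)."""
    days = {}
    for line in lines:
        # 'From ' matches the longer 'From <address> <day> ...' line
        # but not the 'From: <address>' header, which has a colon instead.
        if line.startswith('From '):
            day = line.split()[2]
            days[day] = days.get(day, 0) + 1
    return days


# Hypothetical sample input modelled on the two lines shown in the question.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
]
print(count_days(sample))  # {'Sat': 1}
```

This still assumes every `From ` line has at least three words; a shorter one would raise the same `IndexError`.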

ryachza
  • This was my last resort and 6 other people have looked at this. This is a giant relief thank you so much! – Thomas K. Nov 03 '17 at 16:53
  • @ThomasK. No problem. Keep in mind that your approach is still rather fragile (what if a line of a message body happens to start with `From `) but may be fine for your particular case. For something more robust, you might try separating the contents into the component messages (looks like 3 consecutive newlines followed by `From`, but that may not be sufficient for all inputs) and then taking the first line of each block. Also, not sure if anything in those messages is sensitive, but you may want to delete the file and remove the link. – ryachza Nov 03 '17 at 17:03
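One way to sketch the "separate the messages first" idea from the comment above is to count a `From ` line only when it opens a message. This assumes each message is preceded by at least one blank line (or starts the file), which is a heuristic about the input rather than a guarantee; `count_days_by_separator` is a hypothetical name.

```python
def count_days_by_separator(lines):
    """Count days only on 'From ' lines that follow a blank line or start the file."""
    days = {}
    previous_blank = True  # treat the start of the file as a message boundary
    for line in lines:
        if previous_blank and line.startswith('From '):
            words = line.split()
            if len(words) >= 3:  # guard against short lines
                days[words[2]] = days.get(words[2], 0) + 1
        # Remember whether this line was blank for the next iteration.
        previous_blank = line.strip() == ''
    return days


# Hypothetical input: the 'From here on ...' body line is not preceded by a
# blank line, so it is ignored.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    '\n',
    'Body text\n',
    'From here on it is body text\n',
    '\n',
    'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
]
print(count_days_by_separator(sample))  # {'Sat': 1, 'Fri': 1}
```

A body paragraph that begins with `From ` right after a blank line would still be miscounted, so this narrows the problem rather than removing it.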
2

Your program is crashing on the line

From: stephen.marquard@uct.ac.za

that appears later (line 38), not on the first line in the file.

Check to make sure sline has enough elements before you try to grab the day field from it.
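A sketch of such a check, assuming the loop is wrapped in a function (the name `count_days_guarded` and the sample lines are illustrative only):

```python
def count_days_guarded(lines):
    """Count the third word of 'From' lines, skipping lines with fewer than three words."""
    days = {}
    for line in lines:
        words = line.split()
        # Skip lines that do not start with 'From' or are too short to hold a day.
        if not line.startswith('From') or len(words) < 3:
            continue
        days[words[2]] = days.get(words[2], 0) + 1
    return days


# Hypothetical sample input modelled on the two lines shown in the question.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
]
print(count_days_guarded(sample))  # {'Sat': 1}
```

The length check only prevents the crash. A `From:` header that happened to contain three or more words would still be counted, so it does not by itself tell the two kinds of line apart.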

Bill the Lizard
0

For file file.txt

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
From: stephen.marquard@uct.ac.za

Your program outputs

enter file name:file.txt
['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']
['From:', 'stephen.marquard@uct.ac.za']
Traceback (most recent call last):
  File "test.py", line 13, in <module>
    day=sline[2]
IndexError: list index out of range

This is because there is no third word in the second line, so `sline[2]` does not exist. You need to add error handling to your program.
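One possible form of that error handling is to catch the `IndexError` and move on. This is a sketch under the assumption that short lines should simply be skipped; `count_days_safe` and the skipped-line counter are hypothetical additions.

```python
def count_days_safe(lines):
    """Count the third word of 'From' lines; return (counts, number of lines skipped)."""
    days = {}
    skipped = 0
    for line in lines:
        if not line.startswith('From'):
            continue
        try:
            day = line.split()[2]
        except IndexError:
            # The line has fewer than three words, e.g. 'From: <address>'.
            skipped += 1
            continue
        days[day] = days.get(day, 0) + 1
    return days, skipped


# The two lines of file.txt from this answer.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
]
print(count_days_safe(sample))  # ({'Sat': 1}, 1)
```

Catching only `IndexError`, rather than using a bare `except`, keeps unrelated errors visible.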

nilo
  • I recognize that my program reduces it to a list with only 2 words. What I don't understand is why it goes from ['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008'] to ['From:', 'stephen.marquard@uct.ac.za'] during a print. I do not want it to do this and am looking for the reason it does. – Thomas K. Nov 03 '17 at 16:50
  • You've got to have a second line in your test file. I am running your program. You see that it does not crash on the first line. – nilo Nov 03 '17 at 16:52
  • I now see that you have added your input file. As I mentioned in my answer, it is not the first line that makes your program crash, it is the next line that starts with From. – nilo Nov 03 '17 at 16:59