Python find last occurrence in a file

Question

I have a file containing different IP addresses.

192.168.11.2
192.1268.11.3
192.168.11.3
192.168.11.3
192.168.11.2
192.168.11.5

This is my code so far. It prints each IP and the number of times it occurs, but how can I find out the line of the last occurrence of each IP? Is there a simple way to do so?

liste = []

dit = {}
file = open('ip.txt','r')

file = file.readlines()

for line in file:
        liste.append(line.strip())

for element in liste:
        if element in dit:
                dit[element] +=1
        else:
                dit[element] = 1

for key,value in dit.items():
        print "%s occurs %s times, last occurence at line"  %(key,value)

Output:

192.1268.11.3 occurs 1 times, last occurence at line
192.168.11.3 occurs 2 times, last occurence at line
192.168.11.2 occurs 2 times, last occurence at line
192.168.11.5 occurs 1 times, last occurence at line

konart · Accepted Answer · 2015-06-05T13:47:01.970

3

Try this:

liste = []

dit = {}
file = open('ip.txt','r')

file = file.readlines()

for line in file:
        liste.append(line.strip())

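# enumerate(liste, 1) yields (line_number, element) pairs, numbering from 1
for i, element in enumerate(liste, 1):
        if element in dit:
                dit[element][0] += 1    # one more occurrence
                dit[element][1] = i     # remember the latest line number
        else:
                dit[element] = [1, i]   # [count, last line] as a mutable list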

for key,value in dit.items():
        print "%s occurs %d times, last occurence at line %d" % (key, value[0], value[1])

edited Jun 05 '15 at 13:47

answered Jun 05 '15 at 13:10

konart


The dictionary values are (immutable) tuples, so you can't add to the first element – jonrsharpe Jun 05 '15 at 13:11
Could you please comment on the edited part? I don't know what enumerate does. – Jun 05 '15 at 13:17
@Iknowpython `enumerate()` returns an enumerate object (imagine a list of tuples, each holding two values: the element's index and the element itself). https://docs.python.org/2/library/functions.html#enumerate – konart Jun 05 '15 at 13:20
What changes would I have to make to, for instance, print the line of the first occurrence? @konart – Jun 05 '15 at 18:42
@Iknowpython Just don't update the stored line number (comment out `dit[element][1] = i`) – konart Jun 06 '15 at 15:19
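
For the first-occurrence question in the comments above, one option is to store both line numbers instead of commenting a line out. This is only a minimal sketch under a few assumptions: it uses `print()` in function form, an in-memory list stands in for the lines of ip.txt, and the names `occurrences`, `stats` and `sample` are hypothetical rather than taken from the answer.

```python
def occurrences(lines):
    """Return {ip: [count, first_line, last_line]} in a single pass."""
    stats = {}
    for i, element in enumerate(lines, 1):  # line numbers start at 1
        if element in stats:
            stats[element][0] += 1   # one more occurrence
            stats[element][2] = i    # overwrite the last line seen
        else:
            stats[element] = [1, i, i]  # first and last are the same line
    return stats


# Sample data copied from the question (stand-in for ip.txt)
sample = [
    "192.168.11.2",
    "192.1268.11.3",
    "192.168.11.3",
    "192.168.11.3",
    "192.168.11.2",
    "192.168.11.5",
]

for ip, (count, first, last) in occurrences(sample).items():
    print("%s occurs %d times, first at line %d, last at line %d"
          % (ip, count, first, last))
```

The first-line slot is written only when the IP is new, which is the same effect as leaving out the update in the accepted answer.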

Hai Vu · Answer 2 · 2015-06-05T13:52:08.813

2

Here is a solution:

from collections import Counter

with open('ip.txt') as input_file:
    lines = input_file.read().splitlines()

    # Find last occurrence, count
    last_line = dict((ip, line_number) for line_number, ip in enumerate(lines, 1))
    ip_count = Counter(lines)

    # Print the stat, sorted by last occurrence
    for ip in sorted(last_line, key=lambda k: last_line[k]):
        print '{} occurs {} times, last occurence at line {}'.format(
            ip, ip_count[ip], last_line[ip])

Discussion

I use the enumerate function to generate line numbers (starting at line 1).
From a sequence of (ip, line_number) pairs, it's easy to build the dictionary last_line, where the key is the IP address and the value is the last line on which it occurs.
To count the number of occurrences, I use the Counter class, which is very simple.
If you want the report sorted by IP address, use sorted(last_line).
This solution has a performance implication: it scans the list of IPs twice, once to calculate last_line and once to calculate ip_count. That means it might not be ideal if the file is large.
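
The dictionary ends up holding the last line for each IP because later (key, value) pairs overwrite earlier ones with the same key. Here is a small sketch of that behaviour and of the two sort orders mentioned above. It assumes the question's sample data as an in-memory list instead of ip.txt, and the names `by_ip` and `by_last_line` are made up for illustration.

```python
from collections import Counter

# Sample data copied from the question (stand-in for ip.txt)
lines = [
    "192.168.11.2",
    "192.1268.11.3",
    "192.168.11.3",
    "192.168.11.3",
    "192.168.11.2",
    "192.168.11.5",
]

# Pairs are consumed in order, so a later line number replaces an earlier one
last_line = dict((ip, n) for n, ip in enumerate(lines, 1))
ip_count = Counter(lines)

# sorted(last_line) orders by IP as a plain string, not numerically by octet
by_ip = sorted(last_line)
# The key= form orders by the line of the last occurrence instead
by_last_line = sorted(last_line, key=lambda k: last_line[k])

print(by_ip)
print(by_last_line)
```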

edited Jun 05 '15 at 13:52

answered Jun 05 '15 at 13:36

Hai Vu


Why would you need to create a list of all the lines with `lines = input_file.read().splitlines()`? – Padraic Cunningham Jun 05 '15 at 16:12
As I stated in my post, the performance problem with my solution is I need to scan the list twice. I created the list so that I don't have to read the file twice. – Hai Vu Jun 05 '15 at 16:22
You could just `file.seek(0)` and avoid reading all the content into memory. – Padraic Cunningham Jun 05 '15 at 16:25
Yes, I can do that. I chose to read it once because (a) laziness, and (b) processing an in-memory list is faster than reading from a file. As stated in my post, this solution is not ideal if the file is large; in that case, I would redesign it to read the file only once and not store data in 3 separate structures: `lines`, `last_line`, and `ip_count`. – Hai Vu Jun 05 '15 at 17:36
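
The `seek(0)` suggestion from the comments could look roughly like the sketch below: two streaming passes over the same file object, with no list of all lines kept in memory. It rests on some assumptions: a temporary file stands in for ip.txt so the example is self-contained, `print()` is used in function form, and the variable names follow Hai Vu's answer.

```python
import tempfile
from collections import Counter

sample = (
    "192.168.11.2\n"
    "192.1268.11.3\n"
    "192.168.11.3\n"
    "192.168.11.3\n"
    "192.168.11.2\n"
    "192.168.11.5\n"
)

# A temporary file stands in for ip.txt (assumption for this sketch)
with tempfile.TemporaryFile(mode="w+") as f:
    f.write(sample)
    f.seek(0)

    # First pass: count each IP while streaming the file line by line
    ip_count = Counter(line.strip() for line in f)

    f.seek(0)  # rewind to the start for the second pass

    # Second pass: later lines overwrite earlier ones, leaving the last line
    last_line = dict((line.strip(), n) for n, line in enumerate(f, 1))

for ip in sorted(last_line, key=last_line.get):
    print("{} occurs {} times, last occurrence at line {}".format(
        ip, ip_count[ip], last_line[ip]))
```

This trades memory for a second read of the file, and the two dictionaries still grow with the number of distinct IPs.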

PetMarion · Answer 3 · 2015-06-05T15:13:21.643

1

# Assumes liste is filled as in the question and dit starts as an empty dict
last_line_occurrence = {}
for element, line_number in zip(liste, range(1, len(liste)+1)):
        if element in dit:
                dit[element] += 1
        else:
                dit[element] = 1
        # Overwritten on every hit, so the final value is the last line
        last_line_occurrence[element] = line_number

for key, value in dit.items():
        print "%s occurs %s times, last occurence at line %s" % (key, value, last_line_occurrence[key])

edited Jun 05 '15 at 15:13

answered Jun 05 '15 at 13:22

PetMarion


1. Misspelled with 1 r: `last_line_occurence`. 2. You need `len(liste)+1`: `range(1, 5)` ==> `[1, 2, 3, 4]` which does not include 5. – Hai Vu Jun 05 '15 at 13:55

Padraic Cunningham · Answer 4 · 2015-06-05T16:29:08.103

This can easily be done in a single pass without reading all the file into memory:

from collections import defaultdict
d = defaultdict(lambda: {"ind":0,"count":0})

with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        d[ip]["ind"] = ind
        d[ip]["count"]  += 1

for ip, v in d.items():
    print("IP {}  appears {} time(s) and the last occurrence is at line {}".format(ip,v["count"],v["ind"]))

Output:

IP 192.1268.11.3  appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3  appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.2  appears 2 time(s) and the last occurrence is at line 5
IP 192.168.11.5  appears 1 time(s) and the last occurrence is at line 6

If you want the IPs in the order they are first encountered, use an OrderedDict:

from collections import OrderedDict
od = OrderedDict()
with open("in.txt") as f:
    for ind, line in enumerate(f,1):
        ip = line.rstrip()
        od.setdefault(ip, {"ind": 0,"count":0})
        od[ip]["ind"] = ind
        od[ip]["count"] += 1

for ip, v in od.items():
    print("IP {}  appears {} time(s) and the last occurrence is at line {}".format(ip,v["count"],v["ind"]))

Output:

IP 192.168.11.2  appears 2 time(s) and the last occurrence is at line 5
IP 192.1268.11.3  appears 1 time(s) and the last occurrence is at line 2
IP 192.168.11.3  appears 2 time(s) and the last occurrence is at line 4
IP 192.168.11.5  appears 1 time(s) and the last occurrence is at line 6

score 0 · Answer 5 · answered Jun 05 '15 at 13:09

You can use another dictionary. In it you store, for each IP (that is, each distinct line), the line number of its latest occurrence, and you overwrite that value every time you find another occurrence. At the end, this dictionary holds the line number of the last occurrence of each IP.

You will also need to increment a counter for each line you read, so that you know the number of the line you are currently on.
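
The approach described above can be sketched as follows. This is only an illustration under assumptions: the function name `count_and_last_line` and the names `counts` and `last_seen` are hypothetical, and an in-memory list stands in for the lines read from ip.txt.

```python
def count_and_last_line(lines):
    """Return (counts, last_seen) dictionaries keyed by IP."""
    counts = {}      # ip -> number of occurrences
    last_seen = {}   # ip -> line number of the latest occurrence
    line_number = 0  # manual counter, incremented for each line read
    for line in lines:
        line_number += 1
        ip = line.strip()
        counts[ip] = counts.get(ip, 0) + 1
        last_seen[ip] = line_number  # overwrite on every occurrence
    return counts, last_seen


# Sample data copied from the question (stand-in for ip.txt)
sample = [
    "192.168.11.2\n",
    "192.1268.11.3\n",
    "192.168.11.3\n",
    "192.168.11.3\n",
    "192.168.11.2\n",
    "192.168.11.5\n",
]

counts, last_seen = count_and_last_line(sample)
for ip in counts:
    print("%s occurs %d times, last occurrence at line %d"
          % (ip, counts[ip], last_seen[ip]))
```

Because the function only iterates over its argument, an open file object could be passed in place of the list.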
