Given a list of urls, print out the top 3 frequent filenames

Question

Given a list of URLs, print out the three most frequent filenames.

url = [
        "http://www.google.com/a.txt",
        "http://www.google.com.tw/a.txt",
        "http://www.google.com/download/c.jpg",
        "http://www.google.co.jp/a.txt",
        "http://www.google.com/b.txt",
        "http://facebook.com/movie/b.txt",
        "http://yahoo.com/123/000/c.jpg",
        "http://gliacloud.com/haha.png",
    ]

The program should print out

a.txt 3  
b.txt 2  
c.jpg 2

Does this answer your question? [Python Lists - Finding Number of Times a String Occurs](https://stackoverflow.com/questions/11800755/python-lists-finding-number-of-times-a-string-occurs) — Brydenr, Dec 04 '19 at 15:58
You should post what you have tried. We are here to help, not to solve it for you. — hurnhu, Dec 04 '19 at 16:00

A l w a y s S u n n y · Accepted Answer · 2019-12-04T16:31:00.627

How about using collections.Counter, and getting the top 3 with counter.most_common(3)?

import collections

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]

# The filename is everything after the last '/' in each URL.
filenames = [u.split('/')[-1] for u in url]
counter = collections.Counter(filenames)

# most_common(3) returns the three (filename, count) pairs with the highest counts.
for name, count in counter.most_common(3):
    print('{} {}'.format(name, count))

WORKING DEMO: https://rextester.com/EGJX25593
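One detail worth noting: most_common() keeps tied entries in first-seen order, so c.jpg is listed before b.txt, while the expected output in the question lists b.txt first. If that order matters, a minimal sketch (the helper name top_filenames is made up for illustration) could sort by descending count and then by filename:

```python
from collections import Counter

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]


def top_filenames(urls, n=3):
    """Return the n most frequent filenames as (name, count) pairs."""
    counts = Counter(u.split('/')[-1] for u in urls)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


for name, count in top_filenames(url):
    print(name, count)
```

With the list from the question this prints a.txt 3, b.txt 2, c.jpg 2, one pair per line.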

Fourier · Answer 2 · 2019-12-04T16:14:59.393

How about using re together with collections, which provides Counter and its most_common method, to extract your top n hits?

import re
from collections import Counter

# Match a trailing "name.ext" at the end of each URL (url is the list from the question).
pattern = re.compile(r"\w+\.\w+$")
print(Counter(pattern.findall(u)[0] for u in url).most_common(3))

Output:

[('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]
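A caveat with indexing the findall result: if a URL has no trailing name.ext (for example, it ends with a slash), findall returns an empty list and [0] raises IndexError. A possible variation, assuming such URLs should simply be skipped, uses pattern.search and ignores non-matches. The names count_filenames and sample are made up for this sketch:

```python
import re
from collections import Counter

pattern = re.compile(r"\w+\.\w+$")


def count_filenames(urls):
    """Count trailing 'name.ext' matches, skipping URLs without one."""
    matches = (pattern.search(u) for u in urls)
    # search() returns None when nothing matches, so filter those out.
    return Counter(m.group() for m in matches if m is not None)


sample = [
    "http://www.google.com/a.txt",
    "http://www.google.com/",  # no filename: skipped instead of raising
    "http://facebook.com/movie/a.txt",
]
print(count_filenames(sample).most_common(3))
# [('a.txt', 2)]
```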

edited Dec 04 '19 at 16:14

answered Dec 04 '19 at 15:53

PySaad · Answer 3 · 2019-12-04T16:23:06.393

You can use Counter from collections:

from collections import Counter

# Keep only the part after the last '/' in each URL.
res = [a.rsplit('/', 1)[-1] for a in url]
print(Counter(res))

Output:

Counter({'a.txt': 3, 'c.jpg': 2, 'b.txt': 2, 'haha.png': 1})

Link:

https://docs.python.org/3.1/library/collections.html

UPDATE:

The OP asked for the top 3:

import collections

kk = [a.rsplit('/', 1)[-1] for a in url]
print(collections.Counter(kk).most_common(3))
# [('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]
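Splitting on '/' works for the URLs in the question, but it would keep a query string or fragment as part of the filename. If the real data can contain those, one possible sketch is to parse the URL first with the standard library. The helper filename_from_url and the example.com URLs below are made up for illustration:

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def filename_from_url(u):
    """Return the last path component, ignoring any query string or fragment."""
    return posixpath.basename(urlsplit(u).path)


sample = [
    "http://example.com/a.txt?download=1",
    "http://example.com/files/a.txt#top",
    "http://example.com/b.txt",
]
print(Counter(filename_from_url(u) for u in sample).most_common(3))
# [('a.txt', 2), ('b.txt', 1)]
```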

edited Dec 04 '19 at 16:23

answered Dec 04 '19 at 15:57

This will print all counts, but not the top n hits. – Fourier Dec 04 '19 at 16:17
@Fourier Thanks! I updated the answer. – PySaad Dec 04 '19 at 16:23
