0

Given a list of URLs, print the three most frequent filenames.

url = [
        "http://www.google.com/a.txt",
        "http://www.google.com.tw/a.txt",
        "http://www.google.com/download/c.jpg",
        "http://www.google.co.jp/a.txt",
        "http://www.google.com/b.txt",
        "http://facebook.com/movie/b.txt",
        "http://yahoo.com/123/000/c.jpg",
        "http://gliacloud.com/haha.png",
    ]

The program should print out

a.txt 3  
b.txt 2  
c.jpg 2
  • Does this answer your question? [Python Lists - Finding Number of Times a String Occurs](https://stackoverflow.com/questions/11800755/python-lists-finding-number-of-times-a-string-occurs) – Brydenr Dec 04 '19 at 15:58
  • You should show what you have tried. We are here to help, not to solve it for you. – hurnhu Dec 04 '19 at 16:00

3 Answers

1

How about using collections.Counter and getting the top 3 with counter.most_common(3)?

import collections
url = [
        "http://www.google.com/a.txt",
        "http://www.google.com.tw/a.txt",
        "http://www.google.com/download/c.jpg",
        "http://www.google.co.jp/a.txt",
        "http://www.google.com/b.txt",
        "http://facebook.com/movie/b.txt",
        "http://yahoo.com/123/000/c.jpg",
        "http://gliacloud.com/haha.png",
    ]

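# Keep only the last path segment (the filename) of each URL
filenames = [u.split('/')[-1] for u in url]
counter = collections.Counter(filenames)
# most_common(3) returns the three (filename, count) pairs with the highest counts
for name, count in counter.most_common(3):
    print('{} {}'.format(name, count))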

WORKING DEMO: https://rextester.com/EGJX25593
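
One caveat: when counts are tied, most_common keeps elements in the order they were first encountered (Python 3.7+), so the code above prints c.jpg before b.txt, while the expected output in the question lists b.txt first. Here is a minimal sketch, assuming ties should be broken alphabetically by filename. The question does not state that rule, it is only inferred from the expected output, and the helper name top_filenames is made up for illustration.

```python
from collections import Counter

url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]


def top_filenames(urls, n=3):
    """Return the n most frequent filenames as (name, count) pairs.

    Assumption (not stated in the question): ties are broken
    alphabetically by filename.
    """
    counts = Counter(u.rsplit('/', 1)[-1] for u in urls)
    # Sort by descending count first, then by ascending filename.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


for name, count in top_filenames(url):
    print(name, count)
```

Sorting every distinct filename is fine for a list this small; for a very large input something like heapq.nsmallest with the same key might be a better fit.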

A l w a y s S u n n y
0

How about using re together with collections.Counter, whose most_common method extracts your top n hits?

import re
from collections import Counter

# Match a trailing "name.ext" at the end of each URL (url is the list from the question)
pattern = re.compile(r"\w+\.\w+$")
print(Counter(pattern.findall(u)[0] for u in url).most_common(3))

Output:

[('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]
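
The pattern above only works when the URL ends in name.ext made of word characters. A URL with a query string finds no match, so [0] raises IndexError, and a hyphenated name such as my-report.pdf is cut down to report.pdf. If the input might contain such URLs, a possible alternative is to parse them with the standard library instead. This is a rough sketch: the sample URLs and the helper name filename_from_url are invented for illustration and are not part of the question.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def filename_from_url(u):
    """Return the last path segment of a URL, ignoring query and fragment."""
    return posixpath.basename(urlsplit(u).path)


# Hypothetical URLs (not from the question) covering the awkward cases.
sample = [
    "http://example.com/files/a.txt?version=2",  # query string
    "http://example.com/a.txt#top",              # fragment
    "http://example.com/my-report.pdf",          # hyphen in the name
    "http://example.com/dir/",                   # no filename at all
]

names = [filename_from_url(u) for u in sample]
# Skip URLs whose path has no filename (empty string).
counts = Counter(name for name in names if name)
print(counts.most_common(3))
```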
Fourier
0

You can use Counter from collections:

from collections import Counter
# url is the list from the question; keep the text after the last '/'
res = [a.rsplit('/', 1)[-1] for a in url]
print(Counter(res))

Output:

Counter({'a.txt': 3, 'c.jpg': 2, 'b.txt': 2, 'haha.png': 1})

Link:

https://docs.python.org/3.1/library/collections.html
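
For anyone wondering what Counter is doing here, the counting step is roughly equivalent to the plain-dict loop below. This is only an illustrative sketch of the idea, not how the library is actually implemented.

```python
url = [
    "http://www.google.com/a.txt",
    "http://www.google.com.tw/a.txt",
    "http://www.google.com/download/c.jpg",
    "http://www.google.co.jp/a.txt",
    "http://www.google.com/b.txt",
    "http://facebook.com/movie/b.txt",
    "http://yahoo.com/123/000/c.jpg",
    "http://gliacloud.com/haha.png",
]

counts = {}
for u in url:
    # Text after the last '/' is treated as the filename.
    name = u.rsplit('/', 1)[-1]
    # Start at 0 for a filename seen for the first time, then add one.
    counts[name] = counts.get(name, 0) + 1

print(counts)
```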

UPDATE:

The OP asked for the top 3:


import collections
kk = [a.rsplit('/', 1)[-1] for a in url]
print(collections.Counter(kk).most_common(3))
# [('a.txt', 3), ('c.jpg', 2), ('b.txt', 2)]

PySaad