-1

I am now studying Python, and I am trying to solve the following exercise:

Given a list of words in a text file, my goal is to print the N longest words in the list.

There are several important points:

  1. The print order does not matter.
  2. When several words have the same length, words that appear later in the file take priority (see the example below).
  3. Assume that each row in the file contains exactly one word.
  4. Is there a simple solution for a short list of words, as opposed to a more complex solution for a list that contains several thousand words?

Below is my starting code, which finds a single word of maximum length,

followed by an example of the expected output for N = 4 to illustrate my question.

Thanks for your advice,

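def find_longest_word(word_list):
    longest_word = ''
    for word in word_list:
        word = word.strip()  # drop the trailing newline read from the file
        if len(word) > len(longest_word):
            longest_word = word
    print(longest_word)

# Open the file in a with-block so it is closed automatically.
with open('WORDS.txt', 'r') as word_list1:
    find_longest_word(word_list1)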


Example (N = 4):
WORDS.TXT
---------
Mother
Dad
Cat
Bicycle
House
Hat

The result will be (as I said before, the print order doesn't matter):

Hat
House
Bicycle
Mother

thanks in advance!

jpp
Itay Av
    Possible duplicate of [Python: Finding Longest/Shortest Words In a List and Calling Them in a Function](https://stackoverflow.com/questions/26132770/python-finding-longest-shortest-words-in-a-list-and-calling-them-in-a-function) – Andras Deak -- Слава Україні Sep 09 '18 at 13:28
  • @AndrasDeak Hi, I saw the question you mention when I wrote mine. They are not the same: that question is about finding the longest and shortest strings, i.e. returning two values (maximum and minimum), whereas I would like to print the N longest strings. Thank you. – Itay Av Sep 10 '18 at 07:40

3 Answers

3

One alternative is to use a heap to maintain the top-n elements:

import heapq
from operator import itemgetter


def top(lst, n=4):
    heap = [(0, i, '') for i in range(n)]
    heapq.heapify(heap)

    for i, word in enumerate(lst):
        item = (len(word), i, word)
        if item > heap[0]:
            heapq.heapreplace(heap, item)

    return list(map(itemgetter(2), heap))


words = ['Mother', 'Dad', 'Cat', 'Bicycle', 'House', 'Hat']

print(top(words))

Output

['Hat', 'House', 'Bicycle', 'Mother']

Each heap item is a tuple of (length, position, word), so when two words have the same length, the one that appears later compares greater and gets selected.
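As a hedged aside, the same idea can probably be written more compactly with the standard library's `heapq.nlargest`. The sketch below is not part of the original answer; the function name `top_nlargest` is made up for illustration, and it assumes the words are already stripped of newlines.

```python
import heapq


def top_nlargest(words, n=4):
    # Pair each word with its position so that, among words of equal
    # length, the one appearing later ranks higher.
    indexed = enumerate(words)
    best = heapq.nlargest(n, indexed, key=lambda pair: (len(pair[1]), pair[0]))
    # Drop the positions and keep only the words.
    return [word for _, word in best]


words = ['Mother', 'Dad', 'Cat', 'Bicycle', 'House', 'Hat']

print(top_nlargest(words))
# ['Bicycle', 'Mother', 'House', 'Hat']
```

For m words, a heap of size n should take roughly O(m log n) time and O(n) extra space, which is why this approach tends to suit long lists with a small n.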

Dani Mesejo
  • Thank you for your reply and example! I was not familiar with the "heap" function. Can you explain the time and space complexity of your function? Would you use this solution for a list of thousands of words? Thanks – Itay Av Sep 10 '18 at 08:44
1

Sort word_list by word length and then by a counter variable, so that words occurring later get higher priority:

>>> from itertools import count
>>> cnt = count()
>>> n = 4
>>> sorted(word_list, key=lambda word:(len(word), next(cnt)), reverse=True)[:n]
['Bicycle', 'Mother', 'House', 'Hat']
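A minimal sketch of how this could be wrapped in a function that prints the result. This is an illustration, not the answer's own code: the name `longest_words` is hypothetical, and the running counter is swapped for `enumerate`, which supplies each word's position explicitly so the sort key carries no hidden state.

```python
def longest_words(word_list, n):
    # Sort (position, word) pairs by length, then by position, in
    # descending order: longer words first, later words first on ties.
    ranked = sorted(
        enumerate(word_list),
        key=lambda pair: (len(pair[1]), pair[0]),
        reverse=True,
    )
    return [word for _, word in ranked[:n]]


word_list = ['Mother', 'Dad', 'Cat', 'Bicycle', 'House', 'Hat']

# Print one word per line.
for word in longest_words(word_list, 4):
    print(word)
```

Because the whole list is sorted, this takes O(m log m) time and O(m) extra space for m words, which should still be fast for a few thousand words.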
Sunitha
  • Thank you very much for your reply! Where in the code you wrote should I add the print command? In addition, what is the time and space complexity of your function? Would you use this solution for a list of thousands of words? Thank you – Itay Av Sep 10 '18 at 08:38
1

You can use sorted with a custom tuple key and then list slicing.

from io import StringIO

x = StringIO("""Mother
Dad
Cat
Bicycle
House
Hat
Brother""")

def find_longest_word(word_list, n):
    idx, words = zip(*sorted(enumerate(word_list), key=lambda x: (-len(x[1]), -x[0]))[:n])
    return words

res = find_longest_word(map(str.strip, x.readlines()), 4)

print(*res, sep='\n')

# Brother
# Bicycle
# Mother
# House
jpp
  • Thank you very much for your reply! Can you explain the time and space complexity of your function? Would you use this solution for a list of thousands of words? Thanks! – Itay Av Sep 10 '18 at 08:40
  • Should be O(*n* log *n*) since it requires sorting. For better time complexity, use @DanielMesejo's `heapq` solution. – jpp Sep 10 '18 at 08:42
  • Thanks! I am now reading about the "heapq" module to understand its space and time complexity. Thanks again! – Itay Av Sep 10 '18 at 08:47
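To make the complexity discussion in these comments concrete, here is a rough sketch that runs a sort-based and a heap-based selection on a few thousand words and checks that they agree. The word list is randomly generated purely for illustration, and the helper names (`rank`, `by_sort`, `by_heap`) are hypothetical; timings will vary by machine.

```python
import heapq
import random
import string
import timeit

# Hypothetical test data: 5000 random lowercase "words" of length 1 to 12.
random.seed(0)
words = [
    ''.join(random.choices(string.ascii_lowercase, k=random.randint(1, 12)))
    for _ in range(5000)
]
n = 4


def rank(pair):
    # Rank by length first, then by position so later words win ties.
    index, word = pair
    return (len(word), index)


def by_sort():
    # Sorts all m words: O(m log m) time.
    return sorted(enumerate(words), key=rank, reverse=True)[:n]


def by_heap():
    # Keeps only n candidates at a time: roughly O(m log n) time.
    return heapq.nlargest(n, enumerate(words), key=rank)


# Both approaches select the same words in the same order.
print(by_sort() == by_heap())

# Rough timings over 20 runs each; the numbers depend on the machine.
print('sort:', timeit.timeit(by_sort, number=20))
print('heap:', timeit.timeit(by_heap, number=20))
```

For a list of only a few thousand words either approach should finish in milliseconds, so the simpler sort-based version is likely sufficient; the heap mainly pays off when the list is very large and N is small.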