You are not treating your data as numbers. You are instead passing the whole file content (a very large string) to nlargest(), which can't do anything but give you the last characters in lexicographical order. The character '9' sorts after the character '8' in this case.
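To see the effect in isolation, here is a small sketch that uses a made-up three-line string in place of your file contents (the sample text and the variable names are assumptions for illustration only):

```python
from heapq import nlargest

# Hypothetical file contents, read as one big string
content = "12\n9\n345\n"

# Iterating over a string yields single characters, so nlargest()
# compares characters, not numbers
top_chars = nlargest(5, content)
print(top_chars)  # ['9', '5', '4', '3', '2']
```

The result is a list of one-character strings, and 345 never shows up as a value at all.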
You need to a) read your input line by line, not as one big string, and b) convert your data to integers so they are compared by numeric values:
from heapq import nlargest

def as_numbers(it):
    for line in it:
        try:
            yield int(line)
        except ValueError:
            # not a line with a number, skip
            continue

with open('number.txt') as data:
    five_largest = nlargest(5, as_numbers(data))

print(five_largest)
I've used a generator function to convert lines to integers here, because that makes it easy to keep using heapq.nlargest(). That function is absolutely the right tool for this job: it keeps only the top K values around while iterating and runs in O(N log K) time, so for a fixed K=5 that's basically linear in the number of integer values in the file. The generator function takes care of the conversion with int(), skipping any lines that can't be converted.
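If you are curious where the O(N log K) bound comes from, the idea can be sketched as a min-heap that never grows beyond K items. This is a rough illustration of the technique, not heapq's actual implementation; top_k and the sample list are names I made up for the example:

```python
import heapq

def top_k(iterable, k):
    """Bounded min-heap sketch; hypothetical helper, not heapq's own code."""
    if k <= 0:
        return []
    heap = []  # min-heap holding at most k values; heap[0] is the smallest
    for value in iterable:
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # the new value beats the smallest of the current top k
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)

numbers = [5, 1, 9, 3, 7, 8, 2]
print(top_k(numbers, 3))  # [9, 8, 7]
```

Each push or replace costs O(log K), and there is at most one per input value, which gives the O(N log K) total. In real code, just call nlargest().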
Note also the use of the with statement with the opened file object; at the end of the with block the file is automatically closed for you, so there is no need to call data.close() explicitly here. This happens even if an exception was raised inside the block!
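A quick way to convince yourself of that last point is the following sketch. It writes a throwaway temporary file so it can run on its own; the file contents and the RuntimeError are made up purely for the demonstration:

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1\n2\n3\n')
    path = tmp.name

try:
    with open(path) as data:
        # simulate a failure in the middle of processing the file
        raise RuntimeError('something went wrong while reading')
except RuntimeError:
    pass

# the with block closed the file even though an exception was raised
print(data.closed)  # True

os.remove(path)  # clean up the temporary file
```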
Demo:
>>> import random
>>> from heapq import nlargest
>>> from io import StringIO
>>> random_data = '\n'.join([str(random.randrange(10**6)) for _ in range(10000)])  # 10k lines of random numbers between 0 and 1 million
>>> random_data.splitlines(True)[1042:1045]  # a few sample lines
['39909\n', '15068\n', '420171\n']
>>> sample_file = StringIO(random_data)  # make it a file object
>>> def as_numbers(it):
...     for line in it:
...         try:
...             yield int(line)
...         except ValueError:
...             # not a line with a number, skip
...             continue
...
>>> nlargest(5, as_numbers(sample_file))
[999873, 999713, 999638, 999595, 999566]