The following code creates a function Which_Line_for_Position(pos) that gives the number of the line for the position pos, that is to say the number of line in which lies the character situated at position pos in the file.
This function can be used with any position as argument, independantly from the value of the file's pointer's current position and from the historic of the movements of this pointer before the function is called.
So, with this function, one isn't limited to determine the number of the current line only during an uninterrupted iteration on the lines, as it is the case with Greg Hewgill's solution.
with open(filepath,'rb') as f:
GIVE_NO_FOR_END = {}
end = 0
for i,line in enumerate(f):
end += len(line)
GIVE_NO_FOR_END[end] = i
if line[-1]=='\n':
GIVE_NO_FOR_END[end+1] = i+1
end_positions = GIVE_NO_FOR_END.keys()
end_positions.sort()
def Which_Line_for_Position(pos,
dic = GIVE_NO_FOR_END,
keys = end_positions,
kmax = end_positions[-1]):
return dic[(k for k in keys if pos < k).next()] if pos<kmax else None
.
The same solution can be written with the help of the module fileinput:
import fileinput
GIVE_NO_FOR_END = {}
end = 0
for line in fileinput.input(filepath,'rb'):
end += len(line)
GIVE_NO_FOR_END[end] = fileinput.filelineno()
if line[-1]=='\n':
GIVE_NO_FOR_END[end+1] = fileinput.filelineno()+1
fileinput.close()
end_positions = GIVE_NO_FOR_END.keys()
end_positions.sort()
def Which_Line_for_Position(pos,
dic = GIVE_NO_FOR_END,
keys = end_positions,
kmax = end_positions[-1]):
return dic[(k for k in keys if pos < k).next()] if pos<kmax else None
But this solution has some inconveniences:
- it needs to import the module fileinput
- it deletes all the content of the file !! There must be something wrong in my code but I don't know fileinput enough to find it. Or is it a normal behaviour of fileinput.input() function ?
- it seems that the file is first entirely read before any iteration can be launched. If so, for a file very big, the size of the file may exceed the capacity of the RAM. I am not sure of this point: I tried to test with a file of 1,5 GB but it's rather long and I dropped this point for the moment. If this point is right, it constitutes an argument to use the other solution with enumerate()
.
exemple:
text = '''Harold Acton (1904–1994)
Gilbert Adair (born 1944)
Helen Adam (1909–1993)
Arthur Henry Adams (1872–1936)
Robert Adamson (1852–1902)
Fleur Adcock (born 1934)
Joseph Addison (1672–1719)
Mark Akenside (1721–1770)
James Alexander Allan (1889–1956)
Leslie Holdsworthy Allen (1879–1964)
William Allingham (1824/28-1889)
Kingsley Amis (1922–1995)
Ethel Anderson (1883–1958)
Bruce Andrews (born 1948)
Maya Angelou (born 1928)
Rae Armantrout (born 1947)
Simon Armitage (born 1963)
Matthew Arnold (1822–1888)
John Ashbery (born 1927)
Thomas Ashe (1836–1889)
Thea Astley (1925–2004)
Edwin Atherstone (1788–1872)'''
#with open('alao.txt','rb') as f:
f = text.splitlines(True)
# argument True in splitlines() makes the newlines kept
GIVE_NO_FOR_END = {}
end = 0
for i,line in enumerate(f):
end += len(line)
GIVE_NO_FOR_END[end] = i
if line[-1]=='\n':
GIVE_NO_FOR_END[end+1] = i+1
end_positions = GIVE_NO_FOR_END.keys()
end_positions.sort()
print '\n'.join('line %-3s ending at position %s' % (str(GIVE_NO_FOR_END[end]),str(end))
for end in end_positions)
def Which_Line_for_Position(pos,
dic = GIVE_NO_FOR_END,
keys = end_positions,
kmax = end_positions[-1]):
return dic[(k for k in keys if pos < k).next()] if pos<kmax else None
print
for x in (2,450,320,104,105,599,600):
print 'pos=%-6s line %s' % (x,Which_Line_for_Position(x))
result
line 0 ending at position 25
line 1 ending at position 51
line 2 ending at position 74
line 3 ending at position 105
line 4 ending at position 132
line 5 ending at position 157
line 6 ending at position 184
line 7 ending at position 210
line 8 ending at position 244
line 9 ending at position 281
line 10 ending at position 314
line 11 ending at position 340
line 12 ending at position 367
line 13 ending at position 393
line 14 ending at position 418
line 15 ending at position 445
line 16 ending at position 472
line 17 ending at position 499
line 18 ending at position 524
line 19 ending at position 548
line 20 ending at position 572
line 21 ending at position 600
pos=2 line 0
pos=450 line 16
pos=320 line 11
pos=104 line 3
pos=105 line 4
pos=599 line 21
pos=600 line None
.
Then, having function Which_Line_for_Position() , it is easy to obtain the number of a current line : just passing f.tell() as argument to the function
But WARNING: when using f.tell() and doing movements of the file's pointer in the file, it is absolutely necessary that the file is opened in binary mode: 'rb' or 'rb+' or 'ab' or ....