I am currently working on a project and I would like some ideas on how to optimize my Python script. I can't share the explicit code because I don't have it yet, but I am interested in any suggestions you might have.
So the idea is to read the lines of a .txt file (around 80,000,000 lines). The purpose of my function is to return a matrix containing sums of the information from each line. Basically, each line of the .txt file describes a vertical group of 8 adjacent cells of the matrix, identified by the position (row, column) of the first cell.
For example, the first line of my .txt file is: 80 240 11011011. It means that the 8 cells starting at row 80, column 240 will each be incremented by the indicated value (1 for the 1st, 1 for the 2nd, 0 for the 3rd, ...).
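To make the format concrete, here is a minimal sketch of how a single such line could be parsed, assuming whitespace-separated fields; `parse_line` is a hypothetical helper name, not part of my actual code:

```python
import numpy as np

def parse_line(line):
    # Hypothetical helper: split "80 240 11011011" into
    # (row, column, 8-element increment vector).
    row_str, col_str, bits = line.split()
    # "11011011" -> array([1, 1, 0, 1, 1, 0, 1, 1])
    value = np.frombuffer(bits.strip().encode("ascii"), dtype=np.uint8) - ord("0")
    return int(row_str), int(col_str), value

row, col, value = parse_line("80 240 11011011")
# row == 80, col == 240, value is the array [1, 1, 0, 1, 1, 0, 1, 1]
```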
So the main pattern of my code is the following one:
resultMatrix = np.zeros((length, width))
for line in myTxtFile:
    tempList = line.split(" ")
    row = int(tempList[0])
    column = int(tempList[1])
    # Convert the "11011011" string into an np.array of size 8
    value = np.array([int(c) for c in tempList[2].strip()])
    resultMatrix[row:row + 8, column] += value
return resultMatrix
This function currently takes 10 minutes for 60,000,000 lines, and I have to run it several times. Do you know how I could optimize it?
Thank you very much for your help !