
I am currently working on a project and would like some ideas on how to optimize my Python script. I can't give you the exact code because I don't have it at hand right now, but I am interested in any suggestions you might have.

The idea is to read the lines of a .txt file (around 80,000,000 lines). My function should return a matrix that accumulates the information from every line. Each line of the .txt file describes a vertical group of 8 adjacent cells of the matrix, identified by the (row, column) position of the first cell.

For example, the first line of my .txt file is: 80 240 11011011. It means that the 8 cells starting at row 80, column 240 are each incremented by the corresponding bit (1 for the 1st, 1 for the 2nd, 0 for the 3rd, ...).

The main pattern of my code is the following:

import numpy as np

resultMatrix = np.zeros((length, width))
for line in myTxtFile:
    tempList = line.split(" ")
    row = int(tempList[0])
    column = int(tempList[1])
    # Convert the "11011011" string into an np.array of size 8
    value = np.array([int(c) for c in tempList[2].strip()])
    resultMatrix[row:row+8, column] += value
return resultMatrix

This function currently takes 10 minutes for 60,000,000 lines, and I have to run it several times. Do you know how I could optimize it?

Thank you very much for your help!

  • I think that's about the best you can do. One optimization I can think of is the creation of the np.array of size 8: there are only 256 such arrays, so you can build a mapping from all 256 possible bit strings to the corresponding pre-initialized np.array objects (see the sketch after these comments). – justhalf May 17 '17 at 02:09
  • Thank you for that answer! – Dany Am May 23 '17 at 08:45
  • Did it improve the timing? By how much? – justhalf May 23 '17 at 09:56
  • Actually, I had already implemented that trick before trying to improve the timing, so I am not able to compare. But thanks anyway ;) – Dany Am May 24 '17 at 15:41
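
A minimal sketch of the lookup-table idea from justhalf's comment, assuming each line ends with an 8-character bit string; the table name BIT_ARRAYS is illustrative, and length, width, and myTxtFile are taken from the question:

import numpy as np

# Pre-build all 256 possible 8-bit patterns once, keyed by their bit string.
BIT_ARRAYS = {format(i, "08b"): np.array([int(c) for c in format(i, "08b")])
              for i in range(256)}

resultMatrix = np.zeros((length, width))
for line in myTxtFile:
    row, column, bits = line.split(" ")
    # A dictionary lookup replaces re-parsing the bit string on every line.
    resultMatrix[int(row):int(row)+8, int(column)] += BIT_ARRAYS[bits.strip()]

Since the asker reports the lookup table was already in place, the remaining cost is likely the 60,000,000-iteration Python loop itself. One way to remove it entirely, sketched under the assumption that pandas is available and every line has exactly three space-separated fields (the file name and variable names are placeholders):

import numpy as np
import pandas as pd

# dtype=str keeps leading zeros in the bit field.
df = pd.read_csv("myTxtFile.txt", sep=" ", header=None,
                 names=["row", "col", "bits"], dtype={"bits": str})

# Decode all 8-character bit strings into one (nLines, 8) array of 0/1 values.
joined = "".join(df["bits"])
bits = (np.frombuffer(joined.encode("ascii"), dtype=np.uint8) - ord("0")).reshape(-1, 8)

# Map each (row + k, col) target onto a flat index, then accumulate with
# bincount, which sums correctly even when the same cell appears many times.
rows = df["row"].to_numpy()[:, None] + np.arange(8)      # shape (nLines, 8)
flat = rows * width + df["col"].to_numpy()[:, None]      # shape (nLines, 8)
resultMatrix = np.bincount(flat.ravel(), weights=bits.ravel(),
                           minlength=length * width).reshape(length, width)

For 60,000,000 lines this trades memory for speed (the intermediate arrays are several gigabytes); reading the file in chunks with read_csv's chunksize parameter and accumulating per chunk keeps memory bounded.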

0 Answers