1

I have a template .xlsm file with macros and a basic structure. I need to write inside the file some user object that I have in an array. Since there are many columns that aren't used (or don't have a value corresponding in the user class), I need to write only specific cells for each row.

I came out with a basic loop like this:

wb2 = openpyxl.load_workbook(XLSM_TEMPLATE_PATH, keep_vba=True)
ws2 = wb2['Datas']
row_num = 4

i = 0
for user in users:
    current_row = row_num + i
    ws2.cell(row=current_row, column=21).value = user.last_name
    ws2.cell(row=current_row, column=22).value = user.first_name
    ws2.cell(row=current_row, column=25).value = user.tax_code
    ws2.cell(row=current_row,
             column=30).value = user.residence_address.street_address + ', ' + user.residence_address.street_number
    ws2.cell(row=current_row, column=31).value = user.residence_address.city_name
    ws2.cell(row=current_row, column=34).value = user.email
    ws2.cell(row=current_row, column=38).value = user.date_of_birth
    ws2.cell(row=current_row, column=39).value = user.place_of_birth

    i += 1

exported_file_path = EXPORT_PATH.format(generate_random_code('.xlsm'))
wb2.save(exported_file_path)
wb2.close()

The problem is that looping for 3-4 users takes about 15 second of processing time, so the Amazon lambda function (that hosts the script) times out and fails... and also increasing the work time isn't a big deal since will increase the cost too much.

There is any solution to speed up the process?

Claudio
  • 123
  • 3
  • 15
  • what lambda function? – David Zemens Jul 08 '19 at 15:34
  • I forgot to write that isn't a lamba code, but an Amazon Lambda – Claudio Jul 08 '19 at 15:35
  • Did you profile your code somehow? E.g. do you know how much time is actually spent on opening the file, on updating the data, and on saving the file? On a lambda, results may be surprising. It's hard to optimize something you aren't measuring. – 9000 Jul 08 '19 at 15:53
  • Didn't try to check the time of each step... but the execution time is slow also locally, so I don't think is a lambda problem – Claudio Jul 08 '19 at 15:54
  • See if adding `write_only=True` helps, as explained [here](https://openpyxl.readthedocs.io/en/stable/optimized.html) – shahkalpesh Jul 09 '19 at 10:55
  • @shahkalpesh this doesn't work, because `write_only=True` cannot be used on `load_workbook` – Claudio Jul 09 '19 at 11:00
  • How big is the workbook? How long does it take to open locally? – Charlie Clark Jul 09 '19 at 11:54
  • @CharlieClark it has 4 sheets: 2 of them are really small (around 10 columns and 20 rows), 1 is the principal with 245 columns and 170 rows, and the last one has 20 columns with around 20-25 rows and only 1 of those columns has more than 17.000 cells... locally it take around 5 seconds only to read the file and other 5 seconds to save it – Claudio Jul 09 '19 at 13:58
  • So, close to how long it takes on the server when you add in the time to start a Python process. You check for the bottleneck using read-only mode though 245 x 170 is likely to be it but basically this is the wrong tool for a time-limited task. – Charlie Clark Jul 09 '19 at 14:11
  • @CharlieClark if I try to `load_workbook(XLSM_TEMPLATE_PATH, keep_vba=True, read_only=True)` I get around 1,1 ~ 1,3 seconds load time (locally) – Claudio Jul 09 '19 at 14:36
  • Yes, because that doesn't load any worksheets. You'd need to write your own code for that. See the performance benchmarking code for some sample. But this will just help you get an idea where any bottlenecks as as you cannot then modify the file. – Charlie Clark Jul 10 '19 at 08:45
  • @CharlieClark so the "custom code" what should do? I didn't understood it completely... – Claudio Jul 10 '19 at 10:40

1 Answers1

0

After different researches the only thing that can be done to improve the performance is to change the library with a more optimized one. I obtained better results with Pandas. It's still not "fast" but I got around -20% in computation time.

Claudio
  • 123
  • 3
  • 15