
I have the following code:

import xlsxwriter
import numpy as np
import pandas as pd
import random
import uuid
LIMIT=1000
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
for row in range(0,LIMIT):
    worksheet.write_column(row, 0, uuid.uuid4())
    row += 1
workbook.close()

I get the following error:

TypeError: 'UUID' object is not iterable

I tried converting the UUID to a string using str() but that just gives me a single digit or letter in the excel sheet for some reason.

Munchkin

1 Answer


The issue is that you are mixing up two different functions: worksheet.write() and worksheet.write_column().

If you want to use worksheet.write_column(), the data you pass must be an iterable.
The function walks through the iterable's elements one by one and fills the column until it runs out. In your current code, uuid.uuid4() returns a UUID object, which is not iterable, hence the TypeError. Once you wrap it in str(), it becomes a string, which is iterable, so the function writes one character per row. If you want a full UUID per row, you need to build a list of UUIDs. See the example below.

import uuid
import xlsxwriter

LIMIT = 1000
workbook = xlsxwriter.Workbook('test2.xlsx')
worksheet = workbook.add_worksheet()
# Build the full list of UUID strings, then write it as one column
uuid_list = [str(uuid.uuid4()) for _ in range(LIMIT)]
worksheet.write_column(0, 0, uuid_list)
workbook.close()

If the loop feels more natural to you, since that is how you wrote it originally, use worksheet.write() instead. It will most likely be slower, but for 1000 strings, or even many more, you will probably not notice the difference.

import uuid
import xlsxwriter

LIMIT = 1000
workbook = xlsxwriter.Workbook('test.xlsx')
worksheet = workbook.add_worksheet()
for row in range(LIMIT):
    # One write() call per row; range() already advances row, so no manual increment is needed
    worksheet.write(row, 0, str(uuid.uuid4()))
workbook.close()

You can find more information in the xlsxwriter package documentation.

PERFORMANCE:
To address the performance question from the comments, this is what I found for a million rows:

- worksheet.write_column(): 55s overall (5s to create the million UUIDs, 50s to write the data).
- worksheet.write(): 35s overall; I am quite surprised it is faster than write_column...
- df.to_csv(): it is much faster to create a DataFrame and write it to a CSV or Excel file afterwards (my test took 7-8s: 5s to create the million UUIDs, 2s to write the DataFrame to CSV).
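
These numbers will of course vary by machine. If you want to reproduce the comparison yourself, a rough timing sketch along these lines should work (N is kept smaller than a million here so the run stays short):

import time
import uuid
import xlsxwriter

N = 100_000  # smaller than a million so the comparison runs quickly
uuid_list = [str(uuid.uuid4()) for _ in range(N)]

# Time write_column(): one call with the whole list
start = time.perf_counter()
workbook = xlsxwriter.Workbook('bench_column.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_column(0, 0, uuid_list)
workbook.close()
print(f"write_column: {time.perf_counter() - start:.2f}s")

# Time write(): one call per row
start = time.perf_counter()
workbook = xlsxwriter.Workbook('bench_write.xlsx')
worksheet = workbook.add_worksheet()
for row, value in enumerate(uuid_list):
    worksheet.write(row, 0, value)
workbook.close()
print(f"write: {time.perf_counter() - start:.2f}s")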

Code to do it with a DataFrame:

import uuid
import pandas as pd

LIMIT = 1_000_000
uuid_list = [str(uuid.uuid4()) for _ in range(LIMIT)]
df = pd.DataFrame({"column1": uuid_list})
df.to_csv("test.csv")
Ssayan
  • I am planning to generate data in the millions (the 1000 is only an example and to make it quicker). In that case I guess you would not recommend this approach then? Or maybe my approach is wrong from the start - maybe I should create a huge pandas DataFrame? Would that be faster? Any clue? – Munchkin Feb 17 '22 at 13:49
  • I updated my answer with performance in mind. There is likely a way to improve on that further if required, but I would say pandas is the way to go, especially if you know how to use it. – Ssayan Feb 17 '22 at 14:09
  • what does the `_` in `for _ in range(LIMIT)` stand for? For some reason I get the limit amount of the same UUID and then the same amount of random UUIDs which is "out of bounds" in the result – Munchkin Feb 17 '22 at 14:32
  • it is just a way to say that you are not using the variable; it is equivalent to `for i in range(LIMIT)`. Since I am not using `i` in this case, I use `_`. Can you share the piece of code that breaks? I get that you have a dimension issue but I don't see where. – Ssayan Feb 17 '22 at 15:00
  • I managed to solve the issue, but when I increased the limit to 100k, the script takes longer than 11mins which is above my pain tolerance and I'm running `miniconda` if that matters. – Munchkin Feb 17 '22 at 15:08
  • Can you share the full script in your question? I can try to help. I agree 11 mins for 100k is really a lot, especially if you are aiming for a million :). Not sure if miniconda matters in that case, but I don't know. – Ssayan Feb 17 '22 at 15:15
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/242114/discussion-between-ssayan-and-munchkin). – Ssayan Feb 17 '22 at 15:25