0

I have a function that reads an excel file into a dataframe and then I save that dataframe in an s3 bucket using the awswrangler api to_csv function. The excel file has data starting from different rows and columns.

My conversion code looks something like this:

df = wr.s3.read_excel(
              path = 's3://bucket/filepath/', 
              sheet_name = 'sheetname', 
              header = 5, 
              index_col = False).iloc[:, 3:]

wr.s3.to_csv(df, path = "s3://bucket/filepath/', 
                 dataset = True, 
                 filename_prefix = 'sheetname')

The code works properly but I am having 2 issues:

  1. The csv outputs an additional index column even though I have specified index_col = False
  2. The file csv output name comes with additional serial numbers which I dont want. I just want something like sheetname.csv. I looked at the documentation and didn't find any good leads on how I can do that.

The current output looks something like sheetname8f1e8wefdf15f4wgfe5fef46we54f.csv

Murtaza Mohsin
  • 142
  • 1
  • 12

1 Answers1

1

You can specify the full path in the path parameter. You don't need to use filename_prefix at all. Using filename_prefix is why you are getting those extra characters in the filename.

Also, you can disable the index in the output by setting index = False.

wr.s3.to_csv(df, path = "s3://bucket/filepath/sheetname.csv", 
                 index = False)
Mark B
  • 183,023
  • 24
  • 297
  • 295
  • Okay, so this solved my first problem. – Murtaza Mohsin Aug 26 '22 at 19:38
  • As for the second issue I have it's creating a folder inside my s3 bucket and saving the CSV there in a serialized format. Since this code lies inside a for loop as excel file has multiple sheets that need processing. I did something like wr.s3.to_csv(df, path = f"s3://bucket/filepath/{sheetname}.csv" – Murtaza Mohsin Aug 26 '22 at 19:39
  • Please read my whole answer. Did you remove the `filename_prefix` setting like I show in my answer? – Mark B Aug 26 '22 at 19:39
  • yes I did remove the filename_prefix – Murtaza Mohsin Aug 26 '22 at 19:40
  • And it is still writing to S3 with `sheetname##############.csv` instead of `sheetname.csv`? – Mark B Aug 26 '22 at 19:45
  • Ah I see, the path will do that if you have `dataset = True`. Why do you have that setting? If you want a "normal" CSV file you should set `dataset = False`. – Mark B Aug 26 '22 at 19:46
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/247594/discussion-between-murtaza-mohsin-and-mark-b). – Murtaza Mohsin Aug 26 '22 at 19:52
  • Okay yes, that worked for me! I reads in the docs that I would need to set it as true in order to save it to the S3 bucket. Thats why I had enabled it. but now I know so I will keep this in mind going forward. Thank you! – Murtaza Mohsin Aug 26 '22 at 19:57