
I have 20+ Excel files in Japanese. Most are Microsoft Excel 2007+ files and a few are of the Microsoft Excel OOXML file type. I would like to convert these files to CSV and load them into Snowflake. Before converting, I was wondering whether there is a library or pre-built function in Python that can help determine which delimiter and escape character would work best for a particular file. Please also note that a few of the Excel files contain multiple sheets.

Thanks in advance for your time and efforts!

biggboss2019

1 Answer


I don't really know what you mean by "right delimiter". If you want to detect which one is already used in a delimited text file, there is a library called detect_delimiter. If you want to choose a new delimiter yourself, the best approach is probably to pick one that is unlikely to appear inside the data (% for example), so that fields are not split in the wrong place. You can always load the data into a pandas DataFrame, explore it, and then write it back out as CSV once you know which option works best in your case.
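To illustrate why the delimiter choice matters less once fields are quoted, here is a minimal sketch using only Python's standard `csv` module. The sample rows and the helper names `rows_to_csv_text` and `csv_text_to_rows` are made up for this example. A field that contains the delimiter, a double quote, or a line break gets wrapped in quotes, and embedded quotes are doubled, so the data survives a round trip.

```python
import csv
import io


def rows_to_csv_text(rows, delimiter=","):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote a field only if it contains special characters
    )
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text, delimiter=","):
    """Parse CSV text back into a list of rows."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar='"')
    return list(reader)


# Hypothetical sample data: a comma, double quotes, and a line break inside fields.
sample_rows = [
    ["id", "name", "note"],
    ["1", "東京, 日本", 'He said "hello"'],
    ["2", "大阪", "line one\nline two"],
]

csv_text = rows_to_csv_text(sample_rows)
round_tripped = csv_text_to_rows(csv_text)
print(csv_text)
print(round_tripped == sample_rows)
```

On the Snowflake side, a file written this way would presumably be loaded with a file format that declares the same delimiter and sets `FIELD_OPTIONALLY_ENCLOSED_BY = '"'`. Treat that as a starting point to check against your own data rather than a guaranteed setting.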

mrCopiCat
  • By "right delimiter" I meant: is there any way to find which delimiter might be suitable for the Excel files? – biggboss2019 May 30 '22 at 18:55
  • As far as I know, any of them works for Excel files. Go from Excel to pandas to CSV (using the pandas library); I think that is easier to work with. – mrCopiCat May 30 '22 at 20:46
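
The Excel to pandas to CSV route from the comment above could look roughly like the sketch below. It assumes pandas plus an Excel engine such as openpyxl are installed. The function names `sheet_csv_path` and `excel_to_csvs`, and the `<workbook>__<sheet>.csv` naming scheme, are hypothetical choices for this example. Passing `sheet_name=None` to `pd.read_excel` returns a dict with one DataFrame per sheet, which covers workbooks that have several sheets.

```python
import csv
from pathlib import Path


def sheet_csv_path(workbook_path, sheet_name, out_dir):
    """Build an output path like out_dir/<workbook>__<sheet>.csv (hypothetical naming scheme)."""
    # Replace anything that is not a letter or digit; Japanese characters count as alphanumeric.
    safe_sheet = "".join(c if c.isalnum() else "_" for c in sheet_name)
    return Path(out_dir) / f"{Path(workbook_path).stem}__{safe_sheet}.csv"


def excel_to_csvs(workbook_path, out_dir, delimiter=","):
    """Write every sheet of one workbook to its own UTF-8 CSV file and return the paths."""
    import pandas as pd  # imported here so the path helper above works without pandas

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # sheet_name=None loads all sheets as {sheet name: DataFrame}.
    sheets = pd.read_excel(workbook_path, sheet_name=None)
    written = []
    for sheet_name, frame in sheets.items():
        target = sheet_csv_path(workbook_path, sheet_name, out_dir)
        frame.to_csv(
            target,
            index=False,
            sep=delimiter,
            encoding="utf-8",
            quoting=csv.QUOTE_MINIMAL,  # quote only fields containing the delimiter, quotes, or newlines
        )
        written.append(target)
    return written
```

Usage would be something like `excel_to_csvs("report.xlsx", "csv_out")`, where both names are placeholders. Each sheet ends up in a separate file, so each one can be loaded into its own Snowflake table.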