
I am opening a new question because every answer I can find uses code that only runs on Windows.
Here is the situation:
Every month I receive new files for work that I need to convert from an ANSI encoding to UTF-8. There are enough of them that automation is worthwhile, so I wrote a Python script. Until recently I was on Windows and everything worked fine. After switching to a Mac, I realized that "ANSI" is a Windows-only encoding name, and now my script no longer works.
Question: Is there a way to convert ANSI-encoded CSVs to UTF-8 on a Mac?

Here is the code that WAS working on my Windows machine.

```python
import sys
import os

if len(sys.argv) != 2:
  print(f"Converts the contents of a folder to UTF-8 from ANSI.")
  print(f"USAGE: \n\
    python ANSI_to_UTF8.py <Relative_Folder_Name> \n\
    If targeting a nested folder, make sure to use an escaped \\. ie: parent\\\\child")
  sys.exit()

from_encoding = "ANSI"
to_encoding = "UTF-8"
list_of_files = []
current_dir = os.getcwd()
folder = sys.argv[1]
suffix = "_utf8"
target_folder = folder + "_utf8"


try:
  os.mkdir(target_folder)
except FileExistsError:
  print("Target folder already exists.")
except:
  print("Error making directory!")

for root, dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root,file))


for file in list_of_files:
  print(f"Converting {file}")

  original_path = file

  filename = file.split("\\")[-1].split(".")[0]
  extension = file.split("\\")[-1].split(".")[1]
  folder = "\\".join(original_path.split("\\")[0:-1])
  new_filename = filename + "." + extension
  new_path = os.path.join(target_folder, new_filename)

  f= open(original_path, 'r', encoding=from_encoding)
  content= f.read()
  f.close()
  f= open(new_path, 'w', encoding=to_encoding)
  f.write(content)
  f.close()

print(f"Finished converting {len(list_of_files)} files to {target_folder}")
```

It seems that no matter what approach I take, my Mac does not recognize the ANSI encoding type. Any help would be much appreciated. Thank you.

Edit 1: Regarding [Convert from ANSI to UTF-8](https://stackoverflow.com/questions/31471087/convert-from-ansi-to-utf-8)
That question has two answers, and neither works for me. With answer one, I get a UTF-8 error:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 25101: invalid continuation byte
```
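That byte points to the cause: the file is not UTF-8. In UTF-8, 0xE9 starts a multi-byte sequence and must be followed by continuation bytes, but in Windows code page 1252 it is simply `é`. A minimal check, using a made-up CSV row rather than the real file:

```python
# Hypothetical sample: one CSV row containing "café", stored the way a cp1252 ("ANSI") file stores it
raw = b"caf\xe9,1\r\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as err:
    utf8_ok = False
    print("utf-8 failed:", err.reason)  # utf-8 failed: invalid continuation byte

text = raw.decode("cp1252")  # 0xE9 maps to "é" in code page 1252
print(repr(text))
```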

With answer two, I believe the root cause is that I am on a Mac, where Python does not provide the Windows-only mbcs encoding:

```
LookupError: unknown encoding: mbcs
```
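This matches how Python names its codecs: `"ANSI"` is listed as an alias of `"mbcs"`, which only exists on Windows and means "the system's current ANSI code page". A small sketch for checking which names resolve on the current machine; `resolve_codec` is a hypothetical helper name:

```python
import codecs


def resolve_codec(name):
    """Return Python's canonical codec name for `name`, or None if it is unavailable here."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


for name in ("ANSI", "mbcs", "cp1252", "UTF-8"):
    # On Windows "ANSI" and "mbcs" resolve to "mbcs"; on macOS and Linux they give None
    print(name, "->", resolve_codec(name))
```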
  • Does this answer your question? [Convert from ANSI to UTF-8](https://stackoverflow.com/questions/31471087/convert-from-ansi-to-utf-8) – ddejohn Feb 01 '22 at 18:17
  • Unfortunately no, I edited my question to reflect my findings. – Christian Payne Feb 01 '22 at 18:29
  • `ansi` is a Microsoft-specific misnomer (it means the code page reported by `REG QUERY "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" -v ACP|find "ACP"` if run from an open `cmd` window). I'd guess you could use `from_encoding = "cp1252"`. – JosefZ Feb 01 '22 at 19:01
  • That was it! Along with file path fixes, it works! I will edit this post with an answer for anyone else who comes across this problem! Thank you @JosefZ – Christian Payne Feb 01 '22 at 19:16
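If you are unsure which code page "ANSI" meant on the machine that produced the files, Python can report the same `ACP` value JosefZ's registry query shows. A sketch, assuming it is run on that Windows machine (`ctypes.windll` only exists on Windows); `ansi_code_page` is a hypothetical helper name:

```python
import sys


def ansi_code_page():
    """Return the Windows ANSI code page as a codec name (e.g. "cp1252"), or None on other systems."""
    if sys.platform != "win32":
        return None  # macOS and Linux have no ANSI code page
    import ctypes
    # GetACP() returns the active ANSI code page number, e.g. 1252
    return f"cp{ctypes.windll.kernel32.GetACP()}"


print(ansi_code_page())
```

Whatever name it prints on Windows is the `from_encoding` to use on the Mac.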

1 Answer

I found an answer to this problem.
Changing the codec from ANSI to cp1252 fixed the encoding issue: cp1252 is the Windows code page that "ANSI" referred to for my files, and Python supports it on every platform. The other issue I ran into right after is that macOS paths use forward slashes instead of backslashes, so splitting paths on "\\" no longer worked.
After a few more modifications, I came up with a working version.
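Hard-coding "/" instead of "\\" trades one platform for the other; `os.path.basename` and `os.path.splitext` use the separator of whichever OS runs the script. A small comparison, with a made-up file name:

```python
import os

# Hypothetical path built the portable way: "data/report.2022.csv" on macOS
path = os.path.join("data", "report.2022.csv")

name = os.path.basename(path)       # "report.2022.csv" on macOS and on Windows
stem, ext = os.path.splitext(name)  # ("report.2022", ".csv")

# Splitting on "." picks the wrong piece when a name contains extra dots
wrong_ext = name.split(".")[1]      # "2022", not "csv"
print(name, stem, ext, wrong_ext)
```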

```python
import os
import sys

if len(sys.argv) != 2:
    print("Converts the contents of a folder to UTF-8 from cp1252 (Windows \"ANSI\").")
    print("USAGE:\n"
          "    python ANSI_to_UTF8.py <Relative_Folder_Name>\n"
          "    For a nested folder, pass a path such as parent/child.")
    sys.exit(1)

# Python treats "ANSI" as an alias of the Windows-only "mbcs" codec.
# On US/Western European Windows the ANSI code page is cp1252, which works everywhere.
from_encoding = "cp1252"
to_encoding = "utf-8"
suffix = "_utf8"
# normpath drops a trailing separator, so "data/" gives "data_utf8" rather than "data/_utf8"
folder = os.path.normpath(sys.argv[1])
target_folder = folder + suffix

try:
    os.mkdir(target_folder)
except FileExistsError:
    print("Target folder already exists.")
except OSError as err:
    print(f"Error making directory: {err}")
    sys.exit(1)

# Collect every file under the source folder, including subfolders
list_of_files = []
for root, _dirs, files in os.walk(folder):
    for file in files:
        list_of_files.append(os.path.join(root, file))

for original_path in list_of_files:
    print(f"Converting {original_path}")
    # basename splits on the current OS's separator ("/" on macOS, "\" on Windows)
    # and keeps names with several dots intact; output is flattened into target_folder
    new_path = os.path.join(target_folder, os.path.basename(original_path))

    with open(original_path, "r", encoding=from_encoding) as f:
        content = f.read()
    with open(new_path, "w", encoding=to_encoding) as f:
        f.write(content)

print(f"Finished converting {len(list_of_files)} files to {target_folder}")
```

The changes are small, but this version lets the Mac decode the files and build the output paths correctly.
Thanks again to all who helped!
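One caveat with the script above: every file is written straight into the output folder, so two files with the same name in different subfolders overwrite each other. A `pathlib` sketch that recreates the subfolder layout instead; `convert_folder` is a hypothetical helper, and like the script it assumes every file under the folder is cp1252 text:

```python
from pathlib import Path


def convert_folder(src, src_encoding="cp1252", dst_encoding="utf-8"):
    """Re-encode every file under `src` into a sibling "<src>_utf8" folder.

    Subfolders are recreated rather than flattened, so same-named files
    in different subfolders do not overwrite each other.
    Returns the output folder and the number of files converted.
    """
    src = Path(src)
    dst = src.with_name(src.name + "_utf8")
    count = 0
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        out = dst / path.relative_to(src)
        out.parent.mkdir(parents=True, exist_ok=True)
        # Working on bytes keeps the original line endings (e.g. \r\n) unchanged
        text = path.read_bytes().decode(src_encoding)
        out.write_bytes(text.encode(dst_encoding))
        count += 1
    return dst, count
```

For example, `convert_folder("January")` (a made-up folder name) would write the converted copies to `January_utf8` and return that path together with the file count.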