
I'm trying to load a set of public flat files (using COPY INTO from Python) that are apparently saved in ANSI format. Some of the files load with no issue, but in at least one case the COPY INTO statement hangs (no error is returned, and nothing is logged, as far as I can tell). I isolated the problem to particular rows containing a non-standard character, e.g., the ¢ character in the second row below:

O}10}49771}2020}02}202002}4977110}141077}71052900}R }}N}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}08}CWI STATE A}CENTENNIAL RESOURCE PROD, LLC}PHANTOM (WOLFCAMP)


O}10}50367}2020}01}202001}5036710}027348}73933500}R }}N}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}0}08}A¢C 34-197}APC WATER HOLDINGS 1, LLC}QUITO, WEST (DELAWARE)

Re-saving these rows into a file with UTF-8 encoding solves the issue, but I thought I'd pass this along in case someone wants to take a look at the back-end to handle these types of characters and/or return some kind of error.
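For reference, the re-save workaround described above can be done programmatically. This is a minimal sketch, assuming the "ANSI" encoding is Windows-1252 (cp1252), which is what ANSI usually means on US-locale Windows systems; the file names are illustrative, not from the original post:

```python
import tempfile
from pathlib import Path

# Simulate a small "ANSI" (Windows-1252) flat file containing the ¢ character.
# The paths and the cp1252 assumption are illustrative only.
tmpdir = tempfile.mkdtemp()
src = Path(tmpdir, "ansi.txt")
dst = Path(tmpdir, "utf8.txt")
src.write_text("A¢C 34-197", encoding="cp1252")

# Re-save as UTF-8: read with the source encoding, write with UTF-8.
with open(src, encoding="cp1252") as fin, open(dst, "w", encoding="utf-8") as fout:
    for line in fin:
        fout.write(line)
```

In cp1252 the ¢ character is the single byte 0xA2, which is not valid on its own in UTF-8; after transcoding it becomes the two-byte sequence 0xC2 0xA2, which a UTF-8 loader can consume cleanly.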

C. Peck
tylerc

1 Answer


Why save into a file at all?

If it is possible, just handle the re-encoding internally in Python:

resultstr = bytestr.decode("cp1252")   # decode the "ANSI" bytes (likely Windows-1252)
utf8bytes = resultstr.encode("utf-8")  # re-encode as UTF-8
  • Hello, I am trying to load a source file - it is 9 GB in size (approximately 70 million rows). The COPY INTO statement is expecting a file - hence the need to adjust the encoding of the file. – tylerc May 18 '21 at 20:27
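Since re-saving a 9 GB file in an editor isn't practical, the re-encoding can be done in Python in fixed-size chunks so the whole file never sits in memory, producing a UTF-8 file that COPY INTO can then read. A sketch, again assuming the source encoding really is Windows-1252 (cp1252); the function name and chunk size are illustrative:

```python
def transcode_to_utf8(src_path, dst_path, src_enc="cp1252", chunk_chars=1 << 20):
    """Re-encode a large flat file to UTF-8 without loading it all at once.

    src_enc="cp1252" is an assumption: "ANSI" usually means Windows-1252
    on US-locale Windows systems.
    """
    # newline="" preserves the file's line endings exactly, byte-for-byte.
    with open(src_path, "r", encoding=src_enc, newline="") as fin, \
         open(dst_path, "w", encoding="utf-8", newline="") as fout:
        while True:
            chunk = fin.read(chunk_chars)  # read ~1M characters at a time
            if not chunk:
                break
            fout.write(chunk)
```

Reading in text mode means `read(n)` counts characters, not bytes, so a multi-byte character can never be split across chunk boundaries.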