
I'm getting errors from csvcut stating that the input is not UTF-8 encoded. According to the documentation at https://csvkit.readthedocs.io/en/latest/scripts/csvcut.html, there's a command line option [-e ENCODING], but it doesn't appear to have any further documentation. What is the list of valid encoding options?

In my particular case, the input files are strictly one byte per character; csvcut complains about any byte above 0x7f. The files have a smattering of such bytes, e.g. 0x92 (right single quotation mark).

If I scrub the characters above 0x7f, it all works, but it seems to me there should be a more elegant and straightforward solution.

Adding (guessing!) the command line option -e ASCII didn't seem to do anything(?)

  • Yes, the documentation doesn't appear to formally define the valid values anywhere (assuming that such a list even exists). But there are a couple of simple things you can try: [1] Just [download csvkit from GitHub](https://github.com/wireservice/csvkit) and globally search the source for **encoding** yourself. [2] Perhaps try _"latin1"_ or _"latin-1"_ as your encoding value instead of ASCII? – skomisa May 05 '23 at 17:55
  • https://github.com/wireservice/csvkit says that the `csvkit` source language is *Python 100.0%*. Moreover, there is a note in [Arguments common to all tools](https://csvkit.readthedocs.io/en/latest/common_arguments.html): *The `--encoding` option has no effect if reading from standard input. Set the `PYTHONIOENCODING` environment variable instead.* Hence, I guess that the `--encoding` option accepts anything that `PYTHONIOENCODING` does? – JosefZ May 05 '23 at 20:14
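
If the guess in the comments is right and the `-e` value is simply handed to Python's codec machinery (an assumption; this has not been checked against the csvkit source), then a candidate name can be tested with the standard library before passing it to csvcut. A minimal sketch, where `canonical_codec` is a hypothetical helper name:

```python
import codecs


def canonical_codec(name):
    """Return Python's canonical codec name, or None if the name is unknown.

    Assumption: csvkit passes the -e value to Python's codec lookup,
    so a name Python recognises is a plausible value for -e.
    """
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


# Names that resolve to the same canonical codec are aliases of each other.
for candidate in ["ASCII", "latin", "latin1", "latin-1", "iso-8859-1",
                  "cp1252", "not-a-codec"]:
    print(candidate, "->", canonical_codec(candidate))
```

Python's own documentation lists the available names under "Standard Encodings" in the `codecs` module reference.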

1 Answer


Thanks @skomisa: yes, latin, latin1, latin-1, and iso-8859-1 are all accepted by the -e command line option; all seem to do the same thing, and all allow csvcut to deal with the file.
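
For anyone hitting the same error, here is a small sketch in plain Python (not csvkit itself) of how a 0x92 byte behaves under a few codecs. The sample bytes and the `try_decode` helper are made up for illustration. One thing it shows: latin-1 accepts every byte value, so it never fails, but it maps 0x92 to the control character U+0092; it is cp1252 (Windows-1252) that maps 0x92 to the right single quotation mark U+2019. Whether `-e cp1252` works with csvcut is an assumption I have not verified, though it is a standard Python codec name.

```python
# Hypothetical sample: the word "it's" with a 0x92 byte as the apostrophe.
raw = b"it\x92s"


def try_decode(data, encoding):
    """Decode bytes with the given codec, or return a short error description."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"


# utf-8 and ascii reject the byte; latin-1 and cp1252 accept it
# but map it to different characters.
for enc in ["utf-8", "ascii", "latin-1", "cp1252"]:
    print(enc, repr(try_decode(raw, enc)))
```

So if the quotation marks matter in the output, it is worth checking which of the two single-byte codecs matches how the file was actually produced.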

Now I have another problem, but that's for another question!

Ed Beighe