3

I need to write my program's output to a file. However, some of the fields already contain spaces, commas, semicolons, or tabs, so I do not want to use any of those as the field separator. The data comes from the web, so some of it may be badly formatted by the server admins who produced it.

I thought of using a made-up string, like my name, but that could look unprofessional if the output is used by other researchers.

Are there any recommendations on this? What should I use if I am afraid to use commas, semicolons, tabs, or spaces as the separator?

EDIT: For the answers suggesting the json or csv module, please note that I want to load the file into a MySQL database, where I can simply tell MySQL that fields are separated by [some separator]. I also need a simple solution.

user9371654

3 Answers

5

Use commas (or tabs), but use a proper serializer that knows how to escape characters on write, and unescape them on read. The csv module knows how to do this and seems to match your likely requirements.

Yes, you could try to find some random character that never appears in your data, but that just means you'll die horribly if that character ever does appear, and it means producing output that no existing parser knows how to handle. CSV is a well-known format (if surprisingly complex, with varied dialects), and can likely be parsed by existing libraries in whatever language needs to consume it.

JSON (handled in Python by the json module) is often useful as well, since it is a language-agnostic format, as is pickle (though it's only readable in Python), but from what you described, CSV is probably the go-to solution to start with.
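A minimal sketch of the csv approach, assuming made-up sample rows (the `rows` data and the helper names `write_rows` and `read_rows` are hypothetical, not from the question). It writes fields that contain commas, tabs, semicolons, quotes, and a newline, then reads them back unchanged:

```python
import csv
import io

# Hypothetical sample rows; the fields deliberately contain commas, tabs,
# semicolons, double quotes, and an embedded newline.
rows = [
    ["id", "title", "note"],
    ["1", "Hello, world", "tab\there; semicolon"],
    ["2", 'She said "hi"', "line one\nline two"],
]


def write_rows(rows):
    """Serialize rows to CSV text; the writer quotes fields where needed."""
    buffer = io.StringIO(newline="")  # newline="" leaves line endings to csv
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text, newline="")))


text = write_rows(rows)
restored = read_rows(text)
print(restored == rows)
```

To write to a real file instead of an in-memory buffer, the csv documentation recommends opening it with `newline=""`. On the MySQL side, `LOAD DATA` has `FIELDS TERMINATED BY` and `OPTIONALLY ENCLOSED BY` clauses for quoted fields, though the exact options needed depend on the data.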

ShadowRanger
1

Generally, a good separator is any ordinary, keyboard-typable symbol that does not appear anywhere else in the data. My suggestion would be either '|' or '/'.
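Since this only works if the symbol really is absent from the data, it may be worth checking before writing. A rough sketch, assuming the rows fit in memory as lists of strings (`find_unused_delimiter` and the sample rows are hypothetical):

```python
def find_unused_delimiter(rows, candidates=("|", "/")):
    """Return the first candidate that appears in no field, or None."""
    for candidate in candidates:
        if not any(candidate in field for row in rows for field in row):
            return candidate
    return None


# Hypothetical sample data: the first set contains "|" but not "/".
clean_rows = [["alpha", "beta|gamma"], ["delta", "epsilon"]]
# Web data often contains "/" (for example in URLs), so both candidates fail.
web_rows = [["a|b", "http://example.com/page"]]

print(find_unused_delimiter(clean_rows))
print(find_unused_delimiter(web_rows))
```

A result of `None` means every candidate occurs somewhere in the data, in which case a format with quoting or escaping is needed instead.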

Dan Lewis
1

CSV files typically use quotes around fields that contain field separator characters, and use a backslash or another quote to escape a literal quote.
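As an illustration, here is a small sketch using Python's csv module (the helper `to_csv_line` and the sample row are made up for this example). By default the writer doubles a literal quote; passing `doublequote=False` with an `escapechar` switches to backslash escaping:

```python
import csv
import io


def to_csv_line(row, **fmt):
    """Format one row as a CSV line, using any extra csv formatting options."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n", **fmt).writerow(row)
    return buffer.getvalue()


# Hypothetical row: one plain field, one with a comma, one with quotes.
row = ["plain", "a,b", 'say "hi"']

# Default dialect: fields with separators are quoted, quotes are doubled.
doubled = to_csv_line(row)
print(doubled, end="")  # plain,"a,b","say ""hi"""

# Backslash variant: the reader must be given the same options as the writer.
escaped = to_csv_line(row, doublequote=False, escapechar="\\")
parsed = next(csv.reader(io.StringIO(escaped), doublequote=False, escapechar="\\"))
print(parsed == row)
```

Whichever convention is used, the program that reads the file has to be configured for the same one.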

CSV is not a well-defined format, however, and there are many variants implemented by different vendors. If you want a more well-rounded text format that can store structured data, you should look into one of the better-defined serialization formats, such as JSON or YAML, instead.
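For example, one simple layout is to write each record as a JSON array on its own line. This is only a sketch with made-up records (`records`, `dump_lines`, and `load_lines` are hypothetical names), and the consumer would need to parse JSON rather than split on a separator:

```python
import json

# Hypothetical records containing commas, tabs, quotes, and a newline.
records = [
    ["1", "Hello, world", "tab\there"],
    ["2", 'She said "hi"', "line one\nline two"],
]


def dump_lines(records):
    """Encode each record as one JSON array per line."""
    return [json.dumps(record) for record in records]


def load_lines(lines):
    """Decode lines produced by dump_lines back into records."""
    return [json.loads(line) for line in lines]


lines = dump_lines(records)
for line in lines:
    print(line)

print(load_lines(lines) == records)
```

Because JSON escapes newlines inside strings, every record stays on a single physical line.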

blhsing
  • This is very good information to know. You can confirm this yourself by creating a Google Sheets document and populating it with text. Download the file in .csv format and you'll see that cells with commas are placed within quotation marks. – Nooner Bear Dec 19 '22 at 20:27