I have a string of JSON data. Sometimes it is 'regular' JSON and sometimes it is in JSON Lines format. Here is how I currently test which format it is:

json_newlines = all([_line.strip()[-1].endswith((']', '}')) for _line in file_data.split('\n') if _line.strip()])

Is there a better way to do this, or does the above seem like a good way?

  • what do you call "regular" json or json-lines? please [edit] to show an example – Jean-François Fabre Jan 14 '19 at 20:25
  • Assuming [this](http://jsonlines.org/) is what you're referring to as "json-lines" or "json_newline", if the first line parses to a valid JSON object and there is any more data after, it must be JSON Lines because a standard JSON document would not parse that successfully. (Unless you're also trying to detect data that is not valid in either.) – glibdud Jan 14 '19 at 20:26
  • The best way would be for the generated document to have an extension that indicates the format, like `.ndjson`. Some people try to pack different JSON objects into the same file and, while you might detect this, there is no assurance that each delimited object is regular and has the same fields. – roganjosh Jan 14 '19 at 20:40
  • A single-line json-lines file is a “regular” JSON file too, I don't see how you could handle that. Furthermore the answer to your question depends on at least 2 things: whether you can assume files to be well formed, and whether you want to parse files at the same time you determine the format or not. – Walter Tross Jan 14 '19 at 22:34
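
The check suggested in the comments (parse the first value, then see whether any data follows it) could be sketched roughly as below, using only the standard library. The function name `detect_json_format` and the three labels it returns are assumptions made for illustration. A one-line input is reported as regular JSON, since the two formats cannot be told apart in that case.

```python
import json


def detect_json_format(text):
    """Classify text as 'json', 'jsonl', or 'invalid' (hypothetical helper)."""
    stripped = text.strip()
    if not stripped:
        return "invalid"
    try:
        # Parse only the first JSON value and note the index where it ends.
        _, end = json.JSONDecoder().raw_decode(stripped)
    except ValueError:
        return "invalid"
    # Nothing follows the first value: a single regular JSON document.
    if not stripped[end:].strip():
        return "json"
    # Data remains, so this cannot be one JSON document.
    # Confirm that every non-blank line is a JSON value on its own.
    try:
        for line in stripped.split("\n"):
            if line.strip():
                json.loads(line)
    except ValueError:
        return "invalid"
    return "jsonl"
```

Pretty-printed JSON that spans several lines is still classified as regular JSON here, because `raw_decode` consumes the whole document as the first value.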

1 Answer

You could use the built-in json library to try loading the data as regular JSON first. If that fails (the data is not a single valid JSON document), try again with the jsonlines package.

Your current solution is not wrong, but it has the drawback that you have to parse the entire string manually to check which format it is. My suggestion is to delegate that work to the built-in json library.

Example:

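import json
import jsonlines

# json_data is the string to inspect (file_data in the question).
try:
    # Regular JSON: the whole string is a single document.
    loaded_data = json.loads(json_data)
except ValueError:
    try:
        # Fall back to JSON Lines: one JSON value per line.
        lines = json_data.split('\n')
        reader = jsonlines.Reader(lines)
        # Collect every parsed value, skipping blank lines.
        loaded_data = list(reader.iter(skip_empty=True))
    except jsonlines.InvalidLineError as err:
        # Neither format parsed; handle the error here.
        raise ValueError('data is neither JSON nor JSON Lines') from err

# Work with loaded_data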
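
The same try-then-fall-back idea can also be sketched without the third-party jsonlines package, by parsing each non-blank line with `json.loads`. The helper name `load_json_or_jsonl` and its `(format, data)` return shape are assumptions made for illustration only.

```python
import json


def load_json_or_jsonl(text):
    """Return (format_name, data); hypothetical helper for illustration."""
    try:
        # Regular JSON: the whole string is one document.
        return "json", json.loads(text)
    except ValueError:
        pass
    # Fallback: treat each non-blank line as its own JSON value.
    # A malformed line raises ValueError (json.JSONDecodeError) to the caller.
    records = [json.loads(line) for line in text.split("\n") if line.strip()]
    if not records:
        raise ValueError("input contains no JSON data")
    return "jsonl", records
```

Returning the format name alongside the data lets the caller know whether it received a single document or a list of per-line records.
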
Raydel Miranda