
I'm trying to convert a JSON file to NDJSON (newline-delimited JSON). I'm reading the file from GCS (Google Cloud Storage). Sample data:

{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}{
  "Item1" : "INT",
  "Item2" : "INT",
  "Item3" : "text",
  "Item4" : "text",
  "Item5" : "Date"
}

Following is my code:

import json
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('bucket name')
# Blob (object) stored in the bucket
object_name_in_gcs_bucket = bucket.get_blob('file.json')
object_to_string = object_name_in_gcs_bucket.download_as_string()
#json_data = ndjson.loads(object_to_string)
# Parse each non-empty line as its own JSON document (this is the line that fails)
json_list = [json.loads(row.decode('utf-8')) for row in object_to_string.split(b'\n') if row]

The error I get on the json_list line is: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
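A quick way to see why that comprehension fails: splitting pretty-printed JSON on newlines yields fragments such as a lone { or a line like "Item1" : "INT", and none of these is a complete JSON document, so json.loads rejects the very first row. The sketch below uses a shortened, hypothetical sample string in place of the GCS download.

```python
import json

# Hypothetical stand-in for the downloaded bytes: two concatenated, pretty-printed objects.
sample = b'{\n  "Item1" : "INT",\n  "Item5" : "Date"\n}{\n  "Item1" : "INT",\n  "Item5" : "Date"\n}\n'

# Same splitting logic as in the question.
rows = [row for row in sample.split(b"\n") if row]
print(rows[0])  # b'{' -- only the opening brace, not a full JSON object

try:
    json.loads(rows[0].decode("utf-8"))
except json.JSONDecodeError as exc:
    print("first row is not valid JSON:", exc.msg)
```

Each object spans several lines, so it has to be parsed as a whole rather than line by line.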

Required output:

{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
{"Item1" : "INT","Item2" : "INT","Item3" : "text","Item4" : "text","Item5" : "Date"}
Soni Sol
Dr.teja
  • I think this could help https://stackoverflow.com/questions/51300674/converting-json-into-newline-delimited-json-in-python . If not, you may want to post a sample of what object_to_string's value is. – OneLiner Dec 10 '20 at 19:56
  • Your input data doesn’t appear to be valid JSON - it needs surrounding [] and a , between each }{ – DisappointedByUnaccountableMod Dec 10 '20 at 20:00
  • Your JSON is not valid. If this is the output from another tool, fix that tool. Otherwise you can use tricks like the first answer (which will break on more complex JSON) or hand-edit the file to make it valid JSON. – John Hanley Dec 10 '20 at 20:33
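The fix suggested in the comments (wrap the data in [] and put a comma between each }{) can be sketched as follows. It assumes objects are separated by } and { with only optional whitespace in between, and it will misbehave if that sequence ever appears inside a string value; the regex and the function name concatenated_to_list are illustrative, not from the original post.

```python
import json
import re


def concatenated_to_list(text):
    """Turn '{...}{...}' into a JSON array by inserting commas between objects.

    Assumption: objects are separated by '}' + optional whitespace + '{',
    and that pattern never occurs inside a string value.
    """
    fixed = "[" + re.sub(r"\}\s*\{", "},{", text.strip()) + "]"
    return json.loads(fixed)


sample = '{\n  "Item1" : "INT",\n  "Item5" : "Date"\n}{\n  "Item1" : "INT",\n  "Item5" : "Date"\n}\n'
records = concatenated_to_list(sample)

# One compact JSON object per line, i.e. NDJSON.
ndjson_text = "".join(json.dumps(r) + "\n" for r in records)
print(ndjson_text, end="")
```

This is the same "hand-edit into valid JSON" idea, just done programmatically before a single json.loads call.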

1 Answer


I think your main problem is that you are splitting on line endings instead of on the closing brace. Here is an example that accomplishes what I think you are trying to do.

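from json import loads, dumps

with open("test.json") as f:
  file_string = f.read()
  # Split on every closing brace, put the brace back and drop the newlines.
  # The last piece (whatever follows the final "}") is discarded.
  # Only safe for flat objects whose string values contain no "}".
  dicts = [loads(f"{x}}}".replace("\n","")) for x in file_string.split("}")[0:-1]]
  for d in dicts:
    print(d)

# "w" overwrites new.json on each run instead of appending duplicate lines.
with open("new.json", "w") as newf:
  for d in dicts:
    newf.write(f"{dumps(d)}\n")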

Output:

[root@foo home]# ./test.py
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
{'Item1': 'INT', 'Item2': 'INT', 'Item3': 'text', 'Item4': 'text', 'Item5': 'Date'}
[root@foo home]# cat new.json
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
{"Item1": "INT", "Item2": "INT", "Item3": "text", "Item4": "text", "Item5": "Date"}
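Splitting on } works for the flat sample but breaks as soon as an object is nested or a string value contains a brace. A more robust alternative (a different technique from the code above, not the answerer's) is json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and returns it together with the index where parsing stopped, so back-to-back objects can be read one after another. The sketch also shows a GCS round trip; it assumes a google-cloud-storage Bucket object (get_blob, blob, Blob.download_as_bytes, Blob.upload_from_string), and the helper names, target blob name and the application/x-ndjson content type are hypothetical choices.

```python
import json

_WHITESPACE = " \t\r\n"


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < length and text[pos] in _WHITESPACE:
            pos += 1
        if pos == length:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def to_ndjson(text):
    """Re-serialize each value on its own line (newline-delimited JSON)."""
    return "".join(json.dumps(obj) + "\n" for obj in iter_concatenated_json(text))


def convert_blob(bucket, source_name, target_name):
    """Hypothetical helper: read concatenated JSON from one blob, write NDJSON to another.

    Assumes `bucket` is a google.cloud.storage Bucket (or anything exposing the
    same get_blob/blob methods).
    """
    raw = bucket.get_blob(source_name).download_as_bytes().decode("utf-8")
    bucket.blob(target_name).upload_from_string(
        to_ndjson(raw), content_type="application/x-ndjson"
    )


# Nested objects and braces inside strings are handled by the real JSON parser.
sample = '{"Item1": "INT", "Item3": "a } brace"}{\n  "Item1" : "INT",\n  "nested" : {"x": 1}\n}\n'
print(to_ndjson(sample), end="")
```

With a real client this would be called as something like convert_blob(client.get_bucket('bucket name'), 'file.json', 'file.ndjson'). Because each object is parsed by json itself, no assumptions about brace placement or line layout are needed.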
OneLiner