    import json
    file = open('webtext.txt', 'a+')
    
    with open('output-dataset_v1_webtext.test.jsonl') as json_file:
        data = json.load(json_file)
        for item in data:
            file.write(item)
            print(item)
    
    
 
I am getting this error:
    
        raise JSONDecodeError("Extra data", s, end)
    json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 656)

I have already tried json.loads() as well.

My JSON file contains multiple objects, one per line:

{"id": 255000, "ended": true, "length": 134, "text": "Is this restaurant fami"}
{"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of 'refle"}

Any advice on how to resolve this error and write each object's dict['text'] value into a text file would be highly appreciated.

Faisal

4 Answers


I'm certainly not a JSON expert, so there might be a better way to do this, but you should be able to resolve your issue by putting your top-level data into an array:

[
{"id": 255000, "ended": true, "length": 134, "text": "Is this restaurant fami"},
{"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of 'refle"}
]

The error you're getting is basically telling you that a JSON document may contain only one top-level value. If you want more than one, they have to be put in an array.
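Here is a small sketch that reproduces the "Extra data" error with two made-up records and shows that wrapping them in an array fixes the parse. The sample strings are assumptions for illustration, not the real file:

```python
import json

# Two JSON objects on separate lines, like the question's .jsonl file
two_objects = '{"id": 1, "text": "a"}\n{"id": 2, "text": "b"}'

try:
    json.loads(two_objects)  # parses the first object, then finds more data
except json.JSONDecodeError as err:
    print(err)  # Extra data: line 2 column 1 (char 23)

# Joining the lines with commas inside [ ] gives one top-level value
as_array = "[" + ",".join(two_objects.splitlines()) + "]"
records = json.loads(as_array)
print([r["text"] for r in records])
```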

jofrev

It looks like you need to iterate over each line in the file and parse each line separately with json.loads.

Ex:

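import json

with open('output-dataset_v1_webtext.test.jsonl', encoding='utf-8') as json_file, \
        open('webtext.txt', 'a', encoding='utf-8') as file:
    for line in json_file:   # Iterate over each line
        if not line.strip():   # Skip blank lines
            continue
        data = json.loads(line.strip())   # Each line is one JSON object
        file.write(data['text'] + '\n')   # Write only the text field
        print(data['text'])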
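The same line-by-line approach can be wrapped in a small generator that yields only the `text` field, which is what the question is after. `iter_texts` is a hypothetical helper name, and the sketch assumes every non-blank line is one complete JSON object:

```python
import json
from typing import Iterable, Iterator


def iter_texts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the "text" field of each JSON Lines record (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, e.g. an empty line at the end of the file
            continue
        record = json.loads(line)  # each line holds exactly one JSON object
        yield record["text"]


# Usage with the real file (a file object yields its lines one by one):
# with open('output-dataset_v1_webtext.test.jsonl', encoding='utf-8') as src, \
#         open('webtext.txt', 'a', encoding='utf-8') as out:
#     for text in iter_texts(src):
#         out.write(text + '\n')
```

Because it is a generator, only one record is held in memory at a time, which matters for a large file.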
Rakesh
  • Can you add `print(line)` before `json.loads` and post the output here? – Rakesh Nov 19 '19 at 11:31
  • Yes, here is the printed line: {"id": 259607, "ended": false, "length": 1024, "text": "Those who witnessed the fatal attack on Saturday, – Faisal Nov 19 '19 at 11:41
  • Looks like you have some additional data at the end of each line... e.g. `{"id": 255000, "ended": true, "length": 134, "text": "Is this restaurant fami"} DDD` – Rakesh Nov 19 '19 at 11:44
  • The text field contains `\n` newlines within the text, but after each closing brace `}` a new line starts with `{`. – Faisal Nov 19 '19 at 11:52
  • Here is the link to the original file: [link](https://storage.cloud.google.com/gpt-2/output-dataset/v1/webtext.test.jsonl). Thanks – Faisal Nov 19 '19 at 11:55
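The worry about `\n` inside the text field can be checked directly. Assuming the file was written by a standard JSON encoder, newlines inside strings are escaped as the two characters `\` and `n`, so each record still occupies exactly one physical line and splitting the file on line breaks is safe. The record below is made up for illustration:

```python
import json

# A record whose text contains a real newline character
record = {"id": 1, "text": "first line\nsecond line"}

encoded = json.dumps(record)
print(encoded)                        # the newline is written as the escape \n
print("\n" in encoded)                # False: the serialized record is one line
print(json.loads(encoded) == record)  # True: decoding restores the real newline
```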

You need to loop through the file line by line:

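import json


with open('output-dataset_v1_webtext.test.jsonl', 'r', encoding='utf-8') as json_file:
    for line in json_file:          # the file object yields one line at a time
        data = json.loads(line)     # each line is a complete JSON object
        print(data['text'])         # print only the text field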
LinPy
  • Then please provide the actual file you are working on; this works fine with your sample file. – LinPy Nov 19 '19 at 11:34
  • Here is the link to the [original file](https://storage.cloud.google.com/gpt-2/output-dataset/v1/webtext.test.jsonl) if you are able to look into it. Thanks. – Faisal Nov 19 '19 at 12:00
  • I am getting this error while writing the file: `return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u30b9' in position 0: character maps to <undefined>` – Faisal Nov 19 '19 at 12:23
  • I really did not face any problem with the file; I can also write it without problems. – LinPy Nov 19 '19 at 12:28
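The `'charmap' codec` error usually shows up on Windows, where `open()` without an `encoding` argument falls back to a locale code page such as cp1252, which has no katakana characters like `'\u30b9'`. That would also explain why the same code works for LinPy. A sketch, assuming cp1252 is the default in question and using a temporary path instead of the real `webtext.txt`:

```python
import os
import tempfile

ch = "\u30b9"  # the character from the error message above

try:
    ch.encode("cp1252")  # a typical Windows default encoding ("charmap" codec)
except UnicodeEncodeError as err:
    print("cp1252 failed:", err.reason)

# UTF-8 can encode any Unicode character, so pass it explicitly when opening
path = os.path.join(tempfile.mkdtemp(), "webtext.txt")
with open(path, "a", encoding="utf-8") as out:
    out.write(ch + "\n")
```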

As others have pointed out, your JSON must be surrounded by square brackets, because a JSON document can have only one top-level value, like this:

[
  {"id": 255000,"ended": true, "length": 134, "text": "Is this restaurant fami"},
  {"id": 255001, "ended": true, "length": 713, "text": "Clinton talks about her time of 'refle"}
]

Then you should be able to use this code to do what you're trying:

import json

with open('test.json') as json_file, open('webtext.txt', 'a') as file:
    data = json.load(json_file)  # one top-level array, so json.load works
    for item in data:
        file.write(str(item) + '\n')  # one record per line
        print(item)

To fix your file.write issue, convert each item to a string with str(item), because file.write() only accepts strings. If you only want the text, write item['text'] instead.
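A quick sketch of why the conversion is needed: text-mode file objects only accept `str`, so passing a dict raises `TypeError`. Here `io.StringIO` stands in for the opened text file, and the record is taken from the question's sample:

```python
import io

out = io.StringIO()  # stands in for the opened text file
item = {"id": 255000, "text": "Is this restaurant fami"}

try:
    out.write(item)  # text streams accept only str
except TypeError as err:
    print("write(dict) failed:", err)

out.write(str(item) + "\n")     # the whole dict, as its repr
out.write(item["text"] + "\n")  # only the text field, as the question asks
print(out.getvalue())
```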

banf
  • I get this error: TypeError: the JSON object must be str, bytes or bytearray, not 'TextIOWrapper' – Faisal Nov 19 '19 at 11:35
  • Are you using `json.loads` rather than `json.load`? If you are, try it with `json.load`. – banf Nov 19 '19 at 11:40
  • How can I put `[]` around the data? The file is very big, so where can I open it to add the `[]`? – Faisal Nov 19 '19 at 11:47
  • Your file is a `.jsonl` (JSON Lines) file, meaning each record ends with a `"\n"` character rather than a comma, and the whole file is not surrounded by `[ ]`. You have two options: convert the file into traditional JSON, which you can look up [here](https://www.google.com/search?&rls=en&q=convert+jsonl+to+json&ie=UTF-8&oe=UTF-8), or find another way to read it. I'm afraid I can't help with JSON Lines files, as I'm not experienced with them. – banf Nov 19 '19 at 11:55
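For a file too big to edit by hand, the conversion to a traditional JSON array can be streamed line by line instead of adding the `[ ]` manually. This is a sketch with a hypothetical helper name, `jsonl_to_json_array`, demonstrated on in-memory data; for the real file you would pass open file objects (opened with `encoding='utf-8'`). Note that loading the resulting array with `json.load` still reads everything into memory, so reading the `.jsonl` line by line is usually the simpler route:

```python
import io
import json
from typing import TextIO


def jsonl_to_json_array(src: TextIO, dst: TextIO) -> int:
    """Copy JSON Lines records from src into one JSON array in dst.

    Reads one line at a time, so the input never has to fit in memory.
    Returns the number of records written.
    """
    count = 0
    dst.write("[\n")
    for line in src:
        line = line.strip()
        if not line:
            continue          # ignore blank lines
        json.loads(line)      # validate the record before copying it
        if count:
            dst.write(",\n")  # comma between records, none after the last one
        dst.write(line)
        count += 1
    dst.write("\n]\n")
    return count


# Demo on in-memory data standing in for the .jsonl file
src = io.StringIO('{"id": 1, "text": "a"}\n{"id": 2, "text": "b"}\n')
dst = io.StringIO()
print(jsonl_to_json_array(src, dst))
print([r["text"] for r in json.loads(dst.getvalue())])
```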