Prepare json file for GPT

Question

I would like to create a dataset for fine-tuning GPT-3. According to https://beta.openai.com/docs/guides/fine-tuning, the dataset should look like this:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
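Each line is a standalone JSON object with two string fields. As a rough sketch, one line of such a file can be parsed and checked as shown below. The helper name `check_finetune_line` is hypothetical, and it only checks the shape shown above, not every rule the fine-tuning service may apply.

```python
import json

def check_finetune_line(line: str) -> dict:
    """Parse one JSONL line and check it has string 'prompt' and 'completion' fields.

    Hypothetical helper for illustration: it checks only the shape shown above,
    not every rule the fine-tuning endpoint may enforce.
    """
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    for key in ("prompt", "completion"):
        if not isinstance(record.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    return record

sample = '{"prompt": "<prompt text>", "completion": "<ideal generated text>"}'
print(check_finetune_line(sample))
```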

So I am creating the dataset in the following way:

import json

# Data to be written
dictionary = (
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
)

with open("sample2.json", "w") as outfile:
    json.dump(dictionary, outfile)

However, when I load it back, I get a single list instead of one object per line, which is not what I want:

import json

# Opening JSON file
with open('sample2.json', 'r') as openfile:
    # Reading from json file
    json_object = json.load(openfile)

print(json_object)
print(type(json_object))

>> [{'prompt': '<text1>', 'completion': '<text to be generated1>'}, {'prompt': '<text2>', 'completion': '<text to be generated2>'}]
<class 'list'>
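The result is a list because `dictionary` is actually a tuple of two dicts (the comma between the two `{...}` blocks builds a tuple), and `json.dump` writes a tuple as one JSON array. A small sketch that reproduces this with `json.dumps`, the string-returning counterpart of `json.dump`:

```python
import json

# Same shape as `dictionary` above: the comma makes it a tuple of two dicts
records = (
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
)

text = json.dumps(records)                 # a tuple is serialized as a single JSON array
print(text.startswith("["))                # the whole output is one array
print(json.loads(text) == list(records))   # and it loads back as a list
```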

Could you please let me know how I can solve this problem?

simpleApp · Accepted Answer · 2022-12-02T14:59:31.967


The format you need is JSON Lines: write a newline character (\n) after each JSON object, so that each line is one complete JSON document. For some reason, the jsonlines link throws a "server not found" error for me.
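In other words, the target text is the JSON encoding of each record, each followed by a newline. A minimal in-memory sketch, where the `records` list stands in for the question's data:

```python
import json

records = [
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
]

# One json.dumps call per record, each followed by a newline character
jsonl_text = "".join(json.dumps(r) + "\n" for r in records)
print(jsonl_text, end="")

# Every line is now a complete JSON document on its own
parsed = [json.loads(line) for line in jsonl_text.splitlines()]
print(parsed == records)
```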

You have two options:

Option 1: write \n after each JSON object:

import json

# Write one JSON object per line; `dictionary` is the tuple of records from the question
with open("sample2_op1.json", "w") as outfile:
    for e_json in dictionary:
        json.dump(e_json, outfile)
        outfile.write('\n')

# Read the file back: each line holds one JSON object, so parse it line by line
with open("sample2_op1.json", "r") as file:
    for line in file:
        obj = json.loads(line)
        print(obj, type(obj))
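`json.load` expects the whole file to be a single JSON document, so on a file with one object per line it stops after the first object with an "Extra data" error. A minimal sketch of loading such a file into a list instead; `load_jsonl` is a hypothetical helper name, and skipping blank lines is an assumption made here for robustness:

```python
import json
import os
import tempfile

def load_jsonl(path):
    """Return a list with one parsed object per non-blank line of a JSON Lines file."""
    records = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip blank lines (assumption, e.g. a stray empty last line)
                records.append(json.loads(line))
    return records

# Demo: write two records one per line into a temporary file, then load them back
data = [{"prompt": "<text1>", "completion": "<text to be generated1>"},
        {"prompt": "<text2>", "completion": "<text to be generated2>"}]
path = os.path.join(tempfile.mkdtemp(), "sample2_op1.json")
with open(path, "w", encoding="utf-8") as f:
    for rec in data:
        f.write(json.dumps(rec) + "\n")

print(load_jsonl(path) == data)

# json.load on the same file fails because it holds more than one JSON document
load_error = None
try:
    with open(path, "r", encoding="utf-8") as f:
        json.load(f)
except json.JSONDecodeError as err:
    load_error = err.msg
print("json.load failed:", load_error)
```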

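Option 2: use the jsonlines module, which also has a way to read the file back. Install it with !pip install jsonlines (the ! prefix is for notebooks such as Jupyter or Colab; drop it in a terminal).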

import jsonlines

# Write all records to the file, one JSON object per line
with jsonlines.open('sample2_op2.jsonl', 'w') as outfile:
    outfile.write_all(dictionary)

# Read the file back, one object at a time
with jsonlines.open('sample2_op2.jsonl') as reader:
    for obj in reader:
        print(obj)

edited Dec 02 '22 at 14:59

answered Dec 02 '22 at 14:17

simpleApp


Thank you so much for your help! Could you please let me know how to open the file using the first option? If I do this: '# Opening JSON file with open('sample2_op1.json', 'r') as openfile: # Reading from json file json_object = json.load(openfile) print(json_object) print(type(json_object))', I get an error. – John Angelopoulos Dec 02 '22 at 14:31

You're welcome! I added how to read the JSONL file, in case it's needed for some use cases. – simpleApp Dec 02 '22 at 15:00
