I would like to create a dataset for fine-tuning GPT-3. According to the fine-tuning guide at https://beta.openai.com/docs/guides/fine-tuning, the dataset should look like this:

{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...

For this reason, I am creating the dataset in the following way:

import json

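# Data to be written: a list of prompt/completion records
# (the name "dictionary" is kept, although this is a list of dicts)
dictionary = [
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
]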

with open("sample2.json", "w") as outfile:
    json.dump(dictionary, outfile)

However, when I load it back, I get a single list rather than one JSON object per line, which is not what I want:

import json
 
# Opening JSON file
with open('sample2.json', 'r') as openfile:
 
    # Reading from json file
    json_object = json.load(openfile)
 
print(json_object)
print(type(json_object))

>> [{'prompt': '<text1>', 'completion': '<text to be generated1>'}, {'prompt': '<text2>', 'completion': '<text to be generated2>'}]
<class 'list'>

Could you please let me know how I can solve this problem?

1 Answer

The expected format writes a newline character (\n) after each JSON object, so that every line of the file is a complete JSON document on its own. This is the JSON Lines format. (The jsonlines link gave me a "server not found" error.)
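As a minimal sketch of the difference between one JSON array and one JSON object per line, using only the standard library. The name records and the helper variables are my own for illustration; the sample values mirror the question.

```python
import json

# Placeholder records mirroring the question (illustrative values only).
records = [
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
]

# Serializing the whole list produces ONE JSON document: an array.
as_array = json.dumps(records)

# JSON Lines: serialize each record separately and end each one with "\n".
as_jsonl = "".join(json.dumps(record) + "\n" for record in records)

print(as_array)
print(as_jsonl, end="")
```

The first print shows a single bracketed array; the second shows two lines, each of which can be parsed independently with json.loads.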

You have these options:

  1. Write \n after each record:
import json
with open("sample2_op1.json", "w") as outfile:
    for e_json in dictionary:
        json.dump(e_json, outfile)
        outfile.write('\n')
# Read the file: each record ends with \n, so read line by line and parse each line as JSON
with open("sample2_op1.json", "r") as file:
    for line in file:
        record = json.loads(line)  # parse once and reuse the result
        print(record, type(record))
  2. Use the jsonlines module, which also provides a way to read the file. Install it with !pip install jsonlines
import jsonlines
#write to file
with jsonlines.open('sample2_op2.jsonl', 'w') as outfile:
    outfile.write_all(dictionary)
#read the file
with jsonlines.open('sample2_op2.jsonl') as reader:
    for obj in reader:
        print(obj)
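To illustrate why calling json.load on a file written this way raises an error, and how to collect all records into a list instead, here is a hedged sketch using only the standard library. The temporary path and the names records, load_failed, and loaded are my own assumptions, not part of the options above.

```python
import json
import os
import tempfile

# Placeholder records mirroring the question (illustrative values only).
records = [
    {"prompt": "<text1>", "completion": "<text to be generated1>"},
    {"prompt": "<text2>", "completion": "<text to be generated2>"},
]

# Write one JSON object per line to a temporary file (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(path, "w", encoding="utf-8") as outfile:
    for record in records:
        outfile.write(json.dumps(record) + "\n")

# json.load expects exactly one JSON document in the file,
# so it raises JSONDecodeError when it reaches the second line.
try:
    with open(path, "r", encoding="utf-8") as infile:
        json.load(infile)
    load_failed = False
except json.JSONDecodeError:
    load_failed = True

# Parsing line by line works; blank lines are skipped defensively.
with open(path, "r", encoding="utf-8") as infile:
    loaded = [json.loads(line) for line in infile if line.strip()]

print(load_failed)
print(loaded == records)
```

The point of the sketch is that a JSON Lines file is not itself a single JSON document, so it has to be parsed one line at a time.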
simpleApp
  • Thank you so much for your help! Could you please let me know how to open the file written with the first option? If I use with open('sample2_op1.json', 'r') as openfile: json_object = json.load(openfile), followed by print(json_object) and print(type(json_object)), there is an error. – John Angelopoulos Dec 02 '22 at 14:31
  • You are welcome! I added code for reading the JSONL file, in case it is needed for some use cases. – simpleApp Dec 02 '22 at 15:00