nanoGPT with custom dataset

Question

I am trying to use nanoGPT from https://github.com/karpathy/nanoGPT on my custom input file.

I have posted this issue on the repo itself (issue 172) but have not received any response there, so I am looking for advice here on Stack Overflow.

Following the same steps described here for the Shakespeare input file, I tried this on a custom input file that contains multiple paragraphs with different headings. The file contents look like this:

Heading 1
Some information related to heading 1 goes here

Heading 2
Some information related to heading 2 goes here

The file contains 20 such paragraphs.
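For reference, here is a minimal sketch of a character-level encoding applied to text in this format. It assumes the custom data goes through the same approach as nanoGPT's `data/shakespeare_char/prepare.py` (a vocabulary of the sorted distinct characters, each mapped to an integer id). `build_vocab`, `encode`, and `decode` are hypothetical helper names for illustration, not nanoGPT's API. The point it shows is that headings, blank lines, and newlines are simply more characters in the vocabulary; the encoding step imposes no special layout.

```python
# Minimal sketch of a character-level tokenizer, assumed to mirror the approach
# of nanoGPT's data/shakespeare_char/prepare.py (not copied from it).

def build_vocab(text: str) -> tuple[dict[str, int], dict[int, str]]:
    """Map every distinct character in the text to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}
    return stoi, itos


def encode(text: str, stoi: dict[str, int]) -> list[int]:
    """Turn a string into a list of character ids."""
    return [stoi[c] for c in text]


def decode(ids: list[int], itos: dict[int, str]) -> str:
    """Turn a list of character ids back into a string."""
    return "".join(itos[i] for i in ids)


# Hypothetical input in the same shape as the custom file above.
sample = (
    "Heading 1\n"
    "Some information related to heading 1 goes here\n"
    "\n"
    "Heading 2\n"
    "Some information related to heading 2 goes here\n"
)

stoi, itos = build_vocab(sample)
ids = encode(sample, stoi)
print("vocab size:", len(stoi))
print("first ids:", ids[:10])
print("round trip ok:", decode(ids, itos) == sample)
```

Because the model only ever sees these character ids, it has to learn spelling, words, and the heading/paragraph layout entirely from the training text itself.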

The "prepare.py" and "train.py" scripts ran successfully on this input file. However, when I try to generate a sample, the output is garbled English like this:
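A sketch of the split step can help compare the size of such a dataset with the Shakespeare example (tiny Shakespeare is roughly 1.1 million characters). It assumes, as in the Shakespeare character example, a 90/10 train/validation split stored as 16-bit ids, and a context length of `block_size = 256` as in `config/train_shakespeare_char.py`; these are assumptions, so check them against your copy of the repo. The stdlib `array("H", ...)` stands in for the `np.uint16` arrays, and `prepare_splits`/`report` are hypothetical names. The 20-paragraph text below is synthetic, not the poster's data.

```python
from array import array


def prepare_splits(text: str, train_frac: float = 0.9):
    """Encode text character by character and split it into train/val id arrays."""
    stoi = {ch: i for i, ch in enumerate(sorted(set(text)))}
    ids = [stoi[c] for c in text]
    cut = int(len(ids) * train_frac)
    # Typecode "H" (unsigned short) stands in for the np.uint16 arrays that
    # the prepare step is assumed to write to train.bin / val.bin.
    return array("H", ids[:cut]), array("H", ids[cut:]), len(stoi)


def report(text: str, block_size: int = 256) -> dict:
    """Summarize split sizes and whether the val split holds one full context window."""
    train, val, vocab_size = prepare_splits(text)
    return {
        "chars": len(text),
        "vocab_size": vocab_size,
        "train_tokens": len(train),
        "val_tokens": len(val),
        # Sampling an input/target pair of length block_size needs
        # more than block_size consecutive tokens.
        "val_fits_one_block": len(val) > block_size,
    }


# Synthetic stand-in for a 20-paragraph file (hypothetical content).
paragraph = "Heading {n}\nSome information related to heading {n} goes here\n\n"
text = "".join(paragraph.format(n=n) for n in range(1, 21))
print(report(text))
```

With a dataset of this size the model sees only a few thousand characters at most, which is several hundred times less text than the Shakespeare example it was configured for.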

paceliYai ominger fromally.Satexas Stence taly mand gollag adeppirirhon temas poymais,Mcenterted Should had & days to suratication tEO - ande ymanaN tor reeson travel from the enterns tleat sompoyers asubve the candidate can travel cof grotef dosiction of inotis, an too coan cile verand ginald to Employees. All embent ire thang falor ind the to pacomvertaly of the is enotiry for

Is this input dataset format correct, or is something specific needed?

