I am trying to create the JSONL training files for AutoML Natural Language, and the docs say:

To help you create JSONL training files, AutoML Natural Language offers a Python script that converts plain text files into appropriately formatted JSONL files. See the comments in the script for details.

I tried to follow the comments in the script but didn't understand them. I tried running it like this:

python jason.py C:\..dic.csv C:\..text.txt gs://mybucket

but it gives me:

(with 5 blank lines skipped)
Traceback (most recent call last):
  File "jason.py", line 688, in <module>
    main()
  File "jason.py", line 680, in main
    UploadFiles(annotated_files, FLAGS.target_gcs_directory)
  File "jason.py", line 636, in UploadFiles
    f.write(csv_line)
TypeError: write() argument must be str, not bytes

Can anyone help me with an example of how to run the script, please?

  • `csv_line` is of type `bytes` because of the call to `.encode` in this line: `csv_line = (converted_file.ml_use + ',' + dst_path + '\n').encode('utf8')`, but you've opened your file in text mode with this line: `with open(csv_file_path, 'w') as f:`. A quick fix would be to change the mode to `'wb'`. A better fix would be to open the file with `encoding='...'` and use a CSV writer. – Feb 23 '21 at 16:25
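
The "better fix" from the comment above could look roughly like the sketch below. It is not the converter script's actual code: `write_csv_index` is a hypothetical helper name, and the sample rows are made up for illustration. Only `csv_file_path`, `ml_use`, and `dst_path` come from the traceback and the comment. The idea is to open the file in text mode with an explicit encoding and let `csv.writer` build each line, so no manual `.encode('utf8')` is needed under Python 3.

```python
import csv
import os
import tempfile


def write_csv_index(csv_file_path, rows):
    """Write (ml_use, dst_path) pairs as one CSV row each (hypothetical helper)."""
    # Text mode with an explicit encoding replaces the manual .encode('utf8');
    # newline='' lets the csv module control the line endings itself.
    with open(csv_file_path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f, lineterminator='\n')
        for ml_use, dst_path in rows:
            writer.writerow([ml_use, dst_path])


if __name__ == '__main__':
    # Made-up sample data, only to show the call.
    rows = [('TRAIN', 'gs://your-bucket/sample_1.jsonl')]
    path = os.path.join(tempfile.mkdtemp(), 'index.csv')
    write_csv_index(path, rows)
    with open(path, encoding='utf-8') as f:
        print(f.read(), end='')
```

The quick fix (`open(csv_file_path, 'wb')`) also removes the `TypeError`, but it keeps the hand-built comma-joined line, which would break if a path ever contained a comma or quote.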

1 Answer

The provided tool was written for Python 2. You can run `python2 jsonl_converter.py -s sample_1.txt gs://your-bucket` so that you don't have to edit the provided code. Alternatively, if you need to run it in Python 3, you can follow @Justin Ezequiel's suggestion. I used the `-s` option to automatically split long files.

Test using Python 2: [screenshot]

JSONL in the designated GCS bucket: [screenshot]

Ricco D