
I have been trying for a week to tune a model with Vertex AI (Language > Tune a model), following this tutorial.

  • I have used the sample JSONL mentioned in the link:

{"input_text": "question: How many people live in Beijing? context: With over 21 million residents, Beijing is the world's most populous national capital city and is China's second largest city after Shanghai. It is located in Northern China, and is governed as a municipality under the direct administration of the State Council with 16 urban, suburban, and rural districts.[14] Beijing is mostly surrounded by Hebei Province with the exception of neighboring Tianjin to the southeast; together, the three divisions form the Jingjinji megalopolis and the national capital region of China.", "output_text": "over 21 million people"}
{"input_text": "question: How many parishes are there in Louisiana? context: The U.S. state of Louisiana is divided into 64 parishes (French: paroisses) in the same manner that 48 other states of the United States are divided into counties, and Alaska is divided into boroughs.", "output_text": "64"}

Response:

[
    {
        "data": {
            "ui": {
                "promptTuningDataValidation": null
            }
        },
        "errors": [
            {
                "message": "Internal error encountered.",
                "errorType": "DATA_FETCHING_EXCEPTION",
                "path": [
                    "ui",
                    "promptTuningDataValidation"
                ],
                "extensions": {
                    "status": {
                        "code": 13,
                        "message": "Internal error encountered."
                    }
                }
            }
        ],
        "path": [],
        "responseContext": {
            "eti": "AbmU+mq7y+LD3sXIvd0HiJ+o6+rTDYPJo/GpJvIKfhY05IvVdmxDqC42CN5gVLencEHr5W98Cj96m3Z+Hp98PScM2ITzs2rhvw1/hwP4N895kcTpJ2m1maUYnhvirPihskmaTvYX7ViruJVckQbxU9oAKrp4JgHQuGNkKLz9jTia61w3aA=="
        }
    },
    {
        "responseContext": {
            "eti": "AbmU+mqW/vpr54v4wP5AMw5cSKpYa6FCtZ+QFyPsEw3dj6C1PHPD8lXZF1lXltj5q8l8VuJ0ZO3dvVwMUG+b80GE5/Fwg3CK0BkRnlXtjciIvJn/AJfhrH4JVwQSjMcZs8RV+f648xiVPqRltAH2OK/CPAX+1C6e/EeVXl7MY2N94OI0TXv985NiQ3EB2KRAFapTdJTqTTvwIuvNXBNYBW/BJQEJAhq71JBhe4BYQ82cQh36zFYgN4asA5Uqo68Kn6Gy7sdmf/EU6zhNe9S9k4GJ1wI04MaSPZIBlwWp+Q=="
        }
    }
] 

1 Answer


Answering my own question (after a number of trial-and-error attempts):

  • Go to the Dashboard on Vertex AI (screenshot: https://i.stack.imgur.com/Dvf0g.png)

  • Click the "Enable all related API" button
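
For anyone who prefers the command line over the console button, here is a rough sketch of the same step. It assumes the `gcloud` CLI is installed and authenticated, and that the Vertex AI API (`aiplatform.googleapis.com`) is the service that matters here; the button may enable more APIs than this, and the answer above does not list them. The helper names and the project ID in the usage comment are made up for illustration.

```python
import subprocess

# Assumption: the core service behind the console button is the Vertex AI API.
# The button may enable additional APIs that are not listed in this thread.
VERTEX_AI_SERVICE = "aiplatform.googleapis.com"


def build_enable_command(project_id, services=(VERTEX_AI_SERVICE,)):
    """Build the `gcloud services enable` command as an argument list."""
    return ["gcloud", "services", "enable", *services, "--project", project_id]


def enable_services(project_id, services=(VERTEX_AI_SERVICE,)):
    """Run the command; raises CalledProcessError if gcloud reports a failure."""
    subprocess.run(build_enable_command(project_id, services), check=True)


# Usage (hypothetical project ID):
# enable_services("my-project-id")
```

Building the argument list separately makes it easy to print and inspect the command before actually running it.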

  • Hey Mohit, once you got past this step, did you receive any errors during the actual training? I've been getting "Internal error encountered. Please try again" and my tuning job never actually runs. – user3689720 May 29 '23 at 17:13
  • Yes, for the "Internal error encountered" error I found the cause after digging deep into the logs: `ERROR 2023-06-02T05:12:10.077057547Z [resource.labels.taskName: workerpool0-0] f'Dataset is inaccessible or contains less than {min_examples} '` – Mohit Arvind khakharia Jun 06 '23 at 18:28
  • Maybe separate issues are needed for these errors that can happen during training: `[2023-06-13 14:24:30,236] [ERROR]: The host has 4 TPU chips but TPU support is not linked into JAX.` and `workerpool0-0 [2023-06-13 14:24:30,236] [ERROR]: Transient Cloud TPU error. This Job will retry twice automatically. If the job consistently fails due to this error, some bad Cloud TPUs may have been reassigned to you repeatedly; please retry at another time or choose a different region.` – Mohit Arvind khakharia Jun 13 '23 at 16:26
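
Since the log message above points at the dataset ("inaccessible or contains less than {min_examples}"), it may help to sanity-check the JSONL file locally before uploading. Below is a minimal sketch, not the check Vertex AI itself runs: it assumes each non-empty line must be one JSON object with non-empty string `input_text` and `output_text` fields, as in the sample from the question. The function name `validate_jsonl` is made up, and `min_examples` has to be supplied by the caller because the real threshold is not shown in the log.

```python
import json

REQUIRED_KEYS = ("input_text", "output_text")


def validate_jsonl(lines, min_examples):
    """Return a list of problems found in an iterable of JSONL lines.

    `min_examples` is a caller-supplied threshold; the value Vertex AI
    actually enforces is not known from this thread.
    """
    problems = []
    valid = 0
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are skipped, not counted
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # Two objects on one line end up here with "Extra data".
            problems.append(f"line {number}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: expected a JSON object")
            continue
        bad = [key for key in REQUIRED_KEYS
               if not isinstance(record.get(key), str) or not record[key]]
        if bad:
            problems.append(f"line {number}: missing or empty {bad}")
            continue
        valid += 1
    if valid < min_examples:
        problems.append(
            f"only {valid} valid examples, expected at least {min_examples}")
    return problems


# Usage (hypothetical file name and threshold):
# with open("tuning_data.jsonl", encoding="utf-8") as handle:
#     for problem in validate_jsonl(handle, min_examples=10):
#         print(problem)
```

An empty result only means the file passes these local checks. The "inaccessible" half of the error (for example bucket permissions) is not covered here.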