The code below loads a model already trained in Vertex AI and runs a pipeline for batch predictions. However, I get a JSON decoder error and cannot figure out where it comes from. The input file is in JSONL format, and batch prediction works fine when I run it manually from the Vertex AI dashboard, so something must be wrong in my pipeline that I cannot see.

Any help?

 import kfp
 import google.cloud.aiplatform as aip
 from google_cloud_pipeline_components import aiplatform as gcc_aip
 
 import datetime
 
 from kfp.v2 import compiler 
 from kfp.v2.dsl import component, Artifact, Output

 PROJECT_ID='my-project-id'
 REGION='europe-west4'
 SOURCE_ROOT='gs://source_root/'
 JSONL_FILE='input.jsonl'
 DESTINATION_OUTPUT='gs://destination_output'
 PIPELINE_ROOT='gs://bucket/pipeline_root/'
 MODEL_ID='vertexai-model-id'

 ts = int(datetime.datetime.utcnow().timestamp() * 100000)

 @component()
 def load_ml_model(project_id: str, model: Output[Artifact]):
     """Load existing Vertex model"""
     region='europe-west4'
     model_id=MODEL_ID
     model_uid=f'projects/{project_id}/locations/{region}/models/{model_id}'
     model.uri = model_uid
     model.metadata['resourceName'] = model_uid

@kfp.dsl.pipeline(
    name='batch-pipe'+str(ts),
    pipeline_root=PIPELINE_ROOT)
def pipeline(project_id: str):
    ml_model=load_ml_model(project_id='my-project-id')

    model_batch_pred_op = gcc_aip.ModelBatchPredictOp(
         project=project_id,
         location=REGION,
         job_display_name='batch-pred',
         model=ml_model.outputs['model'],
         gcs_source_uris=f'gs://source_root/input.jsonl',
         gcs_destination_output_uri_prefix=f'gs://destination_output/'
        )

compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path="text_class_pipeline.json",
)

def run_batch_pred(project_id, region):
    aip.init(
        project=project_id,
        location=region,
    )

    # Build and run the job inside the function, where project_id is defined
    job = aip.PipelineJob(
        project=project_id,
        display_name='batch_pipeline',
        template_path='text_class_pipeline.json',
        pipeline_root=PIPELINE_ROOT,
        parameter_values={'project_id': project_id},
    )

    job.run()

run_batch_pred(project_id=PROJECT_ID, region=REGION)

The error I get:

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 217 (char 216)

The model-loading step completes successfully; it is the batch prediction stage that fails.
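The position in a `JSONDecodeError` (here `char 216`) is an offset into the string being parsed, so printing the text around that offset usually shows what the decoder choked on. A minimal sketch of that idea, using only the standard library; the helper name `show_decode_error` and the sample payload are hypothetical and are not taken from the pipeline above:

```python
import json


def show_decode_error(text: str, width: int = 20) -> str:
    """Return 'ok' if text parses as JSON, else a report with nearby context."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the 0-based character offset reported as "char N".
        start = max(err.pos - width, 0)
        snippet = text[start:err.pos + width]
        return (
            f"{err.msg} at line {err.lineno} column {err.colno} "
            f"(char {err.pos}): {snippet!r}"
        )
    return "ok"


# Hypothetical payload: valid JSON up to a bare, unquoted value.
bad_payload = '{"model": projects/my-project-id}'
print(show_decode_error(bad_payload))
```

If the string that failed to parse can be recovered from the full traceback or the component logs, running it through a helper like this may point at the offending field.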

Annalix
  • There is some problem with the JSON file: either you are loading it wrong or you are loading an empty file. Post the full error starting from `Traceback (most recent call last):`, so we can see the file path. Double-check the JSON file and try to validate it. – ewertonvsilva Jan 28 '22 at 14:47
  • Thanks, I have already validated it by running batch predictions from the VertexAI GUI and it works fine. I couldn't find the file path from the Traceback. I will double check. Thanks – Annalix Jan 28 '22 at 15:11
  • Unfortunately not. I think the problem is in load_ml_model, since this is the only stage that can give errors. I am still working on it and hope to come back soon. Thanks – Annalix Feb 02 '22 at 15:28
  • @Annalix same problem here. Did you find an answer? – Max Dec 27 '22 at 20:33
  • @Max I have posted the solution. Please let me know if something is unclear – Annalix Jan 02 '23 at 16:08
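The validation suggested in the comments can also be done locally, independent of the Vertex AI GUI. A minimal sketch, assuming the input is one JSON object per line; the helper name `find_invalid_jsonl_lines` is hypothetical, and treating blank lines as problems is a choice made for this sketch rather than a documented Vertex AI rule:

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_jsonl_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            # Assumption of this sketch: blank lines are reported, not skipped.
            problems.append((number, "empty line"))
            continue
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"{err.msg} (column {err.colno})"))
    return problems


# Usage with in-memory sample lines; a local copy of input.jsonl could be
# checked the same way by passing an open file object instead.
sample = ['{"content": "first instance"}\n', '\n', '{bad}\n']
for number, reason in find_invalid_jsonl_lines(sample):
    print(f"line {number}: {reason}")
```

An empty result means every line parsed, which would suggest the decode error comes from something other than the input file.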

1 Answer

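Use two steps in the pipeline: import the existing model as an artifact with `kfp.dsl.importer` (instead of a custom component), then pass that artifact to the batch prediction op.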

# Assumed import path for VertexModel; it may differ between
# google_cloud_pipeline_components versions.
from google_cloud_pipeline_components.types.artifact_types import VertexModel

model_uri = f'projects/{PROJECT_ID}/locations/{REGION}/models/{MODEL_ID}'

@kfp.dsl.pipeline(
    name='batch-pipe'+str(ts),
    pipeline_root=PIPELINE_ROOT)
def pipeline(project_id: str):
    # Load the existing Vertex model as an artifact
    VertexModelArtifact = kfp.dsl.importer(
        artifact_uri=model_uri,
        artifact_class=VertexModel,
        metadata={"resourceName": model_uri},
        reimport=False)

    # Batch predictions
    batch_predict_op = gcc_aip.ModelBatchPredictOp(
        project=PROJECT_ID,
        location=REGION,
        job_display_name="batch-pred",
        model=VertexModelArtifact.output,
        gcs_source_uris='gs://source_root/input.jsonl',
        instances_format="jsonl",
        gcs_destination_output_uri_prefix='gs://destination_output/',
        machine_type="n1-standard-4")

compiler.Compiler().compile(
    pipeline_func=pipeline,
    package_path="pipeline.json")
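Both `artifact_uri` and `resourceName` above use the same model resource name, so a typo in any of the three IDs breaks the import. A small sketch of a helper that builds the name and rejects obviously malformed parts; the function `vertex_model_resource_name` is hypothetical and not part of any Google library:

```python
def vertex_model_resource_name(project_id: str, region: str, model_id: str) -> str:
    """Build 'projects/<project>/locations/<region>/models/<model>'."""
    parts = {"project_id": project_id, "region": region, "model_id": model_id}
    for name, value in parts.items():
        # Each part must be a plain ID: non-empty and without path separators.
        if not value or "/" in value:
            raise ValueError(f"{name} must be a non-empty ID without '/': {value!r}")
    return f"projects/{project_id}/locations/{region}/models/{model_id}"


# Usage with the placeholder values from the question.
print(vertex_model_resource_name("my-project-id", "europe-west4", "vertexai-model-id"))
```

The returned string could then be assigned to `model_uri` in place of the inline f-string.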
Annalix
  • Thanks for your kindness @Annalix. In the meantime I was able to solve this loading issue, but your answer will help other colleagues as well. Now I'm stuck trying to pass the BQTable artifact generated by `ModelBatchPredictOp` to the next component: `batch_predict_op.outputs['bigquery_output_table']` does not work for me. If you have any clue about that, please let me know. Thanks again – Max Jan 04 '23 at 13:00