When I try to use a Quick Start model in AWS SageMaker, specifically for object detection, every fine-tuning job fails to train.
I'm attempting to fine-tune an SSD MobileNet V1 FPN 640x640 COCO '17 model.
The annotations and images are accepted, but once the training session starts, the training job cannot find a specific file: FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/training/annotations.json'.
The S3 directory I provide follows the required template. Using a one-image example for simplicity:
images/
    abc.png
annotations/
    abc.json
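To see which in-container paths a layout like this produces, here is a minimal sketch. It assumes the default File input mode, in which SageMaker copies every object under the channel's S3 prefix into /opt/ml/input/data/<channel>/ and keeps each object's path relative to that prefix. The prefix my-dataset/ and the helper container_path are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath

# Assumption: File input mode copies objects under the channel's S3 prefix
# into /opt/ml/input/data/<channel_name>/, preserving the relative key path.
CONTAINER_INPUT_ROOT = PurePosixPath("/opt/ml/input/data")


def container_path(s3_key: str, s3_prefix: str, channel: str = "training") -> str:
    """Return the in-container path an S3 object would be copied to (sketch)."""
    prefix = s3_prefix.rstrip("/") + "/" if s3_prefix else ""
    if not s3_key.startswith(prefix):
        raise ValueError(f"{s3_key!r} is not under prefix {s3_prefix!r}")
    relative = s3_key[len(prefix):]
    return str(CONTAINER_INPUT_ROOT / channel / relative)


# Hypothetical keys matching the one-image layout above.
keys = ["my-dataset/images/abc.png", "my-dataset/annotations/abc.json"]
paths = [container_path(k, "my-dataset/") for k in keys]
for p in paths:
    print(p)

# The path the training script tries to open, taken from the error message.
expected = "/opt/ml/input/data/training/annotations.json"
print(expected in paths)  # False: no uploaded object maps to this path
```

Under that assumption, annotations/abc.json lands at /opt/ml/input/data/training/annotations/abc.json, so nothing in this layout ends up at the annotations.json path named in the error.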
The following stack trace is returned:
We encountered an error while training the model on your data. AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "FileNotFoundError: [Errno 2] No such file or directory: '/opt/ml/input/data/training/annotations.json'"
Command "/usr/local/bin/python3.9 transfer_learning.py --batch_size 3 --beta_1 0.9 --beta_2 0.999 --early_stopping False --early_stopping_min_delta 0 --early_stopping_patience 5 --epochs 5 --epsilon 1e-7 --initial_accumulator_value 0.1 --learning_rate 0.001 --model-artifact-bucket jumpstart-cache-prod-us-east-1 --model-artifact-key tensorflow-training/train-tensorflow-od1-ssd-mobilenet-v1-fpn-640x640-coco17-tpu-8.tar.gz --momentum 0.9 --optimizer adam --reinitialize_top_layer Auto --rho 0.95 --train_only_top_layer False", exit code: 1
Could there be an internal bug where the input annotations are not transformed and placed into this directory inside the training job container?
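If the training script instead expects a single combined annotations.json at the root of the training channel, as the path in the error suggests, the per-image files could be merged before upload. The sketch below is an assumption, not a confirmed fix. It assumes a target format with "images" and "annotations" keys (my reading of the JumpStart object-detection input format, not verified here) and a hypothetical per-image schema that you would need to adjust to your own files. merge_annotations and build_annotations_file are illustrative names.

```python
import json
from pathlib import Path


def merge_annotations(per_image: list[dict]) -> dict:
    """Combine per-image records into one dict with "images" and "annotations" keys.

    ASSUMED (hypothetical) input record shape, adjust to your own files:
      {"file_name": "abc.png", "height": 480, "width": 640,
       "boxes": [{"bbox": [xmin, ymin, xmax, ymax], "category_id": 0}]}
    """
    images, annotations = [], []
    for image_id, record in enumerate(per_image):
        # One entry per image, with a sequential integer id.
        images.append({
            "file_name": record["file_name"],
            "height": record["height"],
            "width": record["width"],
            "id": image_id,
        })
        # One entry per bounding box, linked back to its image by id.
        for box in record.get("boxes", []):
            annotations.append({
                "image_id": image_id,
                "bbox": box["bbox"],
                "category_id": box["category_id"],
            })
    return {"images": images, "annotations": annotations}


def build_annotations_file(dataset_dir: str) -> Path:
    """Read dataset_dir/annotations/*.json and write dataset_dir/annotations.json."""
    root = Path(dataset_dir)
    files = sorted((root / "annotations").glob("*.json"))
    records = [json.loads(p.read_text()) for p in files]
    out_path = root / "annotations.json"
    out_path.write_text(json.dumps(merge_annotations(records), indent=2))
    return out_path
```

Calling build_annotations_file("my-dataset") on a local copy would write my-dataset/annotations.json next to images/; that directory would then be uploaded as the training prefix.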