I am trying to build a container running torchserve with the pretrained Fast R-CNN model for object detection in an all-in-one Dockerfile, based on this example: https://github.com/pytorch/serve/tree/master/examples/object_detector/fast-rcnn
Dockerfile:
FROM pytorch/torchserve:latest
COPY ["config.properties", "model.py", "fasterrcnn_resnet50_fpn_coco-258fb6c6.pth", "index_to_name.json", "/home/model-server/"]
RUN torch-model-archiver \
--model-name=fastrcnn \
--version=1.0 \
--model-file=/home/model-server/model.py \
--serialized-file=/home/model-server/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth \
--handler=object_detector \
--extra-files=/home/model-server/index_to_name.json \
--export-path=/home/model-server/model-store
RUN rm model.py fasterrcnn_resnet50_fpn_coco-258fb6c6.pth index_to_name.json
CMD ["torchserve", \
"--start", \
"--model-store", "model-store", \
"--ts-config", "config.properties", \
"--models", "fastrcnn=fastrcnn.mar"]
model.py and index_to_name.json are taken from the example (https://github.com/pytorch/serve/tree/master/examples/object_detector/fast-rcnn) and placed in the root of the build context.
fasterrcnn_resnet50_fpn_coco-258fb6c6.pth can be downloaded from https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
config.properties:
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store
default_workers_per_model=1
Building the image with:
docker build --tag aio-fastrcnn .
runs fine (aio for all-in-one).
Running the container with:
docker run --rm -it -p 8080:8080 -p 8081:8081 --name fastrcnn aio-fastrcnn:latest
also runs fine, but during start-up the worker downloads a different serialized model.
model_log.log:
model-server@15838dd41e69:~$ cat logs/model_log.log
2022-06-26T13:45:07,171 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2022-06-26T13:45:07,172 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - [PID]35
2022-06-26T13:45:07,173 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - Torch worker started.
2022-06-26T13:45:07,173 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - Python runtime: 3.8.0
2022-06-26T13:45:07,191 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2022-06-26T13:45:07,242 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - model_name: fastrcnn, batchSize: 1
2022-06-26T13:45:07,809 [INFO ] W-9000-fastrcnn_1.0-stdout MODEL_LOG - generated new fontManager
2022-06-26T13:45:08,757 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - Downloading: "https://download.pytorch.org/models/resnet50-0676ba61.pth" to /home/model-server/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
2022-06-26T13:45:08,919 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG -
2022-06-26T13:45:09,021 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 0%| | 0.00/97.8M [00:00<?, ?B/s]
2022-06-26T13:45:09,125 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 0%| | 280k/97.8M [00:00<00:36, 2.81MB/s]
2022-06-26T13:45:09,230 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 1%| | 592k/97.8M [00:00<00:34, 2.97MB/s]
2022-06-26T13:45:09,341 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 1%| | 960k/97.8M [00:00<00:31, 3.26MB/s]
...
2022-06-26T13:45:45,230 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 99%|█████████▉| 96.7M/97.8M [00:36<00:00, 2.68MB/s]
2022-06-26T13:45:45,344 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 99%|█████████▉| 97.0M/97.8M [00:36<00:00, 2.67MB/s]
2022-06-26T13:45:45,449 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 99%|█████████▉| 97.2M/97.8M [00:36<00:00, 2.63MB/s]
2022-06-26T13:45:45,547 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 100%|█████████▉| 97.5M/97.8M [00:36<00:00, 2.68MB/s]
2022-06-26T13:45:45,548 [WARN ] W-9000-fastrcnn_1.0-stderr MODEL_LOG - 100%|██████████| 97.8M/97.8M [00:36<00:00, 2.80MB/s]
Once the file (https://download.pytorch.org/models/resnet50-0676ba61.pth) is downloaded, the server runs fine: it can be pinged and serves the correct model.
Ping (from PowerShell, where curl is an alias for Invoke-WebRequest, hence the -UseBasicParsing flag and the output format):
$ curl http://localhost:8080/ping -UseBasicParsing
StatusCode : 200
StatusDescription : OK
Content : {
"status": "Healthy"
}
...
Model:
$ curl http://localhost:8081/models -UseBasicParsing
StatusCode : 200
StatusDescription : OK
Content : {
"models": [
{
"modelName": "fastrcnn",
"modelUrl": "fastrcnn.mar"
}
]
}
...
I don't understand why this additional serialized file is downloaded. I thought the idea behind torch-model-archiver was to combine all necessary files into a single archive. Have I fundamentally misunderstood something about how torchserve or docker works?
Startup snapshot:
model-server@15838dd41e69:~$ cat logs/config/20220626134506612-startup.cfg
#Saving snapshot
#Sun Jun 26 13:45:06 GMT 2022
inference_address=http\://0.0.0.0\:8080
default_workers_per_model=1
load_models=fastrcnn\=fastrcnn.mar
model_store=model-store
number_of_gpu=0
job_queue_size=1000
python=/home/venv/bin/python
model_snapshot={\n "name"\: "20220626134506612-startup.cfg",\n "modelCount"\: 1,\n "created"\: 1656251106614,\n "models"\: {\n "fastrcnn"\: {\n "1.0"\: {\n "defaultVersion"\: true,\n "marName"\: "fastrcnn.mar",\n "minWorkers"\: 1,\n "maxWorkers"\: 1,\n "batchSize"\: 1,\n "maxBatchDelay"\: 100,\n "responseTimeout"\: 120\n }\n }\n }\n}
tsConfigFile=config.properties
version=0.6.0
workflow_store=model-store
number_of_netty_threads=32
management_address=http\://0.0.0.0\:8081
metrics_address=http\://0.0.0.0\:8082
I have also tried these steps replacing fasterrcnn_resnet50_fpn_coco-258fb6c6.pth with resnet50-0676ba61.pth in all necessary places, but the worker still downloads resnet50-0676ba61.pth during startup.
resnet50-0676ba61.pth is needed when the model is built: pytorch checks whether the file is present in the model_dir and downloads it if not. I therefore updated the Dockerfile to copy resnet50-0676ba61.pth into the torchserve container and included it as an extra file in the torch-model-archiver command.
Dockerfile:
FROM pytorch/torchserve:latest
COPY ["config.properties", \
"model.py", \
"fasterrcnn_resnet50_fpn_coco-258fb6c6.pth", \
"resnet50-0676ba61.pth", \
"index_to_name.json", \
"/home/model-server/"]
RUN torch-model-archiver \
--model-name=fastrcnn \
--version=1.0 \
--model-file=/home/model-server/model.py \
--serialized-file=/home/model-server/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth \
--handler=object_detector \
--extra-files=/home/model-server/resnet50-0676ba61.pth,/home/model-server/index_to_name.json \
--export-path=/home/model-server/model-store
CMD ["torchserve", \
"--start", \
"--model-store", "model-store", \
"--ts-config", "config.properties", \
"--models", "fastrcnn=fastrcnn.mar"]
According to https://github.com/pytorch/serve/issues/633#issuecomment-677759331, resnet50-0676ba61.pth should then be accessible, but it still gets downloaded on every startup.
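For what it's worth, the worker log above shows the download target is /home/model-server/.cache/torch/hub/checkpoints/, so one idea I have not yet verified is to pre-seed that cache directory in the image instead of (or in addition to) bundling the file into the .mar — a sketch only:

```dockerfile
FROM pytorch/torchserve:latest
# Untested idea: place resnet50-0676ba61.pth at the exact cache path the
# worker log shows it downloading to, so the download should be skipped.
COPY resnet50-0676ba61.pth /home/model-server/.cache/torch/hub/checkpoints/resnet50-0676ba61.pth
```

Is this the intended way to avoid the download, or is there a supported torchserve/archiver mechanism for it?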