Have you considered https://www.tensorflow.org/tfx/serving/architecture? You can definitely automate the entire workflow using TFX, and I think the guide here does a good job of walking through it. Depending on your use case, you may want to use TensorFlow Transform (TFT) instead of Kubeflow like they're doing in that guide. Beyond serving automation, it's also worth automating the pipeline itself so the feature engineering is separated from the pipeline mechanics: build the pipeline once, abstract the feature engineering into a TensorFlow function that meets certain requirements (see the sketch below), and automate the deployment process as well. That way you don't have to manage the feature specs/schemas manually, and you're guaranteed the transformations at serving time are identical to the ones applied during training.
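Here's a minimal sketch of what that abstracted feature-engineering function looks like with TFT. The feature names (`age`, `category`) are hypothetical placeholders; the point is that TFX's Transform component traces this one function into both the training pipeline and the serving graph, so there's no train/serve skew:

```python
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """The function signature TFX's Transform component expects:
    a dict of raw feature tensors in, a dict of transformed tensors out."""
    outputs = {}
    # TFT analyzers make a full pass over the training data to compute
    # statistics (mean/stddev, vocabulary), then bake the results into
    # the serving graph as constants.
    outputs['age_scaled'] = tft.scale_to_z_score(inputs['age'])
    outputs['category_id'] = tft.compute_and_apply_vocabulary(inputs['category'])
    return outputs
```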
You can do the same thing with scikit-learn, and I believe serving scikit-learn models is also supported under the Vertex AI umbrella.
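If you go that route, serving a fitted scikit-learn model on Vertex AI looks roughly like the sketch below. The project, bucket path, and container tag are placeholders (I'm assuming the model was saved as `model.joblib` in that GCS directory and that a prebuilt sklearn prediction image matching your sklearn version is available):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload the serialized model and point it at a prebuilt sklearn
# serving container; check the Vertex AI docs for the tag that
# matches your scikit-learn version.
model = aiplatform.Model.upload(
    display_name="sklearn-example",
    artifact_uri="gs://my-bucket/sklearn-model/",  # contains model.joblib
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
endpoint = model.deploy(machine_type="n1-standard-2")
```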
To your point about latency, you definitely want the pipeline doing the transformations on the GPU. Scikit-learn preprocessing runs on the CPU via NumPy, whereas TensorFlow transformations can live inside the model graph and execute on the GPU alongside inference, so if the use case is truly time sensitive I'd recommend TensorFlow over something like scikit-learn.
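To make that concrete, here's a small sketch (transformations and shapes are made up for illustration). Because the preprocessing is a `tf.function`, its ops are compiled into the graph and dispatched to the GPU whenever one is present, with no CPU round trip between preprocessing and inference:

```python
import tensorflow as tf

@tf.function(input_signature=[tf.TensorSpec(shape=[None, 3], dtype=tf.float32)])
def transform_features(raw):
    # Hypothetical transformations: log-scaling then standardization.
    logged = tf.math.log1p(raw)
    mean = tf.reduce_mean(logged, axis=0)
    std = tf.math.reduce_std(logged, axis=0)
    return (logged - mean) / (std + 1e-8)

# These ops run on the GPU automatically when one is available.
batch = tf.random.uniform([8, 3])
print(transform_features(batch))
```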
Best of luck!