I'm trying to translate a Python script using onnxruntime to Rust using tract_onnx. The specific POC I'm trying to implement is the rothe_vgg.py
script from the ONNX Model Zoo. This script uses three models:
- ultraface face detection (version-RFB-320.onnx)
- vgg_ilsvrc_16_age_imdb_wiki.onnx and vgg_ilsvrc_16_gender_imdb_wiki.onnx age and gender models
For now, I'm trying just the first model to detect faces. I can get the example Python code to work:
    import cv2
    import numpy as np
    import onnxruntime as ort

    face_detector_onnx = "models/version-RFB-320.onnx"
    face_detector = ort.InferenceSession(face_detector_onnx)

    def faceDetector(orig_image, threshold = 0.7):
        image = cv2.cvtColor(orig_image, cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (320, 240))  # cv2 takes (width, height)
        image_mean = np.array([127, 127, 127])
        image = (image - image_mean) / 128     # normalize to roughly [-1, 1]
        image = np.transpose(image, [2, 0, 1]) # HWC -> CHW
        image = np.expand_dims(image, axis=0)  # add batch dimension: (1, 3, 240, 320)
        image = image.astype(np.float32)

        input_name = face_detector.get_inputs()[0].name
        confidences, boxes = face_detector.run(None, {input_name: image})
        # predict() is defined elsewhere in rothe_vgg.py
        boxes, labels, probs = predict(orig_image.shape[1], orig_image.shape[0], confidences, boxes, threshold)
        return boxes, labels, probs
I'm basing my tract_onnx translation on the onnx-mobilenet-v2 example. My version currently looks like this:
    let model = onnx()
        .model_for_path("version-RFB-320.onnx")?
        .with_input_fact(
            0,
            InferenceFact::dt_shape(f32::datum_type(), tvec!(1, 3, 240, 320)),
        )?
        .into_optimized()?
        .into_runnable()?;

    let image = image::open("bruce.jpg").unwrap().to_rgb8();
    let resized = image::imageops::resize(&image, 240, 320, ::image::imageops::FilterType::Triangle);
    let image: Tensor = tract_ndarray::Array4::from_shape_fn((1, 3, 240, 320), |(_, c, y, x)| {
        resized[(x as _, y as _)][c] as f32 / 255.0
    }).into();
    let result = model.run(tvec!(image))?;
I'm running into an issue with the translation of the resized image into a tensor:
    thread 'main' panicked at 'Image index (240, 0) out of bounds (240, 320)'.
Is this an issue of not having the right dimensions or the right ordering of each dimension? Am I missing something?
I know I haven't yet implemented the other translations, which are my next questions: how can I properly normalize with image_mean, transpose, and expand dimensionality?
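To make that last part concrete, here is a minimal sketch of what I understand the Python preprocessing to do, written in plain Rust with no crates. The function name `rgb8_to_nchw` and its parameters are hypothetical, not part of tract or the image crate. It assumes an interleaved, row-major RGB buffer (the layout I believe `to_rgb8()` produces) and fills a flat buffer in channel, row, column order. Since the batch dimension is 1 and adds no data, that buffer would correspond to shape (1, 3, height, width).

```rust
/// Hypothetical helper: convert an interleaved RGB8 buffer (row-major, HWC)
/// into a flat planar buffer in CHW order, applying (p - mean) / scale.
/// With a batch size of 1, the result corresponds to shape (1, 3, height, width).
fn rgb8_to_nchw(pixels: &[u8], width: usize, height: usize, mean: f32, scale: f32) -> Vec<f32> {
    assert_eq!(pixels.len(), width * height * 3, "expected width * height * 3 bytes");
    let mut out = vec![0.0f32; 3 * height * width];
    for y in 0..height {
        for x in 0..width {
            for c in 0..3 {
                // Source is row-major and interleaved: row y, column x, channel c.
                let src = (y * width + x) * 3 + c;
                // Destination is channel-major: channel c, row y, column x.
                let dst = (c * height + y) * width + x;
                out[dst] = (pixels[src] as f32 - mean) / scale;
            }
        }
    }
    out
}
```

With the constants from the Python code this would be called with a mean of 127.0 and a scale of 128.0 on a 320 wide, 240 high image. Width comes before height in the arguments, as in `cv2.resize(image, (320, 240))`, while the tensor shape lists height first. As far as I can tell `image::imageops::resize` also takes width and then height, which would be consistent with the out-of-bounds message above reporting a 240 wide image.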