
I am using the person-detection-action-recognition-0005 pre-trained model from OpenVINO to detect people and their actions.

https://docs.openvinotoolkit.org/latest/_models_intel_person_detection_action_recognition_0005_description_person_detection_action_recognition_0005.html

Following the documentation at the above link, I wrote a Python script to run the model. This is the script:

import cv2

def main():
    print(cv2.__file__)

    frame = cv2.imread('/home/naveen/Downloads/person.jpg')

    actionNet = cv2.dnn.readNet('person-detection-action-recognition-0005.bin',
                                'person-detection-action-recognition-0005.xml')

    # The model expects a 680x400 (W x H) input
    actionBlob = cv2.dnn.blobFromImage(frame, size=(680, 400))
    actionNet.setInput(actionBlob)

    # Detection outputs: bbox deltas, detection confidences,
    # and four per-anchor action heads
    actionOut = actionNet.forward(['mbox_loc1/out/conv/flat',
                                   'mbox_main_conf/out/conv/flat/softmax/flat',
                                   'out/anchor1', 'out/anchor2',
                                   'out/anchor3', 'out/anchor4'])

    # This is the part where I don't know how to get the person bboxes
    # and the action labels for those persons from actionOut.
    # out/anchor1 is NCHW (1, 3, H, W), so transpose before reshaping
    # to get one row of 3 action scores per grid cell.
    for detection in actionOut[2].transpose(0, 2, 3, 1).reshape(-1, 3):
        print('sitting ' + str(detection[0]))
        print('standing ' + str(detection[1]))
        print('raising hand ' + str(detection[2]))

if __name__ == '__main__':
    main()
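
To understand the layout, I also print the shape of each returned blob right after the forward() call (the labels are just my shorthand for the six output names):

    names = ['loc', 'conf', 'anchor1', 'anchor2', 'anchor3', 'anchor4']
    for name, out in zip(names, actionOut):
        print(name, out.shape)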

Now, I don't know how to get the bounding boxes and action labels from the output variable (actionOut). I am unable to find any documentation or blog post explaining this.

Does anyone have an idea or suggestion how this can be done?


1 Answer


There is a demo called smart_classroom_demo (link) that uses the network you are trying to run. The parsing of the outputs is located here. The implementation is in C++, but it should help you understand how the network's outputs are parsed.
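
For illustration, here is a rough Python sketch of that parsing logic. The prior layout (four consecutive priors per grid cell, cell-major ordering) and the threshold value are my assumptions, not from the model docs, so verify them against the demo source; the bbox deltas also still need SSD-style decoding against the prior boxes, which the demo implements as well.

    import numpy as np

    ACTION_NAMES = ['sitting', 'standing', 'raising hand']
    NUM_ANCHORS = 4  # out/anchor1 .. out/anchor4

    def parse_outputs(actionOut, person_threshold=0.65):
        # Indices follow the order the names were passed to forward():
        # 0: bbox deltas, 1: detection confidences, 2..5: action heads
        loc = actionOut[0].reshape(-1, 4)       # (num_priors, 4) SSD deltas
        det_conf = actionOut[1].reshape(-1, 2)  # (num_priors, [bg, person])

        # Each action head is NCHW (1, 3, H, W); flatten to (H*W, 3)
        h, w = actionOut[2].shape[2], actionOut[2].shape[3]
        action_heads = [actionOut[2 + a].reshape(3, h * w).T
                        for a in range(NUM_ANCHORS)]

        # Sanity check for the assumed prior layout
        assert loc.shape[0] == NUM_ANCHORS * h * w

        detections = []
        for prior in range(loc.shape[0]):
            person_score = det_conf[prior, 1]
            if person_score < person_threshold:
                continue
            # Assumed layout: NUM_ANCHORS consecutive priors per grid cell
            cell, anchor = divmod(prior, NUM_ANCHORS)
            action = ACTION_NAMES[int(np.argmax(action_heads[anchor][cell]))]
            # loc[prior] still has to be decoded against its prior box
            detections.append((person_score, loc[prior], action))
        return detections

The C++ demo does the same in substance: threshold the person confidence, decode the matching bbox delta against the generated prior box, and take the argmax of the corresponding action head cell.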

Hope it helps.
