The HOG detector takes in frames from the video, so you can just treat the video as a series of independent images and calculate your precision and recall from those results.
You can find the number of detected people in a given frame by looking at the length of the output Rect array from hog.detectMultiScale.
To find the total number of detections for the entire video, you would just sum the lengths of the detection arrays from each frame.
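As a rough sketch of the counting step, something like the following should work. The helper names `count_detections` and `hog_people_counts` are made up for illustration, and the OpenCV part assumes the default people detector and default `detectMultiScale` parameters, which you may well want to tune.

```python
def count_detections(frames, detect):
    """Return the per-frame detection counts and their sum.

    `detect` is any callable that maps a frame to a sequence of boxes.
    """
    counts = [len(detect(frame)) for frame in frames]
    return counts, sum(counts)


def hog_people_counts(video_path):
    """Count HOG detections in every frame of a video (hypothetical helper)."""
    import cv2  # imported here so count_detections works without OpenCV

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(video_path)

    def frames():
        # Yield frames until the video ends or a read fails.
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame

    try:
        # In Python, detectMultiScale returns (rects, weights); keep the rects.
        return count_detections(frames(), lambda f: hog.detectMultiScale(f)[0])
    finally:
        cap.release()
```

Keep the per-frame counts around rather than only the total, since you will need to line them up with per-frame ground truth later.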
Recall is the percentage of positive examples (people actually present) that were correctly detected, which is pretty much the same thing as the hit rate.
However, looking only at the recall or hit rate can be extremely misleading. For example, you could classify every window in the image as a person and you would have a recall and hit rate of 100%, but that defeats the whole purpose of trying to detect something. That is why most people also look at precision. Precision is the percentage of your detections that are correct, i.e. that actually contain a person.
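To make the two definitions concrete, here is a minimal sketch in terms of true positives (TP), false positives (FP) and false negatives (FN). The function name and the numbers in the example are invented purely for illustration.

```python
def precision_recall(tp, fp, fn):
    """Compute (precision, recall) from detection counts.

    precision = TP / (TP + FP): fraction of detections that are correct.
    recall    = TP / (TP + FN): fraction of real people that were found.
    Returns 0.0 for a ratio whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Made-up degenerate case: 10 people in the scene, the detector fires on
# 1000 windows and happens to cover all 10 people.
precision, recall = precision_recall(tp=10, fp=990, fn=0)
```

In that made-up case the recall is perfect while only 1% of the detections are correct, which is exactly the situation that recall alone hides.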
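Not all the detections will contain a person. Looking only at the number of detected boxes and the number of people in an image will not give you an accurate measure of hit rate, recall or precision; you have to check each detection against ground-truth annotations of where the people actually are.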
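One common way to do that check is to match each detection to an annotated box by intersection over union (IoU). The sketch below assumes you have hand-labeled ground-truth boxes for each frame as `(x, y, w, h)` tuples, the same layout `detectMultiScale` returns. The 0.5 threshold and the greedy matching are assumptions on my part (0.5 is a widely used convention), not the only reasonable choice, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def match_frame(detections, ground_truth, thresh=0.5):
    """Greedily match detections to ground truth; return (tp, fp, fn).

    Each ground-truth box can be claimed by at most one detection, so
    duplicate detections of the same person count as false positives.
    """
    unmatched = [tuple(gt) for gt in ground_truth]
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= thresh:
            unmatched.remove(best)
            tp += 1
    fp = len(detections) - tp  # detections with no matching person
    fn = len(unmatched)        # people that nothing matched
    return tp, fp, fn
```

Sum the `tp`, `fp` and `fn` values over all frames, then compute precision as `tp / (tp + fp)` and recall as `tp / (tp + fn)` for the whole video.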