My use case is I want to calculate various gestures of a hand (the first hand) seen by the camera. I am able to find body anchors and hand anchors and poses. See my video here. I am trying to utilize previous position SIMD3 information to calculate what kind of gesture was demonstrated. I did see the example posted by Apple which shows pinching to write virtually, I am not sure that a buffer is the right solution for something like this.
A specific example of what I am trying to do is detect a swipe, long-press, tap as if the user is wearing a pair of AR glasses (made by Apple one day). For clarification I want to raycast from my hand and perform a gesture on an Entity or Anchor.
Here is a snippet for those of you that want to know how to get body anchors:
public func session(_ session: ARSession, didUpdate frame: ARFrame) {
let capturedImage = frame.capturedImage
let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: capturedImage,
orientation: .right,
options: [:])
let handPoseRequest = VNDetectHumanHandPoseRequest()
//let bodyPoseRequest = VNDetectHumanBodyPoseRequest()
do {
try imageRequestHandler.perform([handPoseRequest])
guard let observation = handPoseRequest.results?.first else {
return
}
// Get points for thumb and index finger.
let thumbPoints = try observation.recognizedPoints(.thumb)
let indexFingerPoints = try observation.recognizedPoints(.indexFinger)
let pinkyFingerPoints = try observation.recognizedPoints(.littleFinger)
let ringFingerPoints = try observation.recognizedPoints(.ringFinger)
let middleFingerPoints = try observation.recognizedPoints(.middleFinger)
self.detectHandPose(handObservations: observation)
} catch {
print("Failed to perform image request.")
}
}