You cannot run multiple capture sessions at once, so at some point you will need to switch to the higher resolution. First, you say that face detection uses too many resources on high-resolution snapshots. Why not simply downsample the image and stay at high resolution all the time (send the downsampled image to face detection and display the high-resolution one)?
I would start with Apple's most common graphics context and try to downscale the image with it. If that takes too much CPU, you could do the same on the GPU (find a library that does it or write a simple GPU program), or you could even just drop the odd rows and columns from the raw image data. In any of these cases, note that the face detection probably does not need to run on the same thread as the display, and it most likely does not need a high frame rate either (you display the camera at full FPS but update the face detection at 10 FPS, for instance).
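To illustrate the last two ideas, here is a minimal sketch in plain Swift: dropping every other row and column from raw pixel data, and limiting how often detection runs. The names `GrayImage`, `decimateByTwo` and `DetectionThrottle` are made up for this example, and it assumes a tightly packed 8-bit grayscale buffer. Real camera frames usually have several channels and may have padded rows, so you would need to adapt the indexing.

```swift
import Foundation

/// Hypothetical container: tightly packed 8-bit grayscale, one byte per pixel, no row padding.
struct GrayImage {
    let width: Int
    let height: Int
    var pixels: [UInt8]
}

/// Halves the resolution by keeping only the even-indexed rows and columns.
/// No filtering is done, so expect some aliasing; that is often fine for detection.
func decimateByTwo(_ image: GrayImage) -> GrayImage {
    let newWidth = (image.width + 1) / 2
    let newHeight = (image.height + 1) / 2
    var out = [UInt8]()
    out.reserveCapacity(newWidth * newHeight)
    for y in stride(from: 0, to: image.height, by: 2) {
        let rowStart = y * image.width
        for x in stride(from: 0, to: image.width, by: 2) {
            out.append(image.pixels[rowStart + x])
        }
    }
    return GrayImage(width: newWidth, height: newHeight, pixels: out)
}

/// Hypothetical helper: accepts a frame only if enough time has passed
/// since the last accepted frame, so detection runs at most `maxRate` times per second.
struct DetectionThrottle {
    let minInterval: TimeInterval
    private var lastAccepted: TimeInterval? = nil

    init(maxRate: Double) {
        minInterval = 1.0 / maxRate
    }

    mutating func shouldProcess(at timestamp: TimeInterval) -> Bool {
        if let last = lastAccepted, timestamp - last < minInterval {
            return false
        }
        lastAccepted = timestamp
        return true
    }
}
```

In the frame callback you would display every frame as usual, and only when `shouldProcess(at:)` returns true hand a decimated copy to the face detection on a background queue. You can call `decimateByTwo` more than once if half resolution is still too expensive.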
Another option is to keep everything in low resolution. When you need to take the picture, stop the session, start a high-resolution session, capture the image, and then switch back to low resolution for face detection.