We are creating an app that lets a user capture a number of images, from which it tries to build a 3D model of the target object. To help users capture useful images, we give them some guidance while they move their phone from one capture to the next.

We have a nice prototype working that uses navigator.mediaDevices.getUserMedia() to capture video, displays it in a <video> element, and shows an overlay that tells the user how to move the phone. When they are ready, they press a button and we grab the current frame of the streamed video.
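
A frame grab of this kind can be sketched roughly as follows. This is an illustrative sketch, not the actual prototype code: the function names startCamera and grabFrame are made up for the example, and the facingMode and resolution hints are assumptions that the browser is free to ignore.

```javascript
// Start the camera and pipe it into a <video> element.
// The constraints are hints only (assumed values, not from the prototype).
async function startCamera(video) {
  const stream = await navigator.mediaDevices.getUserMedia({
    video: {
      facingMode: 'environment',            // prefer the rear camera
      width: { ideal: 1920 },
      height: { ideal: 1080 },
    },
    audio: false,
  });
  video.srcObject = stream;
  await video.play();
  return stream;
}

// Copy the frame currently shown by the <video> element onto a canvas
// and return its pixels as an ImageData-like object.
function grabFrame(video, canvas) {
  canvas.width = video.videoWidth;          // native size of the stream
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}
```

The frame returned this way is a video frame, so it carries whatever motion blur the stream had at that instant.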

We were quite happy with this until we realized that the captured images are very often not of sufficient quality: they tend to be slightly blurred because the user may not hold the device totally still. This causes the math behind creating the 3D model to fail.

I am now tasked with trying to improve this, but I don't think I have many options. Here is what I have been investigating, along with the drawbacks of each option:

  1. JavaScript's ImageCapture API. This seems to be exactly what we need: a way to actually take a picture instead of grabbing a frame from a video. While the API still has experimental status, it seems pretty stable, and Chrome has implemented it since version 59. The problem is that Safari (our main target) has not implemented it, and it seems it never will. I can't really find information on what their plan is, but as of today this is not an option.
  2. Use an input element of type file with the capture attribute. While this lets me capture images with the native camera, I cannot give the user any guidance, as far as I know.
  3. Create a whole mobile app. This requires another year of work and asking our existing users to install an app, which may not be possible. It also leaves Android devices out, which we'd prefer not to do.
  4. While typing this I thought of perhaps using the video itself instead of capturing still images, but I am not sure this would help in any way.
  5. Instead of changing the way the image is captured, I could grab the image only when I can confirm that the device is as close to still as possible (using a threshold value). Perhaps I could use the gyroscope for this (we already use it to check that the user has moved the device to a position and angle we consider useful for the process). The drawback is that I am not sure it would really mitigate our problem. How still is still enough? Is it possible for a person to stay that still for a second?
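
To make option 5 concrete, here is a minimal sketch of a stillness gate. It assumes the samples come from the rotationRate field of devicemotion events (alpha, beta, gamma in degrees per second). The name createStillnessGate and both default values (2 deg/s, 300 ms) are hypothetical and would need tuning on real devices.

```javascript
// Hypothetical defaults: tune on real hardware.
const MAX_RATE_DEG_PER_S = 2;   // angular speed still counted as "still"
const STILL_WINDOW_MS = 300;    // how long the device must stay below it

// Returns an update(rotationRate, timestampMs) function that reports true
// once every sample in the last windowMs stayed under maxRate.
function createStillnessGate(maxRate = MAX_RATE_DEG_PER_S, windowMs = STILL_WINDOW_MS) {
  let stillSince = null;        // start time of the current still streak

  return function update(rotationRate, timestampMs) {
    if (!rotationRate) {        // no sensor data: never claim stillness
      stillSince = null;
      return false;
    }
    // Individual axes may be null on some devices; treat them as 0.
    const magnitude = Math.hypot(
      rotationRate.alpha || 0,
      rotationRate.beta || 0,
      rotationRate.gamma || 0
    );
    if (magnitude > maxRate) {  // movement detected: restart the streak
      stillSince = null;
      return false;
    }
    if (stillSince === null) stillSince = timestampMs;
    return timestampMs - stillSince >= windowMs;
  };
}

// Browser wiring, shown as a comment because it needs a real device:
// const isStill = createStillnessGate();
// window.addEventListener('devicemotion', (e) => {
//   if (isStill(e.rotationRate, e.timeStamp)) { /* grab the frame here */ }
// });
```

Requiring a short streak rather than a single quiet sample avoids firing on the instant the hand reverses direction. Note that on iOS Safari 13 and later, motion events are only delivered after DeviceMotionEvent.requestPermission() has been granted from a user gesture.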

So my question is: can anyone think of an alternative to those I described, or a way to improve one of the options listed above?

BTW, does anyone know what Apple's plans are for the ImageCapture API?

Alejandro B.
  • If anyone is interested in our current prototype implementation, it's basically what this blog post describes: https://developers.google.com/web/fundamentals/media/capturing-images. – Alejandro B. Aug 13 '21 at 07:41
  • Using a video as you suggest may actually help you - you could select the region of the video you are interested in and use some of the techniques used to stabilise videos to get a single stable frame. To avoid wasted work, you can experiment with existing online services or applications which stabilise videos first to see if they meet your needs, and if they do then look at what it would take to add similar functionality to your own solution. – Mick Aug 16 '21 at 16:11
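
A simpler variant of the video idea, shown here only as a sketch: rather than stabilising the video, grab a short burst of frames and keep the sharpest one. Note that this swaps in a different technique from the stabilisation suggested in the comment. The score below is the variance of a 4-neighbour Laplacian, a common blur heuristic; the names sharpnessScore and pickSharpest are hypothetical, and frames are assumed to be ImageData-like objects ({ data, width, height } with RGBA bytes).

```javascript
// Sharpness heuristic: variance of a 4-neighbour Laplacian over luma.
// Blurred frames have weak edges, so their score is lower.
function sharpnessScore(rgba, width, height) {
  const gray = new Float32Array(width * height);
  for (let i = 0; i < width * height; i++) {
    // Rec. 601 luma weights applied to the R, G, B bytes.
    gray[i] = 0.299 * rgba[4 * i] + 0.587 * rgba[4 * i + 1] + 0.114 * rgba[4 * i + 2];
  }
  let sum = 0;
  let sumSq = 0;
  let n = 0;
  for (let y = 1; y < height - 1; y++) {      // skip the 1-pixel border
    for (let x = 1; x < width - 1; x++) {
      const i = y * width + x;
      const lap = gray[i - 1] + gray[i + 1] + gray[i - width] + gray[i + width] - 4 * gray[i];
      sum += lap;
      sumSq += lap * lap;
      n++;
    }
  }
  if (n === 0) return 0;                      // image too small to score
  const mean = sum / n;
  return sumSq / n - mean * mean;
}

// Returns the index of the sharpest frame, or -1 for an empty burst.
function pickSharpest(frames) {
  let bestIndex = -1;
  let bestScore = -Infinity;
  frames.forEach((frame, index) => {
    const score = sharpnessScore(frame.data, frame.width, frame.height);
    if (score > bestScore) {
      bestScore = score;
      bestIndex = index;
    }
  });
  return bestIndex;
}
```

The scores are only comparable between frames of the same scene and size, so this ranks frames within one burst; it does not give an absolute "sharp enough" threshold.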
