I was building a system that processes poses from videos in Python, plus a JavaScript (React) application that estimates the user's pose from a webcam in real time and compares it with the poses processed in Python.
The problem is that I started getting very different coordinates from the two. As a test, I ran the same video through both applications, and the results differ widely. I tried to look for a pattern to transform the data (sometimes the X axis in Python seems to be the Y axis in JavaScript, and vice versa), but after testing more than one scenario I couldn't find a reliable pattern that makes the data match.
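To make the pattern search concrete, here is a minimal sketch of the kind of check I mean. It assumes both applications export the landmarks of the same frame as lists of normalized (x, y) pairs in [0, 1], in the same landmark order. The helper names (`candidate_transforms`, `mean_error`, `best_transform`) are hypothetical and not part of MediaPipe. It only tries axis swaps and flips, so it will not catch differences in scale, cropping, or frame timing.

```python
from itertools import product


def candidate_transforms():
    """Yield (name, fn) for the 8 swap/flip combinations on normalized (x, y)."""
    for swap, flip_x, flip_y in product([False, True], repeat=3):
        # Bind the loop variables as defaults so each fn keeps its own settings.
        def fn(p, swap=swap, flip_x=flip_x, flip_y=flip_y):
            x, y = (p[1], p[0]) if swap else (p[0], p[1])
            if flip_x:
                x = 1.0 - x
            if flip_y:
                y = 1.0 - y
            return (x, y)

        yield f"swap={swap},flip_x={flip_x},flip_y={flip_y}", fn


def mean_error(a, b):
    """Mean Euclidean distance between two equally ordered landmark lists."""
    dists = [((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
             for (ax, ay), (bx, by) in zip(a, b)]
    return sum(dists) / len(dists)


def best_transform(py_landmarks, js_landmarks):
    """Return (error, name) of the transform that best maps JS points onto Python points."""
    scored = [
        (mean_error([fn(p) for p in js_landmarks], py_landmarks), name)
        for name, fn in candidate_transforms()
    ]
    return min(scored)


# Made-up example values: the JS points here are the Python points with x and y swapped.
py_points = [(0.2, 0.3), (0.5, 0.9), (0.1, 0.4)]
js_points = [(y, x) for x, y in py_points]
error, name = best_transform(py_points, js_points)
print(name, error)
```

If the winning transform changes from frame to frame, or the error stays large for all eight candidates, the mismatch is probably not a simple axis swap or mirror.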
I'm using the same version of MediaPipe in both applications. I know that the Python and JavaScript MediaPipe implementations can be slightly different, but is it really that different, or am I missing something?
Thank you!