Our team is developing a simple Angular website that sends form data to our backend through an API. The site will be exposed on a public IP, so the interaction is protected with Google reCAPTCHA v3.
Recently we decided to run some stress tests in a pre-production environment to check that everything is stable and works correctly. We set up a simple JMeter test group with 100 users and 100 loops, for 10,000 requests to our API in total. With that configured, we ran the frontend to generate a reCAPTCHA token, performed the action that sends the data to the API, and copied the generated token into the JMeter configuration, so every JMeter request reuses that same token.
The API, before passing the form data to the backend, checks if the token is valid by making a request to “https://www.google.com/recaptcha/api/siteverify”, specifying this token and the secret key generated in the reCAPTCHA admin console.
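For reference, the verification step described above could look roughly like the sketch below. It is not our exact code: the function names and the `SiteverifyResponse` type are made up for illustration, and it assumes a runtime with a global `fetch` (for example Node 18+). The `secret` and `response` form parameters and the response fields are the ones documented for "/siteverify".

```typescript
// Illustrative sketch only: names here are hypothetical, not our real implementation.
const SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// Fields documented for the siteverify JSON response (v3 adds score and action).
interface SiteverifyResponse {
  success: boolean;
  score?: number;
  action?: string;
  challenge_ts?: string;
  hostname?: string;
  "error-codes"?: string[];
}

// Build the form-encoded body: the secret key plus the token sent by the frontend.
function buildSiteverifyBody(secret: string, token: string): URLSearchParams {
  return new URLSearchParams({ secret, response: token });
}

// POST the token to siteverify and return the HTTP status together with the parsed body.
// The body is null when the HTTP status is not in the 2xx range.
async function callSiteverify(
  secret: string,
  token: string
): Promise<{ status: number; body: SiteverifyResponse | null }> {
  const res = await fetch(SITEVERIFY_URL, {
    method: "POST",
    body: buildSiteverifyBody(secret, token),
  });
  const body = res.ok ? ((await res.json()) as SiteverifyResponse) : null;
  return { status: res.status, body };
}
```

In the stress test, every one of the 10,000 requests would pass the same `token` value to a call like `callSiteverify`.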
After running the JMeter tests, we saw that some requests passed the token validation and reached the backend, even though the token had already been used.
What did we try?
We stress-tested our API with 10k requests to check how an already used reCAPTCHA token is validated.
What did we expect to happen?
We expected every request to return a 401 error code, because the reCAPTCHA token had already been used and these tokens are single use only (our API returns 401 if the request to "/siteverify" returns "success:false" in the response body or a status code other than 200).
What actually happened?
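The rule in parentheses can be written down as a small pure function. This is only a sketch of the decision as described here; `isTokenAccepted` and `gatewayStatusFor` are hypothetical names, and returning `null` to mean "forward to the backend" is an assumption made for the example.

```typescript
// Minimal shape of what the decision needs from the siteverify response body.
interface VerifyBody {
  success?: boolean;
  "error-codes"?: string[];
}

// The token counts as valid only if siteverify answered 200 AND the body says success: true.
function isTokenAccepted(siteverifyStatus: number, body: VerifyBody | null): boolean {
  return siteverifyStatus === 200 && body !== null && body.success === true;
}

// Returns 401 when the request must be rejected, or null when it may be
// forwarded to the backend (hypothetical convention for this sketch).
function gatewayStatusFor(siteverifyStatus: number, body: VerifyBody | null): 401 | null {
  return isTokenAccepted(siteverifyStatus, body) ? null : 401;
}

// Expected outcomes under this rule:
const rejectedOnFailure = gatewayStatusFor(200, { success: false, "error-codes": ["timeout-or-duplicate"] }); // 401
const rejectedOnHttpError = gatewayStatusFor(500, null); // 401
const forwarded = gatewayStatusFor(200, { success: true }); // null
```

With a reused token we would expect to always land in the first case; `timeout-or-duplicate` is the error code siteverify documents for tokens that have expired or were already verified.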
5% of the requests bypassed the validation and ended up in the backend.
(The 400 Bad Request errors are returned by the backend after the token has already been validated, meaning that the token was presumably considered valid.)
We checked the logs of our API and were able to verify that, in those 5% of the requests, the "/siteverify" call did in fact return a status code of 200 and a response body with "success:true".
To me, it looks like some kind of load-balancing problem: maybe some node didn't have enough time to replicate the status of that token? Or maybe the problem comes from our implementation… Hopefully someone can give us a hint!