
The frontend of my web app uses an access token (periodically generated by the backend) to issue requests to GCP text-to-speech. Issuing requests from the frontend rather than from the backend is essential to keep down both the costs of my service and the delays experienced by the user.

A malicious user might use my app to synthesize speech for a short text, open the Network tab of the browser tools, and copy the access token from the request headers sent by my frontend to GCP. They could then use this access token to synthesize speech for a large corpus of text, with no way for me to catch them. How can I change the way I use access tokens to prevent this kind of fraudulent use?

Here are some directions I have thought about, but am not sure what is supported by GCP or whether there is an even better approach:

  1. Create a separate API key for each user of my app and generate an access token for that specific user. Then, even if a user uses their token outside of my app, GCP would have a record of the requests made by that user, and I could retrieve that record through the API to charge them.

  2. Make access tokens single-use. This way, even if the user obtains the access token as it is being sent as part of a request, he will not be able to use it for another request.

AlwaysLearning
  • An Access Token grants access. You should not provide tokens to clients as there is no way to lock down the usage of that token. You will need to change your design so that the backend makes requests to Text-to-Speech on behalf of authorized clients. – John Hanley Jul 31 '21 at 20:50
  • @JohnHanley Issuing requests from the backend would greatly increase the cost of my service and probably at least double the delay experienced by the users. So, it is essential that the requests be submitted by the frontend. GCP disallows authentication with the API key from the frontend, but they do allow authentication with an access token from the frontend, so they must have had a secure way in mind... In particular, I have edited the question with some directions I think might be possible. – AlwaysLearning Jul 31 '21 at 21:24
  • Poor designs result in poor security. You will not find any official Google documents that support sending tokens to insecure clients. You will find warnings to not do it. I doubt that your costs will greatly increase - complete a cost analysis and post the results in your question. The same for the delays. You are making assumptions that are incorrect. – John Hanley Jul 31 '21 at 22:08
  • @JohnHanley I accept your demand for cost analysis. I am not sure such analysis is possible for the delays. What is clear is that the audio stream will have to travel from GCP to my server and then from the server to the user instead of a single transfer directly from GCP to the user. There is also one more issue. If my backend submits requests on behalf of thousands of users, will it not exceed some limit of API requests coming to GCP from a single IP address? – AlwaysLearning Jul 31 '21 at 22:48
  • 1) I do not know about IP address limitations, but quotas are per project. You can request quota increases. 2) In cases where providing tokens is almost necessary, use short-term tokens. Tokens are valid for one hour (12 hours with an ORG policy). You can create tokens that are valid for shorter timeframes such as five minutes. I have written articles on how to create Access Tokens and how to specify the lifetime duration on my website. Example: https://www.jhanley.com/google-cloud-creating-oauth-access-tokens-for-rest-api-calls/ Change **expires_in** from 3600 to something like 300 seconds. – John Hanley Jul 31 '21 at 23:28
  • 1) If you are providing Text-to-Speech for thousands of users, you will need to contact Google sales. The default quotas will not support your goals. 2) The **total characters per request** limit is 5,000. The resulting audio file is not large enough to be of concern for transfer delays. I recommend that you investigate bandwidth charges for a) the client making the request and b) the backend making the request. The security and control items point to a backend service provider design. – John Hanley Jul 31 '21 at 23:39
  • @JohnHanley By default quotas, do you mean 500,000 characters per minute or something else as well? – AlwaysLearning Aug 01 '21 at 07:40
  • Yes - both (all) quotas apply. There is also the point of providing services for others. Consult the TOS (Terms of Service). – John Hanley Aug 01 '21 at 08:05
  • @JohnHanley Since your design necessitates passing the user id to the backend, I am not sure I understand how it is secure either. I asked a separate question about this: https://stackoverflow.com/q/68710563 and would appreciate it if you took a look. – AlwaysLearning Aug 09 '21 at 16:07
  • In order to authenticate with the backend, the backend has to know/obtain the user id. Properly designed systems use cookies and sessions and encrypt the data. The session data can be set to be destroyed when the browser is closed. If your concern is a person walking away and a hacker taking control, there is very little you can do about that. Same problem if the user was logged into their bank. – John Hanley Aug 09 '21 at 16:15

1 Answer


As John Hanley mentioned, you should change your design so that the backend makes requests to Text-to-Speech on behalf of authorized clients.
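
A minimal sketch of what such a backend gate could look like, under some assumptions: `TtsProxy`, `synthesize`, and `per_user_limit` are hypothetical names, `synthesize` stands in for whatever function actually calls Text-to-Speech with the backend's own credentials, and `user_id` is assumed to come from an already authenticated session. The 5,000-character cap is the per-request limit quoted in the comments. Usage is kept in memory here only for illustration; a real service would likely persist it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Per-request character limit quoted in the comment thread.
MAX_CHARS_PER_REQUEST = 5000


class QuotaExceeded(Exception):
    """Raised when a request would push a user past their allowance."""


@dataclass
class TtsProxy:
    # Hypothetical callable wrapping the real Text-to-Speech request;
    # the access token never leaves the backend.
    synthesize: Callable[[str], bytes]
    # Hypothetical per-user allowance, counted in characters.
    per_user_limit: int
    # Characters consumed so far, keyed by authenticated user id.
    usage: Dict[str, int] = field(default_factory=dict)

    def handle(self, user_id: str, text: str) -> bytes:
        """Synthesize speech for an authenticated user, enforcing limits."""
        if len(text) > MAX_CHARS_PER_REQUEST:
            raise ValueError("text exceeds the per-request character limit")
        used = self.usage.get(user_id, 0)
        if used + len(text) > self.per_user_limit:
            raise QuotaExceeded(user_id)
        audio = self.synthesize(text)
        # Record usage only after the upstream call succeeded.
        self.usage[user_id] = used + len(text)
        return audio
```

Because every request passes through `handle`, each user's consumption is recorded on the server and can be billed or capped, which is what the per-user API key idea in the question was trying to achieve.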

Toni