Trying to transcribe an audio file stored in an S3 bucket that I have access to, using the AWS SDK for PHP as shown below, gives me a throttling exception: rate exceeded.

I have read the documentation a dozen times and can't find a simple working example that shows how to successfully transcribe a file with Amazon Transcribe and the PHP SDK.

date_default_timezone_set('America/New_York');
// Load the AWS SDK autoloader (a failed require is a fatal error, not a catchable exception).
require '/var/www/html/aws/sdk/aws-autoloader.php';
use Aws\TranscribeService\TranscribeServiceClient;

$client = new TranscribeServiceClient([
    'version'     => 'latest',
    'region'      => 'us-east-1',
    'credentials' => [
        'key'    => 'xxxx',
        'secret' => 'yyyy',
    ],
    // Verbose HTTP output is a client-level option, not part of the credentials.
    'http'        => ['debug' => true],
]);
$job_name = "tjob".date("mdyhisa");
$job_uri = "https://s3.amazonaws.com/....mp3";          

$result = $client->startTranscriptionJob([
    'LanguageCode'         => 'en-US',
    'Media'                => [
        'MediaFileUri' => $job_uri,
    ],
    'MediaFormat'          => 'mp3',
    'TranscriptionJobName' => $job_name,
]);
/* removing this loop and the sleep() below would retrieve some structured response, 
but of course the operation status is IN_PROGRESS */
while (true)
{
    /* added to discover if holding a few seconds would work: it doesn't
       and gives back a 504 Gateway Timeout */
    sleep(rand(3, 5));
    /* -- */
    $result = $client->getTranscriptionJob(['TranscriptionJobName' => $job_name]);
    $status = $result['TranscriptionJob']['TranscriptionJobStatus'];
    if (in_array($status, ['COMPLETED', 'FAILED'], true))
    {
        break;
    }
}
var_dump($result);

So the question is: how do I get the transcription output?

By the way, I don't need this to run asynchronously. It is fine for my little project to wait for the job to finish and return.

1 Answer

Your code is probably working fine, but your while(true) loop is calling the API too many times, hence the throttling exception: rate exceeded error.

I suggest you put a delay of 5 seconds between each call to getTranscriptionJob. I've found that a job can take around 60 seconds to complete, so you don't need to call it continuously.
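As a minimal sketch of that suggestion, the helper below polls with a fixed delay and gives up after a bounded number of attempts instead of looping forever. The function name waitForTranscription, its parameters, and the $fetchStatus callable are illustrative assumptions, not part of the AWS SDK; the callable is expected to return the job's status string.

```php
<?php
/**
 * Poll a status callback until it reports a terminal state or attempts run out.
 *
 * $fetchStatus is a hypothetical callable that returns the current job status
 * string, for example a closure wrapping $client->getTranscriptionJob(...).
 * Returns 'COMPLETED' or 'FAILED', or null if the attempt limit was reached.
 */
function waitForTranscription(callable $fetchStatus, int $delaySeconds = 5, int $maxAttempts = 60): ?string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $status = $fetchStatus();
        if (in_array($status, ['COMPLETED', 'FAILED'], true)) {
            return $status;
        }
        // Wait before the next call so the API is not hit in a tight loop.
        if ($attempt < $maxAttempts) {
            sleep($delaySeconds);
        }
    }
    return null; // Still not finished after $maxAttempts checks.
}
```

With the client from the question, the callable could be something like function () use ($client, $job_name) { return $client->getTranscriptionJob(['TranscriptionJobName' => $job_name])['TranscriptionJob']['TranscriptionJobStatus']; }. Bounding the attempts keeps the total wait predictable, which matters if the script runs behind a web server with its own request timeout.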

John Rotenstein
  • I did try that, but instead I get a 504 Gateway Timeout error –  Apr 03 '19 at 02:25
  • On which line do you receive this error? Are you saying that introducing a delay between calls results in the 504 error? That seems rather strange! – John Rotenstein Apr 03 '19 at 02:37
  • Correct: adding the sleep() gives me a 504 error (with no further debug output); without sleep() I get rate exceeded. –  Apr 03 '19 at 02:39
  • If I remove the while() loop, I receive the structured response, but of course its status is IN_PROGRESS. So the sleep() is definitely causing the 504 error. –  Apr 03 '19 at 02:44
  • It might be that, because Transcribe takes so long to finish (e.g. 60+ seconds), your web server is treating the request as a "page timeout". In that case, you will need to rearchitect your app so that it doesn't sit and wait for Transcribe to complete, but rather submits a job and has something else obtain the results later (without looping and waiting). – John Rotenstein Apr 03 '19 at 02:55
  • Yes, that's what I thought too. I fiddled with set_time_limit(0) and sleep(total_audio_duration), whatever that is in seconds, and still got a 504. However, I do see the transcribed text in the AWS console. Unfortunately, I guess this will need some kind of cron job that watches the S3 bucket where the transcript JSON files get placed and continues the logic from there. Thx. –  Apr 03 '19 at 03:05
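
The "submit now, collect later" approach discussed in the comments can be sketched as a single status check plus a transcript parser that a later request or a cron job calls. This is only a sketch: the function names are hypothetical, and it assumes the getTranscriptionJob result exposes the transcript location under TranscriptionJob > Transcript > TranscriptFileUri and that the transcript JSON stores its text under results > transcripts.

```php
<?php
/**
 * Return the transcript file URI if the job has completed, otherwise null.
 * $result is the value returned by getTranscriptionJob (array-accessible).
 */
function transcriptUriIfCompleted($result): ?string
{
    $job = $result['TranscriptionJob'] ?? [];
    if (($job['TranscriptionJobStatus'] ?? null) !== 'COMPLETED') {
        return null; // Still running, failed, or unknown: nothing to fetch yet.
    }
    return $job['Transcript']['TranscriptFileUri'] ?? null;
}

/**
 * Extract the plain transcript text from the transcript JSON document.
 * Returns null if the JSON is invalid or contains no transcript entries.
 */
function extractTranscriptText(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return null;
    }
    $parts = array_column($data['results']['transcripts'] ?? [], 'transcript');
    return $parts ? implode(' ', $parts) : null;
}
```

A later run would call getTranscriptionJob once with the stored job name, pass the result to transcriptUriIfCompleted(), download the JSON from the returned URI (for example with file_get_contents(), assuming the URI is readable by the caller), and hand the body to extractTranscriptText(). No request has to stay open while Transcribe works, which avoids the 504.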