I am trying to call the language detection method of the Google Translate client API from PySpark for each row in a file.

I created a map method as follows, but the job just seems to freeze with no error. If I remove the call to the Translate API, it executes fine. Is it possible to call Google client API methods within a PySpark map?

Mapping method to do the translation:

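    # Assumes the google-cloud-translate package; newer releases expose
    # this client as google.cloud.translate_v2
    from google.cloud import translate

    def doTranslate(data):
        # The client is created on the executor, once per row
        translate_client = translate.Client()

        # Get the message information
        messageId = data[0]
        messageContent = data[6]

        # Remote API call; an exception raised here fails the Spark task
        detectedLang = translate_client.detect_language(messageContent)

        return [detectedLang]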
Adam Taub
  • How much data is going through language detection? Is it possible that the job appears to hang when making API calls? – Angus Davis Dec 04 '17 at 17:53

1 Answer

Figured it out! Your question led me in the right direction, thanks!

Turns out the call was raising an exception because my messages exceeded the default quota on message size. I added a try/except block around the call, which confirmed this was the problem. Cutting the message size down (I am just testing, so I don't want to mess with the quota) fixed the issue.
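
A minimal sketch of that kind of try/except wrapper might look like the following. It is not the exact code from the job: the name `detect_language_safely`, the `MAX_CHARS` constant and its value of 1000 are hypothetical and only there for illustration, and the real size limit depends on your project's quota. The `detect` argument is assumed to be any callable that takes a string, such as `translate_client.detect_language`.

```python
# Hypothetical cap on message length; the real limit depends on your quota.
MAX_CHARS = 1000


def detect_language_safely(data, detect, max_chars=MAX_CHARS):
    """Return (message_id, detection, error) for one input row.

    `detect` is any callable that takes a string and returns a detection
    result, for example `translate_client.detect_language`.
    """
    message_id = data[0]
    # Trim the text so one oversized message cannot exceed the size limit.
    message_content = data[6][:max_chars]
    try:
        return (message_id, detect(message_content), None)
    except Exception as exc:
        # Broad on purpose: return the error with the row so it shows up
        # in the job output instead of being lost on an executor.
        return (message_id, None, repr(exc))


# Stand-in for the real API call so the sketch runs without credentials.
def fake_detect(text):
    if len(text) > 20:
        raise ValueError("text too long")
    return {"language": "en", "input": text}


short_row = ["id-1", None, None, None, None, None, "hello world"]
long_row = ["id-2", None, None, None, None, None, "x" * 50]

print(detect_language_safely(short_row, fake_detect))
print(detect_language_safely(long_row, fake_detect, max_chars=100))
```

In the mapping method you would pass `translate_client.detect_language` as `detect`, then filter the results for rows where the third field is not `None` to see which messages failed and why.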

Adam Taub