
Here is the data set
https://gist.github.com/kirkstrobeck/d8b768867890807f9dc9

When I train a model on it with the Google Prediction API, the training status stays at RUNNING for about 30 minutes and then changes to ERROR: INTERNAL ERROR.

Why does it fail? It looks like a standard regression data set that the API should be able to consume.

Kirk Strobeck
  • can you also include the full request and response? – Mehdi Jun 21 '15 at 23:18
  • I ran a few versions of this set and received the same error. I recommend asking this on the prediction-api-discuss Google Group (https://groups.google.com/forum/#!forum/prediction-api-discuss). There are some Google engineers on that list that are eager to help. – Grokify Jun 21 '15 at 23:29
  • It is successful when trimmed down to 1000 lines – Kirk Strobeck Jul 01 '15 at 19:53
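Following up on the last comment: one way to narrow down where training starts to fail is to train on progressively larger prefixes of the file. The sketch below only prepares those prefixes; `head_lines` and `candidate_sizes` are hypothetical helper names, the starting size of 1000 comes from the comment above, and uploading and training each prefix is left out because it depends on your setup.

```python
def head_lines(text, limit):
    """Return the first `limit` lines of `text`, keeping line endings."""
    return "".join(text.splitlines(keepends=True)[:limit])


def candidate_sizes(total, start=1000):
    """Return doubling prefix sizes from `start`, ending with `total` itself."""
    sizes = []
    size = start
    while size < total:
        sizes.append(size)
        size *= 2
    sizes.append(total)
    return sizes
```

Training on each prefix in turn shows roughly which size first produces the error, and a bisection between the last good and first bad size can narrow it further.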

1 Answer


When attempting to answer this question, I looked at the API you mention and at its requirements for training data. These requirements cover both the file format and how the text in the file is laid out. The first thing I will point out is that the Google Prediction API expects training data that "is uploaded to Google Cloud Storage as a CSV (comma-separated value) file." Your file is a TXT (at least on GitHub), but it appears to have the correct structure of a CSV. However, when you look at the standards for this file type, almost everyone has a different way they want it done. Google has very strict requirements on the file format (they also have some good examples here: cloud.google.com/prediction/docs/developer-guide#examples). Long story short, you shouldn't have spaces between your columns. They might cause a processing error, since the file then matches neither the CSV format described on Wikipedia nor Google's requirements.
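To check a file against the two points above before uploading it, something like the following could help. It is a minimal sketch using only Python's standard `csv` module; `check_csv` is a hypothetical helper, not part of the Prediction API, and it assumes no field contains an embedded newline. It only reports formatting issues and does not establish that they cause the training error.

```python
import csv


def check_csv(text):
    """Return a list of (line_number, message) formatting problems."""
    problems = []
    expected = None  # column count taken from the first non-blank line
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        # Parse the line twice: once as-is, once ignoring spaces that
        # directly follow a delimiter. A difference means such spaces exist.
        strict = next(csv.reader([line]))
        lenient = next(csv.reader([line], skipinitialspace=True))
        if strict != lenient:
            problems.append((number, "space after delimiter"))
        if expected is None:
            expected = len(lenient)
        elif len(lenient) != expected:
            problems.append(
                (number, f"expected {expected} columns, found {len(lenient)}")
            )
    return problems
```

For example, calling `check_csv(open("training.csv").read())` (the file name is a placeholder) returns an empty list when every row has the same number of columns and no space follows a comma.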

EDIT: Sorry about the weird link formatting; I don't have enough rep to post more than two links yet.

Vera Gonzalez
  • I don't think spaces after commas are the issue because Google's own example training sets have spaces after commas. For example: https://cloud.google.com/prediction/docs/language_id.txt – Grokify Jun 21 '15 at 22:04
  • I just ran this training set without spaces and also received `"trainingStatus": "ERROR: INTERNAL ERROR"` – Grokify Jun 21 '15 at 23:06
  • Interesting; that sounds like a Google problem, however unlikely that might be. – Vera Gonzalez Jun 24 '15 at 04:19
  • There have been a few reports of issues on the prediction-api-discuss Google Group and there are a number of Google engineers there to assist on debugging specific issues. I'd be interested to hear what their analysis of this is. – Grokify Jun 24 '15 at 04:22