0

I am building a classification model using AutoML and I have some basic usage questions about the GCP.

1 - Data privacy question; if we save behavior data to train our model in BigQuery, does Google have access to that data? Could Google ever use that data to learn more about behavior of individuals we collected data from?

2 - Since training costs are charged by the hour, I would like to understand the relationship between data and training time. Does the time increase linearly with the size of the training data set? For example, we trained a classification using 1.7MB of data and it took 3 hrs. So, would training a model with 17MB of data take 30 hours?

3 - A batch prediction costs 1.16 USD per hour. However, our data is in a csv and it seems that we cannot upload a csv to do a batch prediction. So, we will try using the API. Therefore I have two questions: A) can we do a batch upload using the API and B) what are the associated costs?

4 - What exactly is an online prediction?

5 - When using the cost calculator (for machine learning), what is a node hour?

JOGZ
  • 1
  • 3
    Ask one question per post. It is OK to create multiple questions. Stack Overflow is for programming questions. Read this to help improve your questions:https://stackoverflow.com/help/how-to-ask – John Hanley Feb 18 '20 at 22:59

1 Answers1

0

1- As is mentioned in the Data Usage FAQ, Google does not use any of your content for any purpose except to provide you with the Cloud AutoML service.

2- The time required to train your model depends on the size and complexity of your training data, for detailed explanation take a look at the Vision documentation for example.

3- You need to upload your csv file to Google Cloud Storage and then you can use it in the API or any of the available client libraries. See Natural Language batch prediction, for example. For costs, check the documentation for the desired product. AutoML pricing depends on what feature you are using: Vision, Natural Language, Translation, Video Intelligence.

4- After you have created (trained) a model, you can deploy the model and request online (single, low-latency and real-time) predictions. Online predictions accept one row of data and provide a predicted result based on your model for that data. You use online predictions when you need a prediction as input for your business logic flow.

5- You can think of node as a single Virtual Machine, which resources are used for computing purposes. Machine types are different depending the product and purpose for which they are used. For example in image classification, the cost for AutoML Vision Image Classification model training is $3.15 per node hour, each node is equivalent to a n1-standard-8 machine with an attached NVIDIA Tesla V100.GPU. Then, node hour are the resources of such node used by one hour.

Community
  • 1
  • 1
Fcojavmelo
  • 368
  • 1
  • 11