
I'm doing some machine learning and data analysis on data from Google Analytics and other sources.

I've managed to deploy Cloud Datalab locally and connect it to my BigQuery data, but I am not sure this is the best way to do things. I can see that vanilla Jupyter notebooks with Pandas can also connect to BigQuery. Regular Jupyter has the advantage that it runs without Docker, and it also supports Python 3.

So I'm wondering if there's any benefit to doing this with Cloud Datalab locally besides SQL syntax highlighting? In short, are all the benefits of Cloud Datalab relevant only for cloud computing, or does it bring any advantages over Jupyter for local deployments too?

Thanks!

Mikhail Berlyant
Tom

1 Answer


Even if you are using regular Jupyter, you can still install the Datalab Python package to use most of the Datalab functionality.

My reasons for using Datalab over Jupyter when running locally are:

  1. Running in Docker gives you a well-tested environment.
  2. PyDatalab brings BigQuery APIs and magics, which make a good BigQuery playground. google.datalab.bigquery offers more than just creating a dataframe out of a query.
  3. BigQuery's integration with charting (%%chart can take BQ queries).
  4. Machine learning tools and MLToolbox.
  5. A different UI.

Jupyter plus the Datalab package gives you 2, 3, and 4, though.
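To illustrate point 2 from plain Jupyter, here is a minimal sketch, assuming the datalab package is installed and credentials are already configured. The helper names (`build_daily_sessions_query`, `run_query`) and the table `my-project.my_dataset.ga_sessions_*` are hypothetical, and the `date` and `totals.visits` columns assume the standard Google Analytics export schema, so adjust them to your own data.

```python
import re


def build_daily_sessions_query(table_wildcard, start_suffix, end_suffix):
    """Build a Standard SQL query counting sessions per day.

    table_wildcard is a hypothetical wildcard table such as
    'my-project.my_dataset.ga_sessions_*'; the suffixes are YYYYMMDD strings.
    The column names assume the Google Analytics export schema.
    """
    for suffix in (start_suffix, end_suffix):
        # Only accept plain YYYYMMDD values, since they are pasted into the SQL.
        if not re.fullmatch(r"\d{8}", suffix):
            raise ValueError("suffix must be YYYYMMDD, got %r" % (suffix,))
    return (
        "SELECT date, SUM(totals.visits) AS sessions\n"
        "FROM `{table}`\n"
        "WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'\n"
        "GROUP BY date\n"
        "ORDER BY date"
    ).format(table=table_wildcard, start=start_suffix, end=end_suffix)


def run_query(sql):
    """Run the query through the GA library and return a pandas DataFrame."""
    # Imported lazily so the query builder above works without the package.
    import google.datalab.bigquery as bq

    return bq.Query(sql).execute().result().to_dataframe()
```

In a notebook cell you would then call something like `df = run_query(build_daily_sessions_query("my-project.my_dataset.ga_sessions_*", "20170501", "20170507"))` and carry on with ordinary Pandas.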

Chris Meyers
  • OK thanks, I expected as much! I am using Jupyter with a datalab import at the moment. I have this import: import datalab.bigquery, how does it differ from google.datalab.bigquery which is what you mention in your answer? – Tom May 08 '17 at 08:41
  • datalab.bigquery was the Beta library and was kept around so that everyone's beta notebooks didn't break. google.datalab.bigquery is the GA library. The biggest change is the improved support for Standard SQL. – Chris Meyers May 08 '17 at 16:10
  • OK thank you Chris Meyers, this is what I've been trying to understand. It would be helpful if all the deprecated repos would have a note saying which one to use for a new project with no legacy code, as I've been wading through lots of libraries which do the same thing, knowing that only one is the current version! – Tom May 12 '17 at 08:00