
I'm currently using Google CloudSQL for my needs.

I'm collecting data from user activities. Every day my table grows by around 9-15 million rows, and rows are inserted and updated every second. The data includes several main parameters such as user location (latitude/longitude), timestamp, user activity, conversations and more.

I constantly need to pull insights from this activity data, like "how many users between latitude/longitude A and latitude/longitude B used my app, per hour, over the last 30 days?".
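
Roughly, that kind of query currently looks like this in MySQL (table and column names are simplified for this question, and the bounding-box values are just examples):

```sql
-- Hypothetical schema: user_activity(user_id, lat, lng, activity, created_at, ...)
SELECT
    DATE_FORMAT(created_at, '%Y-%m-%d %H:00:00') AS hour_bucket,
    COUNT(DISTINCT user_id) AS active_users
FROM user_activity
WHERE created_at >= NOW() - INTERVAL 30 DAY
  AND lat BETWEEN 52.30 AND 52.45   -- bounding box between points A and B
  AND lng BETWEEN 4.80 AND 5.00
GROUP BY hour_bucket
ORDER BY hour_bucket;
```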

Because my table gets bigger every day, it's hard to keep the SELECT queries performing well. (I have already added indexes on the most commonly used columns.)
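
For reference, the indexing is roughly along these lines (simplified names again):

```sql
-- Composite index covering the most common filters: time range plus bounding box
CREATE INDEX idx_activity_time_loc ON user_activity (created_at, lat, lng);
```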

All my inserts, selects, updates and so on are executed through an API that I wrote in PHP.

So my question is: would I get a much bigger benefit if I used Google BigQuery for my needs?

If yes, how can I do this? Isn't Google BigQuery (forgive me if I'm wrong) designed for static data rather than constantly updated data? How can I connect my CloudSQL data to BigQuery in real time?

Which is better: optimizing my table in CloudSQL to speed up the SELECT queries, or using BigQuery (if possible)?

I'm also open to other alternatives or suggestions for optimizing my CloudSQL performance :)

Thank you

  • I'm biased because I work on BigQuery, but it sounds like it would suit your needs better. It supports real-time streaming as well as batch imports through its API, and you'll undoubtedly see much faster performance. – Elliott Brossard Oct 27 '16 at 05:05
  • @ElliottBrossard Hi, so I can use the streaming API on BigQuery to keep the data in sync with CloudSQL and then use BigQuery to process my SELECT queries? – hum_hum_pa Oct 27 '16 at 05:25
  • Can you have your application directly [stream data into BigQuery](https://cloud.google.com/bigquery/streaming-data-into-bigquery) instead? If you want to copy your existing data from CloudSQL to BigQuery, you can [export to CSV](https://cloud.google.com/sql/docs/import-export/exporting#csv) and then [import from CSV](https://cloud.google.com/bigquery/loading-data#loading_csv_files). – Elliott Brossard Oct 27 '16 at 05:54
  • Hi @ElliottBrossard, thanks for your help. I have tried to import my MySQL data into BigQuery but ran into a problem with the "point" data type. I use geospatial functions on MySQL, for example to find data inside a polygon of latitude/longitude points. Is this possible with BigQuery? Thanks – hum_hum_pa Nov 01 '16 at 03:41
  • We don't have geospatial functions yet, but you can emulate some of the functionality using [user defined functions](https://cloud.google.com/bigquery/sql-reference/user-defined-functions). You would need to store the points as records (structs) with two fields. – Elliott Brossard Nov 01 '16 at 03:56
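
A minimal sketch of that kind of emulation in BigQuery standard SQL, assuming the point is stored as a STRUCT of two FLOAT64 fields (all names here are illustrative):

```sql
-- Assumed schema: events(user_id STRING, location STRUCT<lat FLOAT64, lng FLOAT64>, ...)
CREATE TEMP FUNCTION in_bounding_box(
    point STRUCT<lat FLOAT64, lng FLOAT64>,
    min_lat FLOAT64, max_lat FLOAT64,
    min_lng FLOAT64, max_lng FLOAT64
) AS (
    point.lat BETWEEN min_lat AND max_lat
    AND point.lng BETWEEN min_lng AND max_lng
);

SELECT COUNT(DISTINCT user_id) AS active_users
FROM `my_project.analytics.events`
WHERE in_bounding_box(location, 52.30, 52.45, 4.80, 5.00);
```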

1 Answer


Sounds like BigQuery would be far better suited to your use case. I can think of a good solution:

  1. Migrate your existing data from CloudSQL to BigQuery.
  2. Stream new events directly to BigQuery (using an async queue).
  3. Use a time-partitioned table in BigQuery (see the sketch below).
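
A rough sketch of what the time-partitioned table (step 3) and your hourly query could look like in BigQuery standard SQL (dataset, table and column names are only examples):

```sql
-- Example only: a date-partitioned events table
CREATE TABLE `my_project.analytics.events` (
    user_id    STRING,
    lat        FLOAT64,
    lng        FLOAT64,
    activity   STRING,
    event_time TIMESTAMP
)
PARTITION BY DATE(event_time);

-- Hourly active users over the last 30 days; filtering on the partitioning
-- column should limit the scan to roughly 30 daily partitions.
SELECT
    TIMESTAMP_TRUNC(event_time, HOUR) AS hour_bucket,
    COUNT(DISTINCT user_id) AS active_users
FROM `my_project.analytics.events`
WHERE event_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY hour_bucket
ORDER BY hour_bucket;
```

For step 2, the events can go through BigQuery's streaming insert API (tabledata.insertAll), called from your PHP backend, with a queue in front so inserts don't block requests.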

If you use BigQuery, you don't need to worry about performance or scaling. That's all handled for you by Google.

Graham Polley
  • Hi @polleyg, thanks for your solution. But I have one question: I use geospatial queries on MySQL to get data inside a polygon of latitude/longitude points; is that possible in BigQuery? I found that BigQuery doesn't have a "point" data type. Thanks – hum_hum_pa Nov 01 '16 at 03:40
  • You don't need a "point" data type. Just use float. See http://stackoverflow.com/questions/27998739/query-geospatial-data-with-bigquery and https://cloud.google.com/bigquery/docs/reference/legacy-sql#mathfunctions – Graham Polley Nov 08 '16 at 12:27