
I have a table with many (hundreds, later thousands) of UPDATE queries that I would like to execute from Datalab.

The code is the following. Reading the table with these commands:

%%sql --module std_sql_table
WITH q1 as (
  SELECT *  FROM `speedy-emissary-167213.pgp_orielresearch.update_queries`
)
select * from q1

import datalab.bigquery as bq
# uses pandas for the dataframe
query_table_df = bq.Query(std_sql_table).to_dataframe(dialect='standard', use_cache=True)
print(query_table_df.head(10))
col_name = list(query_table_df)  # the list of column names
print(col_name)
# THIS LOOP IS FOR THE UPDATE COMMAND ROWS THAT I WANT TO EXECUTE

#for index, row in query_table_df.iterrows():
#    row_query = row[col_name[0]]  # the UPDATE statement stored in this row
#    print("running " + row_query)
#    query_result_row_df = bq.Query(row_query).to_dataframe(dialect='standard', use_cache=True)

The output is the following; I would like to execute every row in the table:

0    UPDATE speedy-emissary-167213.pgp_orielresear...
1    UPDATE speedy-emissary-167213.pgp_orielresear...
2    UPDATE speedy-emissary-167213.pgp_orielresear...
3    UPDATE speedy-emissary-167213.pgp_orielresear...
4    UPDATE speedy-emissary-167213.pgp_orielresear...
5    UPDATE speedy-emissary-167213.pgp_orielresear...
6    UPDATE speedy-emissary-167213.pgp_orielresear...
7    UPDATE speedy-emissary-167213.pgp_orielresear...
8    UPDATE speedy-emissary-167213.pgp_orielresear...
9    UPDATE speedy-emissary-167213.pgp_orielresear...
[u'f0_']
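For reference, here is a minimal sketch of the row-by-row approach (it uses the google-cloud-bigquery client library instead of datalab.bigquery, which is an assumption about what is available in the Datalab environment; the column name f0_ is taken from the output above):

# Minimal sketch: run every UPDATE statement stored in the table, one job per row.
# Uses the google-cloud-bigquery client, which is assumed to be installed.
from google.cloud import bigquery

client = bigquery.Client(project='speedy-emissary-167213')

for row_query in query_table_df['f0_']:
    job = client.query(row_query)  # standard SQL is the default for this client
    job.result()                   # wait for the UPDATE to finish before starting the next one

Note that each iteration is a separate query job.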

Any idea is very welcome!

eilalan

1 Answer


Please read the following doc: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-manipulation-language

Essentially you need to merge the update statements, or you will hit quota issues, pay much more than needed, and get worse performance. BigQuery is good for analytics, but it should not be treated as a general-purpose database.
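A rough illustration of what merging could look like, assuming the per-row updates all set the same column and the table has a key column (the table, key, and column names below are hypothetical, not taken from the question):

# Hypothetical sketch: collapse many per-row UPDATEs into a single DML statement.
# `my_table`, `sample_id`, and `status` are made-up names for illustration only.
updates = {'S1': 'done', 'S2': 'failed', 'S3': 'done'}

cases = ' '.join("WHEN '{0}' THEN '{1}'".format(k, v) for k, v in updates.items())
keys = ', '.join("'{0}'".format(k) for k in updates)

merged_sql = (
    "UPDATE `speedy-emissary-167213.pgp_orielresearch.my_table` "
    "SET status = CASE sample_id " + cases + " END "
    "WHERE sample_id IN (" + keys + ")")

print(merged_sql)  # one query job instead of len(updates) separate ones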

xuejian
  • thank you for the advice. I will merge the update commands and will keep you updated – eilalan Jun 23 '17 at 00:00
  • I was thinking to create and populate the data on a Google Sheet and then import it to BigQuery; however, there is a 256-column limit on Google Sheets. Since I am building this table from scratch (creating a table and then populating the data from two other tables), I have the flexibility to do it in different ways. Do you have another idea on how to create this table? Will Google Cloud SQL work for that? I don't have any experience with it but am happy to use it if needed. Please advise. Or, creative ways to merge updates. Thanks for any advice! – eilalan Jun 23 '17 at 13:28
  • Interesting, I was not aware of the 256-column limit on Google Sheets. But are you going to create the original table by hand, or by program? If it's the latter, then you could create your own JSON/CSV/Avro files and upload them (a rough sketch of a programmatic load follows these comments). If you plan to create it by hand, then I would probably find another editor (say LibreOffice Calc, which supports up to 1024 columns) to prepare the file. Please also read the following docs: [loading-data](https://cloud.google.com/bigquery/loading-data) [preparing-data-for-loading](https://cloud.google.com/bigquery/preparing-data-for-loading) – xuejian Jun 23 '17 at 19:46
  • I am planning to generate it by program. I will take a look at the CSV option. thanks – eilalan Jun 23 '17 at 20:03
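For the programmatic route mentioned in the comments, here is a rough sketch of generating a CSV with a program and loading it into BigQuery in one job (again using the google-cloud-bigquery client; the file and table names are hypothetical):

# Hypothetical sketch: load a program-generated CSV straight into BigQuery
# instead of issuing many UPDATEs. File and table names are made up.
from google.cloud import bigquery

client = bigquery.Client(project='speedy-emissary-167213')
table_ref = client.dataset('pgp_orielresearch').table('my_wide_table')

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1   # the CSV has a header row
job_config.autodetect = True       # let BigQuery infer the schema

with open('my_wide_table.csv', 'rb') as f:
    load_job = client.load_table_from_file(f, table_ref, job_config=job_config)
load_job.result()  # wait for the load to complete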