Questions tagged [google-bigquery]

Google BigQuery is a Google Cloud Platform product providing serverless queries of petabyte-scale data sets using SQL. BigQuery provides multiple read-write pipelines, and enables data analytics that transform how businesses analyze data.

Google BigQuery is a web service that lets you do interactive analysis of massive datasets—up to billions of rows. Scalable and easy to use, BigQuery lets developers and businesses tap into powerful data analytics on demand.

Official sites:

  • https://cloud.google.com/bigquery
  • https://cloud.google.com/bigquery/docs

25130 questions
4
votes
1 answer

Query key value in different columns from Google BigQuery

I gather analytics with Firebase Analytics, which I have linked to Google BigQuery. I have the following data in BigQuery (unnecessary columns/rows are left out; the dataset looks similar to…
Timon
  • 362
  • 4
  • 10
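
For questions like the one above, the usual pattern is to UNNEST the repeated key/value record that the Firebase export writes. A minimal sketch, assuming the newer events_* export schema; the project, dataset, table, and parameter names are placeholders, not taken from the question:

    from google.cloud import bigquery

    # Hedged sketch: pull one event parameter ("level", hypothetical) out of the
    # repeated key/value record event_params. All names are placeholders.
    client = bigquery.Client()

    sql = """
    SELECT
      event_name,
      (SELECT p.value.string_value
       FROM UNNEST(event_params) AS p
       WHERE p.key = 'level') AS level
    FROM `my_project.analytics_123456.events_20230101`
    WHERE event_name = 'level_up'
    """

    for row in client.query(sql).result():
        print(row.event_name, row.level)
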
4
votes
4 answers

How do I Authenticate a Service Account to Make Queries against a GDrive Sheet Backed BigQuery Table?

My situation is as follows: Google Account A has some data in BigQuery. Google Account B manages Account A's BigQuery data, and has also been given editor privileges for Account A's Cloud Platform project. Account B has a Sheet in Google Drive that…
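
The usual sticking points for the question above are (a) the service account's credentials need the Drive scope in addition to the BigQuery scope, and (b) the Sheet has to be shared with the service account's email. A minimal Python sketch; the key-file path, project ID, and table name are placeholders:

    from google.cloud import bigquery
    from google.oauth2 import service_account

    SCOPES = [
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",   # required for Sheets-backed tables
    ]

    credentials = service_account.Credentials.from_service_account_file(
        "service-account-key.json", scopes=SCOPES
    )
    client = bigquery.Client(project="account-a-project", credentials=credentials)

    sql = "SELECT * FROM `account-a-project.dataset.sheet_backed_table` LIMIT 10"
    for row in client.query(sql).result():
        print(dict(row))
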
4
votes
3 answers

How to create partitioned BigQuery table in Java

https://cloud.google.com/bigquery/docs/creating-partitioned-tables shows how to create a partitioned table in Python. I've been there, I've done that. Now the question is: how do I do the same thing with the Java API? What is the corresponding Java code…
Marcin Pietraszek
  • 3,134
  • 1
  • 19
  • 31
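
The question above asks for the Java API specifically; as a reference point only, here is a minimal sketch of the same operation with the Python client (the Java client exposes a matching time-partitioning setting on the table definition). All names and the schema below are placeholders:

    from google.cloud import bigquery

    # Sketch only: create a day-partitioned table via the Python client.
    client = bigquery.Client()

    table = bigquery.Table(
        "my_project.my_dataset.partitioned_events",
        schema=[
            bigquery.SchemaField("event_ts", "TIMESTAMP"),
            bigquery.SchemaField("payload", "STRING"),
        ],
    )
    table.time_partitioning = bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY
    )

    created = client.create_table(table)
    print(created.full_table_id)
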
4
votes
1 answer

BigQuery: Repeated record added outside of an array

Error: Exception in thread "main" java.lang.RuntimeException {"errors":[{"debugInfo":"generic::failed_precondition: Repeated record added outside of an array.","reason":"invalid"}],"index":0} Language: Scala Gradle bigquery dependency: compile…
Sergey
  • 91
  • 1
  • 6
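
For the error above, the usual cause is that a REPEATED field was populated with a single record instead of a list of records. A hedged Python illustration of the distinction (the question uses Scala; all table and field names here are placeholders):

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my_project.my_dataset.events"

    # A REPEATED field must be supplied as a list, even when it holds one record.
    bad_row  = {"user_id": 1, "tags": {"name": "a", "weight": 2}}    # single record: triggers the error
    good_row = {"user_id": 1, "tags": [{"name": "a", "weight": 2}]}  # wrapped in a list: accepted

    errors = client.insert_rows_json(table_id, [good_row])
    print(errors)  # [] on success; otherwise a list describing per-row failures
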
4
votes
1 answer

In Google BigQuery API, what is the default timeout for a query response?

In Google BigQuery API, what is the default timeout for a query response? In other words, how long does it wait by default until the response returns null for an incomplete job?
user1311888
  • 773
  • 3
  • 11
  • 24
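
For reference, the REST API's jobs.query request takes a timeoutMs parameter whose documented default is 10 000 ms (10 seconds); if the job has not finished by then, the response comes back with jobComplete set to false rather than with rows. A small sketch of waiting explicitly on the client side instead (the timeout value here is arbitrary):

    from google.cloud import bigquery

    client = bigquery.Client()
    job = client.query(
        "SELECT COUNT(*) AS n FROM `bigquery-public-data.samples.shakespeare`"
    )

    # Poll for completion with an explicit client-side timeout (seconds);
    # raises concurrent.futures.TimeoutError if the job is not done in time.
    rows = job.result(timeout=60)
    print(next(iter(rows)).n)
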
4
votes
1 answer

How can you do SQL joins on dates between specific dates?

This may be incredibly hard or incredibly simple; I don't know which, but I'm stuck. How do I join data that happened between specific days? How do I write that? The tricky thing is that every row would have a different time period, a unique period…
mike winston
  • 169
  • 1
  • 2
  • 9
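
A common shape for the question above is a range join, with BETWEEN in the join condition so each event row is matched to whichever period row contains its date. A minimal sketch; table and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT p.period_id, e.event_date, e.amount
    FROM `my_project.my_dataset.periods` AS p
    JOIN `my_project.my_dataset.events`  AS e
      ON e.event_date BETWEEN p.start_date AND p.end_date
    """

    for row in client.query(sql).result():
        print(dict(row))
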
4
votes
1 answer

How to schedule a job to execute a Python script in the cloud to load data into BigQuery?

I am trying to set up a scheduled job/process in the cloud to load CSV data into BigQuery from Google Cloud Storage buckets using a Python script. I have managed to get hold of the Python code to do this, but I'm not sure where I need to save this code so that this task…
LondonUK
  • 437
  • 1
  • 8
  • 20
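
One common arrangement for the question above (an assumption, not necessarily what the asker ends up with) is to keep the load logic in a small script or function and trigger it on a schedule, for example from Cloud Scheduler or a cron job; the load itself is a couple of client calls. Bucket, dataset, and table names are placeholders:

    from google.cloud import bigquery

    def load_csv_from_gcs():
        """Load CSV files from a GCS bucket into a BigQuery table (names are placeholders)."""
        client = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,          # assumes a header row
            autodetect=True,              # let BigQuery infer the schema
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = client.load_table_from_uri(
            "gs://my-bucket/exports/sales_*.csv",
            "my_project.my_dataset.sales",
            job_config=job_config,
        )
        load_job.result()  # wait for completion; raises on error
        print(f"Loaded {load_job.output_rows} rows")

    if __name__ == "__main__":
        load_csv_from_gcs()
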
4
votes
1 answer

How to enable standard SQL for BigQuery using the bq shell

bq query has a --use_legacy_sql flag that can be set to false to enable standard SQL. How do I do the same when the bq shell is used? I tried the variations below and both of them failed with the error Unknown command line flag 'use_legacy_sql'. bq…
vaichidrewar
  • 9,251
  • 18
  • 72
  • 86
4
votes
1 answer

Google BigQuery - Using wildcard table query with date partitioned table?

I am trying to use wildcard table functions to query a bunch of date-partitioned tables. This query works: select * from `Mydataset.fact_table_1` where _partitiontime='2016-09-30' limit 10 This query does not work: select * from…
Tim S
  • 185
  • 1
  • 13
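
A hedged sketch of the syntax for the question above, combining a table wildcard (_TABLE_SUFFIX) with a partition filter (_PARTITIONTIME). Whether the two can be combined has had limitations depending on BigQuery version and table setup, so treat this as a sketch of the syntax rather than a guarantee; names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT *
    FROM `my_project.Mydataset.fact_table_*`
    WHERE _TABLE_SUFFIX IN ('1', '2')
      AND _PARTITIONTIME = TIMESTAMP('2016-09-30')
    LIMIT 10
    """

    for row in client.query(sql).result():
        print(dict(row))
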
4
votes
2 answers

BigQuery - Fact table update logic

I am working on building a prototype on BigQuery for performance and cost analysis. Requirements: build a DW (star schema) for sales operations data (incentives, leads, entitlements, forecast, marketing, etc.) for reporting and advanced…
Tim S
  • 185
  • 1
  • 13
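
For the "update logic" part of the question above, one common approach today is a MERGE statement that upserts the fact table from a staging table. This is a general sketch, not the asker's design; all names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    MERGE `my_project.dw.fact_sales` AS t
    USING `my_project.staging.sales_delta` AS s
    ON t.sale_id = s.sale_id
    WHEN MATCHED THEN
      UPDATE SET t.amount = s.amount, t.updated_at = s.updated_at
    WHEN NOT MATCHED THEN
      INSERT (sale_id, amount, updated_at)
      VALUES (s.sale_id, s.amount, s.updated_at)
    """

    client.query(sql).result()  # wait for the DML job to finish
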
4
votes
1 answer

Check if data already exists before inserting into BigQuery table (using Python)

I am setting up a daily cron job that appends a row to a BigQuery table (using Python); however, duplicate data is being inserted. I have searched online and I know that there is a way to manually remove duplicate data, but I wanted to see if I could…
fragilewindows
  • 1,394
  • 1
  • 15
  • 26
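
A minimal sketch of "check, then append" for the question above, using a parameterized existence query before streaming the row. Names are placeholders; note that check-then-insert is not atomic, so a MERGE statement or downstream de-duplication is often the more robust choice:

    import datetime

    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my_project.my_dataset.daily_metrics"

    def insert_if_absent(metric_date, value):
        # Parameterized existence check before appending (names are placeholders).
        sql = f"SELECT COUNT(*) AS n FROM `{table_id}` WHERE metric_date = @metric_date"
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("metric_date", "DATE", metric_date)
            ]
        )
        n = next(iter(client.query(sql, job_config=job_config).result())).n
        if n == 0:
            errors = client.insert_rows_json(
                table_id, [{"metric_date": metric_date.isoformat(), "value": value}]
            )
            print(errors or "inserted")
        else:
            print("row already present, skipping")

    insert_if_absent(datetime.date(2024, 1, 1), 42)
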
4
votes
1 answer

How can I train from BigQuery instead of csv files in Cloud ML?

My training data is in BigQuery. How can I use it to train a model in Cloud ML?
rhaertel80
  • 8,254
  • 1
  • 31
  • 47
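
One widely used approach for the question above (an assumption about the workflow, not the only option) is to export the BigQuery table to files on Cloud Storage and point the training job at those files; the export itself is a single client call. Names are placeholders:

    from google.cloud import bigquery

    # Sketch of "export, then train": dump the training table to sharded CSV
    # files on GCS, which the Cloud ML training job can then read.
    client = bigquery.Client()

    extract_job = client.extract_table(
        "my_project.my_dataset.training_examples",
        "gs://my-ml-bucket/training/data-*.csv",   # wildcard -> sharded output
    )
    extract_job.result()  # wait for the export to finish
    print("export done")
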
4
votes
3 answers

How to not count NULL values in DENSE_RANK()?

Say I have the following table: col NULL 1 1 2 Then I select: SELECT col, DENSE_RANK() OVER(ORDER BY col) as rnk from table Then I get: col rnk NULL 1 1 2 1 2 2 3 What I want to get is this: col rnk NULL NULL 1 1 1 1 2 2 But…
cshin9
  • 1,440
  • 5
  • 20
  • 33
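
One way to get the desired output above is to partition the window on whether col is NULL, so only non-NULL values are ranked against each other, and then mask the rank for the NULL rows. A sketch; the table name is a placeholder:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT
      col,
      IF(col IS NULL,
         NULL,
         DENSE_RANK() OVER (PARTITION BY (col IS NULL) ORDER BY col)) AS rnk
    FROM `my_project.my_dataset.my_table`
    """

    for row in client.query(sql).result():
        print(row.col, row.rnk)
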
4
votes
3 answers

Get any non-null value of other fields in aggregations

I want to aggregate on some fields and get any non-null value from the others. To be more precise, the query looks something like: SELECT id, any_value(field1), any_value(field2) FROM mytable GROUP BY ID and the columns are like: ID | field1 | field…
S.Mohsen sh
  • 2,028
  • 3
  • 21
  • 32
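
ANY_VALUE may pick a NULL row, so two common workarounds for the question above are MAX(), which ignores NULLs, or ARRAY_AGG(... IGNORE NULLS LIMIT 1). A sketch with placeholder names:

    from google.cloud import bigquery

    client = bigquery.Client()

    sql = """
    SELECT
      id,
      MAX(field1) AS field1_any_non_null,
      ARRAY_AGG(field2 IGNORE NULLS LIMIT 1)[SAFE_OFFSET(0)] AS field2_any_non_null
    FROM `my_project.my_dataset.mytable`
    GROUP BY id
    """

    for row in client.query(sql).result():
        print(dict(row))
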
4
votes
1 answer

Writing to BigQuery from within a ParDo function

I would like to call a beam.io.Write(beam.io.BigQuerySink(..)) operation from within a ParDo function to generate a separate BigQuery table for each key in the PCollection (I'm using the Python SDK). Here are two similar threads, which unfortunately…
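
For the question above, a Beam sink is not meant to be invoked from inside a ParDo; one workaround (a sketch only, with placeholder names, and accepting streaming-insert semantics) is to call the BigQuery client directly from the DoFn and route each element to a table derived from its key:

    import logging

    import apache_beam as beam
    from google.cloud import bigquery

    class WriteRowToPerKeyTable(beam.DoFn):
        """Sketch: stream each (key, row) element into a table named after the key.
        Project/dataset names are placeholders; the per-key tables are assumed to exist."""

        def setup(self):
            # DoFn.setup is available in recent Beam releases.
            self.client = bigquery.Client()

        def process(self, element):
            key, row = element
            table_id = f"my_project.my_dataset.output_{key}"
            errors = self.client.insert_rows_json(table_id, [row])
            if errors:
                logging.warning("insert failed for %s: %s", table_id, errors)

    # Usage inside a pipeline (sketch):
    # keyed_rows | beam.ParDo(WriteRowToPerKeyTable())
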