
I have created a Flask API in Python, deployed it as a container image on GCP Cloud Run, and I trigger it through Cloud Scheduler. In my code I read a large amount of data (15 million rows and 20 columns) from BigQuery. I have set my instance configuration to 8 GB RAM and 4 CPUs.

Problem 1: Reading the data takes too long (about 2200 seconds).

import numpy as np
import pandas as pd
from pandas.io import gbq

# Read the full sales table into a DataFrame via pandas-gbq
query = """SELECT * FROM TABLE_SALES"""
df = gbq.read_gbq(query, project_id="project_name")

Is there a more efficient way to read the data from BigQuery?

Problem 2: My code stops working after reading the data. When I checked the logs, I found this:

error - 503
textPayload: "The request failed because either the HTTP response was malformed or connection to the instance had an error.
While handling this request, the container instance was found to be using too much memory and was terminated. This is likely to cause a new container instance to be used for the next request to this revision. If you see this message frequently, you may have a memory leak in your code or may need more memory. Consider creating a new revision with more memory."

One workaround is to increase the instance configuration; if that is the solution, please let me know roughly how much it would cost.

James Z
Kalyan Rao
  • have you looked at this ---> https://cloud.google.com/bigquery/docs/reference/libraries – Mr.Batra Oct 04 '21 at 06:46
  • Hint: Please don't use Indian words like "crore". People aren't going to understand it and might think you mean "core". – James Z Oct 04 '21 at 08:16
  • You can't load a petabyte data warehouse onto a single instance; it's neither suitable nor scalable. You need to use a distributed system like Dataflow or Dataproc to process your data. BigQuery itself allows you to process the data efficiently and then extract only the result. – guillaume blaquiere Oct 04 '21 at 15:11

2 Answers


You can try a GCP Dataflow batch job to read and process such a large dataset from BigQuery, along the lines of the sketch below.
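
A minimal sketch of such a pipeline with the Apache Beam Python SDK is shown below. The project id, region, dataset, table and bucket names are placeholders taken from (or invented for) the question, not real resources, and the per-row transform is just an example step.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# All names below (project, region, bucket, dataset, table) are placeholders.
options = PipelineOptions(
    runner="DataflowRunner",
    project="project_name",
    region="us-central1",
    temp_location="gs://your-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        # Each element emitted by ReadFromBigQuery is a dict keyed by column name.
        | "ReadSales" >> beam.io.ReadFromBigQuery(
            query="SELECT * FROM `project_name.dataset.TABLE_SALES`",
            use_standard_sql=True)
        # Replace this step with the per-row processing you currently do in pandas.
        | "ToJson" >> beam.Map(lambda row: json.dumps(row, default=str))
        # Write results to GCS instead of holding everything in memory.
        | "WriteOut" >> beam.io.WriteToText("gs://your-bucket/output/sales")
    )

Because Dataflow distributes both the read and the processing across workers, a single Cloud Run instance no longer has to hold all 15 million rows in memory.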

Soumik Das

Depending on the complexity of your BigQuery query, you may want to consider the high-performance Google BigQuery Storage API: https://cloud.google.com/bigquery/docs/reference/storage/libraries
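
As a rough sketch, assuming the google-cloud-bigquery and google-cloud-bigquery-storage packages are installed (the project id and table name are placeholders taken from the question):

from google.cloud import bigquery

# Placeholder project id and table name.
client = bigquery.Client(project="project_name")
query = "SELECT * FROM `project_name.dataset.TABLE_SALES`"

# With google-cloud-bigquery-storage installed, to_dataframe() downloads the
# result through the Storage API (Arrow streams) instead of paging over the
# REST API, which is typically much faster for millions of rows.
df = client.query(query).to_dataframe(create_bqstorage_client=True)

If you prefer to keep the existing pandas-gbq code, read_gbq also accepts use_bqstorage_api=True to route the download through the same API.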

Jan Krynauw