
I'd like to use a JS function in BigQuery to check whether a file exists in Google Cloud Storage. But BigQuery keeps showing an error when I run `SELECT UDF_FUNCTION_NAME()`:

```
ReferenceError: require is not defined at
```

My JS code is like this.

The bucket path is `gs://MY_BUCKET_PATH/FILE_NAME`. This function checks whether the file exists at that path.

```javascript
function FileExist() {
  const {Storage} = require('@google-cloud/storage');
  const storage = new Storage();
  return storage.bucket('MY_BUCKET_PATH').file('FILE_NAME').exists();
}
```

And my UDF is like this.

```sql
CREATE TEMP FUNCTION My_Function()
RETURNS BOOL
LANGUAGE js
OPTIONS (
  library = ['gs://MY_BUCKET_PATH/FILE_NAME']
)
AS r"""
  return FileExist();
""";
```

How can I make this BigQuery UDF use JS stored in Google Cloud Storage?

I tried putting the JS code directly in the UDF, but it didn't work; I think there is an npm library problem. So now I'm trying to use a function from a library.
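For what it's worth, the `library` option only loads plain JavaScript source files into the UDF's sandbox; `require` and npm packages (including `@google-cloud/storage`) remain unavailable, which is why the `ReferenceError` appears. A minimal sketch of what a library file can legitimately contain (file and function names are hypothetical):

```sql
-- Assume gs://MY_BUCKET_PATH/lib.js contains only plain JS, e.g.:
--   function addOne(x) { return x + 1; }
CREATE TEMP FUNCTION add_one(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
OPTIONS (library = ['gs://MY_BUCKET_PATH/lib.js'])
AS r"""
  return addOne(x);
""";

SELECT add_one(41);
```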

Eunhee Cho
  • As far as I know, UDFs are sandboxed and cannot access other cloud services such as GCS. You would need to write a Cloud Function, which can be called from BigQuery as an external UDF. Please describe your goal and why you would like to check for a file in GCS. There are also external tables which read files from GCS. – Samuel Feb 28 '23 at 18:11
  • @Samuel Thank you for the suggestion. But I can't create an external table inside a UDF. This is for checking data quality, so I need a SQL query that works in Dataplex. Also, the file name I have to check changes every day. Any suggestion other than an external table? – Eunhee Cho Feb 28 '23 at 19:57
  • Agree with Samuel. A JS UDF can't call APIs external to BigQuery. An external UDF (also called a remote function) is the key here: https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions – guillaume blaquiere Feb 28 '23 at 20:52
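To illustrate the remote-function approach the comments point to: a Cloud Function performs the GCS check, and BigQuery calls it through a connection. All names below (dataset, connection, endpoint) are hypothetical, and the HTTP endpoint itself must be deployed separately:

```sql
-- Hypothetical: register an already-deployed Cloud Function as a remote function.
-- BigQuery sends {"calls": [[file_name], ...]} to the endpoint, which must
-- return {"replies": [true, ...]}.
CREATE OR REPLACE FUNCTION `mydataset.file_exists`(file_name STRING)
RETURNS BOOL
REMOTE WITH CONNECTION `europe-west3.my_connection`
OPTIONS (
  endpoint = 'https://europe-west3-my-project.cloudfunctions.net/file-exists'
);

-- The daily-changing file name then becomes an ordinary query parameter:
SELECT `mydataset.file_exists`('FILE_NAME');
```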

1 Answer


Here is how to list all files in Google Cloud Storage (GCS).

Please read the manual first.

Let's assume that your GCS bucket is gs://test1234 and is located in the region europe-west3.

First add a connection in BigQuery:


Set the region and the connection name (here: test123).


Create the connection and open it in BigQuery. Under Service account id, find its e-mail address and copy it.

Go to IAM, add a principal with that e-mail address, and grant it a role that can view Cloud Storage (for example, Storage Object Viewer).

In BigQuery, run this query to create an object table listing all contents of the GCS bucket:

```sql
CREATE EXTERNAL TABLE `yourdataset.yournewtable`
WITH CONNECTION `europe-west3.test123`
OPTIONS(
  object_metadata = 'SIMPLE',
  uris = ['gs://test1234/*'],
  max_staleness = INTERVAL 4 HOUR,
  metadata_cache_mode = 'MANUAL');
```
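Once the object table exists, checking for a specific file is a single parameterizable query over its metadata (table and file names here follow the assumptions above). Note that with `metadata_cache_mode = 'MANUAL'` the metadata cache has to be refreshed explicitly before querying:

```sql
-- Refresh the object table's metadata cache (required with MANUAL mode).
CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('yourdataset.yournewtable');

-- Check whether a given object exists in the bucket.
SELECT COUNT(*) > 0 AS file_exists
FROM `yourdataset.yournewtable`
WHERE uri = 'gs://test1234/FILE_NAME';
```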
Samuel
  • Thank you for the explanation. I also tried this before trying JavaScript. The reason I gave up on this solution is that I couldn't find a way to use this DDL in a single SQL query for Dataplex. Since the file name in Google Cloud Storage keeps changing, I need to use a parameter in the Dataplex query. – Eunhee Cho Mar 01 '23 at 02:01