
I'm trying to move a bunch of data that I have in a bucket (newline-delimited JSON files) into BigQuery. BigQuery forbids certain characters in field names, such as dashes (-) or slashes. Our data unfortunately has dashes in many of the field names, e.g.

jsonPayload.request.x-search

I tried renaming the field in the BigQuery schema to jsonPayload.request.x_search hoping that the loader would do some magic, but nope.

Aside from running a job to rename the fields in storage (really undesirable, especially because new files come in hourly), is there a way to map fields in the JSON files to fields in the BQ schema?

I've been using the console UI, but it makes no difference to me which interface I use with BQ.

vasia

2 Answers


I see a few options to work around this:

  1. Create a Cloud Function that triggers when your new files arrive. Inside that function, read the contents of the file, transform it, write the results back to a new file, and load that into BigQuery (see the sketch after this list). I'm not sure how scalable this is in your situation; if your files are quite big, it might not work.
  2. Create a Cloud Function that triggers when your new files arrive and invokes a templated Dataflow pipeline to ingest, transform, and write the data to BigQuery. This is scalable, but comes with extra costs (Dataflow). However, it's a nice pattern for loading data from GCS into BigQuery.
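For option 1, here is a minimal sketch of what that Cloud Function could look like in Python. The bucket and table names are placeholders, and the whole file is held in memory, which is why it only suits reasonably small files:

    import json
    from google.cloud import bigquery, storage

    # Placeholder names -- swap in your own bucket and table.
    CLEAN_BUCKET = "my-clean-bucket"
    TABLE_ID = "my-project.my_dataset.my_table"

    def sanitize_keys(obj):
        """Recursively replace dashes with underscores in JSON object keys."""
        if isinstance(obj, dict):
            return {k.replace("-", "_"): sanitize_keys(v) for k, v in obj.items()}
        if isinstance(obj, list):
            return [sanitize_keys(v) for v in obj]
        return obj

    def on_file_arrival(event, context):
        """Background Cloud Function triggered when a file lands in GCS."""
        gcs = storage.Client()
        src = gcs.bucket(event["bucket"]).blob(event["name"])

        # Rewrite each newline-delimited JSON record with sanitized keys.
        cleaned = "\n".join(
            json.dumps(sanitize_keys(json.loads(line)))
            for line in src.download_as_text().splitlines()
            if line.strip()
        )
        gcs.bucket(CLEAN_BUCKET).blob(event["name"]).upload_from_string(cleaned)

        # Load the cleaned copy into BigQuery.
        bq = bigquery.Client()
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        bq.load_table_from_uri(
            "gs://{}/{}".format(CLEAN_BUCKET, event["name"]),
            TABLE_ID,
            job_config=job_config,
        ).result()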
Graham Polley

Lazily, within BigQuery:

  • Import as CSV
    • One column per row, pick a delimiter that doesn't occur inside the files
  • Parse within BigQuery
    • Either with the BQ JSON functions
    • Or with JavaScript UDFs for maximum flexibility

At least this is what I usually do.
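
For example, a rough sketch with the Python client; the bucket, table, and tab delimiter are assumptions, and any character that never appears in your files will do as the delimiter:

    from google.cloud import bigquery

    client = bigquery.Client()

    # 1. Load every JSON line as a single STRING column called "raw".
    #    Tab is an assumed delimiter -- use anything absent from your data.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        field_delimiter="\t",
        quote_character="",
        schema=[bigquery.SchemaField("raw", "STRING")],
    )
    client.load_table_from_uri(
        "gs://my-bucket/logs/*.json",        # placeholder path
        "my-project.my_dataset.raw_logs",    # placeholder table
        job_config=job_config,
    ).result()

    # 2. Parse with BigQuery's JSON functions; keys containing dashes can
    #    be escaped in the JSONPath with double quotes.
    query = """
    SELECT
      JSON_EXTRACT_SCALAR(raw, '$.jsonPayload.request."x-search"') AS x_search
    FROM `my-project.my_dataset.raw_logs`
    """
    for row in client.query(query).result():
        print(row.x_search)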

Felipe Hoffa