
I want to transfer JSON files from Amazon S3 to BigQuery, but I've run into the problem that BigQuery only supports newline-delimited JSON files. I don't know how to transform my JSON files into NDJSON, because the load date and the record source sit in the outer JSON structure.

Here is an example of one of my JSON files:

{
    "edwLoadDate": "2017-01-18T18-08-44",
    "edwRecordSource": "MAILCHIMP",
    "data": [
        {
             "data": {
                  "lastName": "John",
                  "firstName": "Doe",
                  "email": "john@example123.com"
              },
            "edwSequence": "0"
          },  
          {
             "data": {
                  "lastName": "John",
                  "firstName": "Doe",
                  "email": "john@example123.com"
              },
            "edwSequence": "1"
           }
     ]
}
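
For reference, newline-delimited JSON just means one complete JSON object per line, with no outer array and no commas between records; the two lines below only illustrate the shape, they are not my real data:

{"edwLoadDate": "2017-01-18T18-08-44", "edwRecordSource": "MAILCHIMP"}
{"edwLoadDate": "2017-01-18T18-09-10", "edwRecordSource": "MAILCHIMP"}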


1 Answer


I've tried transferring your sample JSON file to BigQuery with the following steps:

  1. Wrap the contents of your example JSON file in square brackets ([ ... ]) so it becomes a JSON array:

input.json

[{
    "edwLoadDate": "2017-01-18T18-08-44",
    "edwRecordSource": "MAILCHIMP",
    "data": [
        {
             "data": {
                  "lastName": "John",
                  "firstName": "Doe",
                  "email": "john@example123.com"
              },
            "edwSequence": "0"
          },  
          {
             "data": {
                  "lastName": "John",
                  "firstName": "Doe",
                  "email": "john@example123.com"
              },
            "edwSequence": "1"
           }
     ]
}]
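
If editing the file by hand isn't practical, a possible way to produce the same wrapped array is jq's slurp mode (-s); the file name original.json below is just a placeholder:

$ jq -s '.' original.json > input.json   # -s collects every top-level object from the input into a single array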

  2. Run the following shell command, which uses jq to emit each array element as one compact, single-line JSON object:
$ cat input.json | jq -c '.[]' > output.json

After running the shell script command, here's the output.json file:

{"edwLoadDate":"2017-01-18T18-08-44","edwRecordSource":"MAILCHIMP","data":[{"data":{"lastName":"John","firstName":"Doe","email":"john@example123.com"},"edwSequence":"0"},{"data":{"lastName":"John","firstName":"Doe","email":"john@example123.com"},"edwSequence":"1"}]}
  3. Upload the converted JSON file to a Cloud Storage bucket using the command:
gsutil cp output.json gs://sample-bucket
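
As a side note, since the source files live in S3, gsutil can also read s3:// URLs directly when AWS credentials are configured in your .boto file, so the raw files could be pulled down with something like this (the bucket name is a placeholder):

$ gsutil cp s3://your-s3-bucket/input.json .   # copy the original file from S3 to the local machine before converting it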
  4. Create a BigQuery dataset and load the converted JSON file from Cloud Storage into a table (a possible bq command is sketched after the screenshot). Here's the result after querying:

(screenshot of the query result in the BigQuery console)
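
Step 4 doesn't show the load itself; one possible way to create the dataset and load the converted file from Cloud Storage is the bq CLI, sketched below (the dataset and table names, sample_dataset and mailchimp, are placeholders):

$ bq mk sample_dataset
$ bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect sample_dataset.mailchimp gs://sample-bucket/output.json

With --autodetect, BigQuery infers the schema from the file, so the nested data array becomes a repeated RECORD column.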
