
I am trying to convert a JSON file to NDJSON so that I can use it in BigQuery.

The full script can be found here.

When I run the conversion, I get the following error: "There was an error or your JSON is not valid. Details: jsonObject.map is not a function."

I can't find `jsonObject.map` anywhere in the script :/

Any help would be greatly appreciated!

Fabien
  • Hi @Fabien, if you find my answer helpful, please consider accepting & upvoting it as per [Stack Overflow guidelines](https://stackoverflow.com/help/someone-answers), helping other Stack Overflow contributors with their research. If not, let me know so that I can improve the answer – Prajna Rai T Oct 12 '22 at 13:32
  • Hey @PrajnaRaiT Thank you for your answer! I have tried to use the BQ documentation but when I upload the file, I get the following error message: "Error while reading data, error message: Failed to parse JSON: Unexpected end of string; Unexpected end of string; Expected key File: prod-scotty-3a363ee6-0199-43da-a1de-5f6037779996". Where do you run the code you sent? Is it in python? – Fabien Oct 13 '22 at 12:24
  • You can run the Python [code](https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#python_1) in the Cloud Shell editor. Make sure you upload the JSON data file in the correct format and provide the table schema correctly in the code. – Prajna Rai T Oct 13 '22 at 14:51

1 Answer


You can use the following code to convert JSON to NEWLINE_DELIMITED_JSON and load the data into BigQuery:

from google.cloud import bigquery
import json

client = bigquery.Client(project="project-id")

dataset_id = "dataset-id"
table_id = "table-id"

list_dict = [
    {
        "name": "jack",
        "id": "101",
        "info": [
            {
                "place": "india"
            }
        ]
    }
]

# Write one JSON object per line (newline-delimited JSON).
with open("sample-json-data.json", "w") as jsonwrite:
    for item in list_dict:
        jsonwrite.write(json.dumps(item) + "\n")

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True

with open("sample-json-data.json", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location="us",  # Must match the destination dataset location.
        job_config=job_config,
    )  # API request

job.result()  # Waits for the table load to complete.

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

Output:

Loaded 1 rows into dataset-id:table-id.

Your JSON data has to be in this format:

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"}]}

Alternatively, you can load the data by explicitly specifying the BigQuery schema and storing the JSON data file in a GCS bucket, as sketched below.
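
A minimal sketch of that approach, assuming a hypothetical GCS URI gs://your-bucket/sample-json-data.json and a schema matching the sample data above:

from google.cloud import bigquery

client = bigquery.Client(project="project-id")
table_ref = client.dataset("dataset-id").table("table-id")

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
# Explicit schema instead of autodetect; adjust the fields to your data.
job_config.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField(
        "info",
        "RECORD",
        mode="REPEATED",
        fields=[bigquery.SchemaField("place", "STRING")],
    ),
]

# Hypothetical URI; the file must already be uploaded to the bucket.
uri = "gs://your-bucket/sample-json-data.json"

job = client.load_table_from_uri(uri, table_ref, job_config=job_config)
job.result()  # Waits for the load to complete.
print("Loaded {} rows.".format(job.output_rows))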

Prajna Rai T