2

I am trying to export my existing and all future data from my Firestore database to BigQuery.

I am using Export Collections to BigQuery firebase extension, which does export my new data to BigQuery. The Users_raw_latest table has correct data in it.

The problem is it does not have the data in tabular format as I want. I am expecting the schema fields to be collection properties. The table provides following schema:

enter image description here

The data field has the json of each document in the collection.

When I tried to store the old data (the data already generated before the extension was installed) in the BigQuery using the BigQuery guide, it created another table Users and had the exact correct schema I wanted, i.e. the document properties are the column of the table.

Is there any way I can use this existing Users table (I don't mind deleting and craeting one again) with document properties as columns be filled with new data inserted in Firestore with this extension or any alternative automation? Or do I have to write my own functions to do so?

Renaud Tarnec
  • 79,263
  • 10
  • 95
  • 121
Sunil Chauhan
  • 2,074
  • 1
  • 15
  • 33

1 Answers1

2

As explained in the doc, you need to generate your own schema view by using the fs-bq-schema-views script

You only need to provide a JSON schema file that describes your data structure, and the schema-views script will create the views.

The detailed doc is here.

The new schema will contain the columns corresponding to the fields declared in the JSON schema file you will define, with the addition of a set of "technical" columns (document_name, document_id, timestamp and operation), as shown below:

enter image description here

Renaud Tarnec
  • 79,263
  • 10
  • 95
  • 121
  • 1
    Thank you. I should have read the full doc first. I was able to make the fields of my own. With the script, I was able to generate the columns I was expecting. Now the problem is (as you already mentioned) is that it has duplicate entries with all the operation those were performed on the documents. I am willing to see only the latest view of the database. The script generates two views: `Users_schema_raw_changelog` and `Users_schema_raw_latest`. Both of the schemas are identical and logs changelog in each documents. How do I get the `Users_schema_raw_latest` to have only latest db state? – Sunil Chauhan Jul 01 '21 at 15:48