22

How to manage schema migrations for Google BigQuery, we have used Liquibase and Flyway in the past. What kind of tools can we use to manage schema modifications and the like (e.g. adding a new column) across dev/staging environments.

Rpj
  • 5,348
  • 16
  • 62
  • 122
  • This library performs schema migrations: https://github.com/medjed/bigquery_migration. Also see the docs on how to manually alter a schema in BigQuery: https://cloud.google.com/bigquery/docs/managing-table-schemas – Victor M Perez Mar 12 '18 at 11:22
  • Have you found a tool that suits your need ? – Muldec Feb 19 '20 at 10:31

5 Answers5

4

Found open source framework for BigQuery schema migration

https://github.com/medjed/bigquery_migration

One more solution

https://robertsahlin.com/automatic-builds-and-version-control-of-your-bigquery-views/

PS

In flyway someone opened the ticket to support BigQuery.

SANN3
  • 9,459
  • 6
  • 61
  • 97
4

Flyway, a very popular database migration tool, now offers support for BigQuery as a beta, while pending certification.

You can get access to the beta version here: https://flywaydb.org/documentation/database/big-query after answering a short survey.

I've tested it from the command line and it works great! Took me about an hour to get familiar with Flyway's configuration, and now calling it with a yarn command.

Here's an example for a NodeJS project with the following files structure:

package.json
fireway/
    <SERVICE_ACCOUNT_JSON_FILE>
    flyway.conf
    migrations/
        V1_<YOUR_MIGRATION>.sql

package.json

{
  ...
  "scripts": {
    ...
    "migrate": "flyway -configFiles=flyway/flyway.conf migrate"
  },
  ...
}

and flyway.conf:

flyway.url=jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;ProjectId=<YOUR_PROJECT_ID>;OAuthType=0;OAuthServiceAcctEmail=<SERVICE_ACCOUNT_NAME>;OAuthPvtKeyPath=flyway/<SERVICE_ACCOUNT_JSON_FILE>;

flyway.schemas=<YOUR_DATASET_NAME>
flyway.user=
flyway.password=

flyway.locations=filesystem:./flyway/migrations
flyway.baselineOnMigrate=true

Then you can just call yarn migrate any time you have new migrations to apply.

1

I created an adapter for Sequel, sequel-bigquery, so we could manage our BigQuery database schema as a set of Ruby migration files which use Sequel's DSL - the same way we do for our PostgreSQL database.

Example

# migrations/bigquery/001_create_people_table.rb

Sequel.migration do
  change do
    create_table(:people) do
      String :name, null: false
      Integer :age, null: false
      TrueClass :is_developer, null: false
      DateTime :last_skied_at
      Date :date_of_birth, null: false
      BigDecimal :height_m
      Float :distance_from_sun_million_km
    end
  end
end
require 'sequel-bigquery'
require 'logger'

db = Sequel.connect(
  adapter: :bigquery,
  project: 'your-gcp-project',
  database: 'your_bigquery_dataset_name',
  location: 'australia-southeast2',
  logger: Logger.new(STDOUT),
)

Sequel.extension(:migration)

Sequel::Migrator.run(db, 'migrations/bigquery')
ZimbiX
  • 485
  • 3
  • 7
0

There is a BigQuery extension for Liquibase now. https://github.com/liquibase/liquibase-bigquery/

Gabriel Deal
  • 935
  • 1
  • 8
  • 19
-3

According to the BQ docs, you can add a row to the schema without any additional process.

For more complex transformations, if it can be resolved in a SQL query, you can just run that query setting the destination table as the source table (although I would suggest creating a backup of the table in case something goes wrong).

Example

Let's say I have a table with a column that is a integer (column d), but at the insertion time it was written as a string. I can modify the table by setting itself as a destination table and running a query like:

SELECT
  a,
  b,
  c,
  CAST(d AS INT64) AS d,
  e,
  f
FROM
  `example.dataset.table`

This is an example for changing the schema, but this can be applied as long as you can get the result with a BQ query.

Jofre
  • 3,718
  • 1
  • 23
  • 31
  • 2
    Thanks Jofre. But my question was more towards how to manage schema migrations in a certain environment and across environments. Since a table could go through multiple changes and we would like a framework (say liquibase or flyway) to manage our schema migrations so that there is a flexibility to either deploy or rollback your changes on a need basis. I was wondering what tools are available for this product. – Rpj Mar 06 '18 at 14:06
  • I'm guessing that using the [Simba driver](https://cloud.google.com/bigquery/partners/simba-drivers/) you could use any tool that can connect to a SQL DB, but never tried it, so no idea how that would work out. – Jofre Mar 06 '18 at 14:48
  • 1
    I don't the above driver would help as well, liquibase and flyway provide a mechanism to manage migrations – Rpj Mar 08 '18 at 19:13