
I have a couple of SQL statements stored as files which get executed by a Python script. The database is hosted in Snowflake and I use Snowflake SQLAlchemy to connect to it.

How can I test those statements? I don't want to execute them; I just want to check whether they would be executable.

One very basic check would be whether it is valid standard SQL. A better answer would be something that considers Snowflake-specific syntax like

copy into s3://example from table ...

The best answer would be something that also checks permissions, e.g. for SELECT statements if the table is visible / readable.

Martin Thoma
    Typical Python (and other language) _unit_ tests would just mock away the database and focus on the software. If you actually want your unit test to hit the database, then it's no longer a unit test, it's an integration test. If Python has an in memory database, you might be able to use that for a strict unit test. – Tim Biegeleisen Jul 12 '19 at 05:44
    Ah, OK. So you suggest to use sqlite to create a fake db for testing locally? – Martin Thoma Jul 12 '19 at 05:45
  • An in memory database is one option. Other than this, there isn't much Python can do, not without doing something like writing a SQL parser etc. – Tim Biegeleisen Jul 12 '19 at 05:46
  • Well, what's required is a mock for testing the SQL, such that the statements can be tested. – MikeW Sep 15 '22 at 07:54
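The in-memory database mentioned in the comments can be sketched with Python's built-in sqlite3 module. Note this is only a rough proxy: SQLite will reject Snowflake-specific syntax such as COPY INTO, and the fixture table here is hypothetical.

```python
import sqlite3

def check_sql_executes(statement: str) -> bool:
    """Try a statement against a throwaway in-memory SQLite database.

    Only catches errors SQLite can detect; Snowflake-specific syntax
    (e.g. COPY INTO) will be reported as invalid here even if Snowflake
    would accept it.
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical fixture schema; replace with your own tables.
        conn.execute("CREATE TABLE example (id INTEGER, name TEXT)")
        conn.execute(statement)
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

print(check_sql_executes("SELECT id FROM example"))  # True
print(check_sql_executes("SELEC id FRM example"))    # False (syntax error)
```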

4 Answers


An in-memory sqlite database is one option. But if you are executing raw SQL queries against Snowflake in your code, your tests may fail if the same syntax isn't valid against sqlite. Recording your HTTP requests against a test Snowflake database and then replaying them for your unit tests suits this purpose better. There are two very good libraries that do this; check them out:

  1. vcrpy
  2. betamax
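If you'd rather not record traffic at all, the pure-mocking approach from the comments can be sketched with the standard library alone. Here `run_sql_file` is a hypothetical stand-in for your script's logic; the test only verifies that each statement in the file would be issued, not that Snowflake would accept it.

```python
from unittest import mock

def run_sql_file(conn, sql_text: str):
    """Hypothetical function under test: issue each ;-separated statement."""
    for statement in sql_text.split(";"):
        if statement.strip():
            conn.execute(statement.strip())

def test_run_sql_file_issues_each_statement():
    # The mock records every call instead of hitting a real database.
    fake_conn = mock.Mock()
    run_sql_file(fake_conn, "SELECT 1; SELECT 2;")
    assert fake_conn.execute.call_count == 2
    fake_conn.execute.assert_any_call("SELECT 1")

test_run_sql_file_issues_each_statement()
```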
Anuj Kumar

We do run integration tests on our Snowflake databases. We maintain clones of our production databases; for example, one of our production databases is called data_lake, and we maintain a clone, refreshed nightly, called data_lake_test, which we run our integration tests against.

Like Tim Biegeleisen mentioned, a "true" unit test would mock the response, but our integration tests do run real Snowflake queries on our cloned test databases. There is the possibility that a test drastically alters the test database, but since we run integration tests only during our CI/CD process, conflicts between two tests are rare.
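The nightly refresh described above can be done with Snowflake's zero-copy cloning; a minimal sketch, assuming the database names from this answer:

```sql
-- Rebuild the test database as a zero-copy clone of production.
CREATE OR REPLACE DATABASE data_lake_test CLONE data_lake;
```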

Brock

I very much like this idea; however, I can suggest a workaround, as I often have to check my syntax and need help there. If you plan on using the Snowflake interface, I would recommend adding LIMIT 10 or LIMIT 1 to the SELECT statements that you need to validate.
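A throwaway helper for that trick might look like the following (the name `probe_query` and the wrapping style are mine, not from the answer; wrapping in a subquery avoids having to parse the original statement):

```python
def probe_query(select_sql: str) -> str:
    """Wrap a SELECT so that validating it fetches at most one row."""
    # Strip any trailing semicolon before embedding as a subquery.
    inner = select_sql.rstrip().rstrip(";")
    return f"SELECT * FROM ({inner}) AS probe LIMIT 1"

print(probe_query("SELECT a, b FROM big_table;"))
# SELECT * FROM (SELECT a, b FROM big_table) AS probe LIMIT 1
```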

Another tip I would recommend is talking to a Snowflake representative about a trial if you are just getting started. They will also have a lot of tips for the more specific queries you are seeking to validate.

And finally, based on some comments, make sure your SQL is ANSI-compliant, and live in the reference documentation at https://docs.snowflake.net/manuals/index.html.

Rachel McGuigan

As far as the validity of the SQL statement is concerned, you can run EXPLAIN on the statement, and it will give you an error if the syntax is incorrect or if you do not have permission to access the object/database. That said, there are still some exceptions you cannot run EXPLAIN on, such as the USE command, which I do not think is needed for validation anyway.
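The EXPLAIN approach can be sketched as follows. For a self-contained demo this uses sqlite3, which also supports EXPLAIN; against Snowflake you would issue the same EXPLAIN through your SQLAlchemy connection (catching the connector's ProgrammingError instead of sqlite3.Error), where it additionally surfaces permission problems on the referenced objects.

```python
import sqlite3

def validate_statement(conn, statement: str):
    """Run EXPLAIN on a statement without actually executing it.

    Returns None if the statement compiles, or the error message
    otherwise. With Snowflake, catch the connector's error type
    instead of sqlite3.Error.
    """
    try:
        conn.execute("EXPLAIN " + statement)
        return None
    except sqlite3.Error as exc:
        return str(exc)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
print(validate_statement(conn, "SELECT x FROM t"))        # None
print(validate_statement(conn, "SELECT x FROM missing"))  # "no such table: missing"
```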

Hope this helps.