
I am writing tests for some functions, one of which uses database queries. The blogs and docs I have read say to use an in-memory or test database for such functions. Here is my function:

def already_exists(story_data, c):
    # TODO(salmanhaseeb): Implement de-dupe functionality by checking if it already
    # exists in the DB.
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    if number_of_rows > 0:
        return True
    return False

This function hits the production database. My question is: when testing, I create an in-memory database and populate it with values, and I want my queries to go against that test database. But when I call my already_exists() function from a test, the production database is hit. How do I make the test database get hit while testing this function?

  • Write a unit test that uses a specific cursor/connection to your test database. – dfundako Sep 12 '18 at 17:09
  • Sounds like bad design. First you should have a QA staging area to run these tests, and the QA environment should have a replica of the prod system. Your application should know what environment it is in and change the connection creds (or use a separate network that uses the same IPs and domain names) and test via this. Testing against production is a bad idea. – eatmeimadanish Sep 12 '18 at 17:10
  • Hi, and welcome to StackOverflow. Presumably the problem is that you want to test something else which calls `already_exists` and passes in its own database connection. It sounds like you have to make sure that all of your code is consistently using the same database connection. We can't help with that without more information about your setup. – Schwern Sep 12 '18 at 17:16

2 Answers


There are two routes you can take to address this problem:

  1. Make an integration test instead of a unit test and just use a copy of the real database.
  2. Provide a fake to the method instead of an actual connection object.

Which one you should do depends on what you're trying to achieve.

If you want to test that the query itself works, then you should use an integration test. Full stop. The only way to make sure the query works as intended is to run it, with test data already loaded, against a copy of the database. Running it against a different database technology (e.g., running against SQLite when your production database is PostgreSQL) will not ensure that it works in production.

Needing a copy of the database means you will need some automated deployment process for it that can be easily invoked against a separate database. You should have such an automated process anyway: it helps ensure that your deployments across environments are consistent, allows you to test them prior to release, and "documents" the process of upgrading the database. Standard solutions are migration tools written in your programming language, like Alembic, or tools that execute raw SQL, like yoyo or Flyway. You would invoke the deployment and fill the database with test data prior to running the test, then run the test and assert the output you expect to be returned.
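To illustrate the shape of such a test, here is a minimal sketch using SQLite's in-memory database (chosen only because the question's `?` placeholders suggest sqlite3; with PostgreSQL you would point the connection at a deployed test copy instead). The `Story` namedtuple and the `posts` schema are assumptions about the question's setup:

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for the question's story object.
Story = namedtuple("Story", ["post_id"])

def already_exists(story_data, c):
    c.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = c.fetchone()
    return number_of_rows > 0

def test_already_exists_integration():
    # Deploy the schema into a throwaway in-memory database...
    conn = sqlite3.connect(":memory:")
    c = conn.cursor()
    c.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY)")
    # ...fill it with test data before running the test...
    c.execute("INSERT INTO posts (post_id) VALUES (?)", (10,))
    conn.commit()

    # ...then run the function and assert on the expected output.
    assert already_exists(Story(post_id=10), c) is True
    assert already_exists(Story(post_id=99), c) is False
    conn.close()
```

Because the cursor is a parameter, the test controls which database gets hit — this is exactly why you should keep passing the connection/cursor in rather than having the function reach for a global.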

If you want to test the code around the query and not the query itself, then you can use a fake for the connection object. The most common kind of fake is a mock. Mocks provide stand-ins that can be configured to accept function calls and inputs and to return some output in place of the real object. This lets you test that the logic of the method works correctly, assuming the query returns the results you expect. For your method, such a test might look something like this:

from unittest.mock import Mock

...

def test_already_exists_returns_true_for_positive_count():
    mock_cursor = Mock(
        execute=Mock(),
        fetchone=Mock(return_value=(5,)),
    )
    story = Story(post_id=10) # Making some assumptions about what your object might look like.

    result = already_exists(story, mock_cursor)

    assert result

    # Possibly assert calls on the mock. The value of these asserts is debatable.
    mock_cursor.execute.assert_called_once_with("""SELECT COUNT(*) from posts where post_id = ?""", (story.post_id,))
    mock_cursor.fetchone.assert_called_once()
jpmc26

The issue is ensuring that your code consistently uses the same database connection. Then you can set it once to whatever is appropriate for the current environment.

Rather than passing the database connection around from method to method, it might make more sense to make it a singleton.

def already_exists(story_data):
    # Here `connection` is a singleton which returns the database connection.
    connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (story_data.post_id,))
    (number_of_rows,) = connection.fetchone()
    if number_of_rows > 0:
        return True
    return False

Or make connection a method on each class and turn already_exists into a method. It should probably be a method regardless.

def already_exists(self):
    # Here the connection is associated with the object.
    self.connection.execute("""SELECT COUNT(*) from posts where post_id = ?""", (self.post_id,))
    (number_of_rows,) = self.connection.fetchone()
    if number_of_rows > 0:
        return True
    return False

But really you shouldn't be rolling this code yourself. Instead you should use an ORM such as SQLAlchemy, which takes care of basic queries and connection management like this for you. It has a single connection object, the "session".

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from sqlalchemy_declarative import Base, Post

engine = create_engine('sqlite:///sqlalchemy_example.db')
Base.metadata.bind = engine

DBSession = sessionmaker(bind=engine)
session = DBSession()

Then you use the session to make queries. For example, queries have an exists method.

q = session.query(Post).filter(Post.post_id == story_data.post_id)
session.query(q.exists()).scalar()

Using an ORM will greatly simplify your code. Here's a short tutorial for the basics, and a longer and more complete tutorial.
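One concrete benefit for the question asked: with the session bound to an engine, tests can simply point that engine at an in-memory database instead of production. A minimal self-contained sketch (the `Post` model, its columns, and the SQLite URL are assumptions for illustration):

```python
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.orm import sessionmaker, declarative_base

Base = declarative_base()

class Post(Base):
    # Hypothetical model for the question's posts table.
    __tablename__ = "posts"
    post_id = Column(Integer, primary_key=True)

# In a test, bind the engine to an in-memory database instead of production.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Seed test data.
session.add(Post(post_id=10))
session.commit()

def already_exists(post_id):
    q = session.query(Post).filter(Post.post_id == post_id)
    return session.query(q.exists()).scalar()
```

Swapping the `create_engine` URL (or injecting a differently bound sessionmaker) is all that changes between the test and production environments.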

Schwern
  • Making your connection a singleton is *nuts*; you cannot have separate transactions. Making it an instance variable is common, but is problematic as it makes it difficult to understand what queries are run against any particular transaction as the program executes (and how the database's state changes in any given invocation). Furthermore, ORMs are *not* simpler. They might make the code look simpler at first glance, but in reality they do so much behind the scenes that it's ultimately more complex; they also increase the difficulty of running the query outside of the application. – jpmc26 Sep 16 '18 at 18:26
  • @jpmc26 `connection` need not strictly be a singleton object. Typically it's a method to get a connection object, it can do whatever it likes behind the scenes; I'm not sure what you call that pattern. A good ORM reduces complexity by hiding SQL compatibility issues, managing schema migrations, hiding table and column names, managing relationships, allowing queries to be built upon (ie. adding an order by or limit) etc... It works out far, far better than manual SQL. ORM quality varies. I like Sequel. Or use a SQL builder like Knex.js. – Schwern Sep 16 '18 at 18:39
  • I call it a bad pattern, as you're never sure what connection you're dealing with in any given block of code. I've done it. It makes deciphering the logic much more difficult. I undid it (replacing it with passing the connection as a parameter around), and my code was immediately clearer. "hiding SQL compatibility issues" Complete and utter nonsense. Most apps only target a single DB, and DBs have plenty of features that are just not even supported by ORMs or other DBs. "managing schema migrations" You can do this without an ORM, and an ORM doesn't do this well since it has to be so general. – jpmc26 Sep 16 '18 at 19:08
  • "managing relationships" Done that. Found the ORM was silently loading a bunch of related data that wasn't even needed *one record at a time*. Absolutely horrible. The ORM shouldn't decide when I need related data or not or how to fetch it. "allowing queries to be built upon" ORMs are not just SQL builders, and you can have a SQL builder tool without an ORM when you need it. But the ORM adds a lot more than that, trying to shove a system that is not even close to object oriented into an object paradigm. The further they are from raw SQL, the more trouble they cause trying to do things for you. – jpmc26 Sep 16 '18 at 19:11
    @jpmc26 ORMs are settled tech, this isn't the place to debate them. I'm glad you have the experience to handle all what they do yourself, and for your extra requirements, but please don't DV people for recommending an ORM to new developers who do not. – Schwern Sep 16 '18 at 19:29
  • As outlined above, that is not the only reason I downvoted. Your recommendation to move away from using a function parameter is even more harmful. Object-oriented programming is "settled tech," too, and yet there is also a rising understanding that it fails to deliver on its promises; there is nothing wrong with pointing out the failure of something popular to deliver on its promises. The answer to not having expertise in an area is not to delegate those tasks to a tool or set of rules that is unfit for the job. It's to develop your expertise. – jpmc26 Sep 16 '18 at 20:40