Auto-load the seed data from db/seeds.rb with rake

Question

I'm using the rspec-rails gem and I have several specs (models, controllers, etc.). When I run:

bundle exec rake

everything is tested. However, I would like to improve my specs by seeding some data (from db/seeds.rb) just after the database is created (in the test environment).

My spec/spec_helper.rb file looks like this:

ENV["RAILS_ENV"] ||= 'test'

require File.expand_path("../../config/environment", __FILE__)
require 'rspec/rails'
require 'capybara/rspec'
require 'ruby-debug'

# Requires supporting ruby files with custom matchers and macros, etc,
# in spec/support/ and its subdirectories.
Dir[Rails.root.join("spec/support/**/*.rb")].each {|f| require f}

RSpec.configure do |config|
  config.mock_with :rspec

  # Remove this line if you're not using ActiveRecord or ActiveRecord fixtures
  config.fixture_path = "#{::Rails.root}/spec/fixtures"

  # If you're not using ActiveRecord, or you'd prefer not to run each of your
  # examples within a transaction, remove the following line or assign false
  # instead of true.
  config.use_transactional_fixtures = false

  config.include SpecHelper

  config.before(:suite) do
    DatabaseCleaner.strategy = :truncation
    DatabaseCleaner.clean_with(:truncation)
  end

  config.before(:each) do
    DatabaseCleaner.start
    stub_xmpp_rest_client!  
  end

  config.after(:each) do
    DatabaseCleaner.clean
  end

  config.include Devise::TestHelpers, :type => :controller
  config.include Delorean
  config.after(:each) { back_to_the_present }
  config.include Factory::Syntax::Methods
  config.extend ControllerMacros, :type => :controller
end

What would be the best way to do so? Thanks.

score 47 · Answer 1 · answered Dec 05 '11 at 14:53

Bad idea! Never, ever, seed your test database. Use factories to create, within each test, only the records necessary for that test to pass. Seeding the test database will make your tests less reliable, because you'll be making lots of assumptions that aren't explicitly stated in your tests.
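
A minimal sketch of what per-test factory setup can look like, assuming the factory_bot gem (named factory_girl when this answer was written) and a hypothetical Country model with name and iso_code attributes. None of these names come from the question; they are only there to illustrate the idea:

```ruby
# spec/factories/countries.rb
# Hypothetical factory; Country, name and iso_code are illustrative assumptions.
FactoryBot.define do
  factory :country do
    name { "France" }
    iso_code { "FR" }
  end
end

# spec/models/country_spec.rb
require "rails_helper"

RSpec.describe Country do
  it "looks up a country by ISO code" do
    # Only the rows this example relies on are created, right here in the test.
    france = FactoryBot.create(:country)
    FactoryBot.create(:country, name: "Spain", iso_code: "ES")

    expect(Country.find_by(iso_code: "FR")).to eq(france)
  end
end
```

Because the two records are created inside the example, a reader can see exactly which data the assertion depends on without opening db/seeds.rb.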

answered Dec 05 '11 at 14:53

Marnen Laibow-Koser

2

I agree with you, @marnen-laibow-koser. However, as you know, the purpose of seed data is to be the minimal data that is vital for the app. That's why I think it's useful for testing scenarios. – Zag zag.. Dec 05 '11 at 15:06
1

No, no, no. Your tests should not need all your seed records. Just create the records that each test needs. For example, if you're seeding countries, no test will need more than one or two countries. If your test only needs France and Spain, don't seed Italy. Again, there is *never* a good reason to seed the test DB. Never do it. – Marnen Laibow-Koser Dec 05 '11 at 15:07
In other words, seed data is for the actual production application, not for your tests. – Marnen Laibow-Koser Dec 05 '11 at 15:08
I think you've sparked the "factory vs fixture" debate here. Be aware that @MarnenLaibow-Koser is voicing one of the two currently supported Opinions on this subject :) – Taryn East Dec 06 '11 at 21:28
This is not exactly that debate. Even diehard fixture fans don't load all the fixtures for every test. – Marnen Laibow-Koser Dec 06 '11 at 21:44
1

I don't get the attitude of not including seed data. It's a part of the application just as much as the code is, just that it happens to be stored in the database. It should simply be present for all tests. – David N. Welton Jan 31 '12 at 23:18
2

@DavidN.Welton You're quite wrong. Seed data is not part of the application, any more than any other data is. Production seed data should be present for *no* tests, because including it would go far beyond the minimal setup necessary for the tests to run. What *should* be present is minimal crafted data (generally no more than about 10 records) -- just enough for the test to pass. If you seed the DB for tests, you wind up with brittle, unclear, tightly coupled tests -- everything that tests aren't supposed to be. This isn't an "attitude"; it's a demonstrable fact. – Marnen Laibow-Koser Feb 01 '12 at 17:50
@DavidN.Welton Also, your claim that seed data is "part of the application" is irrelevant here. Especially when we're unit testing, we don't need or want to include the entire application in order to test one part of it -- that's why we design loosely coupled systems. – Marnen Laibow-Koser Feb 01 '12 at 17:52
84

Some of us are building systems coupled with... reality, where the coupling needs to be fairly strong. Mostly, I would not want or use seed data in my tests, but sometimes you do, and in those cases, it's nice to find an elegant way of doing it. My app, for instance, has some constant values stored in the DB rather than Ruby CONSTANTS, and they need to be involved in tests. – David N. Welton Feb 02 '12 at 20:25
@DavidN.Welton Amazingly enough, most of us deal with reality. You don't have a monopoly on that. :) Application constants are probably better stored in config files than in the DB (so I see your scenario as a bit of a design problem), but even if you have to store them in the DB, I would still advise not seeding them for your tests. The one exception would be if you're testing that the constants are set correctly by your production setup script or whatever. – Marnen Laibow-Koser Feb 02 '12 at 21:18
@DavidN.Welton Also, I get the feeling you may not know the software-engineering sense of the word "coupling" and why it's usually a bad thing. – Marnen Laibow-Koser Feb 02 '12 at 21:20
My reason for the down vote is simple: it is not an answer. It is a comment. Perhaps it should have been added to the OPed question. – IAmNaN Aug 07 '12 at 19:48
@IAmNaN When the OP asks how to do something inadvisable, I think saying "you don't want to do that, and here's why; furthermore, here's a better way of accomplishing the same thing" is a valid answer, don't you? – Marnen Laibow-Koser Aug 07 '12 at 22:00
I agree with your argument in principle, but never say never. There are situations where having the necessary seed data in place is still the best way to do it. An alternative might be to use fixtures, but there's no sense reloading these for each test case if they truly are static seed data. – Andrew Vit Aug 08 '12 at 08:07
1

@AndrewVit This is a "never". There is no possible testing situation where seed data makes sense. If you disagree, please provide a concrete example. IMHO it's always better to generate your seed records in every test case so your assumptions are explicit. – Marnen Laibow-Koser Aug 08 '12 at 20:56
1

@MarnenLaibow-Koser - We have a db that was indeed designed badly. There are many foreign key constraints on data that is assumed to be in the db (can't create Widget before WidgetType is populated, along with User and UserType and UserRole and UserRoleType and...) The app is huge with a *lot* of interlocking, required data even just to get up off the ground. There are also zero tests to cover it. I'm not about to refactor the app to change the design until I have tests in place... therefore I need a way to seed *only the required minimum data* for each test. This is the "reality" spoken of. – Taryn East Aug 15 '12 at 03:59
@TarynEast "therefore I need a way to seed only the required minimum data for each test."—Exactly! Of course you do! And the best way to do that is generally with factories. They're tailor-made for exactly the sort of deeply associated objects you're speaking of. Just have your Widget factory call your WidgetType factory (`association :widget_type`). Then you don't need seed data: you can just do `Factory :widget` in the test itself, and it will invoke `Factory :widget_type` and so on behind the scenes. Simple. No seeds necessary. I do this all the time. – Marnen Laibow-Koser Aug 15 '12 at 19:51
@MarnenLaibow-Koser - hundreds of factories... or a single db/seeds file - it's a matter of tucking away a huge blob of code that is not needed *except* to setup the base necessary data. We then only have factories for the data that is required *on top* of the base and have a much smaller set of factories to wade through. Given we don't actually *use* db/seeds to seed our databases... I think it's a simple solution to put all the necessary stuff in one place. - out of the way. Simple, no factories necessary ;) – Taryn East Aug 16 '12 at 03:05
2

@TarynEast Then I think you're doing it wrong, probably because you don't really understand the disadvantages of your approach. With a db/seeds file, you're sweeping the seeds under the rug, and thus making your assumptions implicit. In testing, that's *bad*. Every test should make its assumptions explicit—and the way to do that is by starting from zero in every test, and creating *only* the records you need *for that test*. The factory definitions will generally be no more complex than the seeds file, and may be less complex. (Also, you yourself point out that you're abusing db/seeds.) – Marnen Laibow-Koser Aug 16 '12 at 15:36
@TarynEast The idea that you have factories for data "on top of the base" points to what's wrong with your approach. You're thinking of the base and the factory data as two different things, which leads to poor testing (I'm working with a system like that right now—it's hard to write reliable tests). This also means your tests aren't minimal. Instead, recognize that it's all just data, and that you must make your assumptions explicit in each test so you know exactly what data you're testing. – Marnen Laibow-Koser Aug 16 '12 at 15:38
@MarnenLaibow-Koser - I do indeed want to hide this stuff under the covers for the time being. I'm doing it *on purpose*, not accidentally. The intent being that I put all these assumptions in one place, so I know which assumptions underlie the existing architecture. In the test I'll put the actual preferred behaviour. When I tease out the underlying behaviour that I *really want* to happen - I'll move it out of the seed-base into the tests, gradually reducing to nothing the set of underlying assumptions that *shouldn't be there in the first place* but are due to poor architecture design. ok? – Taryn East Aug 19 '12 at 23:42
1

@TarynEast "I'm doing it on purpose, not accidentally." Then I think you're making a colossal mistake. "The intent being that I put all these assumptions in one place, so I know which assumptions underlie the existing architecture." Noble idea, but it never seems to work that way in practice. Rather, by moving the assumptions farther from the test code, you're making it *harder*, not easier, to know what assumptions underlie what. "In the test I'll put the actual preferred behaviour." Yes, do—and put all the assumptions right in there with it. – Marnen Laibow-Koser Aug 20 '12 at 20:31
@TarynEast "When I tease out the underlying behaviour that I really want to happen - I'll move it out of the seed-base into the tests" Then you don't know which test depends on which assumption, and you won't be able to get rid of the underlying assumptions as easily. The reason what you're doing is bad is that it obscures the link between tests and particular assumptions. Unless your case is vanishingly unusual, not every test depends on every assumption. Therefore, not every test needs the whole set of seed data. Just load the few records that a particular test needs. Much clearer. – Marnen Laibow-Koser Aug 20 '12 at 20:34
@MarnenLaibow-Koser So far almost every test *does* rely on all the "seed" data that I'm talking about. Except for a few vanishingly rare models that don't rely on having a party set up in the db (eg some of the "_type" models that really should be a hash of data in a constant somewhere rather than a table in the db). Every other model requires a party... and every party requires *at least* 15 tables populated with initial data (yes really, no hyperbole here) just to run the test. You really aren't getting "this is an awful pile of code that needs redesigning" very well yet ;) – Taryn East Aug 20 '12 at 23:14
@TarynEast 'So far almost every test does rely on all the "seed" data that I'm talking about.'—Then that *is* a different case. But I wonder, because later you say: "Every other model requires a party... and every party requires at least 15 tables populated with initial data". This situation is not uncommon. I have a similar situation. Set up associated factories, and do `Factory :party` as appropriate. It's more intention-revealing and explicit than your seeds. 'You really aren't getting "this is an awful pile of code that needs redesigning" very well yet'—I get that. That's not the issue. :) – Marnen Laibow-Koser Aug 20 '12 at 23:17
@TarynEast You know, `Factory :party { group }`, `Factory :group { group_type; country }`, `Factory :country { continent; iso_code 'XX' }` and so on. – Marnen Laibow-Koser Aug 20 '12 at 23:18
If it was as simple as that, it'd be ok... but it's not. Your example is almost linear - 1 model relates to 1 other model and so on... our situation is a complex digraph. A party relies on about fifteen database tables (not fifteen instances of objects). and not just models that are associations of this specific party object... some of the logic is circular - to set up a party that can login, you need to set up a company, a role and a "role_user" - to set up a company, you need *another* party and a role and a "role_company", role_types party_types country state gender... – Taryn East Aug 21 '12 at 00:29
and all have foreign key refs and "not null" columns in the *db* which means that you simply can't set up a few of them as real objects and some as mocks... – Taryn East Aug 21 '12 at 00:30
@TarynEast "Your example is almost linear - 1 model relates to 1 other model and so on... our situation is a complex digraph." So is mine, really. I was trying to stay concise. 'A party relies on about fifteen database tables…to set up a party that can login, you need to set up a company, a role and a "role_user"'—None of this is any problem for the complex factory setup that I have successfully used in cases like this. For circularity, just use FactoryGirl callbacks. 'and all have foreign key refs and "not null" columns in the db'—As well they should. Again, not an obstacle for factories. – Marnen Laibow-Koser Aug 21 '12 at 02:31
@TarynEast "you simply can't set up a few of them as real objects and some as mocks"—In theory, sure you could—you can always mock. But you normally shouldn't. I'm beginning to think you simply don't know how to get the most out of factories (hint: they're great at creating complex graphs), and so you've made a very bad testing decision as a result. – Marnen Laibow-Koser Aug 21 '12 at 02:33
Of course it's *possible* to write factories that way, and even mocks. I've been trying to explain my reasoning for why I have chosen not to do it. Clearly I have failed to explain it in a convincing manner. However - I stand by my choice as being the correct one for me, and I'm not happy with the way you've chosen to express your disagreement. As such I have basically had enough of this discussion, thanks. – Taryn East Aug 21 '12 at 04:38
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15593/discussion-between-taryn-east-and-marnen-laibow-koser) – Taryn East Aug 21 '12 at 04:38
@TarynEast Perhaps I too have failed to be as convincing as I'd like to be. The thing is that while it's very tempting to seed the test DB, I firmly believe that it's usually the wrong thing to do -- and from the way you've explained your case, I think that your case is *probably* one of the many cases in which it's the wrong thing. While seeding is seductive for its apparent simplicity, it basically means that it's impossible to tell what test depends on which data, which is not a good thing. This is why Rails fixtures are bad, and this is also why IMHO test DB seeding is bad. – Marnen Laibow-Koser Aug 22 '12 at 20:06
The root issue here is that people have different ideas about what "seed" data is. Some people think of seeds as sort of permanent fixtures, and I'm with MLK on that usage, for the reasons given. But some data may properly belong to the system of which the database is a part, and must be present for the system as a whole to work correctly. An example might be a table of roles in a role-based authentication system. "Seeds" could be a good place to load such data. – Jan Hettich Sep 24 '12 at 15:52
@JanHettich Perhaps I'm awfully dense, but I don't think that's the root issue. I think you've just described the same type of seed data in 2 different ways. Of course seed data always properly belongs to the production system. That's kind of irrelevant to the question of testing approaches. – Marnen Laibow-Koser Sep 24 '12 at 17:56
1

I generally prefer to keep seed data to a minimum, but for some things it's warranted. My criteria are: 1. It is fundamental to the app. 2. It does not change (unless global assumptions change). 3. You frequently need the presence of the seed data to test other model relations. For one, I prefer to test without database persistence when it's not needed, but for testing relational queries you can't just mock that. If there's a static table of facts that my relations need to join against, I'd rather have it dormant in the database than have to reload it for each test example using a factory. – Andrew Vit Sep 28 '12 at 00:23
@AndrewVit "If there's a static table of facts that my relations need to join against, I'd rather have it dormant in the database than have to reload it for each test example using a factory."—I don't think either of these alternatives is a good idea. Instead, just create *the 1 or 2 records you need* for each test example, which won't be the whole table. Having it dormant in the DB means you've got a bunch of unnecessary data confusing your tests. Put another way, nothing should be "dormant" in a test; everything should be there because it needs to be there *for that test*. – Marnen Laibow-Koser Sep 28 '12 at 18:28
@MarnenLaibow-Koser I generally agree and I prefer the same as you: it helps to see explicitly what goes into a test. It's not ideal for 100% of situations though; it becomes a question of pragmatism over "pure" isolation. Reloading something that *always* remains exactly the same in *every* instance is silly: both in database & mental context. When I said I "have to reload it for each test", yes that means just the 1 or 2 extra records, but it's distracting noise for an axiom that never changes. – Andrew Vit Sep 29 '12 at 00:45
For this, I disagree with your "never ever" answer. It's not about making "lots" of assumptions, but only the few very base ones. – Andrew Vit Sep 29 '12 at 00:52
@AndrewVit "Reloading something that always remains exactly the same in every instance is silly"—Agreed. That's not at all what I'm talking about. It sounds like your testing approach creates too many records. If you have a table of countries, one test may only need Italy, another may only need France, and a third may need Ruritania, which doesn't actually exist but is convenient for testing purposes! • "it's distracting noise for an axiom that never changes"—If it never changes, why do you need so many tests? For different cases, right? Don't the different cases need different records set up? – Marnen Laibow-Koser Sep 29 '12 at 01:19
@AndrewVit "It's not about making "lots" of assumptions, but only the few very base ones"—Every record in your test DB before a test begins is an assumption (or more than one). You should generally make as few assumptions as possible in testing, which means you should generally make as few records as possible. Usually the best way of doing that is to tailor only the few you need for each test. – Marnen Laibow-Koser Sep 29 '12 at 01:22
@AndrewVit "I disagree with your "never ever" answer"—Why? Any testing situation involving seed data can be logically reduced to one that does not. Every time I have seen that reduction made, it has resulted in clearer tests. I've been trying quite hard to think of a situation in which test seeds could possibly be beneficial, and I really can't; OTOH, I've seen them cause problems many, many times. If I am right on this, then the conclusion is quite clear. – Marnen Laibow-Koser Sep 29 '12 at 01:24
1

Wow. Thanks Marnen. You just changed my life. – Trip Aug 18 '13 at 01:28
I think that what I don't get about this is: why would static configuration data that is stored in the database be treated differently to static configuration information stored in yaml files? I've not heard that yaml files should only be populated with sufficient data to pass tests, and it conceptually seems the same to me. I personally need the config info in the database because I need access to it from SQL. – David Aldridge Mar 11 '18 at 19:34
@DavidAldridge: You bring up an interesting question. I normally don't use Yaml config files for any but the simplest things (e.g. DB connection info) in my testing environment, so this is not something I've run into much. But I think I'd say that the principle here is the same: your test setup should always be as minimal as possible, and so I would recommend against big application config files in your test environment. As with the DB, my inclination would be to let each test case set only those config values that are necessary for it to pass. – Marnen Laibow-Koser Mar 14 '18 at 20:28
@MarnenLaibow-Koser It's an interesting discussion. For me the difficulty of anticipating side-effects in a complex system means that I would want a test system as similar as possible to production configuration for the purpose of systemic testing, as additional data/configuration items can cause different logical paths to be followed. For unit testing, maybe a different approach would be appropriate but as I need that full configuration for integration tests I wouldn't set up a limited environment for unit testing. – David Aldridge Mar 16 '18 at 10:42
@DavidAldridge "For me the difficulty of anticipating side-effects in a complex system means that I would want a test system as similar as possible to production configuration" Yes, but only for your integration tests. • "as I need that full configuration for integration tests I wouldn't set up a limited environment for unit testing." But that's exactly what you must do. A unit test should only test *that unit*. If it relies on side-effects from other config data, then you're testing your configuration; it's no longer a unit test. That kind of testing has a place, but not in unit tests. – Marnen Laibow-Koser Mar 29 '18 at 00:15
@MarnenLaibow-Koser Could you modify your answer to be more clear that you're only objecting to this in the context of unit tests, not integration tests? – David Aldridge Mar 29 '18 at 11:59
@DavidAldridge No, because I am also objecting to it in the context of integration tests. If you load the full seed or config data, you should only do it in a limited set of integration tests whose specific purpose is *testing that data*. You should not do it in all integration tests. – Marnen Laibow-Koser Mar 29 '18 at 16:10
@MarnenLaibow-Koser We'll just have to disagree then, because what I want from system/integration testing is an environment absolutely as close as possible to the production system's configuration. I think that your approach might be manageable for simple systems, but creates a high management cost, level of complexity, and risk for anything more. – David Aldridge Mar 30 '18 at 10:41
@DavidAldridge You know, you've got me thinking about this, and I think I may have to modify my answers to you about config data. The more I think about this, the more I think that there *is* a fundamental difference between Yaml files/environment variables on the one hand and DB records on the other. The difference is *perhaps* that the Yaml/env are put there before the application is ever deployed, so you can treat them as assumptions just like the OS. DB seeds, OTOH, are generally in the domain of the application, so your tests shouldn't run with them there. Maybe? – Marnen Laibow-Koser Mar 31 '18 at 16:20

score 29 · Accepted Answer · edited Dec 03 '13 at 14:45

Depending on how your seed file is configured, you might just be able to load/run it from a before(:each) or before(:all) block:

load Rails.root + "db/seeds.rb"
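
Since a before(:each) hook would load the file again for every example, the seed file has to tolerate being run more than once. A minimal sketch of an idempotent db/seeds.rb, assuming a hypothetical Role model with a name column and Rails 4 or later (where find_or_create_by! is available); the model and role names are illustrative only:

```ruby
# db/seeds.rb
# Hypothetical seed data; Role and its name attribute are assumptions.
%w[admin editor member].each do |role_name|
  # Creates the row only if it is missing, so repeated loads do not duplicate it.
  Role.find_or_create_by!(name: role_name)
end
```

A seed file that uses plain create calls instead would insert duplicate rows (or fail on unique indexes) when loaded repeatedly without a cleanup in between.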

edited Dec 03 '13 at 14:45

Jordan Running

answered Dec 05 '11 at 14:25

Taryn East

1

Today the same line can be written like this: Rails.application.load_seed – Montells May 27 '21 at 15:10

score 12 · Answer 3 · answered Sep 28 '12 at 01:01

I set up my rake spec task to automatically load db/seeds.rb, so I depend on that for setting up the database for a first run. Add this to your Rakefile:

task :spec     => "db:seed"
task :cucumber => "db:seed"

Then, on subsequent runs I just call rspec spec directly to skip the seed step and avoid unnecessary reloading. I configure database cleaner to ignore the seed tables like this:

RSpec.configure do |config|

  config.add_setting(:seed_tables)
  config.seed_tables = %w(countries roles product_types)

  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation, except: config.seed_tables)
  end

  config.around(:each) do |example|
    if example.metadata[:no_transactions]
      DatabaseCleaner.strategy = :truncation, {except: config.seed_tables}
    else
      DatabaseCleaner.strategy = :transaction
    end
    DatabaseCleaner.start
    example.run
    DatabaseCleaner.clean
  end
end

For scenarios that need committed data, I can add:

describe "commit for real", no_transactions: true do
  # ...
end

This will truncate everything after the example runs, except the seed tables. It's assumed that you never write anything to the seed tables! This is generally a safe assumption, since seed data is typically static.

For all other normal scenarios, I depend on transactions to roll back any inserted records. The database is returned to the original state, with the data in the seed tables intact. It's safe to write to the seed tables here if you need to.

To rebuild the seed tables, you just need to run rake spec again.

We are managing a test suite riddled with fixtures and assumptions, and moving it over to a more factory-based set up. Moving our seed data out of fixtures and into this one time set up strategy was a key part of that - thanks for your answer, it helped DRY up some of our implementation — Phantomwhale, Mar 19 '14 at 04:55
It worked for me (Rails 4 and a lot of stuff to load on seeds). Thanks. You are my hero today. — Juanin, Feb 11 '16 at 18:55

score 7 · Answer 4 · answered Nov 07 '13 at 07:15

To load seeds in RSpec, you need to add the load call after the database cleanup in config.before(:suite) in spec_helper:

config.before(:suite) do
  DatabaseCleaner.clean_with(:truncation)
  load "#{Rails.root}/db/seeds.rb" 
end

answered Nov 07 '13 at 07:15

Ahmad Hussain

score 7 · Answer 5 · answered Mar 03 '15 at 17:18

In Rails 4.2.0 and RSpec 3.x, this is how my rails_helper.rb looks.

RSpec.configure do |config|
  config.include FactoryGirl::Syntax::Methods
  # Remove this line if you're not using ActiveRecord or ActiveRecord fixtures
  config.fixture_path = "#{::Rails.root}/spec/fixtures"

  # If you're not using ActiveRecord, or you'd prefer not to run each of your
  # examples within a transaction, remove the following line or assign false
  # instead of true.
  config.use_transactional_fixtures = false

  config.before(:suite) do
    DatabaseCleaner.clean_with(:truncation)
  end

  config.before(:each) do
    DatabaseCleaner.strategy = :transaction
  end

  config.before(:each, :js => true) do
    DatabaseCleaner.strategy = :truncation
  end

  config.before(:each) do
    DatabaseCleaner.start
  end

  config.after(:each) do
    DatabaseCleaner.clean
  end

  config.before(:all) do
    Rails.application.load_seed # loading seeds
  end
end

score 0 · Answer 6 · answered Dec 05 '17 at 14:30

Copy the seeds.rb file into the config/initializers folder, so that it is executed on server start.
Run the command below to fill the test DB with the seeds.rb data:

RAILS_ENV=test rake db:seed

edited Dec 05 '17 at 14:37

answered Dec 05 '17 at 14:30

vidur punj

score -1 · Answer 7 · edited May 23 '17 at 10:30

I think we should use

config.before(:each) do
  Rails.application.load_seed # loading seeds
end

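because before(:all) runs the block only once, before all of the examples in the group are run.

So if we use before(:all), the seed data is loaded a single time and will be gone as soon as a cleanup that truncates the tables runs after an example.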
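
The difference can be illustrated with a toy simulation in plain Ruby. This is not RSpec or DatabaseCleaner: an array stands in for the database, the simulate_group helper is hypothetical, and the cleanup is assumed to wipe every row after each example, as a truncation strategy would.

```ruby
# Toy model of hook ordering; a plain array stands in for the test database.
# seed_hook is :all (seed once per group) or :each (seed before every example).
def simulate_group(seed_hook, example_count = 3)
  db = []
  seed = -> { db.concat(%w[admin member]) } # stands in for Rails.application.load_seed
  observed = []

  seed.call if seed_hook == :all            # before(:all) runs once, up front
  example_count.times do
    seed.call if seed_hook == :each         # before(:each) runs before every example
    observed << db.dup                      # rows the example would see
    db.clear                                # after(:each): cleanup wipes all rows
  end
  observed
end

p simulate_group(:all)   # => [["admin", "member"], [], []]
p simulate_group(:each)  # => [["admin", "member"], ["admin", "member"], ["admin", "member"]]
```

With :all, only the first example sees the seeded rows; with :each, every example does, at the cost of reloading the seeds each time.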

edited May 23 '17 at 10:30

Community

answered Sep 13 '16 at 07:19

Yang
