We use Travis CI for our project, which is hosted on git. On Travis we have 2 processes, each running a random selection of specs with a different seed number. When there is a failure, I try to run:

  1. the exact spec with the seed number
  2. the exact spec without a seed number
  3. the spec file with a seed number
  4. the spec file with a seed number and --bisect
  5. the spec file without a seed number but with --bisect

In all 5 scenarios above, whether locally or over SSH while debugging the Travis build, I get no failures, and bisect always fails.

In a completely different scenario, if I run parallel:spec locally with the default 8 processes, I do get failures, but if I run each failing spec on its own with the 'rspec' command, I get no failures.

I've also tried running parallel:spec locally with the --bisect option in the .parallel-spec file in the root of our application. The minimal reproduction commands I get still give no failures.

What am I missing here? Does this issue have to do with running multiple processes and then having to run the minimal reproduction lines with rspec? Because currently it seems to me that if specs are run on more than 1 process, I'm never able to reproduce the failing specs. On the other hand, if I run rspec --bisect locally, after 8 hours it has not even started 1 process, and I'm on a MacBook Pro (but yeah, we have around 4k specs).

P.S. We're on Rails 4.2.7.1, Ruby 2.3.3 and RSpec 3.4.4.

Thanks

Update: I ran parallel spec in verbose mode to obtain the spec order, then ran the command for the process in which a spec fails, once with the seed number and once with the seed number and --bisect. Still no failures.

Shalaby

2 Answers

Among those 5 things you tried, I can't see this one: try running all the specs from the failing process with the seed.

It is possible that your specs interfere with each other and if spec A is run before spec B, it will cause B to fail... Or even if A is run before B it can cause C to fail.
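To make the ordering problem concrete, here is a minimal plain-Ruby sketch (no RSpec involved). The example names, the `state` hash and the `failures_for` helper are all made up for illustration; the point is only that the same three examples pass in one order and fail in another when one of them leaves shared state behind.

```ruby
# Hypothetical "examples": each is a name plus a lambda that returns
# true (pass) or false (fail). They all receive the same state hash,
# standing in for any global that is not reset between examples.
def examples
  {
    "A changes the locale"         => ->(state) { state[:locale] = "de"; true },
    "B expects the default locale" => ->(state) { state[:locale] == "en" },
    "C is independent"             => ->(_state) { 1 + 1 == 2 }
  }
end

# Runs the named examples in the given order, starting from a fresh state,
# and returns the names of the examples that failed.
def failures_for(order)
  state = { locale: "en" }
  order.reject { |name| examples.fetch(name).call(state) }
end

a, b, c = examples.keys

# B runs before A, so B still sees the default locale.
puts failures_for([b, a, c]).inspect # => []

# A runs before B and leaks its change, so B fails.
puts failures_for([a, c, b]).inspect # => ["B expects the default locale"]
```

Running B on its own, or the file without the offending seed, corresponds to the first call: nothing fails because A never ran first.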

So if you run all the specs from one process with the seed, you may be able to reproduce the failure. Only then should you run the same thing with --bisect to find the smallest set that gives you the failure.

If you cannot reproduce it that way, I can see another option: your parallel specs are using a shared resource (DB, files?) and the failures are caused by race conditions. Those are hard to find, especially among specs. Make sure each process actually uses a separate DB (simple mistakes, like forgetting to change database.yml, can cause that).
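As a rough sketch of what "a separate DB per process" means, assuming the parallel_tests convention where `TEST_ENV_NUMBER` is blank for the first process and `2`, `3`, ... for the others: the helper name `test_database_name` and the base name `myapp_test` below are hypothetical, not part of any library.

```ruby
# Hypothetical helper: derives the test database name for one process by
# appending TEST_ENV_NUMBER to a base name (assumed convention: "" for the
# first process, "2", "3", ... for the rest).
def test_database_name(base, env_number = ENV["TEST_ENV_NUMBER"])
  "#{base}#{env_number}"
end

# Simulate the values four parallel processes would see.
env_numbers = ["", "2", "3", "4"]
names = env_numbers.map { |n| test_database_name("myapp_test", n) }

puts names.inspect
# => ["myapp_test", "myapp_test2", "myapp_test3", "myapp_test4"]

# If the suffix were missing from the configuration, every process would get
# the same name and this check would print false.
puts names.uniq.size == names.size # => true
```

Printing the database name that each process actually connects to at the start of a run is a quick way to confirm that the configuration really produces distinct names.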

If that doesn't help - inspect your code for other possible shared resources. You didn't mention how many specs usually fail. If it's a small number - you can focus on those.

Greg
  • Thanks. Only yesterday I tried running the specs from the process with the seed, and another time with the seed and bisect, but still got no failures; I updated that part of my question just now. The failures are random, unfortunately: each time I run the specs in parallel something new fails. I initially addressed most of these random failures individually, but everything seems to fall apart only when I'm running specs in parallel, which leads me to conclude that this might be because of the dryness within the spec files. – Shalaby Oct 04 '17 at 07:55
  • I have investigated the DB: each process sees a separate DB, as expected. I think that for parallel processes, each spec needs to be isolated with its own objects, rather than having objects declared globally and used by all specs. I will probably have to refactor all of the specs. I got here from Bakir's article on parallel execution (https://www.atlantbh.com/blog/parallel-test-execution/). I'm not exactly sharing files between the specs, but I am sharing objects. I'll have to test this theory and start refactoring, I guess. – Shalaby Oct 04 '17 at 07:55
  • How are you sharing objects between processes? – Greg Oct 04 '17 at 10:07
  • I'm defining objects with let and Factory Girl, i.e. let(:event) { create(:event) }, and then using event in all specs within the file. What happens is that if each process is accessing a different test case within the same context, one of them has to fail, and that applies to most files where I define an object with let for all specs. – Shalaby Oct 05 '17 at 10:11
  • I thought this was impossible, or I'm missing something. When two processes load the same _spec.rb file, and one is running case1 and the other case2, they do not share the objects created by `let`. Maybe we're digging in the wrong place? You can check that: create two specs with `binding.pry` and run them in two terminals. Changes in one process should not influence anything in the other process. Unless when you say 'process' you actually meant 'thread'? [running to google about processes and threads again in case I'm confusing something] – Greg Oct 05 '17 at 15:15
  • I've tried your suggestion and yes, both specs pass. I might very well be digging in the wrong place. Maybe I need to optimize the specs and configuration first, i.e. sort out time-sensitive test cases and move the refreshing of Elasticsearch indices into hooks in spec_helper, so I can get a clearer idea of what exactly is going on. General cleanup might help, if not solve it, I guess. – Shalaby Oct 06 '17 at 18:21
  • For time-sensitive test cases you might consider Timecop. Regarding the cleanup, I can only say: good luck. – Greg Oct 06 '17 at 18:29
  • Yes, I have actually used Timecop; it was of great help. Thanks for helping :) – Shalaby Oct 06 '17 at 20:57

You are likely dealing with a race condition between two tests that are accessing the same shared global data store in CI, maybe memcached, Redis, or some other shared data.
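One common way to rule this out is to give each test process its own key namespace in the shared store. The sketch below is only an illustration under stated assumptions: a plain Hash stands in for Redis or memcached, the writes happen sequentially rather than concurrently, `namespaced_key` is a hypothetical helper, and `TEST_ENV_NUMBER` is assumed to follow the parallel_tests convention (blank for the first process, `2`, `3`, ... after that).

```ruby
# Hypothetical helper: prefixes a key with a per-process namespace built
# from TEST_ENV_NUMBER (assumed: "" for the first process, "2", "3", ...).
def namespaced_key(key, env_number = ENV["TEST_ENV_NUMBER"])
  "test#{env_number}:#{key}"
end

# A plain Hash standing in for a store shared by all test processes.
store = {}

# Without a namespace, both "processes" write to the same key and the
# second write silently replaces the first.
store["current_event"] = "written by process 1"
store["current_event"] = "written by process 2"
puts store["current_event"] # => written by process 2

# With a namespace, each process reads back what it wrote.
store[namespaced_key("current_event", "")]  = "written by process 1"
store[namespaced_key("current_event", "2")] = "written by process 2"
puts store[namespaced_key("current_event", "")]  # => written by process 1
puts store[namespaced_key("current_event", "2")] # => written by process 2
```

If the random failures disappear once every shared store is isolated per process like this, that points to cross-process interference rather than ordering within a single process.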

Here's a question I previously answered on hunting down flakiness.

aridlehoover