I have a Sidekiq worker that reaches out to an external API to get some data back. I am trying to write tests to make sure that this worker is designed and functioning correctly. The worker grabs a local model instance and examines two fields on the model. If one of the fields is nil, it will send the other field to the remote API.

Here's the worker code:

class TokenizeAndVectorizeWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'tokenizer_vectorizer', retry: true, backtrace: true

  def perform(article_id)
    article = Article.find(article_id)
    tokenizer_url = ENV['TOKENIZER_URL']

    if article.content.nil?
      send_content = article.abstract
    else
      send_content = article.content
    end

    # configure Faraday
    conn = Faraday.new(tokenizer_url) do |c|
      c.use Faraday::Response::RaiseError
      c.headers['Content-Type'] = 'application/x-www-form-urlencoded'
    end

    # get the response from the tokenizer
    resp = conn.post '/tokenize', "content=#{URI.encode(send_content)}"

    # the response's body contains the JSON for the tokenized and vectorized article content
    article.token_vector = resp.body

    article.save
  end
end

I want to write a test that ensures the article abstract is what gets sent for tokenizing when the article content is nil.

My assumption is that the "right" way to do this would be to mock responses with Faraday such that I expect a specific response to a specific input. By creating an article with nil content and an abstract x I can mock a response to sending x to the remote API, and mock a response to sending nil to the remote API. I can also create an article with x as the abstract and z as the content and mock responses for z.

I have written a test that generically mocks Faraday:

    it "should fetch the token vector on ingest" do
      # don't wait for async sidekiq job
      Sidekiq::Testing.inline!

      # stub Faraday to return something without making a real request
      allow_any_instance_of(Faraday::Connection).to receive(:post).and_return(
        double('response', status: 200, body: "some data")
      )

      # create an attrs to hand to ingest
      attrs = {
        data_source: @data_source,
        title: Faker::Book.title,
        url: Faker::Internet.url,
        content: Faker::Lorem.paragraphs(number: 5).join("<br>"),
        abstract: Faker::Book.genre,
        published_on: DateTime.now,
        created_at: DateTime.now
      }

      # ingest an article from the attrs
      status = Article.ingest(attrs)

      # the ingest occurs roughly simultaneously to the submission to the
      # worker so we need to re-fetch the article by the id because at that
      # point it will have gotten the vector saved to the DB
      @token_vector_article = Article.find(status[1].id)

      # we should've saved "some data" as the token_vector
      expect(@token_vector_article.token_vector).not_to eq(nil)
      expect(@token_vector_article.token_vector).to eq("some data")
    end

But this stubs every :post call made through Faraday, no matter what is being sent. In my particular case, I have no earthly idea how to stub a response to a :post that carries a specific request body...

It's also possible that I'm going about testing this all wrong. I could be instead testing that we are sending the right content (the test should check what is being sent with Faraday) and completely ignoring the right response.

What is the correct way to test that this worker does the right thing (sends content, or sends abstract if content is nil)? Is it to test what's being sent, or test what we are getting back as a reflection of what's being sent?

If I should be testing what's coming back as a reflection of what's being sent, how do I mock different responses from Faraday depending on the value of what is being sent to it?
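
To make the idea of a body-dependent response concrete, here is a minimal sketch using only the Ruby standard library. `FakeConnection` and `FakeResponse` are hypothetical stand-ins, not Faraday classes, and the sketch assumes the worker could be handed a connection object instead of building one inside `perform`. The fake picks its canned response from the request body and records every call, so a test can check both what was sent and what came back.

```ruby
require 'uri'

# Hypothetical stand-in for an HTTP response: only the fields the worker reads.
FakeResponse = Struct.new(:status, :body)

# Hypothetical fake connection. It returns a canned response chosen by the
# request body and records every call so a test can inspect what was sent.
class FakeConnection
  attr_reader :requests

  def initialize(canned)
    @canned = canned # maps request body => response body
    @requests = []
  end

  def post(path, body)
    @requests << [path, body]
    response_body = @canned.fetch(body) do
      raise KeyError, "unexpected request body: #{body.inspect}"
    end
    FakeResponse.new(200, response_body)
  end
end

canned = {
  URI.encode_www_form(content: 'x') => 'vector-for-x',
  URI.encode_www_form(content: 'z') => 'vector-for-z'
}
conn = FakeConnection.new(canned)

resp = conn.post('/tokenize', URI.encode_www_form(content: 'x'))
puts resp.body # prints "vector-for-x"
```

A body the fake does not know about raises `KeyError`, so a worker that sends the wrong field fails loudly instead of silently receiving a generic response.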

** note added later **

I did some more digging and thought, OK, let me test that I'm sending the request I expect, and that I'm processing the response correctly. So, I tried to use webmock.

    it "should fetch token vector for article content when content is not nil" do
      require 'webmock/rspec'
      # don't wait for async sidekiq job
      Sidekiq::Testing.inline!

      request_url = "#{ENV['TOKENIZER_URL']}/tokenize"

      # webmock the expected request and response
      stub = stub_request(:post, request_url)
             .with(body: 'content=y')
             .to_return(body: 'y')

      # create an attrs to hand to ingest
      attrs = {
        data_source: @data_source,
        title: Faker::Book.title,
        url: Faker::Internet.url,
        content: "y",
        abstract: Faker::Book.genre,
        published_on: DateTime.now,
        created_at: DateTime.now
      }

      # ingest an article from the attrs
      status = Article.ingest(attrs)

      # the ingest occurs roughly simultaneously to the submission to the
      # worker so we need to re-fetch the article by the id because at that
      # point it will have gotten the vector saved to the DB
      @token_vector_article = Article.find(status[1].id)

      # we should have sent a request with content=y
      expect(stub).to have_been_requested

      # we should've saved "y" as the token_vector
      expect(@token_vector_article.token_vector).not_to eq(nil)
      expect(@token_vector_article.token_vector).to eq("y")
    end

But I think that webmock isn't getting picked up inside the sidekiq job, because I get this:

1) Article tokenization and vectorization should fetch token vector for article content when content is not nil
     Failure/Error: expect(stub).to have_been_requested

       The request POST https://zzzzz/tokenize with body "content=y" was expected to execute 1 time but it executed 0 times

       The following requests were made:

       No requests were made.
       ============================================================

If I try to include webmock/rspec in any of the other places, for example, at the beginning of my file, random things start to explode. For example, if I have these lines in the beginning of this spec file:

require 'spec_helper'
require 'rails_helper'
require 'sidekiq/testing'
require 'webmock/rspec'

Then I get:

root@c18df30d6d22:/usr/src/app# bundle exec rspec spec/models/article_spec.rb:174
database: test
Run options: include {:locations=>{"./spec/models/article_spec.rb"=>[174]}}
There was an error creating the elasticsearch index for Article: #<NameError: uninitialized constant Faraday::Error::ConnectionFailed>
There was an error removing the elasticsearch index for Article: #<NameError: uninitialized constant Faraday::Error::ConnectionFailed>

Which I am guessing is because the test suite is trying to initialize stuff, but webmock is interfering...

Erik Jacobs

1 Answer

I ended up abandoning both Faraday and the more complicated test. I split the worker into a service class and a thin worker that simply invokes the service class. This allows me to test the service class directly, and then just validate that the worker calls the service class correctly, and that the model calls the worker correctly.

Here's the much simpler service class:

require 'excon'

# this class is used to call out to the tokenizer service to retrieve
# a tokenized and vectorized JSON to store in an article model instance
class TokenizerVectorizerService
  def self.tokenize(content)
    tokenizer_url = ENV['TOKENIZER_URL']

    response = Excon.post("#{tokenizer_url}/tokenize",
               body: URI.encode_www_form(content: content),
               headers: { 'Content-Type' => 'application/x-www-form-urlencoded' },
               expects: [200])

    # the response's body contains the JSON for the tokenized and vectorized
    # article content
    response.body
  end
end
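
As a side note on the request body, `URI.encode_www_form` escapes the characters that are significant in a form-encoded body, which the string interpolation with `URI.encode` in the question's worker did not do for `&` and `=` (and `URI.encode` was removed in Ruby 3.0). A small standard-library sketch, with a made-up sample string, shows the round trip:

```ruby
require 'uri'

content = 'rock & roll = 100%'

# '&', '=' and '%' are percent-encoded and spaces become '+', so the value
# cannot be mistaken for extra form fields.
body = URI.encode_www_form(content: content)
puts body # prints "content=rock+%26+roll+%3D+100%25"

# Decoding gives back the original key/value pair.
pairs = URI.decode_www_form(body)
puts pairs.inspect # prints [["content", "rock & roll = 100%"]]
```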

Here's the test to see that we are calling the right destination:

require 'rails_helper'
require 'spec_helper'
require 'webmock/rspec'

RSpec.describe TokenizerVectorizerService, type: :service do

  describe "tokenize" do
    it "should send the content passed in" do
      request_url = "#{ENV['TOKENIZER_URL']}/tokenize"

      # webmock the expected request and response
      stub = stub_request(:post, request_url)
             .with(
               body: { 'content' => 'y' },
               headers: { 'Content-Type' => 'application/x-www-form-urlencoded' }
             )
             .to_return(status: 200, body: 'y', headers: {})

      TokenizerVectorizerService.tokenize("y")
      expect(stub).to have_been_requested
    end
  end
end
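
The slimmed-down worker and its spec are not shown above. As a rough sketch of how the original concern (send the abstract when content is nil) might be checked at that level without any HTTP, the example below uses plain Ruby stand-ins. `FakeArticle`, `RecordingTokenizer`, and `TokenizeAndVectorize` are hypothetical names, and Sidekiq and persistence are left out entirely.

```ruby
# Stand-in for the Article model: only the fields the worker touches.
FakeArticle = Struct.new(:content, :abstract, :token_vector)

# Hypothetical fake for the service class: records what it was asked to
# tokenize instead of making an HTTP request.
class RecordingTokenizer
  attr_reader :received

  def initialize
    @received = []
  end

  def tokenize(content)
    @received << content
    "vector(#{content})"
  end
end

# Sketch of the worker's remaining logic once HTTP lives in the service.
class TokenizeAndVectorize
  def initialize(tokenizer)
    @tokenizer = tokenizer
  end

  def call(article)
    # Fall back to the abstract only when content is nil, as in the
    # original worker.
    send_content = article.content.nil? ? article.abstract : article.content
    article.token_vector = @tokenizer.tokenize(send_content)
    article
  end
end

tokenizer = RecordingTokenizer.new
job = TokenizeAndVectorize.new(tokenizer)

job.call(FakeArticle.new(nil, 'the abstract'))
job.call(FakeArticle.new('the content', 'the abstract'))

p tokenizer.received # prints ["the abstract", "the content"]
```

In a real spec the recording fake could be replaced by an RSpec expectation on `TokenizerVectorizerService`, which keeps the HTTP stubbing confined to the service spec.
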
Erik Jacobs