I'm writing a scraper that uses Selenium to navigate and log in to a website, search for the newest data, and store it in a database. I'm using selenium-webdriver for the navigation, and I'm now trying to write tests for the most important edge cases.

I downloaded the HTML and built a fake Sinatra website that mimics the behavior of the original site so that I can test my code. However, I have to run the Puma server separately, in an environment independent of my code.

I need to mock everything in the same environment so that I have better control over how the application behaves. I think I can take the same approach Capybara does, but I don't know where to start.

I created a short mocking class, and it runs, but as soon as Puma starts, RSpec halts and waits for Puma to stop executing.

What's the best approach to test this scraper properly? Are there existing tools that I can use?

My scraper works the same way as the one explained in this tutorial:

https://dev.to/mknycha/serverless-web-scraper-in-ruby-tutorial-50hg

I tried to make it work by starting the mocked website when RSpec starts, like this:

require 'webmock'
require 'puma'
require 'puma/events'
require 'spec/support/fake_website'
require 'rack/handler/puma'
include WebMock::API

WebMock.reset!

def enable_external_connections!
  WebMock.allow_net_connect!
end

def disable_external_connections!
  WebMock.disable_net_connect!(allow_localhost: true, allow: ['app.local'])
end

def stub_net_connections!(options = {})
  registry = {
    "fake_website.com" => { /fake_website.com/ => proc { FakeWebsite } }
  }

  if !options[:only].to_a.empty?
    [options[:only]].flatten.each do |key|
      WebMock::API.stub_request(:any, registry[key].keys.first).to_rack(registry[key].values.first.call)
    end

    enable_external_connections!

  elsif !options[:except].to_a.empty?
    (registry.keys - [options[:except]].flatten).flatten.each do |key|
      WebMock::API.stub_request(:any, registry[key].keys.first).to_rack(registry[key].values.first.call)
    end

    enable_external_connections!

  else
    registry.keys.flatten.each do |key|
      WebMock::API.stub_request(:any, registry[key].keys.first).to_rack(registry[key].values.first.call)
    end

    disable_external_connections!
  end
end

def run_puma(example)
  options = { Host: '127.0.0.1', Port: '80', Threads: '0:4', workers: 1, daemon: true, Verbose: true }

  conf = Rack::Handler::Puma.config(FakeWebsite, options)
  events = conf.options[:Silent] ? ::Puma::Events.strings : ::Puma::Events.stdio
  puma_ver = Gem::Version.new(Puma::Const::PUMA_VERSION)

  events.log 'App starting Puma...'
  events.log "* Version #{Puma::Const::PUMA_VERSION} , codename: #{Puma::Const::CODE_NAME}"
  events.log "* Min threads: #{conf.options[:min_threads]}, max threads: #{conf.options[:max_threads]}"

  Puma::Server.new(FakeWebsite, ::Puma::Events.stdio, conf.options).tap do |s|
    s.binder.parse conf.options[:binds], s.events
    s.min_threads, s.max_threads = conf.options[:min_threads], conf.options[:max_threads]
  end.run.join
end

# Disable all external requests by default.
disable_external_connections!

RSpec.configure do |config|

  # Disable external connections and stub all external services
  #
  config.before(:each) do |example|
    stub_net_connections!
    if example.metadata[:external_connections] == true
      enable_external_connections!

    elsif example.metadata[:external_connections] == false
      run_puma(example)
      disable_external_connections!
    end
  end

  config.after(:each) do |example|
  end

end

However, as soon as this function runs, the tests stop, because the server starts and the call never returns:

def run_puma(example)
  options = { Host: '127.0.0.1', Port: '80', Threads: '0:4', workers: 1, daemon: true, Verbose: true }

  conf = Rack::Handler::Puma.config(FakeWebsite, options)
  events = conf.options[:Silent] ? ::Puma::Events.strings : ::Puma::Events.stdio
  puma_ver = Gem::Version.new(Puma::Const::PUMA_VERSION)

  events.log 'Chimera starting Puma...'
  events.log "* Version #{Puma::Const::PUMA_VERSION} , codename: #{Puma::Const::CODE_NAME}"
  events.log "* Min threads: #{conf.options[:min_threads]}, max threads: #{conf.options[:max_threads]}"

  Puma::Server.new(FakeWebsite, ::Puma::Events.stdio, conf.options).tap do |s|
    s.binder.parse conf.options[:binds], s.events
    s.min_threads, s.max_threads = conf.options[:min_threads], conf.options[:max_threads]
  end.run.join
end

These lines are the ones that make the tests stop:

Puma::Server.new(FakeWebsite, ::Puma::Events.stdio, conf.options).tap do |s|
  s.binder.parse conf.options[:binds], s.events
  s.min_threads, s.max_threads = conf.options[:min_threads], conf.options[:max_threads]
end.run.join

Is there another way to achieve this? Are there any tools for testing this type of application?
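
A note on the likely cause, offered as a sketch rather than a confirmed fix: as far as I can tell, `Puma::Server#run` starts the server in a background thread by default and returns that thread, so the trailing `.join` is what parks RSpec until the server exits. Keeping a reference to the server and stopping it in a hook, instead of joining, should let the examples continue. The stdlib-only example below shows the same pattern without Puma. `TinyFakeSite` and its methods are hypothetical names (not part of Puma, Sinatra, or Capybara), and the fixed HTML body stands in for `FakeWebsite`.

```ruby
require 'socket'
require 'net/http'

# Hypothetical stand-in for the fake site: answers every request with one page.
class TinyFakeSite
  attr_reader :port

  def initialize
    # Port 0 asks the OS for any free port, so no root privileges are needed.
    @server = TCPServer.new('127.0.0.1', 0)
    @port = @server.addr[1]
  end

  # Runs the accept loop in a background thread and returns immediately.
  # There is deliberately no `join` here; joining would block the caller.
  def start
    @thread = Thread.new do
      loop do
        client = @server.accept
        begin
          # Read and discard the request line and headers.
          while (line = client.gets) && line != "\r\n"; end
          body = '<html><body>fake login page</body></html>'
          client.write "HTTP/1.1 200 OK\r\n" \
                       "Content-Type: text/html\r\n" \
                       "Content-Length: #{body.bytesize}\r\n" \
                       "Connection: close\r\n\r\n#{body}"
        ensure
          client.close
        end
      end
    end
    self
  end

  # Stops the background thread, then releases the listening socket.
  def stop
    @thread.kill
    @thread.join
    @server.close
  end
end

site = TinyFakeSite.new.start # control returns here while the server keeps serving
html = Net::HTTP.get(URI("http://127.0.0.1:#{site.port}/"))
site.stop
```

In the spec helper above, the equivalent would be to start the server once, for example in `before(:suite)`, and stop it in `after(:suite)`. Binding to a port above 1024 also avoids the elevated privileges that port 80 normally requires.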

Alvaro Alday
  • Your question seems to be opinion-based. Please be more specific and read the guidelines on [how to ask a good question](https://stackoverflow.com/help/how-to-ask) – builder-7000 Mar 30 '20 at 22:51
  • The word is scraper. Scrapper has an entirely different meaning in English. – anothermh Mar 30 '20 at 23:02
