I'm scraping ~10 websites for the same information, and currently have a separate script for each one that works on its own. These scripts all share the same base (iterate over the available pages, scrape the information, save it), but differ in how each attribute is extracted.
For example, this is how I'm extracting the author element from two of the pages:
page.at('b[itemprop="author"]').children.text.strip
page.at('.author-username').text.strip
My goal is to refactor this so the main logic is handled by a class, but I'm having trouble figuring out how to pass in the above extractors depending on the source. I'm aware that I can pass CSS selectors as arguments, but as you can see, each extraction involves some additional logic beyond the selector.
While I could have a separate method per source to handle this (as outlined in the previous link), this would quickly get out of hand with ~10 sources.
What is the best way to refactor this code?
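To make the question concrete, here is a minimal sketch of one possible shape, assuming each source can supply its extraction logic as a lambda. The names `Scraper`, `FakeNode`, `FakePage`, and `sources` are hypothetical and not part of my existing scripts; the two stand-in structs only imitate the `at`, `children`, and `text` calls used above so that the sketch runs without Nokogiri.

```ruby
# Hypothetical stand-ins for a parsed page, so the sketch runs on its own.
# In the real scripts, `page` would be the parsed document object instead.
FakeNode = Struct.new(:text) do
  def children
    self # stub: only needs to respond to #text like the real node set does
  end
end

FakePage = Struct.new(:nodes) do
  def at(selector)
    nodes[selector]
  end
end

# Shared logic lives in one class; per-source differences are passed in
# as a hash of attribute name => lambda taking the page.
class Scraper
  def initialize(extractors)
    @extractors = extractors
  end

  # Runs every extractor against the page and returns a hash of results.
  def scrape(page)
    @extractors.transform_values { |extract| extract.call(page) }
  end
end

# One entry per source; each lambda carries that source's extra logic.
sources = {
  site_a: Scraper.new({
    author: ->(page) { page.at('b[itemprop="author"]').children.text.strip }
  }),
  site_b: Scraper.new({
    author: ->(page) { page.at('.author-username').text.strip }
  })
}

page_a = FakePage.new({ 'b[itemprop="author"]' => FakeNode.new("  Alice \n") })
page_b = FakePage.new({ '.author-username' => FakeNode.new(" bob ") })

result_a = sources[:site_a].scrape(page_a)
result_b = sources[:site_b].scrape(page_b)
```

In this sketch, adding an eleventh source would mean adding one more hash entry rather than another method or script, but I don't know whether this is the idiomatic approach or whether something like subclassing would hold up better.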