I am using django-dynamic-scraper in one of my applications. I have gone through the docs, and the following is my setup:
The object class URL I am using is: http://www.example.com/products/brandname_products.html
The pagination on the site looks like this:
page 1: http://www.example.com/products/brandname_products.html
page 2: http://www.example.com/products/brandname_products2.html
page 3: http://www.example.com/products/brandname_products3.html
page 4: http://www.example.com/products/brandname_products4.html
The brandname in the above URLs is dynamic and depends on the brand's products page. I cannot have a different scraper for each brand, as there are over 10000 brands, so I am trying to use a single scraper object.
In the scraper object that I am using I have defined the pagination options as follows:
pagination_type: RANGE_FUNCT
pagination_append_str: _products{page}.html
pagination_page_replace: 1,100,2
However, the scraper requests the following pagination URLs:
http://www.example.com/products/brandname_products.html_products2.html
http://www.example.com/products/brandname_products.html_products3.html
http://www.example.com/products/brandname_products.html_products4.html
instead of:
http://www.example.com/products/brandname_products2.html
http://www.example.com/products/brandname_products3.html
http://www.example.com/products/brandname_products4.html
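To make the difference concrete, here is a minimal sketch in plain Python of the two behaviors. This is not django-dynamic-scraper's actual code; the function names `appended_urls` and `expected_urls` and the `suffix` parameter are hypothetical and only illustrate what I observe versus what I expected.

```python
BASE_URL = "http://www.example.com/products/brandname_products.html"
APPEND_STR = "_products{page}.html"


def appended_urls(base_url, append_str, pages):
    """Observed behavior: the formatted string is tacked onto the full base URL."""
    return [base_url + append_str.format(page=p) for p in pages]


def expected_urls(base_url, append_str, pages, suffix="_products.html"):
    """Expected behavior: drop the existing suffix first, then add the paged one."""
    stem = base_url[: -len(suffix)] if base_url.endswith(suffix) else base_url
    return [stem + append_str.format(page=p) for p in pages]


pages = [2, 3, 4]  # the page numbers from the URLs listed above
observed = appended_urls(BASE_URL, APPEND_STR, pages)
wanted = expected_urls(BASE_URL, APPEND_STR, pages)

for url in observed + wanted:
    print(url)
```

The first three printed URLs match what the scraper requests, and the last three are the ones I want.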
Q: Why is it appending the string to the end of the URL instead of replacing _products.html in the object class URL? What am I doing wrong, and how can I fix this?