I've written a Python script to fetch all the ASINs available in a certain node. There are around 1000 ASINs in it, but the approach below only fetches 146 of them. Although the page number changes accordingly when I hit the SHOW MORE button at the bottom of that page, I get the exact same ASINs when I change the page number in my script.

webpage address

I've tried so far with:

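import re
import json
import requests
from bs4 import BeautifulSoup

node = '15529609011'

# Load the store page; changing productGridPageIndex here returns the same ASINs
r = requests.get(f'https://www.amazon.com/stores/node/{node}?productGridPageIndex=1')
soup = BeautifulSoup(r.content,'lxml')
# The id of the below-the-fold widget is the slot number used by the slot URL
slot_num = soup.select_one('.stores-widget-btf')['id']
res = requests.get(f'https://www.amazon.com/stores/slot/{slot_num}?node={node}')
# The slot response embeds its data as a JavaScript assignment: var config = {...};
p = re.compile(r'var config = (.*);')
data = json.loads(p.findall(res.text)[0])
asins = data['content']['ASINList']
print(len(asins))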

How can I grab all the ASINs available there using requests?

MITHU
    It's not just your script: I see the same behavior if I manually edit the URL in my browser. The "show more" button probably relies on JavaScript to operate, in which case you won't be able to simulate a button click with the `requests` module. – larsks Nov 23 '19 at 12:24
    Selenium, or any other tool that replicates browser behavior, is needed here. – Harish Vutukuri Nov 24 '19 at 12:00

1 Answer

The data behind the SHOW MORE button is loaded via an AJAX request.

You can either:

  1. Easier, but more resource-intensive: use a headless browser (e.g. headless Chrome via chromedriver) with Selenium.
  2. Harder, but lighter: open the browser's Dev Tools, find and analyze the AJAX request, then build the same request and send it from Python.
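
A rough sketch of option 2, under several assumptions: the AJAX URL and the name of its page parameter are not known here and have to be copied from the request shown in Dev Tools, and the response is assumed to embed the same `var config = {...};` blob that the slot response in the question does. The helpers `extract_asins`, `collect_asins` and `make_fetcher` are hypothetical names introduced for illustration only.

```python
import json
import re

# Same pattern as in the question: the data is embedded as a JS assignment.
CONFIG_RE = re.compile(r'var config = (.*);')


def extract_asins(text):
    """Return the ASIN list from a response embedding `var config = {...};`.

    Assumes the layout seen in the question: config['content']['ASINList'].
    """
    match = CONFIG_RE.search(text)
    if match is None:
        return []
    config = json.loads(match.group(1))
    return config.get('content', {}).get('ASINList', [])


def collect_asins(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... until a page adds nothing new.

    Stopping when no new ASINs appear also guards against the situation in
    the question, where every page index returns the same items.
    """
    ordered, seen = [], set()
    for page in range(1, max_pages + 1):
        new = [a for a in fetch_page(page) if a not in seen]
        if not new:
            break
        ordered.extend(new)
        seen.update(new)
    return ordered


def make_fetcher(session, url, page_param):
    """Build a fetch_page(page) function around a requests.Session.

    `url` and `page_param` are placeholders: take the real values from the
    AJAX request that appears in Dev Tools when SHOW MORE is clicked.
    """
    def fetch_page(page):
        res = session.get(url, params={page_param: page}, timeout=30)
        res.raise_for_status()
        return extract_asins(res.text)
    return fetch_page


# Usage (not run here; AJAX_URL and 'page' are placeholders, not real values):
#   import requests
#   session = requests.Session()
#   asins = collect_asins(make_fetcher(session, AJAX_URL, 'page'))
#   print(len(asins))
```

If the real request turns out to be a POST, or returns plain JSON rather than an embedded `var config`, only `make_fetcher` and `extract_asins` need to change; the paging loop stays the same. The site may also expect the headers or cookies that the browser sends, which a `requests.Session` can carry.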
hunzter