
I want to scrape a site with Python and BeautifulSoup, but I can't find the page number and can only scrape the first page. I think this site uses AJAX, because when I change the page the URL doesn't change.

It's the link of the site:

https://ihome.ir/sell-residential-apartment/th-tehran

And this is my code. I want to scrape 20 pages of this site and collect the houses with their details, like prices, foundation, etc.

import requests
from bs4 import BeautifulSoup

response = requests.get("https://ihome.ir/sell-residential-apartment/th-tehran")


soup = BeautifulSoup(response.text, "html.parser")
prices = soup.select('.sell-value')
titles = soup.select('.title')

homes_prices = []
for home in prices:
    homes_prices.append(int(''.join(filter(str.isdigit, home.getText()))))


homes_titles = []
for title in titles:
    homes_titles.append(title.getText())

res = dict(zip(homes_titles, homes_prices))

for key, value in res.items():
    p = str(res[key])
    if len(str(res[key])) <= 2:
        p += '000000000'
    if len(str(res[key])) > 2:
        p += '000000'

    print(key.strip(), int(p))
Ariaban
  • here is it [check](https://scorpion.ihome.ir/v1/flatted-properties?is_sale=1&source=website&paginate=24&page=2&locations[]=iran.th.tehran&property_type[]=residential-apartment) – αԋɱҽԃ αмєяιcαη Apr 12 '20 at 08:48
  • @αԋɱҽԃαмєяιcαη thanks, how can I use `requests` with this link? – Ariaban Apr 12 '20 at 08:59
  • use this link with `requests` like any other link: `r = requests.get(link)`. It seems it doesn't need any special headers for this page. The only difference is that you can get the result with `r.json()` instead of `r.text`, and you don't have to use `BeautifulSoup`. – furas Apr 12 '20 at 09:18
  • @furas thanks for your answer, but I want to scrape the site with BeautifulSoup; I edited my question. How can I do that? – Ariaban Apr 12 '20 at 09:35
  • if you can get it directly as JSON then don't waste time for BeautifulSoup. – furas Apr 12 '20 at 18:37
  • @furas thank you, what is the API for this link? https://ihome.ir/sell-residential-apartment/th-tehran/district1-zafaraniyeh How can I find that? – Ariaban Apr 13 '20 at 07:31
  • use [DevTools](https://developers.google.com/web/tools/chrome-devtools) built into `Chrome/Firefox` (the `Network` tab) to see all requests sent from the browser to the server. If you also use the `XHR` filter (which means `AJAX`), you should see all requests sent by JavaScript. If you select one of these requests you can see all its headers and its response, and you can check (manually) whether it contains the data you need. – furas Apr 13 '20 at 12:58
  • @furas thank you my friend. I got it, – Ariaban Apr 13 '20 at 13:13
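
For reference, the XHR endpoint found in the comments can also be paged explicitly by incrementing its `page` parameter. This sketch only builds the 20 per-page URLs (no network call), assuming the query parameters seen in the DevTools request:

```python
from urllib.parse import urlencode

BASE = "https://scorpion.ihome.ir/v1/flatted-properties"


def page_params(page: int) -> dict:
    # Query parameters as they appear in the DevTools XHR request
    return {
        'is_sale': '1',
        'source': 'website',
        'paginate': '24',          # items per page
        'page': str(page),
        'locations[]': 'iran.th.tehran',
        'property_type[]': 'residential-apartment',
    }


# Build the URLs for pages 1..20; each can then be fetched
# with requests.get(url).json()
urls = [f"{BASE}?{urlencode(page_params(p))}" for p in range(1, 21)]
print(len(urls))
```

Fetching each URL in a loop (with a small delay between requests) is the polite alternative to requesting everything at once.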

1 Answer


There's no need to use BeautifulSoup, as the data you are looking for is already present in the JSON response!

Here's the back-end API the data is fetched from: https://scorpion.ihome.ir/v1/flatted-properties

You are looking to scrape 20 pages, and each page contains 24 items.

So 24 * 20 = 480. I've therefore adjusted the results per page to 480 and called the API once, which is better than looping over the pages multiple times.

Now you have a JSON dict from which you can access and extract whatever you want!

import requests


params = {
    'is_sale': '1',
    'source': 'website',
    'paginate': '480',  # 24 items per page * 20 pages
    'page': '1',
    'locations[]': 'iran.th.tehran',
    'property_type[]': 'residential-apartment'
}


def main(url):
    r = requests.get(url, params=params).json()
    for item in r['data']:
        print(item.keys())


main("https://scorpion.ihome.ir/v1/flatted-properties")
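
The printed keys tell you which fields the API actually exposes. From there, the price-padding rule from the question (one- or two-digit values are in billions, longer ones in millions) can be applied to each item. A minimal offline sketch; `title` and `sell_value` are hypothetical field names, so verify them against the real `item.keys()` output first:

```python
def normalize_price(raw: str) -> int:
    # Padding rule from the question: keep digits only, then pad
    # 1-2 digit values with nine zeros, longer values with six.
    digits = ''.join(filter(str.isdigit, raw))
    suffix = '000000000' if len(digits) <= 2 else '000000'
    return int(digits + suffix)


def extract(data):
    # 'title' and 'sell_value' are assumed field names -- check them
    # against the real API response before relying on this.
    return {item['title'].strip(): normalize_price(str(item['sell_value']))
            for item in data}


# Hypothetical sample mimicking one entry of r['data']
sample = [{'title': ' Apartment in Zafaraniyeh ', 'sell_value': '12'}]
print(extract(sample))  # {'Apartment in Zafaraniyeh': 12000000000}
```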