Review scraping from TripAdvisor

Question

I am new to web scraping in Python 3. I want to scrape the reviews of all the hotels in Dubai, but I can only scrape the reviews of the one hotel whose URL I hard-code. Can anyone show me how to get the reviews of all the hotels without explicitly giving the URL of each hotel?

import requests
from bs4 import BeautifulSoup


importurl = 'https://www.tripadvisor.com/Hotel_Review-g295424-d302778-Reviews-Roda_Al_Bustan_Dubai_Airport-Dubai_Emirate_of_Dubai.html'
r = requests.get(importurl)
soup = BeautifulSoup(r.content, "lxml")
resultsoup = soup.find_all("p", {"class": "partial_entry"})
# print each review
for review in resultsoup:
    review_list = review.get_text()
    print(review_list)
# save the reviews to a test text file locally
with open('testreview.txt', 'w') as fid:
    for review in resultsoup:
        review_list = review.get_text()
        fid.write(review_list)

宏杰李 · Accepted Answer · 2017-01-04T13:57:24.237

3

You should find the index pages that list all the hotels, collect every hotel link into a list, then loop over that list of URLs to get the reviews.

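import bs4, requests

# the oa{} offset in the index URL advances in steps of 30
index_pages = ('http://www.tripadvisor.cn/Hotels-g295424-oa{}-Dubai_Emirate_of_Dubai-Hotels.html#ACCOM_OVERVIEW'.format(i) for i in range(0, 540, 30))
urls = []
with requests.Session() as s:
    for index in index_pages:
        r = s.get(index)
        soup = bs4.BeautifulSoup(r.text, 'lxml')
        # each .property_title element links to one hotel page
        url_list = [i.get('href') for i in soup.select('.property_title')]
        # extend (not append) keeps urls a flat list of links, as noted in the comments
        urls.extend(url_list)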

out:

len(urls): 540
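
The snippet above stops once the hotel links are collected. A minimal sketch of the second step (looping over the links to pull the reviews) might look like the following. It assumes the collected hrefs are relative paths that need the site host prepended, and it reuses the `partial_entry` class from the question; `BASE_URL`, `to_absolute` and `scrape_reviews` are hypothetical names introduced here, and the page markup may well have changed since this was written.

```python
from urllib.parse import urljoin

# Assumption: hotel links live on the same host as the index pages.
BASE_URL = 'http://www.tripadvisor.cn'


def to_absolute(href, base=BASE_URL):
    """Resolve a (possibly relative) href against the site host."""
    # urljoin leaves already-absolute URLs untouched.
    return urljoin(base, href)


def scrape_reviews(hrefs):
    """Yield (hotel_url, review_text) pairs for each hotel link."""
    # Third-party imports are kept local so to_absolute() works without them.
    import bs4
    import requests

    with requests.Session() as s:
        for href in hrefs:
            hotel_url = to_absolute(href)
            r = s.get(hotel_url)
            soup = bs4.BeautifulSoup(r.text, 'lxml')
            # Same selector as in the question; only the first page of
            # reviews for each hotel is read here.
            for p in soup.find_all('p', {'class': 'partial_entry'}):
                yield hotel_url, p.get_text()
```

Calling `list(scrape_reviews(urls))` would then issue one request per hotel, so it is worth trying it on a short slice such as `urls[:3]` first.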

edited Jan 04 '17 at 13:57

answered Jan 04 '17 at 13:04

1

This is not the full list of hotels, only the hotels from the first page: there are 18 more pages. – Andersson Jan 04 '17 at 13:25
@Andersson This is an example; if you can get one page, just use a loop to get the other 18 pages. – 宏杰李 Jan 04 '17 at 13:33
But there is no page numbering in the results. The `URL` is always `http://www.tripadvisor.cn/Hotels-g295424-Dubai_Emirate_of_Dubai-Hotels.html` no matter which page it is: 1st or 19th... – Andersson Jan 04 '17 at 13:36
@Andersson Yes, I noticed that. This page uses JavaScript to fetch data, which is hard to handle with requests. – 宏杰李 Jan 04 '17 at 13:38
@Andersson Done! – 宏杰李 Jan 04 '17 at 13:57
Just a little remark about your code: you should use `urls.extend(url_list)` instead of `urls.append(url_list)` as `append()` is for `one_list + value` and `extend()` is for `one_list + another_list` – Andersson Jan 04 '17 at 14:27
...and one more :) Because of some duplicates, the final version of `urls` should be: `urls = set(urls)` – Andersson Jan 04 '17 at 14:31
@Andersson Thanks for pointing that out, and please accept the answer to close this question. – 宏杰李 Jan 04 '17 at 15:17
I'm afraid I'm not able to do so, as it isn't my question :) – Andersson Jan 04 '17 at 15:19
@Andersson I assumed you were the OP. Thanks for your comment. – 宏杰李 Jan 04 '17 at 15:29
@宏杰李 Thank you so much :) My problem is solved now, thanks again. – Techgeeks1 Jan 05 '17 at 11:28
@Hifza ahmad Please accept the answer to close this question. – 宏杰李 Jan 05 '17 at 11:43
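
The two remarks in the comments about `extend()` versus `append()` and about duplicates can be illustrated with a small sketch. `flatten_unique` is a hypothetical helper, and it uses `dict.fromkeys` rather than the `set(urls)` suggested above, purely so that the first-seen order of the links is kept; the sample paths are made up.

```python
def flatten_unique(pages):
    """Flatten per-page link lists into one list without duplicates."""
    urls = []
    for url_list in pages:
        # extend() adds the individual links; append() would add the
        # whole list as a single nested element.
        urls.extend(url_list)
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(urls))


# Made-up sample data: two index pages sharing one hotel link.
pages = [['/hotel-a', '/hotel-b'], ['/hotel-b', '/hotel-c']]

nested = []
for url_list in pages:
    nested.append(url_list)  # one entry per page, not per hotel

print(len(nested))            # number of pages
print(flatten_unique(pages))  # flat list of distinct links
```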
