
I want to scrape this website, Hotel Association Data, and need help with a CSS selector. I am trying to extract each hotel's address from the member listing using a CSS selector.

Data I want to scrape: `20 West 29th Street` and `New York, NY 10001`


Using Next Sibling Method

I know we can select the next sibling with the `+` combinator, but the problem is that neither address line is wrapped in an element with an attribute I can target. I don't want to use XPath here; I want a generic CSS selector that finds all the siblings of `.hanyccompany` so I can extract their text.

Can anyone tell me how to find all the siblings of `class='hanyccompany'`?

```html
<span class="hanyccompany"><a href="http://www.acehotel.com/" target="_blank">ACE HOTEL NEW YORK</a></span><br />
20 West 29th Street<br />
New York, NY 10001<br />
```
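CSS selectors (in BeautifulSoup's `select()` just as in a browser) match elements, never bare text nodes, so no selector can return `20 West 29th Street` on its own. A minimal sketch of one workaround, assuming BeautifulSoup 4 with `select()` support: the `+` combinator reaches the `<br>` right after the span, and walking the span's `next_siblings` collects the non-empty text nodes. `address_lines` and its `max_lines=2` cutoff are hypothetical; the cutoff assumes the address is always the first two text lines after the hotel name, since in the full page the contact name and phone numbers follow.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# The markup from the question, wrapped in a <div> so the span has a parent.
HTML = """<div>
<span class="hanyccompany"><a href="http://www.acehotel.com/" target="_blank">ACE HOTEL NEW YORK</a></span><br />
20 West 29th Street<br />
New York, NY 10001<br />
</div>"""


def address_lines(company_span, max_lines=2):
    """Collect the text nodes that follow company_span, skipping <br> tags.

    Hypothetical helper: stops after max_lines non-empty strings, or at the
    first tag that is not a <br> (for example the next hotel's <span>).
    """
    lines = []
    for sibling in company_span.next_siblings:
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                lines.append(text)
                if len(lines) == max_lines:
                    break
        elif isinstance(sibling, Tag) and sibling.name != "br":
            break
    return lines


soup = BeautifulSoup(HTML, "html.parser")

# CSS can select the <br> right after the span, but not the bare text node.
first_br = soup.select_one("span.hanyccompany + br")
print(repr(first_br))                  # the <br/> element itself
print(first_br.next_sibling.strip())   # the text node that follows it

# Walking next_siblings picks up the text nodes after each company span.
for span in soup.select("span.hanyccompany"):
    print(address_lines(span))
```

The design choice is to select the anchor element with CSS and then switch to tree navigation, because the address exists only as text between `<br>` tags.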
Asked by python

1 Answer


You can parse and extract data easily using BeautifulSoup.

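```python
from bs4 import BeautifulSoup
from mechanize import Browser

# Fetch the member list with a browser-like User-agent header.
br = Browser()
br.addheaders = [('User-agent', 'Firefox')]
response = br.open("http://www.hanyc.org/members/hotels/")

# Raw HTML of the page; BeautifulSoup handles the decoding.
web_data = response.read()

soup = BeautifulSoup(web_data, "html.parser")
# Every hotel name is wrapped in <span class="hanyccompany">.
tags = soup.find_all('span', attrs={"class": "hanyccompany"})

for tag in tags:
    # The address lines are text nodes next to the span, so print the
    # text of the span's parent element, which contains all of them.
    print(tag.parent.text)
    print("------------------------------")
```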

If you print the text of the span's parent, you'll get something like:

```
ACE HOTEL NEW YORK
20 West 29th Street
New York, NY 10001
Jan Rozenveld, Managing Director
(212) 679-2222
(212) 679-1947
jan.rozenveld@acehotel.com

...
```
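Building on the `tag.parent.text` approach, a sketch that splits the parent's text into lines and keeps only the address. It assumes each hotel's name, address and contact details share one parent element with nothing else in it, as the output above suggests; the `<div>` wrapper is a hypothetical stand-in, since the real container element is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical container: the question does not show which element wraps each
# hotel entry, so a <div> stands in for it here.
HTML = """<div>
<span class="hanyccompany"><a href="http://www.acehotel.com/">ACE HOTEL NEW YORK</a></span><br />
20 West 29th Street<br />
New York, NY 10001<br />
Jan Rozenveld, Managing Director<br />
(212) 679-2222<br />
</div>"""

soup = BeautifulSoup(HTML, "html.parser")
hotels = []

for tag in soup.find_all("span", attrs={"class": "hanyccompany"}):
    # One stripped, non-empty string per text node, joined with newlines.
    lines = tag.parent.get_text("\n", strip=True).split("\n")
    # Assumption: line 0 is the hotel name, lines 1-2 are the street and city.
    name, address = lines[0], lines[1:3]
    hotels.append((name, address))
    print(name)
    print(", ".join(address))
```

Using `get_text("\n", strip=True)` instead of `.text` drops the whitespace-only strings between tags, so the line positions stay predictable.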
Answered by Saleem
  • I want to extract the `address`, not the `class=hanyccompany` element; in other words, I want to extract all the siblings of `class=hanyccompany`. – python Mar 21 '16 at 04:02
  • See the updated post. You can iterate over the `tags` list and get each tag's parent. – Saleem Mar 21 '16 at 04:18
  • Or `tag.parent.text` will give you just the inner text, without any tags. – Saleem Mar 21 '16 at 04:18
  • `web_data` is not a URL. You are responsible for reading the URL and pulling the HTML content with whatever method you want; `web_data` contains the final HTML. – Saleem Mar 21 '16 at 04:30
  • Nope, it's not working. My `web_data` contains all the HTML content, but I can't see any output. – python Mar 21 '16 at 04:35
  • Your program seems to be correct, but it is so weird that I can't see any tags :/ – python Mar 21 '16 at 04:38
  • Check what you get when you read the web page. I'm sure you are not getting what you expect. – Saleem Mar 21 '16 at 04:42
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/106874/discussion-between-saleem-and-python). – Saleem Mar 21 '16 at 04:43
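When `find_all()` prints nothing even though the same code works on the snippet, one common cause is that the fetched HTML differs from what the browser shows (an error page, a redirect, a block, or content added later by JavaScript). A small diagnostic sketch along the lines of the last suggestion: `diagnose` is a hypothetical helper you would call on `response.read()`. It assumes the page is UTF-8 and reports whether the class name occurs in the raw HTML at all, how many spans match, and the page title.

```python
from bs4 import BeautifulSoup


def diagnose(web_data):
    """Summarize what was actually fetched, to explain an empty find_all()."""
    if isinstance(web_data, bytes):
        # Assumes UTF-8; invalid bytes are replaced rather than raising.
        web_data = web_data.decode("utf-8", errors="replace")
    soup = BeautifulSoup(web_data, "html.parser")
    spans = soup.find_all("span", attrs={"class": "hanyccompany"})
    return {
        "characters": len(web_data),
        # False here means the HTML you received lacks the class entirely,
        # so no selector could match it.
        "class_in_raw_html": "hanyccompany" in web_data,
        "matching_spans": len(spans),
        "title": soup.title.get_text(strip=True) if soup.title else None,
    }


if __name__ == "__main__":
    # Stand-in for response.read(); replace it with the real page bytes.
    sample = b'<span class="hanyccompany"><a href="#">ACE HOTEL NEW YORK</a></span><br />'
    print(diagnose(sample))
```

If `class_in_raw_html` is `False`, inspect `web_data` itself (or the `title`) before changing the selector.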