I'm trying to extract the titles from a page, but the link that holds each title doesn't have a class. The following is taken from the page source.

<a href="/f/oDhilr3O">Unatama Don</a>

The title's container does have a class, but as you can see I start at index 3 because the first 3 titles aren't what I want, and I don't want to hard-code that offset. On the website the title is also a link, hence the anchor above.

title_name=soup.find_all('div',class_='food-description-title')
title_list=[]

for i in range (3,len(title_name)):
    title=title_name[i].text
    title_list.append(title)

"Unatama Don" is the title I'm trying to get.

ggorlen
Elliot Tan

2 Answers

Here's an example of searching for an anchor element with a specific URL in BS:

from bs4 import BeautifulSoup

document = '''
  <a href="https://www.google.com">google</a>
  <a href="/f/oDhilr3O">Unatama Don</a>
  <a href="test">Don</a>
'''

soup = BeautifulSoup(document, "lxml")
url = "/f/oDhilr3O"

for x in soup.find_all("a", {"href" : url}):
    print(x.text)

Output:

Unatama Don
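
If the exact URL isn't known in advance, one option is to match on a pattern in the href instead of a fixed value. This is only a minimal sketch: it assumes every food link starts with "/f/" (a guess based on the single link in the question), and the document below is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the "/f/oDhilr3O" link comes from the question.
document = '''
  <a href="https://www.google.com">google</a>
  <a href="/f/oDhilr3O">Unatama Don</a>
  <a href="test">Don</a>
'''

soup = BeautifulSoup(document, "html.parser")

# CSS attribute selector: anchors whose href starts with "/f/" (assumed prefix).
title_list = [a.get_text(strip=True) for a in soup.select('a[href^="/f/"]')]
print(title_list)
```

This avoids the hard-coded index, since unrelated links are filtered out by their href rather than by position. If the real page uses a different prefix, the selector needs adjusting.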
ggorlen

The requests and bs4 modules are very helpful for tasks like this. Have you tried something like below?

import requests
from bs4 import BeautifulSoup

url = 'PASTE/YOUR/URL/HERE'
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all('a', href=True)

for each in links:
    print(each.text)

I think this gives the outcome you are looking for. If you would like the hyperlinks as well, add another loop and call print(each.get('href')) within it. Let us know how it goes.
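
The link text and the hyperlink can also be collected together. The sketch below uses an inline document instead of a live request so that it runs offline; the markup and the name `pairs` are illustrative and not taken from the real page.

```python
from bs4 import BeautifulSoup

# Illustrative markup; the second anchor has no href and should be skipped.
page = '''
  <a href="/f/oDhilr3O">Unatama Don</a>
  <a name="no-href">skipped</a>
'''

soup = BeautifulSoup(page, "html.parser")

# href=True keeps only anchors that actually carry an href attribute.
pairs = [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]

for text, href in pairs:
    print(text, href)
```

Keeping each title next to its href makes it easier to filter the list afterwards, for example by URL prefix.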

Steven M