
I'm brand new to the whole web scraping thing. I've been working on a project that requires me to get the word of the day from Merriam-Webster's Word of the Day page. I have successfully grabbed the word; now I just need to get the definition, but when I try, I get this result:

Avuncular (Correct word of the day)

Definition:

[]

Here's my code:

from lxml import html
import requests

page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = html.fromstring(page.content)

word = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[1]/div[2]/div[1]/div/h1/text()')

WOTD = str(word)
WOTD = WOTD[2:]
WOTD = WOTD[:-2]

print(WOTD.capitalize())


print("Definition:")

wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[2]/div[1]/div/div[1]/p[1]/text()')

print(wordDef)

The [] is supposed to be the first definition, but the XPath returns an empty list for some reason.

Any help would be greatly appreciated.

jaden

2 Answers


Your XPath is slightly off. Here's the corrected one:

wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[3]/div[1]/div/div[1]/p[1]/text()')

Note the div[3] after main/article instead of div[2]. Running the script should now print:

Avuncular
Definition:
[' suggestive of an uncle especially in kindliness or geniality']
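As a side note, tree.xpath(...) with a trailing text() returns a list of strings, which is why the output above is still wrapped in brackets and why the question slices characters off str(word). A minimal sketch of a tidier way to unwrap the result follows. The helper name first_text is hypothetical (not part of lxml), and the lists are stand-ins for what tree.xpath(...) returned, so the snippet runs without a network request:

```python
def first_text(results, default=""):
    """Return the first XPath text result, stripped, or `default` if the list is empty."""
    return results[0].strip() if results else default


# Stand-ins for the lists returned by tree.xpath(...); no request is made here.
word = ["avuncular"]
word_def = [" suggestive of an uncle especially in kindliness or geniality"]
missing = []  # what an XPath that matches nothing returns

print(first_text(word).capitalize())
print("Definition:")
print(first_text(word_def))
print(first_text(missing, default="(no match)"))
```

Checking for an empty list also makes the failure from the question explicit instead of silently printing [].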
chris

If you want to avoid hardcoding indexes in the XPath, the following is an alternative to your current attempt:

import requests
from lxml.html import fromstring

page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = fromstring(page.text)
word = tree.xpath("//*[@class='word-header']//h1")[0].text
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p/strong")[0].tail.strip()
print(f'{word}\n{wordDef}')

If wordDef does not capture the full definition, try this instead:

wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p")[0].text_content()

Output:

avuncular
suggestive of an uncle especially in kindliness or geniality
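To see why the two variants above can differ, here is a rough offline sketch of how .text, .tail, and the full text of an element relate. It uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra installs; lxml elements expose the same .text and .tail attributes, and joining itertext() plays roughly the role of text_content(). The markup is a hypothetical guess at the shape of the page, not copied from it:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a heading followed by a paragraph whose definition
# text comes after a <strong> label. Not taken from the live page.
snippet = (
    "<div><h2>Definition</h2>"
    "<p><strong>1 :</strong> suggestive of an uncle especially in "
    "kindliness or geniality</p></div>"
)

root = ET.fromstring(snippet)
p = root.find("p")
strong = p.find("strong")

label = strong.text                        # text inside <strong>
tail_only = strong.tail.strip()            # text after </strong>, still inside <p>
full_text = "".join(p.itertext()).strip()  # all text within <p>, label included

print(label)
print(tail_only)
print(full_text)
```

So .tail gives only what follows the <strong> element, while the full-text variant also includes the label and any nested markup, which is the safer fallback if the definition is split across several child elements.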
SIM