Why am I not getting any data back from website?

Question

So I'm brand new to the whole web scraping thing. I've been working on a project that requires me to get the word of the day from here. I have successfully grabbed the word; now I just need to get the definition, but when I try I get this result:

Avuncular (Correct word of the day)

Definition:

[]

Here's my code:

from lxml import html
import requests

page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = html.fromstring(page.content)

word = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[1]/div[2]/div[1]/div/h1/text()')

WOTD = str(word)
WOTD = WOTD[2:]
WOTD = WOTD[:-2]

print(WOTD.capitalize())


print("Definition:")

wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[2]/div[1]/div/div[1]/p[1]/text()')

print(wordDef)

The [] is supposed to be the first definition, but for some reason the list comes back empty.

Any help would be greatly appreciated.

score 1 · Accepted Answer · answered Feb 26 '19 at 20:36

Your XPath is slightly off. Here's the correct one:

wordDef = tree.xpath('/html/body/div[1]/div/div[4]/main/article/div[3]/div[1]/div/div[1]/p[1]/text()')

Note the div[3] after main/article instead of div[2]. With that change, running the script should give:

Avuncular
Definition:
[' suggestive of an uncle especially in kindliness or geniality']
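To see why a single wrong index produces an empty list rather than an error, here is a minimal sketch. It is an illustration only: the markup is a made-up, simplified stand-in for the real page, and it uses the standard library's xml.etree.ElementTree (which supports only a subset of XPath) instead of lxml so it runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; NOT the real Merriam-Webster page.
SAMPLE = """
<article>
  <div class="word-header"><h1>avuncular</h1></div>
  <div class="ad-slot"></div>
  <div class="definition"><h2>Definition</h2><p>suggestive of an uncle</p></div>
</article>
"""

root = ET.fromstring(SAMPLE)


def texts(path):
    """Return the text of every element matched by a path relative to <article>."""
    return [el.text for el in root.findall(path)]


# The second <div> exists but holds no <p>, so the match is simply empty.
wrong = texts("div[2]/p")

# The third <div> is the one that actually contains the definition.
right = texts("div[3]/p")

print(wrong)
print(right)

# Queries return a list; indexing it is cleaner than slicing str(list).
print(right[0])
```

A path that matches nothing is not an error, which is why the original script printed [] silently.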

answered Feb 26 '19 at 20:36

chris

Thanks! I copied the XPath using chrome. I must have copied the wrong thing. – jaden Feb 26 '19 at 20:44

SIM · Answer 2 · 2019-02-27T09:37:46.630

If you want to avoid hardcoding indexes in the XPath, the following is an alternative to your current attempt:

import requests
from lxml.html import fromstring

page = requests.get('https://www.merriam-webster.com/word-of-the-day')
tree = fromstring(page.text)
word = tree.xpath("//*[@class='word-header']//h1")[0].text
wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p/strong")[0].tail.strip()
print(f'{word}\n{wordDef}')

If wordDef doesn't capture the full definition, try replacing that line with this one:

wordDef = tree.xpath("//h2[contains(.,'Definition')]/following-sibling::p")[0].text_content()

Output:

avuncular
suggestive of an uncle especially in kindliness or geniality
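The difference between the .tail approach and taking the paragraph's full text can be sketched offline. This is only an illustration under assumptions: the markup below is invented and much simpler than the real page, and it uses the standard library's xml.etree.ElementTree rather than lxml. ElementTree has no following-sibling axis or contains(), so the definition is located through a child-text predicate instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is larger and may differ.
SAMPLE = """
<main>
  <div class="ad-slot"></div>
  <div class="word-header"><h1>avuncular</h1></div>
  <div>
    <h2>Definition</h2>
    <p><strong>:</strong> suggestive of an uncle</p>
  </div>
</main>
"""

root = ET.fromstring(SAMPLE)

# Select by class attribute, so the element's position no longer matters.
word = root.find(".//*[@class='word-header']/h1").text

# Find the <div> whose <h2> child reads 'Definition', then its first <p>.
paragraph = root.find(".//div[h2='Definition']/p")

# The definition text follows </strong>, so it is that element's tail.
from_tail = paragraph.find("strong").tail.strip()

# Joining every text node keeps the leading ':' marker as well.
full_text = "".join(paragraph.itertext()).strip()

print(word)
print(from_tail)
print(full_text)
```

Adding an extra wrapper or ad slot before the header would not break these selectors, which is the main reason to prefer them over a copied absolute path.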
