
I've set up a Python script to open this web page with PyQuery.

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)
pqPage = PyQuery(page.content)

But pqPage("li") returns only an empty list, []. Meanwhile, pqPage.text() shows the text of the page's HTML, which includes li elements.

Why won't the code return a list of li elements? How do I make it do that?


1 Answer


It seems PyQuery has a problem with this page, maybe because it is an XHTML page, or because it uses the namespace xmlns="http://www.w3.org/1999/xhtml".

When I use

pqPage.css('li')

then I get

[<{http://www.w3.org/1999/xhtml}html#sfFrontendHtml>]

The {http://www.w3.org/1999/xhtml} prefix in the element name is the namespace. Some modules have problems with HTML that uses namespaces.
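To illustrate the namespace effect without fetching the page, here is a small sketch that uses Python's standard xml.etree.ElementTree (not PyQuery) on an assumed, minimal XHTML snippet. When the markup is parsed as XML, every tag carries the {http://www.w3.org/1999/xhtml} prefix, so a bare "li" does not match, while a namespace-qualified lookup does:

```python
import xml.etree.ElementTree as ET

# Assumed stand-in for the real page: XHTML with a default namespace.
XHTML = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>'
         '<li>Mayor</li><li>Clerk</li></ul></body></html>')

root = ET.fromstring(XHTML)  # parsed as XML, so the namespace is kept
print(root.tag)  # {http://www.w3.org/1999/xhtml}html

# A bare tag name does not match namespaced elements.
plain = root.findall(".//li")

# Qualifying the tag with the namespace URI (via a prefix map) does match.
ns = {"x": "http://www.w3.org/1999/xhtml"}
qualified = [li.text for li in root.findall(".//x:li", ns)]

print(len(plain), qualified)  # 0 ['Mayor', 'Clerk']
```

This mirrors what the answer observes with PyQuery: the elements are there, but under namespaced tag names.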


I have no problem getting the li elements with BeautifulSoup:

import requests
from bs4 import BeautifulSoup as BS

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)

soup = BS(page.text, 'html.parser')
for item in soup.find_all('li'):
    print(item.text)
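BeautifulSoup's 'html.parser' is built on Python's standard html.parser module, which parses the page as HTML rather than XML and treats xmlns as an ordinary attribute, so tag names stay plain. A minimal stdlib-only sketch of that behaviour, using a hypothetical LiCollector class and an assumed sample snippet:

```python
from html.parser import HTMLParser


class LiCollector(HTMLParser):
    """Hypothetical helper: collects the text inside each top-level <li>."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <li> elements we are currently inside
        self.items = []  # finished <li> texts
        self._buf = []   # text chunks for the <li> being read

    def handle_starttag(self, tag, attrs):
        # xmlns arrives here as a plain attribute; the tag is just "li".
        if tag == "li":
            if self.depth == 0:
                self._buf = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.items.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


# Assumed sample markup with the same default namespace as the real page.
XHTML = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>'
         '<li><p>Mayor</p></li><li><p>Clerk</p></li></ul></body></html>')

parser = LiCollector()
parser.feed(XHTML)
parser.close()
print(parser.items)  # ['Mayor', 'Clerk']
```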

EDIT: After some googling I found that passing parser="html" to PyQuery() makes it return the li elements.

import requests
from pyquery import PyQuery

url = "http://www.floridaleagueofcities.com/widgets/cityofficials?CityID=101"
page = requests.get(url)

# parser="html" forces lxml's HTML parser, so tag names are not namespaced
pqPage = PyQuery(page.text, parser="html")
for item in pqPage('li p'):
    print(item.text)
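As far as I know, when no parser is given PyQuery first tries lxml's XML parser and falls back to the HTML parser only if that fails, so a well-formed XHTML page is parsed as XML, namespace and all; parser="html" skips that step. If you prefer to keep XML parsing, PyQuery also has a remove_namespaces() method (check that your version includes it). The idea behind stripping namespaces is sketched below with stdlib ElementTree; strip_namespaces is a hypothetical helper and the markup is assumed sample data:

```python
import xml.etree.ElementTree as ET

# Assumed sample markup with the XHTML default namespace.
XHTML = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>'
         '<li><p>Mayor</p></li><li><p>Clerk</p></li></ul></body></html>')


def strip_namespaces(root):
    """Hypothetical helper: rewrite '{uri}tag' to 'tag' for every element."""
    for el in root.iter():  # iter() includes root itself
        if isinstance(el.tag, str) and el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(XHTML))

# With plain tag names, a path equivalent to the selector 'li p' matches.
texts = [p.text for p in root.findall(".//li/p")]
print(texts)  # ['Mayor', 'Clerk']
```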
furas