I am having trouble with a web scraping function. The XPaths for the two things I am trying to get are:
/html/body/div/table[2]/tbody/tr[5]/td[1]/div[1]/ul/li[1]/text()
/html/body/div/table[2]/tbody/tr[5]/td[1]/div[1]/ul/li[1]/a
The HTML is:
<li><a href="http://www.acu.edu/" target="_blank" class="institution">Abilene Christian University</a> (TX)</li>
I am trying to write a function that loops through each li in tr[5]. The problem I am having is getting the text(), that is, the text that follows the link. I have tried a number of variations of this function:
from lxml.html import parse
from urllib2 import urlopen

def _clean(lst):
    for elm in lst:
        lnk = elm.findall('.//a')
        for this in lnk:
            lnk_txt.append(this.text_content())
        state_txt.append(elm.findall('.//text()'))
This specific version raises a KeyError on the '()'. If I remove the (), it returns a list of empty elements. The lnk_txt part works.
What I am trying to get is two lists: one with the name of each university and one with its location. The ultimate goal is to make tuples of (name, state).
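
One possible direction, as a minimal sketch rather than a definitive fix: findall() only understands the limited ElementPath syntax, which has no text() step, while lxml's .xpath() method accepts full XPath. Alternatively, in the ElementTree API that lxml shares, the text that follows a closing </a> is stored on the a element itself as its .tail attribute. The sketch below uses .tail on the li shown above, wrapped in a ul so it can be parsed on its own. The names SAMPLE and name_state_pairs are made up for illustration, and it assumes the state always appears in parentheses right after the link.

```python
# Sketch only: SAMPLE is the <li> from the question wrapped in a <ul>;
# SAMPLE and name_state_pairs are hypothetical names.
try:
    from lxml import etree as ET          # lxml, if it is installed
except ImportError:
    import xml.etree.ElementTree as ET    # stdlib fallback with the same .text/.tail API

SAMPLE = (
    '<ul>'
    '<li><a href="http://www.acu.edu/" target="_blank" class="institution">'
    'Abilene Christian University</a> (TX)</li>'
    '</ul>'
)

def name_state_pairs(items):
    """Return a list of (name, state) tuples from an iterable of <li> elements."""
    pairs = []
    for li in items:
        for a in li.findall('.//a'):
            # Link text is the university name.
            name = (a.text or '').strip()
            # Text after </a> lives on the <a> element as .tail, e.g. " (TX)".
            state = (a.tail or '').strip().strip('()')
            pairs.append((name, state))
    return pairs

root = ET.fromstring(SAMPLE)
print(name_state_pairs(root.findall('.//li')))
```

With the real page, the list passed in would be the li elements under tr[5] rather than the ones from SAMPLE. If separate lists of names and states are still wanted, zip(*pairs) splits the tuples back apart.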