Normal Use Case
lxml
is a great library, and it has decent support for filling out and submitting forms, as documented here. The real challenge for this particular use case is rooted in the way the form works.
The regional laboratory select box is not part of the form; its value is submitted with a cookie instead. This makes things a little more difficult.
If this wasn't the case, you could just issue your GET
, pull the form out of it, change the values you're interested in, submit it, and examine the links that come back. That script might look something like this:
req = requests.get('http://www.questdiagnostics.com/testcenter/BUSearch.action?submitValue=BUSearch&keyword=Toxoplasma+Abs+IgG+%2F+IgM')
hdoc = lxml.html.fromstring(req.content)
form = hdoc.forms[1]
# Set form inputs using `form.fields = dict(...)`
form.action = "http://www.questdiagnostics.com" + form.action
submitResult = lxml.html.parse(lxml.html.submit_form(form)).getroot()
links = submitResult.xpath('//*[@id="maincolumn"]/ol/li/a[@class="title"]/@href')
While you can add arbitrary request parameters when calling lxml.html.submit_form()
, I don't see a way to add arbitrary cookies.
This Use Case
That said, since this form essentially works by redirecting back to itself (with an additional cookie to identify the lab), you could simulate this behavior by just adding the cookie to your initial GET
. You might not need to mess around with a form submission at all. This script will show the first ten links for the SKB lab:
cookies = dict(TC11SelectedLabCode='SKB')
req = requests.get('http://www.questdiagnostics.com/testcenter/BUSearch.action?submitValue=BUSearch&keyword=Toxoplasma+Abs+IgG+%2F+IgM', cookies=cookies)
hdoc = lxml.html.fromstring(req.content)
links = hdoc.xpath('//*[@id="maincolumn"]/ol/li/a[@class="title"]/@href')
print(links)
You could take this a step further, and issue a GET
with no cookies to obtain the list of labs, and then iterate over that list, calling requests.get()
on each one, sending the appropriate TC11SelectedLabCode
cookie to simulate the form submission.
Notes
Note that while lxml
has decent form submission support, you're not actually clicking anything. There is nothing "breathing life into" the DOM.
None of the javascript on the page is running.
To illustrate why this is important, consider this example. If you wanted to verify the links on page 2 of the results, I can't say how you'd accomplish that. If your tests need to exercise javascript on the page, I think you'll need more than requests
and lxml
.