
I am scraping in Python using Selenium and Firefox.

I am able to collect the links under a node using the following:

HREF = node.find_elements_by_xpath(".//a") # Get the href's under the current node

Which returns a bunch of <a> tags that look like this:

<a href="http://example.com" class="" title="The Link" data-ipshover="" data-ipshover-target="http://example.com/?preview=1" data-ipshover-timeout="1.5" id="ips_uid_1234_9">
    <span>The Link</span>
</a>

There are multiple links returned, but if I just focus on the first one:

print dir(HREF[0])
print "#########"
print HREF[0].text
print HREF[0].id
print HREF[0].get_attribute("title")
print HREF[0].get_attribute("href")
print HREF[0].get_attribute("data-ipshover-timeout")
print HREF[0].get_attribute("id")
print "#########"

Outputs this:

['__class__', '__delattr__', '__dict__', '__doc__', '__eq__',
'__format__', '__getattribute__', '__hash__', '__init__', '__module__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'__weakref__', '_execute', '_id', '_parent', '_upload', '_w3c',
'anonymous_children', 'clear', 'click',
'find_anonymous_element_by_attribute', 'find_element',
'find_element_by_class_name', 'find_element_by_css_selector',
'find_element_by_id', 'find_element_by_link_text',
'find_element_by_name', 'find_element_by_partial_link_text',
'find_element_by_tag_name', 'find_element_by_xpath', 'find_elements',
'find_elements_by_class_name', 'find_elements_by_css_selector',
'find_elements_by_id', 'find_elements_by_link_text',
'find_elements_by_name', 'find_elements_by_partial_link_text',
'find_elements_by_tag_name', 'find_elements_by_xpath', 'get_attribute',
'get_property', 'id', 'is_displayed', 'is_enabled', 'is_selected',
'location', 'location_once_scrolled_into_view', 'parent', 'rect',
'screenshot', 'screenshot_as_base64', 'screenshot_as_png', 'send_keys',
'size', 'submit', 'tag_name', 'text', 'value_of_css_property']
#########
The Link
101b851e-67dd-4907-a2da-2dc1828cb09c
The Link
http://example.com
1.5

#########

Note that the last attribute printed is blank, when it should be ips_uid_1234_9. Every other attribute returns fine, so I'm not sure why "id" won't return correctly.

Peter Mortensen
user3246693
  • Are you sure that the first hyperlink has an id? – Lakshmikant Deshpande Nov 03 '17 at 05:30
  • Positive. The link text and hyperlink match the HTML I am looking at. Also, if you select a non-existent attribute with get_attribute, it returns "None"; it doesn't return a blank unicode string. – user3246693 Nov 03 '17 at 05:34
  • I tried with duplicate ids and with different ids. It worked correctly in both cases. Which website are you scraping? The website might be changing ids during the run using JavaScript. – Lakshmikant Deshpande Nov 03 '17 at 06:03
  • Sorry, just figured it out. I'm scraping with Firefox but have been looking at the page source in Chrome. Apparently the "id" attribute doesn't exist if you look at the site in Firefox. Odd, but that at least explains my confusion. – user3246693 Nov 06 '17 at 05:12

1 Answer

I'm a knucklehead. I was scraping with Firefox but reading the page source in Chrome. The "id" attribute isn't present when the page loads in Firefox, but it is in Chrome. Next time I need to use the same browser for scraping and for viewing the source.
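One way to catch this kind of mismatch earlier is to ask the driving browser for the element's own markup instead of relying on a source view from another browser. The following is a minimal sketch, not a definitive approach: `dump_attributes` is a hypothetical helper name, and it assumes only that the element exposes Selenium's `get_attribute` method, including `get_attribute("outerHTML")`.

```python
def dump_attributes(element, names):
    """Collect the markup and selected attributes as the driving browser sees them.

    `element` is assumed to be a Selenium WebElement (any object with a
    get_attribute method will do). This helper is illustrative only.
    """
    # outerHTML is the element's markup in the browser Selenium controls,
    # which may differ from the source shown by a different browser.
    report = {"outerHTML": element.get_attribute("outerHTML")}
    for name in names:
        # Store whatever the driver returns; judge presence from outerHTML.
        report[name] = element.get_attribute(name)
    return report
```

With the objects from the question, `dump_attributes(HREF[0], ["id", "title", "href"])["outerHTML"]` would show whether `id="..."` is actually in the markup Firefox received. In the question a missing "id" came back as an empty string rather than None, so the markup itself is the less ambiguous check.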
