Here is a way to get unique values, but it doesn't work when I want unique attribute values. For example:
<a href = '11111'>sometext</a>
<a href = '11121'>sometext2</a>
<a href = '11111'>sometext3</a>
I want to get the unique hrefs, and I am restricted to XPath 1.0.
page_src.xpath( '(//a[not(.=preceding::a)] )')
page_src.xpath( '//a/@href[not(.=preceding::a/@href)]' )
Both of these return duplicates.
Is it possible to solve this problem without unique-values?
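One possible direction, sketched here with lxml: put the comparison in a predicate on the a element itself and compare its @href against the @href of every preceding a. In the first expression above, "." is the string value of the element (the link text), and the link texts in the sample all differ, so nothing gets filtered out. The wrapping div and the variable names by_text and unique_hrefs are assumptions made for the example, not part of the original page.

```python
from lxml import html

# Sample markup from the question, wrapped in a <div> (an assumption)
# so that the fragment has a single root element.
page_src = html.fromstring(
    "<div>"
    "<a href='11111'>sometext</a>"
    "<a href='11121'>sometext2</a>"
    "<a href='11111'>sometext3</a>"
    "</div>"
)

# Compares link text, not @href: all three texts differ, so all three match.
by_text = page_src.xpath('//a[not(. = preceding::a)]')

# Keeps an <a> only if no earlier <a> in the document has the same @href.
unique_hrefs = page_src.xpath('//a[not(@href = preceding::a/@href)]/@href')

print(len(by_text))
print(unique_hrefs)
```

This keeps the first occurrence of each href in document order, and it stays within XPath 1.0.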
UPD: this is not the function-like solution I wanted, but I wrote a Python function that iterates over the parent elements of the duplicate links and checks whether prefixing the expression with the parent's tag filters the links down to the needed count.
Here is my example:
_x_item = (
    '//a[starts-with(@href, "%s")'
    ' and (not(@href="%s"))'
    ' and (not(starts-with(@href, "%s")))]'
    % (param1, param1, param2))
# remove duplicate links
neededLinks = [a.get('href') for a in page_src.xpath(_x_item)]
if len(neededLinks) != len(set(neededLinks)):
    uniqLength = len(set(neededLinks))
    breakFlag = False
    for linkk in neededLinks:
        if neededLinks.count(linkk) > 1:
            dupLinks = page_src.xpath('//a[@href="%s"]' % linkk)
            dupLinkParents = [a.getparent() for a in dupLinks]
            for dupParent in dupLinkParents:
                # prefix only the leading '//' with the parent's tag,
                # so a '//' inside param1 or param2 is left untouched
                candidate = _x_item.replace('//', '//%s/' % dupParent.tag, 1)
                tempLinks = [a.get('href') for a in page_src.xpath(candidate)]
                # accept the candidate once it yields exactly the unique count
                if len(tempLinks) == uniqLength:
                    breakFlag = True
                    _x_item = candidate
                    break
            if breakFlag:
                break
This WILL work if the duplicate links have different parents but the same @href value.
As a result, I add a parent.tag prefix, like //div/my_prev_x_item.
Plus, using Python, I can refine the result to //div[@key1="val1" and @key2="val2"]/my_prev_x_item by iterating over dupParent.items(). But this only works if the duplicate items are not located in the same parent element.
In the end I need only an XPath expression as the result, so I can't just use list(set(myItems)).
I want an easier solution (like unique-values()), if one exists. Plus, my solution does not work if the duplicate links share the same parent.
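For the case where the result has to be a single expression, a minimal sketch is to append a second predicate to the existing filter, so the deduplication happens inside XPath and no parent prefix is needed. The helper name unique_href_xpath, the sample markup, and the prefix values are all hypothetical. It also assumes the filter depends only on @href, so that two links with the same href always pass or fail the filter together.

```python
from lxml import html

def unique_href_xpath(prefix, excluded_prefix):
    # Hypothetical helper: the filter from the question plus a predicate
    # that drops any <a> whose @href already occurred on a preceding <a>.
    return (
        '//a[starts-with(@href, "%s")'
        ' and not(@href="%s")'
        ' and not(starts-with(@href, "%s"))]'
        '[not(@href = preceding::a/@href)]'
        % (prefix, prefix, excluded_prefix)
    )

# Made-up page: the two duplicate links share the same parent <p>.
page = html.fromstring(
    '<div>'
    '<p><a href="/item/1">a</a><a href="/item/1">b</a></p>'
    '<p><a href="/item/2">c</a><a href="/item/old/3">d</a>'
    '<a href="/item/">e</a></p>'
    '</div>'
)

expr = unique_href_xpath("/item/", "/item/old/")
links = [a.get("href") for a in page.xpath(expr)]
print(expr)
print(links)
```

Because the check looks at preceding a elements anywhere in the document, it does not matter whether the duplicates sit under the same parent or under different ones.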