I want to extract the score value from the div tag. Here is the function that I wrote:

def get_score(text):
    soup = BeautifulSoup(text,features="lxml")
    scores=soup.text[Score]
    score=scores.find_all(attrs={"Score":"value"})
    return(score) 

a="[<a aria-label=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)"" class=""result-options report-result" "href=""/evaluation?data=eyJmcm9tIjoicmVmZXJlbmNlZF9zaXRlcyIsImRpZCI6MjE1MjIsInB1YmxpY1VybCI6Imh0dHA6XC9cL29yYW5nZWNhcnRvLnNzby5pbmZyYS5mdGdyb3VwXC9iaW5jYXJ0b1wvUGFnZXNcL0NvbXBvbmVudHNcL0NvbXBvbmVudC5hc3B4P2lkPTY2MSZ0YWI9RGVzY3JpcHRpb24iLCJwb3NpdGlvbiI6MSwic2NvcmUiOjE1Mzc4MCwiZGlmZlNjb3JlIjowLCJib29zdGVkIjowLCJ0aGVtZXMiOiJJbmZvcm1hdGlxdWUsIGdyb3VwZSIsInN0YXR1cyI6Im9rIiwiY2xvdWRWaWV3VG90YWxQcm9jZXNzaW5nVGltZSI6MTY4NTMyLCJjbG91ZFZpZXdJc1N1Z2dlc3Rpb25Qcm9wb3NlZCI6MCwicXVlcnlEYXRhIjp7InJkYXRhIjoiYmRlIiwiaWhtIjoiZnIiLCJjbG91ZHZpZXdSZGF0YSI6ImJkZSIsInF1ZXJ5SWQiOiI0NzQ1ZDBiMjE5NjY3ZTJkYzVkN2JkYjFmY2JlMjNhNSJ9LCJxdWVyeU5NYXRjaCI6MzgyMCwicXVlcnlOSGl0cyI6MzAyNCwiaXNOb3RGb3VuZCI6MCwib3JPcGVyYXRvciI6MCwiYmFzaWNhdCI6bnVsbCwicmVmZXJlciI6bnVsbH0=&amp;eval=eyJyZXF1ZXN0IjoiYmRlIiwib3JkZXIiOm51bGwsInJlc3BvbnNlVXJsIjoiaHR0cDpcL1wvb3JhbmdlY2FydG8uc3NvLmluZnJhLmZ0Z3JvdXBcL2JpbmNhcnRvXC9QYWdlc1wvQ29tcG9uZW50c1wvQ29tcG9uZW50LmFzcHg/aWQ9NjYxJnRhYj1EZXNjcmlwdGlvbiIsInJlc3BvbnNlUmFuayI6MSwicmVzcG9uc2VTY29yZSI6IjE1Mzc4MCIsInJlc3BvbnNlVGhlbWVzIjoiSW5mb3JtYXRpcXVlLCBncm91cGUiLCJyZXF1ZXN0UmVzdWx0c0NvdW50IjozMDI0fQ==&amp;typeicon=1&amp;url=https://enquete.orange.com/store/itw/answer/s/hmaw6nljru/k/Qd2BPas?idDeRequete=4745d0b219667e2dc5d7bdb1fcbe23a5&amp;requete=bde&amp;urlResultat=http%3A%2F%2Forangecarto.sso.infra.ftgroup%2Fbincarto%2FPages%2FComponents%2FComponent.aspx%3Fid%3D661%26tab%3DDescription&amp;mail=rym.boukriba%40orange.com&amp;rangURL=1" "target=""blank" "title=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)""><span class=""icon-1013-Reseau""></span></a>], [<div class=""hit-debug-info""><div>Score : 153780.000000</div><div>Term score : 3495</div><div>Date Pallier : 4</div><div>Boost Site : 0</div><div>Boost Type : 10</div><div>Boost Actu : 1</div><div>Thèmes du hit : Informatique, groupe</div></div>]"

I only want this as output: 153780.000000

Corey Goldberg
Rym Boukriba
  • Have you tried `find_one()`? – Moshe Slavin Jul 29 '19 at 09:46
  • Yes, I tried find, find_all and find one! – Rym Boukriba Jul 29 '19 at 09:52
  • Post the HTML around the `DIV` that contains the score you want. You should be able to write a locator to find it, but we can't do that without the relevant HTML. It looks like the CSS selector `div.hit-debug-info > div` will get you `Score : 153780.000000`, but it's hard to tell with all the other stuff in there. – JeffC Jul 29 '19 at 13:38
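
The CSS selector mentioned in the last comment can be tried roughly as follows. This is only a sketch: the `html` string is a hypothetical, cleaned-up version of the debug block with normal attribute quoting (the string in the question has doubled quotes, which may prevent the class from matching), and `score_from_selector` is an illustrative name, not something from the original post.

```python
from bs4 import BeautifulSoup

# Hypothetical, cleaned-up version of the debug block. It assumes well-formed
# attribute quoting, unlike the escaped string shown in the question.
html = (
    '<div class="hit-debug-info">'
    '<div>Score : 153780.000000</div>'
    '<div>Term score : 3495</div>'
    '</div>'
)

def score_from_selector(markup):
    soup = BeautifulSoup(markup, "html.parser")
    # Direct child divs of the debug container, in document order.
    for div in soup.select("div.hit-debug-info > div"):
        label, _, value = div.get_text().partition(":")
        # "Term score" must not be mistaken for "Score".
        if label.strip() == "Score":
            return value.strip()
    return None

print(score_from_selector(html))
```

Comparing the label exactly, rather than with a substring test, keeps the `Term score` line from being picked up by accident.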

2 Answers

Since the div that holds the score has no class or id, I could not search for that tag directly in the soup. Here is how I did it, given the source you provided.

page_source = """ a="[<a aria-label=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)"" class=""result-options report-result" "href=""/evaluation?data=eyJmcm9tIjoicmVmZXJlbmNlZF9zaXRlcyIsImRpZCI6MjE1MjIsInB1YmxpY1VybCI6Imh0dHA6XC9cL29yYW5nZWNhcnRvLnNzby5pbmZyYS5mdGdyb3VwXC9iaW5jYXJ0b1wvUGFnZXNcL0NvbXBvbmVudHNcL0NvbXBvbmVudC5hc3B4P2lkPTY2MSZ0YWI9RGVzY3JpcHRpb24iLCJwb3NpdGlvbiI6MSwic2NvcmUiOjE1Mzc4MCwiZGlmZlNjb3JlIjowLCJib29zdGVkIjowLCJ0aGVtZXMiOiJJbmZvcm1hdGlxdWUsIGdyb3VwZSIsInN0YXR1cyI6Im9rIiwiY2xvdWRWaWV3VG90YWxQcm9jZXNzaW5nVGltZSI6MTY4NTMyLCJjbG91ZFZpZXdJc1N1Z2dlc3Rpb25Qcm9wb3NlZCI6MCwicXVlcnlEYXRhIjp7InJkYXRhIjoiYmRlIiwiaWhtIjoiZnIiLCJjbG91ZHZpZXdSZGF0YSI6ImJkZSIsInF1ZXJ5SWQiOiI0NzQ1ZDBiMjE5NjY3ZTJkYzVkN2JkYjFmY2JlMjNhNSJ9LCJxdWVyeU5NYXRjaCI6MzgyMCwicXVlcnlOSGl0cyI6MzAyNCwiaXNOb3RGb3VuZCI6MCwib3JPcGVyYXRvciI6MCwiYmFzaWNhdCI6bnVsbCwicmVmZXJlciI6bnVsbH0=&amp;eval=eyJyZXF1ZXN0IjoiYmRlIiwib3JkZXIiOm51bGwsInJlc3BvbnNlVXJsIjoiaHR0cDpcL1wvb3JhbmdlY2FydG8uc3NvLmluZnJhLmZ0Z3JvdXBcL2JpbmNhcnRvXC9QYWdlc1wvQ29tcG9uZW50c1wvQ29tcG9uZW50LmFzcHg/aWQ9NjYxJnRhYj1EZXNjcmlwdGlvbiIsInJlc3BvbnNlUmFuayI6MSwicmVzcG9uc2VTY29yZSI6IjE1Mzc4MCIsInJlc3BvbnNlVGhlbWVzIjoiSW5mb3JtYXRpcXVlLCBncm91cGUiLCJyZXF1ZXN0UmVzdWx0c0NvdW50IjozMDI0fQ==&amp;typeicon=1&amp;url=https://enquete.orange.com/store/itw/answer/s/hmaw6nljru/k/Qd2BPas?idDeRequete=4745d0b219667e2dc5d7bdb1fcbe23a5&amp;requete=bde&amp;urlResultat=http%3A%2F%2Forangecarto.sso.infra.ftgroup%2Fbincarto%2FPages%2FComponents%2FComponent.aspx%3Fid%3D661%26tab%3DDescription&amp;mail=rym.boukriba%40orange.com&amp;rangURL=1" "target=""blank" "title=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)""><span class=""icon-1013-Reseau""></span></a>], [<div class=""hit-debug-info""><div>Score : 153780.000000</div><div>Term score : 3495</div><div>Date Pallier : 4</div><div>Boost Site : 0</div><div>Boost Type : 10</div><div>Boost Actu : 1</div><div>Thèmes du hit : Informatique, groupe</div></div>]" """

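from bs4 import BeautifulSoup

soup = BeautifulSoup(page_source, 'html.parser')
# The score div has no class or id, so scan every div and check its first child.
for div in soup.find_all('div'):
    if div.contents and str(div.contents[0]).startswith("Score : "):
        # "Score : 153780.000000" -> "153780.000000"
        print(str(div.contents[0]).split(" ")[2])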

This outputs:

153780.000000
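
If the value is needed as a number rather than printed, the same scan could be wrapped in a function. The sketch below is one possible variation, not part of the original answer: `first_score` and `sample` are hypothetical names, and the sample is a shortened stand-in for the real page source.

```python
from bs4 import BeautifulSoup

def first_score(page_source):
    """Return the first "Score : ..." value as a float, or None if absent."""
    soup = BeautifulSoup(page_source, "html.parser")
    for div in soup.find_all("div"):
        # .string is None unless the div's only child is a text node,
        # which skips the outer container div.
        text = div.string
        if text and text.strip().startswith("Score :"):
            return float(text.split(":", 1)[1])
    return None

# Shortened, hypothetical stand-in for the real page source.
sample = "<div><div>Score : 153780.000000</div><div>Term score : 3495</div></div>"
print(first_score(sample))
```

Note that converting to `float` drops the trailing zeros, so keep the string form if the exact text `153780.000000` matters.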

Use a regular expression with the `re` module.

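import re
from bs4 import BeautifulSoup

def get_score(text):
    soup = BeautifulSoup(text, features="lxml")
    # Find the first div whose text contains "Score :".
    score_tag = soup.find('div', string=re.compile("Score :"))
    if score_tag is None:
        return None
    # Keep only what follows the label, e.g. "153780.000000".
    return score_tag.text.split("Score :")[1].strip()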

print(get_score("<a aria-label=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)"" class=""result-options report-result" "href=""/evaluation?data=eyJmcm9tIjoicmVmZXJlbmNlZF9zaXRlcyIsImRpZCI6MjE1MjIsInB1YmxpY1VybCI6Imh0dHA6XC9cL29yYW5nZWNhcnRvLnNzby5pbmZyYS5mdGdyb3VwXC9iaW5jYXJ0b1wvUGFnZXNcL0NvbXBvbmVudHNcL0NvbXBvbmVudC5hc3B4P2lkPTY2MSZ0YWI9RGVzY3JpcHRpb24iLCJwb3NpdGlvbiI6MSwic2NvcmUiOjE1Mzc4MCwiZGlmZlNjb3JlIjowLCJib29zdGVkIjowLCJ0aGVtZXMiOiJJbmZvcm1hdGlxdWUsIGdyb3VwZSIsInN0YXR1cyI6Im9rIiwiY2xvdWRWaWV3VG90YWxQcm9jZXNzaW5nVGltZSI6MTY4NTMyLCJjbG91ZFZpZXdJc1N1Z2dlc3Rpb25Qcm9wb3NlZCI6MCwicXVlcnlEYXRhIjp7InJkYXRhIjoiYmRlIiwiaWhtIjoiZnIiLCJjbG91ZHZpZXdSZGF0YSI6ImJkZSIsInF1ZXJ5SWQiOiI0NzQ1ZDBiMjE5NjY3ZTJkYzVkN2JkYjFmY2JlMjNhNSJ9LCJxdWVyeU5NYXRjaCI6MzgyMCwicXVlcnlOSGl0cyI6MzAyNCwiaXNOb3RGb3VuZCI6MCwib3JPcGVyYXRvciI6MCwiYmFzaWNhdCI6bnVsbCwicmVmZXJlciI6bnVsbH0=&amp;eval=eyJyZXF1ZXN0IjoiYmRlIiwib3JkZXIiOm51bGwsInJlc3BvbnNlVXJsIjoiaHR0cDpcL1wvb3JhbmdlY2FydG8uc3NvLmluZnJhLmZ0Z3JvdXBcL2JpbmNhcnRvXC9QYWdlc1wvQ29tcG9uZW50c1wvQ29tcG9uZW50LmFzcHg/aWQ9NjYxJnRhYj1EZXNjcmlwdGlvbiIsInJlc3BvbnNlUmFuayI6MSwicmVzcG9uc2VTY29yZSI6IjE1Mzc4MCIsInJlc3BvbnNlVGhlbWVzIjoiSW5mb3JtYXRpcXVlLCBncm91cGUiLCJyZXF1ZXN0UmVzdWx0c0NvdW50IjozMDI0fQ==&amp;typeicon=1&amp;url=https://enquete.orange.com/store/itw/answer/s/hmaw6nljru/k/Qd2BPas?idDeRequete=4745d0b219667e2dc5d7bdb1fcbe23a5&amp;requete=bde&amp;urlResultat=http%3A%2F%2Forangecarto.sso.infra.ftgroup%2Fbincarto%2FPages%2FComponents%2FComponent.aspx%3Fid%3D661%26tab%3DDescription&amp;mail=rym.boukriba%40orange.com&amp;rangURL=1" "target=""blank" "title=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)""><span class=""icon-1013-Reseau""></span></a>], [<div class=""hit-debug-info""><div>Score : 153780.000000</div><div>Term score : 3495</div><div>Date Pallier : 4</div><div>Boost Site : 0</div><div>Boost Type : 10</div><div>Boost Actu : 1</div><div>Thèmes du hit : Informatique, groupe</div></div>"))

Output:

153780.000000
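
When the markup is too malformed for a parser to be worth the trouble, a regular expression on the raw string may be enough. This is a minimal sketch that swaps BeautifulSoup for plain `re`; it assumes the label always appears as `Score :` immediately after a closing `>`, and `SCORE_RE` and `score_from_text` are hypothetical names.

```python
import re

# Case-sensitive on purpose, and anchored on ">" so that
# "Term score : 3495" does not match.
SCORE_RE = re.compile(r">\s*Score\s*:\s*([0-9]+(?:\.[0-9]+)?)")

def score_from_text(text):
    match = SCORE_RE.search(text)
    return match.group(1) if match else None

# Shortened, hypothetical stand-in for the real page source.
sample = "<div>Score : 153780.000000</div><div>Term score : 3495</div>"
print(score_from_text(sample))
```

This returns only the first match. `SCORE_RE.findall(text)` would return every score if the page contains several hits.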

EDIT:

from bs4 import BeautifulSoup
import re
data='''<a aria-label=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)"" class=""result-options report-result" "href=""/evaluation?data=eyJmcm9tIjoicmVmZXJlbmNlZF9zaXRlcyIsImRpZCI6MjE1MjIsInB1YmxpY1VybCI6Imh0dHA6XC9cL29yYW5nZWNhcnRvLnNzby5pbmZyYS5mdGdyb3VwXC9iaW5jYXJ0b1wvUGFnZXNcL0NvbXBvbmVudHNcL0NvbXBvbmVudC5hc3B4P2lkPTY2MSZ0YWI9RGVzY3JpcHRpb24iLCJwb3NpdGlvbiI6MSwic2NvcmUiOjE1Mzc4MCwiZGlmZlNjb3JlIjowLCJib29zdGVkIjowLCJ0aGVtZXMiOiJJbmZvcm1hdGlxdWUsIGdyb3VwZSIsInN0YXR1cyI6Im9rIiwiY2xvdWRWaWV3VG90YWxQcm9jZXNzaW5nVGltZSI6MTY4NTMyLCJjbG91ZFZpZXdJc1N1Z2dlc3Rpb25Qcm9wb3NlZCI6MCwicXVlcnlEYXRhIjp7InJkYXRhIjoiYmRlIiwiaWhtIjoiZnIiLCJjbG91ZHZpZXdSZGF0YSI6ImJkZSIsInF1ZXJ5SWQiOiI0NzQ1ZDBiMjE5NjY3ZTJkYzVkN2JkYjFmY2JlMjNhNSJ9LCJxdWVyeU5NYXRjaCI6MzgyMCwicXVlcnlOSGl0cyI6MzAyNCwiaXNOb3RGb3VuZCI6MCwib3JPcGVyYXRvciI6MCwiYmFzaWNhdCI6bnVsbCwicmVmZXJlciI6bnVsbH0=&amp;eval=eyJyZXF1ZXN0IjoiYmRlIiwib3JkZXIiOm51bGwsInJlc3BvbnNlVXJsIjoiaHR0cDpcL1wvb3JhbmdlY2FydG8uc3NvLmluZnJhLmZ0Z3JvdXBcL2JpbmNhcnRvXC9QYWdlc1wvQ29tcG9uZW50c1wvQ29tcG9uZW50LmFzcHg/aWQ9NjYxJnRhYj1EZXNjcmlwdGlvbiIsInJlc3BvbnNlUmFuayI6MSwicmVzcG9uc2VTY29yZSI6IjE1Mzc4MCIsInJlc3BvbnNlVGhlbWVzIjoiSW5mb3JtYXRpcXVlLCBncm91cGUiLCJyZXF1ZXN0UmVzdWx0c0NvdW50IjozMDI0fQ==&amp;typeicon=1&amp;url=https://enquete.orange.com/store/itw/answer/s/hmaw6nljru/k/Qd2BPas?idDeRequete=4745d0b219667e2dc5d7bdb1fcbe23a5&amp;requete=bde&amp;urlResultat=http%3A%2F%2Forangecarto.sso.infra.ftgroup%2Fbincarto%2FPages%2FComponents%2FComponent.aspx%3Fid%3D661%26tab%3DDescription&amp;mail=rym.boukriba%40orange.com&amp;rangURL=1" "target=""blank" "title=""Signaler le document BDE-BSS MOB ( Application ) (nouvelle fenêtre)""><span class=""icon-1013-Reseau""></span></a>], [<div class=""hit-debug-info""><div>Score : 153780.000000</div><div>Term score : 3495</div><div>Date Pallier : 4</div><div>Boost Site : 0</div><div>Boost Type : 10</div><div>Boost Actu : 1</div><div>Thèmes du hit : Informatique, groupe</div></div>'''
soup = BeautifulSoup(data, features="lxml")
# Collect every div whose text contains "Score :" (handles several hits).
score_tags = soup.find_all('div', string=re.compile("Score :"))
scores = [tag.text.split("Score :")[1].strip() for tag in score_tags]
print(scores)
KunduK
  • An error like this one [link](https://stackoverflow.com/questions/886381/beautiful-soup-and-utidy) appears. I added str to text and it doesn't work. – Rym Boukriba Jul 29 '19 at 10:48
  • @RymBoukriba: If your HTML is like what you posted, the code above should work. Check the edited section. – KunduK Jul 29 '19 at 10:56
  • It works now. Sorry, I couldn't post the entire HTML because it contains some confidential information and, as I said, the website is only accessible from the intranet. Thanks a lot. – Rym Boukriba Jul 29 '19 at 15:30