
I'm new to Python and I've been trying to use BeautifulSoup to extract one particular value from a variable defined in a script element.

Code:

import requests
from bs4 import BeautifulSoup
import esprima

# Page to scrape and the request headers to send

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


# Download the page and parse the HTML
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


# Grab the text of the <script> inside the chart's popover container

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
print(chart)

OUTPUT:

var data = {
status: 'success',  
baseline: 29,       
communicate: null,  
company: 'Facebook',
max: 66,
series: [

                      { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },

                      { x: '2020-05-30T13:37:28.168484-04:00', y: 27  },

                      .....

                      { x: '2020-05-31T13:07:28.168484-04:00', y: 30  },

                  ]
                }

                $(function () {
                  chartThis(data, 'holder', 'line')
                });

                if (data.communicate && $('#dd-communicate').length) {
                  $('#dd-communicate').html('<div class="border text-left d-inline-block p-2"><i class="fa" aria-hidden="true" style="color: red; width:16px; height:12px; background:url(https://cdn2.downdetector.com/d328eb8cbe4e164/images/v2/message.svg) no-repeat"></i>'
                    +'<span class="d-inline-block px-1">'+ data.company+' &bull;  ' + moment.utc(data.communicate.created_at).fromNow()
                    + '</span><p class="font-weight-bold my-0">'+ data.communicate.message + '</p></div>')
                }

Do you know an easy way to extract the 'max' value from the data variable above?

I've also tried using esprima, but with no luck; I hit this error:

Traceback (most recent call last):
  File "c:/test.py", line 31, in <module>
    if token["type"] == "Identifier" and token["value"] == "max":
TypeError: 'BufferEntry' object is not subscriptable

My code with esprima looked like this:

import requests
from bs4 import BeautifulSoup
import esprima

# Page to scrape and the request headers to send

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


# Download the page and parse the HTML
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


# Grab the text of the <script> inside the chart's popover container

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()

tokens = esprima.tokenize(chart)

token_iterator = iter(tokens)

for token in token_iterator:
    if token["type"] == "Identifier" and token["value"] == "max":
        value_token = next(next(token_iterator))
        result = value_token["value"]

Any help would be greatly appreciated!
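Two things look wrong in the esprima loop. The traceback says the tokens are BufferEntry objects rather than dicts, so their fields would be read as attributes (token.type, token.value) instead of token["type"]. Also, next(next(token_iterator)) passes a token to next(), which only accepts an iterator; the intent is to skip the ':' token and take the one after it. A minimal sketch, assuming esprima's tokens expose .type and .value attributes (as the error message suggests) and that the key is followed directly by ':' and then the value. The helper name value_after_identifier is hypothetical, and the stand-in tokens let it run without esprima installed:

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def value_after_identifier(tokens: Iterable, name: str) -> Optional[str]:
    """Return the raw value of the token that follows `name :`, or None.

    Assumes each token has .type and .value attributes (hypothetical helper).
    """
    toks = list(tokens)
    # Slide over every window of three consecutive tokens: key, ':', value.
    for key, colon, value in zip(toks, toks[1:], toks[2:]):
        # Attribute access (token.type), not subscripting (token["type"]).
        if key.type == "Identifier" and key.value == name and colon.value == ":":
            return value.value
    return None


# Stand-in tokens shaped like the `company: 'Facebook', max: 66,` part of
# the script, so this example runs without esprima.
fake_tokens = [
    SimpleNamespace(type="Identifier", value="company"),
    SimpleNamespace(type="Punctuator", value=":"),
    SimpleNamespace(type="String", value="'Facebook'"),
    SimpleNamespace(type="Punctuator", value=","),
    SimpleNamespace(type="Identifier", value="max"),
    SimpleNamespace(type="Punctuator", value=":"),
    SimpleNamespace(type="Numeric", value="66"),
    SimpleNamespace(type="Punctuator", value=","),
]

# Token values are strings, so convert before doing any numeric comparison.
max_val = int(value_after_identifier(fake_tokens, "max"))
print(max_val)  # 66
```

With esprima installed, the same function would presumably be called as value_after_identifier(esprima.tokenize(chart), "max"), provided its tokens behave as assumed above. Requiring the ':' right after the identifier keeps it from matching other uses of the name, such as Math.max(...).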

Bhargav Rao
Shaggy

1 Answer


A quick way to extract the max value is to split the chart string:

import requests
from bs4 import BeautifulSoup

URL = 'https://downdetector.com/status/facebook/'
browser = {'user-agent': 'my agent'}

page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
# Take the text after "max: " up to the next comma (the result is a str)
max_val = chart.split("max: ")[1].split(",")[0]

print(max_val)

OUT: 64
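The split approach works as long as the script text reads exactly "max: " followed by the number and a comma. A slightly more tolerant sketch, assuming the value is a plain integer literal, uses a regular expression and returns an int directly, which also sidesteps the str/int comparison error raised in the comments. The helper name extract_int_field is hypothetical, and the chart string is an abbreviated copy of the script text from the question:

```python
import re
from typing import Optional

# Abbreviated copy of the script text shown in the question.
chart = """
var data = {
status: 'success',
baseline: 29,
communicate: null,
company: 'Facebook',
max: 66,
series: [
  { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },
]
}
"""


def extract_int_field(script: str, key: str) -> Optional[int]:
    """Return the integer assigned to `key:` in the script text, or None."""
    # \b stops 'max' from matching inside a longer name such as 'xmax';
    # \s* tolerates any spacing around the colon.
    match = re.search(rf"\b{re.escape(key)}\s*:\s*(-?\d+)", script)
    return int(match.group(1)) if match else None


max_val = extract_int_field(chart, "max")
print(max_val)       # 66
print(max_val < 50)  # False: max_val is already an int, so this comparison works
```

Returning None when the key is missing lets the caller decide what to do if the page layout changes, instead of failing with an IndexError as the split chain would.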
Nico Müller
  • Nico, thanks for the quick answer! Using this, I tried to automate an email to my inbox and got an error: File "c:/test.py", line 46, in <module>: if (max_val < 50): TypeError: '<' not supported between instances of 'str' and 'int'. Any ideas how to get around this? – Shaggy May 31 '20 at 18:12
  • You can use int(max_val) < 50 (split returns a str). If this solves your question, it would be great if you could mark it as answered (gray checkmark next to the answer). – Nico Müller May 31 '20 at 18:13
  • Yeah, I think I might need to ask another question for the second part, since you've answered the original question. Thanks a lot! After adding int(max_val) > 50 it still gives the same error. :( I'll post the additional code in the next comment: – Shaggy May 31 '20 at 18:22
  • "import smtplib if int(max_val > 50): send_mail() def send_mail(): server = smtplib.SMTP('smtp.gmail.com', 587) server.ehlo() server.starttls() server.ehlo() server.login('email', 'pw') subject = 'test' body = 'test/' msg = f"Subject: {subject}\n\n{body}" server.sendmail( 'from email', 'to email', msg ) print('succesfully sent') server.quit()" – Shaggy May 31 '20 at 18:22
  • No worries! it should be `if int(max_val) > 50:` Alternatively, you can do `max_val = int(max_val)` to always have max_val as an integer value in your program – Nico Müller May 31 '20 at 18:25
  • 1
    Oh wow that's a quick fix haha - it worked! Thank you so much, Nico. :) – Shaggy May 31 '20 at 18:34
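Putting the comment thread's fixes together: convert max_val with int() before comparing (int(max_val) > 50, not int(max_val > 50)), and define send_mail() before the line that calls it; in the posted snippet it is called first, which would fail with NameError once the comparison is fixed. A sketch assuming Gmail's SMTP server with STARTTLS on port 587, as in the comment. The build_alert helper, the subject and body text, and the addresses are hypothetical placeholders, and the actual send is left commented out so the example runs without credentials:

```python
import smtplib
from email.message import EmailMessage
from typing import Optional

THRESHOLD = 50  # alert level used in the comments


def build_alert(max_val: str, sender: str, recipient: str,
                threshold: int = THRESHOLD) -> Optional[EmailMessage]:
    """Return an alert email if max_val (a str from split) exceeds threshold."""
    value = int(max_val)  # convert once, before comparing with an int
    if value <= threshold:
        return None
    msg = EmailMessage()
    msg["Subject"] = "test"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"Downdetector max is {value} (threshold {threshold}).")
    return msg


def send_mail(msg: EmailMessage, user: str, password: str) -> None:
    """Send msg through smtp.gmail.com on port 587 using STARTTLS."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()  # upgrade the connection to TLS before logging in
        server.login(user, password)
        server.send_message(msg)


# send_mail() is defined above, so calling it here would not raise NameError.
alert = build_alert("64", "from@example.com", "to@example.com")
if alert is not None:
    print("would send:", alert["Subject"])
    # send_mail(alert, "email", "pw")  # uncomment with real credentials
```

Keeping the message construction separate from the SMTP call makes the threshold logic easy to check without actually sending mail.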