
How can I use Python to scrape the data from this page, specifically from the charts? I've tried BeautifulSoup, but after inspecting the page's HTML, the data doesn't seem to be in any tag I can scrape.

I can't find the numbers shown in the charts in the response to my request, and I also couldn't find them when inspecting the HTML (see image below).

Input

from bs4 import BeautifulSoup
import requests
url = "https://viz.saude.gov.br/extensions/CobVac_MOV/CobVac_MOV.html"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html")
print(soup.prettify())

Output

<!DOCTYPE html>
<html>
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <title>
   MS-SUS COVID-19 Distribuição de Vacinas
  </title>
  <meta charset="utf-8"/>
  <meta content="True" name="HandheldFriendly"/>
  <meta content="320" name="MobileOptimized"/>
  <meta content="width=device-width, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0, user-scalable=no" name="viewport"/>
  <meta content="yes" name="apple-mobile-web-app-capable"/>
  <meta content="black" name="apple-mobile-web-app-status-bar-style"/>
  <meta content="on" http-equiv="cleartype"/>
  <!--Polymer stuff -->
  <script src="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/webcomponentsjs/webcomponents-lite.min.js">
  </script>
  <script src="https://kit.fontawesome.com/a076d05399.js">
  </script>
  <link href="qliksense-card.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/iron-flex-layout/iron-flex-layout-classes.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/paper-header-panel/paper-header-panel.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/paper-toolbar/paper-toolbar.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/paper-drawer-panel/paper-drawer-panel.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/paper-icon-button/paper-icon-button.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/iron-icons/iron-icons.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/iron-pages/iron-pages.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/paper-menu/paper-menu.html" rel="import"/>
  <link href="https://cdn.rawgit.com/download/polymer-cdn/1.7.0.2/lib/paper-item/paper-item.html" rel="import"/>
  <link href="polymer-mixins.html" rel="import"/>
  <style include="iron-flex iron-positioning" is="custom-style">
  </style>
  <style include="polymer-mixins" is="custom-style">
  </style>
  <!-- Bootstrap css -->
  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css" rel="stylesheet"/>
  <!-- Font Awesome -->
  <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
  <!-- Qlik -->
  <link href="../../resources/autogenerated/qlik-styles.css" rel="stylesheet"/>
  <script src="../../resources/assets/external/requirejs/require.js">
  </script>
  <!-- Bootstrap js -->
  <script crossorigin="anonymous" src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js">
  </script>
  <!-- google fonts -->
  <link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro" rel="stylesheet"/>
  <!-- Project code -->
  <link href="CobVac_MOV.css" rel="stylesheet"/>
  <script src="CobVac_MOV.js">
  </script>
  <!-- fontawesome -->
  <link href="https://maxcdn.bootstrapcdn.com/font-awesome/4.7.0/css/font-awesome.min.css" rel="stylesheet"/>
 </head>
 <body class="fullbleed vertical layout">
  <paper-drawer-panel disable-edge-swipe="true" force-narrow="true" right-drawer="" z-index="1000">
   <!-- FILTROS INI ============================================================ -->
   <div drawer="">
    <div class="drawer-title">
     Filtros
    </div>
    <div class="filter-container">
     <div class="qvobject" id="qvfilters">
     </div>
    </div>
   </div>
   <!-- FILTROS FIM ============================================================ -->
   <!-- PAGINA INI ============================================================ -->
   <paper-header-panel main="">
    <!-- HEADER INI ============================================================ -->
    <div class="paper-header">
     <paper-toolbar style="background-color: #306BBC; color: #ffffff;">
      <paper-icon-button class="visible-xs-block" icon="menu" id="nav-menu-button">
      </paper-icon-button>
      <img src="LOGO_TOPO.png" style="height:33px; width:161px;"/>
      <div class="title" style="font-size:18px;">
       <b>
        COVID-19 Vacinação
        <br/>
        Distribuição de Vacinas
       </b>
      </div>
      <!--TITLE-->
      <paper-icon-button class="filter-drawer-toggle" icon="search" paper-drawer-toggle="">
      </paper-icon-button>
      <paper-icon-button class="filter-drawer-toggle" data-target="#basic" data-toggle="modal" icon="help">
      </paper-icon-button>
     </paper-toolbar>
     <!-- BARRA DE FILTROS =================== more-vert -->
     <div class="qvobjects" id="CurrentSelections" style="position:relative; top:0; left:0; width:100%; height:38px;">
     </div>
    </div>
    <!-- HEADER FIM ============================================================ -->
    <!-- PAGINA UTIL INI ============================================================ -->
    <paper-drawer-panel drawer-width="0px" id="nav-drawer">
     <!-- PAGINAS ## INI ============================================================ -->
     <iron-pages main="" selected="0" style="background-color:#eee;">
      <!-- Each .paper-body contained within <iron-pages> is a view. Copy and paste to add more views. -->
      <!-- Don't forget to add a <paper-item> in the <paper-menu> above to be able to navigate to any view you add -->
      <!-- ========================== -->
      <!-- PAGINA 0 -->
      <!-- ========================== -->
      <div class="paper-body">
       <div class="container-fluid">
        <!-- A .qvplaceholder will become a droppable area in the dev-hub -->
        <!-- Each .qvplaceholder must have a unique id -->
        <!-- These .qvplaceholder objects below have an extra class, .kpi, which applies some simple styles intended for kpi objects -->
        <!--
                            <div class="row">
                            <p style="color:red">
                             <b>IMPORTANTE: As informações mostradas neste painel referem-se apenas às doses enviadas a partir do Ministério da Saúde.</b>
                            </p>
                            </div>
-->
        <!--<div class="row">
                             <b>DOSES ENVIADAS PELO MINISTÉRIO DA SAÚDE AOS ESTADOS</b>
                            </div>-->
        <!-- =================== -->
        <!-- KPIS -->
        <!-- =================== -->
        <!-- ====================================================== -->
        <!-- Ver os icones em https://fontawesome.com/v4.7.0/icons/ -->
        <!-- ====================================================== -->
        <div class="row kpi-row">
         <div class="col-xs-12 col-sm-12 col-lg-7">
          <div class="kpi-side">
           <i class="fas fa-syringe">
           </i>
          </div>
          <div class="kpi corkpi01 qvobject" id="KPI-01">
          </div>
         </div>
         <!--<div class="col-xs-12 col-sm-6  col-lg-3">
                                    <div class="kpi-side"><i class="fas fa-syringe"></i></div>
                                    <div class="kpi corkpi02 qvobject" id="KPI-02"></div>
                                </div>
                                <div class="col-xs-12 col-sm-6 col-lg-4">
                                    <div class="kpi-side"><i class="fas fa-syringe"></i></div>
                                    <div class="kpi corkpi03 qvobject" id="KPI-03"></div>
                                </div>
                                <div class="col-xs-12 col-sm-6 col-lg-8">
                                    <div class="kpi-side"><i class="fas fa-syringe"></i></div>
                                    <div class="kpi corkpi04 qvobject" id="KPI-04"></div>
                                </div>-->
        </div>
        <div class="row">
         <p>
          <b>
           <a href="https://sage.saude.gov.br/sistemas/vacina/documentosVacina.php">
            Acesse aqui
           </a>
           os arquivos com os comprovantes de recebimento pelos Estados.
           <br/>
          </b>
          <!--<br>
                             Esclarecimento: Doses em Trânsito são aquelas que estão sendo enviadas pelos Estados aos seus Municípios.
                             </p>-->
         </p>
        </div>
        <!-- =================== -->
        <!-- GRAFICOS 0.1  UF, MAPA  -->
        <!-- =================== -->
        <div class="row">
         <div class="col-xs-12 col-sm-8">
          <!-- Placing a .qvplaceholder within a <qliksense-card> will create a cardified object -->
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G01A">
           </div>
          </qliksense-card>
         </div>
         <div class="col-xs-12 col-sm-4">
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G01B">
           </div>
          </qliksense-card>
         </div>
        </div>
        <!-- =================== -->
        <!-- GRAFICOS 0.2 VACINA, TEMPO -->
        <!-- =================== -->
        <div class="row">
         <div class="col-xs-12 col-sm-6">
          <!-- Placing a .qvplaceholder within a <qliksense-card> will create a cardified object -->
          <qliksense-card content-height="400px">
           <div class="with-title qvobject" id="QV1-G02A">
           </div>
          </qliksense-card>
         </div>
         <div class="col-xs-12 col-sm-6">
          <qliksense-card content-height="400px">
           <div class="with-title qvobject" id="QV1-G02B">
           </div>
          </qliksense-card>
         </div>
        </div>
        <!-- =================== -->
        <!-- GRAFICOS 0.3 TABELA_UF PERCENTUAL_REPASSE -->
        <!-- =================== -->
        <div class="row">
         <div class="col-xs-12 col-sm-5">
          <!-- Placing a .qvplaceholder within a <qliksense-card> will create a cardified object -->
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G03A">
           </div>
          </qliksense-card>
         </div>
         <div class="col-xs-12 col-sm-7">
          <!-- Placing a .qvplaceholder within a <qliksense-card> will create a cardified object -->
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G03B">
           </div>
          </qliksense-card>
         </div>
        </div>
        <!-- ================================================================================================================== -->
        <!--<div class="row">
                             <b>DOSES REPASSADAS PELOS ESTADOS AOS MUNICÍPIOS</b>
                            </div>-->
        <!-- =================== -->
        <!-- KPIS -->
        <!-- =================== -->
        <!-- ====================================================== -->
        <!-- Ver os icones em https://fontawesome.com/v4.7.0/icons/ -->
        <!-- ====================================================== -->
        <div class="row kpi-row">
         <!--<div class="col-xs-12 col-sm-12 col-lg-3">
                                    <div class="kpi-side"><i class="fas fa-syringe"></i></div>
                                    <div class="kpi corkpi01 qvobject" id="KPI-01B"></div>
                                </div>
                                <div class="col-xs-12 col-sm-6  col-lg-3">
                                    <div class="kpi-side"><i class="fas fa-syringe"></i></div>
                                    <div class="kpi corkpi02 qvobject" id="KPI-02B"></div>
                                </div>
                                <div class="col-xs-12 col-sm-6 col-lg-3">
                                    <div class="kpi-side"><i class="fas fa-syringe"></i></div>
                                    <div class="kpi corkpi03 qvobject" id="KPI-03B"></div>
                                </div>-->
         <div class="col-xs-12 col-sm-6 col-lg-6">
          <div class="kpi-side">
           <i class="fas fa-syringe">
           </i>
          </div>
          <div class="kpi corkpi04 qvobject" id="KPI-04B">
          </div>
         </div>
        </div>
        <!--<div class="row">
                             <b>Esclarecimento: Doses em Trânsito são aquelas que estão sendo enviadas pelos Estados aos seus Municípios.</b>
                            </div>-->
        <!-- =================== -->
        <!-- MAPAS 0.4   MN RELOGIO -->
        <!-- =================== -->
        <div class="row">
         <div class="col-xs-12 col-sm-8">
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G04A">
           </div>
          </qliksense-card>
         </div>
         <div class="col-xs-12 col-sm-4">
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G04B">
           </div>
          </qliksense-card>
         </div>
        </div>
        <!-- =================== -->
        <!-- GRAFICOS 0.5 VACINA TEMPO   -->
        <!-- =================== -->
        <div class="row">
         <div class="col-xs-12 col-sm-6">
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G05A">
           </div>
          </qliksense-card>
         </div>
         <div class="col-xs-12 col-sm-6">
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G05B">
           </div>
          </qliksense-card>
         </div>
        </div>
        <!-- =================== -->
        <!-- GRAFICOS 0.5 TABELA   -->
        <!-- =================== -->
        <div class="row">
         <div class="col-xs-12 col-sm-6">
          <qliksense-card content-height="300px">
           <div class="with-title qvobject" id="QV1-G06A">
           </div>
          </qliksense-card>
         </div>
        </div>
        <!-- ====================================================== -->
        <!-- EXPORT -->
        <!-- ====================================================== -->
        <div class="row kpi-row">
         <div class="col-xs-12 col-sm-12 col-md-4">
          <div class="kpi white-2 qvobject" id="TXT-Origem" style="box-shadow:none">
          </div>
         </div>
         <div class="col-xs-12 col-sm-6 col-md-4">
          <div class="kpi white-2 qvobject" id="TXT-DTATU" style="box-shadow:none">
          </div>
         </div>
         <div class="col-xs-12 col-sm-6 col-md-4">
          <div class="kpi white-2 qvplaceholder" id="BT-EXPO" style="box-shadow:none">
          </div>
         </div>
        </div>
       </div>
      </div>
     </iron-pages>
     <!-- PAGINAS ## FIM ============================================================ -->
    </paper-drawer-panel>
    <!-- PAGINA UTIL FIM ============================================================ -->
   </paper-header-panel>
   <!-- PAGINA FIM ============================================================ -->
  </paper-drawer-panel>
  <!-- MODAL HELP INI ============================================================ -->
  <!-- Modal -->
  <div aria-hidden="true" class="modal fade" id="basic" role="basic" tabindex="-1">
   <div class="modal-dialog">
    <div class="modal-content">
     <div class="modal-header">
      <button aria-hidden="true" class="close" data-dismiss="modal" type="button">
      </button>
      <h4 class="modal-title">
       SOBRE ESTE PAINEL
      </h4>
     </div>
     <div class="modal-body">
      <p>
       Este painel apresenta informações sobre a distribuição de Vacinas contra a Covid-19, a partir do Ministério da Saúde.
       <br/>
       <br/>
       A fonte dos dados é a Secretaria de Vigilância Sanitária (SVS).
       <br/>
       <br/>
       Informações adicionais podem ser encontradas no site do
       <a href="https://saude.gov.br/">
        Ministério da Saúde
       </a>
       .
       <br/>
       ___________________________
       <br/>
       <br/>
       <img src="UsoPainel.png" style="width:565px;"/>
      </p>
     </div>
     <div class="modal-footer">
      <button class="btn dark btn-outline" data-dismiss="modal" type="button">
       Close
      </button>
     </div>
    </div>
    <!-- /.modal-content -->
   </div>
   <!-- /.modal-dialog -->
  </div>
  <!-- End Modal -->
  <!-- MODAL HELP FIM ============================================================ -->
  <div class="footer" style="z-index: 20000; height:34px; background-color:#ccc;">
   <div style="position:absolute; height:25px; top:10px; left:10px; text-align:left; color:#333;">
    Versão Beta - Maiores informações no site do
    <a href="https://saude.gov.br/">
     Ministério da Saúde
    </a>
   </div>
   <img src="LOGO_BASE.png" style="position:absolute; height:30px; width:145px; bottom:2px; right:10px;"/>
  </div>
  <script>
   var root = this.root;
        $(document).ready(function() {
            $("#nav-drawer paper-menu paper-item").click(function() {
                var index = $(this).index();
                Polymer.dom(root).querySelector("iron-pages").selectIndex(index);
            });
            $("#nav-menu-button").click(function() {
                Polymer.dom(root).querySelector("#nav-drawer").togglePanel();
            });
            $(window).resize(function() {
                Polymer.updateStyles();
            });
        });
  </script>
 </body>
</html>

enter image description here

If I search for that div element, I won't get the desired data.

What I need is a dictionary like this:

{"MG": 655588, "RJ":758120, ...}

The values in my example may change when the dashboard is updated.

How could I extract data from those charts, since they are not in any HTML tags?

Henrique Branco

1 Answer


Approach:

One way is to use Selenium to collect the values for the chart you indicate. Navigate to the page, scroll down to the table that lists the same values as the chart of interest, click the square icon at the top right of the card to expand it, and then, with the window maximized, grab the list of elements and build your dictionary.

There's some faff in locating the right block: the XPath finds the right text in a title attribute, then moves up to a level where the expand icon becomes available to click.

//paper-card[*//h1[@title='Resumo das Doses Enviadas aos Estados']]//*[@id='icon']

I would welcome input from anyone who can find a way to knock out the hardcoded time.sleep(4).
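One possible way to drop the fixed sleep, offered only as a sketch and not tested against this dashboard, is to poll the cell texts until two consecutive reads return the same non-empty list. The helper name `wait_until_stable` and its parameters are hypothetical; it uses only the standard library and takes any zero-argument callable.

```python
import time


def wait_until_stable(read, timeout=10.0, interval=0.5,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll read() until it returns the same non-empty value twice in a row.

    `read` is any zero-argument callable (hypothetical helper, not part of
    Selenium). Raises TimeoutError if the value never settles in time.
    """
    deadline = clock() + timeout
    previous = None
    while clock() < deadline:
        current = read()
        # Accept the value only when it is non-empty and unchanged
        # since the previous poll.
        if current and current == previous:
            return current
        previous = current
        sleep(interval)
    raise TimeoutError("value did not stabilise within %.1f s" % timeout)


# Assumed usage with the Selenium driver `d` and the cell XPath from the
# script below (stale-element errors during re-rendering are not handled):
# cells = wait_until_stable(
#     lambda: [e.text for e in d.find_elements(By.XPATH, cell_xpath) if e.text]
# )
```

Whether "two identical reads" is a reliable signal that the expanded table has finished rendering is an assumption; a longer `interval` makes it more conservative.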


Py:

import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

d = webdriver.Chrome()
d.maximize_window()
d.get('https://viz.saude.gov.br/extensions/CobVac_MOV/CobVac_MOV.html')
# Wait for the card title to appear, then scroll it into view
target = WebDriverWait(d, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[title="Resumo das Doses Enviadas aos Estados"]')))
coord = target.location_once_scrolled_into_view
# Click the card's expand icon
WebDriverWait(d, 10).until(EC.element_to_be_clickable((By.XPATH, "//paper-card[*//h1[@title='Resumo das Doses Enviadas aos Estados']]//*[@id='icon']"))).click()
# Hardcoded pause while the expanded table renders
time.sleep(4)
# Collect the non-empty text of every table cell inside the card
all_elements = [i.text for i in
                WebDriverWait(d, 5).until(EC.presence_of_all_elements_located((By.XPATH, "//paper-card[*//h1[@title='Resumo das Doses Enviadas aos Estados']]//*[@class='qv-st-value-overflow']/span[@ng-bind='cell.text']")))
                if i.text]

# Pair even-indexed cells (labels) with odd-indexed cells (values); the final label is skipped
results = dict(zip(all_elements[0::2][:-1], all_elements[1::2]))
print(results)
d.quit()

Output:

enter image description here
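The scraped cell texts are strings, while the question asks for integer values such as `{"MG": 655588, ...}`. A minimal sketch of the conversion follows, assuming the dashboard shows whole numbers with `.` (or `,`) as a thousands separator; the helper names `parse_dose_count` and `to_int_dict` are hypothetical, and the sample values are the ones from the question.

```python
def parse_dose_count(text):
    """Convert a string such as '655.588' to the integer 655588.

    Assumes the value is a whole number and that '.' and ',' only ever
    appear as thousands separators (an assumption about the dashboard).
    """
    return int(text.strip().replace(".", "").replace(",", ""))


def to_int_dict(results):
    """Return a copy of `results` with every value parsed to an int."""
    return {state: parse_dose_count(value) for state, value in results.items()}


# Sample input shaped like the `results` dict built by the script above
sample = {"MG": "655.588", "RJ": "758.120"}
print(to_int_dict(sample))
```

If the table ever contains decimals or non-numeric cells, `int()` raises `ValueError`, which at least makes the mismatch visible rather than silently producing wrong numbers.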


References: I read about specifying parent with child in xpath from @lavinio https://stackoverflow.com/a/1457668

QHarr