I'd like to know the number of trackers that uBlock Origin detects and blocks, using Python (running on a Linux server, so no GUI) and Selenium (with the Firefox driver). I don't necessarily need to actually block them, but I need to know how many there are.

uBlock Origin has a logger (https://github.com/gorhill/uBlock/wiki/The-logger#settings-dialog) which I'd like to scrape.

This logger is available through a URL like this: moz-extension://fc469b55-3182-4104-a95c-6b0b4f87cf0f/logger-ui.html#_ where fc469b55-3182-4104-a95c-6b0b4f87cf0f is the UUID of the uBlock Origin add-on.

In this logger, each entry is a div with its class set to "logEntry" (the yellow outline in the screenshot), and I'd like to get the data shown in the green outline.

So far, I have this:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options as FirefoxOptions

browser_options = FirefoxOptions()
browser_options.headless = True

# Activate the add-on
str_ublock_extension_path = "/usr/local/bin/uBlock0_1.45.3b10.firefox.signed.xpi"
browser = webdriver.Firefox(executable_path='/usr/loca/bin/geckodriver', options=browser_options)
str_id = browser.install_addon(str_ublock_extension_path)

# Get the UUID, which is new each time the script is launched
profile_path = browser.capabilities['moz:profile']
id_extension_firefox = "uBlock0@raymondhill.net"
with open('{}/prefs.js'.format(profile_path), 'r') as file_prefs:
    lines = file_prefs.readlines()
    for line in lines:
        if 'extensions.webextensions.uuids' in line:
            sublines = line.split(',')
            for subline in sublines:
                if id_extension_firefox in subline:
                    internal_uuid = subline.split(':')[1][2:38]

str_uoo_panel_url = "moz-extension://" + internal_uuid + "/logger-ui.html#_"
ubo_logger = browser.get(str_uoo_panel_url)
ubo_logger_log_entries = ubo_logger.find_element(By.CLASS_NAME, "logEntry")

for log_entrie in ubo_logger_log_entries:
    print(log_entrie.text)

Using this "weird" moz-extension:// URL seems to work, since print(browser.page_source) displays some relevant HTML.

Problem: ubo_logger.find_element(By.CLASS_NAME, "logEntry") returns nothing. What did I do wrong?

8oris

1 Answer

I found this to work:

parent = driver.find_element(by=By.XPATH, value='//*[@id="vwContent"]')
children = parent.find_elements(by=By.XPATH, value='./child::*')

for child in children:
    attributes = (child.find_element(by=By.XPATH, value='./child::*')).find_elements(by=By.XPATH, value='./child::*')
    print(attributes[4].text)

You could then also do:

if attributes[4].text.isdigit():
    result = int(attributes[4].text)

This converts the resulting text into an int.
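
Two things in the question's code are also worth noting: `browser.get()` returns `None`, so the lookups have to be made on `browser` itself, and `find_element` returns a single element, while `find_elements` returns a list you can loop over or count. The logger may also fill in its rows a moment after the page loads; that is an assumption on my part, I have not verified it for uBlock Origin. A minimal polling sketch under those assumptions (`wait_for_elements` is a hypothetical helper name, not part of Selenium):

```python
import time


def wait_for_elements(driver, by, value, timeout=10.0, poll=0.5):
    """Poll driver.find_elements(by, value) until it returns a non-empty list.

    Returns the list of matches, or an empty list once `timeout` seconds
    have passed. `driver` is any object exposing a Selenium-style
    find_elements(by, value) method.
    """
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(by, value)
        # Stop as soon as something matched, or when the time budget is spent.
        if elements or time.monotonic() >= deadline:
            return elements
        time.sleep(poll)
```

With the real driver this would be called as `entries = wait_for_elements(browser, By.CLASS_NAME, "logEntry")` after `browser.get(str_uoo_panel_url)`, and `len(entries)` then gives the number of rows. Selenium's own `WebDriverWait` can do the same job; the plain loop is only there to keep the sketch independent of the Selenium version.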

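Separately, the UUID lookup in the question slices the prefs.js line by character position, which is easy to break. A sketch that parses the value instead, assuming the `extensions.webextensions.uuids` preference is written as a quoted string containing a JSON object that maps add-on IDs to UUIDs (which is what the slicing in the question implies); `extract_extension_uuid` is a hypothetical helper name:

```python
import json
import re

# Matches: user_pref("extensions.webextensions.uuids", "<escaped JSON>");
UUIDS_PREF = re.compile(
    r'user_pref\("extensions\.webextensions\.uuids",\s*(".*")\);'
)


def extract_extension_uuid(prefs_text, extension_id):
    """Return the internal UUID for `extension_id`, or None if not found."""
    match = UUIDS_PREF.search(prefs_text)
    if match is None:
        return None
    # First loads() undoes the string-literal escaping, second parses the map.
    uuid_map = json.loads(json.loads(match.group(1)))
    return uuid_map.get(extension_id)
```

It would be used as `extract_extension_uuid(open(profile_path + "/prefs.js").read(), "uBlock0@raymondhill.net")`, and a `None` result makes a missing entry explicit instead of leaving `internal_uuid` undefined.
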
Teddy
  • Thanks Teddy. After running some tests, it appears that ```find_element(by=By.XPATH, value =``` is incorrect. It should be ```parent.find_elements_by_xpath('./child::*')``` – 8oris Jan 05 '23 at 10:38
  • Teddy's solution didn't work at all for me; I got lots of errors. First, you can't loop through children (see https://stackoverflow.com/questions/39356818/selenium-attributeerror-list-object-has-no-attribute-find-element-by-xpath). – 8oris Jan 05 '23 at 12:27
  • Well, that depends on your version: `find_elements(by=By.XPATH, value='./child::*')` for newer versions and `find_elements_by_xpath('./child::*')` for older versions. – Teddy Jan 05 '23 at 12:27
  • @8oris What kind of errors are you talking about? – Teddy Jan 05 '23 at 12:31
  • ```children = parent[0].find_elements_by_xpath('./child::*') ``` instead of ```children = parent.find_elements(by=By.XPATH, value='./child::*')``` – 8oris Jan 05 '23 at 12:37