
I'm running Selenium with Firefox in Python, and I'm trying to match elements on a page against keywords in a list.

For the element lookup to succeed, I need to get rid of special characters such as ® and ™ on the web page. Unfortunately, I can't predict where such characters are used, so I can't account for them on the "keyword end" of the problem.

I don't think Selenium or Firefox itself can remove unwanted characters from a web page, but my idea was to have Selenium execute some JavaScript on the page to remove those characters. Is that possible?

Something like this (presumably non-working) pseudo-code:

driver.execute_script("document.body.innerHTML.replace(/®/g, '');")

The replacement should happen before the driver "reads" the page and calls find_element.

FYI, the characters I want to get rid of are in the text nodes of <a> elements inside <td> cells throughout the document body.
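A minimal sketch of the idea, assuming the marks to strip are ® (U+00AE) and ™ (U+2122) and that they sit in text nodes of `<a>` elements inside `<td>` cells, as described above. Instead of rewriting `innerHTML`, the JavaScript edits only those text nodes in place, so the rest of the DOM is left untouched. The names `STRIP_MARKS_JS` and `strip_marks_in_table_links` are hypothetical; `driver` is any Selenium WebDriver instance.

```python
# Hypothetical helper: runs JavaScript in the page (via Selenium's
# execute_script) that deletes ® and ™ from the text of links in table cells.
STRIP_MARKS_JS = r"""
document.querySelectorAll('td a').forEach(function (link) {
  // Visit only the text nodes under each <a>, leaving child elements intact.
  var walker = document.createTreeWalker(link, NodeFilter.SHOW_TEXT);
  var node;
  while ((node = walker.nextNode())) {
    // The g flag removes every occurrence, not just the first one.
    node.nodeValue = node.nodeValue.replace(/[\u00AE\u2122]/g, '');
  }
});
"""


def strip_marks_in_table_links(driver):
    """Clean the link text in the live page; call before find_element."""
    driver.execute_script(STRIP_MARKS_JS)
```

Call it once the page has loaded and before the `find_element` lookups; the keywords can then be matched against the cleaned link text.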

P A N

1 Answer


ASCII covers character codes 0 to 127, so you can remove every character outside that range:

document.body.innerHTML.replace(/[^\x00-\x7F]/g, '');
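To see what that character class matches, here is the same pattern in Python's `re` module, handy for checking strings locally (`strip_non_ascii` is a hypothetical name). Note that the range filter is broad: besides ® and ™ it also deletes accented letters such as é, which may break matches against keywords that contain them.

```python
import re

# Same character class as the JavaScript /[^\x00-\x7F]/g:
# any character whose code point is above 127 (outside 7-bit ASCII).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def strip_non_ascii(text: str) -> str:
    """Delete every non-ASCII character; re.sub replaces all matches."""
    return NON_ASCII.sub("", text)


print(strip_non_ascii("Acme\u00ae Widget\u2122"))  # Acme Widget
print(strip_non_ascii("Caf\u00e9"))                # Caf (the accented e is removed too)
```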

If you want to remove only ®, you can do it this way:

document.body.innerHTML.replace(/®/g, '');
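If a few other symbols can show up as well, a character class limited to exactly those symbols avoids the side effect of the ASCII filter. A Python sketch, assuming ®, ™ and © are the only marks to drop (that set is an assumption; extend it as needed). Python's `re.sub` replaces every match by default, which is what the `g` flag does in JavaScript; without `g`, JavaScript's `replace` stops after the first match.

```python
import re

# Hypothetical set of marks to drop: ® (U+00AE), ™ (U+2122), © (U+00A9).
MARKS = re.compile("[\u00ae\u2122\u00a9]")


def strip_marks(text: str) -> str:
    """Remove only the listed symbols; accented letters are kept."""
    return MARKS.sub("", text)


print(strip_marks("Caf\u00e9\u00ae Cr\u00e8me\u2122"))  # Café Crème
```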
  • This worked for me: `driver.execute_script("var replaced = $('body').html().replace(/(®)/g,''); $('body').html(replaced);")`, from this [thread](https://stackoverflow.com/a/10550100/4909923) with your help. – P A N Sep 24 '17 at 12:21
  • I think this is a much shorter and easier-to-read way: `driver.execute_script("document.body.innerHTML.replace(/(®)/, '');")` – Mateusz Ostaszewski Sep 24 '17 at 12:23
  • I agree with you there. But when running that command in the Chrome Javascript console, it didn't re-render the page (although I could see HTML output in the response console). Maybe I'm missing something? – P A N Sep 24 '17 at 12:26
  • Please try this code; I am sure it works: `page_source = driver.execute_script("return document.body.innerHTML.replace(/(®)/g, '');")` Now `page_source` contains the modified page source. – Mateusz Ostaszewski Sep 24 '17 at 12:49
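What separates the working and non-working variants in these comments is that JavaScript's `String.prototype.replace` returns a new string and never modifies the original, so `document.body.innerHTML.replace(...)` on its own changes nothing on the page. A sketch of the two variants that do something, using hypothetical helper names: assign the result back to `innerHTML` (as the jQuery version does), or `return` it to Python. Assigning to `innerHTML` re-creates every element in the body, so any `WebElement` located earlier becomes stale and listeners added with `addEventListener` are lost; run it before any `find_element` calls.

```python
# Pattern shared by both helpers: ® (U+00AE) and ™ (U+2122), all occurrences.
PATTERN = r"/[\u00AE\u2122]/g"

# Variant 1: write the cleaned markup back, so the live DOM changes.
REWRITE_BODY_JS = (
    "document.body.innerHTML = "
    "document.body.innerHTML.replace(" + PATTERN + ", '');"
)

# Variant 2: leave the page alone and hand the cleaned markup to Python.
RETURN_CLEAN_HTML_JS = (
    "return document.body.innerHTML.replace(" + PATTERN + ", '');"
)


def clean_page_in_place(driver):
    """Rewrite <body>; elements located before this call become stale."""
    driver.execute_script(REWRITE_BODY_JS)


def get_clean_body_html(driver):
    """Return the body's HTML with the marks removed (a plain str)."""
    return driver.execute_script(RETURN_CLEAN_HTML_JS)
```

Variant 2 only gives you a string, so `find_element` still sees the original page; it suits parsing the HTML yourself, while variant 1 is the one that makes Selenium's own lookups see the cleaned text.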