Chrome extension, making a link from key words in the body

Question

So that you understand my background: I am a computer engineering major working a summer job at a medical company. I have little (almost zero) web coding experience, but that is mostly what my job wants me to do, so I have been trying to figure it all out as fast as I can. I have used a lot of C, Verilog, and C++ in school, so computer "languages" are not new, but I am having a hard time figuring all this out.

Anyway, my first assignment is to build a Chrome extension that links into our Asterisk phone server. All it has to do is find phone numbers on a webpage and turn them into hyperlinks. Each hyperlink will be based on the phone number clicked; that part is trivial.

So, I read the W3 Schools material on HTML, JS, Ajax, jQuery, the DOM, etc., so in the past 3 days I have learned a lot =)

This is what I produced:

It didn't seem like I needed a "background.html" in my case, because all I need to do is run a JS file once the page loads to find the phone numbers and turn them into links.

So I wrote a single manifest file and a JS file that searches the body for a number and puts an <a> tag around it (currently pointing to www.google.com).

The good news is that it seems to work.

The bad news is that it makes Gmail freeze while loading, and leaves Hotmail unable to connect, update, or show new messages.

I didn't think you were even able to "break" the website in that way while making an extension.

All of my code is very small so I am just going to post it here.

manifest.json

{
  "name": "Typenex Hyperlink-Dialer",

  "version": "1.0",
  "description": "This is a custom built extension for Typenex. This extension identifies phone numbers and allows the user to click the number to initiate a phonecall.",
  "permissions": [
    "tabs", "http://*/*", "https://*/*"
  ],

  "browser_action": {
      "default_title": "Typenex Hyperlink-Dialer",

      "default_icon": "typenex_logo.png"
  },

  "content_scripts" : [
    {
      "matches" : ["http://*/*", "https://*/*"],
      "js" : ["typenex_contentscript.js"],
      "run_at" : "document_idle",
      "all_frames" : false
    }
  ],

  "manifest_version": 2
}

typenex_contentscript.js

var arrayOfNumbers = [];
alert("hi");
var regex =  /\d*[/-]*[0-9][0-9][0-9][/ -]*[0-9][0-9][0-9][/ -]*[0-9][0-9][0-9][0-9][ ]*/g;
newBody = document.body.innerHTML;
var i = 0;
do
{
    temp = regex.exec(newBody);
    if (temp != null)
        arrayOfNumbers[i] = temp;
    i++
}
while (temp)
for (var i = 0; i < arrayOfNumbers.length; i++)
{
    newBody = newBody.replace(arrayOfNumbers[i], "<a href='http://www.google.com'>" + arrayOfNumbers[i] + "</a>");
}
document.body.innerHTML = newBody;

I am grateful for any help I can get. If it seems like I am misunderstanding something and you know of something I can read that would help, that would be great. I have been Googling a lot, but I might not know enough to even be asking the right question.

I am very open minded if any of you have a better method to tackle this simple extension =)

Just to clarify, it works fine for you on test sites, or sites other than gmail and hotmail? — DigTheDoug, Jul 27 '12 at 23:12
Yes, in my test sites it works fine, but for sites like gmail and Hotmail it doesn't seem to work because it makes the site load endlessly. In Hotmail if I disable the extension then open an email with a number in it, enable the extension and refresh, it works, but Hotmail notifies me that it has lost connection to the Hotmail servers. I have had a few issues with it on all Google sites really. — njfife, Jul 28 '12 at 01:15
In your code you're replacing the entire innerHTML at the end, I wonder if re-creating the DOM is forcing it to reestablish connections with old keys or signatures or something to that effect. Have you tried simply updating the number elements themselves in place, rather than rewriting the entire page? — DigTheDoug, Jul 28 '12 at 03:31
It's a **terrible** practice to replace the body using `.innerHTML`. All dynamically bound properties and events are lost in this way. Furthermore, it also breaks when the HTML markup itself contains the numbers. The correct way to do this is by looping through all DOM nodes and replacing **text nodes** only. To learn more about DOM and JS, see https://developer.mozilla.org/en/DOM/. To learn more about JS/HTML/CSS, see https://developer.mozilla.org/. Also, never visit w3schools again, for the reasons stated here: http://w3fools.com/ — Rob W, Jul 28 '12 at 13:11
That makes sense, I thought about replacing the whole body and if that is causing issues, I guess it is. I will try to loop through the actual DOM nodes and make it work that way. My only question is this: It IS possible to have both text AND more children in a node like body right? So what if the phone-number was in the text in Body which also had children? That is the whole reason I did the full page replace thing. Is there a way to identify that a node has text and children? — njfife, Jul 28 '12 at 13:48
Also, thanks for the new resources Rob. Like I said, web coding is a new world to me, and it is easy to get the wrong info. I just assumed W3schools would be the best source because of its name. — njfife, Jul 28 '12 at 13:56
Okay, I figured it out, I was thinking about text nodes very wrong, thanks for your help, I am sure I will have this working correctly pretty soon now. — njfife, Jul 28 '12 at 14:33
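
Rob W's second point, that a whole-body `innerHTML` replace also breaks when the markup itself contains the digits, can be reproduced with plain strings. This is only a minimal sketch: the HTML snippet is made up and the phone pattern is simplified, so neither comes from the original post.

```javascript
// Hypothetical markup: the number appears both in an attribute and in visible text.
const html = '<img alt="Call 555-123-4567" src="phone.png"> Call 555-123-4567';

// Simplified pattern for illustration only; not the regex from the question.
const phone = /\d{3}-\d{3}-\d{4}/g;

// Treating the page as one string rewrites every match, including the one inside alt="...".
const replaced = html.replace(
  phone,
  (m) => "<a href='http://www.google.com'>" + m + "</a>"
);

// The <a> tag now sits inside the alt attribute value, so the markup is corrupted.
const corruptedAttribute = replaced.includes('alt="Call <a href=');
console.log(replaced);
```

Working on text nodes avoids this, because attribute values are never text nodes.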

PAEz · Accepted Answer · 2013-01-09T09:07:20.573

I have wondered a few times what the best way to get the text nodes is, and had been meaning to look at TreeWalker, so this time I did. Below is the test page I made. I can't say whether this is the best way, but it may suit your needs.

treewalker.html

<html>
  <head>
    <style>
    </style>
    <script src="treewalker.js"></script>
  </head>
  <body>
    <div>This is a div</div>
    <div><div id='testevent'>Test event</div>This is a div 000-000-0000</div>
    <div>This is a div 000-000-0000</div>
     <div>This is<a href='sf'>bleh 000-000-0000 a div</a></div>
  </body>
</html>

treewalker.js

function onLoad() {

  document.querySelector('#testevent').onclick = function() {
    alert('clicked')
  };

  // Here starts the bit for your content script
  var re = /(\d*[/-]*[0-9][0-9][0-9][/ -]*[0-9][0-9][0-9][/ -]*[0-9][0-9][0-9][0-9][ ]*)/g;
  var regs;

  var walker = document.createTreeWalker(
  document.body, NodeFilter.SHOW_TEXT, function(node) {
    // re is global, so reset its position before testing each new text node
    re.lastIndex = 0;
    if((regs = re.exec(node.textContent))) {
      // make sure the text node's parent doesn't have the class we add to mark it as already highlighted
      if(!node.parentNode.classList.contains('highlighted_text')) {
        var match = document.createElement('A');
        match.appendChild(document.createTextNode(regs[0]));
        match.href = 'http://www.google.com';

        // add an attribute so we know this element is one we added
        // I'm using a class so you can target it with CSS easily
        match.classList.add('highlighted_text');

        var after = node.splitText(regs.index);
        after.nodeValue = after.nodeValue.substring(regs[0].length);
        node.parentNode.insertBefore(match, after);
      }
    }
    return NodeFilter.FILTER_SKIP;
  }, false);

  // Make the walker step through the nodes
  walker.nextNode();

  // and it ends here
}

(function() {
  document.addEventListener("DOMContentLoaded", onLoad);
})();

Code Stolen From

http://paul.kinlan.me/dom-treewalker/
That's where I got the TreeWalker code from. The problem with his sample is that it wraps the match using innerHTML on the parent (a lot of the examples do), which kills the click event in the test page.

http://www.the-art-of-web.com/javascript/search-highlight/
This showed how to split the text node properly. For all I know it is a better way of doing this, but I was interested in the TreeWalker way.

EDIT
I just updated it because the old version (click the Edited link below to see it) failed on the HTML in this new version. For some reason that I really don't understand, it wouldn't wrap the second number. This new version doesn't work the way all the examples I saw did and seems an abusive way to use TreeWalker... but it works!
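
For comparison, a more conventional arrangement collects the text nodes first and only then modifies the DOM, so the tree is never changed while the walker is still moving through it. This is a sketch and not PAEz's code: `splitByPhoneNumbers` and `linkifyPhoneNumbers` are hypothetical names, the pattern is simplified, and the link target is the same placeholder used above.

```javascript
// Simplified illustrative pattern; swap in the regex from the answer if preferred.
const PHONE_PATTERN = /\d{3}[ -]\d{3}[ -]\d{4}/g;

// Pure helper: split a string into plain-text and phone-number segments.
function splitByPhoneNumbers(text) {
  const segments = [];
  let last = 0;
  // matchAll works on a copy of the regex, so no lastIndex state leaks between calls.
  for (const m of text.matchAll(PHONE_PATTERN)) {
    if (m.index > last) segments.push({ text: text.slice(last, m.index), isPhone: false });
    segments.push({ text: m[0], isPhone: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) segments.push({ text: text.slice(last), isPhone: false });
  return segments;
}

// DOM part (browser only): replace each matching text node with text and <a> pieces.
function linkifyPhoneNumbers(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  for (const node of textNodes) {
    // Skip text that is already inside a link created on an earlier run.
    if (node.parentNode.classList.contains('highlighted_text')) continue;
    const segments = splitByPhoneNumbers(node.nodeValue);
    if (!segments.some((s) => s.isPhone)) continue;

    const fragment = document.createDocumentFragment();
    for (const s of segments) {
      if (s.isPhone) {
        const a = document.createElement('a');
        a.href = 'http://www.google.com'; // placeholder target, as in the answer
        a.className = 'highlighted_text';
        a.textContent = s.text;
        fragment.appendChild(a);
      } else {
        fragment.appendChild(document.createTextNode(s.text));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}
```

In a content script this would presumably be called as `linkifyPhoneNumbers(document.body)`. Only the matching text nodes are replaced, so event handlers on surrounding elements stay attached.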

PAEz, I thought I had this all worked out, but I ran into so many issues. Thanks for this. It looks like (from a first look at how you are handling this) this is exactly what I need. I was having issues where I couldn't replace the text and have the browser treat the new tags as part of the DOM. Interesting stuff, this is not what I expected at all out of this! — njfife, Aug 01 '12 at 02:34
@PAEz What if I want to filter out nodes that I have already updated or replaced? For example, if your script gets executed more than once, it keeps on replacing the text/number. Any tips on avoiding that? — Nikhil Bhandari, Jan 08 '13 at 18:39
@NikhilBhandari Yep. We can just add an attribute to the node when we create it and then ignore any node that has that attribute. Check the answer, I've updated the source for you. — PAEz, Jan 09 '13 at 09:06
I am running into this problem myself with a Chrome extension I am writing. Does this handle dynamic DOM additions? If something new is loaded over Ajax, like Gmail does, will this solve the problem? — Victor 'Chris' Cabral, Aug 11 '13 at 23:13
@Victor'Chris'Cabral No, it won't. You'll either need to incorporate Mutation Observers or make something specific if it's for a specific site. I've only used Mutation Observers a couple of times, and at first it all seems very confusing and long-winded, but you'll get the hang of it and then they're quite cool. — PAEz, Aug 12 '13 at 15:50
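
A rough sketch of the Mutation Observer approach PAEz mentions, for pages that add content after load. It assumes some linkifying function already exists (called `linkify` here, a hypothetical name) and that it skips links it has already created, as the `highlighted_text` check in the answer does. Without that check the observer would keep reacting to its own insertions.

```javascript
// Pure helper: pull the newly added element nodes out of a batch of mutation records.
function addedElements(mutations) {
  const elements = [];
  for (const mutation of mutations) {
    for (const node of mutation.addedNodes) {
      // nodeType 1 is ELEMENT_NODE; bare text and comment nodes are ignored in this sketch.
      if (node.nodeType === 1) elements.push(node);
    }
  }
  return elements;
}

// Browser only: run `linkify` (hypothetical) on every element subtree added under `root`.
function watchForNewContent(root, linkify) {
  const observer = new MutationObserver((mutations) => {
    for (const element of addedElements(mutations)) linkify(element);
  });
  // childList + subtree reports nodes added anywhere below root.
  observer.observe(root, { childList: true, subtree: true });
  return observer; // call observer.disconnect() to stop watching
}
```

A content script might call `watchForNewContent(document.body, someLinkifyFunction)` after its initial pass. On a busy page such as a webmail client the callback can fire very often, so batching or debouncing the work may be worth considering.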
