Is there a name for this browser technique?

Question

Is there a name for the technique of exploring a page open in the browser to find specific content and modify it?

Some examples:

Skype finds phone numbers on a page, and attaches a call menu
a script finds percentages in a page and replaces them with a small pie
an advertising engine finds keywords in the page and converts them into hyperlinks
add an icon next to all the hyperlinks on the page that point to another domain
etc.

I understand that it is a kind of progressive enhancement. But I am specifically interested in the first step, the content discovery process. I'd be interested in articles that offer best practices, or explain the shortcomings of this technique.

Edit: I added an example to show that this technique is not just for text nodes, but can apply to any kind of html content.

Sime Vidas: sure, I do this all the time. But this doesn't tell me much about best practices and shortcomings! — Christophe, Dec 07 '11 at 19:51
@Christophe The DOM traversal API is implemented in all browsers. It's fast and straightforward. This also goes for string manipulation. I can't think of any shortcomings. — Šime Vidas, Dec 07 '11 at 20:00
An example of an issue I'm facing: content that is added asynchronously to the DOM. — Christophe, Dec 07 '11 at 20:10
@Christophe The last example: 1. get all anchors on the page, 2. for each anchor, analyze its `href` property, 3. add a CSS class to those anchors that have a foreign domain. This is pretty straightforward, I don't see what sort of best practices you're after.. — Šime Vidas, Dec 07 '11 at 20:40
@Christophe So you would like to be notified whenever a new anchor is added to the DOM, so that you can conditionally show an icon next to it? — Šime Vidas, Dec 07 '11 at 20:45
Sime Vidas: that's the idea. But it could also be that an anchor is removed, or its href is modified dynamically, etc. Or, in my pie example, it could be a value that is updated every 30 seconds. I am trying to understand the pattern, not solve a specific issue. — Christophe, Dec 07 '11 at 21:31
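
The three steps listed in the comments for the external-link example could be sketched roughly as follows. This is a minimal sketch, not code from the thread: `isExternalLink`, `markExternalLinks`, and the `external` class name are hypothetical, and "foreign domain" is assumed to mean a different host, so subdomains count as foreign.

```javascript
// Hypothetical helper: true when `href` resolves to a different host than `pageUrl`.
// Assumption: "foreign domain" means a different host (subdomains count as foreign).
function isExternalLink(href, pageUrl) {
    try {
        // Resolve relative hrefs against the page URL before comparing
        const target = new URL(href, pageUrl);
        if (target.protocol !== 'http:' && target.protocol !== 'https:') {
            return false; // ignore mailto:, javascript:, etc.
        }
        return target.host !== new URL(pageUrl).host;
    } catch (e) {
        return false; // unparsable href: leave the link alone
    }
}

// Browser-only part: 1. get all anchors, 2. analyze each href, 3. set a CSS class.
function markExternalLinks(root, pageUrl) {
    for (const anchor of root.querySelectorAll('a[href]')) {
        // toggle() with a second argument adds or removes the class,
        // so re-running after an href changes still gives the right result
        anchor.classList.toggle('external', isExternalLink(anchor.href, pageUrl));
    }
}

// In a page: markExternalLinks(document, location.href);
```

Keeping the host comparison in a plain function means the discovery rule can be checked without a page, and the DOM part stays safe to run more than once.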

score 5 · Answer 1 · answered Dec 07 '11 at 19:51

For example, execute this code for this web-page (from the console), and all numbers on the page will be replaced with "X":

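// Call func on node and, recursively, on every descendant (depth-first)
function walkTheDOM( node, func ) {
    func( node );
    node = node.firstChild;
    while ( node ) {
        walkTheDOM( node, func );
        node = node.nextSibling;
    }
}

walkTheDOM( document.body, function ( node ) {
    // nodeType 3 is a text node (Node.TEXT_NODE)
    if ( node.nodeType === 3 ) {
        // replace every digit in the text with "X"
        node.data = node.data.replace( /\d/g, 'X' );
    }
});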
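
Since the question is mostly about the discovery step, here is a variation on the walker above that only collects matches and modifies nothing. It is a sketch under assumptions: `findPercentages` and its regular expression are hypothetical, chosen to fit the pie-chart example from the question, and text inside `script` and `style` elements is skipped so that code and CSS are not treated as content.

```javascript
// Hypothetical discovery function: returns every percentage found in the
// text nodes under `root`, without changing the page.
function findPercentages(root) {
    const found = [];
    // Assumed format: digits, optional decimal part, optional space, then "%"
    const pattern = /\d+(?:\.\d+)?\s?%/g;

    (function walk(node) {
        // Skip script and style elements; their text is not page content
        if (node.nodeType === 1 && /^(script|style)$/i.test(node.nodeName)) {
            return;
        }
        if (node.nodeType === 3) { // 3 = text node
            for (const match of node.data.matchAll(pattern)) {
                found.push({ node: node, index: match.index, text: match[0] });
            }
        }
        for (let child = node.firstChild; child; child = child.nextSibling) {
            walk(child);
        }
    })(root);

    return found;
}

// In a page: findPercentages(document.body).forEach(function (hit) { /* modify here */ });
```

Separating discovery from modification means the list of hits can be inspected, filtered, or logged before anything on the page is touched.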

answered Dec 07 '11 at 19:51

Šime Vidas

Thanks for the example. I realized that my initial examples were too specific and edited the question. – Christophe Dec 07 '11 at 20:04
@EdS. Of course you can. SHIFT + ENTER. You can also write your code somewhere else and then just copy-paste it into the console... – Šime Vidas Dec 07 '11 at 22:21
@ŠimeVidas: Thank you. I tried Ctrl+Enter and Shift+Enter... then I googled it... found nothing... moved on. I'm not a web dev, I just dabble. Thanks again. – Ed S. Dec 07 '11 at 22:22
I call that greasemonkey scripts ^^ – Guillaume86 Dec 15 '11 at 22:29

score 0 · Answer 2 · edited May 23 '17 at 10:08

This functionality is called add-ons, and the technique they use is DOM traversal.

The cases you describe are not specific to one site but appear on every site you visit, so some extra functionality must have been added to your browser. This often happens when options such as "install toolbars" are left checked while installing new software like Skype.

The technique can be called recognition (as in PNR, Skype Phone Number Recognition), and what these add-ons do is traverse your site's DOM.

The add-ons described above probably run only on page load, so content added later with Ajax will not be affected.
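
For content that arrives after page load, one option in current browsers is the standard `MutationObserver` API, which reports nodes as they are inserted. The following is only a sketch: `watchAddedNodes` and the `rescan` callback are hypothetical names, and the observer constructor is passed as a parameter purely so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: call `rescan` on every node inserted under `root`
// after the initial pass. `ObserverCtor` defaults to the browser's
// MutationObserver and is a parameter only to allow testing without a DOM.
function watchAddedNodes(root, rescan, ObserverCtor = globalThis.MutationObserver) {
    const observer = new ObserverCtor(function (mutations) {
        for (const mutation of mutations) {
            for (const node of mutation.addedNodes) {
                rescan(node);
            }
        }
    });
    // childList + subtree: report insertions anywhere below root
    observer.observe(root, { childList: true, subtree: true });
    return observer; // call observer.disconnect() to stop watching
}

// In a page, with some discovery function `scan` (assumed):
//   scan(document.body);
//   watchAddedNodes(document.body, scan);
```

One caveat: if `rescan` itself inserts nodes, the observer fires again for those nodes, so the callback should recognize and skip content it has already processed.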

If it's your own add-on, there is a way to access it with JavaScript, as described here: how to call a function in Firefox extension from a html button.

Also take a look at GreaseMonkey and jQuery traversing.

Try triggering a hashchange() after the DOM content is loaded with Ajax and see if the add-on runs and appends its stuff again. — , Dec 07 '11 at 20:35

score 0 · Answer 3 · answered Dec 15 '11 at 22:26

So the conclusion for now is that there doesn't seem to be a name or established practices for this technique.

Thanks to those who have mentioned search engines, it makes sense to see it as a local search, with an effort to interpret the content and structure.

hrishikeshp19 · Answer 4 · 2011-12-17T05:26:24.610

-1

Summarization

It is the technique used by all web crawlers. Please have a look at Yioop!, an open-source, well-documented web crawler/search engine.

edited Dec 17 '11 at 05:26

answered Dec 07 '11 at 19:44

hrishikeshp19

I don't know, but maybe your answer would need to be more detailed? I have looked up definitions of summarization, but didn't find anything directly related to the question. Also, I followed your link to Yioop, but didn't see any documentation. – Christophe Dec 15 '11 at 21:22
Would anyone mind giving a reason for the downvote? Please see when to downvote: http://stackoverflow.com/privileges/vote-down – hrishikeshp19 Feb 02 '12 at 18:54

score -1 · Answer 5 · edited Dec 07 '11 at 19:58

As already said, it is called summarization, but you can find out more about it by searching for the term "web crawling bot/technique/robot". Here is a starting document you might find useful:

Crawling the Web

edited Dec 07 '11 at 19:58

kapex

answered Dec 07 '11 at 19:51

Siblja
