Any ideas on what other web page meta information I can use to classify a page's relevance to a theme?

Question

I'm writing an algorithm that classifies how relevant a page is to a theme such as 'movies', using as much meta information as possible but excluding the textual content of the body.

I want to know what I can use to determine whether a page contains information about the theme.

At the moment, I weight the title at 40%, the URL path (the part after the domain) at 30%, the domain at 20%, and the meta keywords at 10%. I match a set of weighted words against these fields to calculate the page's relevance, but I think I could use more signals to be more precise.
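The weighting described above can be sketched as a weighted sum over four fields. This is a minimal sketch, assuming a hypothetical `MOVIE_TERMS` vocabulary that maps words to importances in [0, 1] and scoring each field by its best-matching term; the names `relevance`, `field_score` and `tokens` are illustrative and not from the question.

```python
import re
from urllib.parse import urlsplit

# Weights from the question: title 40%, URL path 30%, domain 20%, meta keywords 10%.
WEIGHTS = {"title": 0.4, "path": 0.3, "domain": 0.2, "keywords": 0.1}

# Hypothetical theme vocabulary: word -> importance in [0, 1].
MOVIE_TERMS = {"movie": 1.0, "movies": 1.0, "film": 0.9, "cinema": 0.8,
               "trailer": 0.7, "actor": 0.5}


def tokens(text):
    """Lower-case alphanumeric tokens; anything else ('-', '/', '.', spaces) separates them."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_score(text, terms):
    """Weight of the best-matching term in the field, or 0.0 if nothing matches."""
    return max((terms.get(t, 0.0) for t in tokens(text)), default=0.0)


def relevance(url, title, meta_keywords, terms=MOVIE_TERMS):
    """Weighted sum of per-field scores; the result stays in [0, 1]."""
    parts = urlsplit(url)
    fields = {
        "title": title,
        "path": parts.path + " " + parts.query,  # the "link after the domain"
        "domain": parts.hostname or "",
        "keywords": meta_keywords,
    }
    return sum(WEIGHTS[name] * field_score(text, terms)
               for name, text in fields.items())
```

For example, `relevance("http://www.example.com/movies/trailer-2011", "New film trailers", "cinema, trailers")` gives about 0.74 (title 0.4 × 0.9, path 0.3 × 1.0, domain 0, keywords 0.1 × 0.8). Taking the maximum per field keeps the total in [0, 1]. The simple tokenizer will not split a domain such as `moviesite.com`, so substring matching may be worth considering for the domain field.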

What else could I use to calculate the relevance? I only want to exclude the text content inside the HTML itself; the HTML structure can be used.
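Since the HTML structure is allowed, the standard library's `html.parser` can pull out the head fields and link targets while skipping ordinary body text. This is a minimal sketch; the class name `MetaExtractor` is hypothetical, and it only looks at `<title>`, `<meta name=... content=...>` and `<a href=...>`.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects <title>, <meta name=... content=...> pairs and link targets,
    ignoring ordinary body text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}          # lower-cased meta name -> content
        self.links = []         # href values of <a> tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text inside <title> is kept; body text is skipped.
        if self._in_title:
            self.title += data


parser = MetaExtractor()
parser.feed('<html><head><title>Movie reviews</title>'
            '<meta name="Keywords" content="film, cinema">'
            '<meta name="description" content="Reviews of new films">'
            '</head><body><p>Body text is ignored.</p>'
            '<a href="/movies/top-10">Top 10</a></body></html>')
```

Here `parser.title` is `"Movie reviews"`, `parser.meta` holds the `keywords` and `description` entries, and `parser.links` is `["/movies/top-10"]`. The link paths could be scored like the page's own URL path, and `meta.get("description")` is one more field that could get its own weight.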

Nowadays a number of sites use [Dublin Core](http://dublincore.org/)-based headers (meta tags). Maybe this helps? — home, Sep 03 '11 at 13:29
Your question title asks one thing (page relevance), but the question body asks another (page theme/category). Do you want to classify whether a web page belongs to a category? Can you look at link anchor texts? — Felipe Hummel, Sep 04 '11 at 00:14
@Felipe I edited the title; I want the relevance to a theme: the relevance of a page to movies, music, games, IT, etc. By meta information, I mean everything that is not the content of the page itself (like this message). This is because a page can contain many things in different contexts, such as my question, the answer, the related questions, the advertisements, etc. As for the anchors, that looks like a good idea; I will think about it. Thanks! — Renato Dinhani, Sep 04 '11 at 02:23
@home Thanks for your idea, I'll take a closer look at it, but I don't think many pages use it, right? — Renato Dinhani, Sep 04 '11 at 02:24
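The two suggestions in the comments, Dublin Core meta tags and link anchor texts, can both be collected in one pass. This is a sketch assuming the common convention of Dublin Core meta names prefixed with `DC.` or `DCTERMS.` (matched case-insensitively); `DublinCoreAndAnchors` is a hypothetical name.

```python
from html.parser import HTMLParser


class DublinCoreAndAnchors(HTMLParser):
    """Collects Dublin Core <meta> tags (names starting with 'dc.' or
    'dcterms.', case-insensitive) and the visible text of each <a> link."""

    def __init__(self):
        super().__init__()
        self.dublin_core = {}       # e.g. {"dc.subject": "Movies"}
        self.anchor_texts = []      # one whitespace-normalized string per link
        self._anchor_parts = None   # text pieces of the <a> currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "meta" and name.startswith(("dc.", "dcterms.")):
            self.dublin_core[name] = attrs.get("content") or ""
        elif tag == "a":
            self._anchor_parts = []

    def handle_data(self, data):
        if self._anchor_parts is not None:
            self._anchor_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_parts is not None:
            text = " ".join("".join(self._anchor_parts).split())
            if text:  # links with no text (e.g. image-only) are skipped
                self.anchor_texts.append(text)
            self._anchor_parts = None


parser = DublinCoreAndAnchors()
parser.feed('<meta name="DC.Subject" content="Movies">'
            '<meta name="author" content="x">'
            '<a href="/a">New   trailers</a> <a href="/b"><img src="i.png"></a>')
```

This yields `{"dc.subject": "Movies"}` and `["New trailers"]`. Pages without Dublin Core tags simply produce an empty dictionary, so that field can contribute to the score only when it is present.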

score 0 · Accepted Answer · answered Sep 03 '11 at 16:23

I think you should consider the main menu links and, if there are any, the submenu links; to put it simply, LINKS. You should also take the metadata into account. But I'm still not sure what you are trying to achieve.

From what I understood, you are trying to build a "relevancy" formula for a web page.
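The answer's suggestion of using main menu and submenu links could be approximated by collecting link text only inside navigation blocks. This sketch assumes menus live in `<nav>` elements or in elements whose `class` or `id` contains "menu", which is a heuristic rather than a rule; `MenuLinkExtractor` and `menu_theme_ratio` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Elements that never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class MenuLinkExtractor(HTMLParser):
    """Collects the text of links inside <nav> elements or inside any element
    whose class or id contains 'menu' (a heuristic assumption)."""

    def __init__(self):
        super().__init__()
        self.menu_links = []
        self._open_tags = []        # (tag, is_menu_container) for open elements
        self._anchor_parts = None

    def _in_menu(self):
        return any(is_menu for _, is_menu in self._open_tags)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self._in_menu():
            self._anchor_parts = []
        if tag in VOID_TAGS:
            return
        marker = ((attrs.get("class") or "") + " " + (attrs.get("id") or "")).lower()
        self._open_tags.append((tag, tag == "nav" or "menu" in marker))

    def handle_data(self, data):
        if self._anchor_parts is not None:
            self._anchor_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._anchor_parts is not None:
            text = " ".join("".join(self._anchor_parts).split())
            if text:
                self.menu_links.append(text)
            self._anchor_parts = None
        # Pop back to the matching open tag, tolerating unclosed children like <li>.
        for i in range(len(self._open_tags) - 1, -1, -1):
            if self._open_tags[i][0] == tag:
                del self._open_tags[i:]
                break


def menu_theme_ratio(html, terms):
    """Share of menu link texts that contain at least one theme term."""
    parser = MenuLinkExtractor()
    parser.feed(html)
    if not parser.menu_links:
        return 0.0
    hits = sum(1 for text in parser.menu_links
               if set(re.findall(r"[a-z0-9]+", text.lower())) & terms)
    return hits / len(parser.menu_links)
```

For instance, with menu links "Home", "Movies", "TV" and "Films" and the terms `{"movies", "films", "cinema"}`, the ratio is 0.5; a link such as "About movies" inside an ordinary `<p>` is not counted. The ratio could then be added to the weighted formula as one more field.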
