How to group similar URLs using the DBSCAN algorithm? I have seen many datasets, but none of them were of URLs. I want to take URLs of a similar type and group them together. I don't know what the distance threshold (eps) should be, and minPts could be the number of URLs to be grouped.
DBSCAN needs a distance function and a threshold for detecting similar objects.
So go ahead: first define an appropriate distance function and a threshold, then we can help you with DBSCAN (though you should be able to find DBSCAN implementations that can be extended to arbitrary distance functions).
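To illustrate that DBSCAN itself does not care what the objects are, here is a minimal sketch of DBSCAN in plain Python that takes any distance function as a parameter. The names `dbscan`, `dist`, `eps` and `min_pts` are my own choices for illustration, not a particular library's API. It uses a brute-force neighbourhood query, so it makes O(n²) distance calls and is only meant for small inputs.

```python
from typing import Callable, List, Optional, Sequence, TypeVar, cast

T = TypeVar("T")
NOISE = -1


def dbscan(points: Sequence[T], dist: Callable[[T, T], float],
           eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    `dist` can be any symmetric distance function; `min_pts` counts
    the point itself, as in the original DBSCAN formulation.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not visited yet

    def region_query(i: int) -> List[int]:
        # Brute force: compare point i against every point (O(n) calls).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region_query(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to this cluster
            labels[j] = cluster
            neighbours = region_query(j)
            if len(neighbours) >= min_pts:
                queue.extend(neighbours)  # j is also a core point: keep expanding
        cluster += 1
    # Every point has been visited, so no None remains.
    return cast(List[int], labels)
```

For example, `dbscan([1, 2, 3, 10, 11, 12, 50], lambda a, b: abs(a - b), eps=1.5, min_pts=2)` returns `[0, 0, 0, 1, 1, 1, -1]`: two clusters and one noise point. Swapping the lambda for a URL distance is all that is needed to cluster URLs; the hard part is deciding what that distance should be.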
The key challenge is the distance, and that is up to you, because we do not know what you want to get out of it. This is very subjective, and we just don't know what you want or need.
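As one illustration of a distance you *could* choose (an assumption, not the only sensible option), the sketch below splits each URL into host labels and path/query words and takes the Jaccard distance between the two token sets. `url_tokens` and `url_distance` are hypothetical helper names, and which URL parts to keep is an arbitrary choice made for this example.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url: str) -> set:
    """Hypothetical tokenizer: host labels plus alphanumeric path/query words.

    Tokens are prefixed so that 'shop' in the host and 'shop' in the path
    are treated as different features.
    """
    parts = urlsplit(url.lower())
    host = {"host:" + t for t in re.findall(r"[a-z0-9]+", parts.hostname or "")}
    path = {"path:" + t for t in re.findall(r"[a-z0-9]+", parts.path + "?" + parts.query)}
    return host | path


def url_distance(a: str, b: str) -> float:
    """Jaccard distance between token sets: 0 = same tokens, 1 = nothing shared."""
    ta, tb = url_tokens(a), url_tokens(b)
    union = ta | tb
    if not union:
        return 0.0  # two empty URLs: treat as identical
    return 1.0 - len(ta & tb) / len(union)
```

Identical URLs get 0, URLs that share no tokens get 1, and two pages in the same section of one site fall in between (`/products/shoes` vs `/products/hats` on the same host gives 1/3). Whether 1/3 counts as "similar" is exactly the subjective decision mentioned above: eps becomes a threshold on this 0 to 1 scale. The function can be passed to a DBSCAN implementation that accepts a callable distance, or used to fill a pairwise distance matrix for scikit-learn's `DBSCAN(metric="precomputed")`.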

Has QUIT--Anony-Mousse
-
Yes, a distance function applies when there are points to be detected on a graph. How can I apply it to URLs, for example by just matching URLs whose domain extensions are similar and grouping them? – Steven Dsouza Sep 14 '12 at 10:47
-
DBSCAN does not use graphs. It uses distance functions, so you need to define a distance function for your URLs. – Has QUIT--Anony-Mousse Sep 14 '12 at 10:54
-
Yeah, I need to know how it can be defined for URLs – Steven Dsouza Sep 14 '12 at 11:00
-
It depends on **YOU**, on what **YOU** want. We cannot answer this for you; you will have to do some research and find out what you want. – Has QUIT--Anony-Mousse Sep 14 '12 at 13:27
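The domain-extension idea from the comments can also be written as a distance. This is a sketch assuming "domain extension" means the last label of the hostname (so `co.uk` is naively treated as `uk`); `domain_extension`, `tld_distance` and `group_by_extension` are hypothetical names. Because this distance is only ever 0 or 1, any eps in [0, 1) makes the DBSCAN neighbourhood of a URL exactly the set of URLs with the same extension, so the clustering reduces to a plain group-by, which the last function computes directly.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_extension(url: str) -> str:
    """Last label of the hostname, e.g. 'com' for http://www.example.com/.

    Naive assumption: multi-part suffixes such as 'co.uk' become just 'uk'.
    """
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def tld_distance(a: str, b: str) -> float:
    """0 if both URLs share a domain extension, 1 otherwise."""
    return 0.0 if domain_extension(a) == domain_extension(b) else 1.0


def group_by_extension(urls, min_pts):
    """Equivalent of DBSCAN with tld_distance and any eps in [0, 1):
    each extension with at least min_pts URLs forms one cluster,
    and URLs in smaller groups are noise."""
    groups = defaultdict(list)
    for url in urls:
        groups[domain_extension(url)].append(url)
    clusters = {ext: members for ext, members in groups.items() if len(members) >= min_pts}
    noise = [u for ext, members in groups.items() if len(members) < min_pts for u in members]
    return clusters, noise
```

With this distance DBSCAN can only recover groups that a dictionary lookup already gives you, so if "similar" should mean more than sharing an extension, the distance has to look at more of the URL.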