
Possible Duplicate:
How to apply DBSCAN algorithm on grouping of similar url

For my final-semester project I have been asked to cluster similar strings using DBSCAN. Can this be done with DBSCAN? If so, how can I implement it?

  • What type of strings are we talking about? Long articles? Short snippets? Single words? What characters can be in the strings? I'm really tempted to close this as a too broad question (questions here should be practical, well-scoped problems), but I really like the subject so please clarify as much as you can. – Emil Vikström Sep 16 '12 at 09:24
  • Yes, sir. The strings can be whole words or parts of words. For example, www.sss.com and www.ddd.com both end with com, so I want to group URLs like these together; URLs with the domain extension org would form another group, and the rest would be considered noise. – Steven Dsouza Sep 16 '12 at 09:38

1 Answer


As I told you earlier (at How to apply DBSCAN algorithm on grouping of similar url), this is possible.

But YOU need to define the similarity you need for your application.

Nobody on Stack Overflow will be able to help you with that unless you are very clear about what kind of similarity you need.
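To make "defining the similarity" concrete: the rule described in the comments on the question (URLs with the same domain extension belong together) could be written as a distance function. This is only a minimal sketch. The names `domain_extension` and `tld_distance` are hypothetical, and it assumes the extension is simply the text after the last dot of the host name.

```python
from urllib.parse import urlsplit


def domain_extension(url: str) -> str:
    """Return the text after the last dot of the host, lower-cased."""
    # urlsplit only recognises the host when a scheme or a leading '//'
    # is present, so add '//' to bare hosts such as 'www.sss.com'.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def tld_distance(a: str, b: str) -> float:
    """Assumed rule: distance 0 for the same extension, 1 otherwise."""
    return 0.0 if domain_extension(a) == domain_extension(b) else 1.0
```

With a 0/1 distance like this, any DBSCAN radius below 1 just groups URLs by extension, and the minimum cluster size decides which extensions end up as noise. Whether that is the similarity actually needed is exactly the part that has to be decided first.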

There are lots of string metrics available, and you need to find out what works for your particular problem:

https://en.wikipedia.org/wiki/String_metric
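As an illustration of plugging one of these metrics into DBSCAN, here is a small pure-Python sketch. It assumes Levenshtein (edit) distance as the metric, which may or may not suit your data, and the `dbscan` function is a simple quadratic-time version written for this example, not a library API. The values of `eps` and `min_samples` are arbitrary.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: insert, delete, substitute, each with cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute
        prev = curr
    return prev[-1]


NOISE = -1


def dbscan(items, distance, eps, min_samples):
    """Return one label per item: 0, 1, ... for clusters, -1 for noise."""
    n = len(items)
    # Each neighbourhood includes the point itself.
    neighbours = [[j for j in range(n) if distance(items[i], items[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: join, do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                queue.extend(neighbours[j])  # j is core: keep expanding
        cluster += 1
    return labels


urls = ["www.sss.com", "www.ssa.com", "www.sss.org",
        "example.net/some/long/path"]
labels = dbscan(urls, levenshtein, eps=1, min_samples=2)
print(labels)  # [0, 0, -1, -1]
```

Note that edit distance puts www.sss.org closer to www.sss.com than to other .org addresses, so the result depends entirely on the metric chosen. Any function of two strings can be passed as `distance`.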

Has QUIT--Anony-Mousse
  • Sir, I have clearly told you about the type of similarity I need (2nd comment above). Thank you for the link. This would not let me post another question of the same type. – Steven Dsouza Sep 16 '12 at 11:37
  • You have not told us *nearly enough*. It is much too imprecise. In particular, you have not said which types of similarity you have **tried** so far. – Has QUIT--Anony-Mousse Sep 16 '12 at 13:23