Questions tagged [similarity]

Similarity measures quantify how much alike objects (e.g. documents, feature vectors) are.

In information retrieval, similarity is used to describe the relevance between document vectors. The resulting scores are then used to rank search results.
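As a minimal sketch of that ranking idea, here is one way to score documents in Python. It assumes plain term-count vectors (no TF-IDF weighting), lowercase whitespace tokenization, and a few hypothetical example documents. Each document is scored by cosine similarity to the query and the results are sorted:

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    # Bag-of-words term counts; lowercasing and whitespace splitting only.
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    # Dot product over shared terms divided by the product of the vector norms.
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query: str, docs: list[str]) -> list[tuple[float, str]]:
    # Score every document against the query, highest similarity first.
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical documents for illustration.
docs = [
    "cosine similarity of document vectors",
    "levenshtein distance between strings",
    "ranking documents by similarity",
]
for score, doc in rank("document similarity", docs):
    print(f"{score:.3f}  {doc}")
```

Because tokens are matched exactly, "documents" does not match "document" here. Real search engines usually add stemming and term weighting on top of this basic scheme.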

1866 questions
41 votes, 4 answers

How to compute similarity between two strings in MySQL

If I have two strings in MySQL, @a="Welcome to Stack Overflow" and @b=" Hello to stack overflow", is there a way to get the similarity percentage between those two strings using MySQL? Here, for example, 3 words are similar, and thus the similarity should…
Lina
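This is not a built-in MySQL feature. As a language-neutral sketch of one possible definition, the following Python computes word-level Jaccard similarity after lowercasing: the number of shared words divided by the number of distinct words across both strings.

```python
def word_similarity(a: str, b: str) -> float:
    # Case-insensitive sets of whitespace-separated words.
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 1.0  # two empty strings are treated as identical
    # Jaccard index: shared words / distinct words in either string.
    return len(words_a & words_b) / len(union)


a = "Welcome to Stack Overflow"
b = " Hello to stack overflow"
print(f"{word_similarity(a, b):.0%}")  # 3 shared of 5 distinct words: 60%
```

Dividing by the word count of the longer string instead would give 75% for this pair. The appropriate denominator depends on how "similarity percentage" is meant.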
38 votes, 10 answers

Calculating Binary Data Similarity

I've seen a few questions here related to determining the similarity of files, but they are all linked to a particular domain (images, sounds, text, etc). The techniques offered as solutions require knowledge of the underlying file format of the…
Chad Birch
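Normalized compression distance (NCD) is one domain-agnostic technique that needs no knowledge of the file format. It is swapped in here as an illustration and is not taken from the question. The idea is that if two inputs share content, compressing them together costs little more than compressing either one alone. The sketch below uses zlib as the compressor. The test data are hypothetical random bytes with a small edit.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    # Length of the zlib-compressed bytes at the highest compression level.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


rng = random.Random(0)
base = bytes(rng.getrandbits(8) for _ in range(4096))
edited = base[:2000] + b"PATCH" + base[2005:]  # same length, 5 bytes changed
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))
print(ncd(base, edited))     # small: most content is shared
print(ncd(base, unrelated))  # near 1: nothing in common
```

Real compressors are imperfect, so even NCD(x, x) is slightly above 0. Also, zlib only finds repeats within a 32 KB window, so large files need a compressor with a bigger window.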
36 votes, 9 answers

Levenshtein distance: how to better handle words swapping positions?

I've had some success comparing strings using the PHP levenshtein function. However, for two strings which contain substrings that have swapped positions, the algorithm counts those as whole new substrings. For example: levenshtein("The quick brown…
thomasrutter
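One common workaround is to sort the words of each string before computing the edit distance, so that swapped words no longer count as edits. This is an assumed technique, not necessarily what the answers recommend. PHP's levenshtein() is not available in Python, so the sketch includes a standard dynamic-programming implementation. The example strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    # Two-row dynamic programming; insert, delete and substitute each cost 1.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def token_sort_levenshtein(a: str, b: str) -> int:
    # Sort lowercase words so that word order no longer matters.
    def normalize(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return levenshtein(normalize(a), normalize(b))


s1 = "red apple green pear"
s2 = "green pear red apple"
print(levenshtein(s1, s2))             # large: the swap costs many edits
print(token_sort_levenshtein(s1, s2))  # 0: same words, different order
```

The trade-off is that sorting discards word order entirely, so genuinely different sentences made of the same words also score as identical.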
33 votes, 5 answers

Color similarity/distance in RGBA color space

How to compute similarity between two colors in RGBA color space? (where the background color is unknown of course) I need to remap an RGBA image to a palette of RGBA colors by finding the best palette entry for each pixel in the image*. In the RGB…
Kornel
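Because the background is unknown, one approach is to composite both colors over the two extreme backgrounds, black and white, and keep the worse per-channel difference. This is sketched here under stated assumptions and is not necessarily what the answers propose. The composited difference is linear in the background value, so its square is largest at one of those two extremes. Channels are assumed to be floats in [0, 1] with straight (non-premultiplied) alpha.

```python
def rgba_distance(c1, c2) -> float:
    """Squared distance between two straight-alpha RGBA colors (floats in [0, 1])."""
    r1, g1, b1, a1 = c1
    r2, g2, b2, a2 = c2
    alpha_diff = a1 - a2
    total = 0.0
    for x1, x2 in ((r1, r2), (g1, g2), (b1, b2)):
        # Over black, a channel becomes x * a (its premultiplied value).
        on_black = x1 * a1 - x2 * a2
        # Over white it becomes x * a + (1 - a), shifting the difference by -(a1 - a2).
        on_white = on_black - alpha_diff
        # Take whichever extreme background makes the colors differ more.
        total += max(on_black ** 2, on_white ** 2)
    return total


opaque_red = (1.0, 0.0, 0.0, 1.0)
transparent_red = (1.0, 0.0, 0.0, 0.0)
print(rgba_distance(opaque_red, opaque_red))       # 0.0
print(rgba_distance(opaque_red, transparent_red))  # 3.0
```

A useful property of this formulation is that two fully transparent colors have distance 0 regardless of their RGB values, which matches how they look on any background.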
30 votes, 2 answers

How to compute Jaccard similarity from a pandas DataFrame

I have a dataframe as follows: the shape of the frame is (1510, 1399). The columns represent products, and the rows represent values (0 or 1) assigned by a user for a given product. How can I compute jaccard_similarity_scores? I created a…
kitchenprinzessin
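Instead of scoring each pair separately, all product-to-product Jaccard similarities can be computed at once with matrix operations. A minimal sketch, assuming entries are strictly 0/1, rows are users and columns are products. Pairs that nobody marked are defined as 0, which is an arbitrary choice. The frame below is a tiny hypothetical example.

```python
import numpy as np
import pandas as pd


def column_jaccard(df: pd.DataFrame) -> pd.DataFrame:
    # Rows are users, columns are products, entries are 0/1.
    x = df.to_numpy(dtype=float)
    intersection = x.T @ x      # users who marked both product i and product j
    counts = x.sum(axis=0)      # users who marked each product
    union = counts[:, None] + counts[None, :] - intersection
    # Avoid 0/0 for pairs nobody marked; those entries stay 0.
    sim = np.divide(intersection, union,
                    out=np.zeros_like(intersection), where=union > 0)
    return pd.DataFrame(sim, index=df.columns, columns=df.columns)


# Hypothetical frame: 4 users x 3 products.
ratings = pd.DataFrame({"p1": [1, 1, 0, 1], "p2": [1, 0, 0, 1], "p3": [0, 0, 1, 0]})
print(column_jaccard(ratings).round(2))
```

For a 1510 x 1399 frame, this produces a 1399 x 1399 result from a single matrix product. Passing `df.T` instead gives user-to-user similarities.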
29 votes, 6 answers

Similarity Score - Levenshtein

I implemented the Levenshtein algorithm in Java and am now getting the corrections made by the algorithm, a.k.a. the cost. This does help a little but not much since I want the results as a percentage. So I want to know how to calculate those…
N00programmer
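Since the Levenshtein cost is already available, it only needs normalizing. A common choice, assumed here, is to divide by the length of the longer string. The Levenshtein distance can never exceed that length, so the result always lands between 0% and 100%. The question uses Java; this is a Python sketch of the same arithmetic.

```python
def similarity_percent(a: str, b: str, distance: int) -> float:
    # Normalize by the longer length, the maximum possible edit distance.
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return (1 - distance / longest) * 100


# "kitten" -> "sitting" is a well-known pair with a Levenshtein distance of 3.
print(round(similarity_percent("kitten", "sitting", 3), 1))  # 57.1
```

Other normalizations exist, for example dividing by the sum of both lengths. They yield different percentages for the same cost, so the choice should be applied consistently.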
28 votes, 7 answers

Find cosine similarity between two arrays

I'm wondering if there is a built in function in R that can find the cosine similarity (or cosine distance) between two arrays? Currently, I implemented my own function, but I can't help but think that R should already come with one.
defoo
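The question concerns R. For comparison, here is a hedged Python sketch with NumPy that flattens both arrays and computes cosine similarity. Cosine distance is taken as 1 minus the similarity, which is one common convention.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    # Flatten so arrays of any shape with the same element count can be compared.
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("arrays must have the same number of elements")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(a @ b / denom)


def cosine_distance(a, b) -> float:
    return 1.0 - cosine_similarity(a, b)


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # parallel vectors, approximately 1
print(cosine_distance([1, 0], [0, 1]))          # 1.0: orthogonal vectors
```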
27 votes, 7 answers

String similarity -> Levenshtein distance

I'm using the Levenshtein algorithm to find the similarity between two strings. This is a very important part of the program I'm making, so it needs to be effective. The problem is that the algorithm doesn't find the following examples as…
Fede Lerner
26 votes, 3 answers

How to detect that two sentences are similar?

I want to compute how similar two arbitrary sentences are to each other. For example: A mathematician found a solution to the problem. The problem was solved by a young mathematician. I can use a tagger, a stemmer, and a parser, but I don’t…
SahelSoft
24 votes, 3 answers

What is the paper "Oliver [1993]" describing a PHP algorithm to calculate text similarity?

There is a function similar_text() in the PHP library. The documentation (http://php.net/manual/en/function.similar-text.php) tells me that "This calculates the similarity between two strings as described in Oliver [1993]." Despite extensive…
jameshfisher
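Whatever the exact reference, PHP's similar_text() can be described as follows. Find the longest common substring. Then apply the same step recursively to the parts left of it and right of it, and sum the matched lengths. The percentage is that sum times 2 times 100, divided by the combined length of both strings. The following Python port is an illustrative sketch, not PHP's C source. It keeps the first longest match found, which is why swapping the arguments can change the result.

```python
def similar_text(first: str, second: str) -> int:
    """Matched character count via recursive longest common substrings."""
    if not first or not second:
        return 0
    # Brute-force search for the first-found longest common substring.
    best_len, best_i, best_j = 0, 0, 0
    for i in range(len(first)):
        for j in range(len(second)):
            k = 0
            while (i + k < len(first) and j + k < len(second)
                   and first[i + k] == second[j + k]):
                k += 1
            if k > best_len:  # strictly greater: earlier matches win ties
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    # Recurse on the unmatched pieces to the left and to the right.
    left = similar_text(first[:best_i], second[:best_j])
    right = similar_text(first[best_i + best_len:], second[best_j + best_len:])
    return best_len + left + right


def similar_text_percent(first: str, second: str) -> float:
    total = len(first) + len(second)
    return similar_text(first, second) * 2 * 100 / total if total else 0.0


print(similar_text("World", "Word"))        # 4
print(similar_text("bafoobar", "barfoo"))   # 5
print(similar_text("barfoo", "bafoobar"))   # 3: argument order matters
```

The brute-force search is roughly cubic in string length at each recursion level, so it only suits short strings.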
24 votes, 2 answers

String similarity: how exactly does Bitap work?

I'm trying to wrap my head around the Bitap algorithm, but I am having trouble understanding the reasons behind the steps of the algorithm. I understand the basic premise of the algorithm, which is (correct me if I'm wrong): Two strings: PATTERN…
Kevin
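To make the steps concrete, here is a minimal sketch of exact-matching Bitap in its shift-and form. In this form, bit i of the state is set when the first i + 1 pattern characters end at the current text position. The commonly presented shift-or form stores the same information with inverted bits. Fuzzy matching with k errors extends this with one state word per allowed error and is not shown.

```python
def bitap_search(text: str, pattern: str) -> int:
    """Index of the first exact occurrence of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    # masks[c] has bit i set wherever pattern[i] == c.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # this bit set means the whole pattern matched
    state = 0
    for pos, ch in enumerate(text):
        # Shift extends every partial match by one; "| 1" starts a new match;
        # the AND keeps only matches whose next pattern character equals ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            return pos - m + 1
    return -1


print(bitap_search("hello world", "world"))  # 6
print(bitap_search("hello world", "word"))   # -1
```

Python integers are unbounded, so this sketch has no pattern-length limit. In C the state usually fits in one machine word, which caps the pattern length at 32 or 64 characters.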
23 votes, 4 answers

Libpuzzle: indexing millions of pictures?

It's about the libpuzzle library for PHP (http://libpuzzle.pureftpd.org/project/libpuzzle) by Frank Denis. I'm trying to understand how to index and store the data in my MySQL database. Generating the vector is absolutely no problem.…
phpman
23 votes, 4 answers

What is a good metric for deciding if 2 strings are "similar enough"?

I'm working on a very rough, first-draft algorithm to determine how similar 2 strings are. I'm also using Levenshtein distance to calculate the edit distance between the strings. What I'm doing currently is basically taking the total number of edits…
Hristo
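Python's standard library offers an alternative to dividing raw edit counts: `difflib.SequenceMatcher.ratio()`. It returns 2*M/T, where M is the number of matched characters and T is the total length of both strings. The result lies in [0, 1] and is computed by block matching rather than Levenshtein edits. A minimal sketch with a threshold follows. The default of 0.8 and the lowercasing are assumptions to tune against examples you have labeled by hand.

```python
from difflib import SequenceMatcher


def similar_enough(a: str, b: str, threshold: float = 0.8) -> bool:
    # ratio() = 2 * matches / (len(a) + len(b)); 1.0 means identical.
    # Lowercasing is an assumption: case differences are ignored here.
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold


print(similar_enough("Stack Overflow", "stack overflow"))  # True
print(similar_enough("kitten", "sitting"))                # False (ratio about 0.62)
```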
23 votes, 5 answers

How to compare image similarity using PHP regardless of scale and rotation?

I want to compare the similarity between the images below. According to my requirements, I want to identify all of these images as similar, since they use the same colors and the same clip art. The only differences between these images are rotation, scale, and the…
Tharu
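One rotation- and scale-tolerant signal is a normalized color histogram, because rotating or resizing an image mostly preserves the proportion of each color. This technique is swapped in for illustration; it is not from the question. The sketch is in Python rather than PHP. It assumes pixels are already decoded into (r, g, b) tuples with 0-255 channels. The bin count and the tiny test image are hypothetical.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel: int = 4) -> dict:
    """Normalized histogram of quantized (r, g, b) colors; channels are 0-255."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


def histogram_intersection(h1: dict, h2: dict) -> float:
    # Sum of per-bin minimums: 1.0 for identical distributions, 0.0 for disjoint ones.
    return sum(min(p, h2.get(color, 0.0)) for color, p in h1.items())


# Hypothetical 2x2 image (row-major), the same image rotated 90 degrees clockwise,
# then each pixel repeated four times (the same color counts as a 2x upscale).
image = [(255, 0, 0), (255, 0, 0), (0, 0, 255), (255, 255, 255)]
rotated = [image[2], image[0], image[3], image[1]]
scaled = [p for p in rotated for _ in range(4)]
print(histogram_intersection(color_histogram(image), color_histogram(scaled)))  # 1.0
```

Histograms ignore spatial layout, so unrelated images with the same color mix also score high. Rotations that add background fill at the corners shift the counts slightly.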
22 votes, 4 answers

Find similar images in (pure) PHP / MySQL

My users are uploading images to my website, and I would like to first offer them images that have already been uploaded. My idea is to: 1. create some kind of image "hash" of every existing image; 2. create a hash of the newly uploaded image and compare it with…
Tomáš Kapler
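The hashing idea described above resembles what is usually called a perceptual hash. Below is a minimal average-hash sketch in Python rather than PHP. It assumes the image is already decoded to a 2D grayscale list whose width and height are multiples of 8. Similar images should produce hashes with a small Hamming distance. The cutoff for "similar" is an assumption to tune.

```python
def average_hash(gray: list[list[int]], size: int = 8) -> int:
    """64-bit average hash of a grayscale image (values 0-255)."""
    height, width = len(gray), len(gray[0])
    bh, bw = height // size, width // size  # block size; assumes exact multiples
    # Downscale by averaging each block into one cell of a size x size grid.
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [gray[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: 1 if the cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (value > mean)
    return bits


def hamming(h1: int, h2: int) -> int:
    # Number of differing bits between two hashes.
    return bin(h1 ^ h2).count("1")


# Hypothetical 16x16 test images.
gradient = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[v + 10 for v in row] for row in gradient]
inverted = [[255 - v for v in row] for row in gradient]
print(hamming(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming(average_hash(gradient), average_hash(inverted)))  # 64
```

A 64-bit hash like this fits in a MySQL BIGINT UNSIGNED column. There, `BIT_COUNT(hash ^ @h)` gives the Hamming distance to a new image's hash.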