Questions tagged [hamming-distance]

The Hamming distance is a mathematical distance function for a pair of strings (sequences) that can be computed with a binary calculation. It counts the number of symbols in the string that are different. Posts that are not about implementation may belong on https://math.stackexchange.com.

For the special case of two binary strings, it may be implemented as the bitcount of their XOR:

int H(int a, int b){
    return popcount(a^b);
}
291 questions
5
votes
1 answer

Eliminating sequences in a message

I have an odd communications channel, and I need to detect errors as well as eliminate certain sequences in the channel. Each message is 12 bits long, split into 3 nibbles (of 4 bits each). I need to extract at least 450 different codes out of this,…
Adam Davis
  • 91,931
  • 60
  • 264
  • 330
4
votes
2 answers

Bit string nearest neighbour searching

I have hundreds of thousands of sparse bit strings of length 32 bits. I'd like to do a nearest neighbour search on them and look-up performance is critical. I've been reading up on various algorithms but they seem to target text strings rather than…
user1305541
  • 205
  • 1
  • 4
  • 14
4
votes
1 answer

Fast computation of pairs with least hamming distance

Problem Suppose you have N (~100k-1m) integers/bitstrings each K (e.g. 256) bits long. The algorithm should return the k pairs with the lowest pairwise Hamming distance. Example N = 4 K = 8 i1 = 00010011 i2 = 01010101 i3 = 11000000 i4 =…
rfalke
  • 451
  • 4
  • 7
4
votes
1 answer

OpenMP takes more time than expected

So, I am facing some difficulties using openMp. I am beginner and I do not know what I am doing wrong. This is a project for one of my courses at University, so I do not seek for the solution, rather for a hint or an explanation. The project is to…
4
votes
1 answer

Optimized CUDA matrix hamming distance

Is anyone aware of an optimized CUDA kernel for computing a GEMM style hamming distance between two matrices of dimension A x N and N x B? The problem is nearly identical to GEMM, but instead computes the sum( a_n != b_n ) for each vector {1 ...…
lakinsm
  • 105
  • 11
4
votes
1 answer

Python - grouping defaultdict values by hamming distance in keys

I have a default dict with ~700 keys. The keys are in a format such as A_B_STRING. What I need to do is split the key by '_', and compare the distance between 'STRING' for each key, if A and B are identical. If the distance is <= 2, I want to group…
4
votes
3 answers

Fastest way to compute all 1-hamming distanced neighbors of strings?

I am trying to compute hamming distances between each node in a graph of n nodes. Each node in this graph has a label of the same length (k) and the alphabet used for labels is {0, 1, *}. The '*' operates as a don't care symbol. For example, hamming…
begumgenc
  • 393
  • 1
  • 2
  • 12
4
votes
2 answers

Interpreting Hamming Distance speed in python

I've been working on making my python more pythonic and toying with runtimes of short snippets of code. My goal to improve the readability, but additionally, to speed execution. This example conflicts with the best practices I've been reading…
Scrocco
  • 133
  • 1
  • 4
4
votes
1 answer

Find all k-nearest neighbors

Problem: I have N (~100m) strings each D (e.g. 100) characters long and with a low alphabet (eg 4 possible characters). I would like to find the k-nearest neighbors for every one of those N points ( k ~ 0.1D). Adjacent strings define via hamming…
4
votes
2 answers

Sum of Hamming Distances

I started preparing for an interview and came across this problem: An array of integers is given Now calculate the sum of Hamming distances of all pairs of integers in the array in their binary representation. Example: given {1,2,3} or…
4
votes
1 answer

How do I find the k-nearest values in n-dimensional space?

I read about kd-trees but they are inefficient when the dimensionality of the space is high. I have a database of value and I want to find the values that are within a certain hamming distance of the query. For instance, the database is a list of…
4
votes
2 answers

Word distance algorithm for OCR

I am working with an OCR output and I'm searching for special words inside it. As the output is not clean, I look for elements that match my inputs according to a word distance lower than a specific threshold. However, I feel that the Levenshtein…
zenbeni
  • 7,019
  • 3
  • 29
  • 60
4
votes
2 answers

Fast way of doing k means clustering on binary vectors in c++

I want to cluster binary vectors (millions of them) into k clusters.I am using hamming distance for finding the nearest neighbors to initial clusters (which is very slow as well). I think K-means clustering does not really fit here. The problem is…
NightFox
  • 125
  • 1
  • 10
4
votes
4 answers

The hunt for the fastest Hamming Distance C implementation

I want to find how many different characters two strings of equal length have. I have found that xoring algorithms are considered to be the fastest, but they return distance expressed in bits. I want the results expressed in characters. Suppose that…
pdrak
  • 404
  • 1
  • 4
  • 12
3
votes
2 answers

What is the most efficient way to identify text similarity between items in large lists of strings in Python?

The following piece of code achieves the results I'm trying to achieve. There is a list of strings called 'lemmas' that contains the accepted forms of a specific class of words. The other list, called 'forms' contains a lot of spelling variations of…
jfontana
  • 141
  • 1
  • 6