Questions tagged [locality-sensitive-hash]

Locality-sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. The basic idea is to hash the input items so that similar items are mapped to the same buckets with high probability (the number of buckets being much smaller than the universe of possible input items).
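
To make the bucket idea concrete, here is a minimal sketch using random hyperplanes (sign random projections), one common LSH family for cosine similarity. The tag description does not name a specific scheme, so this choice is an assumption, and the helper names (`make_hyperplanes`, `signature`, `build_buckets`) and the toy vectors are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Return one bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, vec)) >= 0.0)
        for plane in hyperplanes
    )


def build_buckets(items, hyperplanes):
    """Group item names by signature; each distinct signature is one bucket."""
    buckets = defaultdict(list)
    for name, vec in items.items():
        buckets[signature(vec, hyperplanes)].append(name)
    return buckets


# Toy data (hypothetical): "a_scaled" points in the same direction as "a",
# while "a_opposite" points in exactly the opposite direction.
a = [1.0, 0.9, 0.1]
items = {
    "a": a,
    "a_scaled": [2.0 * x for x in a],
    "a_opposite": [-x for x in a],
}

planes = make_hyperplanes(num_bits=4, dim=3)
buckets = build_buckets(items, planes)

for sig, names in buckets.items():
    print(sig, names)
```

Vectors separated by a small angle agree on most bits, so they tend to share a bucket, while with 4 bits there are at most 16 buckets regardless of how many items are hashed. Practical systems usually build several such tables to trade recall against the number of candidates examined.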

97 questions
0
votes
1 answer

Cannot find the rows using sorting, writing after LSH

I used LSH after the ALS algorithm in PySpark, and everything seemed to work fine until I noticed during exploration that some rows had gone missing. Everything was implemented following the Spark LSH documentation example…
0
votes
1 answer

Matlab: reshape 4-d matrix to 2-d and maintain order, how to?

I'm trying to implement vlsh with the California ND Dataset, which is composed of 701 photos. 10 subjects wrote down in a txt file which photos are near duplicates for them, and we also have a correlation matrix. The images are RGB and I reduced them in…
0
votes
0 answers

how to get the output for LSH ANN by karlhigley/spark-neighbors using euclidean or cosine distance

I am working with ANN using Locality Sensitive Hashing from https://github.com/karlhigley/spark-neighbors. I tried different distances: Hamming, Euclidean and cosine; but when I try to see the results by calling collect(), this only…
0
votes
1 answer

Deep learning model to find similar images (locality sensitive hashing)

There are different pictures of the same object. The pictures were taken from different angles, so while the object in the picture is the same, the pictures themselves could be quite different. Is there an example or ready-to-use deep learning model that will…
0
votes
1 answer

Cannot get faster results via yarn when running spark in a hadoop cluster

Applying an LSH algorithm in Spark 1.4 (https://github.com/soundcloud/cosine-lsh-join-spark/tree/master/src/main/scala/com/soundcloud/lsh), I process a text file (4GB) in LIBSVM format (https://www.csie.ntu.edu.tw/~cjlin/libsvm/) to find…
mlee_jordan
0
votes
0 answers

Unique identifier generation despite the presence of near duplicates

I have an “entity resolution” type of use case, where I have several (< 100) device features available for many (a few million) devices. My goal is to generate IDs for these devices. The challenge is that the same device might have two or more…
0
votes
0 answers

Similar images: Bag of Features / Visual Word or matching descriptors?

I have an application where, given a reasonable number of images (let's say 20K) and a query image, I want to find the most similar one. A reasonable approximation is feasible. In order to guarantee precision in representing each image, I'm using…
0
votes
1 answer

Non-empty buckets in LSH

I'm reading this survey about LSH; in particular, I'm quoting the last paragraph of section 2.2.1: To improve the recall, L hash tables are constructed, and the items lying in the L (L′, L′ < L) hash buckets h_1(q), ..., h_L(q) are retrieved…
justHelloWorld
0
votes
1 answer

Bag of Features / Visual Words + Locality Sensitive Hashing

PREMISE: I'm really new to Computer Vision/Image Processing and Machine Learning (luckily, I'm more of an expert in Information Retrieval), so please be kind with this filthy peasant! :D MY APPLICATION: We have a mobile application where the user takes a…
0
votes
0 answers

Global vector descriptor

Usually, algorithms such as SIFT, SURF and many others provide a set of k keypoints and the associated descriptors in d dimensions (for example, in SIFT each descriptor has d=128 dimensions). So, in order to describe an image we need a k×d matrix (k…
justHelloWorld
0
votes
1 answer

Image processing - What are kernel space, functions and data?

I'm reading Kernelized Locality-Sensitive Hashing, which is obviously based on the concept of a kernel applied to space, functions and data. I'm not comfortable with this concept in either math or image processing (since it's not my domain, sorry if I'm…
justHelloWorld
0
votes
1 answer

Finding similar products using LSH on structured data

I am trying to find similar products using LSH and I have the following question. My data has the following schema: id: long, title: string, description: string, category: string, price: double, inventory_count: int, active: boolean, date_added:…
0
votes
0 answers

How to detect questions similarity using Locality Sensitive Hashing?

We are trying to implement question similarity detection using Locality Sensitive Hashing. We are using the lshash Python package. Our objective is to achieve something similar to how question suggestions work on Stack Overflow. Following is our sample data…
0
votes
1 answer

How to take random projections in LSH when there are both Numerical and Categorical Data?

Note: Using LSH for a nearest neighbor query. Assume the data set has 5 features (f1, f2, ..., f5), where the first 2 are numerical and the other 3 are categorical. One or more of these categories may be something like username or subject, which would be quite…
0
votes
2 answers

Using opencv, how do you maintain the same number of BRIEF vectors per image

I have a dataset of images to which I have applied the BRIEF method. I used the following tutorial: http://docs.opencv.org/trunk/doc/py_tutorials/py_feature2d/py_brief/py_brief.html Currently the sizes of the matrices vary quite dramatically. I am…