
I'm having trouble saving my data to Redis from Python, using just redis and r.ft().

The data to upload looks like this: id is the data index, and the embeddings are flattened to the same shape for all rows (e.g. 1024):

id  embeddings
0   [3.1515, 4.5562, ..., ]
1   [3, 8.62, ..., ]

I also want to be able to refresh the embeddings to different values under the same ids.

After uploading to Redis, I want to search it with a batch of embeddings: if the input batch has shape [3, 1024], the search should iterate over the batch and return [3, top-k] ids of the most similar embeddings stored in Redis.

It is really hard for me to get this right, so I'm hoping for some help.

ddrong

1 Answer


A few helpful links first: This notebook has some helpful examples, here are the RediSearch docs for using vector similarity, and lastly, here's an example app where it all comes together.

To store a numpy array as a vector field in Redis, you need to first create a search index with a VectorField in the schema:

import numpy as np
import redis

from redis.commands.search.indexDefinition import (
    IndexDefinition,
    IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
    TextField,
    VectorField
)

# connect
r = redis.Redis(...)

# define vector field
fields = [VectorField("vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": 1024,  # 1024 dimensions
        "DISTANCE_METRIC": "COSINE",
        "INITIAL_CAP": 10000, # approx initial count of docs in the index
    }
)]

# create search index (INDEX_NAME is any name you choose)
INDEX_NAME = "embeddings"
r.ft(INDEX_NAME).create_index(
    fields=fields,
    definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH)
)

After you have an index, you can write data to Redis using hset and a pipeline. Vectors in Redis are stored as byte strings (see tobytes() below):

# random vectors
vectors = np.random.rand(10000, 1024).astype(np.float32)

pipe = r.pipeline(transaction=False)
for id_, vector in enumerate(vectors):
    # hset takes the key as its first positional argument
    pipe.hset(f"doc:{id_}", mapping={"id": id_, "vector": vector.tobytes()})
    if id_ % 100 == 0:
        pipe.execute()  # write batch
pipe.execute()  # flush the remainder
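To refresh the embedding for an existing id with new values, you don't need anything extra: HSET overwrites the previous field value for the same key, and the search index picks up the change. A minimal sketch, where the refresh_embedding helper is my own name for it, not part of redis-py:

```python
import numpy as np

def refresh_embedding(client, doc_id, vector):
    """Overwrite the stored embedding for an existing id.

    HSET replaces the field's previous value, so writing the same
    key ("doc:{id}") again with a new vector refreshes it in place.
    """
    client.hset(
        f"doc:{doc_id}",
        mapping={"id": doc_id, "vector": vector.astype(np.float32).tobytes()},
    )
```

Calling this with the same doc_id and a new vector updates the document without recreating the index.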

Out of the box, you can use a pipeline call to query Redis multiple times with one API call:

base_query = "*=>[KNN 5 @vector $vector AS vector_score]"
query = (
    Query(base_query)
    .sort_by("vector_score")
    .paging(0, 5)
    .dialect(2)
)
query_vectors = np.random.rand(3, 1024).astype(np.float32)

# pipeline calls to redis
pipe = r.pipeline(transaction=False)
for vector in query_vectors:
    pipe.ft(INDEX_NAME).search(query, {"vector": vector.tobytes()})
res = pipe.execute()

Then you will need to unpack the res object, which contains the raw responses for all three queries. Hope this helps.