I would like to iterate across a KV (NoSql, Redis) target in MLRun without specifying the primary key(s), but I have only found a way to get specific KV items by their keys. I need only unique items, with no duplicates.

I chose the NoSql/Redis target because in-memory key operations are critical for quick responses (lookups by a specific key), but from time to time I have to iterate across the whole collection of keys, and that is the case this question is about.

Here is the part of my code that gets values by key:

import mlrun 
import mlrun.feature_store as fs
...

svc=fs.get_online_feature_service(fs.FeatureVector("myVector", ["client.*"]))
resp = svc.get([{"cuid": 1926, "key1": 100023}])

Do you know how to iterate across items in a KV (NoSql, Redis) target in the MLRun CE/Paid editions (version 1.2.1)?

JIST
JzD

2 Answers

The NoSqlTarget uses a KV store, which is inherently designed for retrieving an individual value based on a given key. If you are looking to retrieve the entirety of your feature vector, you should use the offline store instead. More information is available in the MLRun docs.

Alternatively, you could use the offline store to get a list of keys and then query the online store using those:

import mlrun.feature_store as fstore

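# Read the offline store and collect the unique entity keys from its index
df = fstore.get_offline_features("my-vec").to_dataframe()
keys = list(df.index.unique())

# Query the online store with those keys (first five shown here)
svc = fstore.get_online_feature_service("my-vec")
svc.get([{"my-key" : key} for key in keys[:5]])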
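When the key list is long, the lookup above can be done in batches rather than in one request. The following is only a sketch: `iter_online_features`, `key_name` and `batch_size` are hypothetical names introduced for illustration, and it assumes the lookup callable (for example `svc.get`) takes a list of entity rows and returns a list of results.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def iter_online_features(
    get_rows: Callable[[List[Dict[str, Any]]], List[Any]],
    keys: Iterable[Any],
    key_name: str,
    batch_size: int = 100,
) -> Iterator[Any]:
    """Yield online feature rows for every unique key, fetched in batches.

    get_rows is assumed to behave like svc.get: it receives a list of
    entity rows and returns one result per row.
    """
    # dict.fromkeys drops duplicate keys while keeping first-seen order
    unique_keys = list(dict.fromkeys(keys))
    for start in range(0, len(unique_keys), batch_size):
        batch = unique_keys[start:start + batch_size]
        # One call per batch, e.g. [{"my-key": 1}, {"my-key": 2}, ...]
        yield from get_rows([{key_name: key} for key in batch])
```

It could be used as `for row in iter_online_features(svc.get, keys, "my-key"): ...`, with the batch size tuned to whatever the online service handles comfortably.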
Nick Schenone
  • I have to use unique items (items are overwritten), so an offline store such as Parquet is not useful. – JzD Apr 07 '23 at 19:31
  • The ParquetTarget includes all records that have been ingested. This includes all versions of the items that you are writing. You can pass in a given time range to specify what you are looking for. Alternatively, you can use the offline store to get your list of keys to iterate over. – Nick Schenone Apr 07 '23 at 19:48
  • Is it possible to iterate across the NoSqlTarget as well? – JzD Apr 07 '23 at 19:51
  • No, the KV source allows you to get one value for a given key. You need to provide the key for an efficient lookup. If you want a list of all the keys, you can use the ParquetTarget and iterate over that. I'm not entirely sure what you're trying to do; there is not much to go on in your question. If you provide a little more information on your use case, there may be a better solution. – Nick Schenone Apr 07 '23 at 19:59
  • I updated original question for better clarification. – JzD Apr 07 '23 at 20:13
  • My recommendation is the same: ```python import mlrun.feature_store as fstore df = fstore.get_offline_features("my-vec").to_dataframe() keys = list(df.index.unique()) svc = fstore.get_online_feature_service("my-vec") svc.get([{"my-key" : key} for key in keys[:5]]) ``` – Nick Schenone Apr 07 '23 at 20:24
  • It is a little overkill for me to keep the whole history in Parquet, but thanks for this useful solution. Could you please add your solution directly to the answer? If I understand correctly, there is no chance to iterate via the NoSqlTarget (hence Parquet). – JzD Apr 07 '23 at 20:30

I am using these solutions to iterate across keys in KV storage:

1. NoSqlTarget

Iteration via v3io (it is not strictly part of the MLRun API, but it ships with the MLRun distribution packages). For more information, see the v3io Python SDK documentation; for iterating across KV items with a cursor, see this sample:

import v3io.dataplane

v3io_client = v3io.dataplane.Client(endpoint='https://v3io-webapi:8081', access_key='some_access_key')

# create a query, and use an items cursor to iterate the results
items_cursor = v3io_client.kv.new_cursor(container='users',
                                         table_path='/user-profile',
                                         attribute_names=['income'],
                                         filter_expression='income > 150')

# print the output
for item in items_cursor.all():
    print(item)

BTW: NoSqlTarget is available only for MLRun Enterprise edition

2. RedisTarget

You can easily iterate across KV items, since this is part of the Redis API:

import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
# iterate over every key in the database
for key in r.keys('*'):
    # WARNING: this example deletes each key it visits
    r.delete(key)
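The loop above relies on KEYS, which walks the whole keyspace in a single blocking call. A cursor-based alternative is redis-py's `scan_iter`, which wraps the SCAN command. Since SCAN may return the same key more than once, the sketch below also filters duplicates. `iter_unique_keys` is a hypothetical helper name, and the connection settings in the usage comment are assumptions copied from the example above.

```python
from typing import Any, Iterator


def iter_unique_keys(client: Any, pattern: str = "*", count: int = 1000) -> Iterator[Any]:
    """Yield each key matching pattern exactly once, using incremental SCAN.

    client is expected to be a redis.Redis instance (or any object that
    exposes a compatible scan_iter method).
    """
    seen = set()
    # scan_iter pages through the keyspace instead of blocking like KEYS;
    # count is only a hint for how many keys the server examines per page
    for key in client.scan_iter(match=pattern, count=count):
        if key not in seen:
            seen.add(key)
            yield key


# Usage sketch (assumes a Redis server on localhost:6379):
#   import redis
#   r = redis.Redis(host="localhost", port=6379, db=0)
#   for key in iter_unique_keys(r, "users*"):
#       print(key)
```

The `seen` set keeps every key in memory, so for a very large keyspace it may be preferable to tolerate the occasional duplicate instead.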

You can also do this from the command line with redis-cli, for example:

redis-cli keys 'users*'

or delete the keys that match a pattern:

redis-cli keys 'users*' | xargs redis-cli del

BTW: RedisTarget is available for MLRun CE and Enterprise editions

JIST