
I use influxdb-python to insert a large amount of data read from a Redis stream. The stream is capped with maxlen=600 and data arrives every 100 ms, but I need to retain all of it, so I read the stream and transfer the entries to InfluxDB (I don't know whether there is a better database for this). However, when I use batch inserts, only ⌈count/batch_size⌉ points are actually stored; the data within each batch_size appears to be overwritten. Here is the code:

import redis
from apscheduler.schedulers.blocking import BlockingScheduler
import time
import datetime

import os
import struct
from influxdb import InfluxDBClient

def parse(datas):
    ts, data = datas
    w_json = {
        "measurement": 'sensor1',
        "fields": {
            "Value": data[b'Value'].decode('utf-8'),
            "Count": data[b'Count'].decode('utf-8')
        }
    }
    return w_json

def archived_data(rs,client):
    results= rs.xreadgroup('group1', 'test', {'test1': ">"}, count=600)
    if(len(results)!=0):
        print("len(results[0][1]) = ",len(results[0][1]))
        datas = list(map(parse,results[0][1]))
        client.write_points(datas,batch_size=300)
        print('insert success')
    else:
        print("No new data is generated")

if __name__=="__main__":
    try:
        rs = redis.Redis(host="localhost", port=6379, db=0)
        rs.xgroup_destroy("test1", "group1")
        rs.xgroup_create('test1','group1','0-0')
    except Exception as e:
        print("error = ",e)
    try:
        client = InfluxDBClient(host="localhost", port=8086,database='test')
    except Exception as e:
        print("error = ", e)
    try:
        sched = BlockingScheduler()
        sched.add_job(archived_data, 'interval', seconds=60, args=[rs, client])
        sched.start()
    except Exception as e:
        print(e)

The InfluxDB data changes as follows:

> select count(*) from sensor1;
name: sensor1
time count_Count count_Value
---- ----------- -----------
0    6           6
> select count(*) from sensor1;
name: sensor1
time count_Count count_Value
---- ----------- -----------
0    8           8

> select Count from sensor1;
name: sensor1
time                Count
----                -----
1594099736722564482 00000310
1594099737463373188 00000610
1594099795941527728 00000910
1594099796752396784 00001193
1594099854366369551 00001493
1594099855120826270 00001777
1594099913596094653 00002077
1594099914196135122 00002361

Why does the data appear to be overwritten, and how can I insert all of the data in one go? I would appreciate any help.

moluzhui

1 Answer


Can you provide more details on the structure of the data you wish to store in InfluxDB? In the meantime, I hope the information below helps.

In InfluxDB, the combination of timestamp + tags must be unique (i.e. two data points with the same tag values and the same timestamp cannot coexist). Unlike SQL, InfluxDB does not throw a unique-constraint violation; it silently overwrites the existing point with the incoming one. Your data has no tags, so any incoming point whose timestamp is already present in InfluxDB overrides the existing point.
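As a sketch of one way to fix this (based on the question's `parse` function; the nanosecond conversion and the matching `time_precision='n'` argument to `write_points` are assumptions about the precision you want), you can derive a unique per-point timestamp from the Redis stream entry ID instead of letting InfluxDB assign the same server time to every point in a batch:

```python
def parse(entry):
    # A Redis stream entry is (entry_id, data), where the ID looks like
    # b'1594099736722-0', i.e. "<milliseconds>-<sequence>".
    entry_id, data = entry
    ms, seq = entry_id.decode('utf-8').split('-')
    # Fold the sequence number into a nanosecond timestamp so two entries
    # from the same millisecond stay distinct in InfluxDB.
    ts_ns = int(ms) * 1_000_000 + int(seq)
    return {
        "measurement": 'sensor1',
        "time": ts_ns,  # pass time_precision='n' to client.write_points
        "fields": {
            "Value": data[b'Value'].decode('utf-8'),
            "Count": data[b'Count'].decode('utf-8')
        }
    }
```

With an explicit `"time"` on every point, two points can only collide if they come from the same stream entry, so a full batch of 600 entries should produce 600 stored points.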

Tamil Selvan V
  • Thank you for your help. I did use a tag originally, but I deleted it because I ran into the following error: `"error":"partial write: max-values-per-tag limit exceeded (100128/100000)"` – moluzhui Jul 07 '20 at 09:01
  • Tags are indexed, so the cardinality of tags should not be very high. But for the data points to remain distinct there must be some other distinguishing parameter, right? You will need to add that as a tag. If you can post some sample data, I can help with choosing the tags so that the data will not get overwritten – Tamil Selvan V Jul 07 '20 at 10:41
  • I transfer the Redis stream data to InfluxDB. `Ts` is a string representing the time when the entry was inserted into Redis. `Chans` is an array of ints representing the channel sequence numbers; `Type` is a string representing the data type; `Shape` is an integer list of length 2 representing the shape information of the matrix; `Units` is an array of strings representing the channels' unit information; `Names` is a list of strings representing the channel names; `Value` is a list of 3072 floating-point numbers representing the data collected by the sensor. – moluzhui Jul 07 '20 at 11:51
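Given the schema described in that last comment, one possible point layout is sketched below. The function name and field naming (`v0`, `v1`, …) are hypothetical; the key assumption is that low-cardinality values such as the channel name, unit, and data type are safe as tags (they won't trip the max-values-per-tag limit), while the 3072 float samples belong in fields:

```python
def to_point(ts_ns, chan_name, unit, dtype, values):
    """Build one InfluxDB point per channel.

    Assumes chan_name / unit / dtype have low cardinality (safe as tags);
    the floating-point samples go into fields, which are not indexed.
    """
    return {
        "measurement": "sensor_data",
        "time": ts_ns,  # unique timestamp, e.g. derived from the stream ID
        "tags": {"channel": chan_name, "unit": unit, "type": dtype},
        # Number the samples so each one lands in its own field key.
        "fields": {f"v{i}": float(v) for i, v in enumerate(values)},
    }
```

Because the tag set (channel, unit, type) now differs between channels, two points written with the same timestamp no longer overwrite each other.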