For my bachelor's thesis I am trying to send machine data (in this case historical data replayed by a Python script) to Kafka over an HTTP connection. I am using the Confluent Platform running in Docker on a Windows system.

The Python script posts the data to the REST Proxy. At first I got error responses concerning the data type, which I was able to resolve.

import pandas as pd
import csv, os, json, requests, time, datetime, copy, sys

if len(sys.argv) > 1:
    bgrfc_value = str(sys.argv[1])
else:
    print("No arguments for bgrfc given, defaulting to 'false'")
    bgrfc_value = 'false'

if len(sys.argv) > 2:
    filePath = str(sys.argv[2])
else:
    filePath = "path"


if len(sys.argv) > 3:
    batchSize = int(float(str(sys.argv[3])))
else:
    batchSize = 10


# Build skeleton JSON
basejson = {"message": {"meta" : "", "data": ""}}
#metajson = [{'meta_key' : 'sender', 'meta_value': 'OPCR'},
#           {'meta_key' : 'receiver', 'meta_value': 'CAT'},
#            {'meta_key' : 'message_type', 'meta_value': 'MA1SEK'},
#            {'meta_key' : 'bgrfc', 'meta_value': bgrfc_value}]
#basejson['message']['meta'] = metajson
url = "http://127.0.0.1:8082/"
headers = {'Content-Type':'application/json','Accept':'application/json'}

def assign_timestamps(batch):
    newtimestamps = []
    oldtimestamps = []

    # Batch timestamps to list, add 10 newly generated timestamps to a list
    for item in batch['tag_tsp'].values.tolist():
        newtimestamps.append(datetime.datetime.now())
        oldtimestamps.append(datetime.datetime.strptime(str(item), "%Y%m%d%H%M%S.%f"))

    # Sort old timestamps without sorting the original array to preserve variance
    temp = copy.deepcopy(oldtimestamps)
    temp.sort()
    mrtimestamp = temp[0]

    # Replicate variance of old timestamps into the new timestamps
    for x in range(len(oldtimestamps)):  # the last batch may hold fewer than batchSize rows
        diff = mrtimestamp - oldtimestamps[x]
        newtimestamps[x] = newtimestamps[x] - diff
        newtimestamps[x] = newtimestamps[x].strftime("%Y%m%d%H%M%S.%f")[:-3]

    # Switch old timestamps with new timestamps
    batch['tag_tsp'] = newtimestamps
    return batch

# Build and send JSON, wait for a sec
def build_json(batch):
    assign_timestamps(batch)
    batchlist = []
    for index, row in batch.iterrows():
        batchlist.append(row.to_dict())

    basejson['message']['data'] = batchlist
    print(basejson)
    req = requests.post(url, json = json.loads(json.dumps(basejson)), headers = headers)
    print(req.status_code)
    time.sleep(1)

while(True):
    df = pd.read_csv(filePath, sep=";", parse_dates=[2], decimal=",", usecols = ['SENSOR_ID', 'KEP_UTC_TIME', 'VALUE'], dtype={'SENSOR_ID': object})
    df = df[::-1]
    df.rename(columns={'SENSOR_ID' : 'ext_id', 'KEP_UTC_TIME' : 'tag_tsp', 'VALUE' : 'tag_value_int'}, inplace=True)

    # Fill list with batches of 10 rows from the df
    list_df = [df[ i:i + batchSize] for i in range(0, df.shape[0], batchSize)]

    for batch in list_df:
        build_json(batch)

The script sends the data but as a response I get status code 500.

OneCricketeer
LukasM
  • Do you know that there is a Python client library for Kafka? – Robin Moffatt Feb 08 '19 at 11:32
  • I have heard there are Python libraries for Kafka, but I have not considered them as a solution yet, because the task definition for my thesis requires that the data arrives over an HTTP connection. As far as I understand, a Python library for Kafka will not help me with this. If I am wrong, please let me know. – LukasM Feb 08 '19 at 13:43
  • The Python libraries will use the native Kafka protocol, which is more efficient than HTTP. If you *have* to use HTTP then stick with REST proxy. – Robin Moffatt Feb 08 '19 at 13:47

2 Answers

Your header values are not correct. You need to set the Accept and Content-Type headers as follows:

 Accept: application/vnd.kafka.v2+json
 Content-Type: application/vnd.kafka.json.v2+json

Also, the data should be structured in the following way:

{"records":[{"value":{<Put your json record here>}}]}

For example:

{"records":[{"value":{"foo":"bar"}}]}
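In Python, the headers and envelope above could be combined roughly as follows. This is a minimal sketch, not a tested fix for the question's setup: the topic name `machine-data` and the helper names `build_payload`, `topic_url` and `send_rows` are made up for illustration, and it assumes the REST Proxy v2 produce endpoint `POST /topics/<topic_name>` on the host and port used in the question.

```python
import json

# Assumptions for illustration: the REST Proxy listens on 127.0.0.1:8082 (as in
# the question) and "machine-data" is a hypothetical topic name.
REST_PROXY_URL = "http://127.0.0.1:8082"
TOPIC = "machine-data"

HEADERS = {
    "Content-Type": "application/vnd.kafka.json.v2+json",
    "Accept": "application/vnd.kafka.v2+json",
}


def build_payload(rows):
    """Wrap each JSON-serializable row in the {"records": [{"value": ...}]} envelope."""
    return {"records": [{"value": row} for row in rows]}


def topic_url(topic=TOPIC):
    """Build the produce URL; records are posted to /topics/<topic_name>."""
    return f"{REST_PROXY_URL}/topics/{topic}"


def send_rows(rows, topic=TOPIC):
    """POST the rows and return (status_code, response_text) for inspection."""
    import requests  # third-party; imported here so the helpers above work without it

    response = requests.post(
        topic_url(topic),
        data=json.dumps(build_payload(rows)),
        headers=HEADERS,
        timeout=10,
    )
    return response.status_code, response.text
```

Calling `send_rows([{"foo": "bar"}])` against a running proxy returns the status code together with the response body. Printing the body, not only the status code, usually shows the proxy's own error message, which helps when a 500 comes back.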
Nishu Tayal
  • Thank you for your response. I have changed the headers and am trying to send an example message using a different script. The console output is `{'records': [{'key': 'somekey', 'value': {'foo': 'bar'}}, {'value': ['foo', 'bar'], 'partition': 1}, {'value': 53.5}]} 500`. The message is copied and pasted from Confluent's documentation. Any idea why error 500 still comes up? – LukasM Feb 11 '19 at 10:33
  • @LukasM 500 means `Internal Server Error`: something is wrong on the server side, not in your client. You need to check the logs of your REST Proxy. – SteveGr2015 Feb 18 '19 at 14:51

I believe that the data you put into "value" must be a string. Something like this will work:

{"records":[{"value":"{'foo':'bar'}"}]}

If you get a garbled message when you read from your topic, try encoding your message with base64. Your original JSON string, once encoded, should look like this:

{"records":[{"value":"eyJmb28iOiJiYXIifQ=="}]}
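The encoded value above can be reproduced with Python's standard library. This is a minimal sketch; `encode_value` and `decode_value` are hypothetical helper names. As far as I know, base64 values belong to the REST Proxy's binary embedded format (`Content-Type: application/vnd.kafka.binary.v2+json`), not to the JSON format, so treat the choice of header as an assumption to check against your proxy version.

```python
import base64
import json


def encode_value(record):
    """Serialize a record to compact JSON and return it as a base64 string."""
    compact = json.dumps(record, separators=(",", ":"))
    return base64.b64encode(compact.encode("utf-8")).decode("ascii")


def decode_value(encoded):
    """Reverse encode_value: base64 string back to the original record."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Build the request body with the encoded record as the value.
payload = {"records": [{"value": encode_value({"foo": "bar"})}]}
print(json.dumps(payload))
```

The compact separators matter here: with the default `json.dumps` spacing the JSON text contains extra spaces, so the base64 string would differ from the one shown in the answer.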
Averell