
Versions: Druid 0.10.1 from HDP-2.6.5.0. We are using the Druid Kafka indexing service to ingest data into Druid from Kafka topics. During this we have found that metric values of 0 or 0.0 are being stored as null, and when we retrieve them through Superset or the Druid API the response contains null. Need advice on whether we are missing anything here.

Error from Superset:

{"status": "failed", "error_type": "warning", "error": "unsupported operand type(s) for +: 'int' and 'NoneType'"}

Ingestion spec file below:

{
    "type": "kafka",
    "dataSchema": {
        "dataSource": "data-source",
        "parser": {
            "type": "string",
            "parseSpec": {
                "format": "json",
                "timestampSpec": {
                    "column": "datetime",
                    "format": "YYYYMMdd_HHmmss"
                },
                "columns": [
                    "created_date",
                    "s_type",
                    "datetime",
                    "ds_ser",
                    "ven",
                    "cou_name",
                    "c_name",
                    "d_name",
                    "dv_name",
                    "p_name",
                    "redTime",
                    "wrTime",
                    "tRate",
                    "MTRate"
                ],
                "dimensionsSpec": {
                    "dimensions": [
                        "created_date",
                    "s_type",
                    "datetime",
                    "ds_ser",
                    "ven",
                    "cou_name",
                    "c_name",
                    "d_name",
                    "dv_name",
                    "p_name",
                    ]
                }
            }
        },
        "metricsSpec": [{
            "name": "count",
            "type": "count"
        },
            {
                "type": "doubleMax",
                "name": "redTime",
                "fieldName": "redTime"
            },
            {
                "type": "doubleMax",
                "name": "wrTime",
                "fieldName": "wrTime"
            },
            {
                "type": "longMax",
                "name": "tRate",
                "fieldName": "tRate"
            },
            {
                "type": "longMax",
                "name": "MTRate",
                "fieldName": "MTRate"
            }
        ],
        "granularitySpec": {
            "type": "uniform",
            "segmentGranularity": "HOUR",
            "queryGranularity": "NONE"
        }
    },
    "tuningConfig": {
        "type": "kafka",
        "maxRowsPerSegment": 5000000
    },
    "ioConfig": {
        "topic": "ptopic",
        "useEarliestOffset": "true",
        "consumerProperties": {
            "bootstrap.servers": "host:port"
        },
        "taskCount": 1,
        "replicas": 1,
        "taskDuration": "PT5M"
    }
}

Druid REST API endpoint used: http://host:port/druid/v2?pretty

Body:

{
    "queryType": "groupBy",
    "dataSource": "data-source",
    "granularity": "all",
    "dimensions": ["ds_ser"],
    "aggregations": [
        {"type": "doubleMax", "name": "redTime", "redTime": "writeresponsetime"},
        {"type": "doubleMax", "name": "wrTime", "wrTime": "totalResponseTime"},
        {"type": "longMax", "name": "tRate", "fieldName": "tRate"},
        {"type": "longMax", "name": "MTRate", "MTRate": "MaxTransferRate"}
        
    ],
    "intervals": ["2019-01-02T00:00/2019-01-02T23:59"]
}
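
For reference, a minimal sketch of posting this body to the endpoint above with Python requests (host and port remain placeholders from the question):

import json

import requests

# groupBy query body from above, posted to the Druid broker endpoint.
query = {
    "queryType": "groupBy",
    "dataSource": "data-source",
    "granularity": "all",
    "dimensions": ["ds_ser"],
    "aggregations": [
        {"type": "doubleMax", "name": "redTime", "fieldName": "redTime"},
        {"type": "doubleMax", "name": "wrTime", "fieldName": "wrTime"},
        {"type": "longMax", "name": "tRate", "fieldName": "tRate"},
        {"type": "longMax", "name": "MTRate", "fieldName": "MTRate"},
    ],
    "intervals": ["2019-01-02T00:00/2019-01-02T23:59"],
}

response = requests.post(
    "http://host:port/druid/v2?pretty",
    headers={"Content-Type": "application/json"},
    data=json.dumps(query),
)
print(json.dumps(response.json(), indent=2))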

Response from Druid:

[
    {
        "version": "v1",
        "timestamp": "2019-01-02T00:00:00.000Z",
        "event": {
            "redTime": null,
            "ds_ser": "240163",
            "wrTime": null,
            "tRate": null,
            "MTRate": null
        }
    },
    {
        "version": "v1",
        "timestamp": "2019-01-02T00:00:00.000Z",
        "event": {
            "redTime": null,
            "ds_ser": "443548",
            "wrTime": null,
            "tRate": 0,
            "MTRate": null
        }
    }
]

Data in Kafka:

> {"created_date":"2019-02-03T18:35:59.514Z","s_type":"BLOCK","datetime":"20181121_070000","ds_ser":"443551","ven":"abc","cou_name":"USA","c_name":"Piscataway","d_name":"Piscataway","dv_name":"USPSCG","p_name":"443551-CK","redTime":0.0,"wrTime":0.0,"tRate":0,"MTRate":0}
> {"created_date":"2019-02-03T18:35:59.514Z","s_type":"BLOCK","datetime":"20181121_070000","ds_ser":"443551","ven":"abc","cou_name":"USA","c_name":"Piscataway","d_name":"Piscataway","dv_name":"USPSCG4","p_name":"443551-CF","redTime":0.0,"wrTime":0.0,"tRate":0,"MTRate":0}
  • In your second object tRate returns 0, which means it is storing numbers correctly. Does some of your data have null for these fields? – Jainik Feb 08 '19 at 07:20
  • @Jainik Thanks for replying. I double-checked the data in Kafka but could not find any nulls. I have attached sample data from Kafka. Moreover, we are explicitly replacing blanks & nulls with 0, then casting the values as double and long. This looks like inconsistent behavior from Druid. – Imran Feb 12 '19 at 11:24
  • If your data does not have nulls and you are still seeing the issue, then try a `Javascript Aggregator`. – Jainik Feb 12 '19 at 18:54
  • Thanks @Jainik. I got this response from the Druid community: https://groups.google.com/forum/#!msg/druid-user/MRHjHiaQ8Do/9PYoUtb0CgAJ – Imran Feb 13 '19 at 07:37
  • According to that, in the current version "Long/Float columns Nulls are considered equivalent to 0." That means you need to treat null in your response as 0. – Jainik Feb 13 '19 at 08:10
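
Following the last comment, a minimal sketch of that client-side workaround, treating null metric values in the Druid response as 0 (the sample row mirrors the response shown earlier):

# Coalesce null metric values to 0 when post-processing the Druid response,
# per the note that nulls in Long/Float columns are equivalent to 0.
metric_names = ["redTime", "wrTime", "tRate", "MTRate"]

def coalesce_nulls(rows):
    for row in rows:
        event = row["event"]
        for name in metric_names:
            if event.get(name) is None:
                event[name] = 0
    return rows

druid_response = [
    {"version": "v1", "timestamp": "2019-01-02T00:00:00.000Z",
     "event": {"redTime": None, "ds_ser": "240163", "wrTime": None,
               "tRate": None, "MTRate": None}},
]

print(coalesce_nulls(druid_response))
# [{'version': 'v1', 'timestamp': '2019-01-02T00:00:00.000Z',
#   'event': {'redTime': 0, 'ds_ser': '240163', 'wrTime': 0,
#             'tRate': 0, 'MTRate': 0}}]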

1 Answer


Well, I have found the answer to my own question.

I had made a mistake in preparing the Druid Kafka indexer JSON. I was not aware that the field names are case sensitive. The JSON snippet posted here was a made-up one, hence the field names match, but in my actual production code and JSON file they didn't match, so Druid treated them as new fields and assigned them null values during ingestion. Example below:

Kafka JSON:

{"created_date":"2019-02-03T18:35:59.514Z","s_type":"BLOCK","datetime":"20181121_070000","ds_ser":"443551","ven":"abc","cou_name":"USA","c_name":"Piscataway","d_name":"Piscataway","dv_name":"USPSCG","p_name":"443551-CK","redTime":0.0,"wrTime":0.0,"tRate":0,"MTRate":0}

The Druid indexer JSON columns were like:

"columns": [
                    "created_date",
                    "s_type",
                    "datetime",
                    "ds_ser",
                    "ven",
                    "cou_name",
                    "c_name",
                    "d_name",
                    "dv_name",
                    "p_name",
                    "redTime",
                    "wrtime",
                    "trate",
                    "MTRate"
                ],

If we observe the above, there is a mismatch: wrTime --> wrtime and tRate --> trate. For me this was the root cause; once the names were corrected, Druid started to ingest the proper values.
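
As a sanity check, here is a small sketch (not part of the original fix) that compares the column names in the indexer spec against the keys of a sample Kafka message, so case-sensitive mismatches like these surface before ingestion:

import json

# Sample Kafka message and the indexer spec's column list, as posted above.
kafka_message = json.loads(
    '{"created_date":"2019-02-03T18:35:59.514Z","s_type":"BLOCK",'
    '"datetime":"20181121_070000","ds_ser":"443551","ven":"abc",'
    '"cou_name":"USA","c_name":"Piscataway","d_name":"Piscataway",'
    '"dv_name":"USPSCG","p_name":"443551-CK",'
    '"redTime":0.0,"wrTime":0.0,"tRate":0,"MTRate":0}'
)

spec_columns = [
    "created_date", "s_type", "datetime", "ds_ser", "ven", "cou_name",
    "c_name", "d_name", "dv_name", "p_name",
    "redTime", "wrtime", "trate", "MTRate",  # note the lowercase mismatches
]

# Exact, case-sensitive comparison, which is how Druid matches field names.
missing = [c for c in spec_columns if c not in kafka_message]
print("Spec columns not found in Kafka message:", missing)
# Spec columns not found in Kafka message: ['wrtime', 'trate']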
