
I am trying to use SpoolDirCsvSourceConnector from https://github.com/jcustenborder/kafka-connect-spooldir

I have the following configuration for the connector in Kafka:

connector.class=com.github.jcustenborder.kafka.connect.spooldir.SpoolDirCsvSourceConnector
csv.first.row.as.header=true
finished.path=/csv/finished
tasks.max=1
parser.timestamp.date.formats=[dd.MM.yyyy, yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss]
key.schema={"name":"com.github.jcustenborder.kafka.connect.model.Key","type":"STRUCT","isOptional":false,"fieldSchemas":{}}
csv.separator.char=59
input.file.pattern=umsaetze_.*.csv
topic=test-csv
error.path=/csv/error
input.path=/csv/input
value.schema={"name":"com.github.jcustenborder.kafka.connect.model.Value","type":"STRUCT","isOptional":false,"fieldSchemas":{"Buchungstag":{"name":"org.apache.kafka.connect.data.Timestamp","type":"INT64","version":1,"isOptional":true},"Wertstellung":{"name":"org.apache.kafka.connect.data.Timestamp","type":"INT64","version":1,"isOptional":true},"Vorgang":{"type":"STRING","isOptional":false},"Buchungstext":{"type":"STRING","isOptional":false},"Umsatz":{"name":"org.apache.kafka.connect.data.Decimal","type":"BYTES","version":1,"parameters":{"scale":"2"},"isOptional":true}}}

The value schema is as follows:

{
  "name": "com.github.jcustenborder.kafka.connect.model.Value",
  "type": "STRUCT",
  "isOptional": false,
  "fieldSchemas": {
    "Buchungstag": {
      "name": "org.apache.kafka.connect.data.Date",
      "type": "INT32",
      "version": 1,
      "isOptional": true
    },
    "Wertstellung": {
      "name": "org.apache.kafka.connect.data.Timestamp",
      "type": "INT64",
      "version": 1,
      "isOptional": true
    },
    "Vorgang": {
      "type": "STRING",
      "isOptional": false
    },
    "Buchungstext": {
      "type": "STRING",
      "isOptional": false
    },
    "Umsatz": {
      "name": "org.apache.kafka.connect.data.Decimal",
      "type": "BYTES",
      "version": 1,
      "parameters": {
        "scale": "2"
      },
      "isOptional": true
    }
  }
}
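For context (this is general Kafka Connect behavior, not something specific to this connector): a field whose schema name is `org.apache.kafka.connect.data.Date` with type `INT32` is stored as the number of days since the Unix epoch. A minimal sketch, using only the JDK, of the conversion the connector ultimately has to perform for a value like `08.02.2019`:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateToEpochDays {
    // Parse a German-style date (dd.MM.yyyy) and convert it to the
    // days-since-epoch representation used by Connect's INT32 Date logical type.
    static int toEpochDays(String text) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("dd.MM.yyyy");
        LocalDate date = LocalDate.parse(text, fmt);
        return (int) date.toEpochDay();
    }

    public static void main(String[] args) {
        // 08.02.2019 is 17935 days after 1970-01-01.
        System.out.println(toEpochDays("08.02.2019"));
    }
}
```

The point is that the `dd.MM.yyyy` pattern itself is fine for this data; the parsing only fails if that pattern never actually reaches the connector's date parser.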

I have also tried Date instead of Timestamp:

{
  "name" : "org.apache.kafka.connect.data.Date",
  "type" : "INT32",
  "version" : 1,
  "isOptional" : true
}

Neither Timestamp nor Date works for me; I get the same exception as in the example below for the fields Buchungstag and Wertstellung. I tried to solve it with the option parser.timestamp.date.formats, but it doesn't help.

Here is an example of the CSV I am trying to import into Kafka:

Buchungstag;Wertstellung;Vorgang;Buchungstext;Umsatz;
08.02.2019;08.02.2019;Lastschrift / Belastung;Auftraggeber: BlablaBuchungstext: Fahrschein XXXXXX Ref. U3436346/8423;-55,60;
08.02.2019;08.02.2019;Lastschrift / Belastung;Auftraggeber: Bank AGBuchungstext: 01.02.209:189,34 Ref. ZMKDVSDVS/5620;-189,34;

I am getting the following exception in Kafka Connect:

org.apache.kafka.connect.errors.ConnectException: org.apache.kafka.connect.errors.DataException: Exception thrown while parsing data for 'Buchungstag'. linenumber=2
    at com.github.jcustenborder.kafka.connect.spooldir.AbstractSourceTask.read(AbstractSourceTask.java:277)
    at com.github.jcustenborder.kafka.connect.spooldir.AbstractSourceTask.poll(AbstractSourceTask.java:144)
    ... 10 more
Caused by: org.apache.kafka.connect.errors.DataException: Could not parse '08.02.2019' to 'Date'
    at com.github.jcustenborder.kafka.connect.utils.data.Parser.parseString(Parser.java:113)
    ... 11 more
Caused by: java.lang.IllegalStateException: Could not parse '08.02.2019' to java.util.Date
    at com.google.common.base.Preconditions.checkState(Preconditions.java:588)
    ... 12 more

Do you have any idea what the value schema should be to parse dates like 01.01.2001?

Regfor
  • Are you sure `"type" : "INT32"` is the correct schema type for your dates? Look like you have strings – OneCricketeer Feb 12 '19 at 03:29
  • And for `Exception thrown while parsing data for 'Buchungstag'. linenumber=2` ... Looks like it's trying to parse your header? Which doesn't match your schema – OneCricketeer Feb 12 '19 at 03:32
  • No it is not parsing header. I have option csv.first.row.as.header=true, as well as later in stack trace you see it tries to parse date. To your first comment I was putting there string and it was not working but let me check again, maybe you are right – Regfor Feb 12 '19 at 16:21
  • Well, I've not actually used this particular connector. But I have lots of experience debugging others... If all else fails, you'll need the source code and enable remote debugging – OneCricketeer Feb 13 '19 at 04:36
  • @cricket_007 I have tried putting "type": "STRING" for date. It says schema is not supported. I suppose connector does not support such conversion or does it in a very basic manner – Regfor Feb 13 '19 at 13:24

1 Answer

I think the issue is with your parser.timestamp.date.formats value. You passed [dd.MM.yyyy, yyyy-MM-dd'T'HH:mm:ss, yyyy-MM-dd' 'HH:mm:ss].

In the connector configuration this property (parser.timestamp.date.formats) is of List type. A List should be passed as a single string with a comma delimiter (,). In your case it should be: dd.MM.yyyy,yyyy-MM-dd'T'HH:mm:ss,yyyy-MM-dd' 'HH:mm:ss. The square brackets end up inside the first and last patterns, and whitespace around the commas might also cause problems if it is not trimmed.
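To illustrate (a sketch, not the connector's or Kafka's actual code — I'm assuming the List value is split on commas with surrounding whitespace trimmed, roughly as below): with the bracketed value, the first "format" that reaches the date parser still contains the [ character, so it can never match 08.02.2019, while the plain comma-delimited value works:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.List;

public class ListConfigSketch {
    // Rough approximation of how a List-typed config value is split:
    // on commas, trimming surrounding whitespace. Brackets are NOT stripped.
    static List<String> splitListValue(String value) {
        return Arrays.asList(value.trim().split("\\s*,\\s*"));
    }

    // Check whether a SimpleDateFormat pattern can parse the given text.
    static boolean canParse(String pattern, String text) {
        try {
            new SimpleDateFormat(pattern).parse(text);
            return true;
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The bracketed value from the question: '[' stays part of the pattern.
        List<String> broken = splitListValue("[dd.MM.yyyy, yyyy-MM-dd'T'HH:mm:ss]");
        System.out.println(broken.get(0));                         // [dd.MM.yyyy
        System.out.println(canParse(broken.get(0), "08.02.2019")); // false

        // The corrected comma-delimited value.
        List<String> fixed = splitListValue("dd.MM.yyyy,yyyy-MM-dd'T'HH:mm:ss");
        System.out.println(canParse(fixed.get(0), "08.02.2019"));  // true
    }
}
```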

Bartosz Wardziński
  • For reference how file is parsed - https://docs.oracle.com/javase/8/docs/api/java/util/Properties.html#load-java.io.Reader- – OneCricketeer Feb 14 '19 at 10:02