How to fix an error when an empty string is being written to Elasticsearch from an Apache Spark job?

Question

An exception is thrown when I run my Scala app, which calls myRDD.saveToEs (I also tried saveToEs on a DataFrame). My ES version is 2.3.5 and I am using Spark 1.5.0, so maybe there is a way to configure this in the SparkContext that I am not aware of.

The stack trace is as follows:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, localhost): org.apache.spark.util.TaskCompletionListenerException: Found unrecoverable error [127.0.0.1:9200] returned Bad Request(400) - failed to parse [foo_eff_dt];Invalid format: ""; Bailing out..
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

The field named foo_eff_dt has a value in some records and is empty in others. I am not sure whether this is causing the exception.

My Scala code snippet looks like this:

fooRDD.saveToEs("foo/bar")

Please help/guide me in resolving this.

TIA.

score 0 · Answer 1 · answered May 15 '17 at 12:38

I think you are trying to insert an empty string into a field that Elasticsearch maps as a date, and an empty string is not a valid date for a mapping like this:

{
  "format": "strict_date_optional_time||epoch_millis",
  "type": "date"
}

If you do not strictly need this to be a date field, you can resolve this easily by changing its type to string in the mapping.
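If you would rather keep the date mapping, another option (not part of the suggestion above, just a sketch) is to drop empty values before the document is written, so that Elasticsearch never receives "" for foo_eff_dt. The object and method names below are hypothetical, and the sketch assumes each record is a Map[String, Any]:

```scala
// Hypothetical helper, not part of elasticsearch-hadoop.
// Assumes each document is represented as a Map[String, Any].
object EmptyFieldCleaner {

  // Removes fields whose value is null or a blank string, so a
  // date-mapped field such as foo_eff_dt is simply absent from the
  // document instead of being sent as "".
  def dropEmptyFields(doc: Map[String, Any]): Map[String, Any] =
    doc.filter {
      case (_, null)      => false
      case (_, s: String) => s.trim.nonEmpty
      case _              => true
    }

  def main(args: Array[String]): Unit = {
    // foo_id is an invented field used only for illustration.
    val doc = Map[String, Any]("foo_id" -> 1, "foo_eff_dt" -> "")
    println(dropEmptyFields(doc))
  }
}
```

Assuming fooRDD is an RDD[Map[String, Any]] and org.elasticsearch.spark._ is imported, this could be applied as fooRDD.map(EmptyFieldCleaner.dropEmptyFields).saveToEs("foo/bar"). A DataFrame would need a different cleaning step, since its rows are not maps.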
