2

When I run my Spark job from an IDE using Spark's Java APIs, the output is in the desired encoding (UTF-8). But when I launch the same job with 'spark-submit' from the command line, the output loses that encoding.

Is there a way to enforce the encoding for 'spark-submit' when it is used from the command line?

I am using Windows 10 OS and Eclipse IDE.

Your help will be really appreciated.

Thank you.

KJ Sudarshan

4 Answers

7

Run your Spark job like this:

spark-submit --class com.something.class --name "someName" --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8"

Shreyas K C
0

This is not working in my case.

The command I use is:

spark-submit --class com.rera.esearch --jars /Users/nitinthakur/.ivy2/cache/mysql/mysql-connector-java/jars/mysql-connector-java-8.0.11.jar /Users/nitinthakur/IdeaProjects/Rera2/target/scala-2.11/rera2_2.11-0.1.jar
--conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8" 127.0.0.1 root

Output of the commands below:

println(System.getProperty("file.encoding")) // US-ASCII
println(scala.util.Properties.encodingString) // US-ASCII
nits41089
0

If you are seeing the issue in code that runs on the executors (such as the code inside foreachPartition or mapPartitions), you have to set spark.executor.extraJavaOptions, that is:

--conf 'spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8'

If your code runs in the driver, set the driver option as described above, i.e.:

--conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8"
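
To tell which JVM is affected, it can help to print the encoding from both places. The following is only a minimal sketch: EncodingCheck is a hypothetical helper class, not part of Spark, and it uses nothing beyond the standard JDK.

```java
import java.nio.charset.Charset;

// Hypothetical helper (not part of Spark): reports the encoding of the JVM it runs in.
final class EncodingCheck {
    private EncodingCheck() {}

    static String describe() {
        // The system property that -Dfile.encoding=... sets.
        String property = System.getProperty("file.encoding");
        // The charset the JVM falls back to when no charset is given explicitly.
        String charset = Charset.defaultCharset().name();
        return "file.encoding=" + property + ", defaultCharset=" + charset;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling EncodingCheck.describe() in the driver's main method and again inside a foreachPartition function should show whether the driver, the executors, or both are missing the setting. Output printed on the executors ends up in the executor logs, not on the driver console.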
user2446776
0

It seems the order of the arguments matters. You have to specify the encoding before the JAR file, like so:

spark-submit --class my.package.app --conf "spark.driver.extraJavaOptions=-Dfile.encoding=UTF-8" --conf "spark.executor.extraJavaOptions=-Dfile.encoding=UTF-8" my-app.jar

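I tried specifying the encoding after the JAR file, and the specified encoding did not get picked up. spark-submit passes everything that follows the application JAR to the application as its own arguments, so a --conf placed there never reaches Spark.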
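
One way to catch this mistake early is to have the application look at its own arguments. The sketch below is only an illustration: MisplacedOptionCheck and findOptionLikeArgs are hypothetical names, and it assumes that your application's real arguments never start with "--".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: flags arguments that look like spark-submit options
// but were handed to the application instead.
final class MisplacedOptionCheck {
    private MisplacedOptionCheck() {}

    static List<String> findOptionLikeArgs(String[] args) {
        List<String> found = new ArrayList<>();
        for (String arg : args) {
            // Assumption: genuine application arguments never begin with "--".
            if (arg.startsWith("--")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> suspicious = findOptionLikeArgs(args);
        if (!suspicious.isEmpty()) {
            System.err.println("Options received as application arguments "
                    + "(were they placed after the JAR?): " + suspicious);
        }
    }
}
```

With a command that puts --conf after the JAR, the application would receive "--conf" as its first argument and the check would report it.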

Devigny