I need to save the output of df.show() as a string so that I can email it directly.

For example, take the snippet below from the official Spark docs:

val df = spark.read.json("examples/src/main/resources/people.json")

// Displays the content of the DataFrame to stdout
df.show()
// +----+-------+
// | age|   name|
// +----+-------+
// |null|Michael|
// |  30|   Andy|
// |  19| Justin|
// +----+-------+

I need to save the table above, exactly as it is printed in the console, as a string. I looked at log4j for this, but couldn't find any information on capturing only the output of show().

Can someone help me with it?

Omkar
This has already been answered: https://stackoverflow.com/questions/45741035/is-there-any-way-to-get-the-output-of-sparks-dataset-show-method-as-a-string/ – Raphael Roth Jan 31 '18 at 20:34

2 Answers

scala.Console has a withOut method for this kind of thing:

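import java.io.ByteArrayOutputStream

val outCapture = new ByteArrayOutputStream
// Anything printed through scala.Console inside this block is written to outCapture
Console.withOut(outCapture) {
  df.show()
}
val result = new String(outCapture.toByteArray)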
Joe K
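
As a possible refinement of the Console.withOut approach, the capture can be wrapped in a small reusable helper. This is only a sketch: the names CaptureStdout and captureStdout are hypothetical, and fixing the encoding to UTF-8 is an assumption made so that encoding and decoding agree regardless of the platform default.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object CaptureStdout {
  // Hypothetical helper: runs `body` and returns whatever it printed
  // through scala.Console (println, Console.out, ...) as a String.
  def captureStdout(body: => Unit): String = {
    val buffer = new ByteArrayOutputStream()
    // Assumption: encode as UTF-8 explicitly and decode the same way below.
    val stream = new PrintStream(buffer, true, "UTF-8")
    try Console.withOut(stream)(body)
    finally stream.flush()
    buffer.toString("UTF-8")
  }
}
```

With a DataFrame in scope, `val table = CaptureStdout.captureStdout { df.show() }` would then hold the rendered table. Console.withOut only redirects output for the calling thread, which should be enough here because show() prints from the thread that calls it.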

A workaround is to redirect standard output to a variable:

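val baos = new java.io.ByteArrayOutputStream()
val ps = new java.io.PrintStream(baos)

val oldPs = Console.out
Console.setOut(ps) // deprecated API, see the note below
val content =
  try {
    df.show()
    ps.flush() // make sure everything has reached baos before reading it
    baos.toString()
  } finally {
    Console.setOut(oldPs) // restore the original stream even if show() fails
  }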

Note that this produces a deprecation warning, because Console.setOut is deprecated in favor of Console.withOut.

You can also reimplement Dataset.showString, the method that generates the table text. It uses take behind the scenes. Maybe it's also a good moment to open a PR to make showString public? :)

Alper t. Turker
T. Gawęda
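
To illustrate the reimplementation idea, here is a minimal sketch of a table renderer in plain Scala. It is not Spark's actual showString code: ShowStringSketch and renderTable are hypothetical names, the function takes cells that are already converted to strings, it assumes every row has as many cells as the header, and it skips truncation of long values.

```scala
object ShowStringSketch {
  // Renders a header and rows of string cells as an ASCII table,
  // right-aligning each cell in the style of the df.show() output.
  // Assumption: every row has exactly header.length cells.
  def renderTable(header: Seq[String], rows: Seq[Seq[String]]): String = {
    val all = header +: rows
    // Each column is as wide as its longest cell, header included.
    val widths = header.indices.map(i => all.map(_(i).length).max)
    val separator = widths.map("-" * _).mkString("+", "+", "+")
    // Left-pad every cell to its column width and join with '|'.
    def line(cells: Seq[String]): String =
      cells.zip(widths)
        .map { case (cell, width) => " " * (width - cell.length) + cell }
        .mkString("|", "|", "|")
    (Seq(separator, line(header), separator) ++ rows.map(line) :+ separator)
      .mkString("\n")
  }
}
```

With a DataFrame, the inputs could be built from df.columns for the header and df.take(n) for the rows, converting each Row with toSeq and each value with String.valueOf so that nulls become the text "null". Because no output is printed at all, nothing needs to be redirected.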