
I have a dataframe with a single column, as below, and I am using PySpark 2.3 to write it to CSV.

18391860-bb33-11e6-a12d-0050569d8a5c,48,24,44,31,47,162,227,0,37,30,28
18391310-bc74-11e5-9049-005056b996a7,37,0,48,25,72,28,24,44,31,52,27,30,4

By default,

df.select('RESULT').write.csv(path)

produces

"18391860-bb33-11e6-a12d-0050569d8a5c,48,24,44,31,47,162,227,0,37,30,28"
"18391310-bc74-11e5-9049-005056b996a7,37,0,48,25,72,28,24,44,31,52,27,30,4"

How can I remove the outer quotes? I have tried option('quoteAll', 'false') and option('quote', None), but neither worked.

mck
kavya

2 Answers


You can try writing with a | separator. The default separator is ,, which conflicts with the commas inside your values, so the CSV writer wraps each value in quotes. With a separator that does not appear in the data, no quoting is needed.

df.select('RESULT').write.csv(path, sep="|")
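The quoting rule at work can be illustrated with Python's standard-library csv module (used here as a stand-in, since Spark's CSV writer applies the same minimal-quoting convention): a field containing the delimiter gets quoted, and switching the delimiter avoids that.

```python
import csv
import io

# A single-field row whose value happens to contain commas.
row = ["18391860-bb33-11e6-a12d-0050569d8a5c,48,24,44,31,47,162,227,0,37,30,28"]

# With the default ',' delimiter, the field contains the delimiter,
# so the writer wraps it in double quotes.
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue().strip())   # value comes out surrounded by quotes

# With '|' as the delimiter, the field no longer needs quoting
# and is written verbatim.
buf = io.StringIO()
csv.writer(buf, delimiter="|").writerow(row)
print(buf.getvalue().strip())   # value comes out without quotes
```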
mck

You can also use df.write.text, which writes each row's value verbatim with no CSV quoting at all (it requires the dataframe to have a single string column):

df.select('RESULT').write.text(path)
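The difference can be sketched with plain file I/O (a stand-in for Spark's text writer, assuming a local file result.txt for illustration): text output writes each string as-is, so no quotes are ever added.

```python
# Plain text output writes each value verbatim -- CSV quoting rules never apply.
rows = [
    "18391860-bb33-11e6-a12d-0050569d8a5c,48,24,44,31,47,162,227,0,37,30,28",
    "18391310-bc74-11e5-9049-005056b996a7,37,0,48,25,72,28,24,44,31,52,27,30,4",
]

with open("result.txt", "w") as f:
    for value in rows:
        f.write(value + "\n")   # each line is the raw string, no quotes added

print(open("result.txt").read())
```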
blackbishop