
I am trying to read a CSV file into a DataFrame using a schema derived from an Encoder, but am running into some issues. The file has the following format:

[Screenshot of the CSV file]

It should read the first record below as a single row:

```
92,61,2008-08-01T14:45:37Z,90,13,"http://svnbook.red-bean.com/"">Version Control with SubversionA very good resource for source control in general. Not really TortoiseSVN specific, though.

"
```

Instead, the resulting row is missing the entire second paragraph of the Body field. This is how I parse the CSV:

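```scala
import org.apache.spark.sql.Encoders

case class tit(Id: Int, OwnerUserId: Int, CreationDate: String, ParentID: Int, Score: Int, Body: String)

// Derive the Spark schema from the case class
val schema = Encoders.product[tit].schema

val df = spark.read.schema(schema).csv(fileName)
```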
Bentaye

1 Answer


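Either add `.option("multiLine", true)` when reading the CSV, so that a quoted field may span several lines. This option may bring its own problems, though; for example, Spark can then no longer split a single file across several tasks.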
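A minimal sketch of the first option, reusing the `tit` case class from the question. The `MultiLineCsv` object and `readPosts` helper are hypothetical names. The `escape` setting is an assumption based on the sample record: it escapes embedded quotes by doubling them (`""`), while Spark's CSV reader uses a backslash as its default escape character.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Same case class as in the question; with an explicit schema, the CSV
// columns are mapped to these fields by position.
case class tit(Id: Int, OwnerUserId: Int, CreationDate: String, ParentID: Int, Score: Int, Body: String)

object MultiLineCsv {
  // Hypothetical helper: reads the file as a typed Dataset[tit].
  def readPosts(spark: SparkSession, fileName: String): Dataset[tit] = {
    import spark.implicits._
    val schema = Encoders.product[tit].schema
    spark.read
      .schema(schema)
      .option("multiLine", true) // let a quoted field span several physical lines
      .option("escape", "\"")    // assumption: embedded quotes are doubled ("") as in the sample
      .csv(fileName)
      .as[tit]
  }
}
```

With this, `MultiLineCsv.readPosts(spark, fileName).show(false)` should print the sample record as one row, with both paragraphs in the Body column.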

Or convert the input so that each record is on a single line, writing the line breaks inside a field as literal `\n` sequences.
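A rough sketch of the second option in plain Scala, run once over the file before handing it to Spark. The `OneLinePerRecord` object and its `flatten` function are hypothetical. It assumes double-quoted fields with doubled quotes for embedded `"` characters and `\n` line endings, and it cannot tell a backslash-n sequence that was already in the data from one it inserted.

```scala
object OneLinePerRecord {
  /** Rewrites CSV text so that every record occupies exactly one physical line.
    * Newlines inside a double-quoted field become the two characters '\' and 'n';
    * newlines outside quotes (record separators) are kept as-is.
    */
  def flatten(csv: String): String = {
    val out = new StringBuilder(csv.length)
    var inQuotes = false
    for (c <- csv) {
      c match {
        case '"' =>
          // A doubled quote ("") toggles the state twice, so we stay inside the field.
          inQuotes = !inQuotes
          out += c
        case '\n' if inQuotes =>
          out ++= "\\n" // literal backslash + 'n' instead of a real line break
        case _ =>
          out += c
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    val in = scala.io.Source.fromFile(args(0), "UTF-8")
    try print(flatten(in.mkString))
    finally in.close()
  }
}
```

Writing the output of `flatten` to a new file and reading that file with the question's original code should then give one row per record; the `\n` sequences can be turned back into real line breaks after loading if needed.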

Tom Lous