Scala reading CSV into Dataframe using Encoder

Question

I am trying to read a CSV file into a DataFrame using an encoder, but I am running into some issues. The file has the following format:

It should take the first entry and produce the corresponding row:

92,61,2008-08-01T14:45:37Z,90,13,"http://svnbook.red-bean.com/"">Version Control with SubversionA very good resource for source control in general. Not really TortoiseSVN specific, though.
"

However, the resulting row is missing the entire second paragraph of the Body field. This is how I am parsing the CSV:

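import org.apache.spark.sql.Encoders

// One field per CSV column, in file order
case class tit(Id: Int, OwnerUserId: Int, CreationDate: String, ParentID: Int, Score: Int, Body: String)

// Derive the Spark schema from the case class
val schema = Encoders.product[tit].schema

val df = spark.read.schema(schema).csv(fileName)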

Answer

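Either add .option("multiLine", true) when reading the CSV, so that a quoted field containing line breaks is parsed as a single value. This option has its own drawbacks, though (for example, a file read with multiLine cannot be split, so each file is parsed by a single task).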
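A minimal sketch of the first suggestion, assuming Spark 2.2 or later (where the CSV reader accepts multiLine). It also assumes the file escapes embedded quotes by doubling them (""), as the record in the question suggests, and therefore sets the escape option to a double quote; that option, the object name MultiLineCsvRead, and the sample record are illustrative additions, not part of the original answer.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Top-level case class (same fields as in the question) so Spark can derive an encoder for it
case class tit(Id: Int, OwnerUserId: Int, CreationDate: String, ParentID: Int, Score: Int, Body: String)

// Hypothetical helper object, for illustration only
object MultiLineCsvRead {
  // Reads the CSV as a typed Dataset; quoted fields may span several lines
  def read(spark: SparkSession, fileName: String): Dataset[tit] = {
    import spark.implicits._ // provides the implicit Encoder[tit] used by .as[tit]
    val schema = Encoders.product[tit].schema
    spark.read
      .schema(schema)
      .option("multiLine", true) // line breaks inside quotes become part of the value
      .option("escape", "\"")    // assumption: embedded quotes are doubled ("")
      .csv(fileName)
      .as[tit]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("multiline-csv").getOrCreate()
    // Hypothetical sample: one record whose Body spans two lines
    val path = Files.createTempFile("posts", ".csv")
    val sample = "92,61,2008-08-01T14:45:37Z,90,13,\"First paragraph.\nSecond paragraph.\"\n"
    Files.write(path, sample.getBytes(StandardCharsets.UTF_8))
    read(spark, path.toString).collect().foreach(println)
    spark.stop()
  }
}
```

With multiLine enabled, the two-line sample is read as one record instead of being cut off at the first line break.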

Or preprocess the input so that each record occupies exactly one line, replacing the line breaks inside a field with the literal characters \n.
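The second suggestion can be sketched as a plain Scala preprocessing step. This is a hypothetical helper, not code from the answer: it assumes the whole file fits in memory as a String, that fields are quoted with double quotes (with embedded quotes doubled), and that no field already contains a literal backslash followed by n, since that would make the encoding ambiguous.

```scala
// Hypothetical helper, for illustration only
object OneLinePerRecord {
  /** Replaces line breaks inside double-quoted CSV fields with the two-character
    * sequences \n and \r, so every record ends up on a single line.
    * A doubled quote ("") flips the in-quotes state twice, leaving it unchanged. */
  def flatten(csv: String): String = {
    val out = new StringBuilder()
    var inQuotes = false
    for (c <- csv) {
      if (c == '"') {
        inQuotes = !inQuotes
        out.append(c)
      } else if (inQuotes && c == '\n') {
        out.append("\\n") // literal backslash + n
      } else if (inQuotes && c == '\r') {
        out.append("\\r") // literal backslash + r
      } else {
        out.append(c) // record separators outside quotes are kept as-is
      }
    }
    out.toString
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample: the first record's Body spans two lines
    val input = "92,61,2008-08-01T14:45:37Z,90,13,\"First paragraph.\nSecond paragraph.\"\n" +
      "93,62,2008-08-01T15:00:00Z,90,5,\"Single line\"\n"
    print(flatten(input))
  }
}
```

Running main prints two lines, one per record; the flattened text can then be written back to a file and read without the multiLine option.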

answered Apr 29 '18 at 18:13 by Tom Lous


