I have the following pipe-delimited data:

Rollno|Name|height|department
101|Aman|5"2|C.S.E

All columns are treated as strings. When I load this data into Hive, I get an extra quote at the start of the first column and at the end of the last column:

Rollno: "101
Name: Aman
Height: 5"2
Department: C.S.E"

Can anyone help me with a solution?

dtolnay
Meena

1 Answer

Specify the separator explicitly when reading the file, for example:

val df = spark.read.option("header","true").option("inferSchema","true").option("sep", "|").csv("test.csv")
df.show(false)

+------+----+------+----------+
|Rollno|Name|height|department|
+------+----+------+----------+
|101   |Aman|5"2   |C.S.E     |
+------+----+------+----------+
Lamanus
  • Can't this be handled using a CSV SerDe in HQL? – Meena Aug 17 '20 at 12:57
  • Then split each line and take the fields from the resulting array. But your CSV loading is already messed up: the quote is not escaped properly, so the data is effectively corrupted. – Lamanus Aug 17 '20 at 13:00
  • Thanks @Lamanus! Yes, with the approach you shared I am able to read the data correctly, but when I save this DataFrame to another HDFS location, the height column, which itself contains a quote, gets saved as "5"2". How can I handle this so that the value is not wrapped in quotes and only 5"2 is written to the other HDFS location? – Meena Aug 19 '20 at 06:53
  • When I write this df to CSV again, it is saved with the escape character `\`: `df.write.option("header","true").csv("test")` – Lamanus Aug 19 '20 at 07:09
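
On the follow-up about writing the data back out: one option that may help is the CSV writer's `escapeQuotes` setting, which controls whether values containing a quote character are enclosed in quotes. The sketch below is only an illustration under assumptions that are not from the thread: the object name, the local `SparkSession`, and the temporary output directory are hypothetical, and the row is rebuilt by hand instead of being read from `test.csv`.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical object name; a minimal local sketch, not the answerer's code.
object WriteWithoutQuoting {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; on a cluster, reuse the existing `spark`.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("pipe-delimited-write")
      .getOrCreate()
    import spark.implicits._

    // The single row from the question, with every column kept as a string.
    val df = Seq(("101", "Aman", "5\"2", "C.S.E"))
      .toDF("Rollno", "Name", "height", "department")

    // Assumption: a temporary local directory stands in for the target HDFS location.
    val out = Files.createTempDirectory("pipe-csv").resolve("out").toString

    df.write
      .option("header", "true")
      .option("sep", "|")               // keep the pipe delimiter on output
      .option("escapeQuotes", "false")  // do not wrap values just because they contain a quote
      .csv(out)

    // Print the written lines to check how the height value was saved.
    spark.read.textFile(out).collect().foreach(println)

    spark.stop()
  }
}
```

With `escapeQuotes` left at its default of `true`, a value such as 5"2 is enclosed in quotes and its inner quote is escaped, which matches the behaviour described in the last comment. Values that contain the separator itself are still quoted, so this only helps when the fields never contain `|`.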