I have a very large CSV file. I want to read it with PySpark, but it is not being parsed correctly.
A sample of the CSV:
"keyvalue","rto","state","maker_model","veh_type","veh_class"
"hnjsnjncjssssmj", "OD", "ODISHA", "BAJAJ AUTO", "Private Vehicle", "Car"
"hnjsnjncjssssjj", "OD", "ODISHA", "BAJAJ AUTO
", "Private Vehicle", "Car"
"hnjsnjncjssssmm", "GO", "GOA", "TATA MOTORS", "Private Vehicle", "Bus"
I want it to be read like this:
+---------------+-----+---------+--------------+------------------+---------+
|       keyvalue|  rto|    state|   maker_model|          veh_type|veh_class|
+---------------+-----+---------+--------------+------------------+---------+
|hnjsnjncjssssmj| "OD"| "ODISHA"|  "BAJAJ AUTO"| "Private Vehicle"|    "Car"|
|hnjsnjncjssssjj| "OD"| "ODISHA"|  "BAJAJ AUTO"| "Private Vehicle"|    "Car"|
|hnjsnjncjssssmm| "GO"|    "GOA"| "TATA MOTORS"| "Private Vehicle"|    "Bus"|
+---------------+-----+---------+--------------+------------------+---------+
but PySpark does not recognise the second data row correctly. The line break inside the maker_model value splits the record in two, like this:
+--------------------+------+---------+--------------+------------------+---------+
|            keyvalue|   rto|    state|   maker_model|          veh_type|veh_class|
+--------------------+------+---------+--------------+------------------+---------+
|     hnjsnjncjssssmj|  "OD"| "ODISHA"|  "BAJAJ AUTO"| "Private Vehicle"|    "Car"|
|     hnjsnjncjssssjj|  "OD"| "ODISHA"|   "BAJAJ AUTO|              null|     null|
|", "Private Vehicle"| "Car"|     null|          null|              null|     null|
|     hnjsnjncjssssmm|  "GO"|    "GOA"| "TATA MOTORS"| "Private Vehicle"|    "Bus"|
+--------------------+------+---------+--------------+------------------+---------+
I have tried various options of Spark's CSV reader, but nothing has worked so far. How can I read this file correctly?
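For reference, a minimal sketch of the behaviour being asked for, not a confirmed fix. The first part uses only Python's standard csv module to check that the sample contains three logical records once quoted line breaks and the space after each comma are taken into account. The second part is a hypothetical helper, read_with_spark, showing the Spark reader options that appear to correspond (multiLine and ignoreLeadingWhiteSpace). The helper name and the path argument are assumptions for illustration, and the Spark part is not executed here.

```python
import csv
import io

# The sample from the question, including the line break inside a quoted field.
SAMPLE = (
    '"keyvalue","rto","state","maker_model","veh_type","veh_class"\n'
    '"hnjsnjncjssssmj", "OD", "ODISHA", "BAJAJ AUTO", "Private Vehicle", "Car"\n'
    '"hnjsnjncjssssjj", "OD", "ODISHA", "BAJAJ AUTO\n'
    '", "Private Vehicle", "Car"\n'
    '"hnjsnjncjssssmm", "GO", "GOA", "TATA MOTORS", "Private Vehicle", "Bus"\n'
)


def parse_sample(text):
    """Parse CSV text so that quoted line breaks stay inside one record.

    skipinitialspace=True makes the parser ignore the space after each comma,
    so the following quote is recognised as the start of a quoted field.
    """
    reader = csv.reader(io.StringIO(text, newline=""), skipinitialspace=True)
    # Strip the stray line break left inside the multi-line value.
    return [[field.strip() for field in row] for row in reader]


def read_with_spark(spark, path):
    """Hypothetical helper: the analogous read through Spark's CSV reader.

    Assumes `spark` is an existing SparkSession and `path` points at the file.
    """
    return (
        spark.read
        .option("header", True)
        # Allow a quoted value to span several physical lines.
        .option("multiLine", True)
        # Skip the space after each comma so the quote is seen as a quote.
        .option("ignoreLeadingWhiteSpace", True)
        .csv(path)
    )


rows = parse_sample(SAMPLE)
for row in rows:
    print(row)
```

Two caveats about the Spark variant: with ignoreLeadingWhiteSpace enabled the surrounding quotes would be stripped from every column, not only keyvalue, which differs slightly from the desired output shown above; and the line break would probably remain inside the maker_model value, so it may still need to be removed afterwards.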