I have a problem that I hope you can help me with.
The text file that looks like this:
Report Name :
column1,column2,column3
this is row 1,this is row 2, this is row 3
I am leveraging Synapse Notebooks to try to read this file into a dataframe. If I try to read the csv file using spark.read.csv() it thinks that the column name is "Report Name : ", which is obviously incorrect. I know that the Pandas csv reader has a 'skipRows[1]' function but unfortunately I cannot read the file directly with Pandas, as I am getting some strange networking errors. I can however convert a PySpark dataframe to a Pandas dataframe via: df.toPandas() I'd like to be able to solve this with straight PySpark dataframes.
Surely someone else has encountered this issue! Help!
I have tried every variation of reading files, and drop, etc. but the schema has already been defined when the first dataframe was created, with 1 column (Report Name : ). Not sure what to do now..