I am trying to filter out records that do not have the expected number of fields per row. My setup and code are below:
expected_no_of_fields_per_row = 3, delimiter = ","
input.csv
emp_id,emp_name,salary
1,"siva
Prasad",100
2,pavan,200,extra
3,prem,300
4,john
Expected output dataframes:
Correct_input_data_frame
emp_id,emp_name,salary
1,"siva Prasad",100
3,prem,300
wrong_file.csv (written out as a file):
emp_id,emp_name,salary,no_of_fields
2,pavan,200,extra,"4 fields in row, 3 expected"
4,john,"2 fields in row, 3 expected"
I tried the following. Spark reads the file fine (including the multiline quoted field), but I cannot find a way to apply len() to the rows of the DataFrame:
input_df = (spark.read
.option("multiline", "true")
.option("quote", '"')
.option("header", "true")
.option("escape", "\\")
.option("escape", '"')
.csv('input.csv')
)
# Neither of these works - len() cannot be applied per-row to a DataFrame
correct = input_df.filter(len(row) == 3)
wrong_data = input_df.filter(len(row) != 3)
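One way to sketch the field-count check outside Spark is with Python's standard-library csv module, which (like Spark's multiline reader) treats a quoted field spanning two lines as a single field. This is only an illustrative sketch: `EXPECTED_FIELDS` and the in-memory `raw` sample are stand-ins for the real file, and you would read from `input.csv` instead of a string.

```python
import csv
import io

EXPECTED_FIELDS = 3  # emp_id, emp_name, salary

# In-memory copy of input.csv; the quoted emp_name spans two lines.
raw = '''emp_id,emp_name,salary
1,"siva
Prasad",100
2,pavan,200,extra
3,prem,300
4,john
'''

reader = csv.reader(io.StringIO(raw))  # handles quoted multiline fields
header = next(reader)

correct_rows, wrong_rows = [], []
for row in reader:
    if len(row) == EXPECTED_FIELDS:
        correct_rows.append(row)
    else:
        # keep the actual field count for the error report column
        wrong_rows.append(row + [f"{len(row)} fields in row, {EXPECTED_FIELDS} expected"])

print(correct_rows)  # rows 1 and 3
print(wrong_rows)    # rows 2 and 4, each with a diagnostic appended
```

The two lists could then be turned into DataFrames with `spark.createDataFrame(...)`, or written back out as CSV. Alternatively, staying inside Spark, the CSV reader's PERMISSIVE mode with `columnNameOfCorruptRecord` routes rows whose token count does not match the schema into a corrupt-record column, which can then be filtered on null/non-null.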