My Code:

finalJoined.show();

Encoder<Row> rowEncoder = Encoders.bean(Row.class);                             
Dataset<Row> validatedDS = finalJoined.map(row -> validationRowMap(row), rowEncoder);       
validatedDS.show();

Map function :

public static Row validationRowMap(Row row) {

    // PART-A validateTxn()

    System.out.println("Inside map");
    //System.out.println("Value of CIS_DIVISION is " + row.getString(7));

    // 1. CIS_DIVISION (column index 7) must not be null or blank
    String cisDivision = row.getString(7);
    if (cisDivision == null || cisDivision.trim().isEmpty()) {
        System.out.println("CIS_DIVISION cannot be blank.");
    }

    // The row is returned unchanged; this function only validates.
    return row;
}

Output :

The finalJoined Dataset<Row> is shown correctly, with all columns and rows and the proper values. However, the validatedDS Dataset<Row> is shown with only one column, containing empty values.

Expected output :

validatedDS should show the same values as the finalJoined dataset, because I am only performing validation inside the map function and not changing the dataset itself.

Please let me know if you need more information.

Selim
Raj

1 Answer

Encoders.bean is intended for use with Java bean classes. Row is not one of these: it does not define getters and setters for specific fields, only generic getters such as getString(int).
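
To make the bean-versus-Row distinction concrete, here is a minimal sketch using only the JDK's java.beans.Introspector. It is not Spark's own inference code, which may differ in detail. Txn and GenericRow are hypothetical stand-ins invented for this illustration: Txn is a conventional bean, and GenericRow imitates the positional-getter style of Row.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.util.ArrayList;
import java.util.List;

class BeanCheck {

    // Hypothetical bean: one named field with a matching getter and setter.
    public static class Txn {
        private String cisDivision;
        public String getCisDivision() { return cisDivision; }
        public void setCisDivision(String cisDivision) { this.cisDivision = cisDivision; }
    }

    // Hypothetical stand-in for Row: positional getters only,
    // no per-field accessors.
    public interface GenericRow {
        Object get(int i);
        String getString(int i);
        int size();
    }

    // Returns the names of properties that have both a plain
    // (non-indexed) getter and a setter.
    public static List<String> beanProperties(Class<?> type) {
        List<String> names = new ArrayList<>();
        try {
            for (PropertyDescriptor pd : Introspector.getBeanInfo(type).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Could not introspect " + type, e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Txn: " + beanProperties(Txn.class));
        System.out.println("GenericRow: " + beanProperties(GenericRow.class));
    }
}
```

Under these assumptions, introspection finds the cisDivision property on Txn and no getter/setter pairs on GenericRow. A bean-based encoder pointed at a Row-like type therefore has no per-field properties from which to derive columns.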

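To return a Row object, you have to use RowEncoder and provide the expected output schema. Because validationRowMap returns each row unchanged, the output schema here is the same as the input schema, so in Spark 2.x the encoder can be built with RowEncoder.apply(finalJoined.schema()).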

For an example, see "Encoder for Row Type Spark Datasets".