Just create iterator,
Iterator<Row> rowIterator = w.getWorkbook(blob).getSheetAt(1).iterator;
iterate over sheet and put cells values into 2d array,
Object[][] tab = new Object[rows][cols];
.iterator() iterates over row, inside this loop you have to create cell iterator to iterate over "columns" too.
Object[][] tab = new Object[rows][cols];
Iterator<Row> rowIterator = w.getWorkbook(blob).getSheetAt(1).iterator;
while (rowIterator.hasNext()) {
Row row = rowIterator.next();
// For each row, iterate through all the columns
Iterator<Cell> cellIterator = row.cellIterator();
while (cellIterator.hasNext()) {
Cell cell = cellIterator.next();
// Check the cell type and format accordingly
cell.setCellType(CellType.STRING);
tab[cell.getRowIndex()][cell.getColumnIndex()] = cell.getStringCellValue();
}
}
then convert your 2d array to Seq/List/Array of Strings, where String will be concatenated elements of rows. In next step create RDD using sc.parallelize, in map step split your string and create RDD[Row]
. I will look like this:
val yourRDD = rdd.map(RowFactory.create(_.split(yourDelimiter)))
and then use spark.createDataFrame(yourRDD,schema)