0

I wanted to read an excel sheet that has one merged column and I need to treat that column to determine the rows that are grouped with. So that I can consider that group as a set. Likewise each merged rows are considered as set1, set2, set3 and so on. I will then process each set w its rows separately.

enter image description here

UPDATE: As Requested by @fireandfuel, I am including the approche that I have arrived.

    CTMergeCells mergeCells = workSheet.getMergeCells();
    SheetData sheetData = workSheet.getSheetData();
    List<ArrayList<Row>> rowGroups = new ArrayList<ArrayList<Row>>();
    List<CTMergeCell> cTMergeCells = mergeCells.getMergeCell();
    for(CTMergeCell mcells : cTMergeCells){
        String range = mcells.getRef();
        Integer rowStart = Integer.parseInt(range.substring(1, 2));
        Integer rowEnd = Integer.parseInt(range.substring(4, 5));
        ArrayList<Row> rowss = (ArrayList<Row>) sheetData.getRow().stream().filter(row -> {
            return row.getR() >= rowStart.longValue() && row.getR() <= rowEnd.longValue();
        }).collect(Collectors.toList());
        rowGroups.add((ArrayList<Row>) rowss);
    }
Sathish Kumar k k
  • 1,732
  • 5
  • 26
  • 45
  • let us know what you have tried – sidgate Apr 02 '18 at 07:38
  • @sidgate I have tried only `formatter.formatCellValue(row.getC().get(3))`. I do not know what to be called to get the rows range as we do in [Apache poi](https://stackoverflow.com/questions/29664977/how-to-read-from-merged-cells-of-excel-in-java-using-apache-poi). I went through the docx4j documentation but no luck. – Sathish Kumar k k Apr 02 '18 at 08:49
  • You might be interested in https://codereview.stackexchange.com/ to review your code. I see some issue with your approach, but it's off topic. – fireandfuel Apr 05 '18 at 21:07

1 Answers1

0

It's very easy to access the definition of merged cells in SpreadsheetML using docx4j.

Office Open XML files (docx, xlsx and pptx) are simply ZIP archives which are containing XML files. You can open it with the most archive programs and have a look inside its file structure and files.

Here is some source code for to access the definition of merged cells using docx4j:

// load yourFile.xlsx file
SpreadsheetMLPackage spreadsheetMLPackage = SpreadsheetMLPackage.load(new File("yourFile.xlsx"));

// get worksheet from /xl/worksheets/yourSheet.xml file from yourFile.xlsx
WorksheetPart worksheetPart = (WorksheetPart) spreadsheetMLPackage.getParts().get(new PartName("/xl/worksheets/yourSheet.xml")); 
Worksheet worksheet = worksheetPart.getJaxbElement();

// get the merged cells
CTMergeCells ctMergedCells = worksheet.getMergeCells();
if(ctMergedCells != null){
    List<CTMergeCell> mergedCellList = ctMergedCells.getMergeCell();
    // do something with the merged cells
}

The class org.xlsx4j.sml.CTMergeCell has a function getRef which gives you the reference of cells which are merged as String, encoded as begin:end, like A1:B2 (merged cells from A1 to B2: A1, A2, B1 and B2).

fireandfuel
  • 732
  • 1
  • 9
  • 22
  • Thanks and I am able to achieve the same, but `getRef` gives only `String` and with this I need to do more manipulation to get the `List` with in that range. Do we have and other way to do the same? ` String range = mcells.getRef(); ` Integer rowStart =` `Integer.parseInt(range.substring(1, 2));` ` Integer rowEnd =` `Integer.parseInt(range.substring(4, 5)); ArrayList rowss = (ArrayList) sheetData.getRow().stream().filter(row -> { return row.getR() >= rowStart.longValue() && row.getR() <= rowEnd.longValue(); }).collect(Collectors.toList());` – Sathish Kumar k k Apr 05 '18 at 06:55
  • @SathishKumarkk As docx4j only provides an low level API for the access on `Office Open XML` files you have to implement the function on your own. Please edit your question and add your approach(es) to your question. PS: After having a quick look over your approach, it only works as long the column token is less than 27 (column 27 is `AA`). An alternative to it might be a XPath query on the data, but be warned that it might change the document structure! – fireandfuel Apr 05 '18 at 07:39