3

How can I get merged regions (merged cells) of an excel sheet using the event API provided by Apache POI?

Using the "traditional" DOM-like parsing style there are methods called Sheet.getNumMergedRegions() and Sheet.getMergedRegion(int). Unfortunately I need to handle huge Excel files where I get out of memory errors even with the highest Xmx-value I am allowed to use (in this project). So I'd like to use the event API, but wasn't able to find out how to get information about merged regions, which I need to know to "understand" the content correctly...

Using the example given here: http://poi.apache.org/spreadsheet/how-to.html#xssf_sax_api I get events for each single cell of a merged region (only the first of them contains any textual content though). So maybe, if there isn't a more direct way, it would help to know how those merged cells could be (safely) distinguished from other (empty) cells...

centic
  • 15,565
  • 9
  • 68
  • 125
Patter
  • 919
  • 7
  • 10

3 Answers3

3

I don't know for sure where merged cell info gets stored, but I'm fairly sure it won't be with the cell data itself, as that's not the Excel way.

What I'd suggest you do is create a simple file without merged cells. Then, take a copy, and add a single merged cell. Unzip both of these (.xlsx is a zip of xml files), and diff them. That'll show you quite quickly what gets set to mark cells as merged. (My hunch is that it'll be somewhere in the sheet settings, near the start but not near the cell values, BICBW)

Once you know where the merged cell details live, you can take a look at the XSSF UserModel code for working with merged cells to get an idea of how they work, how they're manipulated, what the options are etc. With that in mind, you can look at the file format docs for the full details, but those can be a bit heavy and detailed to go to first. Finally, you can add in your code to use the merged info details, once you know where to get it from!

Gagravarr
  • 47,320
  • 10
  • 111
  • 156
  • You're right - I was searching in the wrong place. There are elements named "mergeCell" with an attribute "ref". The value of ref is e.g. "A1:B3". Now, it seems that I have to interpret that string ("A1:B3") to get column/row indices of all merged regions. Thanks! – Patter Jul 24 '12 at 11:33
  • So, where is the right place? Were is "mergeCell" element? I have the same problem. – Samoth Jan 13 '17 at 14:27
  • I did just as you recommended, and compared new xlsx files that I just created, and indeed- the difference is the after the tag. But unfortunately- in my real file- after I load it- I don't have any , anywhere, I checked it in the regular way (via XSSFWorkbook, which takes tons of time) and it manage to get the merge cells (arrayList of 14 elements, that's what's actually supposed to be) What's wrong in my file? What can cause XSSFReader to not process the merge cells correctly? – Stack Overflow Jan 13 '22 at 10:04
2

To expand on Mike's answer. You can create a ContentHandler to locate Merge Regions like:

import java.util.ArrayList;
import java.util.List;

import org.apache.poi.ss.util.CellRangeAddress;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class MergedRegionLocator extends DefaultHandler {
    private final List<CellRangeAddress> mergedRegions = new ArrayList<>();

    @Override
    public void startElement (String uri, String localName, String name, Attributes attributes) {
        if ("mergeCell".equals(name) && attributes.getValue("ref") != null) {
            mergedRegions.add(CellRangeAddress.valueOf(attributes.getValue("ref")));
        }
    }

    public CellRangeAddress getMergedRegion (int index) {
        return mergedRegions.get(index);
    }

    public List<CellRangeAddress> getMergedRegions () {
        return mergedRegions;
    }
}

An example of using it with POIs Event-Based parsing:

OPCPackage pkg = OPCPackage.open(new FileInputStream("test.xlsx"));
XSSFReader reader = new XSSFReader(pkg);
InputStream sheetData = reader.getSheetsData().next();

MergedRegionLocator mergedRegionLocator = new MergedRegionLocator();
XMLReader parser = XMLReaderFactory.createXMLReader();
parser.setContentHandler(mergedRegionLocator);
parser.parse(new InputSource(sheetData));

mergedRegionLocator.getMergedRegions();
Jon Bake
  • 31
  • 1
1

You need to open stream and parse it twice.

First time - to extract merged cells. They are appears in the sheet...xml file after <sheetData>...</sheetData> tag, like in this example:

...
< /sheetData >
< mergeCells count="2" >
    < mergeCell ref="A2:C2"/ >
    < mergeCell ref="A3:A7"/ >
 </mergeCells >

Extract that and keep in some List.

Then reopen the stream again and parse it as usual, to extract rows and cells. In the endElement(...) method when finishing every row, check if this row appears (partially or completely) in the merged region.

AdrieanKhisbe
  • 3,899
  • 8
  • 37
  • 45
Mike
  • 41
  • 1
  • 4