Easiest way to compare two Excel files in Java?

Question

I'm writing a JUnit test for some code that produces an Excel file (which is binary). I have another Excel file that contains my expected output. What's the easiest way to compare the actual file to the expected file?

Sure I could write the code myself, but I was wondering if there's an existing method in a trusted third-party library (e.g. Spring or Apache Commons) that already does this.

Toby · Answer 1 · 2013-01-24T15:30:43.990

19

You might consider using my project simple-excel which provides a bunch of Hamcrest Matchers to do the job.

When you do something like the following,

assertThat(actual, WorkbookMatcher.sameWorkbook(expected));

You'd see, for example,

java.lang.AssertionError:
Expected: entire workbook to be equal
     but: cell at "C14" contained <"bananas"> expected <nothing>,
          cell at "C15" contained <"1,850,000 EUR"> expected <"1,850,000.00 EUR">,
          cell at "D16" contained <nothing> expected <"Tue Sep 04 06:30:00">
    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)

That way, you can run it from your automatted tests and get meaningful feedback whilst you're developing.

You can read more about it at this article on my site

edited Jan 24 '13 at 15:30

answered Sep 10 '12 at 07:21

Toby

9,523
8
36
59

If you're the developer of this project, you should probably add a disclaimer to that effect. – Andrew Swan Sep 12 '12 at 03:06
Because apart from being good form, it's a rule of this site, see http://stackoverflow.com/faq#promotion. – Andrew Swan Sep 13 '12 at 08:28
2

sure, but i'd say its not so much a rule from SO; it suggests that the community may get peeved with you. I'd have thought this could happen in the case of flagrant self promotion. My reply is genuine and fits into modern java testing strategies, ie hamcrest, well. Its a good answer to the question, whomever offered it IMO. – Toby Sep 14 '12 at 13:27
@Toby .. instead of assertion error, can I get simply the result of comparison in some easy to read format? I want the differences to be feed to another function for further processing. – Garry Aug 24 '15 at 07:04
Not by default but you could do something around a custom runner to catch the assertion error and process it. For something more elegant, you could try calling the matchers directly which should return a boolean. You can build up the errors in a customer Description class and process it that way... – Toby Aug 25 '15 at 08:19
1

Why you don't put that in maven central? Also what is the license? Is there any limitation using it? Do i have to give my code if i use your library? – isaac.hazan Dec 24 '15 at 08:14
If you look in the source, you'll see it's an Apache 2 licence. See the licence for details (http://www.apache.org/licenses/LICENSE-2.0). You don't have to give your code away if you use it. As to maven central, it's already available via a Maven repository so can be easily downloaded from within a pom.xml (see https://github.com/tobyweston/simple-excel/issues/6 for details) – Toby Dec 28 '15 at 18:28
Not available on Maven Central or jcenter. Meh. – barfuin Feb 27 '20 at 21:13

score 9 · Accepted Answer · answered May 15 '09 at 07:26

9

Here's what I ended up doing (with the heavy lifting being done by DBUnit):

/**
 * Compares the data in the two Excel files represented by the given input
 * streams, closing them on completion
 * 
 * @param expected can't be <code>null</code>
 * @param actual can't be <code>null</code>
 * @throws Exception
 */
private void compareExcelFiles(InputStream expected, InputStream actual)
  throws Exception
{
  try {
    Assertion.assertEquals(new XlsDataSet(expected), new XlsDataSet(actual));
  }
  finally {
    IOUtils.closeQuietly(expected);
    IOUtils.closeQuietly(actual);
  }
}

This compares the data in the two files, with no risk of false negatives from any irrelevant metadata that might be different. Hope this helps someone.

answered May 15 '09 at 07:26

Andrew Swan

13,427
22
69
98

1

Hey using XlsDataSet from DBUnit is a really clever idea; didn't think of that :-). – sleske May 15 '09 at 09:06
1

The latest DbUnit version 2.5 does not work with .xlsx file, only with .xls files. Do you have any idea how to make it work for ".xlsx" files? – Romain Apr 27 '15 at 12:02
No idea sorry, unless you have the option of first exporting the .xlsx files to .xls format. – Andrew Swan Apr 28 '15 at 07:32
I tried it but it does not identify differences between cell values, is this normal? – isaac.hazan Dec 24 '15 at 09:42
@isaac.hazan no, it worked for me, but this was six years ago! :-) – Andrew Swan Dec 29 '15 at 22:02
I also had the problem that 2.5.4 did not work with xlsx files. I changed to use the ExcelComparator from the apache poi examples: https://svn.apache.org/repos/asf/poi/trunk/src/examples/src/org/apache/poi/ss/examples/ExcelComparator.java – Jérôme Sep 13 '18 at 15:33
Doesn't work with DBUnit 2.7 with xlsx files – Bhushan Apr 07 '21 at 04:31

score 6 · Answer 3 · answered Aug 05 '15 at 19:07

I needed to do something similar and was already using the Apache POI library in my project to create Excel files. So I opted to use the included ExcelExtractor interface to export both workbooks as a string of text and asserted that the strings were equal. There are implementations for both HSSF for .xls as well as XSSF for .xlsx.

Dump to string:

XSSFWorkbook xssfWorkbookA = ...;
String workbookA = new XSSFExcelExtractor(xssfWorkbookA).getText();

ExcelExtractor has some options for what all should be included in the string dump. I found it to have useful defaults of including sheet names. In addition it includes the text contents of the cells.

score 6 · Answer 4 · answered May 14 '09 at 23:51

6

A simple file comparison can easily be done using some checksumming (like MD5) or just reading both files.

However, as Excel files contain loads of metadata, the files will probably never be identical byte-for-byte, as James Burgess pointed out. So you'll need another kind of comparison for your test.

I'd recommend somehow generating a "canonical" form from the Excel file, i.e. reading the generated Excel file and converting it to a simpler format (CSV or something similar), which will only retain the information you want to check. Then you can use the "canonical form" to compare with your expected result (also in canonical form, of course).

Apache POI might be useful for reading the file.

BTW: Reading a whole file to check its correctnes would generally not be considere a Unit test. That's an integration test...

answered May 14 '09 at 23:51

sleske

81,358
34
189
227

You're right, I was using the term unit test loosely; in fact it's an integration test run by JUnit. I'll fix that now. – Andrew Swan May 15 '09 at 05:10
Thanks for putting me on the right track; see my DBUnit solution below (or above, depending on votes!) – Andrew Swan May 15 '09 at 07:27
For .xlsx files: MD5 sums will definitely be different, but the directories resulting of unzipping both .xlsx files should be identical (that would be a better way of getting the canonical form) – golimar Feb 23 '17 at 11:28
@golimar: Yes, it should, but it is not :-) (just tested with Excel 2016). There is still metadata that may change. For example, .xlsx files contain the author and time of last modification, and they contain the cell that was active when the document was changed. You still need some canonical form for comparison... – sleske Feb 23 '17 at 11:37
@sleske I tried with Excel 2016 too... the directories were identical when compared with `diff -r` as it compares the contents of files recursively and not author and times. All depends on what you consider identical, for me it's perfect as I wanted to compare contents and ignore things like file saving time – golimar Feb 23 '17 at 11:54

score 3 · Answer 5 · answered May 20 '16 at 13:15

The easiest way I find is to use Tika. I use it like this:

private void compareXlsx(File expected, File result) throws IOException, TikaException {
     Tika tika = new Tika();
     String expectedText = tika.parseToString(expected);
     String resultText = tika.parseToString(result);
     assertEquals(expectedText, resultText);
}


<dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>1.13</version>
    <scope>test</scope>
</dependency>

score 2 · Answer 6 · answered May 14 '09 at 23:26

2

You could use javaxdelta to check whether the two files are the same. It's available from here:

http://javaxdelta.sourceforge.net/

answered May 14 '09 at 23:26

Jonathan Holloway

62,090
32
125
150

I eventually got javaxdelta working after mucking around with its dependency on the "trove" library, but although it works as advertised, sleske is right that I need a canonical comparison, not a byte-by-byte comparison. Thanks anyway for the suggestion, which I've voted up. – Andrew Swan May 15 '09 at 05:56

score 0 · Answer 7 · answered Apr 26 '10 at 12:05

0

You may use Beyond Compare 3 which can be started from command-line and supports different ways to compare Excel files, including:

Comparing Excel sheets as database tables
Checking all textual content
Checking textual content with some formating

answered Apr 26 '10 at 12:05

Wernight

36,122
25
118
131

Command-line tools are ugly to invoke from Java (in my case, JUnit). – Andrew Swan May 24 '12 at 00:57

score 0 · Answer 8 · answered May 31 '19 at 16:26

To test only content of the first sheets in Kotlin (easily can be converted to java).

private fun checkEqualityExcelDocs(doc : XSSFWorkbook, doc1 : XSSFWorkbook) : Boolean{
        val mapOfCellDoc = doc.toList().first().toList().flatMap { row -> row.map { Pair(PivotExcelCreator.IndexInThePivotTable(it.rowIndex,it.columnIndex),it.stringCellValue) }}.toMap()
        val mapOfCellDoc1 = doc1.toList().first().toList().flatMap { row -> row.map { Pair(PivotExcelCreator.IndexInThePivotTable(it.rowIndex,it.columnIndex),it.stringCellValue) }}.toMap()
        if(mapOfCellDoc.size == mapOfCellDoc1.size){
            return mapOfCellDoc.entries.all { mapOfCellDoc1.containsKey(it.key) && mapOfCellDoc[it.key] == mapOfCellDoc1[it.key]}
        }
        return false
    }

data class IndexInThePivotTable(val row: Int, val col: Int)

and in your code add assert

    assertTrue(checkEqualityExcelDocs(expected, actual), "Docs aren't equal!")

as you can see doc.toList().first() will take only the first sheet of document, if you need to compare each sheet respectively change code a little.

Also it is quite good idea to not take into account "" empty strings cells, I didn't need this functionality (As well, simply add this part, if you need).

also it can be useful information

//first doc I've got from outputstream such way
val out = ByteArrayOutputStream()
//some method which writes excel to outputstream
val firstDoc = XSSFWorkbook(ByteArrayInputStream(out.toByteArray()))

and second doc from file to compare with

val secondDoc = XSSFWorkbook(Test::class.java.getClassLoader().getResource("yourfile.xlsx").path)

score 0 · Answer 9 · answered May 14 '09 at 23:48

0

Just found out there's something in commons-io's FileUtils. Thanks for the other answers.

answered May 14 '09 at 23:48

Andrew Swan

13,427
22
69
98

1

This actually doesn't solve my problem, as there seem to be differences between the Excel files that aren't due to geniune content differences. I'll try sleske's suggestion of parsing the files' contents and doing a canonical comparison. – Andrew Swan May 15 '09 at 05:57

score 0 · Answer 10 · answered May 15 '09 at 02:01

0

Please, take a look at the site to compare the binary files, http://www.velocityreviews.com/forums/t123770-re-java-code-for-determining-binary-file-equality.html

Tiger

answered May 15 '09 at 02:01

Tiger

507
1
6
16

Two issues: that code is just doing a byte-for-byte comparison, which as sleske points out, isn't ideal where Excel is concerned. Also, I was after a third-party utility method, not a block of code on some forum that may or may not work. I'll update the question to make this clear. – Andrew Swan May 15 '09 at 05:07

CookieOfFortune · Answer 11 · 2009-05-14T23:50:01.990

-1

Maybe... compare MD5 digests of each file? I'm sure there are a lot of ways to do it. You could just open both files and compare each byte.

EDIT: James stated how the XLS format might have differences in the metadata. Perhaps you should use the same interface you used to generate the xls files to open them and compare the values from cell to cell?

edited May 14 '09 at 23:50

answered May 14 '09 at 23:18

CookieOfFortune

13,836
8
42
58

2

This is a good idea, except you do have to consider if the files will be identical (i.e. that there's not any metadata involved, such as in the XLSX format, for example. This differing metadata will, obviously, produce different hashes). – James B May 14 '09 at 23:30

Easiest way to compare two Excel files in Java?

11 Answers11

Linked