I'm using Java 11 (AdoptOpenJDK 11.0.5 2019-10-15) on Windows 10. I have some legacy XHTML 1.1 files I want to process. They take the following general form:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>XHTML 1.1 Skeleton</title>
</head>
<body>
</body>
</html>
To keep the parser from stalling while it connects to the Internet, I install a custom EntityResolver that loads known entities (looked up by their public IDs, such as -//W3C//ELEMENTS XHTML Inline Style 1.0//EN) stored in the program resources. This DefaultEntityResolver class also prints debug messages indicating which entities the parser is loading.
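For readers who want something concrete, here is a minimal sketch of what such a resolver might look like. It is not the actual DefaultEntityResolver: the class name ResourceEntityResolver, the helper resourceNameFor, and the resource file names are assumptions made for illustration, and only two of the public IDs are registered.

```java
import java.io.InputStream;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical stand-in for the DefaultEntityResolver described above. */
final class ResourceEntityResolver implements EntityResolver {

    // Public ID -> resource name. The resource names are assumptions for illustration.
    private static final Map<String, String> RESOURCES = Map.of(
        "-//W3C//DTD XHTML 1.1//EN", "xhtml11.dtd",
        "-//W3C//ELEMENTS XHTML Inline Style 1.0//EN", "xhtml-inlstyle-1.mod");

    /** Returns the resource name registered for a public ID, or null if it is unknown. */
    static String resourceNameFor(String publicId) {
        return publicId == null ? null : RESOURCES.get(publicId);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String resourceName = resourceNameFor(publicId);
        InputStream stream = resourceName == null ? null : getClass().getResourceAsStream(resourceName);
        if (stream == null) {
            // Unknown entity or missing resource: returning null lets the parser
            // fall back to its default lookup (which may go to the network).
            return null;
        }
        // Debug output so it is visible which entities the parser requests.
        System.err.println("Resolving " + publicId + " (" + systemId + ") from " + resourceName);
        InputSource source = new InputSource(stream);
        source.setPublicId(publicId);
        source.setSystemId(systemId); // keep the original system ID for relative references
        return source;
    }
}
```

Returning null for unknown IDs is a design choice in this sketch; a stricter resolver could return an empty InputSource instead so that nothing is ever fetched remotely.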
Here is the basic form of my parsing:
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
documentBuilder.setEntityResolver(DefaultEntityResolver.getInstance());
final Document document;
try (InputStream inputStream = new BufferedInputStream(getClass().getResourceAsStream("xhtml-1.1-test.xhtml"))) {
document = documentBuilder.parse(inputStream);
}
Because of the debug messages in DefaultEntityResolver, I can see that the parser loaded the following entities, in this order:
-//W3C//DTD XHTML 1.1//EN (http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd)
-//W3C//ELEMENTS XHTML Inline Style 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlstyle-1.mod)
-//W3C//ENTITIES XHTML Datatypes 1.0//EN (http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod)
-//W3C//ENTITIES XHTML Modular Framework 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-framework-1.mod)
-//W3C//ENTITIES XHTML Datatypes 1.0//EN (http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod)
-//W3C//ENTITIES XHTML Qualified Names 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-qname-1.mod)
-//W3C//ENTITIES XHTML Intrinsic Events 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-events-1.mod)
-//W3C//ENTITIES XHTML Common Attributes 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-attribs-1.mod)
-//W3C//ENTITIES XHTML 1.1 Document Model 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml11-model-1.mod)
-//W3C//ENTITIES XHTML Character Entities 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-charent-1.mod)
-//W3C//ENTITIES Latin 1 for XHTML//EN (http://www.w3.org/MarkUp/DTD/xhtml-lat1.ent)
-//W3C//ENTITIES Symbols for XHTML//EN (http://www.w3.org/MarkUp/DTD/xhtml-symbol.ent)
-//W3C//ENTITIES Special for XHTML//EN (http://www.w3.org/MarkUp/DTD/xhtml-special.ent)
-//W3C//ELEMENTS XHTML Text 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-text-1.mod)
-//W3C//ELEMENTS XHTML Inline Structural 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlstruct-1.mod)
-//W3C//ELEMENTS XHTML Inline Phrasal 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlphras-1.mod)
-//W3C//ELEMENTS XHTML Block Structural 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-blkstruct-1.mod)
-//W3C//ELEMENTS XHTML Block Phrasal 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-blkphras-1.mod)
-//W3C//ELEMENTS XHTML Hypertext 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-hypertext-1.mod)
-//W3C//ELEMENTS XHTML Lists 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-list-1.mod)
-//W3C//ELEMENTS XHTML Editing Elements 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-edit-1.mod)
-//W3C//ELEMENTS XHTML BIDI Override Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-bdo-1.mod)
-//W3C//ELEMENTS XHTML Ruby 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-ruby-1.mod)
-//W3C//ELEMENTS XHTML Presentation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-pres-1.mod)
-//W3C//ELEMENTS XHTML Inline Presentation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-inlpres-1.mod)
-//W3C//ELEMENTS XHTML Block Presentation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-blkpres-1.mod)
-//W3C//ELEMENTS XHTML Link Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-link-1.mod)
-//W3C//ELEMENTS XHTML Metainformation 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-meta-1.mod)
-//W3C//ELEMENTS XHTML Base Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-base-1.mod)
-//W3C//ELEMENTS XHTML Scripting 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-script-1.mod)
-//W3C//ELEMENTS XHTML Style Sheets 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-style-1.mod)
-//W3C//ELEMENTS XHTML Images 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-image-1.mod)
-//W3C//ELEMENTS XHTML Client-side Image Maps 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-csismap-1.mod)
-//W3C//ELEMENTS XHTML Server-side Image Maps 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-ssismap-1.mod)
-//W3C//ELEMENTS XHTML Param Element 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-param-1.mod)
-//W3C//ELEMENTS XHTML Embedded Object 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-object-1.mod)
-//W3C//ELEMENTS XHTML Tables 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-table-1.mod)
-//W3C//ELEMENTS XHTML Forms 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-form-1.mod)
-//W3C//ELEMENTS XHTML Document Structure 1.0//EN (http://www.w3.org/MarkUp/DTD/xhtml-struct-1.mod)
Note that some of these entities no longer exist at the indicated URL; nevertheless my DefaultEntityResolver has these entities already stored and keyed to their public IDs, and thus still provides them to the parser.
So far so good. But when I immediately call document.normalizeDocument(), the program pauses and then prints:
[Error] xhtml11.dtd:129:43: The entity "LanguageCode.datatype" was referenced, but not declared.
[Error] xhtml11.dtd:130:44: The entity "LanguageCode.datatype" was referenced, but not declared.
[Error] xhtml11.dtd:194:47: The entity "Common.attrib" was referenced, but not declared.
Note that it is not my program printing these errors; apparently it is something inside document.normalizeDocument(). In addition, here are two other curiosities:
- This does not happen if I run my application from within Eclipse.
- This does not happen if I disable my network connection.
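One way to check where those messages come from might be to register a DOMErrorHandler through the document's DOMConfiguration before normalizing. The "error-handler" parameter is defined by DOM Level 3; whether the JDK's bundled implementation routes these particular DTD messages through it is an assumption that has not been verified here. The class name NormalizationErrors and the method normalizeAndCollect are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.DOMErrorHandler;
import org.w3c.dom.Document;

/** Hypothetical helper that captures what normalizeDocument() reports. */
final class NormalizationErrors {

    /** Normalizes the document and returns the messages reported via "error-handler". */
    static List<String> normalizeAndCollect(Document document) {
        List<String> messages = new ArrayList<>();
        DOMErrorHandler handler = error -> {
            messages.add("severity " + error.getSeverity() + ": " + error.getMessage());
            return true; // true asks the implementation to continue where it can
        };
        DOMConfiguration config = document.getDomConfig();
        config.setParameter("error-handler", handler);
        document.normalizeDocument();
        return messages;
    }
}
```

If the three "referenced, but not declared" messages show up in the returned list, that would suggest they are produced during normalization itself rather than during the initial parse.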
My best guess is that document.normalizeDocument() is not using the custom EntityResolver I installed in the document builder. Because some of the entities no longer exist at their expected URLs (e.g. http://www.w3.org/TR/xhtml11/DTD/xhtml-datatypes-1.mod), they cannot be loaded, and therefore the referenced entities never get declared. The web server, however, takes a long time to respond that the entities are missing (as you can test manually), which makes the program seem to pause. This might also explain why the error messages don't appear when my network connection is disabled: I'm guessing that none of the external entities can be loaded, so the attempt fails immediately, but this is not considered an error. (None of this explains why this works with no pause or error message inside Eclipse, though.)
In fact the DOMConfiguration documentation hints that I need to set some sort of resource-resolver parameter, although I'm not sure why DOMConfiguration doesn't default to the entity resolver I set in the original document builder used to parse the XML document.
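As a rough sketch of what setting that parameter could look like: DOM Level 3 defines "resource-resolver" as taking an LSResourceResolver, so an existing SAX EntityResolver has to be adapted. The class NormalizeWithResolver and its methods adapt and normalize are hypothetical names, and it is an untested assumption that the JDK's normalizeDocument() actually consults this resolver when it loads the DTD modules.

```java
import java.io.IOException;
import org.w3c.dom.DOMConfiguration;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSResourceResolver;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper that hands a SAX EntityResolver to DOMConfiguration. */
final class NormalizeWithResolver {

    /** Wraps a SAX EntityResolver as a DOM Level 3 LSResourceResolver. */
    static LSResourceResolver adapt(EntityResolver entityResolver, DOMImplementationLS ls) {
        return (type, namespaceURI, publicId, systemId, baseURI) -> {
            try {
                InputSource source = entityResolver.resolveEntity(publicId, systemId);
                if (source == null) {
                    return null; // not handled: the implementation uses its default lookup
                }
                LSInput input = ls.createLSInput();
                input.setPublicId(publicId);
                input.setSystemId(systemId);
                input.setBaseURI(baseURI);
                input.setByteStream(source.getByteStream());
                input.setCharacterStream(source.getCharacterStream());
                input.setEncoding(source.getEncoding());
                return input;
            } catch (SAXException | IOException e) {
                throw new IllegalStateException("Could not resolve " + publicId, e);
            }
        };
    }

    /** Sets "resource-resolver" (if supported) and then normalizes the document. */
    static void normalize(Document document, EntityResolver entityResolver) {
        Object feature = document.getImplementation().getFeature("LS", "3.0");
        if (!(feature instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM implementation does not support Load and Save 3.0");
        }
        LSResourceResolver resolver = adapt(entityResolver, (DOMImplementationLS) feature);
        DOMConfiguration config = document.getDomConfig();
        if (config.canSetParameter("resource-resolver", resolver)) {
            config.setParameter("resource-resolver", resolver);
        }
        document.normalizeDocument();
    }
}
```

With the parsing code above, the call would be NormalizeWithResolver.normalize(document, DefaultEntityResolver.getInstance()) in place of the bare document.normalizeDocument().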
To make things a little stranger, I put the skeleton XHTML 1.1 document above in my resources, and created a unit test exactly like the code above, followed by document.normalizeDocument(), and the test passed with no pause and no errors, even from the command line!
But then if I put a loop for(int i = 0; i < 100; i++) in the unit test, to load, parse, and normalize the document 100 times (but using the same DocumentBuilderFactory), my unit test crashes the forked unit test JVM altogether!!
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.22.1:test (default-test) on project [...]: There are test failures.
Please refer to [...]\xml\target\surefire-reports for the individual test results.
Please refer to dump files (if any exist) [date].dump, [date]-jvmRun[N].dump and [date].dumpstream.
The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was cmd.exe /X /C [...]
Process Exit Code: 0
Crashed tests:
[...].XmlDomTest
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Command was cmd.exe /X /C [...]
Process Exit Code: 0
Crashed tests:
com.globalmentor.xml.XmlDomTest
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:669)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:282)
at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:245)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1183)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:1011)
at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:857)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:957)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:289)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:193)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:347)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:957)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:289)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:193)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:566)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:282)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:225)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:406)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:347)
Caused by: org.apache.maven.plugin.MojoExecutionException: There are test failures.
So I'm thinking I want to avoid document.normalizeDocument(), but I welcome any clarifications of this behavior.