As a data analyst, I constantly run across files of structured data that are in some proprietary format and resist normal XML parsing.
For example, I have an archive of about a hundred documents that all begin with this:
<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
I have included an abridged example of the document below; don't read it if you're offended by cloning.
At any rate, is there a way to query this without having the DTD, namespace, URI, or whatever it is I need? I'm OK using SQL Server 2012+, XQuery, or, I dunno, PHP or VBA.
<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
<galedata><project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project></galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>
<para>A clone is an identical copy of a plant or animal, produced from the genetic material of a single organism. In 1996 scientists in Britain created a sheep named Dolly, the first successful clone of an adult mammal. Since then, scientists have successfully cloned other animals, such as goats, mice, pigs, and rabbits. People began wondering if human beings would be next. The question of whether human cloning should be allowed, and under what conditions, raises a number of challenging scientific, legal, and ethical issues—including what it means to be human.</para>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years. Some plants produce offspring without any genetic material from another organism. In these cases, cloning simply requires cutting pieces of the stems, roots, or leaves of the plants and then planting the cuttings. The cuttings will grow into identical copies of the originals. Many common fruits, vegetables, and ornamental plants are produced in this way from parent plants with especially desirable characteristics.</para>
<para>[lots of excluded text] Perhaps the most perplexing question of all: How would clones feel about their status? As a copy, would they lack the sense of uniqueness that is part of the human condition? As yet, such questions have no answers—perhaps they never will. The debate about cloning, both animal and human, however, will certainly continue. The technology exists to create clones. How will society use this technology?</para>
</doc.body>
</document>
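For what it's worth, here is a minimal sketch of one workaround, assuming Python is also an acceptable tool (it is not one of the tools listed above) and that everything after the DOCTYPE line is well-formed XML. As far as I can tell, the SGML-style DOCTYPE, which gives a PUBLIC identifier but no system identifier, is what strict XML parsers object to, so the sketch simply strips that declaration and parses the rest without any DTD. The names `SAMPLE`, `parse_without_doctype`, and `summarize` are made up for illustration, and the sample text is shortened from the document above.

```python
import re
import xml.etree.ElementTree as ET

# Shortened stand-in for one archive file (illustrative, not the full document).
SAMPLE = """<!DOCTYPE DOCUMENT PUBLIC "-//Gale Research//DTD Document V2.0//EN">
<document synfileid="MCIESS0044">
<galedata><project>
<projectname>
<title>Opposing Viewpoints Resource Center</title>
</projectname>
</project></galedata>
<doc.head>
<title>Cloning</title>
</doc.head>
<doc.body>
<para>A clone is an identical copy of a plant or animal.</para>
<head n="1">Scientific Background</head>
<para>People have been cloning plants for thousands of years.</para>
</doc.body>
</document>"""

# Assumption: the DOCTYPE has no internal subset, so it contains no '>' before its end.
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def parse_without_doctype(text):
    """Drop the first DOCTYPE declaration, then parse the remainder as XML."""
    cleaned = DOCTYPE_RE.sub("", text, count=1).strip()
    return ET.fromstring(cleaned)


def summarize(text):
    """Pull a few fields out of one document using plain element paths."""
    root = parse_without_doctype(text)
    return {
        "id": root.get("synfileid"),
        "project": root.findtext("galedata/project/projectname/title"),
        "title": root.findtext("doc.head/title"),
        "headings": [h.text for h in root.iter("head")],
        "paragraphs": [p.text for p in root.iter("para")],
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For the real archive, the same function could be applied to each file's contents after reading it with the right encoding. If other files contain SGML-only constructs such as unclosed tags, this approach would fail and a more forgiving parser would be needed.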