
I have the following sample SGML data from my .sgm file, and I want to convert it into XML:

<?dtd name="viewed">
<?XMLDOC>
<viewed >xyz
<cite>
<yr>2010
<pno cite="2010 abc 1188">10
<?/XMLDOC>

<?XMLDOC>
<viewed>abc.
<cite>
<yr>2010
<pno cite="2010 xyz 5133">9
<?/XMLDOC>

The output should look like this:

<index1>
  <num viewed="xyz"/>
  <heading>xyz</heading>
  <index-refs>
    <link caseno="2010 abc 1188"/>
  </index-refs>
</index1>
<index1>
  <num viewed="abc"/>
  <heading>abc</heading>
  <index-refs>
    <link caseno="2010 xyz 5133"/>
  </index-refs>
</index1>

Can this be done in C#, or can XSLT 2.0 be used for this kind of conversion?

Cœur
atif
  • You need an SGML parser to do this properly. XSLT 2.0 doesn't provide such a parser; you could theoretically write one in XSLT 2.0 but it would be a huge pain. I don't know what support there is for parsing SGML in C#. – LarsH Dec 15 '10 at 17:00

4 Answers


Others have already given some good advice. Here's one way of putting it all together by first converting the input SGML to well-formed XML and then using XSLT to transform that to the exact format you need.

Converting your SGML to well-formed XML

The osx converter from the OpenSP package, suggested by mzjn, works well for this. Because your SGML markup omits end tags, you need a DTD from which the correct nesting of elements can be determined. If you don't have a DTD, you need to create one. For your example input, it could be as simple as this:

<!ELEMENT toplevel o o (viewed)+>

<!ELEMENT viewed - o (#PCDATA,cite)>
<!ELEMENT cite - o (yr,pno)>
<!ELEMENT yr - o (#PCDATA)>
<!ELEMENT pno - o (#PCDATA)>

<!ATTLIST pno cite CDATA #REQUIRED>

You also need to add a proper document type declaration at the beginning of your SGML file. Assuming your DTD is in the file viewed.dtd, it looks like this:

<!DOCTYPE toplevel SYSTEM "viewed.dtd" >

With this addition, you should now be able to use osx to convert the SGML to XML. (It won't be able to convert the processing instructions that start with a /, such as <?/XMLDOC>, because those are not allowed in XML; it will emit a warning about them.)

osx input.sgm > input.xml
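To see why the DTD matters, here is a toy Python sketch of the kind of end-tag inference osx performs (far more completely) from the content models: when a start tag is not allowed inside the currently open element, that element must have ended. The CHILDREN table is a hypothetical, hand-copied summary of the sample DTD, not something parsed from viewed.dtd, and the function ignores entities, attribute minimization and everything else a real SGML parser checks; it also simply drops processing instructions. Treat it as an illustration, not a substitute for OpenSP.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical table of allowed child elements, copied by hand from the
# sample DTD (toplevel -> viewed+, viewed -> cite, cite -> yr, pno).
# It is NOT read from viewed.dtd; a real SGML parser would do that.
CHILDREN = {
    "toplevel": {"viewed"},
    "viewed": {"cite"},
    "cite": {"yr", "pno"},
    "yr": set(),
    "pno": set(),
}

# Processing instruction | start/end tag | run of character data
TOKEN = re.compile(r"<\?[^>]*>|<(/?)([A-Za-z][\w.-]*)([^>]*)>|[^<]+")


def close_omitted_tags(sgml, root="toplevel"):
    """Toy end-tag inference for the sample; not a real SGML parser."""
    out = [f"<{root}>"]  # the DTD lets the toplevel start tag be omitted too
    stack = [root]
    for m in TOKEN.finditer(sgml):
        tok = m.group(0)
        if tok.startswith("<?"):
            continue  # drop PIs such as <?XMLDOC> and <?/XMLDOC>
        if m.group(2) is None:
            out.append(escape(tok))  # character data (entity refs not kept)
            continue
        is_end, name, attrs = m.group(1), m.group(2).lower(), m.group(3)
        if is_end:
            if name not in stack:
                raise ValueError(f"unexpected </{name}>")
            while stack[-1] != name:
                out.append(f"</{stack.pop()}>")
            out.append(f"</{stack.pop()}>")
            continue
        if not stack:
            raise ValueError(f"<{name}> after end of document")
        # Close open elements until one of them may contain <name>.
        while name not in CHILDREN[stack[-1]]:
            out.append(f"</{stack.pop()}>")
            if not stack:
                raise ValueError(f"<{name}> is not allowed here")
        out.append(f"<{name}{attrs}>")  # attributes copied verbatim
        stack.append(name)
    while stack:  # end of input closes everything still open
        out.append(f"</{stack.pop()}>")
    return "".join(out)


# The sample input from the question (two records).
SAMPLE = """<?dtd name="viewed">
<?XMLDOC>
<viewed >xyz
<cite>
<yr>2010
<pno cite="2010 abc 1188">10
<?/XMLDOC>

<?XMLDOC>
<viewed>abc.
<cite>
<yr>2010
<pno cite="2010 xyz 5133">9
<?/XMLDOC>
"""

if __name__ == "__main__":
    xml_text = close_omitted_tags(SAMPLE)
    print(xml_text)
    print(len(ET.fromstring(xml_text).findall("viewed")), "records")
```

On the sample this yields XML that ElementTree parses, but the approach breaks down quickly on real SGML (minimized attributes, entities, marked sections, inclusion and exclusion exceptions), which is why a real parser such as OpenSP is the safer route.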

Transforming the resulting XML to your desired format

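For the above case, you could use something like the following XSLT stylesheet. Note that osx folds element and attribute names to upper case by default, which is why the templates match VIEWED, CITE and PNO/@CITE rather than the lower-case names used in the SGML source: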

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="VIEWED">
    <index1>
      <num viewed="{normalize-space(text())}"/>
      <heading>
        <xsl:value-of select="normalize-space(text())"/>
      </heading>
      <index-refs>
        <xsl:apply-templates select="CITE"/>
      </index-refs>
    </index1>
  </xsl:template>

  <xsl:template match="CITE">
    <link caseno="{PNO/@CITE}"/>
  </xsl:template>

</xsl:stylesheet>
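If you would rather not install an XSLT processor, the same two templates can be sketched with Python's standard xml.etree.ElementTree module. OSX_OUTPUT below is a hypothetical approximation of what osx might emit for the sample (upper-case names, whitespace may differ), not captured output; to_index returns one serialized <index1> element per VIEWED element, the way the stylesheet does.

```python
import xml.etree.ElementTree as ET

# Hypothetical approximation of osx output for the sample
# (names folded to upper case; exact whitespace may differ).
OSX_OUTPUT = """<?xml version="1.0"?>
<TOPLEVEL><VIEWED>xyz
<CITE><YR>2010</YR><PNO CITE="2010 abc 1188">10</PNO></CITE></VIEWED>
<VIEWED>abc.
<CITE><YR>2010</YR><PNO CITE="2010 xyz 5133">9</PNO></CITE></VIEWED></TOPLEVEL>"""


def normalize_space(s):
    """Rough equivalent of XPath normalize-space()."""
    return " ".join((s or "").split())


def to_index(osx_xml):
    """Mirror the VIEWED and CITE templates of the stylesheet above."""
    records = []
    for viewed in ET.fromstring(osx_xml).iter("VIEWED"):
        # .text is the text before the first child element,
        # which holds the heading in this data
        heading = normalize_space(viewed.text)
        index1 = ET.Element("index1")
        ET.SubElement(index1, "num", viewed=heading)
        ET.SubElement(index1, "heading").text = heading
        refs = ET.SubElement(index1, "index-refs")
        for cite in viewed.findall("CITE"):
            pno = cite.find("PNO")  # like PNO/@CITE: first PNO child
            caseno = pno.get("CITE", "") if pno is not None else ""
            ET.SubElement(refs, "link", caseno=caseno)
        records.append(ET.tostring(index1, encoding="unicode"))
    return records


if __name__ == "__main__":
    for record in to_index(OSX_OUTPUT):
        print(record)
```

Like the stylesheet, this keeps the trailing period in `abc.`; the question's expected output drops it, so an extra rule would be needed if that is intended.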
Community
Jukka Matilainen
  • Hi Jukka Matilainen, regarding "osx input.sgm > input.xml": could you please provide a link to download the "OSX" EXE and its supporting files? – Thirusanguraja Venkatesan Feb 20 '14 at 09:29
  • @Thirusanguraja Venkatesan, you can find download links on the SourceForge downloads page for the openjade/opensp project: http://sourceforge.net/projects/openjade/files/opensp/1.5.2/ – Jukka Matilainen Feb 20 '14 at 20:45
  • osx.exe is not included in OpenSP-1.5.2-win32.zip, but I followed the instructions at http://openjade.sourceforge.net/doc/build.htm and finally got the EXE files. Thanks for the good guidance. – Thirusanguraja Venkatesan Feb 21 '14 at 07:26

Maybe you can use the osx SGML to XML converter. It is part of the OpenSP package (based on SP, originally written by James Clark).

mzjn

Could the SGML-Reader, originally developed by Chris Lovett, help solve this problem?

Dimitre Novatchev

Why XSLT? I doubt you can map SGML to the XML Infoset or XDM...

I think you would be better off using the language made for this task: DSSSL (Document Style Semantics and Specification Language).

It is the predecessor of XSLT. The author is James Clark, and this is his site.