I'm a Perl newbie trying to read an SGML file, parse it, and convert it to XML so I can get the key/value pairs of all the elements. I found the SGML::DTDParse and XML::Simple modules, which look like what I need for the task. My problem is that I can't find any documentation or code examples for DTDParse.

My code is below:

# use modules
use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;

use warnings;
use strict;

my $xml;
my $data;
my $convert;

$/ = undef;
open FILE, "C:/..." or die $!;
my $file = <FILE>;

# Convert the DTD file to XML
dtdParse $file;

# Create the XML object
$xml = new XML::Simple;

# Read the XML file
$data = $xml->XMLin($file);

# print the output
print Dumper($data);

I get an error with the dtdParse $file line as follows: Can't call method "dtdParse" without a package or object reference at "my script name"

Any ideas as to the proper syntax here and is this a valid approach for the task?

I reworked the code again and was able to do the DTD parsing with this:

$dtd = SGML::DTDParse::DTD->new();
$dtd->parse($file);
print $dtd;

I don't believe the parsed output can be considered XML, though, so maybe the correct way to get all the elements from it is a for loop.

James Drinkard

3 Answers


There is no function dtdParse.

dtdparse is a program that comes with the SGML::DTDParse module.

You can use it to dump XML from a DTD file. A quick example of how you could use dtdparse:

use strict;
use warnings;

use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;

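# Run the external dtdparse program and capture its XML output
my $result = qx{dtdparse test.dtd};
die "dtdparse failed: $?" if $? != 0;

# Create the XML object
my $xml = XML::Simple->new;

# Parse the XML string into a Perl data structure
$result = $xml->XMLin($result);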

# print the output
$Data::Dumper::Indent = 1;
print Dumper($result);

where test.dtd looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT DatabaseInventory (DatabaseName+)>
<!ELEMENT DatabaseName (   GlobalDatabaseName
                         , OracleSID
                         , DatabaseDomain
                         , Administrator+
                         , DatabaseAttributes
                         , Comments)
>
<!ELEMENT GlobalDatabaseName (#PCDATA)>
<!ELEMENT OracleSID          (#PCDATA)>
<!ELEMENT DatabaseDomain     (#PCDATA)>
<!ELEMENT Administrator      (#PCDATA)>
<!ELEMENT DatabaseAttributes EMPTY>
<!ELEMENT Comments           (#PCDATA)>

<!ATTLIST Administrator       EmailAlias CDATA #REQUIRED>
<!ATTLIST Administrator       Extension  CDATA #IMPLIED>
<!ATTLIST DatabaseAttributes  Type       (Production|Development|Testing) #REQUIRED>
<!ATTLIST DatabaseAttributes  Version    (7|8|8i|9i) "9i">

<!ENTITY AUTHOR "Jeffrey Hunter">
<!ENTITY WEB    "www.iDevelopment.info">
<!ENTITY EMAIL  "jhunter@iDevelopment.info">

Which will output something like this:

$VAR1 = {
  'namecase-entity' => '0',
  'created-by' => 'DTDParse V2.00',
  'public-id' => '',
  'version' => '1.0',
  'attlist' => {
    'DatabaseAttributes' => {
      'attribute' => {
        'Type' => {
          'value' => 'Production Development Testing',
          'type' => '#REQUIRED',
          'default' => '',
          'enumeration' => 'yes'
        },
        'Version' => {
          'value' => '7 8 8i 9i',
          'type' => '',
          'default' => '9i',
          'enumeration' => 'yes'
        }
      },
      'attdecl' => '  Type       (Production|Development|Testing) #REQUIRED'
    },
    'Administrator' => {
      'attribute' => {
        'EmailAlias' => {
          'value' => 'CDATA',
          'type' => '#REQUIRED',
          'default' => ''
        },
        'Extension' => {
          'value' => 'CDATA',
          'type' => '#IMPLIED',
          'default' => ''
        }
      },
      'attdecl' => '       EmailAlias CDATA #REQUIRED'
    }
  },
  'element' => {
    'OracleSID' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'Comments' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseAttributes' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'empty' => {}
      },
      'content-model' => {
        'empty' => {}
      }
    },
    'GlobalDatabaseName' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'Administrator' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseInventory' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'sequence-group' => {
          'element-name' => {
            'occurrence' => '+',
            'name' => 'DatabaseName'
          }
        }
      },
      'content-model' => {
        'sequence-group' => {
          'element-name' => {
            'occurrence' => '+',
            'name' => 'DatabaseName'
          }
        }
      }
    },
    'DatabaseDomain' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseName' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'sequence-group' => {
          'element-name' => {
            'Comments' => {},
            'OracleSID' => {},
            'DatabaseAttributes' => {},
            'DatabaseDomain' => {},
            'GlobalDatabaseName' => {},
            'Administrator' => {
              'occurrence' => '+'
            }
          }
        }
      },
      'content-model' => {
        'sequence-group' => {
          'element-name' => {
            'Comments' => {},
            'OracleSID' => {},
            'DatabaseAttributes' => {},
            'DatabaseDomain' => {},
            'GlobalDatabaseName' => {},
            'Administrator' => {
              'occurrence' => '+'
            }
          }
        }
      }
    }
  },
  'entity' => {
    'WEB' => {
      'text-expanded' => 'www.iDevelopment.info',
      'text' => 'www.iDevelopment.info',
      'type' => 'gen'
    },
    'AUTHOR' => {
      'text-expanded' => 'Jeffrey Hunter',
      'text' => 'Jeffrey Hunter',
      'type' => 'gen'
    },
    'EMAIL' => {
      'text-expanded' => 'jhunter@iDevelopment.info',
      'text' => 'jhunter@iDevelopment.info',
      'type' => 'gen'
    }
  },
  'system-id' => 'test.dtd',
  'unexpanded' => '1',
  'created-on' => 'Tue Feb 28 00:44:52 2012',
  'declaration' => '',
  'xml' => '0',
  'title' => '?untitled?',
  'namecase-general' => '1'
};
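
Once the data is in a hash like the one above, a plain loop is enough to list the elements and their attributes. The following is only a sketch: summarize() is a name made up for illustration, and the small hash is hand-copied from the dump above so the snippet runs without dtdparse installed. In the real script you would pass in the reference returned by XMLin.

```perl
use strict;
use warnings;

# Trimmed, hand-copied piece of the structure shown above (illustrative only).
my $result = {
    element => {
        OracleSID          => { 'content-type' => 'mixed' },
        DatabaseAttributes => { 'content-type' => 'element' },
    },
    attlist => {
        DatabaseAttributes => {
            attribute => {
                Type    => { value => 'Production Development Testing', type => '#REQUIRED', default => '' },
                Version => { value => '7 8 8i 9i', type => '', default => '9i' },
            },
        },
    },
};

# Hypothetical helper: returns one line per element, followed by one
# indented line per attribute declared for that element.
sub summarize {
    my ($dtd) = @_;
    my @lines;
    for my $name (sort keys %{ $dtd->{element} }) {
        my $type = $dtd->{element}{$name}{'content-type'};
        push @lines, sprintf('%s: %s', $name, $type);

        # Skip elements that have no ATTLIST declaration.
        my $attlist = $dtd->{attlist}{$name} or next;
        my $attrs   = $attlist->{attribute} || {};
        for my $attr (sort keys %$attrs) {
            my $info = $attrs->{$attr};
            push @lines, sprintf('  attribute %s: values [%s], default [%s]',
                $attr, $info->{value}, $info->{default});
        }
    }
    return @lines;
}

print "$_\n" for summarize($result);
```

Be aware that XML::Simple changes the shape of the hash depending on how many children there are: in the dump above, the single 'element-name' under DatabaseInventory is not keyed by name, while the several under DatabaseName are. If that gets in the way, the ForceArray option of XML::Simple makes the shape more predictable.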
matthias krull
  • I'm not sure how to use dtdparse here? I don't see where it's declared anywhere and I get this error: "'dtdparse' is not recognized as an internal or external command, operable program or batch file" when I step through your code. – James Drinkard Feb 28 '12 at 15:58
  • Then you haven't added `dtdparse`'s directory to your `$PATH`. – reinierpost Feb 28 '12 at 16:42
  • You probably don't want to use it anyway, unless you are indeed trying to parse DTDs. – reinierpost Feb 28 '12 at 16:43
  • I saw that I didn't add the module in to my use statement, that got me past this error. Yes, I need to parse a dtd, so I need to use this module. – James Drinkard Feb 28 '12 at 16:54
  • Well, I should have pointed out more clearly what is happening. The example script captures the output of an external program with `my $result = qx{dtdparse test.dtd}`. To use dtdparse this way, `dtdparse` has to be in your system's path, or you have to give the full path yourself, like `my $result = qx{C:\path\to\dtdparse test.dtd}` – matthias krull Feb 28 '12 at 17:00
  • I was able to get this to work finally, but I don't think the xml libraries will work on the parsed sgml dtd, at least I couldn't. Thank you for the help nonetheless! – James Drinkard Feb 28 '12 at 23:10

dtdparse isn't a Perl function; it's a script for processing an SGML DTD from the command line. The documentation for the script is here.

Since you want to do the parsing in your own Perl script, you can use the source of dtdparse as an example if you like.

wes

For SGML, use James Clark's SP, which includes an SGML to XML converter called SX. This is a professional system, and it does have documentation. If you need Perl in there, use system or open to call SP/SX as an external program.
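
A minimal sketch of the "open" route, under some assumptions: run_converter() is a made-up helper name, the converter is assumed to be on your PATH under the name osx (that is what OpenSP calls it; the original SP calls it sx), and the SGML file name comes from the command line. The list form of a piped open used here needs a reasonably recent Perl on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command and return everything it
# writes to standard output. Dies if the command cannot be started or
# exits with a non-zero status.
sub run_converter {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                 # slurp mode: read all output at once
    my $out = <$fh>;
    close($fh) or die "$cmd[0] failed (exit status $?)";
    return $out;
}

# Only call the converter when a file name is given on the command line.
# 'osx' is an assumption; use 'sx' or a full path if that is what you have.
if (@ARGV) {
    my $xml_text = run_converter('osx', $ARGV[0]);
    print $xml_text;
}
```

The returned string can then be handed to whatever XML parser you prefer. Note that the converter may exit with a non-zero status even when it produced usable output, in which case you may want to warn instead of die.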

Lumi