I have well formed HTML files. To turn them into SGML do I just switch the extension or is there more to do?
4 Answers
It's going to depend on what version of HTML. From the Wikipedia article on SGML:
While HTML was developed partially independently and in parallel with SGML, its creator Tim Berners-Lee, intended it to be an application of SGML. The design of HTML (Hyper Text Markup Language) was therefore inspired by SGML tagging, but, since no clear expansion and parsing guidelines were established, most actual HTML documents are not valid SGML documents. Later, HTML was reformulated (version 2.0) to be more of an SGML application, however, the HTML markup language has many legacy- and exception- handling features that differ from SGML's requirements. HTML 4 is an SGML application that fully conforms to ISO 8879 – SGML.
The charter for the recently revived World Wide Web Consortium HTML Working Group says, "the Group will not assume that an SGML parser is used for 'classic HTML'". Although HTML syntax closely resembles SGML syntax with the default reference concrete syntax, HTML5 abandons any attempt to define HTML as an SGML application, explicitly defining its own parsing rules which more closely match existing implementations and documents. (It does, however, define an alternative XML-based XHTML serialization, which does conform to SGML (WWW).)
So it looks like you probably already have SGML if you have well-formed HTML 4 or XHTML. Anything earlier (unlikely) or later (HTML 5) and you may have to make some changes to the document itself.
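Since the answer depends on the HTML version, a first step is to see what the document type declaration claims. The following is only a rough sketch: `guess_flavour` and its return labels are hypothetical names invented for illustration, and the doctype is merely a claim made by the file, not proof that the file validates against that DTD.

```python
import re

# Matches a doctype declaration whose root element name is "html".
# Group 1 holds whatever follows the name (PUBLIC/SYSTEM identifiers), if anything.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s+html\b([^>]*)>", re.IGNORECASE)


def guess_flavour(document: str) -> str:
    """Guess the HTML flavour from the doctype alone (hypothetical helper)."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return "unknown"          # no doctype: nothing to go on
    rest = match.group(1).strip().upper()
    if not rest:
        return "html5"            # bare <!DOCTYPE html>, not an SGML application
    if "XHTML" in rest:
        return "xhtml"            # XML-based serialization
    if "PUBLIC" in rest:
        return "sgml-html"        # e.g. the HTML 4.01 or 3.2 SGML DTDs
    return "unknown"


print(guess_flavour("<!DOCTYPE html><html></html>"))  # html5
```

A result of `sgml-html` only says which DTD to validate against; an SGML validator still has to confirm that the document conforms.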

-
Bill the Lizard is right, but hope you don't use HTML3.2 (or less) any more :) – Dima Sep 05 '12 at 22:50
-
@Dima I think I still have an HTML 3 book around here somewhere. :) – Bill the Lizard Sep 05 '12 at 22:52
-
=))) Hope it's only for the memories, not for daily use ;) – Dima Sep 05 '12 at 22:55
Changing the extension is enough, but in fact you do not have to do anything to get SGML from HTML: HTML is fully based on SGML, so when you have HTML you already have SGML.
SGML is the mother of markup. XML is also based on SGML, so when you have some XML, you automatically have SGML. XHTML is based on XML, so when you have XHTML, you have XML and SGML.

-
@DevNull, you are not allowed to write `<br>` in XML (in HTML of course, but still this rule was broken because many developers were too lazy to write correctly). You must close your tag in XML. Either `<br></br>` or `<br/>`. Read this official document: http://www.w3.org/TR/NOTE-sgml-xml-971215. – Dima Sep 06 '12 at 11:39
-
Exactly my point. They are similar, but still different. (Not sure where my other comment went.) – Daniel Haley Sep 06 '12 at 16:28
-
As I remember you have stated that in SGML you write `<br>`, and in XML: `<br/>`. And that was your point of difference. However you are not allowed to write `<br>` in XML. You either have to close `<br>` with `</br>` or write a single tag `<br/>`. – Dima Sep 06 '12 at 16:37
-
I wrote that SGML was `<br>` and XML was `<br/>` and also a processing instruction in SGML is `<?...>` and a pi in XML is `<?...?>` just showing examples of how they differ. If you have XHTML that has any of these differences, you can't just change the extension and have SGML. We will have to agree to disagree I suppose. – Daniel Haley Sep 06 '12 at 18:45
-
Actually, a PI is `<?...>` only in the Reference Concrete Syntax. The closing delimiter PIC can be redefined in the SGML declaration from the RCS default of '>' to anything else you like, such as '?>'. XML is an SGML profile with a variant syntax, but since the reference to SGML is non-normative, the [SGML declaration for XML](http://www.w3.org/TR/NOTE-sgml-xml-971215) is only advisory. – arayq2 Dec 31 '12 at 22:10
An HTML document that validates is an SGML document. Whether this has any practical impact is a different issue, but such a document can be processed using general SGML tools (which still exist).
Validity is not required, however, for being SGML. An SGML document need not have a document type declaration at all. But if it does and if it validates, then this proves that it is indeed SGML (and not just SGML-like), since SGML validators check the basic syntax too, in addition to checking conformance to the DTD.
There is no well-formedness concept in SGML or in SGML-based HTML, but the XML well-formedness concept just means that the document is XML (and not just XML-like) in the first place, i.e. uses the general syntax of XML correctly.
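To make the XML side of that distinction concrete, here is a minimal sketch that uses Python's standard `xml.etree.ElementTree` parser as a well-formedness check. The helper name `is_well_formed_xml` is made up for illustration, and the check says nothing about SGML validity, which needs an SGML parser and a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text: str) -> bool:
    """Return True if the text parses as XML, i.e. is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# An unclosed empty element, as SGML-based HTML allows, is not well-formed XML.
print(is_well_formed_xml("<p>one<br>two</p>"))       # False
# Both XML spellings of an empty element are accepted.
print(is_well_formed_xml("<p>one<br/>two</p>"))      # True
print(is_well_formed_xml("<p>one<br></br>two</p>"))  # True
```

The first input is the kind of markup an HTML 4 DTD permits, which is why a file can be valid SGML-based HTML without being XML at all.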

As long as your HTML validates to one of the SGML HTML DTDs, you already have SGML.
Contrary to other answers, XML/XHTML is not valid SGML.
Also with SGML there isn't really such a thing as "well-formed" SGML, only valid (to a DTD) SGML.

-
sorry, but you are wrong. Read this official document http://www.w3.org/TR/NOTE-sgml-xml-971215. – Dima Sep 06 '12 at 11:42
-
@Dima - What am I wrong about? Is there something specific in that document you can point out? – Daniel Haley Sep 06 '12 at 15:08
-
Simply saying: XML IS valid SGML, however if you have valid SGML it does not mean you have valid XML. – Dima Sep 06 '12 at 15:25
-
@Dima - XML is based on SGML, but how can you say XML is valid SGML? There are syntactical differences that prevent this. They are similar, but completely different. Also, valid SGML must parse to an SGML DTD and there are major differences between an XML DTD and an SGML DTD. – Daniel Haley Sep 06 '12 at 16:00
-
SGML stands for Standard Generalized Markup Language... And with XML it's inheritance, don't you see it? It's like with genders: if you are a boy, you are definitely a human, but if you are a human, it does not mean you are a boy, you may be a girl (but still a human). – Dima Sep 06 '12 at 16:38