Sunday, June 1, 2008

What is SGML?

SGML (Standard Generalized Markup Language) is a language for defining markup languages such as HTML and for specifying the rules for tagging elements in a document. SGML itself is not a markup language; rather, it is a metalanguage for creating markup languages. It supports the definition of markup languages that are hardware- and software-independent. SGML was standardized by the International Organization for Standardization (ISO), which published it in 1986 as ISO 8879. Because of SGML's complexity, simpler alternatives came into use on the Internet: HTML was originally defined as an application of SGML, and XML was developed as a simplified subset of it.

Some SGML history

In 1969, Charles Goldfarb led an IBM research project on integrated law office information systems. With E. Mosher and R. Lorie, he invented the Generalized Markup Language (GML) as a means of allowing text editing, formatting, and information retrieval subsystems to share documents.

The first working draft of the SGML standard was published in 1980 by ANSI. By 1983, the sixth working draft was recommended as an industry standard (GCA 101-1983). Major adopters included the US Internal Revenue Service (IRS) and the Department of Defense (DoD).

A draft ISO standard was published in October 1985 and was adopted by the Office for Official Publications of the European Communities. Another year of review and comment resulted in the final text, which was published in record time after approval (ISO 8879:1986).

Characteristics of SGML

  • Descriptive Markup

    Markup codes categorize the parts of a document; they do not say what processing is to be carried out at particular points in it, which is what procedural markup does.

    E.g.:

    • Descriptive: "the following item is a paragraph"
    • Procedural: "skip down one line, move 5 quads right"

    In SGML, the instructions needed to process a document for a particular purpose (for example, to format it) are sharply distinguished from the descriptive markup that occurs within the document. Usually, they are collected outside the document in separate procedures or programs.

  • Document Types

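    Documents are regarded as having types, and each type is expressed by a document type definition (DTD), which specifies the elements a document of that type may contain and how they may be arranged. A parser can then check that a document's markup conforms to its DTD.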

  • Data Independence

    SGML-encoded documents should be transportable from one hardware and software environment to another without loss of information, even though platforms differ in character sets, file-naming conventions, and the interpretation of bytes.

    SGML provides a general-purpose mechanism for string substitution, known as entities: a simple, machine-independent way of stating that a particular string of characters in the document should be replaced by some other string when the document is processed.
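
The separation described under Descriptive Markup can be illustrated with a small Python sketch. It is not SGML itself: the document is modeled as a list of (kind, text) pairs, and the two rule tables (PLAIN_TEXT_RULES and OUTLINE_RULES) are hypothetical names invented for this illustration. The point is only that the document says what each part is, while the processing for each purpose lives outside it.

```python
# Descriptive markup: the document only records WHAT each part is.
document = [
    ("title", "What is SGML?"),
    ("para", "SGML is a language for defining markup languages."),
    ("para", "HTML is one such language."),
]

# Processing rules are kept outside the document, one table per purpose.
# Both tables are hypothetical and exist only for this illustration.
PLAIN_TEXT_RULES = {
    "title": lambda text: text.upper(),
    "para": lambda text: "    " + text,
}
OUTLINE_RULES = {
    "title": lambda text: "* " + text,
    "para": lambda text: "  - " + text,
}


def process(doc, rules):
    """Apply one external rule table to a descriptively marked-up document."""
    return [rules[kind](text) for kind, text in doc]


if __name__ == "__main__":
    for line in process(document, PLAIN_TEXT_RULES):
        print(line)
    for line in process(document, OUTLINE_RULES):
        print(line)
```

The same document is processed twice without being changed; only the rule table differs.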
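
The idea behind Document Types can be sketched in Python as well. This is a heavily simplified stand-in for a DTD, not a real SGML validator: the MEMO_TYPE table and the violations function are hypothetical, the sample documents use XML syntax so the standard library can parse them, and only "which child elements are allowed" is checked. A real DTD also constrains order, repetition, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified "document type": for each element,
# the set of child elements it may contain.
MEMO_TYPE = {
    "memo": {"to", "from", "body"},
    "to": set(),
    "from": set(),
    "body": {"para"},
    "para": set(),
}


def violations(element, doc_type):
    """Return messages for elements that the document type does not allow."""
    if element.tag not in doc_type:
        return [f"undeclared element <{element.tag}>"]
    problems = []
    allowed = doc_type[element.tag]
    for child in element:
        if child.tag not in allowed:
            problems.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            problems.extend(violations(child, doc_type))
    return problems


good = ET.fromstring(
    "<memo><to>Ann</to><from>Bob</from><body><para>Hi.</para></body></memo>"
)
bad = ET.fromstring("<memo><to>Ann</to><para>Hi.</para></memo>")

if __name__ == "__main__":
    print(violations(good, MEMO_TYPE))
    print(violations(bad, MEMO_TYPE))
```

A document that follows the type yields an empty list, while a paragraph placed directly inside the memo is reported as a violation.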
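
The string substitution mechanism described under Data Independence can be sketched as follows. In SGML a reference is written as &name; in the document text. The ENTITIES table, the entity names in it, and the expand function are assumptions made for this example; a real SGML parser reads entity declarations from the DTD and handles several kinds of entities that this sketch ignores.

```python
import re

# Hypothetical entity declarations: entity name -> replacement string.
ENTITIES = {
    "sgml": "Standard Generalized Markup Language",
    "eacute": "\u00e9",
}

# Matches references of the form &name;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand(text, entities):
    """Replace each &name; reference with its declared string.

    References to undeclared names are left unchanged.
    """
    def lookup(match):
        return entities.get(match.group(1), match.group(0))

    return ENTITY_REF.sub(lookup, text)


if __name__ == "__main__":
    print(expand("&sgml; was standardized in 1986.", ENTITIES))
```

Because the document stores only the reference, each platform can supply the replacement string that suits its own character set.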
