
The DTD

A Document Type Definition (DTD) defines the element types permitted in a document, together with their content models and attributes. A DTD is thus a schema for the XML document. It is optional: a schema can also be written in another schema language, for example XML Schema, and if no validation is required, a document may have no schema at all.

If a DTD exists, it is introduced by <!DOCTYPE, followed by the name of the document type (which must match the name of the root element) and, enclosed in square brackets, the element type and attribute list declarations.

<!DOCTYPE Hospital [ ... ]>

For each element type, introduced by <!ELEMENT, the name and the content model are defined. Apart from empty content, the content model can consist of child elements, of text, or of a mixture of both. In these cases, the content model is enclosed in parentheses. Pure text content is specified by (#PCDATA). It is not possible, however, to constrain the text further (for example to numeric data):

<!ELEMENT Name (#PCDATA)>
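
The limitation just mentioned can be illustrated with a small sketch. The element types Patient, Age and Weight are made up for this illustration and are not part of the hospital example. Since both children are declared as (#PCDATA), a validating parser should accept the document even though Weight does not contain a number:

```xml
<?xml version="1.0"?>
<!DOCTYPE Patient [
  <!ELEMENT Patient (Age, Weight)>
  <!-- Text only: the DTD cannot require that the text is numeric -->
  <!ELEMENT Age (#PCDATA)>
  <!ELEMENT Weight (#PCDATA)>
]>
<Patient>
  <Age>42</Age>
  <Weight>seventy kilos</Weight>
</Patient>
```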

An empty element is declared with the content model EMPTY. Elements may in turn contain further elements (element content). In this case, the content model specifies which child elements are permitted and how they may be combined. If several child elements must appear in a specific order, they are separated by commas (a "sequence"). A choice between several possible elements is specified with a vertical bar (pipe).

<!ELEMENT Bed EMPTY>
<!ELEMENT Hospital (Name, Wards)>
<!ELEMENT Address (City, (Street | PostOfficeBox))>

The first example declares an empty element type. The second declares a Hospital element type whose elements must contain a Name element followed by a Wards element. The third declares an Address element type whose elements must contain a City child element first, followed by either a Street element or a PostOfficeBox element.
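
As a sketch of how the Address declaration behaves, the following self-contained document should validate. The declarations for City, Street and PostOfficeBox as text elements, and the values used, are assumptions made for this illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Address [
  <!ELEMENT Address (City, (Street | PostOfficeBox))>
  <!ELEMENT City (#PCDATA)>
  <!ELEMENT Street (#PCDATA)>
  <!ELEMENT PostOfficeBox (#PCDATA)>
]>
<!-- Valid: City first, then exactly one of Street or PostOfficeBox. -->
<!-- Invalid would be: Street before City, or both Street and PostOfficeBox. -->
<Address>
  <City>Karlsruhe</City>
  <PostOfficeBox>1234</PostOfficeBox>
</Address>
```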

If nothing is specified explicitly, an element must appear exactly once at that position. A "?" indicates that the element appears at most once (it is optional), a "+" that it appears at least once, and a "*" that it may appear any number of times (or not at all). In the following example, at least one first name must appear, whereas any number of phone numbers and fax numbers is allowed in any order:

<!ELEMENT Person (Name, FirstName+, (Phone | Fax)*)>
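
A possible instance of this declaration is sketched below. The text-only declarations of the child elements and all values are assumed for the illustration. Note the two FirstName elements and the Fax element placed between Phone elements:

```xml
<?xml version="1.0"?>
<!DOCTYPE Person [
  <!ELEMENT Person (Name, FirstName+, (Phone | Fax)*)>
  <!ELEMENT Name (#PCDATA)>
  <!ELEMENT FirstName (#PCDATA)>
  <!ELEMENT Phone (#PCDATA)>
  <!ELEMENT Fax (#PCDATA)>
]>
<Person>
  <Name>Example</Name>
  <FirstName>Anna</FirstName>
  <FirstName>Maria</FirstName>
  <Phone>555-0100</Phone>
  <Fax>555-0101</Fax>
  <Phone>555-0102</Phone>
</Person>
```

Omitting all Phone and Fax elements would still be valid, whereas omitting FirstName would not.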

For mixed content, #PCDATA comes first and all permitted element types are joined to it with vertical bars. The cardinality of the individual elements cannot be restricted; the content model as a whole must be marked with "*":

<!ELEMENT Finding (#PCDATA | b)*>
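
A minimal sketch of a document using this mixed content model follows. The declaration of b as a text element and the sample text are assumptions for the illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Finding [
  <!-- #PCDATA is listed first; the trailing * is mandatory -->
  <!ELEMENT Finding (#PCDATA | b)*>
  <!ELEMENT b (#PCDATA)>
]>
<Finding>Patient reports <b>severe</b> pain but <b>no</b> fever.</Finding>
```

Text and b elements may alternate freely here; the DTD cannot prescribe how many b elements appear or where.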

In some cases, such as for XHTML, elements of almost all defined element types may appear as content of an element. As a compact notation for such cases, the keyword ANY is available. It allows character data and all element types declared in the DTD, in any order.
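
The following sketch shows a declaration with ANY. The element types Note, b and i are hypothetical and chosen only for this illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Note [
  <!-- ANY: text and every element type declared in this DTD -->
  <!ELEMENT Note ANY>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<Note>Text, <b>bold</b> and <i>italic</i> parts, even <Note>a nested Note</Note>.</Note>
```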

Finally, the attributes permitted for an element can also be described in the DTD. Whereas element type names must be unique across the entire DTD, attribute names need only be unique within their element type. Attributes are therefore declared together with the element type they belong to, in an attribute list (ATTLIST). An attribute list may define several attributes for its element type, and there may also be several attribute lists for one element type. The general form is:

AttlistDecl   ::= <!ATTLIST Name AttDef*>

AttDef        ::= Name AttType DefaultDecl

DefaultDecl   ::= #REQUIRED | #IMPLIED | #FIXED AttValue | AttValue

An example of such an attribute declaration is:

<!ATTLIST Ward Manager CDATA #REQUIRED>
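
Both forms of declaration can be sketched in one small document: one attribute list with two attribute definitions, and a second attribute list for the same element type. The attributes Beds and Floor and the empty Ward element type are assumptions made for this illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Ward [
  <!ELEMENT Ward EMPTY>
  <!-- First attribute list: two attribute definitions at once -->
  <!ATTLIST Ward
    Manager CDATA #REQUIRED
    Beds    CDATA #IMPLIED>
  <!-- Second attribute list for the same element type -->
  <!ATTLIST Ward Floor CDATA #IMPLIED>
]>
<Ward Manager="Nurse_01" Beds="12" Floor="2"/>
```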

The following attribute types, among others, can be used in a DTD (the entity types ENTITY and ENTITIES and the NOTATION type are not covered here):

  • String
    The string type is called CDATA.
  • Identifier type ID
    The values of all ID attributes must be unique in the document (even if they belong to different attributes) and must be valid XML names.
  • Reference type for attributes of the ID type
    Depending on whether it is a single reference or a list of such references, the type is called IDREF or IDREFS.
  • Single token (NMTOKEN) or list of tokens (NMTOKENS)
    A single token ("name token") consists of a sequence of letters, digits and certain special characters (such as ".", "-", "_" and ":"), but no whitespace.
  • Enumeration types
    The individual values are separated by a vertical bar.
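
The attribute types listed above can be sketched together in one document. Declaring Manager as IDREF (rather than CDATA as in the earlier example), as well as the Nurse element type and the attributes Id, Shift, Staff and Code, are assumptions made for this illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Wards [
  <!ELEMENT Wards (Nurse+, Ward+)>
  <!ELEMENT Nurse EMPTY>
  <!ELEMENT Ward EMPTY>
  <!ATTLIST Nurse
    Id    ID                     #REQUIRED
    Shift (early | late | night) #REQUIRED>
  <!ATTLIST Ward
    Name    CDATA   #REQUIRED
    Manager IDREF   #REQUIRED
    Staff   IDREFS  #IMPLIED
    Code    NMTOKEN #IMPLIED>
]>
<Wards>
  <Nurse Id="Nurse_01" Shift="early"/>
  <Nurse Id="Nurse_02" Shift="night"/>
  <!-- Manager and Staff must refer to ID values that exist in the document -->
  <Ward Name="Emergency room" Manager="Nurse_01"
        Staff="Nurse_01 Nurse_02" Code="3-B"/>
</Wards>
```

A validating parser should reject this document if, for example, Manager referred to "Nurse_03" or two Nurse elements shared the same Id.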

Moreover, the attribute definition also states whether the attribute is required and whether it has a default value.

  • #REQUIRED
    The attribute must be specified explicitly in every element of the type.
  • #IMPLIED
    The attribute may be omitted; in that case it has no value, since there is no default.
  • Default value
    This value applies if the attribute is not specified explicitly in an element.
  • #FIXED followed by a value
    The attribute always has the given value. It is an error if the attribute appears in the document with a different value.
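
The four kinds of default declaration can be sketched side by side. The attributes Phone, Status and Unit and their values are hypothetical and serve only as an illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Wards [
  <!ELEMENT Wards (Ward+)>
  <!ELEMENT Ward EMPTY>
  <!ATTLIST Ward
    Manager CDATA           #REQUIRED
    Phone   CDATA           #IMPLIED
    Status  (open | closed) "open"
    Unit    CDATA           #FIXED "beds">
]>
<Wards>
  <!-- Status defaults to "open", Unit is "beds", Phone has no value -->
  <Ward Manager="Nurse_01"/>
  <!-- Explicit values; Unit may only be repeated with the fixed value -->
  <Ward Manager="Nurse_02" Phone="555-0100" Status="closed" Unit="beds"/>
</Wards>
```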

During processing, attribute values are normalised by the XML processor. Besides resolving character and entity references (described in detail later), this includes the transformation of whitespace: all whitespace characters (U+0009, U+000A, U+000D) are first replaced by space characters (U+0020). For attribute types other than CDATA, leading and trailing spaces are then removed and sequences of spaces are collapsed into a single space.
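
The effect of this normalisation can be sketched as follows. According to the XML specification, attribute types other than CDATA additionally have leading and trailing spaces removed and runs of spaces collapsed, which the hypothetical Codes attribute illustrates; the Name attribute and all values are likewise assumed for this example:

```xml
<?xml version="1.0"?>
<!DOCTYPE Ward [
  <!ELEMENT Ward EMPTY>
  <!ATTLIST Ward
    Name  CDATA    #REQUIRED
    Codes NMTOKENS #IMPLIED>
]>
<!-- Name contains a literal line break; Codes has surplus spaces -->
<Ward Name="Emergency
room" Codes="  3-B   4-A "/>
<!-- After normalisation: Name is "Emergency room", Codes is "3-B 4-A" -->
```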

The order of the definitions in the DTD does not matter; they can be rearranged without modifying the semantics. All definitions in the DTD are global. This means that the content model of any element type may refer to all other defined element types (and to itself). An element type can therefore have several "parent types". In this way, direct or indirect recursion also becomes possible at type level, as demonstrated by the following definition of a tree:

<!ELEMENT Nodes (Nodes*)>
<!ATTLIST Nodes Name CDATA #REQUIRED>
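
A document conforming to this recursive definition could look like the following sketch. The node names are invented for the illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Nodes [
  <!ELEMENT Nodes (Nodes*)>
  <!ATTLIST Nodes Name CDATA #REQUIRED>
]>
<Nodes Name="root">
  <Nodes Name="child-1">
    <Nodes Name="grandchild-1"/>
  </Nodes>
  <Nodes Name="child-2"/>
</Nodes>
```

The nesting depth is not limited by the DTD; each leaf is simply a Nodes element with no children.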

Conversely, the global nature of the definitions also means that an element type used in different contexts cannot have a context-specific content model. This is only possible if the element types are named differently. As you can see, the typing of attributes, and even more so of element content, is rather coarse. XML Schema offers more sophisticated possibilities for type definition.
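
The renaming workaround can be sketched as follows. Suppose a hospital name should be plain text while a person's name should consist of a first and a last name. With a DTD this requires two distinct element types; HospitalName, PersonName, Director, FirstName and LastName are hypothetical names chosen for this illustration:

```xml
<?xml version="1.0"?>
<!DOCTYPE Hospital [
  <!ELEMENT Hospital (HospitalName, Director)>
  <!ELEMENT Director (PersonName)>
  <!-- Two distinct element types, because a single Name type
       could not have two different content models -->
  <!ELEMENT HospitalName (#PCDATA)>
  <!ELEMENT PersonName (FirstName, LastName)>
  <!ELEMENT FirstName (#PCDATA)>
  <!ELEMENT LastName (#PCDATA)>
]>
<Hospital>
  <HospitalName>Hochwaldklinik</HospitalName>
  <Director>
    <PersonName>
      <FirstName>Anna</FirstName>
      <LastName>Example</LastName>
    </PersonName>
  </Director>
</Hospital>
```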

A DTD may also be external to the document. In this case, the document contains only a reference to it, for example:

<!DOCTYPE Hospital SYSTEM "http://www.xquery-book.de/hosptal.dtd">

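Furthermore, an internal and an external DTD can be combined. In this case, the internal declarations are read first and take precedence: for attributes (and entities), the first declaration found is the one that counts, so an internal declaration overrides an external one for the same name. Element types, by contrast, may be declared only once. As a consequence, documents referring to the same external DTD can nevertheless differ in their type definitions. Some of the concepts introduced above can be seen in the following extended example document: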

<?xml version="1.0"?>
<!DOCTYPE Hospital [
<!ELEMENT Hospital (Name, Wards)>
<!ELEMENT Name (#PCDATA)>
<!ELEMENT Wards (Ward*)>
<!ELEMENT Ward (Name, Location)>
<!ELEMENT Location (#PCDATA)>
<!ATTLIST Ward Manager CDATA #REQUIRED>
]>
<!-- document created on 1.1.2004 -->
<?xml-stylesheet type="text/xsl" href="stylesheets/print.xsl" ?>
<Hospital>
<Name>Hochwaldklinik</Name>
<Wards>
<Ward Manager="Nurse_01">
<Name>Emergency room</Name>
<Location>Suburb</Location>
</Ward>
</Wards>
</Hospital>
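
The combination of an internal and an external DTD described above can be sketched as follows. The external file name hospital.dtd and its content (shown in the leading comment) are assumptions made for this illustration, as is the default value "unassigned":

```xml
<?xml version="1.0"?>
<!-- Assumed content of the hypothetical external file hospital.dtd:
       <!ELEMENT Hospital (Ward*)>
       <!ELEMENT Ward EMPTY>
       <!ATTLIST Ward Manager CDATA #REQUIRED>
-->
<!DOCTYPE Hospital SYSTEM "hospital.dtd" [
  <!-- The internal subset is read first, so this definition of Manager wins -->
  <!ATTLIST Ward Manager CDATA "unassigned">
]>
<Hospital>
  <!-- Valid here: Manager is no longer required and defaults to "unassigned" -->
  <Ward/>
  <Ward Manager="Nurse_01"/>
</Hospital>
```

Another document referring to the same hospital.dtd without this internal declaration would still have to specify Manager on every Ward element.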

Source: "XQuery – Grundlagen und fortgeschrittene Methoden", dpunkt-Verlag, Heidelberg (2004)