The XML C parser and toolkit of Gnome

Validation & DTDs

Table of Contents:

  1. General overview
  2. The definition
  3. Simple rules
    1. How to reference a DTD from a document
    2. Declaring elements
    3. Declaring attributes
  4. Some examples
  5. How to validate
  6. Other resources

General overview

So what is validation, and what is a DTD?

DTD is the acronym for Document Type Definition. It is a description of the content for a family of XML files. It is part of the XML 1.0 specification, and allows one to describe and verify that a given document instance conforms to the set of rules detailing its structure and content.

Validation is the process of checking a document against a DTD (more generally against a set of construction rules).

The validation process and building DTDs are the two most difficult parts of the XML life cycle. Briefly, a DTD defines all the possible elements to be found within your document and the formal shape of your document tree. It does so by defining the allowed content of each element: either text, a regular expression for the allowed list of children, or mixed content (both text and children). The DTD also defines the valid attributes for all elements and the types of those attributes.

The definition

The W3C XML Recommendation (Tim Bray's annotated version of Rev1):

(unfortunately) all this is inherited from the SGML world, the syntax is ancient...

Simple rules

Writing DTDs can be done in many ways. The rules to build them can be radically different depending on whether you need something permanent or something which can evolve over time. Really complex DTDs like the DocBook ones are flexible but much harder to design. I will just focus on DTDs for formats with a fixed, simple structure. What follows is just a set of basic rules, and definitely neither exhaustive nor usable for complex DTD design.

How to reference a DTD from a document:

Assuming the top element of the document is spec and the DTD is placed in the file mydtd in the subdirectory dtds of the directory from where the document was loaded:

<!DOCTYPE spec SYSTEM "dtds/mydtd">

Notes:

  • The system string is actually a URI-Reference (as defined in RFC 2396), so you can use a full URL string indicating the location of your DTD on the Web. This is a really good thing to do if you want others to validate your document.
  • It is also possible to associate a PUBLIC identifier (a magic string) so that the DTD is looked up in catalogs on the client side without having to locate it on the web.
  • A DTD contains a set of element and attribute declarations, but they don't define what the root of the document should be. This is explicitly told to the parser/validator as the first element of the DOCTYPE declaration.
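To see the pieces of a DOCTYPE declaration as a parser reports them, here is a minimal sketch. It uses Python's standard xml.parsers.expat module instead of libxml2, purely to stay self-contained; that choice is an assumption of this example, not something this page prescribes. Expat does not validate and does not fetch the external DTD here; it only reports the root element name, the system identifier and the public identifier. The second document and its public identifier "-//Example//DTD Spec//EN" are hypothetical.

```python
import xml.parsers.expat


def read_doctype(document):
    """Return what the DOCTYPE declaration of `document` announces."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # `name` is the first item of the DOCTYPE: the expected root element.
        found["root"] = name
        found["system"] = system_id
        found["public"] = public_id
        found["internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found


# The declaration used in the text above.
SYSTEM_ONLY = '<!DOCTYPE spec SYSTEM "dtds/mydtd"><spec/>'
# Hypothetical variant that adds a PUBLIC identifier before the system string.
WITH_PUBLIC = '<!DOCTYPE spec PUBLIC "-//Example//DTD Spec//EN" "dtds/mydtd"><spec/>'

print(read_doctype(SYSTEM_ONLY))
print(read_doctype(WITH_PUBLIC))
```

In both cases the root name comes from the DOCTYPE itself, not from anything inside the DTD file.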

Declaring elements:

The following declares an element spec:

<!ELEMENT spec (front, body, back?)>

It also expresses that the spec element contains one front, one body and one optional back child element, in this order. The declaration of an element of the structure and of its content are done in a single declaration. Similarly, the following declares div1 elements:

<!ELEMENT div1 (head, (p | list | note)*, div2?)>

which means div1 contains one head, then a series of optional p, list and note elements, and then an optional div2. And last but not least, an element can contain text:

<!ELEMENT b (#PCDATA)>

b contains text. An element can also be of mixed content (text and elements in no particular order):

<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

p can contain text or a, ul, b, i or em elements in no particular order.
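Since an element-only content model is essentially a regular expression over child element names, the idea can be sketched by translating the model into a Python regular expression. This is only an illustration of the concept under simplifying assumptions: it handles element-only models (no #PCDATA), assumes plain ASCII names, and says nothing about how libxml2 implements validation internally. The function names are made up for this example.

```python
import re


def content_model_to_regex(model):
    """Translate an element-only content model into a compiled regular expression."""
    # Wrap every element name as a group ending in ';' so that names cannot
    # run together ("p" must not match the start of "para").
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:%s;)" % re.escape(m.group(0)),
        model,
    )
    # A comma means "followed by", which is plain concatenation in a regular
    # expression; '|', '?', '*', '+' and parentheses keep their meaning.
    pattern = re.sub(r"[,\s]", "", pattern)
    return re.compile(pattern)


def children_allowed(model, children):
    """Check a list of child element names against a content model."""
    sequence = "".join(name + ";" for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


SPEC = "(front, body, back?)"
DIV1 = "(head, (p | list | note)*, div2?)"

print(children_allowed(SPEC, ["front", "body"]))                     # True
print(children_allowed(SPEC, ["front", "body", "back"]))             # True
print(children_allowed(SPEC, ["body", "front"]))                     # False
print(children_allowed(DIV1, ["head", "p", "list", "p", "div2"]))    # True
print(children_allowed(DIV1, ["head", "div2", "p"]))                 # False
```

The third and fifth calls fail because the order of the children matters in a sequence: back and div2 may be omitted, but they may not move.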

Declaring attributes:

Again, the declaration of attributes includes their content definition:

<!ATTLIST termdef name CDATA #IMPLIED>

means that the element termdef can have a name attribute containing text (CDATA) and which is optional (#IMPLIED). The attribute value can also be defined within a set:

<!ATTLIST list type (bullets|ordered|glossary) "ordered">

means that list elements have a type attribute with 3 allowed values, "bullets", "ordered" or "glossary", and which defaults to "ordered" if the attribute is not explicitly specified.

The content type of an attribute can be text (CDATA), anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES) or name(s) (NMTOKEN/NMTOKENS). The following defines that a chapter element can have an optional id attribute of type ID, usable for reference from attributes of type IDREF:

<!ATTLIST chapter id ID #IMPLIED>

The last value of an attribute definition can be #REQUIRED, meaning that the attribute has to be given, #IMPLIED, meaning that it is optional, or the default value (possibly prefixed by #FIXED if it is the only allowed value).

Notes:

  • Usually the attributes pertaining to a given element are declared in a single expression, but it is just a convention adopted by a lot of DTD writers:
    <!ATTLIST termdef
              id      ID      #REQUIRED
              name    CDATA   #IMPLIED>

    The previous construct defines both id and name attributes for the element termdef.
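The attribute declarations above can be inspected with a short sketch. Again this uses Python's standard xml.parsers.expat module rather than libxml2, an assumption made to keep the example self-contained. The sample document, which places the declarations in an internal subset, and the function name are invented for illustration. Expat is not a validating parser, so it would not complain about a missing #REQUIRED attribute; it does, however, report each declaration and supply declared default values.

```python
import xml.parsers.expat

# Hypothetical document carrying the ATTLIST declarations from the text
# in its internal subset.
DOCUMENT = """<!DOCTYPE spec [
<!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED>
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
]>
<spec><termdef id="t1"/><list/></spec>"""


def collect_attribute_info(document):
    """Collect attribute declarations and the attributes seen on each element."""
    declarations = []
    seen = {}

    def on_attlist(element, attribute, attr_type, default, required):
        # One call per declared attribute, even when several share an ATTLIST.
        declarations.append((element, attribute, attr_type, default, bool(required)))

    def on_start(name, attributes):
        # `attributes` includes defaults taken from the declarations.
        seen[name] = dict(attributes)

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return declarations, seen


declarations, seen = collect_attribute_info(DOCUMENT)
for declaration in declarations:
    print(declaration)
print(seen["list"])
```

The list element is written without a type attribute, yet the parser reports type="ordered" for it, which is the defaulting behaviour described above.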

Some examples

The directory test/valid/dtds/ in the libxml2 distribution contains some complex DTD examples. The example in the file test/valid/dia.xml shows an XML file where the simple DTD is directly included within the document.

How to validate

The simplest way is to use the xmllint program included with libxml. The --valid option turns on validation of the files given as input. For example, the following validates a copy of the first revision of the XML 1.0 specification:

xmllint --valid --noout test/valid/REC-xml-19980210.xml

The --noout option is used to disable output of the resulting tree.

The --dtdvalid dtd option allows validation of the document(s) against a given DTD.

Libxml2 exports an API to handle DTDs and validation; check the associated description.
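When validation has to run from a script, the xmllint invocations above can be wrapped as in the following sketch. It uses only the options mentioned on this page (--valid, --dtdvalid, --noout). The helper names and the file doc.xml are hypothetical, and the sketch assumes that xmllint is on the PATH and that a zero exit status means no error was reported.

```python
import shutil
import subprocess


def xmllint_command(document, dtd=None):
    """Build the xmllint command line for validating `document`."""
    if dtd is None:
        # Validate against the DTD named in the document's own DOCTYPE.
        return ["xmllint", "--valid", "--noout", document]
    # Validate against an explicitly given DTD instead.
    return ["xmllint", "--dtdvalid", dtd, "--noout", document]


def is_valid(document, dtd=None):
    """Run xmllint and report whether it exited without error."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint not found on PATH")
    result = subprocess.run(
        xmllint_command(document, dtd), capture_output=True, text=True
    )
    # Assumption: a non-zero exit status signals a parsing or validation error.
    return result.returncode == 0


print(xmllint_command("test/valid/REC-xml-19980210.xml"))
print(xmllint_command("doc.xml", dtd="dtds/mydtd"))
```

Calling is_valid("test/valid/REC-xml-19980210.xml") from the top of a libxml2 source tree would then correspond to the command line shown earlier.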

Other resources

DTDs are as old as SGML, so there may be a number of examples online. I will just list one for now; other pointers are welcome:

I suggest looking at the examples found under test/valid/dtd and any of the large number of books available on XML. The dia example in test/valid should be both simple and complete enough to allow you to build your own.

Daniel Veillard