Table of Contents:
- General overview
- The definition
- Simple rules
- How to reference a DTD from a document
- Declaring elements
- Declaring attributes
- Some examples
- How to validate
- Other resources

Well what is validation and what is a DTD?

DTD is the acronym for Document Type Definition. This is a description of the
content for a family of XML files. This is part of the XML 1.0 specification,
and allows one to describe and verify that a given document instance conforms
to the set of rules detailing its structure and content.

Validation is the process of checking a document against a DTD (more
generally against a set of construction rules).

The validation process and building DTDs are the two most difficult parts of
the XML life cycle. Briefly a DTD defines all the possible elements to be
found within your document, what is the formal shape of your document tree
(by defining the allowed content of an element; either text, a regular
expression for the allowed list of children, or mixed content i.e. both text
and children). The DTD also defines the valid attributes for all elements and
the types of those attributes.

The W3C XML Recommendation (Tim Bray's annotated
version of Rev1): (unfortunately) all this is inherited from the SGML world,
and the syntax is ancient...

Writing DTDs can be done in many ways. The rules to build them can be
radically different depending on whether you need something permanent or
something which can evolve over time. Really complex DTDs like the DocBook
ones are flexible but much harder to design. I will just focus on DTDs for
formats with a fixed, simple structure. This is just a set of basic rules, and
definitely neither exhaustive nor usable for complex DTD design.

Assuming the top element of the document is spec and the
DTD is placed in the file mydtd in the subdirectory dtds of the directory
from where the document was loaded:

<!DOCTYPE spec SYSTEM "dtds/mydtd">

Notes:
- The system string is actually a URI-Reference (as defined in RFC 2396) so
you can use a full URL string indicating the location of your DTD on the
Web. This is a really good thing to do if you want others to validate your
document.
- It is also possible to associate a PUBLIC identifier (a magic string) so
that the DTD is looked up in catalogs on the client side without having to
locate it on the web.
- A DTD contains a set of element and attribute declarations, but they don't
define what the root of the document should be. This is explicitly told to
the parser/validator as the first element of the DOCTYPE declaration.
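
To make the last note concrete, here is a minimal sketch that reads the root element name and the system identifier back out of the DOCTYPE line shown earlier. It uses the expat parser from Python's standard library rather than libxml2, and expat does not validate; it only reports what the declaration says. The sample document and the helper name read_doctype are assumptions made for illustration.

```python
import xml.parsers.expat

# Hypothetical one-element document using the DOCTYPE line from the text.
DOCUMENT = '<!DOCTYPE spec SYSTEM "dtds/mydtd">\n<spec/>'


def read_doctype(xml_text):
    """Return (root name, system id, public id) from the DOCTYPE declaration."""
    found = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # The first item of the DOCTYPE declaration names the expected root.
        found["doctype"] = (name, system_id, public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    # The external DTD is not fetched here; only the declaration is reported.
    parser.Parse(xml_text, True)
    return found.get("doctype")


print(read_doctype(DOCUMENT))
```

The root name comes from the DOCTYPE declaration itself, not from anything inside the DTD file, which is the point the note makes.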

The following declares an element spec:

<!ELEMENT spec (front, body, back?)>

It also expresses that the spec element contains one front, one body and one
optional back child element, in this order. The declaration of an element's
structure and of its content is done in a single declaration. Similarly, the
following declares div1 elements:

<!ELEMENT div1 (head, (p | list | note)*, div2?)>

which means div1 contains one head, then a series of optional p, list and
note elements, and then an optional div2. And last but not least an element
can contain text:

<!ELEMENT b (#PCDATA)>
b contains text. An element can also be of mixed content (text and elements
in no particular order):

<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>

p can contain text or a, ul, b, i or em elements in no particular order.
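
Since a content model is essentially a regular expression over the list of children, the spec and div1 declarations above can be mimicked with ordinary regexes. The sketch below is only an analogy, not how libxml2 or any real validator works: the two patterns are hand-written translations, and the names CONTENT_MODELS and children_match are hypothetical.

```python
import re

# Hand-written translations of the two content models into Python regular
# expressions. Each child element name is followed by one space, so the
# string "front body " stands for the child sequence <front/>, <body/>.
CONTENT_MODELS = {
    # <!ELEMENT spec (front, body, back?)>
    "spec": re.compile(r"front body (back )?"),
    # <!ELEMENT div1 (head, (p | list | note)*, div2?)>
    "div1": re.compile(r"head ((p|list|note) )*(div2 )?"),
}


def children_match(element, children):
    """Check a list of child element names against the element's model."""
    sequence = "".join(name + " " for name in children)
    return CONTENT_MODELS[element].fullmatch(sequence) is not None


print(children_match("spec", ["front", "body"]))           # optional back omitted
print(children_match("spec", ["body", "front"]))           # children in wrong order
print(children_match("div1", ["head", "p", "note", "p"]))  # any mix of p/list/note
print(children_match("div1", ["head", "div2", "p"]))       # p placed after div2
```

The comma in a DTD model becomes plain concatenation, `|` stays an alternation, and `?` and `*` keep their usual regex meaning.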

Again, the attribute declaration includes the content definition:

<!ATTLIST termdef name CDATA #IMPLIED>

means that the element termdef can have a name attribute containing text
(CDATA) and which is optional (#IMPLIED). The attribute value can also be
defined within a set:

<!ATTLIST list type (bullets|ordered|glossary) "ordered">
means the list element has a type attribute with 3 allowed values "bullets",
"ordered" or "glossary" and which defaults to "ordered" if the attribute is
not explicitly specified.

The content type of an attribute can be text (CDATA),
anchor/reference/references (ID/IDREF/IDREFS), entity(ies) (ENTITY/ENTITIES)
or name(s) (NMTOKEN/NMTOKENS). The following defines that a chapter element
can have an optional id attribute of type ID, usable for reference from
attributes of type IDREF:

<!ATTLIST chapter id ID #IMPLIED>
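
As a rough illustration of what the three ATTLIST declarations above carry, the following sketch collects the type and default value of each declared attribute. It uses Python's standard expat parser, which reports declarations but does not validate, so this is not the libxml2 API. The wrapper document with an internal DTD subset and the helper name read_attlists are assumptions.

```python
import xml.parsers.expat

# Hypothetical document carrying the three ATTLIST declarations from the
# text in an internal DTD subset, so that no external file is needed.
DOCUMENT = """<!DOCTYPE spec [
<!ATTLIST termdef name CDATA #IMPLIED>
<!ATTLIST list type (bullets|ordered|glossary) "ordered">
<!ATTLIST chapter id ID #IMPLIED>
]>
<spec/>"""


def read_attlists(xml_text):
    """Return {(element, attribute): (type, default, required)}."""
    declared = {}

    def on_attlist(element, attribute, attr_type, default, required):
        # default is None when the declaration gives no default value.
        declared[(element, attribute)] = (attr_type, default, bool(required))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return declared


for key, value in read_attlists(DOCUMENT).items():
    print(key, value)
```

For the list element the reported default is the string given in the declaration, while the two #IMPLIED attributes come back with no default at all.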

The last value of an attribute definition can be #REQUIRED, meaning that the
attribute has to be given, #IMPLIED, meaning that it is optional, or the
default value (possibly prefixed by #FIXED if it is the only value allowed).

Notes: The directory test/valid/dtds/ in the libxml2 distribution contains
some complex DTD examples. The example in the file test/valid/dia.xml shows
an XML file where the simple DTD is directly included within the document.

The simplest way to validate is to use the xmllint program included with
libxml. The --valid option turns on validation of the files given as input.
For example, the following validates a copy of the first revision of the XML
1.0 specification:

xmllint --valid --noout test/valid/REC-xml-19980210.xml

The --noout option is used to disable output of the resulting tree.

The --dtdvalid dtd option allows validation of the document(s) against a
given DTD.

Libxml2 exports an API to handle DTDs and validation; check the associated
description.

DTDs are as old as SGML. So there may be a number of examples
on-line, I will just list one for now, other pointers welcome:

I suggest looking at the examples found under test/valid/dtd and any of the
large number of books available on XML. The dia example in test/valid should
be both simple and complete enough to allow you to build your own.

Daniel Veillard