Lesson 1.4: TEI Document Structure & Key Elements
Introduction
A TEI document is composed of hierarchical components that provide both metadata and textual content, using clearly defined TEI elements. These elements serve as the fundamental building blocks of the TEI framework, enabling the encoding of textual features such as paragraphs, names, dates, structural divisions, and more.
TEI elements are written as tags enclosed in angle brackets (< >) and define the structure and meaning of textual content. This lesson provides a general overview of some fundamental TEI elements that will be discussed in more detail in subsequent modules. Each of them is documented in the official TEI Guidelines. In addition to those references, the TEI by Example Project is an excellent resource for more detailed information beyond the scope of this lesson.
Core Components of a TEI Document
- The <TEI> Root Element: Encloses the entire document and declares it as TEI-encoded.
- The <teiHeader>: Contains bibliographic and administrative metadata, including author information, publication details, and encoding descriptions.
- The <text> Element: Encloses the actual content of the document.
Below is a basic example of a TEI document structure:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample TEI Document</title>
        <author>John Doe</author>
      </titleStmt>
      <publicationStmt>
        <publisher>Open Humanities Press</publisher>
        <date>2025</date>
      </publicationStmt>
      <sourceDesc>
        <p>Based on the original manuscript housed in the British Library.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div>
        <head>Introduction</head>
        <p>This is a paragraph inside a TEI-encoded text.</p>
      </div>
    </body>
  </text>
</TEI>
This structure ensures that the document is well-organized, making it easier for researchers and software to process and analyze the encoded content.
The Importance of Metadata
The TEI by Example Project explains that the TEI header functions like an “identity card” for a literary work, comparable to a library catalogue record. The screenshot below shows an example of such a record in the catalogue of the University of Hawaiʻi at Mānoa Library:
[Screenshot: catalogue record, University of Hawaiʻi at Mānoa Library]
Metadata of this kind serves several purposes:
- Discovery: Metadata enhances the searchability and discoverability of texts in digital archives and academic databases.
- Attribution: Ensures proper credit is given to authors, editors, and contributors.
- Contextualization: Provides important contextual information that helps readers understand the text’s origins, purpose, and historical background.
- Preservation: Metadata supports the long-term preservation of digital texts, ensuring that future scholars can interpret and access the work.
Breakdown of the TEI Header
The TEI header is a crucial part of any TEI document: it provides contextual metadata about the text, which supports both searches and scholarly interpretation. Below are the key sections within <teiHeader>:
- File Description: <fileDesc>
  Contains bibliographic information about the document and describes the file’s origin, including title, authorship, and publication details. This element is mandatory in every TEI header, and it must contain the following three sub-elements:
  - Title Statement: <titleStmt>
    Lists the title of the document and identifies responsible parties such as authors and editors. The only mandatory sub-element here is <title>, but it may also contain <author>, <editor>, <sponsor>, <funder>, <principal> (name of the principal researcher responsible), or <respStmt> (statement of responsibility).
  - Publication Statement: <publicationStmt>
    Provides publication details, including publisher, date, and access restrictions. At a minimum, these details can be described in a <p> element; however, the TEI Guidelines recommend using at least one of <publisher>, <distributor>, or <authority>. Other options are <pubPlace>, <address>, <idno> (a standardized bibliographic identification code), <availability>, or <date>.
  - Source Description: <sourceDesc>
    Describes the original source text, including the provenance of original manuscripts or editions. The bibliographic description of the source text can be written in a paragraph (<p>) or more formally using <bibl> (loose), <biblStruct> (structured), or <biblFull> (exhaustive), or as a list of bibliographic references in <listBibl>.
  Optional elements:
  - <editionStmt>: Provides details about the edition of the electronic text, such as its version, editor, or revision history.
  - <extent>: Indicates the size or scope of the electronic text, typically by noting word count, number of files, or character length.
  - <seriesStmt>: Contains information about the publication series in which the electronic text appears, including series title and editors if applicable.
  - <notesStmt>: Gathers supplementary notes related to the electronic edition that are not captured elsewhere in the bibliographic metadata.
- Encoding Description: <encodingDesc>
  Explains how the text was encoded, including editorial decisions, methods, standards, and tools used.
  Example:
  <encodingDesc>
    <p>This TEI encoding represents a diplomatic transcription of the original manuscript. Original spelling, punctuation, and capitalization have been preserved. Line breaks, page breaks, and marginal annotations have been encoded. Scribal deletions and additions are marked using &lt;del&gt; and &lt;add&gt; elements. No editorial normalization has been applied.</p>
  </encodingDesc>
  Besides the paragraph element (<p>), it can also contain <editorialDecl> (editorial practice declaration), <projectDesc> (project description), <tagsDecl> (tagging declaration), <refsDecl> (reference system declaration), or <classDecl> (classification declaration), as well as schema-dependent elements such as <metDecl> (metrical notation declaration) for editing poetry, <variantEncoding> (method used to encode text-critical variants), and others.
- Profile Description: <profileDesc>
  Provides contextual information about the text’s language, setting, and other descriptive metadata in elements such as <creation> (information about the creation of a text), <langUsage> (the languages used in the text), <textClass> (classification of the contents of the text), and schema-dependent elements such as <handNotes> (identification of the different hands in a primary document).
- Revision Description: <revisionDesc>
  Tracks changes and updates made to the document over time, ensuring transparency in the editing process. Each revision is described in a <change> element. Encoders can formally record the exact date of the change in a @when attribute and the person responsible for the change in a @who attribute.
Example of a TEI Header
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Encoded Historical Letter</title>
      <author>John Smith</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Historical Society Press</publisher>
      <date>2024</date>
    </publicationStmt>
    <sourceDesc>
      <p>Based on the original manuscript housed in the National Archives.</p>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <editorialDecl>
      <p>Encoded following TEI P5 guidelines using Oxygen XML Editor.</p>
    </editorialDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
    </langUsage>
  </profileDesc>
  <revisionDesc>
    <change who="#editor1" when="2025-02-20">Initial TEI encoding completed.</change>
  </revisionDesc>
</teiHeader>
This header structure ensures that TEI documents contain vital metadata, facilitating proper organization, discovery, and analysis of texts in digital humanities projects.
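As a hedged sketch of how some of the optional header elements described earlier might fit together, the variant below adds a <respStmt>, an <availability> statement, a <bibl> source description, and a second <change>. The encoder's name, the identifier editor1, the availability wording, and the dates are all invented for illustration and do not come from a real project.

```xml
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Encoded Historical Letter</title>
      <author>John Smith</author>
      <!-- Hypothetical encoder; the xml:id gives @who something to point to -->
      <respStmt>
        <resp>TEI encoding</resp>
        <name xml:id="editor1">Jane Roe</name>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <publisher>Historical Society Press</publisher>
      <date when="2024">2024</date>
      <!-- Illustrative wording only, not a real licence -->
      <availability>
        <p>Freely available for research and teaching.</p>
      </availability>
    </publicationStmt>
    <sourceDesc>
      <!-- Loosely structured citation instead of a plain paragraph -->
      <bibl>
        <author>John Smith</author>,
        <title>Letter</title>,
        original manuscript, National Archives.
      </bibl>
    </sourceDesc>
  </fileDesc>
  <revisionDesc>
    <!-- Most recent change listed first -->
    <change who="#editor1" when="2025-03-01">Corrected transcription errors.</change>
    <change who="#editor1" when="2025-02-20">Initial TEI encoding completed.</change>
  </revisionDesc>
</teiHeader>
```

The value of @who begins with # because it is a pointer to the element whose xml:id is editor1. Without a matching xml:id somewhere in the document, the pointer has nothing to resolve to.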
Structural Markup
Structural markup defines the overall organization of a text, allowing encoders to indicate divisions, sections, and other layout features that reflect the text’s internal structure. These elements help maintain the logical and visual flow of a document, making it easier to navigate, analyze, and preserve in digital form.
- <div>: Used to group sections and subsections of a text, such as chapters or articles.
- <head>: Represents a heading or section title, typically found at the start of a <div>.
- <p>: Represents a paragraph.
- <pb/>: Page break; marks where a new page begins in the original document. This is useful for maintaining page-level fidelity in digital editions and can help with citation consistency across different formats.
  Note: The / in <pb/> signifies that this is a self-closing (empty) element, meaning it does not contain any text or nested elements. This form is commonly used for elements that mark a point in a document, such as line breaks or page breaks, and therefore need no separate closing tag.
- <lb/>: Line break, especially useful in poetry or transcription.
- <milestone/>: Marks non-hierarchical structural points (e.g., sermon numbers, canto divisions).
Example:
<body>
  <div>
    <head>Introduction</head>
    <p>This is the opening section.</p>
  </div>
</body>
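A slightly longer sketch may help show how divisions nest and where the empty elements sit. The @type and @n values, the headings, and the text are assumptions made up for this illustration; a real project would choose its own values.

```xml
<body>
  <!-- Outer division: a chapter -->
  <div type="chapter" n="1">
    <head>Chapter One</head>
    <!-- Page 1 of the source begins here -->
    <pb n="1"/>
    <p>The first line of the source page ends here,<lb/>
       and the second line of the page begins here.</p>
    <!-- Inner division: a section nested inside the chapter -->
    <div type="section" n="1.1">
      <head>A Nested Section</head>
      <p>Subsections are encoded by nesting one division inside another.</p>
      <!-- Page 2 of the source begins between two paragraphs -->
      <pb n="2"/>
      <p>This paragraph starts on the second page of the source.</p>
    </div>
  </div>
</body>
```

Because <pb/> and <lb/> only mark a point in the text, they can appear between or inside other elements without disturbing the nesting of <div> and <p>.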
Inline Elements
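Inline elements mark up words or phrases within running text, such as names, dates, or editorial interventions. There are many elements available for encoding different kinds of texts, including those for poetry, drama, and manuscripts, which we will explore in more detail in our next lesson. Here are just a few examples: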
- <note>: Used to encode editorial notes, annotations, or comments within the text.
- <choice>: Represents editorial or authorial alternatives, such as original and modernized spellings.
- <persName>: Identifies a person’s name.
- <placeName>: Identifies a place.
- <date>: Marks a date.
- <hi>: Used for text styling (e.g., italics, bold).
Example:
<p>
<persName>William Shakespeare</persName> was born in <placeName>Stratford-upon-Avon</placeName> in <date>1564</date>.
</p>
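The remaining elements from the list, <choice>, <note>, and <hi>, could be combined with the example above roughly as follows. The original/regularized spelling pair and the editorial note are invented purely to illustrate the markup and are not drawn from a real source.

```xml
<p>
  <persName>William Shakespeare</persName> was
  <!-- Invented spelling pair: source reading in <orig>, regularized form in <reg> -->
  <choice>
    <orig>borne</orig>
    <reg>born</reg>
  </choice>
  in <placeName>Stratford-upon-Avon</placeName> in
  <date when="1564">1564</date>.
  <note type="editorial">The spelling in this sentence has been regularized for modern readers.</note>
  His <hi rend="italic">Sonnets</hi> appeared much later.
</p>
```

Here <choice> keeps both readings side by side, so software that displays the edition can show either the original or the regularized spelling.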
Attributes in TEI
Attributes provide additional information about TEI elements and are always included within the opening tag. They modify the behavior or meaning of an element, making the encoding more precise.
- Common Attributes:
  - @xml:id: A unique identifier for an element.
  - @rend: Specifies rendering information (e.g., italics, bold, underline).
  - @type: Categorizes the element’s content.
  - @when: Gives a date or time in a standardized, machine-readable form.
Example:
<p xml:id="p1" rend="italic">This is an emphasized paragraph.</p>
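The sketch below shows the four attributes on several different elements at once. The identifiers, the type value, and the date are hypothetical and chosen only for illustration.

```xml
<!-- xml:id values must be unique within the document and must not start with a digit -->
<div xml:id="letter1" type="letter">
  <head rend="bold">Letter to a Friend</head>
  <p>
    <!-- @when carries the machine-readable form; the element content keeps the source wording -->
    Written on <date when="1818-03-11">the eleventh of March</date>
    by <persName xml:id="author1">the author</persName>.
  </p>
</div>
```

Keeping the normalized date in @when leaves the transcription untouched while still allowing software to sort or filter by date.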
TEI Syntax & Structure
Since TEI uses XML (Extensible Markup Language), it follows strict syntax rules for consistency, readability, and machine processing:
- All Elements Must Be Properly Nested
  Elements must not overlap: an element opened inside another must close before its parent does.
  Correct:
  <div> <p>This is a paragraph.</p> </div>
  Incorrect:
  <div> <p>This is a paragraph.</div></p>
- Every Element Must Be Closed
  Use either a closing tag or the self-closing form.
  <p>This is a paragraph.</p> <!-- correct -->
  <lb/> <!-- correct for line break (self-closing) -->
- Tags Are Case-Sensitive
  XML treats <title> and <Title> as different elements. Use the exact capitalization defined in the TEI Guidelines, for example <teiHeader> and <persName> rather than <teiheader> or <persname>.
- Attributes Go Inside Opening Tags
  Attributes modify elements and go within the start tag.
  <p xml:id="p1" rend="italic">This is a styled paragraph.</p>
- Use the TEI Namespace Declaration
  All TEI documents should declare the TEI namespace on the root <TEI> element:
  <TEI xmlns="http://www.tei-c.org/ns/1.0"> ... </TEI>
- Attribute Values Must Be Quoted
  Always wrap attribute values in double or single quotes.
  <date when="1818">1818</date>
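To see several of these rules side by side, the fragment below contrasts a malformed line, kept inside a comment so that the fragment itself stays well-formed, with a corrected equivalent. The sample text is invented for illustration.

```xml
<!-- Not well-formed: wrong capitalization in the start tag, unquoted
     attribute value, and an unclosed line break:
     <P rend=italic>Some text<lb>more text</p>
-->

<!-- Well-formed equivalent: matching case, quoted value, self-closed <lb/> -->
<p rend="italic">Some text<lb/>more text</p>
```

An XML parser stops at the first well-formedness error it finds, so problems like these are usually fixed one at a time.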
Works Cited
- Terras, Melissa, et al. “Module 2: The TEI Header.” TEI by Example, 2020, teibyexample.org/exist/tutorials/TBED02v00.htm.
- TEI Consortium. TEI: Guidelines for Electronic Text Encoding and Interchange, 2025, tei-c.org/guidelines/p5/.