Lesson 1.5: TEI for Different Genres
Introduction
Depending on the genre, specific TEI elements help structure and annotate texts in meaningful ways. This lesson explores how TEI can be adapted to prose, poetry, drama, manuscripts, and historical documents, with examples illustrating best practices for encoding each genre.
Adapting TEI to Genre-Specific Structures
- Textual Structure: Different genres have unique structural conventions that influence how texts are encoded. For example, a poem’s line breaks are essential to its form, while a letter’s metadata about sender and recipient is equally critical.
- Scholarly Focus: The encoding priorities for a historical document might center on authenticity and preservation, while for a novel, focus may shift toward narrative structure and character interactions.
- Customizing TEI Elements: While TEI provides general guidelines, many projects require genre-specific adaptations or the use of TEI customization modules to fit unique needs.
1. TEI for Prose
Prose texts, such as novels and essays, primarily use structural and inline markup to define sections, paragraphs, and specific elements like names, dates, and places.
Key Elements:
- <div>: Divides the text into chapters, sections, or parts. The @type attribute can specify the division level, and the @n attribute can be used for numbering.
- <p>: Represents an individual paragraph. Note that paragraphs cannot follow nested division elements or occur between divisions; they must be wrapped within them.
- <head>: Marks section or chapter headings. Headings follow the hierarchy of the divisions they belong to. Semantic information can be added in a @type attribute and/or a @subtype attribute.
- <list> and <item>: Represent a list and its items. The formatting can be specified with a @rend attribute (e.g. numbered, lettered, bulleted, unmarked, or inline).
- <persName>, <placeName>, <date>: Identify people, places, and dates.
- <q>: Marks quoted speech within a paragraph. The @type attribute can add further details such as "spoken" or "thought".
- <said>: Indicates passages of thought or speech more explicitly, allowing for attributes such as @who, @aloud, and @direct.
- <quote>: Marks a quotation from some agency external to the text.
- <cit> and <bibl>: Represent a citation and a bibliographic reference.
- <ref>: Creates hyperlinks or references to other parts of the text or to external resources.
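The speech, list, and citation elements above can be sketched together as follows. This is only an illustration: the sentences, the identifier #anna, and the placeholder citation are invented, and only the element and attribute names come from the list above.

```xml
<div type="chapter" n="1">
  <head>Chapter 1</head>
  <p>
    <!-- Spoken aloud and quoted directly; #anna is a hypothetical identifier -->
    <said who="#anna" aloud="true" direct="true">I will not go back to <placeName>Dublin</placeName>,</said>
    she told him in <date when="1904-06">June 1904</date>.
    <!-- An unspoken thought by the same character -->
    <said who="#anna" aloud="false" direct="true">Not yet, at least.</said>
  </p>
  <p>She kept a short list in her notebook:</p>
  <list rend="numbered">
    <item>Write to her sister.</item>
    <item>Return the borrowed books.</item>
  </list>
  <!-- A quotation from an external source, paired with its reference -->
  <cit>
    <quote>Placeholder text of the quoted passage.</quote>
    <bibl>Placeholder Author, <title>Placeholder Title</title></bibl>
  </cit>
</div>
```

Whether a project uses <q> or the more explicit <said> is an editorial decision; the sketch uses <said> only because it exposes @who, @aloud, and @direct.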
Challenges:
Managing narrative shifts and embedded texts — such as letters within novels — can be challenging in TEI encoding because these elements often require distinct structural tagging while maintaining the coherence of the primary narrative.
- Embedded Texts: A letter within a novel might require <div type="letter">, <opener>, <closer>, and <signed> elements while still being housed within the novel’s <text> structure.
- Narrative Perspective Changes: Shifts between first-person and third-person narration may require <seg> or <quote> elements to distinguish perspectives.
- Footnotes & Editorial Context: If the embedded text is introduced via a footnote or an editorial note, <note> or <ref> elements may be necessary.
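One possible shape for an embedded letter, following the nested-division approach described above, is sketched here. The chapter text, the letter, and the editorial note are all invented placeholders.

```xml
<div type="chapter" n="3">
  <head>Chapter III</head>
  <p>The next morning a letter was waiting on the hall table.</p>
  <!-- The embedded letter is a nested division inside the chapter -->
  <div type="letter">
    <opener>
      <dateline>12 March</dateline>
      <salute>My dear friend,</salute>
    </opener>
    <p>I have arrived safely and will write again soon.<note type="editorial">Placeholder editorial note on the letter.</note></p>
    <closer>
      <salute>Yours ever,</salute>
      <signed>M.</signed>
    </closer>
  </div>
</div>
```

Because paragraphs cannot follow a nested division, narration that resumes after the letter would need a division of its own. TEI also offers the <floatingText> element for embedded texts, which some projects may prefer for that reason.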
Cultural Considerations:
Cultural context shapes not only the interpretation of content but also the encoding choices themselves. Narrative conventions, rhetorical strategies, and even what is considered a “section” or “character” may differ across cultures. Encoders must be attentive to non-Western storytelling forms, oral traditions, or community-based knowledge systems that resist linear or individual-centered structures. Using elements like <div>, <persName>, or <note> should involve reflection on whose voices are centered, how authority is represented, and whether the markup framework reinforces or challenges dominant cultural assumptions.
Example: Encoding a paragraph from a novel
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- The required <teiHeader> is omitted here for brevity -->
  <text>
    <body>
      <head type="mainTitle">A Portrait of the Artist as a Young Man</head>
      <head>
        <bibl>by <author><persName>James Joyce</persName></author></bibl>
      </head>
      <div type="section" n="1">
        <head>Chapter I</head>
        <p>
          Once upon a time and a very good time it was there was a moocow coming down along the road and this moocow that was coming down along the road met a nicens little boy named <persName>baby tuckoo</persName>....
        </p>
      </div>
    </body>
  </text>
</TEI>
2. TEI for Poetry
Poetry often requires line-based encoding to preserve the original formatting and structure.
Key Elements:
- <lg>: Represents a group of lines, such as a stanza. The @type attribute can add detail, such as "stanza" or "quatrain". The @met attribute can declare the metrical structure.
- <l>: Marks individual lines of verse. Enjambments can be marked with the @enjamb attribute with values such as "yes" or "no" or even "weak" or "strong". If the line deviates from the metrical structure, the @real attribute can record the actual realization of the metrical structure.
- <lb/>: Indicates a line break. Note that this is a self-closing empty element.
- <metDecl>: Declares the notation used to describe meter or rhythm. It is placed in the TEI header, within <encodingDesc>.
- <rhyme>: Marks the rhyming part of a line. The @label attribute ties it to a position in the rhyme scheme.
- <caesura/>: Marks a pause within a line of poetry (self-closing).
- <seg>: Highlights a portion of text that needs special interpretation.
Challenges:
- Preserving line breaks and indentations: XML does not treat whitespace as meaningful layout, so the spacing of a poem or manuscript is not preserved unless it is encoded explicitly.
  - XML Whitespace Handling: Many XML processing tools normalize or collapse whitespace, making it difficult to maintain formatting. Encoders can use <space unit="char" quantity="X"/> for precise spacing needs.
  - Poetic Structure: Line breaks in poetry carry meaning, and indentation may signal shifts in tone, emphasis, or speaker.
  - Manuscript Representation: Historical documents may contain unique spacing conventions that are critical for accurate transcription.
- Metrical patterns or rhyme schemes: Different poetic traditions use varied structures, including specific metrical feet, caesurae, and rhyme schemes that must be represented accurately in a digital format. TEI provides elements like <metDecl> to declare metrical notation and <rhyme> to indicate rhyme patterns, but encoding these features requires careful analysis of the poem’s structure and conventions. Additionally, variations in historical or regional poetic traditions might necessitate custom encoding approaches to preserve their nuances.
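The spacing and metrical points above might be encoded along these lines. The block holds two separate fragments, one for the header and one for the text. The symbol set, the indentation width, the scansion, and the caesura placement are illustrative assumptions rather than an authoritative reading of the couplet.

```xml
<!-- Fragment 1, in the TEI header: document what the metrical symbols mean -->
<encodingDesc>
  <metDecl type="met">
    <metSym value="+">stressed syllable</metSym>
    <metSym value="-">unstressed syllable</metSym>
    <metSym value="|">foot boundary</metSym>
  </metDecl>
</encodingDesc>

<!-- Fragment 2, in the text: an indented couplet using that notation -->
<lg type="couplet" met="-+|-+|-+|-+|-+">
  <!-- @real records a line that departs from the declared pattern -->
  <l real="+-|-+|-+|-+|-+"><space unit="char" quantity="4"/>Only my plague <caesura/>thus far I count my <rhyme label="g">gain</rhyme>,</l>
  <l><space unit="char" quantity="4"/>That she that makes me sin awards me <rhyme label="g">pain</rhyme>.</l>
</lg>
```

Declaring the symbols once in <metDecl> means that values of @met and @real elsewhere in the file can be interpreted consistently by readers and software.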
Cultural Considerations:
When encoding Indigenous poetry or chants, consider how oral performance affects the textual structure. Use <note> or <desc> to provide cultural context. Ensure that textual variants respect the original oral structure, intonation patterns, and cultural significance. Variations may reflect regional differences, evolving storytelling practices, or performance-based adaptations. For example, when encoding Native Hawaiian chants (oli), it is important to recognize distinctions between formal recitations and informal adaptations while maintaining cultural integrity.
Example: Encoding Metrical Patterns and Rhyme Schemes
<text>
<body>
<div type="poem" rhyme="ababcdcdefefgg">
<head>Sonnet 141</head>
<lg type="quatrain" met="pentameter">
<l>In faith, I do not love thee with mine <rhyme label="a">eyes</rhyme>,</l>
<l>For they in thee a thousand errors <rhyme label="b">note</rhyme>;</l>
<l>But 'tis my heart that loves what they <rhyme label="a">despise</rhyme>,</l>
<l>Who, in despite of view, is pleased to <rhyme label="b">dote</rhyme>;</l>
</lg>
<lg type="quatrain" met="pentameter">
<l real="-+-+-+-+-+-"><seg type="anaphora">Nor</seg> are mine ears with thy tongue's tune <rhyme label="c">delighted</rhyme>,</l>
<l><seg type="anaphora">Nor</seg> tender feeling, to base touches <rhyme label="d">prone</rhyme>,</l>
<l real="-+-+-+-+-+-" enjamb="yes"><seg type="anaphora">Nor</seg> taste, nor smell, desire to be <rhyme label="c">invited</rhyme></l>
<l>To any sensual feast with thee <rhyme label="d">alone</rhyme>:</l>
</lg>
<lg type="quatrain" met="pentameter">
<l enjamb="yes"><seg type="volta">But</seg> my five wits nor my five senses <rhyme label="e">can</rhyme></l>
<l>Dissuade one foolish heart from serving <rhyme label="f">thee</rhyme>,</l>
<l>Who leaves unswayed the likeness of a <rhyme label="e">man</rhyme>,</l>
<l>Thy proud heart's slave and vassal wretch to <rhyme label="f">be</rhyme>.</l>
</lg>
<lg type="couplet" met="pentameter">
<l real="+--+-+-+-+">Only my plague thus far I count my <rhyme label="g">gain</rhyme>,</l>
<l>That she that makes me sin awards me <rhyme label="g">pain</rhyme>.</l>
</lg>
</div>
</body>
</text>
3. TEI for Performance Texts
Written representations of performance texts such as plays need structured markup for features such as speakers, stage directions, and acts/scenes.
Key Elements:
- <div>: Defines divisions that can be specified with the @type attribute, such as "act" or "scene".
- <sp>: Contains a speech unit. The speaking character can be identified with the @who attribute.
- <speaker>: Identifies the speaker of dialogue. Note that this is used to encode only the identifier (often a name), while the actual text of a speech should be encoded as a paragraph (<p>), a verse line (<l>, possibly grouped in <lg>), or an anonymous block (<ab>) in case the format is unclear.
- <stage>: Represents stage directions (e.g. setting, aspects of the acting, technical circumstances, locations, music, etc.). The @type attribute can be used to add details.
- <move/>: Indicates movements of characters. It can be used for movement descriptions in stage directions or to document movement of characters when it is absent from the stage directions. This is an empty (self-closing) element with attributes such as @who, @type, or @where conveying the information. Example:
  <stage xmlns="http://www.tei-c.org/ns/1.0" type="entrance"> <move who="#vladimir" type="entrance" where="left"/> (Vladimir enters slowly from stage left, removes his hat, and looks around uncertainly) </stage>
- <sound>: Marks descriptions of sound in stage directions. Further information can be conveyed with attributes such as @type or @discrete (specifies whether the sound overlaps with the surrounding speeches: "false" or "true").
- <view>: Represents descriptions of the contents of a screen.
- <camera>: Marks camera directions (e.g. <camera type="angle">close up</camera>).
- <castList>: Provides a list of characters in the play.
- <castItem>: Represents an individual character within a cast list. It can contain <role> (name of the dramatic role), <roleDesc> (description of that role), or <actor> (name of the actor performing the role).
- <set>: Marks a general description of the setting in the front matter (not in stage directions). Its content should be wrapped in paragraphs (<p>) or lines (<l>).
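A cast list and the speeches that refer to it could be connected as in the sketch below. The identifiers, descriptions, and dialogue are placeholders; the only assumption carried over from this lesson is the #vladimir pointer used in the <move/> example.

```xml
<front>
  <castList>
    <head>Characters</head>
    <castItem>
      <!-- The xml:id gives speeches something to point at via @who -->
      <role xml:id="vladimir">Vladimir</role>
      <roleDesc>Placeholder description of the role</roleDesc>
    </castItem>
    <castItem>
      <role xml:id="estragon">Estragon</role>
    </castItem>
  </castList>
  <set>
    <p>Placeholder description of the setting.</p>
  </set>
</front>
<body>
  <div type="act" n="1">
    <sp who="#vladimir">
      <speaker>Vladimir</speaker>
      <p>Placeholder line of dialogue.</p>
      <stage type="business">Placeholder stage direction.</stage>
    </sp>
  </div>
</body>
```

Pointing @who at the <role> identifier keeps speaker attribution stable even when the printed speaker label is abbreviated or varies from speech to speech.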
Challenges:
- Non-verbal actions (e.g. gestures, expressions): These can be challenging because they are often implied rather than explicitly stated in the text. Distinguishing between authored stage directions and editorial additions is important for scholarly transparency. If a stage direction is added by the editor, the @resp (responsibility) and/or @source attributes can be used on <stage> or <move> to indicate editorial input. The addition can also be explained further in a <note> element.
- Interpolated text like asides or off-stage speech: These can be challenging because they exist outside the main narrative or dialogue flow. TEI provides <stage> for off-stage speech and <seg type="aside"> for asides, but deciding how to encode them while preserving their dramatic and interpretive significance requires careful attention. The challenge is ensuring that these elements remain clearly distinguishable from the main text while maintaining readability and usability in digital editions.
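An editorially supplied stage direction and an aside might be marked as follows. The identifiers #hamlet and #ed1 are hypothetical, and the speech text is placeholder wording, not a quotation from any play.

```xml
<sp who="#hamlet">
  <speaker>Hamlet</speaker>
  <!-- Direction supplied by the editor, not present in the source -->
  <stage type="delivery" resp="#ed1">[Aside]</stage>
  <p>
    <seg type="aside">Placeholder words spoken aside.</seg>
    Placeholder words addressed to the other characters.
  </p>
  <note resp="#ed1">Placeholder note explaining why the editor added the direction.</note>
</sp>
```

Keeping @resp on both the direction and the note lets a reader or stylesheet separate editorial material from what the source itself contains.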
Cultural Considerations:
In ceremonial texts or ritual dramas, consider community permissions and cultural protocols when encoding sensitive performances. Some performances may have restricted access or require specific contextual annotations to preserve their intended meaning. When encoding elements such as <stage> or <performance>, ensure that descriptions reflect the cultural significance of movements, gestures, and oral traditions.
Example: Encoding a Play Scene
<text>
<body>
<div type="act">
<head>Act 1</head>
<div type="scene">
<head>Scene 1</head>
<sp>
<speaker>Hamlet</speaker>
<p>To be, or not to be: that is the question:</p>
</sp>
<sp>
<speaker>Ophelia</speaker>
<p>My lord, I do not know what to think.</p>
</sp>
<stage>[She exits]</stage>
</div>
</div>
</body>
</text>
4. TEI for Primary Sources
Encoding primary sources such as manuscripts, handwritten letters, and other non-published texts allows for the detailed representation of textual features such as revisions, annotations, marginalia, and variations in handwriting or layout, preserving the material and editorial complexities of the original document. Note that it is important to state responsibility for editorial interpretations in the @resp attribute, for example when the editor makes a call in identifying a specific scribal hand.
Key Elements:
- <handNote>: Describes a particular hand or handwriting style in the <profileDesc> part of the TEI header. It may carry specific attributes such as @scribe (name for the scribe), @script (writing style or font), @medium (type of ink), or @scope (dominance of this hand in the document).
- <add>: Marks added text. The global @rend attribute can specify actual rendition information, and the @place attribute can describe where the addition occurred. Editors can also add the @hand attribute, which references a hand description elsewhere in the document.
- <del>: Marks deleted text. Information like "strikethrough" or "overwritten" can be conveyed with the @rend attribute.
- <subst>: Represents substitutions (as groups of deletions and additions).
- <restore>: Restores an original reading (after initial rejection). It can be wrapped around prior deletions, canceling an editorial or authorial marking or instruction.
- <delSpan> and <addSpan>: Indicate the beginning of a longer deletion or addition. Note that these are empty (self-closing) elements using the @spanTo attribute to define the scope of the addition or deletion.
- <damage>: Signals material damage to the source text. The damage can be described in a variety of attributes such as:
  - @type: a characterisation of the type of the damage
  - @hand: the hand that caused the damage
  - @agent: the cause of the damage
  - @extent: the amount of damaged text
  - @quantity: the length of the damage in a specific unit (specified with the @unit attribute)
  - @unit: the unit in which the length of the damage is expressed (with the @quantity attribute)
- <unclear>: Flags text that is difficult to read or ambiguous. The @reason attribute can be used for further explanation. This element can be wrapped within the <damage> element or occur individually.
- <gap>: Marks a missing portion of text due to damage or omission. This is an empty element that may carry the @reason, @agent, and @extent attributes.
- <supplied>: Indicates an editorial addition for missing text.
- <corr>: Represents editorial corrections.
- <sic>: Indicates apparent errors as interpreted by the editor (while transcribing the source text as accurately as possible).
- <choice>: Encodes alternative readings or editorial decisions. With this element, both the original text and an editor’s correction can be included.
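Several of the transcription elements above are combined in the sketch below. The sentences are invented, and #h2 and #ed1 are hypothetical pointers to a hand description and an editor defined elsewhere in the file.

```xml
<p>
  <!-- A substitution: the deletion and addition form one act of revision -->
  The meeting was held on
  <subst><del rend="strikethrough">Monday</del><add place="above" hand="#h2">Tuesday</add></subst>.
  <!-- The source spelling and the editor's correction, side by side -->
  We <choice><sic>recieved</sic><corr resp="#ed1">received</corr></choice> the parcel.
  <!-- A damaged passage that is still partly legible -->
  <damage agent="water">The ink has <unclear reason="faded">run</unclear></damage> here.
  <!-- Text lost entirely: two words cannot be read -->
  The seal was <gap reason="illegible" quantity="2" unit="word"/> that day.
  <!-- An editorial conjecture for text missing from the source -->
  It was sent to <supplied reason="damage" resp="#ed1">London</supplied>.
</p>
```

Wrapping <sic> and <corr> in <choice> keeps the source reading available while still letting a reading edition display the corrected form.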
Challenges:
- Complex Layouts: Many manuscripts feature layered content such as marginal notes, footnotes, and interlinear glosses. Accurately representing these overlapping textual elements requires thoughtful use of TEI’s structural and spatial markup tools.
- Textual Variants and Corrections: Manuscripts frequently contain deletions, additions, and rewrites by different hands. TEI provides <del>, <add>, and <subst> to track these changes, but interpreting them accurately can be difficult.
- Multiple Hands and Scripts: Some manuscripts are written by multiple scribes, requiring <handNote> and <handShift> to differentiate handwriting styles.
- Material and Physical Features: Manuscripts often include seals, illuminations, and unique bindings. Encoding these requires descriptive elements such as <seal>, <figure>, or <surface> to capture their visual and material aspects.
- Nonlinear Text Flow: Some manuscripts, especially medieval texts, may have texts arranged in unusual patterns (e.g. circular writing, marginal glosses responding to main text). TEI’s <zone> and <surface> can help represent these features but require detailed markup.
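Handling multiple hands, as described above, could look roughly like this. The block holds two fragments, one for the header and one for the transcription. The identifiers h1 and h2, the attribute values, and all descriptions are placeholders.

```xml
<!-- Fragment 1, in the TEI header: describe each hand once -->
<profileDesc>
  <handNotes>
    <handNote xml:id="h1" scope="major" medium="brown-ink">Placeholder description of the main hand.</handNote>
    <handNote xml:id="h2" scope="minor" medium="pencil">Placeholder description of a later annotating hand.</handNote>
  </handNotes>
</profileDesc>

<!-- Fragment 2, in the transcription: point back to those descriptions -->
<p>Text in the main hand,
  <add place="margin" hand="#h2">placeholder marginal gloss,</add>
  continuing in the main hand.
  <!-- From this point on, the second hand takes over -->
  <handShift new="#h2"/>Text continued by the second hand.
</p>
```

The @hand attribute attributes a single intervention, while <handShift/> marks a change of writer that holds until the next shift.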
Cultural Considerations:
Encoding primary sources requires sensitivity to the cultural, historical, and ethical contexts in which those texts were created and preserved. Decisions about what to include, how to structure content, and which features to highlight can shape how a document is interpreted and who is centered or marginalized in the process. This is especially important when working with materials from colonized, Indigenous, or otherwise historically underrepresented communities. Careful attention to provenance, community consultation, and cultural protocols ensures that encoding is not only technically accurate but also respectful and responsible.
Example: Encoding a Manuscript with Corrections
<text>
<body>
<p>
This is an <del rend="strikethrough">incorrect</del> <add place="above">corrected</add> sentence.
</p>
</body>
</text>
5. TEI for Historical Documents
Historical documents, such as letters and treaties, require metadata encoding to preserve context.
Key Elements:
- <correspDesc>: Describes correspondence metadata, including sender, recipient, and date.
- <docDate>: Captures the document’s date.
- <docTitle>: Titles the document.
- <listPerson>: Lists people mentioned.
- <listPlace>: Lists places referenced in the document.
- <opener> and <closer>: Mark formal openings and closings in letters.
- <signed>: Indicates who signed the letter.
- <event>: Encodes historical events described in the text.
- <witness>: Identifies sources or witnesses associated with the document.
- <origDate>: Specifies the original date of a document when different from its publication date.
- <seal>: Represents official seals or stamps found on historical documents.
- <q>: Marks direct quotations within historical texts.
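Correspondence metadata lives in the header rather than in the transcription. The sketch below reuses the illustrative correspondents and date from this lesson's letter example; <correspAction> is a TEI element not listed above and is used here as the usual child of <correspDesc>.

```xml
<profileDesc>
  <correspDesc>
    <!-- Who sent the letter, and when -->
    <correspAction type="sent">
      <persName>John Adams</persName>
      <date when="1813-07-04">July 4, 1813</date>
    </correspAction>
    <!-- Who received it -->
    <correspAction type="received">
      <persName>Thomas Jefferson</persName>
    </correspAction>
  </correspDesc>
</profileDesc>
```

Recording the date in @when as well as in words makes the letter sortable and searchable without changing what the reader sees.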
Challenges:
- Marginalia, crossed text, or non-linear writing (e.g. writing around the margins): These features are challenging because they disrupt the standard left-to-right, top-to-bottom reading flow. In historical and manuscript texts, marginalia may appear in various locations and interact dynamically with the main text. TEI provides elements like <note place="margin"> for marginalia and <add> or <del> for textual changes, but encoding their precise placement and relationship to the main text requires careful structuring and metadata annotation.
- Physical attributes like stamps or seals: These can be challenging because they exist outside the primary textual content but are still crucial for historical authenticity. TEI provides <seal> for encoding official stamps and <figure> for capturing images of seals, but encoding their placement, size, and relationship to the text requires careful metadata annotation. Additionally, preserving their visual representation in digital editions may necessitate linking to high-quality images or transcribing accompanying inscriptions.
Cultural Considerations:
Respect data sovereignty when encoding sacred texts or genealogical records. Some documents may contain sensitive cultural or familial information that requires restricted access. Use TEI’s <availability> tag to limit access where necessary, and <desc> or <note> elements to provide context about cultural or legal restrictions on the material.
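An access statement of the kind described above might be recorded like this. The publisher name and the wording of the statement are placeholders to be replaced with terms agreed with the relevant community.

```xml
<publicationStmt>
  <publisher>Placeholder project name</publisher>
  <!-- @status takes values such as "free", "restricted", or "unknown" -->
  <availability status="restricted">
    <p>Placeholder statement: access to this transcription is limited under a protocol agreed with the source community.</p>
  </availability>
</publicationStmt>
```

Note that <availability> only documents a restriction. Enforcing it is the job of the platform that publishes the file.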
Example: Encoding a Letter
<text>
  <body>
    <div type="letter">
      <head>Letter from John Adams to Thomas Jefferson</head>
      <docDate when="1813-07-04">July 4, 1813</docDate>
      <opener>
        <salute>Dear <persName>Thomas Jefferson</persName>,</salute>
      </opener>
      <p>I write to you on this momentous occasion...</p>
      <closer>
        <salute>Sincerely,</salute>
        <signed><persName>John Adams</persName></signed>
      </closer>
    </div>
  </body>
</text>
6. Textual Variants and Annotations
Encoding textual variants allows scholars to represent different versions of a text, preserving the editorial history and interpretive possibilities:
- Scholarly Insight: Annotations provide interpretive insights and highlight editorial decisions.
- Textual Analysis: Encoding variants reveals the evolution of a text over time and across editions.
- Preservation of Textual Integrity: Maintaining a record of variants ensures that the original context of the text is preserved.
Key Elements:
- <app>: Apparatus for textual variants, grouping alternative readings together.
- <lem>: Represents the lemma, or the base text.
- <rdg>: Represents different readings or variations from the lemma.
- <note>: Adds commentary or explanatory information about specific parts of the text.
Challenges:
- Identifying Variant Significance: Not all textual differences are equally important. Deciding what constitutes a meaningful variant requires scholarly judgment.
- Representing Complex Variations: Some texts have multiple layers of changes (e.g. marginal annotations, inserted text, editorial emendations), which must be encoded using <app>, <lem>, and <rdg> while maintaining readability.
- Handling Overlapping Annotations: In some cases, different versions of a text or multiple commentaries overlap, making it difficult to encode variations in a linear XML structure.
- Cultural and Ethical Considerations: When dealing with oral traditions or Indigenous narratives, textual variations may reflect cultural performance rather than error or correction. Encoding must respect these traditions.
- Maintaining Readability in Digital Editions: Encoding textual variants can make a document difficult to read. Decisions must be made on how to present variants effectively, such as through footnotes, pop-ups, or side-by-side comparisons.
Cultural Considerations:
When working with oral traditions or Indigenous narratives, ensure that textual variants respect the original oral structure, intonation patterns, and cultural significance. Variations may reflect regional differences, evolving storytelling practices, or performance-based adaptations. For example, when encoding oli (Native Hawaiian chants), it is important to recognize distinctions between formal recitations and informal adaptations while maintaining cultural integrity. Additionally, collaborate with community stakeholders to ensure respectful and accurate representation of Indigenous texts.
Example: Encoding Variants
<text>
<body>
<p>The quick brown fox
<app>
<lem>jumps</lem>
<rdg>leaps</rdg>
<rdg>hops</rdg>
</app> over the lazy dog.
</p>
<note>This phrase has multiple variants found in different manuscripts.</note>
</body>
</text>
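A fuller apparatus usually records which witness carries each reading. The sketch below extends the fox sentence with a witness list. The block holds two fragments, and the sigla A, B, and C and their descriptions are hypothetical.

```xml
<!-- Fragment 1, in the TEI header, inside <sourceDesc> -->
<listWit>
  <witness xml:id="A">Placeholder description of manuscript A</witness>
  <witness xml:id="B">Placeholder description of manuscript B</witness>
  <witness xml:id="C">Placeholder description of manuscript C</witness>
</listWit>

<!-- Fragment 2, in the text: each reading points to its witness -->
<p>The quick brown fox
  <app>
    <lem wit="#A">jumps</lem>
    <rdg wit="#B">leaps</rdg>
    <rdg wit="#C">hops</rdg>
  </app> over the lazy dog.
</p>
```

With @wit in place, a digital edition can rebuild the text of any single witness or display the readings side by side.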
General Cultural Considerations:
- Community Consultation and Permissions: When working with Indigenous texts, oral histories, or sacred documents, it is crucial to obtain permission from the relevant community before encoding and publishing them. Some texts may have restrictions on access or require community approval for dissemination, and encoding should follow community-defined protocols. Ideally, this work is done collaboratively with cultural practitioners and reflects locally defined knowledge-sharing practices.
- Representation of Marginalized Voices: Historical biases in texts, such as colonial narratives, should be acknowledged in encoding decisions. Adding <note> or <desc> elements can help contextualize how a text was created and whose perspectives are included or omitted.
- Data Privacy and Sensitive Information: Some historical documents (e.g. personal letters, medical records, or legal documents) contain private or sensitive information. Consider using the <availability> tag to restrict access or redacting personal details responsibly.
- Encoding Non-Western or Non-Linear Writing Systems: Many non-Western traditions have unique textual structures that do not fit into linear left-to-right encoding models. TEI’s <surface> and <zone> elements can help represent complex text layouts, but care should be taken to preserve cultural authenticity rather than force non-Western texts into Western formats.
- Transliteration and Linguistic Integrity: When encoding multilingual or translated texts, it is important to distinguish original words from editorial additions. Using <foreign>, <gloss>, and <choice> ensures that readers understand which elements are part of the original text versus modern editorial input.
- Bias in Editorial Interventions: Decisions about what to encode and how to structure it inherently shape how the text is interpreted. Editorial transparency (using <note> or <editor> to indicate changes) is important for scholarly integrity.
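The distinction between source wording and editorial input could be marked as in this sketch. The sentence and the spellings are placeholders, #ed1 is a hypothetical editor identifier, and <orig> and <reg> are TEI elements not listed in this lesson, used here inside <choice> to pair a source form with its regularization.

```xml
<p>The chanter opened with an
  <!-- xml:lang="haw" identifies the word as Hawaiian -->
  <foreign xml:lang="haw">oli</foreign>
  <gloss>chant</gloss>,
  <note type="editorial" resp="#ed1">Placeholder note stating that the gloss is an editorial addition.</note>
  which the transcriber recorded as
  <choice>
    <orig>placeholder original spelling</orig>
    <reg resp="#ed1">placeholder regularized spelling</reg>
  </choice>.
</p>
```

Attaching @resp to every editorial layer makes it possible to show or hide those interventions when the edition is published.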
Suggested Readings
- Cummings, James. “The Text Encoding Initiative and the Study of Literature.” A Companion to Digital Literary Studies, edited by Susan Schreibman and Ray Siemens, Wiley, 2013, https://doi.org/10.1002/9781405177504.ch25.
- Terras, Melissa, et al. TEI by Example, 2020, teibyexample.org/exist/tutorials/TBED02v00.htm.