Lesson 1.1: Overview of TEI
Introduction
The Text Encoding Initiative (TEI) is an internationally recognized standard for encoding texts to make them more accessible for research, analysis, and digital preservation. It allows scholars to annotate, categorize, and analyze textual elements using a standardized, XML-based markup language while maintaining a high degree of flexibility and customizability. With TEI, texts become machine-readable across platforms, making TEI a valuable tool for digital archives, critical editions, and text analysis.
TEI is widely adopted across disciplines, from literary studies and historical document preservation to linguistics and cultural heritage work. By encoding texts with TEI, researchers can document editorial decisions, represent complex textual variations, and enable advanced search and analysis.
In “A Catalogue of Digital Editions,” Greta Franzini, Melissa Terras, and Simon Mahony observe that “today’s [digital] editions seek to embrace crowds — from both a reception and production standpoint — whose goal is to socialise, to exchange views, to produce community knowledge and to help users read, thus advancing research.”
This lesson will explain the value of the TEI in all of these aspects.
A Little Bit of History
The TEI was founded in 1987 by a collective of academics and librarians seeking a standardized approach to encoding texts across disciplines. In 1994, the first complete version of the TEI Guidelines, known as TEI P3, was published, providing detailed encoding rules for literary and historical texts. Today, the TEI Consortium, a nonprofit organization composed of academic institutions, research projects, and individual scholars from around the world, continues to update and maintain the Guidelines, reflecting the evolving needs of the digital humanities community.
From encoding medieval manuscripts to mapping social networks in contemporary literature, the TEI has grown alongside digital humanities, adapting to the changing nature of texts and technologies. Its use has expanded from individual scholarly editions to collaborative archives, digital pedagogy, and born-digital works. Throughout this evolution, TEI has remained grounded in its original commitment to make textual encoding accessible, flexible, and intellectually rigorous.
In their introduction to the book Digital Scholarly Editing: Theories and Practices, Matthew James Driscoll and Elena Pierazzo explain that TEI has made it possible to develop an international, trans-disciplinary community that is interested in digital editing, while also helping to highlight areas where standardisation is yet to be found — or there are too many competing standards.
This dual function — community-building and critical self-reflection — continues to define the TEI’s role in shaping digital humanities practices.
The Role of TEI in Digital Scholarship
TEI plays a crucial role in the digital humanities by providing a common framework for encoding texts in a way that ensures:
- Standardization and Interoperability:
  - Standardized markup provides a consistent way to tag textual elements like paragraphs, headings, names, dates, and even complex textual structures such as footnotes, marginalia, and variations between editions.
  - Documents can be shared, exchanged, and used across different platforms, tools, and research projects worldwide without losing their meaning or structure.
  - Example: A TEI-encoded manuscript from one archive can be analyzed alongside another from a different institution without reformatting.
- Long-Term Preservation:
  - TEI-encoded texts are plain-text XML files governed by openly published Guidelines, so they remain accessible even as technology changes and digital formats evolve, unlike proprietary formats that may become obsolete.
  - Example: A digital edition of a 17th-century diary encoded in TEI can be revisited and reprocessed decades later using modern tools, without needing to convert the file into a new format.
- Integration with Digital Tools:
  - TEI works with various digital tools for text visualization, analysis, and searchability (e.g., XML databases, publishing frameworks such as TEI Publisher, and digital edition platforms).
  - Example: Scholars can use a TEI-encoded text in a text-mining tool to analyze word frequencies across multiple documents.
- Multi-Purpose Application:
  - The same TEI-encoded text can be used for different outputs:
    - Displayed as a web-based digital edition.
    - Processed for linguistic or historical analysis.
    - Converted into print or e-book formats.
  - Example: A TEI-encoded edition of a historical speech can be displayed online for public engagement, formatted as an academic publication for classroom use, or mined for linguistic features such as rhetorical patterns or lexical diversity.
- Scholarly Transparency:
  - Encoding decisions can be documented in the file itself (for example, in the TEI header), allowing other researchers to understand how a text was digitized.
  - Example: In a collaborative edition of letters from an archival collection, contributors can be clearly acknowledged for their roles in transcribing, editing, or reviewing the text, and editorial decisions, such as changes, omissions, or interpretations, can be transparently documented for future readers.
- Enhanced Accessibility:
  - TEI enables markup that supports visually impaired readers, multilingual content, and cultural heritage preservation.
  - TEI-encoded texts are machine-readable, making them searchable and analyzable in ways that plain text is not.
  - Example: An oral narrative can be encoded with marked pauses, tone changes, or speaker gestures, helping to make the performance aspects of the text accessible to users with visual or hearing impairments through descriptive annotations.
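To make the text-mining example above more concrete, here is a minimal Python sketch (Python is our choice for illustration; TEI itself does not prescribe any programming language) that counts word frequencies in the body of a TEI document using only the standard library. The sample document, its contents, and the function name `word_frequencies` are hypothetical. The only TEI-specific assumptions are the TEI namespace (`http://www.tei-c.org/ns/1.0`) and the standard split between `<teiHeader>` and `<text>`/`<body>`.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# The official TEI namespace; TEI P5 elements live in it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny, hypothetical TEI document invented for this example.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Diary</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Invented for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div type="entry">
        <p>The rain fell all day, and the river rose.</p>
        <p>The river was high again; the rain did not stop.</p>
      </div>
    </body>
  </text>
</TEI>"""


def word_frequencies(tei_xml: str) -> Counter:
    """Count word tokens in the <body> of a TEI document, skipping the header."""
    root = ET.fromstring(tei_xml)
    body = root.find("tei:text/tei:body", TEI_NS)
    if body is None:
        return Counter()
    # itertext() yields every text node under <body>, including text inside nested elements.
    text = " ".join(body.itertext())
    # A deliberately simple tokenizer: lowercase runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


if __name__ == "__main__":
    print(word_frequencies(SAMPLE_TEI).most_common(3))
```

Keeping the header and the body apart is what makes this straightforward: the title “Sample Diary” is not counted. A real project would still need to decide how to treat notes, deletions, or editorial additions nested inside the body, all of which `itertext()` simply includes.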
Features of TEI
TEI enables users to capture detailed information about texts, making them searchable across digital platforms. Key features include:
- Complex Metadata & Bibliographic Information:
  Enhancing the research potential of encoded documents by encoding details such as author, title, publication date, source information, and edition history.
- Structural Elements:
  Marking headings, paragraphs, divisions, sections, lists, tables, footnotes, and other textual structures.
- Linguistic Features:
  Recording variants, corrections, alternative spellings, and linguistic analysis (e.g., morphology, syntax, phonetics).
- Editorial Notes & Interpretations:
  Supporting annotations, scholarly commentary, critical editions, and editorial decisions.
- Textual Variations & Historical Transformations:
  Tracking changes over time, including different editions, revisions, and manuscript variations.
- Physical & Material Features:
  Encoding manuscript descriptions, page layouts, marginalia, scribal hands, and annotations.
- Named Entities & References:
  Identifying people, places, dates, events, citations, and references to external sources.
  Example: Identifying all mentions of a historical figure or location across texts.
- Multilingual & Non-Latin Script Support:
  Accommodating different languages and scripts, including right-to-left writing systems, transliterations, and glosses.
- Speech & Performance Notation:
  Capturing dialogue, stage directions, oral storytelling structures, and performance elements.
- Linking & Cross-Referencing:
  Allowing hypertext links within texts or between different documents, enabling cross-references and connections to external resources.
- Semantic Markup for Meaning & Interpretation:
  Enabling tagging for interpretive elements, such as themes, rhetorical devices, discourse structures, and conceptual relationships.
- Customization & Adaptability:
  TEI is flexible and extensible, meaning users can adapt it to their specific needs while maintaining standard structures. Such customizations are typically written as an ODD (“One Document Does it all”) file, the TEI’s own format for specifying a project’s schema and documentation.
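Two of these features, header metadata and named entities, can be illustrated with a short Python sketch that reads the title and author from each document’s TEI header and then groups every `<persName>` and `<placeName>` across several documents. It is a minimal example assuming Python’s standard `xml.etree.ElementTree` module; the two letters, the people and places in them, the `#hale`-style identifiers, and the helper and function names are all hypothetical. In TEI, the `ref` attribute on these elements points to a fuller identification of the entity (for example, an authority record), which is what allows different spellings of the same name to be grouped together.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


# Hypothetical helper that wraps one paragraph in a minimal TEI document.
def make_letter(title: str, author: str, paragraph: str) -> str:
    return f"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>{title}</title><author>{author}</author></titleStmt>
      <publicationStmt><p>Sample only.</p></publicationStmt>
      <sourceDesc><p>Invented for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>{paragraph}</p></body></text>
</TEI>"""


# Two invented letters; the people, places, and #identifiers are hypothetical.
LETTERS = [
    make_letter(
        "Letter A", "Anna Reed",
        'We met <persName ref="#hale">Mrs. Hale</persName> in '
        '<placeName ref="#salem">Salem</placeName>.',
    ),
    make_letter(
        "Letter B", "Thomas Reed",
        '<persName ref="#hale">Mary Hale</persName> has left '
        '<placeName ref="#salem">Salem</placeName> for '
        '<placeName ref="#boston">Boston</placeName>.',
    ),
]


def header_metadata(root: ET.Element) -> dict:
    """Read the title and author from the <titleStmt> in the TEI header."""
    stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)
    if stmt is None:
        return {"title": "", "author": ""}
    return {
        "title": stmt.findtext("tei:title", default="", namespaces=TEI_NS),
        "author": stmt.findtext("tei:author", default="", namespaces=TEI_NS),
    }


def index_entities(documents: list) -> dict:
    """Group every <persName> and <placeName> by its @ref across all documents."""
    index = defaultdict(list)
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        title = header_metadata(root)["title"]
        for tag in ("persName", "placeName"):
            for el in root.iterfind(f".//tei:{tag}", TEI_NS):
                ref = el.get("ref")
                if ref:  # skip names that carry no @ref identifier
                    surface = "".join(el.itertext()).strip()
                    index[ref].append((title, surface))
    return dict(index)


if __name__ == "__main__":
    for ref, mentions in index_entities(LETTERS).items():
        print(ref, mentions)
```

Grouping by `ref` rather than by the visible name is the design choice that matters here: “Mrs. Hale” and “Mary Hale” end up under the same entry, something a plain-text search could not do reliably.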
Who Uses TEI?
- Humanities Scholars: To encode and analyze historical and literary texts.
- Libraries & Archives: For digital preservation and text digitization.
- Linguists: For corpus analysis and language documentation.
- Educators & Researchers: To teach and explore textual studies.
Real-World TEI Projects
Here are some examples of TEI in action:
- Perseus Digital Library – A vast collection of ancient texts, encoded in TEI, providing structured search and analysis.
- Digital Thoreau – A TEI-based project that presents annotated versions of Henry David Thoreau’s works.
- Parker Library on the Web – A digital archive of rare medieval manuscripts, encoded using TEI for accessibility and searchability.
- Women Writers Project – A TEI-driven digital collection of early modern women’s writing, making these texts more accessible for research.
- The Bodleian First Folio – A TEI-encoded diplomatic edition of the Bodleian copy of the First Folio of Shakespeare’s plays, making them accessible and searchable alongside their digital facsimiles.
- The William Blake Archive – A comprehensive digital archive that uses TEI to encode Blake’s poetry, prose, and visual art, supporting complex textual and editorial relationships.
- EpiDoc – A TEI-based encoding standard for epigraphic documents, including inscriptions, papyri, and ancient graffiti.
- Lexicon of Greek Personal Names – An onomastic research project using TEI to encode biographical and geographical data from classical antiquity.
These projects demonstrate how TEI enhances the usability, searchability, and preservation of texts across disciplines.
Works Cited
- Driscoll, Matthew James, and Elena Pierazzo. “Introduction: Old Wine in New Bottles?” Digital Scholarly Editing: Theories and Practices, edited by Matthew James Driscoll and Elena Pierazzo, Open Book Publishers, 2016, https://doi.org/10.11647/OBP.0095.
- Franzini, Greta, et al. “A Catalogue of Digital Editions.” Digital Scholarly Editing: Theories and Practices, edited by Matthew James Driscoll and Elena Pierazzo, Open Book Publishers, 2016, https://doi.org/10.11647/OBP.0095.
Suggested Readings
- Burnard, Lou. What Is the Text Encoding Initiative? OpenEdition Press, 2014, https://doi.org/10.4000/books.oep.426.
- Burnard, Lou, and Michael Sperberg-McQueen. TEI Lite: Encoding for Interchange: An Introduction to the TEI. TEI Consortium, 2012, tei-c.org/release/doc/tei-p5-exemplars/html/tei_lite.doc.html.
- Driscoll, Matthew James, and Elena Pierazzo, editors. Digital Scholarly Editing: Theories and Practices, Open Book Publishers, 2016, https://doi.org/10.11647/OBP.0095.
- Gabler, Hans Walter. “Theorizing the Digital Scholarly Edition.” Text Genetics in Literary Modernism and Other Essays, Open Book Publishers, 2018, https://doi.org/10.11647/OBP.0120.06.
- Hockey, Susan M. Electronic Texts in the Humanities: Principles and Practice, Oxford UP, 2000.
- Pierazzo, Elena. “Textual Scholarship and Text Encoding.” A New Companion to Digital Humanities, edited by Susan Schreibman et al., Wiley, 2015, onlinelibrary.wiley.com/doi/10.1002/9781118680605.ch21.
- Renear, Allen H. “Text Encoding.” A Companion to Digital Humanities, edited by Susan Schreibman et al., Blackwell, 2004, companions.digitalhumanities.org/DH/?chapter=content/9781405103213_chapter_17.html.
- Smith, Martha Nell. “Electronic Scholarly Editing.” A Companion to Digital Humanities, edited by Susan Schreibman et al., Blackwell, 2004, companions.digitalhumanities.org/DH/?chapter=content/9781405103213_chapter_22.html.
- TEI Consortium. TEI: Guidelines for Electronic Text Encoding and Interchange, TEI Consortium, 2025, tei-c.org/guidelines/p5/.