Monday, April 03, 2006

WordHoard: Software for Corpus Linguistic Analysis

Academic Technologies and the Library at Northwestern University have released WordHoard, new software for corpus linguistic analysis. The collection is currently limited to the Early Greek epics (in the original and in translation), Chaucer, Shakespeare, and Spenser's Faerie Queene, but every text is tagged by morphological, lexical, prosodic, and narratological criteria.

From What is WordHoard?:
The WordHoard project is named after an Old English phrase for the verbal treasure 'unlocked' by a wise speaker. It applies to highly canonical literary texts the insights and techniques of corpus linguistics, that is to say, the empirical and computer-assisted study of large bodies of written texts or transcribed speech. In the WordHoard environment, such texts are annotated or tagged by morphological, lexical, prosodic, and narratological criteria. They are mediated through a 'digital page' or user interface that lets scholarly but non-technical users explore the greatly increased query potential of textual data kept in such a form.

It is a basic assumption of WordHoard that new kinds of historical, literary, or broadly cultural analysis will be supported through the forms of data access that are made possible when literary texts are treated in the manner of linguistic corpora. Deeply tagged corpora of course support more finely grained inquiries at a verbal or stylistic level. But more importantly, access to the words of a text at such microscopic levels also lets you look in new ways at the imaginative worlds created by those words.

In its current release WordHoard contains the entire canon of Early Greek epic in the original and in translation, as well as all of Chaucer and Shakespeare, and Spenser's Faerie Queene. The section on Provenance, Copyrights, and Licenses provides detailed information about the texts.
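
To make the idea of a "deeply tagged" text a little more concrete, here is a minimal sketch of the kind of query such tagging allows. It is not WordHoard's data model or interface: the Token fields, the sample words, and the lemma_counts helper are all hypothetical, chosen only to show how one lemma can be counted across different spellings and filtered by a lexical tag (part of speech) or a narratological one (speaker).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word occurrence with illustrative tags (hypothetical schema)."""
    spelling: str  # the word as printed in the text
    lemma: str     # dictionary headword (lexical tag)
    pos: str       # part of speech (morphological tag)
    speaker: str   # who says the word (narratological tag)


# Invented sample data, not taken from any WordHoard text.
TOKENS = [
    Token("loves", "love", "verb", "Juliet"),
    Token("loved", "love", "verb", "Romeo"),
    Token("love", "love", "noun", "Romeo"),
    Token("night", "night", "noun", "Juliet"),
]


def lemma_counts(tokens, **criteria):
    """Count lemmas among tokens whose tags match every given criterion."""
    return Counter(
        t.lemma
        for t in tokens
        if all(getattr(t, field) == value for field, value in criteria.items())
    )


romeo_counts = lemma_counts(TOKENS, speaker="Romeo")
verb_counts = lemma_counts(TOKENS, pos="verb")
```

Because the counts are keyed on the lemma rather than the spelling, "loves" and "loved" are tallied together, which a plain string search of an untagged text could not do.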