Almost one year ago (May 2015), Doyle Calhoun, an honors student and senior in the Departments of Slavic & Eastern Languages and Literatures and of Romance Languages and Literatures, majoring in Linguistics and French, approached the Digital Scholarship Group at the Boston College Libraries with an idea for a digital project. The project would grow out of his senior thesis, advised by Margaret Thomas. He wanted to analyze a set of 24 dictionaries and grammars, dating from the mid-19th to the early 20th century, that French Catholic missionaries working in North and Central Africa had compiled. These texts are held in the Nicholas M. Williams Ethnological Collection of the John J. Burns Library at Boston College. They document more than 18 African languages (such as Banda, Koongo, Tachawit, and Wolof) and were intended to record the grammatical and lexical facts of those indigenous languages.
To make the texts accessible and usable in a digital project, the first step was to digitize them and make them available online. Selection, review, and digitization were overseen by the Digital Library Program and Burns Library staff. One project goal was to create new data about the texts that would be valuable to scholars and students interested in linguistics and missionary grammars. Another was to make this data accessible and searchable, and to give scholars, students, and other interested users a way to explore and work with the data alongside the digitized texts. Achieving these goals required a range of approaches, methods, and technologies, as well as the combined expertise and time of many people.
Making the data accessible and searchable first required creating metadata for each primary source, which would later be ingested into a PostgreSQL database and presented on the web through HTML and Ruby. Calhoun worked with Nina Bogdanovsky (subject liaison to Linguistics) and Anna Kijas (Digital Scholarship Librarian) to record bibliographic and target-language metadata, to match each target language as named in the primary source with its modern-day designation, and to categorize the texts by genre and subgenre. Calhoun also recorded the directionality of each vocabulary or dictionary (for example, French to Wolof or Wolof to French) and mapped each text’s organizational structure according to the hierarchical ordering of the primary source.
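To make the shape of this metadata concrete, here is a minimal Python sketch of one record per primary source. It is illustrative only: the class and field names (`TextRecord`, `SectionNode`, `Directionality`) are hypothetical and do not reproduce the project’s actual PostgreSQL schema. It shows how the historical and modern language names, genre, directionality, and nested hierarchy might sit together in a single record.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Directionality(Enum):
    """Hypothetical encoding of a vocabulary's or dictionary's direction."""
    FRENCH_TO_TARGET = "fr-target"
    TARGET_TO_FRENCH = "target-fr"
    BIDIRECTIONAL = "both"


@dataclass
class SectionNode:
    """One level of a text's organizational hierarchy (part, chapter, letter...)."""
    title: str
    children: list["SectionNode"] = field(default_factory=list)


@dataclass
class TextRecord:
    """Hypothetical metadata record for one primary source (names are illustrative)."""
    title: str
    author: str
    place_of_publication: str
    date: str
    language_as_named: str   # target language as designated in the source
    language_modern: str     # corresponding modern-day designation
    genre: str
    subgenre: Optional[str] = None
    # Only meaningful for vocabularies and dictionaries.
    directionality: Optional[Directionality] = None
    hierarchy: list[SectionNode] = field(default_factory=list)

    def section_count(self) -> int:
        """Count every node in the hierarchy with an iterative depth-first walk."""
        stack = list(self.hierarchy)
        total = 0
        while stack:
            node = stack.pop()
            total += 1
            stack.extend(node.children)
        return total
```

Keeping the hierarchy as nested nodes mirrors the idea of a reader that links each level of the structure to the corresponding section of the digitized text. In a relational database the same tree would more likely be stored as rows with a parent ID.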
This data revealed several opportunities for prosopography, which led to the creation of an author-ography (data about each missionary), an org-ography (data about each missionary order or organization), and a publication gazetteer (data about each text’s place of publication). In collaboration with Kijas, Calhoun encoded these prosopographies according to the TEI P5 Guidelines. They can be viewed as XML files on the project site and in the TAPAS repository, which preserves and provides access to TEI XML documents. Several text samples were also encoded, including the preface and a dictionary entry from Dictionnaire Français-Wolof et Wolof-Français. Encoding this data and these samples served two purposes: first, it enabled a close reading of the content and gave Calhoun a learning opportunity; second, it serves as a proof of concept that could later be applied to the entire corpus, so that the TEI XML files could be rendered with added linguistic analyses, cross-references, and visualizations. Ben Florin (Web Developer) built a Ruby on Rails application backed by a PostgreSQL database, which pulled in the metadata and digitized texts. A reader displays each text alongside its hierarchy (with links to the corresponding sections of the text) and its metadata. The database can be browsed by text and by language group, or queried through a search interface.
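As a rough illustration of what a TEI-encoded author-ography entry can look like, the Python sketch below builds a `<listPerson>` with one `<person>` whose `<affiliation>` points to a missionary order’s entry in a separate org-ography via `orgName/@ref`. This is an assumption about one reasonable encoding, not the project’s actual markup. The helper names (`person_entry`, `list_person`) and all IDs are hypothetical. It uses only the standard library’s `xml.etree.ElementTree` (Python 3.9+ for `ET.indent`).

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id

# Emit TEI elements in the default namespace instead of "ns0:" prefixes.
# Note: this registration is global to the ElementTree module.
ET.register_namespace("", TEI_NS)


def tei(tag: str) -> str:
    """Return a TEI-namespaced tag name in ElementTree's {uri}tag form."""
    return f"{{{TEI_NS}}}{tag}"


def person_entry(person_id: str, name: str, org_id: str, org_name: str,
                 birth: Optional[str] = None) -> ET.Element:
    """Build one hypothetical <person> linked by ID to an <org> in a <listOrg>."""
    person = ET.Element(tei("person"), {XML_ID: person_id})
    ET.SubElement(person, tei("persName")).text = name
    if birth is not None:
        # @when takes a W3C date such as "1850" or "1850-03-01".
        ET.SubElement(person, tei("birth"), {"when": birth})
    affiliation = ET.SubElement(person, tei("affiliation"))
    # Pointer to the order's entry in the org-ography, e.g. <org xml:id="org1">.
    org = ET.SubElement(affiliation, tei("orgName"), {"ref": f"#{org_id}"})
    org.text = org_name
    return person


def list_person(entries: list) -> str:
    """Wrap <person> elements in a <listPerson> and serialize them as text."""
    root = ET.Element(tei("listPerson"))
    root.extend(entries)
    ET.indent(root)  # pretty-print in place
    return ET.tostring(root, encoding="unicode")
```

Linking people to organizations by `xml:id` reference, rather than repeating an order’s details in every missionary’s entry, keeps the author-ography and org-ography consistent with each other.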
The final part of the project was an HTML site through which all of the content could be accessed and searched. We adapted an existing HTML5 UP theme (Helios) for this purpose, wrote the content, narrative, and documentation for the project, and connected the site to the data. All of the code and files were pushed to GitHub, where we continued to work collaboratively without the risk of accidentally deleting or overwriting each other’s work. All of the technical infrastructure (see Project Praxis) and code used and developed for this project is open source, and the project is licensed under a CC BY-NC license.
This project exemplifies the expertise, collaboration, and support that Boston College Libraries staff can provide for digital scholarship. It also gave Calhoun new skills, including working with XML and TEI, Markdown and GitHub, and visualization tools for mapping his data. It exposed him to project planning, management, and development as well, experience that will be invaluable as he pursues digital work beyond Boston College.