GenAI@BC Libraries: The Basics – Boston College Libraries News

The Task Force

As we know too well by now, developments in GenAI are so fast that today’s pronouncements can be stale by the end of the week. Librarians are by nature thorough and deliberative, so how do we respond in this quickly changing environment?

We assemble a task force. Here are the task force’s goals:

understand the basics of GenAI tools in the research process,
understand best use cases by scholars and researchers,
understand ethical and legal issues,
uncover other areas of concern, and
prepare documentation and workshops for the BC community.

With the ground shifting so quickly, we recognize there is no way to provide definitive answers in any of the above categories. This blog article is the first in a series we’re calling “GenAI@BCLibraries,” which will share our task force members’ encounters with GenAI tools and learning experiences to serve broadening conversations about how our uses of GenAI at BC are evolving.

Generative AI and LLMs

First of all, let’s get some definitions out of the way. We’re not trying to learn everything about all of AI: that’s much too broad a category. We’re concerned primarily with the Generative AI (GenAI) tools–especially chat–that are built on Large Language Models (LLMs) and are most commonly used. Here at BC, the tools ITS has approved for data security are Microsoft Copilot, Google Gemini, and NotebookLM¹. Many students are also using ChatGPT, both free and paid versions, and any number of other proliferating tools.

These tools are cloud-based and built on Large Language Models (LLMs): statistical models trained on huge datasets of language harvested from the open web and other sources such as book and newspaper publishers. (Companies are tight-lipped about the full range of their data, leading to suspicions that some have circumvented copyright law.) Training an LLM consists of turning all the language data in the training corpus into numeric data, and running statistical modeling that “teaches” complex algorithms to predict which “words” (numeric representations of words) are most likely to follow others. For more detail, here’s a good primer produced by the UK digital education nonprofit Jisc. Though we’ll cover ethics in later articles, it’s important to note that a significant amount of contingent human labor in the global south is involved in this training.
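To make the “predict the next word” idea concrete, here is a deliberately tiny sketch in Python. It is not how any commercial tool is built: real LLMs use neural networks trained on enormous corpora, while this toy simply counts which word follows which in one short sentence. The function names and the sample text are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Turn words into numbers, then count which number follows which."""
    words = text.lower().split()
    # Step 1: give each distinct word a numeric id (a toy stand-in for tokenization).
    vocab = {word: idx for idx, word in enumerate(dict.fromkeys(words))}
    ids = [vocab[w] for w in words]
    # Step 2: tally, for each id, how often each other id comes right after it.
    follows = defaultdict(Counter)
    for current, nxt in zip(ids, ids[1:]):
        follows[current][nxt] += 1
    return vocab, follows


def predict_next(vocab, follows, word):
    """Return the word seen most often after `word`, or None if unknown."""
    id_to_word = {idx: w for w, idx in vocab.items()}
    counts = follows.get(vocab.get(word.lower()))
    if not counts:
        return None
    best_id, _ = counts.most_common(1)[0]
    return id_to_word[best_id]


# Hypothetical miniature "training corpus".
corpus = "the library opens early . the library closes late . the archive opens early ."
vocab, follows = train_bigram_model(corpus)
print(predict_next(vocab, follows, "the"))    # most frequent follower of "the"
print(predict_next(vocab, follows, "opens"))  # most frequent follower of "opens"
```

Notice that the toy model has no idea what a library is; it only knows that “library” followed “the” more often than anything else did.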

Many AI products continuously refine their predictive models by asking users whether answers are satisfactory, and whether they can use your responses for continued training. Many tools (such as ChatGPT) also incorporate anything you upload–data, your own writing, articles, etc.–into their training corpora. ITS has signed contracts only with providers that claim not to automatically ingest uploaded information. It’s important to remember that no AI tool understands language; it only gives a very convincing approximation of understanding. One heavily cited conference paper characterized large language models as “stochastic parrots.”

NotebookLM is somewhat different: although it is built on an LLM, it applies its predictive text generation to smaller local datasets created by users. A professor might, for example, create a dataset of all the articles students are assigned in a class; when students enter prompts, the tool bases its answers on that local dataset rather than on the LLM’s general training data. These local-data tools both avoid the risk of your own data being absorbed into a training corpus and tend to have fewer accuracy problems.
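The “answer from the local dataset” idea can be sketched the same way. The Python below is an assumption-laden illustration, not a description of how NotebookLM or any vendor tool works internally: it picks the course reading that shares the most words with a question, where a real tool would use far more sophisticated matching and would then generate an answer from the selected text. The reading titles, their contents, and the function names are hypothetical.

```python
def tokenize(text):
    """Lowercase the text and strip basic punctuation, returning a set of words."""
    return {w.strip(".,;:!?\"'()").lower() for w in text.split()} - {""}


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(body)), title) for title, body in documents.items()]
    # Keep only documents with at least one shared word.
    scored = [(score, title) for score, title in scored if score > 0]
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


# Hypothetical local dataset a professor might assemble for a class.
course_readings = {
    "Reading A": "Copyright law and fair use in academic publishing.",
    "Reading B": "Statistical language models predict the next word in a sequence.",
}

print(retrieve("How do language models predict words?", course_readings))
```

Because the tool only looks inside `course_readings`, a question about something the readings never mention returns nothing rather than an invented answer, which is one reason grounded tools tend to have fewer accuracy problems.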

Embedding in library tools

Database vendors that serve libraries, such as ProQuest, JSTOR, Scopus, and Clarivate, are beginning to add this latter type of AI chat tool, which queries their own database data. As of this writing, ProQuest is piloting its “Research Assistant” in some databases. JSTOR also offers its pilot chat tool by request. BC Libraries is watching these developments carefully. While some vendors offer pilots as free extensions of existing contracts, others, like Clarivate (Web of Science), have announced products that can be added for additional fees. Our library search engine, Primo, has also introduced AI chat as an optional add-on. The non-profit organization Ithaka (which brings us JSTOR) has developed a continually updated list of Generative AI products with brief descriptions.

Supporting faculty

In all ways but the pace of development, our response to these tools is the same as it ever was: we experiment with them, we research the affordances and limits and how other universities and libraries are using them, we gain familiarity, and we partner with faculty to understand your needs and concerns and facilitate your research and teaching. We look forward to hearing from you.

Author’s note about the images: The point isn’t so much that Google Gemini is poor at image generation, but that a novice user might have to experiment quite a lot to get anything like satisfactory results. Often, creating effective GenAI prompts is more an art than a science.

¹ At the time of this writing, use of NotebookLM is limited to the standard version and is intended for exploration, as it’s not a core Google service and could be removed or suspended by Google. In other words, don’t design necessary functions around it.

Published by Steve Runge