Sinking our Teeth into Metadata Improvement

Like many attempts at revisiting older materials, working with a couple dozen volumes of dental pamphlets started very simply but ended up being an interesting opportunity to explore the challenges of making the diverse range of materials held in libraries accessible to patrons in a digital environment. And while improving metadata may not sound glamorous, having sufficient metadata for users to be able to find what they are looking for is essential for the utility of digital libraries.

Initial Steps

This project revisited some materials in the Dental Historic Collection, which is an online collection of over fifteen hundred volumes of “important and unusual dental publications.” The collection includes journals, books, and a wide range of self-published materials and industry publications, covering multiple countries and languages. When we first started with the dental pamphlet volumes, the charge was simple: determine the status of thirty bound volumes of dental pamphlets that were closed to patrons because we were not sure whether any of the volumes were still in copyright. These volumes were titled only “Dental Pamphlets” and did not have a publication place or date listed in the catalog record, so the task was to look through the volumes and determine which of them were published long enough ago to no longer be in copyright, with special attention paid to materials published outside the US.

The Work

The catalog records for the volumes listed only the title “Dental Pamphlets,” so we did not have any information about the organization of the volumes or what particular pamphlets had been bound into each volume. We therefore decided that it would make the most sense to go through each pamphlet within each volume individually to find publication information. The materials had been fully OCR’d, so publication information from the title pages should in theory have been available through a text search. However, the pamphlets covered such a wide range of years, publication locations, and languages, and thus such diverse fonts and formatting within each volume, that the OCR quality was highly variable -- especially for the title pages, where this information was contained.

Side-by-side comparison of title page and OCR for the English language pamphlet 'Aluminium and Aluminium Alloys: Their Properties, Uses, and Methods of Working Them'

This is a sample of an English language pamphlet. Though not perfect, the OCR is relatively good -- especially the title, which is key for returning accurate search results.

Side-by-side view of title page and OCR for German language pamphlet 'Die Ursachen des üblen Mundgeruches'

This is a German language pamphlet from the same volume as the above English language pamphlet. The OCR engine appears to have been unable to parse the text here, and the results are largely incomprehensible.

Since we already needed to manually check each pamphlet in each volume to determine publication information, we decided that it would greatly improve access if the title and author for each pamphlet could also be made available to patrons -- this would help users find desired items much more easily than the generic title “Dental Pamphlets.” Extracting and making available the title, author, and publication information for each pamphlet would allow us both to determine appropriate access levels for each volume and to improve findability for users.

Previous view of dental pamphlets, volume 1

This is the old view of Volume 1 of the Dental Pamphlets. Pages are labelled only by number, and access is restricted to campus because of the undetermined publication information.

Current view of Dental Pamphlets, Volume 1

This is how Volume 1 of the Dental Pamphlets now appears. The title, author, and publication information for each individual pamphlet is provided, making it much easier for users to find and go directly to items of interest.

Once the metadata for each individual pamphlet had been gathered in a spreadsheet, we exported it as XML and spliced the new metadata into the existing XML for the volume using an XSLT process similar to one that had been developed previously to insert map transcriptions into the Michigan County atlases. This enhanced the encoding for the volumes from Level 1, a single division with the OCR per page for the whole volume, to Level 2, adding nested internal divisions with metadata as well as OCR.
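As a rough illustration of that Level 1 to Level 2 restructuring, here is a minimal sketch written with Python's standard library rather than the XSLT process we actually used. Everything specific in it is an assumption made for the example: the element names (`volume`, `div`, `page`, `bibl`), the spreadsheet columns, and the sample rows are all hypothetical and do not reflect the real encoding of the collection.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical Level 1 encoding: a single division with OCR text per page.
LEVEL1_XML = """<volume title="Dental Pamphlets, v.1">
  <div type="volume">
    <page n="1">OCR text of page 1</page>
    <page n="2">OCR text of page 2</page>
    <page n="3">OCR text of page 3</page>
    <page n="4">OCR text of page 4</page>
  </div>
</volume>"""

# Hypothetical spreadsheet export: one row per pamphlet, keyed by first page.
PAMPHLETS_CSV = """start_page,title,author,place,date
1,First Pamphlet,A. Author,Chicago,1890
3,Second Pamphlet,B. Author,Berlin,1902
"""


def to_level2(level1_xml, pamphlets_csv):
    """Nest the pages inside one division per pamphlet, with its metadata."""
    root = ET.fromstring(level1_xml)
    outer = root.find("div")
    pages = list(outer)

    # Map each pamphlet's first page number to its spreadsheet row.
    reader = csv.DictReader(io.StringIO(pamphlets_csv))
    starts = {int(row["start_page"]): row for row in reader}

    # Detach all pages so they can be re-attached under nested divisions.
    for page in pages:
        outer.remove(page)

    current = None
    for page in pages:
        row = starts.get(int(page.get("n")))
        if row is not None:
            # A new pamphlet begins here: open a nested division for it.
            current = ET.SubElement(outer, "div", {"type": "pamphlet"})
            bibl = ET.SubElement(current, "bibl")
            for field in ("title", "author", "place", "date"):
                ET.SubElement(bibl, field).text = row[field]
        # Pages before the first listed pamphlet stay directly in the volume.
        (current if current is not None else outer).append(page)
    return root


if __name__ == "__main__":
    level2 = to_level2(LEVEL1_XML, PAMPHLETS_CSV)
    print(ET.tostring(level2, encoding="unicode"))
```

The key idea is the same as in the description above: the page-level OCR is left untouched, and the only change is that pages are grouped under new internal divisions that carry the title, author, and publication information gathered in the spreadsheet.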

The Implications 

Having materials scanned and available online has been extremely important for user access, especially with the ongoing pandemic limiting access to physical libraries. However, these pamphlets demonstrate some factors that can limit access to digital materials and difficulties that can arise in transferring the diversity of library materials into an online platform. Generally people think of libraries as a place for books, and online library platforms reflect this focus. While it is hardly a surprise that other types of materials are available through libraries as well, these items can easily get lost in the sea of books, technically available online but difficult for patrons to find or understand.

Books are meant to stand alone, with detailed item-level metadata -- each book has a title, author, and basic publication information. However, libraries also hold items like the dental pamphlets, which are not books in the truest sense. Each dental pamphlet volume is bound like a book and sits on the shelf like a book, but is actually a collection of smaller items, the majority of which are self-published -- what is sometimes known as “bound ephemera.” Each volume contains a variety of smaller self-contained items, each with its own title, author, and publication information. In addition, because the materials were self-published, finding and deciphering the bibliographic metadata for each item can be extremely challenging due to the wide range of formatting and conventions used.

While encountering an item like this on the shelf at a physical library can cause confusion for patrons, this is exacerbated by the digital environment. While bound volumes like these may not have as much contextual information around their collection and organization as can be found in a finding aid for archival materials, specialized subject material is generally held in an institution that focuses on certain subjects, collected intentionally by a subject specialist, and expert staff is available to field questions from users about the materials. Thus even though these items may be listed as “dental pamphlets” at a library or archives, a librarian or archivist would be able to help the user understand the context for the materials and gain a better sense of what they contain.

In a digital environment, on the other hand, these materials present challenges around both findability and understandability. While a user might be directed to these materials after a conversation with a reference librarian or while browsing related materials on the shelf, browsing in a digital environment is much more difficult. Materials are not always organized by subject like they are on the shelf -- items in our U-M Digital Collections are often organized alphabetically, for example -- making browsing harder to navigate. Organizing digital materials by subject requires that subject metadata be available for the material and then properly digitized, whereas call numbers arrange books on the shelf by subject by default. Users also tend to search for specific items and then interact with materials at the item level, rather than looking at a shelf of related items, making it more difficult to find items with sparse descriptive or bibliographic metadata like the dental pamphlets. And while a user could certainly be directed to digital materials through the Ask-a-Librarian feature available through many library websites, when a user has discovered an item on their own, it can be less clear to whom they should direct questions. Most online platforms have a helpdesk email, but these are often oriented more towards technical help than towards subject expertise.

There is often an assumption that putting an item online automatically makes the item available to a wider audience. And while it is true that the item is now available to patrons who would not have the opportunity to visit the physical institution, we can see with the dental pamphlets that even when an item is online, it can still be difficult for users to understand it or even to find it at all. So while improving metadata may not be a glamorous process, robust and detailed metadata is essential for users to actually be able to find what they are looking for in digital libraries. For example, by adding the title, author, and publication information for each pamphlet, we made those terms discoverable to users through search queries, both through the local search interface for the specific digital collection and through search engines like Google for public collections. In a physical library, staff are able to help patrons with navigating and understanding context, especially for specialized or nonconventional materials. It is important to explore how to make sure that not only the materials themselves but also the context necessary to make sense of those materials is appropriately transferred into the digital environment.

Sample Items from the Dental Pamphlets

While extracting the relevant metadata from each pamphlet was something of a long process, there was a lot of interesting material in the collection. Here is a small sample, ranging from a thesis on mouth development in amphibians to a dentist's attempt to encourage excitement for dental hygiene in the general public.

Title page for Civilization not the Cause of Tooth Decay, by John J. R. Patrick

Civilization not the Cause of Tooth Decay, by John J. R. Patrick

Image of amphibian embryo development

An experimental study of the development of the mouth in the amphibian embryo, by Amy Elizabeth Adams

Title page from the pamphlet "Greatest Discovery of the Age!"

Greatest Discovery of the Age! Human Teeth Can Be Rendered as Durable as Fingers and Toes! A Plain and Complete Explanation of the Process, by Gilbert E. Corbin

Page from 'Good Teeth' discussing tooth development with images of various zoo animals with open mouths

Good Teeth: How They Grow & How to Keep Them, by S.S. White Dental Manufacturing Co.

Thanks!

I would like to extend a special thank you to Chris Powell for her technical and contextual help with the dental pamphlets, and especially for improving my understanding of why and how this digital collection was created in the way that it was. I would also like to thank my supervisor Kat Hagedorn and project manager Lauren Havens for their patience and responsiveness throughout this process, and for always fielding my questions as I investigated the ancient mysteries of older digital collections.