Our Journey Towards Implementing a New Finding Aids Solution

Summary

After the successful launch of our ArcLight finding aids application on April 19, 2023 - and the retirement of our homegrown Digital Library eXtension Service (DLXS) finding aids application - we are sharing our reflections on the project with the wider community.

This blog post will describe the history of finding aids at the University of Michigan Library and what led us to develop the ArcLight finding aids application, starting in earnest in 2020. We will describe our goals for the project, the organization of the development team, and the modifications that we needed to make to effectively complete the project. We will give an overview of what a finding aids application does, and why we decided to use ArcLight as well as Docker and Kubernetes as our new containerization and hosting solution. We will discuss what was advantageous to us for this project as well as what was particularly challenging, and sum up what we learned from our archives partners and end-users throughout the project.

You may also be interested in reading our original blog post about the ArcLight project from December 2022.

Background and history

The University of Michigan Library has a long history of working with finding aids, or archival collection guides, in the Encoded Archival Description (EAD) schema. In the fall of 1997, we gave our first demonstration of an EAD access system, created in conjunction with the Bentley Historical Library, to members of the campus library and archives community. Our collaboration with the Bentley grew with their creation of encoding guidelines, distribution of finding aid templates, and training for archives students and colleagues. They were also providing feedback to drive our work to make improvements to our homegrown Digital Library eXtension Service (DLXS) finding aids application. Slowly but surely, that original demonstration system became a production website hosting thousands of finding aids from partners across campus and across the state.

While DLXS provided many useful features and served us well for decades, it had the limitations that come with being a decades-old application. In 2017, we joined a community development effort led by Stanford to adapt the popular Blacklight discovery application for use with EAD-encoded finding aids. It was dubbed ArcLight. Other institutions, including Duke University, Indiana University, and Princeton University, joined for another round of work in 2019.

Beginning the work in earnest

In 2020, we began designing what would become our local ArcLight-based replacement for our DLXS finding aids application. We had four goals:

We wanted a solution that allowed end-users to search across all collections more easily; our DLXS application divided finding aids by archive and cross-collection searching was an almost-hidden function.
We wanted to build an application for the discovery of archival materials described in finding aids that, in particular, would allow end-users to search broadly and then narrow their results to items from a specific genre, time period, collection, or repository.
We wanted to make it easier to locate and view digital archival content described in finding aids.
We wanted to offer a modern, accessibility-compliant, mobile-friendly interface with features meeting or exceeding those of DLXS.

We began software development in earnest in the fall of 2022 using Duke’s implementation of ArcLight. We initially underestimated the challenge in front of us. Though we knew we wanted to bolster the underlying hosting infrastructure, this was more work than anticipated. More application work was also necessary to round out the functionality and meet archives partners’ expectations. We expected data issues, but there were surprises there, too. Once it became clear that more resources and time would be needed, we reprioritized our projects at the division level and rallied around ArcLight.

The entire development effort took us six months, a not unreasonable timeline. The people who did this work all reside in the Library Information Technology (LIT) division of the University of Michigan Library. They included a product owner (Chris Powell), two project managers (John Weise, Kat Hagedorn), one user experience and testing specialist (Robyn Ness), one front-end developer (Bridget Burke), and four software developers (Anthony Thomas, Noah Botimer, Greg Kostin, and Roger Espinosa). Two of those developers managed the development of the infrastructure noted above, and the other two managed feature development. In total, we needed a team of nine people to complete the work. Three were allocated to the project full-time, and the others contributed between 25% and 75% of their time.

In addition, we relied extensively on our archives partners to guide our development, which we will highlight below.

The expansion of our development team enabled us to dedicate effort to implementing our newest ideas about how to build infrastructural underpinnings, using Docker and Kubernetes, that robustly support hosting the application for development, deployment, data flow, and operation. That work had a steep and time-consuming learning curve, but it will pay off because the model is applicable to other applications’ architectures. This was considerably more work than we planned for at the outset, but ArcLight is becoming a shining example of how we want to host applications going forward.

In early November 2022, we released the beta ArcLight site for archives partners to begin working with. That beta site became our production launch in February 2023, and we turned off the DLXS finding aids application on April 19, 2023. You can see our fully launched implementation of ArcLight at https://findingaids.lib.umich.edu/.

What a finding aids application does

Finding aids are complex documents that serve multiple purposes for the users of archival collections as well as for the archivists, curators, and librarians managing those collections. They combine descriptive metadata about the collection as a whole with that for individual pieces of the collection, along with technical metadata about the size and composition of the collection and administrative metadata about where the material is stored and whether there are restrictions on access and use. There can also be long prose passages describing the creation of the collection, bibliographies related to the collection, and links to digital materials, whether born digital or digitized.

But digital materials are generally only a very small part of any collection, and metadata rarely describes each individual item in a collection. The primary purpose of a finding aid is to guide users to the physical material within the archive or library for their own exploration into the contents. This means it must also provide information about how to make requests to access the material (sometimes involving integration with online requesting software) and how to schedule visits, and allow for links to directions and policies for visitors.

Along with search and retrieval expected of other digital platforms, the finding aids application has to perform these various functions – many of which are specific to individual repositories – without overwhelming the users or being too simple to allow archivists and librarians to do their work with the collections. The challenge is in presenting the right amount of information at the right time, with opportunities to see the larger context or finer detail as needed.

Why we decided to use ArcLight

When we started our investigation into replacing DLXS, we considered other finding aids applications, including the ArchivesSpace Public User Interface. There are exceedingly few options, which contributed to the long life of our DLXS finding aids application. Although the ArchivesSpace interface has some attractive features, only two of our seven partners use ArchivesSpace to create finding aids, so we quickly ruled it out.

Our involvement in the ArcLight community gave us a sense of what adoption would entail and familiarity with the community of open-source developers. ArcLight uses Blacklight, which we use for other library systems. Since a number of different institutions had implemented ArcLight, we were able to demonstrate different approaches to customizing the underlying software. The mobile-friendly interface and the embedding of digital objects were especially popular with our archival partners.

ArcLight's compliance with current accessibility standards made it even more compelling after the adoption of University of Michigan’s SPG 601.20, Electronic and Information Technology Accessibility, in June 2022. One of our team members is leading an effort at the division level to test compliance in all our products, and we were fortunate for this product to be the first to receive this testing and, subsequently, to fix all violations and warnings we found before launch.

Why we decided to use Docker and Kubernetes

LIT has a diverse technology portfolio assembled over more than two decades, and is working to modernize and normalize all aspects. We are migrating towards Kubernetes as the preferred application hosting (container orchestration) environment, in line with general industry trends. Docker is our standard choice for developing in and building containers.

ArcLight presented an opportunity to significantly advance our use of these technologies and create a robust infrastructure for the application. We also employed Argo CD to deploy new releases, Prometheus to collect metrics, Yabeda to export metrics, Loki to collect logs, and Grafana to build and view dashboards and inspect metrics and logs. The Rook and Solr operators give us managed storage and search index support with high availability.

Despite the breadth of technologies and techniques that are relatively new to LIT, we are integrating them in a principled way to hew towards proven, well-adopted approaches and specialize only where library systems have genuinely unique requirements. We believe that this, in the long term, will give us better sustainability across the portfolio and equip our staff with transferable skills while easing maintenance, hiring, and onboarding complexity.

What we discovered (i.e., what was advantageous and what was particularly challenging)

It’s likely obvious, but an endeavor of this scope and scale has many moving pieces, and bringing them together for a successful launch takes a lot of effort. To be frank, we initially underestimated the degree to which our requirements for online finding aids, driven by a diverse and engaged group of archival partners, would differ from other implementations of ArcLight.

One of the most important advantages we had was support at the Library Information Technology division level to make this a priority project for its six-month duration, and to use the resources of the division, even at the expense of other projects. We also had to extend the project, as originally designed, by two more months, and the division supported that extension.

Another advantage was the consistency of finding aid encoding practices across the seven different repositories, which we owe to the Bentley's early leadership in creating EAD encoding guidelines and to the number of University of Michigan School of Information graduates working throughout the state. EAD is a schema that allows for a great amount of variability, and we were fortunate not to have to deal with that. Structurally, our finding aids followed the same practice; they all contained numbered nested components, for example, and tended to treat optional sections the same way. We have heard from other institutions looking at ArcLight that this is not the case for them, and this creates the need for additional data analysis and interface testing.
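To make "numbered nested components" concrete, here is a minimal Python sketch that walks the component hierarchy of a finding aid. It assumes EAD 2002-style numbered components (c01 through c12) inside a dsc element. The sample fragment, its titles, and the function names are invented for illustration and are not taken from our application or our data; real finding aids are usually namespaced and far larger.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD fragment (no namespace), for illustration only.
SAMPLE_EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did><unittitle>Letters, 1901-1905</unittitle></did>
        </c02>
        <c02 level="file">
          <did><unittitle>Letters, 1906-1910</unittitle></did>
        </c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Photographs</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""

# Numbered component tags run from c01 to c12.
NUMBERED = re.compile(r"^c(0[1-9]|1[0-2])$")


def outline(element, depth=0):
    """Return (depth, title) pairs for every numbered component under element."""
    rows = []
    for child in element:
        if NUMBERED.match(child.tag):
            title = child.findtext("did/unittitle", default="(untitled)")
            rows.append((depth, title.strip()))
            # Recurse so that c02 inside c01 is reported one level deeper.
            rows.extend(outline(child, depth + 1))
    return rows


def component_outline(xml_text):
    """Parse an EAD string and list its components in document order."""
    root = ET.fromstring(xml_text)
    dsc = root.find(".//dsc")
    return outline(dsc) if dsc is not None else []


if __name__ == "__main__":
    for depth, title in component_outline(SAMPLE_EAD):
        print("  " * depth + title)
```

When every repository nests and numbers its components the same way, a single traversal like this can serve all of them; inconsistent encoding is what forces the additional data analysis mentioned above.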

We knew from a review of other institutions’ ArcLight systems that beginning with Duke’s implementation placed us in a much better starting position to build the functionality needed to achieve our goal of retiring DLXS. However, we did not fully realize the scope of the work ahead until we began exploring and discussing the features and functionality of our original beta site implementation with our archives partners. After early feedback, we performed an idea generation exercise with the partners - and then on our own! - to understand the scope and priorities of the requests. Plotting a course for bug fixes and enhancements also required the input of the entire development team, of course. This process of discussing and negotiating feasibility and priority of requested changes took time and, while necessary, it was time we had not budgeted for.

We performed a baseline accessibility evaluation on the most complex pages and global user interface components to identify critical violations under WCAG 2.1 A and AA standards. The extensive automated and manual testing, which included testing with keyboard and screen reader, resulted in the discovery of a number of critical violations, both in the base ArcLight implementation and our customizations.

Tuning the presentation and navigation of the ArcLight interface involved iterations between archives partners, end-user interviews, and performance testing. Typically, ArcLight presents collection description, containers, and access as a tabbed interface, with the containers shown as an expandable outline of titles. Duke's implementation, which we built on, presents these elements stacked without tabs and places the expandable outline for navigating the content hierarchy in the page sidebar. However, local testing with archivists revealed that they weren't connecting with the sidebar and were frustrated with the amount of clicking involved to explore finding aids with nested containers. To address this, we adapted the main content listing to be expandable, including automatically expanding if there's only one child component.
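The auto-expand rule can be sketched in a few lines of Python. This is an illustrative model only, not the code in our application (which is built on ArcLight): the Component class, the render function, and the "+" and "-" markers are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical stand-in for one entry in a finding aid's content listing."""
    title: str
    children: List["Component"] = field(default_factory=list)
    expanded: bool = False  # True once a user has opened this entry


def render(component, depth=0):
    """Return outline lines, opening any entry that has exactly one child."""
    is_open = component.expanded or len(component.children) == 1
    if not component.children:
        marker = " "   # leaf: nothing to expand
    elif is_open:
        marker = "-"   # children are shown
    else:
        marker = "+"   # children are hidden until the user clicks
    lines = ["  " * depth + marker + " " + component.title]
    if is_open:
        for child in component.children:
            lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    series = Component(
        "Series 1",
        [Component("Subseries A", [Component("Folder 1"), Component("Folder 2")])],
    )
    print("\n".join(render(series)))
```

In this sketch, a series with a single subseries opens on its own, saving one click, while the subseries with two folders stays collapsed until the user chooses to open it.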

We also learned that end-users were taking advantage of DLXS functionality to make the finding aid more portable, either by saving the complete EAD as HTML or by printing it to PDF. To address this need, we opted to generate a PDF version of the full EAD during the indexing workflow and make it available to users as a download. The size and variety of our EADs presented several technical challenges in finding a solution that produced an acceptable PDF using an acceptable amount of computing resources.

Additionally, in implementing the Kubernetes infrastructure, team members worked through many decisions related to system architecture, deployment workflow, and hosting environments for preview/testing and production web sites. They encountered multiple hurdles as they did this work, primarily because we were using tools and methods that were new to us, and learning along the way. We didn’t account for the time this would take at the outset and it was challenging to adjust mid-project.

What the archives partners and end-users taught us

Our archives partners actively tested and shared many notes throughout the development process, from the moment a beta site was offered through the official launch. Their expertise in archival research and knowledge of specific collections with individual quirks was invaluable to our ongoing improvement, especially when dealing with a variety of content in the fluid XML-based format of EADs. The project team and archives partners frequently met together in project update and demo sessions and occasional drop-in testing sessions, spurring important live discussion about how they and their end users interacted with online finding aids and archival materials.

Additionally, having such an engaged group of testers surfaced many system-level issues. For instance, a few weeks before our official launch date, a sharp-eyed partner archivist discovered an unexpected, unintuitive behavior in ArcLight’s “Browse Repositories” feature: it limited the scope of searches to collection-level metadata (rather than item-level), which artificially narrowed the results returned. Before launch, our developers were able to adjust this behavior to better support our multi-repository site.

In addition to the expert feedback and quality testing provided by our partners, two rounds of usability testing were performed. The first round, on an in-progress version of the site, involved experienced users of finding aids. The second round, much closer to launch, focused on students and new users who had less awareness and practice with using finding aids. During these rounds of testing, people were generally able to navigate the site, search results, and finding aids, though there were questions about the scope of content our site included and, for novices, about the nature and nuances of finding aids. Additionally, test participants really appreciated results filters (especially for date ranges), the ability to download a PDF, and the option to view related digital content.

Where we ended up

We are pleased with the end result! We’ve had excellent feedback from our archives partners, who were, as described, in close collaboration with us along the way. They are eager to utilize the new features of this application, and have promised to bring us questions, ideas - and bug reports! - as they continue to use it. We expect to revisit future feature development in our division’s planning process shortly.

Stanford University has recently released ArcLight version 1.0, and we will also be exploring how to align our implementation with that release. That is planned for later this year.

Some work is ongoing - such as developing documentation for archives partners and end-users, and improving our help text - and we will continue to work on that as part of our operational work. We are also still refining the workflow for loading and previewing new and updated finding aids each month.

We could not have done this without strong support at the divisional level (especially Bohyun Kim and Sebastien Korner), our archives partners (especially representatives from the Bentley Historical Library, William L. Clements Library, and Clarke Historical Library at Central Michigan University), and of course the entire development team (Anthony, Bridget, Chris, Greg, John, Kat, Noah, Robyn, and Roger). Thank you to everyone involved!