The Horrors of Good Intentions: Told through the story of a dark repository.

Thanks to colleagues who provided feedback to ensure this post correctly represents consensus opinion and who join me in sharing the lessons we have learned.

The library manages a dark repository, named Dark Blue because I have no imagination, for material that needs preservation but not public access, such as preservation copies of digitized moving image and in-process born-digital material. You can read more about the implementation of this repository in this 2018 post. It is fair to say that Dark Blue has had some growing pains over the past few years, including incorrect packaging of material and broken deposit and withdrawal workflows. While these sound like technical problems, the thesis of this post is that our troubles with Dark Blue are rooted not in bad systems or policies but in the limits of people and time, and in choosing to do the “nice” thing over the realistic thing.

Backstory

Dark Blue is a partnership between the Digital Preservation Unit and the repository’s development team in the library’s IT division (LIT). It was developed as a lightweight application layer, handling ingest and related tasks, placed over a storage service. I am the Dark Blue service manager, working to identify and develop requirements for different content types. I also work with the development team to ensure the technical infrastructure is developed and implemented in line with the needs of the service. While the development team and I were already quite busy, a dark archive was clearly needed for several important and emerging areas. Providing dedicated storage for audio/moving image digitization and digital archaeology would be a big step forward for those programs. None of us wanted to say “no” and be roadblocks to that work, so we decided that a “simple” repository design with a small number of content types would be doable, even with our limited bandwidth. I mean, what could go wrong?

Dark repository clouds gather

At first, things were going along swimmingly. Dark Blue was ingesting material, the application layer was working well, and staff were convinced that we could handle ongoing repository management as part of our ever-growing portfolio of work. However, once the sheer size of digitized moving image files began to stress the system, problems arose with our ingest processes. These required an increased level of attention from our already very busy development team. When staff familiar with the operation of Dark Blue left the library, the situation only worsened. While the things already in storage were warm and cozy (i.e., safely stored and backed up), ultimately, ingest of almost all content types was frozen. This caused a ripple effect within the programs depending on Dark Blue for storage. For example, our digital archaeology work had to shift to a temporary, non-preservation storage solution. Eventually, the entire program was paused, including the hiring of interns who usually do the work of disk imaging and processing, creating a backlog that still exists today. These problems also led to pausing the development of workflows for all new content types. One of those new content types had to be rerouted to Dropbox. *shudders*

Things go from bad to @#$%&!

As if all that was not gruesome enough, we found out through a partnership with an external organization that many of the packages stored in Dark Blue have incorrect metadata. Most of these errors are in vendor-generated METS files that were not properly validated when they were received. If there is something worse than finding out a bunch of stuff is wrong, it's finding it out through an external partner. Trust me on this one.
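
For readers wondering what "validated when they were received" could look like in practice, here is a minimal sketch in Python. It is not our actual process, and it is not a substitute for validating against the full METS schema. It only checks that a file is well-formed XML, that its root is a METS element, and that it has the structMap the METS schema requires. The function name `check_mets` is made up for this post, and the rule that every file entry must carry a CHECKSUM attribute is an assumed local policy, not something METS itself demands.

```python
import xml.etree.ElementTree as ET

# Namespace used by METS documents.
METS_NS = "http://www.loc.gov/METS/"


def check_mets(xml_text):
    """Return a list of problems found in a METS document (empty list = passed).

    Hypothetical helper for illustration; a basic sanity check only.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    # Stop early if this is not a METS document at all.
    if root.tag != f"{{{METS_NS}}}mets":
        return [f"unexpected root element: {root.tag}"]

    problems = []

    # structMap is the one section the METS schema requires.
    if root.find(f"{{{METS_NS}}}structMap") is None:
        problems.append("missing required structMap")

    # Assumed local policy: every <file> entry must record a checksum.
    for file_el in root.iter(f"{{{METS_NS}}}file"):
        if not file_el.get("CHECKSUM"):
            problems.append(f"file {file_el.get('ID', '?')} has no CHECKSUM")

    return problems
```

Running something like this over each vendor delivery before ingest, and refusing any package that returns a non-empty list, is the kind of gate that would have let us find these errors ourselves.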

Why oh why

It is my firm belief that the trials and tribulations of Dark Blue happened because those of us who were involved were just too darn nice to say “no,” or even “wait a bit for us to come up with a manageable solution.” I wanted Dark Blue to provide a service that would help my colleagues preserve their important work, even though I did not have the bandwidth to be more involved in the development and management of the solution. This is especially apparent in my failure to be more proactive in the development of validation processes. My colleagues on the development team were too nice to say they didn’t have time to fully support the solution as proposed. Things like the lack of troubleshooting time, and of documentation to transfer knowledge during staff turnover, ultimately swamped the ship. The ship being Dark Blue… you get it.

Onward

Things have gotten better. We are currently incorporating a vended solution that will allow the Digital Preservation Unit, which has recently added a librarian, to have more hands-on management of things like ingest. This approach will give our partners on the development team a more reasonable level of responsibility for the system. We are also working on approaches to correct the errors in the package metadata. More info on how we are doing that will be posted on this blog, but we felt it was important to discuss the backstory and the primary principle of this work going forward: We must develop systems and responsibilities that are sustainable and able to adapt when things go wrong. Starting Dark Blue with a more realistic assessment of capacity would likely have led us to the solution we have now, with a lot fewer mistakes to correct.